An Alternative Statistical Model for Predicting Salinity Variations in Estuaries

Accurate salinity prediction can support the decision-making of water resources management to mitigate the threat of insufficient freshwater supply in densely populated estuaries. Statistical methods are low-cost and less time-consuming compared with numerical models and physical models for predicting estuarine salinity variations. This study proposes an alternative statistical model that can more accurately predict the salinity series in estuaries. The model incorporates an autoregressive model to characterize the memory effect of salinity and includes the changes in salinity driven by river discharge and tides. Furthermore, the Gamma distribution function was introduced to correct the hysteresis effects of river discharge, tides and salinity. Based on fixed corrections of long-term effects, dynamic corrections of short-term effects were added to weaken the hysteresis effects. Real-world model application to the Pearl River Estuary obtained satisfactory agreement between predicted and measured salinity peaks, indicating the accuracy of salinity forecasting. Cross-validation and weekly salinity prediction under small, medium and large river discharges were also conducted to further test the reliability of the model. The statistical model provides a good reference for predicting salinity variations in estuaries.


Introduction
Saltwater intrusion (SI) has posed a great threat to the freshwater supply for domestic, agricultural and industrial demands in densely populated estuaries. Better understanding of SI in estuaries is fundamental for properly managing water resources and ensuring freshwater supply [1].
SI involves complex saltwater-freshwater mixing, and the process is influenced by various factors, mainly including river discharge, tides, wind and sea-level rise (SLR). Many studies have confirmed a negative correlation between river discharge and the intensity of SI [2][3][4]. The salt transport associated with SI under different tidal mixing schemes may differ greatly in terms of timing [1]. Specifically, saltwater mainly intrudes landward during spring tides in well-mixed estuaries [5][6][7] and during neap tides in partially mixed estuaries [4,6,8]. The wind is also an important factor affecting SI, controlling estuarine circulation and stratification [9][10][11][12]. Traditionally, the wind is thought to promote vertical mixing [13,14]. Based on field observations at the York River, [15] discovered that the upstream wind can weaken salinity stratification, while the downstream wind might strengthen (lowintensity) Modaomen Channel, located in the lower reaches of the West River, was selected as the study area. The annual runoff of Modaomen Channel accounts for 26.6% of the total runoff of the PRE, making it the most important discharge channel. Meanwhile, the runoff distribution of Modaomen Channel is quite uneven in a year, with 75.7% of therunoff occurring during the flood season (from April to September) and 24.3% of the runoff occurringduring the dry season (from October to March). The tidal signal of Modaomen Estuary is micro (mean annual tidal range of 0.85 m) and irregular semidiurnal (average flood and ebb tidal durations are 5.37 h and 7.25 h, respectively) [Ye et al., 2017]. During the flood season, the intense runoff inhibits SI,while during the dry season, the reduced runoff allows seawater from the outer sea to move upstream, thereby intensifying SI. There are many water intakes along the Modaomen Channel (e.g., Zhuzhoutou pumping station, Pinggang pumping station and Guangchang pumping station), making it the main source of drinking water for Zhongshan, Zhuhai and Macao. However, in recent years, the reclamation projects at the entrance and the sand extraction in the upstream region have led toa deepening of the riverbed. As a result, the tides have been enhanced and the SI has been intensified, greatly threatening the upstream freshwater supply andcausing water from these intakes not to meet the standard of drinking water (the standard in China defined the salinity of drinking water should bebelow0.5). At thePinggang pumping station, as an example, the total duration of substandard water quality was 1612 h during [2005][2006], but this increased to 2117 hin 2011-2012.

Data Source
This study considered runoff and tide, the relative strength between which affects the intensity of SI. The river discharge was measured at Wuzhou station and Shijiao station, while water level data were collected at Sanzao station ( Figure 1). Surface salinity at three stations was selected for validating the statistical model proposed in this study (model development provided in the next section), including Pinggang pumping station, Lianshiwan station and Guangchang pumping station. This paper focused on dry seasons, during which SI is more intense and salinity change is more variable, thereby better testing the ability of the model to accurately predict salinity variations. Modaomen Channel, located in the lower reaches of the West River, was selected as the study area. The annual runoff of Modaomen Channel accounts for 26.6% of the total runoff of the PRE, making it the most important discharge channel. Meanwhile, the runoff distribution of Modaomen Channel is quite uneven in a year, with 75.7% of therunoff occurring during the flood season (from April to September) and 24.3% of the runoff occurringduring the dry season (from October to March). The tidal signal of Modaomen Estuary is micro (mean annual tidal range of 0.85 m) and irregular semidiurnal (average flood and ebb tidal durations are 5.37 h and 7.25 h, respectively) [Ye et al., 2017]. During the flood season, the intense runoff inhibits SI, while during the dry season, the reduced runoff allows seawater from the outer sea to move upstream, thereby intensifying SI. There are many water intakes along the Modaomen Channel (e.g., Zhuzhoutou pumping station, Pinggang pumping station and Guangchang pumping station), making it the main source of drinking water for Zhongshan, Zhuhai and Macao. However, in recent years, the reclamation projects at the entrance and the sand extraction in the upstream region have led toa deepening of the riverbed. As a result, the tides have been enhanced and the SI has been intensified, greatly threatening the upstream freshwater supply andcausing water from these intakes not to meet the standard of drinking water (the standard in China defined the salinity of drinking water should bebelow0.5). At thePinggang pumping station, as an example, the total duration of substandard water quality was 1612 h during 2005-2006, but this increased to 2117 hin 2011-2012.

Data Source
This study considered runoff and tide, the relative strength between which affects the intensity of SI. The river discharge was measured at Wuzhou station and Shijiao station, while water level data were collected at Sanzao station ( Figure 1). Surface salinity at three stations was selected for validating the statistical model proposed in this study (model development provided in the next section), including Pinggang pumping station, Lianshiwan station and Guangchang pumping station. This paper focused on dry seasons, during which SI is more intense and salinity change is more variable, thereby better testing the ability of the model to accurately predict salinity variations. All the data were measured hourly from 2007 to 2008 and from 2011 to 2013 during dry seasons. Daily average river discharge/salinity data were obtained by averaging the hourly river discharge/salinity data, and the daily maximum tidal range was derived from hourly water level data. Unless specified, the river discharge, tidal range and salinity hereinafter refer to the processed values. These time-series data provide a valuable resource to validate the statistical model proposed for salinity prediction.

Relationship between Salinity and River Discharge
Among all the factors controlling SI, runoff is a critical one. The comparison in Figure 2a shows that, despite the similar variation pattern, river discharge at Wuzhou station is generally larger than that at Shijiao station. From the comparison of time-series salinity in Pinggang, Lianshiwan and Guangchang pumping stations (Figure 2b), it is clear that salinity gradually increases from the upper to the lower reaches of the Pearl River.
All the data were measured hourly from 2007 to 2008 and from 2011 to 2013 during dry seasons. Daily average river discharge/salinity data were obtained by averaging the hourly river discharge/salinity data, and the daily maximum tidal range was derived from hourly water level data. Unless specified, the river discharge, tidal range and salinity hereinafter refer to the processed values. These timeseries data provide a valuable resource to validate the statistical model proposed for salinity prediction.

Relationship between Salinity and River Discharge
Among all the factors controlling SI, runoff is a critical one. The comparison in Figure 2a shows that, despite the similar variation pattern, river discharge at Wuzhou station is generally larger than that at Shijiao station. From the comparison of time-series salinity in Pinggang, Lianshiwan and Guangchang pumping stations (Figure 2b), it is clear that salinity gradually increases from the upper to the lower reaches of the Pearl River. To characterize the memory effects of river discharge on salinity, we introduced the Gamma distribution function, which has been widely used in hydrology [33][34][35] and is given as follows: where t is the present day, t − i is the i day(s) before the present day, is the gamma coefficient of the previous day t − i,α is the shape factor and β is the scale factor. The daily average river discharge of the previous 3 days at Wuzhou and Shijiao stations can be weighted according to Equation (2) below. To simplify the analysis, we put together the weighed river discharges at the two stations: where Qt−i is the daily average river discharge of the previous day t−i and is the weighed daily average river discharge. To characterize the memory effects of river discharge on salinity, we introduced the Gamma distribution function, which has been widely used in hydrology [33][34][35] and is given as follows: where t is the present day, t − i is the i day(s) before the present day, ω t−i is the gamma coefficient of the previous day t − i, α is the shape factor and β is the scale factor. The daily average river discharge of the previous 3 days at Wuzhou and Shijiao stations can be weighted according to Equation (2) below.
To simplify the analysis, we put together the weighed river discharges at the two stations: where Q t−i is the daily average river discharge of the previous day t − i and Q t is the weighed daily average river discharge. The exponential decay function has been widely applied to characterize the relationship between river discharge and salinity [30,31,36]. However, regression analysis in Figure 3 shows small coefficients of determination (R 2 ) for the three stations, indicating that it is insufficient to simply consider the memory effects of river discharge on salinity. Moreover, when the station is closer to the estuary, the effect of river discharge becomes weaker, as manifested by R 2 being reduced from 0.3586 to 0.2684 ( Figure 3). Sustainability 2020, 12, x FOR PEER REVIEW 5 of 17 The exponential decay function has been widely applied to characterize the relationship between river discharge and salinity [30,31,36]. However, regression analysis in Figure 3 shows small coefficients of determination (R 2 ) for the three stations, indicating that it is insufficient to simply consider the memory effects of river discharge on salinity. Moreover, when the station is closer to the estuary, the effect of river discharge becomes weaker, as manifested by R 2 being reduced from 0.3586 to 0.2684 ( Figure 3).

Relationship between Salinity and Tidal Range
Apart from river discharge, tidal range also greatly affects salinity distribution in estuaries [31,37]. As mentioned above, saltwater intrudes landward mainly during spring tides for well-mixed estuaries and during neap tides for partially mixed estuaries. Considering that the PRE is partially mixed, SI occurs primarily during neap tides. The field data of vertical salinity and velocity in the Modaomen Channel show that tidal current during neap tides is the main driving force of SI, which is the most intense during the transition periods from neap to spring tides [38,39]. Figure 4 shows a significant phase difference between the peaks of salinity and tidal range [40]. Since statistical methods cannot effectively characterize the relationship between tidal range and salinity, we integrated salinity changes driven by tidal range into the memory effect of salinity.

Relationship between Salinity and Tidal Range
Apart from river discharge, tidal range also greatly affects salinity distribution in estuaries [31,37]. As mentioned above, saltwater intrudes landward mainly during spring tides for well-mixed estuaries and during neap tides for partially mixed estuaries. Considering that the PRE is partially mixed, SI occurs primarily during neap tides. The field data of vertical salinity and velocity in the Modaomen Channel show that tidal current during neap tides is the main driving force of SI, which is the most intense during the transition periods from neap to spring tides [38,39]. Figure 4 shows a significant phase difference between the peaks of salinity and tidal range [40]. Since statistical methods cannot effectively characterize the relationship between tidal range and salinity, we integrated salinity changes driven by tidal range into the memory effect of salinity.

Model Formulation
This study introduced an autoregressive model to account for the hysteresis effect of salinity: where St is the salinity on day t and St−i is the salinity of the previous day t − i. We also used the Gamma distribution function to analyze the long-term memory effects of salinity, based on the following corrections:

Model Formulation
This study introduced an autoregressive model to account for the hysteresis effect of salinity: where S t is the salinity on day t and S t−i is the salinity of the previous day t − i.
We also used the Gamma distribution function to analyze the long-term memory effects of salinity, based on the following corrections: where ξ a , ξ b and ξ c are the correction coefficients of memory effect based on long-time series of salinity, which reflect the effects of long-term changes in SLR and estuarine topography. The values of α and β for calculating the Gamma coefficients are 1 and 8.6 for Pinggang pumping station, 1 and 6.2 for Lianshiwan station and 1 and 4.8 for Guangchang pumping station. With the correction coefficients determined, the daily average salinity of the previous 3 days at the three stations can be corrected according to the following formula: where S t is the predicted salinity on day t. Despite the correction, the predicted salinity is still inaccurate and lags behind the measured daily average salinity ( Figure 5). To further improve accuracy, we incorporated the dynamic corrections of short-term effects into the fixed corrections of long-term effects: where φ 1 , φ 2 and φ 3 are the correction coefficients of memory effect based on short-time series of salinity, which explain the effects of short-term changes in tidal range, wind and water diversion in the upper reaches and other anthropogenic activities. Based on Equation (6), S t in Equation (5) can be rewritten as The comparison in Figure 5 confirms the improved prediction accuracy based on Equation (7). However, the predicted salinities are higher than the measurements. Given the poor accuracy of salinity prediction using either river discharge or memory salinity alone, we developed a statistical model that considers the memory effects of both factors: where S t is the predicted salinity on day t and A, B and C are the correction coefficients. The comparison in Figure 5 confirms the improved prediction accuracy based on Equation (7). However, the predicted salinities are higher than the measurements. Given the poor accuracy of salinity prediction using either river discharge or memory salinity alone, we developed a statistical model that considers the memory effects of both factors: where St is the predicted salinity on day t and A, B and C are the correction coefficients.

Model Calibration
To obtain the values of correction coefficients A, B and C in Equation (8), we calibrated the statistical model using salinity data collected during October 2007 and February 2008 (values listed in Table 1). The comparisons in Figures 5 and 6 show slight differences between the measured salinity variations and predicted values based on Equation (8), indicating that considering the memory salinity can improve prediction accuracy. It should be noted that the measured maximum salinity values are in good agreement with the predicted ones due to the correction of the long-and shortterm effects that weakened the hysteresis effects.

Model Calibration
To obtain the values of correction coefficients A, B and C in Equation (8), we calibrated the statistical model using salinity data collected during October 2007 and February 2008 (values listed in Table 1). The comparisons in Figures 5 and 6 show slight differences between the measured salinity variations and predicted values based on Equation (8), indicating that considering the memory salinity can improve prediction accuracy. It should be noted that the measured maximum salinity values are in good agreement with the predicted ones due to the correction of the long-and short-term effects that weakened the hysteresis effects. To further evaluate the performance of the model, we used three statistical parameters: (1) the coefficient of determination (R 2 ), an indicator of the percent of variation of the measured salinity explained by the predicted salinity; (2) the root-mean-square error (RMSE), a measure of the deviation of the predicted salinity from the measured salinity; and (3) the Nash-Sutcliffe efficiency coefficient (NSE), which indicates how well the plot of observed versus predicted data fits the 1:1 linear regression line. The NSE may range between −∞ and 1, with 1 being the optimal value [30]. Mathematical equations of R 2 , RMSE and NSE are givenas follows: where n is the number of data, M i is the measured salinity, M is the mean of the measured salinity, P i is the predicted salinity and P is the mean of the predicted salinity. To further evaluate the performance of the model, we used three statistical parameters: (1) the coefficient of determination (R 2 ), an indicator of the percent of variation of the measured salinity explained by the predicted salinity; (2) the root-mean-square error (RMSE), a measure of the deviation of the predicted salinity from the measured salinity; and (3) the Nash-Sutcliffe efficiency coefficient (NSE), which indicates how well the plot of observed versus predicted data fits the 1:1 linear regression line. The NSE may range between −∞ and 1, with 1 being the optimal value [30]. Mathematical equations of R 2 , RMSE and NSE are givenas follows: where n is the number of data, Mi is the measured salinity, is the mean of the measured salinity,  Table 2 lists the statistical parameters of model calibration for Pinggang, Lianshiwan and Guangchang pumping stations. Overall, the values of R 2 , RMSE and NSE show the improved prediction accuracy of the model. For example, R 2 gradually increases to 0.9208 from downstream to upstream, and the RMSE ranges from 0.4201 at Pinggang pumping station to 1.2363 at Guangchang pumping station. Tidal dynamics mainly affect the salinity of the upstream river and barely exert influence on that of the downstream river. Therefore, the hysteresis effect on the salinity of the upstream river is stronger than that on the salinity of the downstream river.

Model Application
We applied the calibrated model to predict the salinity measured from September 2011 to February 2012 and from September 2012 to February 2013 at Pinggang, Lianshiwan and Guangchang pumping stations. Table 2 lists the values of R 2 , RMSE and NSE. For the three stations, the values of R 2 all reach 0.85, indicating a good match between the measured and predicted salinities. The agreement is further demonstrated by the comparison of salinity measurements and predictions in Figure 7. These results of real-world model application prove that the statistical model considering the memory effects of river discharge and salinity is more accurate.

Sensitivity Analysis of River Discharge
As a dominant factor affecting SI, runoff can vary greatly under anthropogenic influence. Based on the Guangchang pumping station, we conducted a sensitivity analysis by increasing and decreasing the river discharge by 50%, with the other factors remaining unchanged. The model developed in the previous section was used: where S Q is the predicted salinity contributed by river discharge. Figure 8 shows that when river discharge increases by 50%, S Q changes considerably, whereas S t varies slightly. In contrast, with a 50% reduction in river discharge, both S Q and S t increase notably. The comparison indicates that, when river discharge is small, salinity variation at estuaries is more sensitive to the changes. While under conditions of large river discharge, salinity variation becomes less sensitive.

Cross-Validation
Salinity data collected from Pinggang, Lianshiwan and Guangchang stations were used for cross-validation. Data from September 2011 to February 2012 were used for model calibration, while data from October 2007 to February 2008 and from September 2012 to February 2013 were used for model application. Due to the lack of salinity data at Lianshiwan station from 2011 to 2012, the prediction was conducted based on data at Pinggang and Guangchang pumping stations. The comparison between measured and predicted salinities (Figures 9 and 10) and statistical parameters (R 2 , RMSE and NSE) of model calibration and application (Table 2) show that the prediction results are satisfactory, further demonstrating the improved model performance.

Analysis of Weekly Prediction
A representative analysis was conducted at the Guangchang pumping station. The reliability of predicted salinity for the next 7 days was studied under three river discharges: 1400 m 3 /s (small), 2000 m 3 /s (medium) and 9000 m 3 /s (large). Figure 11 shows that salinity prediction for the next 7days in

Analysis of Weekly Prediction
A representative analysis was conducted at the Guangchang pumping station. The reliability of predicted salinity for the next 7 days was studied under three river discharges: 1400 m 3 /s (small), 2000 m 3 /s (medium) and 9000 m 3 /s (large). Figure 11 shows that salinity prediction for the next 7days in

Analysis of Weekly Prediction
A representative analysis was conducted at the Guangchang pumping station. The reliability of predicted salinity for the next 7 days was studied under three river discharges: 1400 m 3 /s (small), 2000 m 3 /s (medium) and 9000 m 3 /s (large). Figure 11 shows that salinity prediction for the next 7days in all three cases is relatively accurate. In particular, when river discharge is large enough (e.g., 9000 m 3 /s in Figure 11c), it almost completely prevents SI, and the salinity for the next 7 days is close to 0‰, which can be replicated by the statistical model. all three cases is relatively accurate. In particular, when river discharge is large enough (e.g., 9000 m 3 /s in Figure 11c), it almost completely prevents SI, and the salinity for the next 7 days is close to 0‰, which can be replicated by the statistical model. Figure 11. Comparison of measured and predicted salinity of the upcoming 7 days under conditions of (a) small, (b) medium and(c) large river discharge at Guangchang station.

Analysis of Short-Term Time-Series
The prediction model in some estuaries where measured data are inadequate can only be constructed with limited short-term data. In this section, data from October 2007 to December 2007 were used for model calibration while data from January 2008 to February 2008 were used for model application. The comparison between measured and predicted salinities presented in Figures 12 and  13 and statistical parameters listed in Table 3 show satisfactory prediction. More importantly, the results show that the statistical model proposed in this study can be applied to predicting salinity variations at estuaries with not only long-term data but also short-term data.

Analysis of Short-Term Time-Series
The prediction model in some estuaries where measured data are inadequate can only be constructed with limited short-term data. In this section, data from October 2007 to December 2007 were used for model calibration while data from January 2008 to February 2008 were used for model application. The comparison between measured and predicted salinities presented in Figures 12 and 13 and statistical parameters listed in Table 3 show satisfactory prediction. More importantly, the results show that the statistical model proposed in this study can be applied to predicting salinity variations at estuaries with not only long-term data but also short-term data.

Conclusions
This study developed a new statistical model that can accurately predict the salinity variations in estuaries. This model is based on an autoregressive model and incorporates the Gamma distribution function to correct the hysteresis effects of salinity. This study also added the dynamic corrections of short-term effects, based on the fixed corrections of long-term effects, to weaken the hysteresis effects. The model was tested against salinity data collected at different pumping stations

Conclusions
This study developed a new statistical model that can accurately predict the salinity variations in estuaries. This model is based on an autoregressive model and incorporates the Gamma distribution function to correct the hysteresis effects of salinity. This study also added the dynamic corrections of short-term effects, based on the fixed corrections of long-term effects, to weaken the hysteresis effects. The model was tested against salinity data collected at different pumping stations of the PRE. The comparison showed satisfactory prediction according to the statistical parameters of R 2 , RMSE and NSE.
The value of river discharge was increased and decreased to study its effects on salinity. Results showed that when the river discharge decreases by 50%, the increasesin S Q and S t are significantly higher than the decreasesin S Q and S t in the case of a 50% increase in river discharge. Salinity variation is more sensitive to smaller river discharge. In addition, the reliability of salinity prediction for the next 7days was examined using river discharge of three levels: small (1400 m 3 /s), medium (2000 m 3 /s) and large (9000 m 3 /s). In all three cases, the model accurately predicted the salinity variation.
Overall, this study proposed a new model based on comprehensive memory effects of river discharge and salinity, providing an accurate method of predicting salinity for effective water resources management. However, hourly changes in salinity cannot be predicted, and there are some limitations.