A Gap-Filling Tool: Predicting Daily Sediment Loads Based on Sparse Measurements

Abstract: Sediment load in streams is known as both a carrier and a potential source of contaminants, while sediment deposition can alter stream flow, stage, and morphology, and thereby has broad impacts on stream hydrology, aquatic life, and recreation activity. For many watersheds around the world, sparse daily measured sediment data may exist, but continuous and multi-year daily measured sediment data are largely unavailable because measurements are time-consuming and budgets are constrained. However, when developing a total maximum daily load (TMDL) and calibrating/validating watershed models for sediments, such continuous and multi-year datasets are required. This study extended the flow-weighted method, developed by Ouyang (Ouyang, Y. Environ. Monit. Assess. 193, 422 (2021)), to predict continuous and multi-year daily sediment loads based on sparse, limited, and discontinuous measured data. This daily sediment load gap-filling tool was validated using measured data from six different US Geological Survey (USGS) gage stations across the US. Results showed that the flow-weighted method predicted daily sediment loads well when a good linear correlation existed between measured seasonal sediment loads and measured seasonal stream discharges, which is a prerequisite for applying the flow-weighted method. Five out of the six selected USGS gage stations used in this study met this prerequisite. The flow-weighted method (along with an example R script for implementing the method) is a useful tool for filling daily sediment load gaps.


Introduction
Sediment in rivers and streams is recognized as both a carrier and a potential source of contaminants to aquatic environments due to its adsorption of toxic chemicals, excess nutrients, and pathogens [1]. Sediment deposition can significantly modify river flow, stage, and morphology, which has broad impacts on surface water hydrology, aquatic life, and recreation activity. When soil erosion and sediment transport have adverse effects on terrestrial communities, rivers, and streams, the surface water systems are impaired by sediments [2]. Agricultural, industrial, and urbanization activities, together with biotic/abiotic disturbances, are the major sources of sediment contamination and deposition in rivers, streams, and lakes [3][4][5][6][7].
Numerous computer models are applied to predict sediment concentrations and loads in watersheds around the world [8][9][10][11][12][13][14][15][16]. Among them, AnnAGNPS (Annualized Agricultural Non-Point Source Pollution), APEX (Agricultural Policy Environmental eXtender), HSPF (Hydrological Simulation Program-FORTRAN), SPARROW (SPAtially Referenced Regressions on Watershed attributes), and SWAT (Soil and Water Assessment Tool) are the most widely used watershed models [10][11][12][13][14]. While they are essential tools for estimating sediment status in watersheds, the applicability of these models depends on the availability of long-term and continuous daily sediment concentration and/or load datasets for model calibration and validation. In addition, the development of a total maximum daily load (TMDL) for sediment in waterbodies requires daily sediment load data. A TMDL is defined as the maximum daily allowable pollutant loading into a waterbody without exceeding water quality standards and is used as a starting point and a planning tool for restoring surface water quality [17]. Unfortunately, such daily datasets are not available for most watersheds in the world because sediment measurements are time-consuming and budgets are constrained.
To overcome this obstacle, Runkel et al. [18] developed the Load Estimator (LOADEST) model to estimate water quality constituent loads in rivers and streams. For a given time series dataset of streamflow and constituent concentration, LOADEST provides 11 natural log regression equations from which users select the best equation to estimate constituent loads. Despite its usefulness, this approximately two-decade-old model requires considerable effort to prepare model input data (i.e., streamflow and water quality data) in specific formats for model execution. To this end, Park et al. [19] created a user-friendly web-based tool to estimate constituent loads using LOADEST as a core engine (https://engineering.purdue.edu/~ldc/LOADEST accessed on 11 October 2022). This web-based tool was applied to two watersheds, namely the Fall Creek watershed near Fortville (US Geological Survey (USGS) Station #03351500) and the Little Buck Creek watershed near Indianapolis (USGS Station #03353637) in Indiana, USA. These authors found that the annual sediment load at the Fall Creek watershed exceeds the target load and concluded that their web-based tool correctly identifies the watershed in need of sediment load reduction. However, large biases may exist in applying LOADEST to estimate water quality constituent loads [20]. Using the five- and seven-parameter equations of LOADEST, Hirsch [20] showed that LOADEST could produce severely biased results under the following three conditions: (1) a weak relationship between log constituent concentration and log stream discharge, (2) substantial differences in the shape of this relationship among seasons, and (3) severely heteroscedastic residuals. In addition, LOADEST is not able to predict the peaks and valleys of constituent loads that occur on specific dates because the regression equations used in LOADEST do not follow the peaks and valleys of streamflow. This is a disadvantage for estimating constituent loads under extreme wet (or high flow) and drought (or low flow) conditions. Furthermore, LOADEST only predicts mean daily constituent loads on a monthly or seasonal basis but not on a daily basis [18].
Recently, Ouyang [21] developed a flow-weighted method to predict continuous daily total phosphorus (TP) loads in streams using the seasonal TP loads. A flow-weighted average is the average of a quantity weighted in proportion to a corresponding flow rate. For instance, if daily TP concentrations are measured continuously in a stream, their flow-weighted mean TP concentration is the sum of the products of each measured daily TP concentration and its respective stream flow rate, divided by the sum of the measured flow rates. After validation against measured daily TP data from three different USGS gage stations with good statistical comparisons, the author concluded that the flow-weighted method is a useful tool to disaggregate seasonal TP loads into daily TP loads when continuous measured daily TP data are not available. The author further postulated that the method could be used to predict other daily water quality constituent loads based on sparse measured data. The purpose of this study was, therefore, to predict continuous and multi-year daily sediment loads based on sparse sediment measurements using the flow-weighted method. The specific objectives were to: (1) present the step-by-step procedures for applying the flow-weighted method; (2) validate the method using field observations from six different geographical locations across the US; and (3) present example applications and discuss assumptions in applying the flow-weighted method.

Method Description
Hydrology 2022, 9, 181
The flow-weighted method [21] employed in this study includes the following three major steps:
(1) Check for correlation between the measured seasonal sediment load and the measured seasonal discharge in a dataset. A good linear correlation between these two seasonal variables is a prerequisite for applying the flow-weighted method. It should be noted that the measured seasonal sediment load is obtained by averaging the sparse daily measured sediment concentrations within a season and multiplying by the seasonal discharge volume in that season. The linear correlation is assessed using the coefficient of determination (R²) and the p-value.
(2) Calculate flow-weighted partitioning coefficients. Once a good linear correlation is obtained for a dataset, the daily flow-weighted partitioning coefficients are calculated as [21]:
$$F_i = \frac{D_{daily,i}}{D_{season}} \qquad (1)$$
and
$$D_{season} = \sum_{i=1}^{n} D_{daily,i} \qquad (2)$$
where F is the daily flow-weighted partitioning coefficient (day/season), D_season is the measured seasonal discharge volume (L), D_daily is the measured daily discharge volume (L), i is the specific date, and n is the number of dates in a season.
(3) Determine the daily sediment load. The daily sediment load is calculated as [21]:
$$L_{ds,i} = F_i \times L_{ss} \qquad (3)$$
where L_ds is the daily sediment load (g/d) and L_ss is the measured seasonal sediment load (g/season).
For a good application of the flow-weighted method, one daily measured sediment concentration in each month of a season is preferable, although a missing sediment concentration in some months can still be acceptable. It should also be noted that continuous measured daily discharges within a season are needed to calculate the daily flow-weighted coefficients, although occasional missing daily discharges in that season may be acceptable.
As a demonstration, Table 1 lists the sparse daily measured sediment data from three of the six USGS stations used in this study. There was one daily measured sediment data point per month for most months, although measured data points were missing for some months. The data in Table 1 were used to predict continuous daily sediment loads with the flow-weighted method.
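The partitioning and disaggregation steps described above can be illustrated with a minimal Python sketch. This is not the authors' R script; the function names and the four-day example season are hypothetical, and the sketch assumes a complete daily discharge record for the season.

```python
from typing import List, Sequence


def partition_coefficients(daily_discharge: Sequence[float]) -> List[float]:
    """Share of the seasonal discharge volume carried on each day."""
    season_total = sum(daily_discharge)  # seasonal volume = sum of daily volumes
    if season_total <= 0:
        raise ValueError("seasonal discharge volume must be positive")
    return [d / season_total for d in daily_discharge]


def daily_loads(seasonal_load: float, daily_discharge: Sequence[float]) -> List[float]:
    """Split one seasonal load across days in proportion to daily discharge."""
    return [f * seasonal_load for f in partition_coefficients(daily_discharge)]


# Hypothetical four-day "season" (illustrative numbers, not USGS data).
example_discharge = [100.0, 300.0, 400.0, 200.0]  # daily volumes, any consistent unit
example_loads = daily_loads(5000.0, example_discharge)  # seasonal load of 5000 g
print(example_loads)
```

Because the coefficients sum to one within a season, the daily loads always add back up to the seasonal load, which is a convenient sanity check on any implementation.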

Example of Method Application
Station #01358000 on the Hudson River at Green Island, New York was used as an example application of the flow-weighted method. This station has sparse, discontinuous, and intermittent measured daily total solid concentrations from March 1971 to May 1975 (Table 1). The first step in applying the flow-weighted method is to calculate the seasonal sediment load using the sparse daily measured data. In this study, seasons are defined as spring from March to May, summer from June to August, fall from September to November, and winter from December to February. As shown in Table 2, an average seasonal total solid concentration (Column C) was calculated by averaging the daily total solid concentrations (Column B) within a season. The seasonal sediment load (Column E) was then calculated by multiplying the average seasonal total solid concentration (Column C) by the seasonal discharge volume (Column D). The seasonal discharge volume (Column D) was calculated using Equation (2). It should be noted that the total solid concentrations (Column B) for some months are missing (e.g., August 1972), which is still acceptable when using the flow-weighted method. It should also be kept in mind that the seasonal discharge volume in Column D is the sum of the continuous measured daily stream discharges (not shown in Table 2 but obtainable from USGS Station #01358000) within a season. The second step is to calculate the flow-weighted partitioning coefficients using Equation (1). As shown in Table 3, the measured daily discharges in m³/s (Column B) were converted to m³/day (Column C). The flow-weighted partitioning coefficients (Column F) were then obtained by dividing the measured daily discharge (Column C) by the measured seasonal discharge (Column D). The last step is to predict the daily total solid loads using Equation (3). This was accomplished by multiplying the daily flow-weighted coefficients (Column F) by the measured seasonal total solid load (Column E). The results are shown in Column G. The same steps were used to analyze the daily sediment loads for the rest of the USGS stations.
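The first step of the example (assigning dates to seasons and turning sparse concentrations into a seasonal load) can be sketched in Python as follows. This is an illustrative sketch rather than the authors' R script: the function names and sample values are hypothetical, and it assumes concentrations in mg/L and discharges in m³/s, using the identity 1 mg/L = 1 g/m³.

```python
from datetime import date
from statistics import mean
from typing import Sequence

SECONDS_PER_DAY = 86_400  # converts m3/s to m3/day


def season_of(d: date) -> str:
    """Season definition used in the text: Mar-May, Jun-Aug, Sep-Nov, Dec-Feb."""
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    if d.month in (9, 10, 11):
        return "fall"
    return "winter"


def seasonal_load_g(sparse_conc_mg_per_l: Sequence[float],
                    daily_discharge_m3_per_s: Sequence[float]) -> float:
    """Mean of the sparse concentrations times the seasonal discharge volume."""
    volume_m3 = sum(q * SECONDS_PER_DAY for q in daily_discharge_m3_per_s)
    return mean(sparse_conc_mg_per_l) * volume_m3  # (g/m3) * m3 = g


# Hypothetical inputs: two sampled concentrations and a three-day discharge record.
example_seasonal_load = seasonal_load_g([150.0, 140.0], [10.0, 20.0, 30.0])
print(season_of(date(1972, 8, 15)), example_seasonal_load)
```

A season with a missing monthly sample is handled naturally here, since the mean is simply taken over whatever samples exist.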

Method Validation
The flow-weighted method was validated by comparing the method-predicted daily sediment loads with field-measured data. The goodness of the comparison was determined using the coefficient of determination (R²), standard deviation (SD), normalized root mean square error (nRMSE), and Nash-Sutcliffe efficiency (NSE). The nRMSE is calculated as [22]:
$$nRMSE = \frac{1}{\bar{O}} \sqrt{\frac{1}{n} \sum_{i=1}^{n} (S_i - O_i)^2} \qquad (4)$$
where O_i is the field observation, S_i is the model prediction, Ō is the average of the field observations, and n is the total number of field observations. The NSE is given as [23]:
$$NSE = 1 - \frac{\sum_{i=1}^{n} (O_i - S_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \qquad (5)$$
NSE ranges from −∞ to 1, with a value of 1 for a perfect fit, > 0.75 for a very good fit, between 0.36 and 0.75 for a reasonable fit, and < 0.36 for an unsatisfactory fit of the method [24].
The USGS data (Tables S1 and S2) and the associated R script (Figure S1) used to implement the flow-weighted method for Station #01358000 are given in the Supplementary Materials. Interested users can slightly modify the R script for their own datasets and study sites when applying the flow-weighted method.
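For readers who prefer code to formulas, the two goodness-of-fit measures can be sketched in Python as below. This is a sketch under stated assumptions, not the authors' implementation: nRMSE is taken here as RMSE divided by the mean observation, NSE follows the standard Nash-Sutcliffe definition, and the function names are hypothetical. The rating bands are those quoted in the text.

```python
import math
from typing import Sequence


def nrmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """RMSE normalized by the mean observation (assumed normalization)."""
    n = len(obs)
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / n)
    return rmse / (sum(obs) / n)


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def nse_rating(value: float) -> str:
    """Rating bands quoted in the text for NSE."""
    if value > 0.75:
        return "very good"
    if value >= 0.36:
        return "reasonable"
    return "unsatisfactory"


# Hypothetical observations; predicting the mean everywhere gives NSE = 0.
example_obs = [1.0, 2.0, 3.0, 4.0]
print(nse(example_obs, example_obs), nse(example_obs, [2.5] * 4))
```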

Correlation between Measured Seasonal Stream Discharge and Sediment Load
A good linear correlation between the measured seasonal discharge and the measured seasonal sediment load is a prerequisite for applying the flow-weighted method. This correlation was evaluated using R² and the p-value. As shown in Figure 2, the values of R² were 0.87 for Station #01358000, 0.85 for Station #02231000, 0.10 for Station #11447650, 0.90 for Station #013342500, 0.67 for Station #14211720, and 0.94 for Station #05378500, whereas the p-values were < 0.001 for all stations except Station #11447650. The linear regression equation for each station is also given in Figure 2. Based on the R² and p values, good to very good linear correlations existed between the measured seasonal discharge and the measured seasonal sediment load for all stations except Station #11447650 (R² = 0.10, p = 0.1947) on the Sacramento River at Freeport, California. It should be noted that the sediment concentration for Station #01358000 on the Hudson River at Green Island, New York was measured as total solid, the sediment concentration for Station #11447650 on the Sacramento River at Freeport, California was measured as suspended solid, and the sediment concentration for the rest of the stations was measured as dissolved solid.
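The prerequisite check can be sketched in plain Python as follows. The function name and the sample data are hypothetical; the sketch returns R² and the t statistic for the slope of a simple linear regression. The p-value itself comes from a t distribution with n − 2 degrees of freedom, which the Python standard library does not provide, so in practice a routine such as `scipy.stats.linregress` would be used for that step.

```python
import math
from typing import Sequence, Tuple


def r_squared_and_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """R^2 of a simple linear fit and the t statistic of its slope (df = n - 2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    r2 = r * r
    if r2 >= 1.0:  # perfect fit: the t statistic is unbounded
        return 1.0, math.copysign(math.inf, r)
    return r2, r * math.sqrt((n - 2) / (1.0 - r2))


# Hypothetical seasonal discharge (x) and seasonal load (y) pairs.
example_r2, example_t = r_squared_and_t([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0])
print(example_r2, example_t)
```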

Method Prediction vs. Field Measurement
A comparison of daily sediment loads between the method predictions and the field measurements for all stations is given in Figure 3. The statistical values ranged from 0.59 to 0.96 for R², from 0.31 to 117 g/d for nRMSE, from 0.59 to 0.89 for NSE, and from 6.85 × 10⁸ to 2.40 × 10¹⁰ for the SD of observed data, with p-values less than 0.001. These statistics demonstrated that the flow-weighted method predicted the daily total sediment loads fairly well to very well, depending on the USGS station. The peaks and valleys of daily sediment loads from the method predictions and the field measurements for all six USGS stations are compared graphically in Figures 4 and 5. In general, the predicted peaks and valleys of the daily sediment loads visually matched most of the measured ones well, except for Station #11447650 on the Sacramento River at Freeport, California (Figure 4). In particular, for Station #11447650 the predicted peaks were much lower than the measured peaks, whereas the predicted valleys were higher than the measured valleys.

Figure 5. Comparisons of the daily sediment loads between the method predictions and the field measurements for Stations #013342500, #14211720, and #05378500.

Daily Sediment Load Prediction
Predicted continuous daily sediment loads along with the corresponding measured discharges for all six stations are shown in Figures 6 and 7. The predicted peaks and valleys of the sediment loads had fluctuation patterns similar to those of the measured discharges for most stations, except for Station #11447650 (Figure 6). That is, an increase in measured discharge increased the predicted sediment load, whereas a decrease in measured discharge decreased the predicted sediment load. For example, the stream discharge increased from 810 m³/s on 1 May 1972 to 1421 m³/s on 17 May 1972 for Station #01358000 on the Hudson River at Green Island, New York, whereas the predicted total solid load increased from 1.05 × 10¹⁰ to 1.85 × 10¹⁰ g/d for the same period and station. A 75% increase in discharge increased the predicted total solid load by about 76%. When the stream discharge decreased from 1421 m³/s on 17 May 1972 to 331 m³/s on 31 May 1972 at the same station, the predicted total solid load decreased from 1.85 × 10¹⁰ to 4.31 × 10⁹ g/d. A 77% decrease in discharge decreased the predicted total solid load by about 77%.
For Station #11447650 on the Sacramento River at Freeport, California, the fluctuation pattern of the measured discharges did not correspond well with that of the predicted daily sediment loads (Figure 6). In other words, an increase in the measured daily discharge did not necessarily increase the predicted daily sediment load. For instance, the highest peak of the predicted sediment load (5.1 × 10⁹ g/d) occurred on 11 May 2017, but there was no peak in the measured discharge on the same date (Figure 6).
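Within a single season, the method scales one seasonal load by each day's share of the seasonal discharge, so the loads on any two days i and j of that season should stand in the same ratio as their discharges. The short check below, written with D for daily discharge and L for predicted daily load (notation assumed here for illustration), applies this to the May 1972 values quoted above for Station #01358000; the small differences are consistent with rounding of the reported loads.

```latex
\frac{L_j}{L_i} = \frac{D_j}{D_i}
\quad\text{(same season)}

% 1 May to 17 May 1972
\frac{1421}{810} \approx 1.754,
\qquad
\frac{1.85 \times 10^{10}}{1.05 \times 10^{10}} \approx 1.762

% 17 May to 31 May 1972
\frac{331}{1421} \approx 0.233,
\qquad
\frac{4.31 \times 10^{9}}{1.85 \times 10^{10}} \approx 0.233
```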
Figure 6. Predicted daily sediment loads and their corresponding measured daily discharges for Stations #01358000, #02231000, and #11447650.

Discussion
A major assumption in applying the flow-weighted method to predict continuous daily sediment loads is that a good linear correlation exists between the measured seasonal discharges and the measured seasonal sediment loads. This correlation was tested using R² and the p-value. It should be noted that what counts as a good linear correlation by R² varies with research discipline, and there is no standard guideline for deciding which value is acceptable. Henseler et al. [25] proposed a rule of thumb for acceptable R², with 0.75 as substantial, 0.50 as moderate, and 0.25 as weak. Since watershed stream discharge and sediment load are highly dynamic and complex processes, a good linear correlation is determined when R² ≥ 0.75 and p < 0.01. Based on this criterion, the flow-weighted method can be used to predict continuous daily sediment loads for all stations except Station #11447650 (R² = 0.10, p = 0.20). In other words, five USGS stations, namely Stations #01358000, #02231000, #013342500, #14211720, and #05378500, should be retained for predicting daily sediment loads using the flow-weighted method.
A close examination of the measured sediment data listed in Table 1 revealed that although there was one measured sediment data point in most months within a season, some months within a season had no measured data at each USGS station. For example, the measured sediment data were missing in August 1972 for Station #01358000, December 1976 for Station #02231000, and February 1980 for Station #013342500. These occasional missing monthly data seem to be acceptable, as good linear correlations between the measured seasonal discharges and the measured seasonal sediment loads still existed in the datasets for the three stations (Figure 2). Ideally, there would be one measured sediment data point in each month of a season when applying the flow-weighted method.
To develop confidence in applying the flow-weighted method, a comparison of daily sediment loads between method predictions and field measurements is necessary. Based on the statistical analyses (R², nRMSE, NSE, SD, and p-value) shown in Figure 3, the flow-weighted method predicted the daily sediment loads well for the five USGS stations that had good linear correlations between the measured seasonal discharges and the measured seasonal sediment loads. It should be pointed out that the values of nRMSE shown in Figure 3 varied from 0.31 to 117 g/d. Ideally, the value of nRMSE or RMSE (root mean square error) for an optimal comparison between the method predictions and field measurements would be zero. In practice, a zero nRMSE or RMSE is seldom achieved. Kastridis et al. [26] reported that an RMSE is acceptable when the value of RMSE/SD is < 0.65, where SD is that of the observed data. In this study, the values of RMSE/SD and PBIAS (percent bias) are, respectively, 0.33 and −2.5 for Station #01358000, 0.15 and −1.5 for Station #02231000, 0.60 and −2.7 for Station #11447650, 0.36 and 3.6 for Station #013342500, 0.67 and 6.9 for Station #14211720, and 1.18 and 17.2 for Station #05378500. Although the value (1.18) of RMSE/SD for Station #05378500 is > 0.65, the other statistical measures such as R², NSE, PBIAS, and p-value for this station are acceptable. Additionally, very good agreement of the peaks and valleys between the measured and predicted daily sediment loads for the five stations (Figures 4 and 5) further confirmed that the flow-weighted method is a useful tool for predicting daily sediment loads when sparse measured data are available.
Very few studies have been performed to predict sediment loads based on sparse, discontinuous, and intermittent measured data. Park et al. [27] identified the correlation between the sparse measured data and the LOADEST model predictions of annual sediment loads for five USGS stations across the US. These authors found that using the mean flow calibrated by the regression equations of LOADEST reduces errors in annual sediment load from −39.7 to −10.8% as compared to using the measured data. Apparently, the LOADEST model was not an ideal tool for predicting annual sediment loads in their study. As shown in Figure 3, the flow-weighted method used in this study not only predicted daily sediment loads but also matched the peaks and valleys of the daily sediment loads well.
To demonstrate how the flow-weighted method predicts daily sediment loads when a poor linear correlation is present between the measured seasonal discharges and the measured seasonal sediment loads, Station #11447650 (R² = 0.10, p = 0.20) was included in the validation study. Results indicated that the flow-weighted method did not predict the daily sediment loads well, especially the peaks and valleys (Figure 6), for this station. Therefore, using the flow-weighted method is not advised when the correlation is poor. Although the reasons for the poor correlation between the measured seasonal discharges and the measured seasonal sediment loads at this station remain to be investigated, a possible explanation is that the sediment concentration at this station was measured as suspended solid. Suspended solids may deposit on the stream bed when the stream flow velocity is low. In other words, a poor linear correlation may exist between suspended solid concentrations and stream discharge at a low flow velocity [28].
The predicted daily sediment loads had a fluctuation pattern similar to that of the measured daily stream discharges. This is expected because a good linear correlation between the seasonal sediment loads and the seasonal discharges exists, which is a prerequisite for applying the flow-weighted method. When this prerequisite was not met, the predicted daily sediment loads and the measured daily stream discharges showed different fluctuation patterns in their peaks and valleys, as shown in Figure 6 for Station #11447650 on the Sacramento River at Freeport, California. Under this condition, the flow-weighted method may not be applicable for predicting daily sediment loads.
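The two additional statistics discussed above can be sketched in Python as follows. The text does not give their formulas, so this sketch assumes commonly used definitions: RMSE/SD computed with the same divisor n in both terms, and PBIAS as 100 times the sum of (observed minus predicted) divided by the sum of observed values, under which a negative value means over-prediction. Function names and sample data are hypothetical; the 0.65 threshold is the one quoted in the text.

```python
import math
from typing import Sequence


def rmse_over_sd(obs: Sequence[float], sim: Sequence[float]) -> float:
    """RMSE divided by the SD of the observations (assumed: same divisor n in both)."""
    mean_obs = sum(obs) / len(obs)
    err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return math.sqrt(err) / math.sqrt(spread)


def pbias(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Percent bias (assumed sign convention: negative means over-prediction)."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def rmse_sd_acceptable(value: float, threshold: float = 0.65) -> bool:
    """Acceptance rule quoted in the text: RMSE/SD below 0.65."""
    return value < threshold


# Hypothetical observations and predictions that are uniformly 10% too high.
example_obs = [1.0, 2.0, 3.0, 4.0]
example_sim = [1.1, 2.2, 3.3, 4.4]
print(rmse_over_sd(example_obs, example_sim), pbias(example_obs, example_sim))
```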

Summary
Sediment contamination and deposition in rivers and streams are serious environmental and ecological concerns. While sparse, intermittent, and discontinuous measured daily sediment data may exist for some watersheds, long-term and continuous measured daily sediment data are not available for most watersheds around the world because measurements are time-consuming and budgets are constrained. However, when developing a TMDL and performing model calibration and validation for sediments, such a long-term and continuous daily dataset is essential.
The flow-weighted method, developed by Ouyang [21], was employed to predict daily sediment loads using sparse, intermittent, and discontinuous measured data. The method predictions were validated with measured data from six different USGS gage stations across the US, indicating that the flow-weighted method is capable of filling daily sediment load data gaps.
A good linear correlation between measured seasonal discharge and measured seasonal sediment load is a prerequisite for applying the flow-weighted method. Ideally, there would be one measured daily sediment data point in each month of a season when applying the flow-weighted method, although some missing daily data for certain months seem to be acceptable provided that a good linear correlation between the measured seasonal discharges and measured seasonal sediment loads exists.