Copula-Based Inﬁlling Methods for Daily Suspended Sediment Loads

: Less-frequent and inadequate sampling of sediment data has negatively impacted the long and continuous records required for the design and operation of hydraulic facilities. This data-scarcity problem is often found in most river basins of Taiwan. This study aims to propose a parsimonious probabilistic model based on copulas to inﬁll daily suspended sediment loads using streamﬂow discharge. A copula-based bivariate distribution model of sediment and discharge of the paired recorded data is constructed ﬁrst. The conditional distribution of sediment load given observed discharge is used to provide probabilistic estimation of sediment loads. In addition, four different methods based on the derived conditional distribution of sediment load are used to give single-value estimations. The obtained outcomes of these methods associated with the results of the traditional sediment rating curve are compared with recorded data and evaluated in terms of root mean square error (RMSE), mean absolute percentage error (MAPE), Nash-Sutcliffe efﬁciency (NSE), and modiﬁed Nash-Sutcliffe efﬁciency (MNSE). The proposed approach is applied to the Janshou station located in eastern Taiwan with recorded daily data for the period of 1960–2019. The results indicate that the inﬁlled sediments by the sediment rating curve exhibit better performance in RMSE and NSE, while the copula-based methods outperform in MAPE and MNSE. Additionally, the inﬁlled sediments by the copula-based methods preserve scattered characteristics of observed sediment-discharge relationships and exhibit similar frequency distributions to that of recorded sediment data.


Introduction
Hydrologic and climate data play a significant role in water-resources engineering planning, design, and management. Sufficiently long and complete data are essential for providing accurate statistical analysis in design and establishing efficient operation rules of hydraulic facilities. Hydrologic and climate data are uniquely recorded in time and space. If the data are not recorded at a specific time and location, the lost values can only be estimated [1]. Incomplete and missing data are frequently met in many applications worldwide since a considerable amount factors lead to missing data. These factors include equipment failures, extreme natural disasters (e.g., typhoon, earthquake, and landslide), mishandling of recorded data, malfunction of data storage systems, and others [2,3]. Infilling the missing data has thus become a common practice when pre-processing data to provide long and complete data for optimal hydrologic modeling and design purposes.
Infilling or imputing missing data is a process of substituting the missing values with the most plausible values [4,5]. A vast amount of approaches such as linear regression, multiple linear regression, machine learning techniques, copula-based estimation, and others have been proposed in the literature to infill missing data. Bárdossy and Pegram [1], Ben Aissia et al. [3], and Hamzah et al. [6] provide detailed reviews on methods used in infilling missing hydrologic data.
Suspended sediment load is an important variable in water-resources engineering since it affects reservoir sedimentation, hydraulic structures design, water quality, ecological and recreation, watershed management, and channel stability [7,8]. In addition to missing data, infrequent or periodical sampling is another primary reason for incomplete sedimentation data in most river basins. Suspended sediment load transported in rivers is a complex process which relates to physical characteristics of watershed and rivers. Predicting sediment loads through physical models might be prohibitive due to the complex process and lack of long-term observations [9]. Instead, statistical empirical models serve as alternative approaches to infill missing or unrecorded (missing thereafter) sediment data since these models relax the requirement of detailed physical information. For example, the sediment rating curve based on the empirical relationship between suspended sediment load and streamflow discharge is traditionally used to infill missing sediment data since continuous discharge records are available in most river basins [10]. Several recent studies [11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27] have proposed various techniques or adopted various variables to estimate or predict suspended sediment loads. Ben Aissia et al. [3] indicated that the copula-based method is one of the recent methods and provides probabilistic characteristics of the missing data. Di Lascio et al. [5] revealed that Käärik and Käärik [4] firstly propose the Gaussian copula to impute correlated incomplete data.
Inherently scattered characteristics between observed suspended sediment load and streamflow discharge indicate that joint modeling of the probabilistic properties between suspended sediment loads and streamflow discharge is a proper approach to achieve this purpose. Difficulties in deriving such bivariate distribution of sediment load and discharge stem from different marginal distributions used to fit sediment load and discharge. Copulas have recently gained popularity worldwide [28][29][30][31][32][33][34][35][36][37][38][39][40][41][42][43][44][45][46][47] in hydrology as they construct a multivariate distribution by separately linking different marginal univariate distributions and the joint dependence structure among random variables. However, Zhang et al. [48] indicated that few studies use copulas to explore the joint distribution of sediment and discharge. The related studies include Bezak et al. [49], who conducted frequency analysis of annual peak discharge and the corresponding hydrograph volume and suspended sediment concentration from stations located in Slovenia and USA using trivariate symmetric and asymmetric copula functions. The authors of [48] constructed a bivariate probability distribution of annual runoff and sediment in the Wei River (China) using copulas for estimating synchronous-asynchronous encounter probabilities of annual rich-poor runoff and sediment. Guo et al. [50] used the double-mass-curve method to detect the inflection point in the runoff-sediment relationship of the Weihe River (China) and analyzed the synchronous-asynchronous joint properties of high-low runoff and sediment based on copulas. Bezak et al. [18] estimated event-based suspended sediment loads based on measured precipitation sums and discharge using copulas in two catchments of Slovenia. Huang et al. [51] used the copula-based method to detect the nonstationarity of the relationship between annual runoff and sediment load in the Wei River, China. Shojaeezadeh et al. [9] proposed a probabilistic method based on copulas with Bayesian networks to predict the suspended sediment load given discharge for high flow events in seven major rivers of the contiguous US. Peng et al. [52] simulated daily suspended sediment concentration through the copula-based multivariate conditional distribution using previous daily suspended sediment concentration and concurrent daily streamflow of the Jinsha River basin, China. Peng et al. [53] proposed a copula-based model of the annual maximum suspended sediment concentration, peak discharge, and flood volume and analyzed multivariate joint and conditional return periods for suspended sediment concentration under flood conditions in the Jinsha River basin, China.
The copula-based method infills the missing data by using the conditional distribution of missing variables given the value of observed variable, which describes possible outcomes of missing data associated with corresponding probabilities. However, singlevalue estimation of missing variable is often required in practical applications. Several approaches are proposed in the literature to estimate the single missing value by copula-based conditional distribution. Käärik and Käärik [4] suggested the most likely value (i.e., the mode) as the imputed value. Di Lascio et al. [5] obtained the single missing value by the Hit or Miss Monte Carlo method. Bezak et al. [18] suggested the median of 10,000 possible sediment values from the copula-based conditional distribution of 10,000 randomly generated peak discharge and precipitation. Peng et al. [52] proposed a stochastic simulation procedure to obtain the daily suspended sediment concentration using previous daily suspended sediment concentration and concurrent daily streamflow.
The main aim of this study is to infill daily suspended sediment loads based on copulas to provide probabilistic as well as single-value estimations using streamflow discharge. A copula-based bivariate probability model of concurrent daily suspended sediment load and discharge is firstly constructed. The conditional probability distribution of sediment data is then derived given the recorded discharge. Four single-value imputation methods for sediment data associated with the traditional sediment rating curve are analyzed and compared with the observed sediment data to demonstrate the statistical performance in terms of goodness-of-fit measures between imputation and recorded data. The proposed approach is applied to Jenshou station located in eastern Taiwan with 1960-2019 recorded data for demonstration.

Copula-Based Joint Probability Distribution of Sediment and Discharge
The basic principle of infilling missing data is based on the relationship between missing and observed variables and the specific value of the observed variable. Probabilistic estimation of missing data is made possible through constructing the multivariate probability model of missing and observed variables. Copulas offer flexibility to decompose the multivariate probability model by marginal distributions and the link between them.
Let L and Q denote the continuous random variables of suspended sediment load and streamflow discharge, respectively. F L (l) and F Q (q) are the corresponding cumulative distribution function (CDF) of L and Q, respectively. The Sklar theorem [54] states that if F L (l) and F Q (q) are continuous, then there is a unique copula such that the joint cumulative distribution function (JCDF) of L and Q can be written as where C(·) is the copula function. The corresponding joint probability density function (JPDF) of L and Q thus becomes where f L (l) and f Q (q) are the probability density functions (PDF) of L and Q, respectively; and c(·) is the copula density, which is obtained by where u and v denote two dependent CDFs. The two-stage maximum likelihood method, called the method of inference for margins (IFM) proposed by Joe [55], is adopted in this study to estimate parameters of marginal distributions and copulas since less computational intensive. This method consists of separate estimations of the parameters of univariate marginal distributions, followed by an estimation of copula parameters.
Three copulas commonly used in constructing bivariate model of sediment and discharge, including Clayton, Frank, and Gumbel-Hougaard, are employed in this study [18,[48][49][50][51][52][53]. These three copulas and the corresponding copula densities are summarized in Table 1 [56]. The Kolmogorov-Smirnov (K-S) test and the Cramér-von Mises (C-M) test are conducted for accessing goodness-of-fit of the marginal distributions and copulas, respectively [57]. The best-fitted marginal distributions and copula are determined by the minimum AIC (Akaike Information Criterion). Table 1. Summaries of copulas and the corresponding copula densities and range of parameter used in this study.

Name
Copula Copula Density

Range of Parameter
Clayton

Copula-Based Conditional Distribution of Sediment
Probabilistic estimation of the suspended sediment load is implemented by the conditional probability density function (CPDF) of L given an observed discharge q o . This CPDF f L|q o (l) in terms of copula is written as The corresponding conditional cumulative distribution function (CCDF) F L|q o (l) in term of copula is expressed as These conditional functions describe all possible outcomes of the sediment data given the observed discharge, which reflects inherent scatter between sediment and discharge.

Copula-Based Single-Value Sediment Imputation Methods
Although copula-based conditional distribution offers probabilistic estimations and uncertainty assessments about imputed value, single-value estimation of sediment data is often required in practical applications. Several approaches are proposed in the literature to obtain the single-value imputed data from the derived conditional distributions. A total of four methods using the copula-based conditional distributions to infill the suspended sediment loads are adopted in this study. The first two methods use the CPDF defined in Equation (4) and the remaining methods employ the CCDF defined in Equation (5).
• Method 1. The most natural thought to obtain the imputed data is the mode of the CPDF, which represents the most likely value (i.e., the quantile with the highest CPDF value) .
where l e denotes the estimated imputed sediment data with the highest CPDF value (i.e., the mode of the CPDF); and ψ denotes the maximum value of the CPDF.
• Method 2. Di Lascio et al. [5] proposed the Hit or Miss Monte Carlo method to estimate missing data, which uses the CPDF and random numbers. The following steps are used to estimate the imputed sediment data.
Step 1. Obtain F Q (q o )= v by an observed discharge q o .
Step 2. From the CPDF f L|q o (l) in Equation (4), define the l min and l max as the minimum and maximum sediment loads of the CPDF, and ψ as the maximum value of the CPDF.
Step 3. Generate two random numbers r 1 and r 2 from the uniform distribution U(0, 1).
Step 5. If r 2 ψ ≤ f L (e)c(F L (e), v) then l e = e, else return to Step 3.
• Method 3. Peng et al. [52] used the CCDF and a random number to infill the missing sediment data. Step Step 2. Generate a random number r from the uniform distribution U(0, 1).
Step 3. According to the CCDF defined in Equation (5), solve r = C L|q o (F L (l)|v) and obtain F L (l)= u.
Step 4. The estimated imputed sediment load is l e = F −1 L (u). • Method 4. Bezak et al. [18] used a similar procedure in Method 3, but with 10,000 generations, to have 10,000 imputed values, and selected the median of these 10,000 data as the imputed sediment data.

Suspended Sediment Load and Streamflow Discharge Data
The recorded daily suspended sediment load and streamflow discharge data at Jenshou station (121.50 • E, 23.96 • N) located in the Hualian River of eastern Taiwan for the period of 1960-2019 are employed to demonstrate the proposed approach. Annual mean rainfall in the Hualian River basin is 2550 mm and approximate 70% of annual rainfall is clustered within the wet-season (June−November). Annual average temperature in this basin is 22.8 • C with highest temperatures exceeding 30 • C occurring in summer. Suspended sediment load and streamflow discharge data have been measured and collected by the Water Resource Agency in Taiwan. Basic information, including river, catchment area, data length, number of sediment and discharge data, are summarized in Table 2. Infrequent sampling of daily sediment data leading to the number of recorded suspended sediment load is approximately 2% of recorded discharge data at Jenshou station. Mean and standard deviation of entire daily discharge series and paired sediment-discharge data in the period of 1960-2019 are also reported in Table 2, respectively. Recorded paired sediment-discharge data at Jenshou station are shown in Figure 1. Positive correlated sediment-discharge relationship is observed in Figure 1. However, clear scatter between recorded sediment and discharge data cannot be ignored, especially at the moderate-to high-flow conditions. Recorded suspended sediment load data are highly clustered below mean discharge of recorded paired data. For example, approximately 76% of sediment data are measured below 99.96 m 3 /s, the mean discharge of recorded paired data.
Recorded paired sediment-discharge data of the period of 1960-2019 at Jenshou station are split into two periods. Approximately 70% of recorded data of the period of 1960-2000 are employed to construct models and calibrate model parameters and the remaining data of the period of 2001-2019 are used to validate the performance of the constructed models. mately 76% of sediment data are measured below 99.96 m 3 /s, the mean discharge of recorded paired data.
Recorded paired sediment-discharge data of the period of 1960−2019 at Jenshou station are split into two periods. Approximately 70% of recorded data of the period of 1960−2000 are employed to construct models and calibrate model parameters and the remaining data of the period of 2001−2019 are used to validate the performance of the constructed models.

Empirical Sediment Rating Curve
The sediment rating curve is a traditional approach to infilling sediment data based on the empirical relationship between recorded sediment and discharge data. A commonly used relationship between sediment and discharge data is the power law function, which is expressed as where L and Q denote suspended sediment load and streamflow discharge, respectively; and a and b are coefficients which are estimated from recorded data. The empirical sediment rating curve at Jenshou station, shown in Figure 1, is determined by nonlinear least squares regression [58] based on recorded data of the period of 1960−2000. The empirical sediment rating curve at Jenshou station is written as L = 448.82Q 1.1449 (8) where L denotes the estimated suspended sediment load in unit of ton/day; and Q is recorded streamflow discharge in unit of m 3 /s.

The Best-Fitted Marginal Distributions and Copulas
A total of five widely used two-parameter distributions, including normal (NO), lognormal (LNO), gamma (GA), Gumbel (GU), and Weibull (WEI), are used to model the

Empirical Sediment Rating Curve
The sediment rating curve is a traditional approach to infilling sediment data based on the empirical relationship between recorded sediment and discharge data. A commonly used relationship between sediment and discharge data is the power law function, which is expressed as where L and Q denote suspended sediment load and streamflow discharge, respectively; and a and b are coefficients which are estimated from recorded data. The empirical sediment rating curve at Jenshou station, shown in Figure 1, is determined by nonlinear least squares regression [58] based on recorded data of the period of 1960-2000. The empirical sediment rating curve at Jenshou station is written aŝ L= 448.82Q 1.1449 (8) whereL denotes the estimated suspended sediment load in unit of ton/day; and Q is recorded streamflow discharge in unit of m 3 /s.

The Best-Fitted Marginal Distributions and Copulas
A total of five widely used two-parameter distributions, including normal (NO), lognormal (LNO), gamma (GA), Gumbel (GU), and Weibull (WEI), are used to model the suspended sediment load and streamflow discharge. Distribution parameters are estimated by the maximum likelihood method. The goodness-of-fit of each distribution for sediment load and discharge are accessed by the K-S test and the best-fitted distribution is determined by the minimum AIC. The recorded daily suspended sediment load and discharge at Jenshou station for the period of 1960-2000 are best-fitted by the lognormal distribution with the corresponding parameters are summarized in Table 3. The PDFs of suspended sediment load and discharge are respectively written as .134 2 (10) Figure 2a,b illustrates the fitted lognormal distributions associated with recorded suspended sediment and discharge, respectively. Good agreements between the fitted distribution and recorded data for sediment and discharge are observed in Figure 2. suspended sediment load and streamflow discharge. Distribution parameters are estimated by the maximum likelihood method. The goodness-of-fit of each distribution for sediment load and discharge are accessed by the K-S test and the best-fitted distribution is determined by the minimum AIC. The recorded daily suspended sediment load and discharge at Jenshou station for the period of 1960−2000 are best-fitted by the lognormal distribution with the corresponding parameters are summarized in Table 3. The PDFs of suspended sediment load and discharge are respectively written as  Figure 2a,b illustrates the fitted lognormal distributions associated with recorded suspended sediment and discharge, respectively. Good agreements between the fitted distribution and recorded data for sediment and discharge are observed in Figure 2.  The IFM is used to estimate copula parameter, the C-M test is then used to access goodness-of-fit of each copula, and the minimum AIC is employed to determine the bestfitted copula. The best-fitted copula for the paired sediment-discharge data (period of 1960-2000) at Jenshou station is the Gumbel-Hougaard copula. The copula parameter is also reported in Table 3. Copula-based JCDF of sediment and discharge is written as  (11) where F L (l) and F Q (q) denote the values of CDFs of suspended sediment load and discharge, respectively, which are determined by Figure 2c illustrates the contours of probabilities determined by the fitted copula associated with recorded paired sediment-discharge data.

Probabilistic Sediment Estimations Using Copula-Based Conditional Probability Density Function
Probabilistic estimation of missing suspended sediment load is made possible using the copula-based CPDF and CCDF of sediment given an observed discharge q o (Equations (4) and (5)) and the best-fitted distributions of sediment and discharge (Equations (9)-(13)). The copula-based CPDF and CCDF of sediment given observed discharge at Jenshou station are given below, respectively.
where q o denotes an observed discharge. Figure 3 illustrates the CPDFs of suspended sediment load given observed discharge equals 20, 30, and 50 m 3 /s for demonstration. The derived CPDFs quantify the estimation uncertainty of suspended sediment load. For example, given the observed discharge of 20 m 3 /s, the probabilities that suspended sediment loads exceeding 100, 1000, and 5000 ton/day are 0.955, 0.547, and 0.133, respectively. Given the observed discharge of 30 m 3 /s, the interquartile range of suspended sediment load is bounded between 989 and 5348 ton/day. The probability that the suspended sediment load ranged between 1000 and 5000 ton/day is 0.349 given the observed discharge of 50 m 3 /s. The CPDFs generally shift rightward and become flat with increasing discharge. For instance, the modes of the CPDFs given a discharge of 20, 30, and 50 m 3 /s are 168, 384, and 1226 ton/day, respectively. The flat CPDF of a greater discharge implies that suspended sediment load is distributed in a very large range. That is, greater uncertainties exist in an estimation of suspended sediment load for the condition of large discharge. For instance, the interquartile range increases from 2327 ton/day for the observed discharge of 20 m 3 /s to 4359 and 10,087 ton/day for observed discharges of 30 and 50 m 3 /s, respectively.
The inherently scattered relationship between recorded sediment and discharge leads to different recorded sediments observed for nearly identical discharge.  Figure 3, which provides probabilistic estimation of missing sediment data. However, single-value estimation of sediment data is often required in practical applications.
For instance, the interquartile range increases from 2327 ton/day for the observed discharge of 20 m 3 /s to 4359 and 10,087 ton/day for observed discharges of 30 and 50 m 3 /s, respectively.
The inherently scattered relationship between recorded sediment and discharge leads to different recorded sediments observed for nearly identical discharge. For instance, recorded suspended sediment loads of 1250.4, 3944.3, 2084.0, 960.3, and 1379.0 ton/day are noted for discharges of 19.8, 20.1, 20.1, 20.0, and 19.8 m 3 /s, respectively. The scattered characteristics are captured by the derived CPDF shown in Figure 3, which provides probabilistic estimation of missing sediment data. However, single-value estimation of sediment data is often required in practical applications.

Single-Value Sediment Estimations
Four different methods based on the derived copula-based CPDF and CCDF associated with sediment rating curve are used to estimate the suspended sediment loads given the observed discharge for the period of 2001−2019 in this study. The estimations of these methods are compared with recorded data, shown in Figure 4, and evaluated in terms of root mean square error (RMSE), mean absolute percentage error (MAPE), Nash-Sutcliffe efficiency (NSE) [59], and modified Nash-Sutcliffe efficiency (MNSE) [60], which are defined below, respectively.

Single-Value Sediment Estimations
Four different methods based on the derived copula-based CPDF and CCDF associated with sediment rating curve are used to estimate the suspended sediment loads given the observed discharge for the period of 2001-2019 in this study. The estimations of these methods are compared with recorded data, shown in Figure 4, and evaluated in terms of root mean square error (RMSE), mean absolute percentage error (MAPE), Nash-Sutcliffe efficiency (NSE) [59], and modified Nash-Sutcliffe efficiency (MNSE) [60], which are defined below, respectively.
where n denotes the number of data; l i andl i denote the ith observed and estimated sediment data, respectively; and l denotes the mean observed sediment. The model with smaller RMSE and MAPE and close-to-1 NSE and MNSE denotes that it has better capability to infill missing data and has fewer deviations from the observed data. The results of RMSE, MAPE, NSE, and MNSE of these five infilling methods for the calibration  and validation (2001-2019) periods are reported in Table 4. The results indicate that sediment rating curve has best performance in RMSE and NSE. On the other hand, copula-based models (Methods 1-4) generally outperform in MAPE and MNSE and Method 1 (infilling missing value by mode) is the best model. Sediment rating curve is obtained by the least squares regression with minimized least squared deviations from the recorded data. It thus leads to better performance in criteria with the square term such as RMSE and NSE. Conditional distribution-based infilling methods, on the other hand, outperform in the other criteria.
that it has better capability to infill missing data and has fewer deviations from the observed data. The results of RMSE, MAPE, NSE, and MNSE of these five infilling methods for the calibration (1960−2000) and validation (2001−2019) periods are reported in Table 4.
The results indicate that sediment rating curve has best performance in RMSE and NSE.
On the other hand, copula-based models (Methods 1-4) generally outperform in MAPE and MNSE and Method 1 (infilling missing value by mode) is the best model. Sediment rating curve is obtained by the least squares regression with minimized least squared deviations from the recorded data. It thus leads to better performance in criteria with the square term such as RMSE and NSE. Conditional distribution-based infilling methods, on the other hand, outperform in the other criteria.       Table 5. Similar performances of these estimation methods are observed for various periods. Sediment rating curve loses its dominance in all indices for the low-and moderate-flow states. In contrast, Method 4 (median of 10,000 sediment estimations based on CCDF) outperforms other methods in several indices for these two flow states. However, sediment rating curve is the best model in RMSE and NSE and Method 1 outperforms in MAPE and MNSE of high-flow state for various periods. Different performance evaluations between Tables 4 and 5 are attributed to different infilling schemes used in various methods. The worse performances of sediment rating curve in low-and moderate-flow states are caused by its overestimations of sediment. The overestimations in low-and moderate-flow states do not produce greater squared deviations from the recorded data due to smaller sediment loads in low-and moderate-flow states. Fewer deviations in high-flow states lead to the sediment rating curve with better performance in RMSE and NSE of high-flow states for various periods (Table 5).

Discussion
Methods 1 to 4 depend on the derived conditional distributions to estimate the sediments in all flow states. No clear overestimations for these conditional distribution-based methods are observed in low-and moderate-flow states ( Figure 3) and induce better performance for all indices. However, less squared deviations for the sediment rating curve observed in high-flow state lead to only Method 1 (infilling missing value by mode) outperforming in MAPE and MNSE. Figure 5 illustrates the frequency histograms of observed sediment data and the estimations of various infilling methods for the period of 1960-2019. The results indicate that the sediment rating curve produces a histogram contradicting the histogram of the observed sediment data. This contradiction is induced by overestimations for smaller sediments, which are evidently observed for the low-flow state in Figure 4 and Table 5. In contrast, copula-based Methods 1-4 generate high frequencies in smaller sediments and low frequencies in greater sediments. However, similar histograms among Methods 2-4 are close to the histogram of recorded sediments when sediments exceed 1000 ton/day. Only Method 1 (infilling missing value by mode) reflects the similarity in all scopes of suspended sediment load. This similarity between histograms of recorded sediments and the estimated sediments by Method 1, especially when sediment <1000 ton/day, is attributed to the right-skewed CPDFs shown in Figure 3. Method 1 estimates sediments using the mode of the CPDFs, which is clustered in smaller sediments due to right-skewed CPDFs and leads to high frequencies in smaller sediments.  Figure 5 illustrates the frequency histograms of observed sediment data and the estimations of various infilling methods for the period of 1960-2019. The results indicate that the sediment rating curve produces a histogram contradicting the histogram of the observed sediment data. This contradiction is induced by overestimations for smaller sediments, which are evidently observed for the low-flow state in Figure 4 and Table 5. In contrast, copula-based Methods 1-4 generate high frequencies in smaller sediments and low frequencies in greater sediments. However, similar histograms among Methods 2-4 are close to the histogram of recorded sediments when sediments exceed 1000 ton/day. Only Method 1 (infilling missing value by mode) reflects the similarity in all scopes of suspended sediment load. This similarity between histograms of recorded sediments and the estimated sediments by Method 1, especially when sediment <1000 ton/day, is attributed to the right-skewed CPDFs shown in Figure 3. Method 1 estimates sediments using the mode of the CPDFs, which is clustered in smaller sediments due to right-skewed CPDFs and leads to high frequencies in smaller sediments. Shojaeezadeh et al. [9] and Guo et al. [50] indicated that greater discharges are associated with larger intervals of conditional marginal distribution of sediments. Further, the relationship between discharge and sediment is nonlinear and highly stochastic. That is, a similar discharge can yield hugely different sediments. These properties of probabilistic sediment estimations are in line with the results of copula-based CPDFs shown in Figure 3. Bezak et al. [18] revealed that a copula-based estimation model yields the worst fit (greatest RMSE) when compared with the results of multiple regression and exponential models in some cases. However, a copula-based model produces the smallest re- Shojaeezadeh et al. [9] and Guo et al. [50] indicated that greater discharges are associated with larger intervals of conditional marginal distribution of sediments. Further, the relationship between discharge and sediment is nonlinear and highly stochastic. That is, a similar discharge can yield hugely different sediments. These properties of probabilistic sediment estimations are in line with the results of copula-based CPDFs shown in Figure 3. Bezak et al. [18] revealed that a copula-based estimation model yields the worst fit (greatest RMSE) when compared with the results of multiple regression and exponential models in some cases. However, a copula-based model produces the smallest residuals and better results in low-medium-flow events. These findings are consistent with the results of this study reported in Table 5.

Conclusions
Based on the daily paired sediment-discharge data at Jenshou station located in eastern Taiwan for the period of 1960-2019, probabilistic and four single-value estimation models of sediment data are constructed using copulas. The Gumbel-Hougaard copula (Figure 2c) is used to model the joint probability distribution of discharge and sediment data with best-fitted lognormal distributions (Figure 2a,b) as the marginal distributions.
The copula-based CPDF ( Figure 3) and CCDF of sediments given various observed discharges provide probabilistic properties of estimated sediments such as the highly likely range of estimations, probabilities of sediments greater than or less than certain values, and various quantiles of specific probabilities. The derived CPDFs at Jenshou station shift toward the right and become flatter with increasing discharge. This phenomenon implies that the estimated suspended sediment load is distributed in a very large range for a greater discharge. That is, greater uncertainties exist in an estimation of suspended sediment load for the condition of large discharge.
The results of single-value sediment estimations for various infilling methods indicate that no single method outperforms in all evaluation criteria. The sediment rating curve has the best performance in RMSE and NSE, while copula-based methods generally outperform in MAPE and MNSE and Method 1 (infilling missing value by mode) is the best model among these copula-based methods. However, the frequency histogram of infilled sediments by the sediment rating curve contradicts the frequency histogram of recorded sediment. In contrast, the infilled sediments of the copula-based methods preserve a similar frequency histogram as noted in the recorded sediments. That is, high frequency is observed in small sediment and low frequency occurs in great sediment. Among these four methods, the frequency histogram of Method 1 is close to that of recorded sediment data.
Infilling missing sediments to have long and continuous data provides the necessary information for design and operation of water-resources engineering. Statistical methods alleviate the need for physical factors of watersheds and rivers to infill sediments. However, uncertainties existing in estimated sediments are attributed to the proposed statistical models using discharge only. Incorporating additional available parameters such as rainfall and maintaining models to increase the accuracy of infilled sediments remain as topics for further extending this study. Additionally, selecting the best estimation method among the conflicting indices using the multi-criteria evaluation approach is also important in model construction processes.