An Efficient Data Driven-Based Model for Prediction of the Total Sediment Load in Rivers

Sediment load in fluvial systems is one of the critical factors shaping the geomorphological and hydraulic characteristics of rivers. A detailed understanding of the total sediment load (TSL) is required for the protection of the physical, environmental, and ecological functions of rivers. This study develops a robust methodological approach based on multiple linear regression (MLR) and support vector regression (SVR) models modified by principal component analysis (PCA) to predict the TSL in rivers. A database of sediment measurements from large-scale physical modelling tests with 4759 datapoints was used to develop the predictive model. A dimensional analysis was performed based on the literature, and ten dimensionless parameters were identified as the key drivers of the TSL in rivers. These drivers were converted to uncorrelated principal components to feed the MLR and SVR models (PCA-based MLR and PCA-based SVR models) developed within this study. A stepwise PCA-based MLR and a 10-fold PCA-based SVR model with different kernel-type functions were tuned to derive an accurate TSL predictive model. Our findings suggest that the PCA-based SVR model with the radial basis function kernel has the best predictive performance for the estimation of the TSL in rivers in terms of statistical error measures, including the root-mean-square error normalized by the standard deviation (RMSE/StD) and the Nash–Sutcliffe coefficient of efficiency (NSE). The PCA-based MLR and PCA-based SVR models, with an overall RMSE/StD of 0.45 and 0.35, respectively, outperform the existing well-established empirical formulae for TSL estimation. The analysis of the results confirms the robustness of the proposed PCA-based SVR model for the prediction of cases with high sediment concentrations (NSE = 0.68), where the existing sediment estimation models usually perform poorly.


Introduction
Natural and anthropogenically driven forces during the Anthropocene are threatening the sustainable function of rivers by introducing large sediment loads to freshwater ecosystems and fluvial systems. An extreme total sediment load (TSL) in rivers can threaten aquatic life and harm the ecosystem's health and function. A large TSL can clog fish gills, limit light penetration to the riverbed, decrease the photosynthesis of hydrophytes, change the riverbed with profound implications for benthic organisms, transport large pollution loads, and deplete the dissolved oxygen content of the river [1–13]. Additionally, sediments can affect river hydraulics and geometry by decreasing the cross-section dimensions and the flow discharge capability, reducing the active storage available within rivers, clogging bridges and culverts, and changing vegetation growth patterns, which affects the roughness and ecosystem of rivers [6,8,14–19]. Thus, an accurate predictive capability for the TSL is required to understand the physical, environmental, and ecological responses of rivers to sediment load, and to minimize the negative impacts of sediments on the river ecosystem.
The quantity of the TSL in fluvial systems is usually studied by two well-established approaches. The first approach studies the bed load and suspended load separately, as the components of the TSL [20–22], while the second approach quantifies the TSL by considering the combined effects of both the bed and suspended load components [23–26]. In general, the choice between these approaches depends on a wide range of factors, such as the available data, the accuracy needed for the design studies, and the riverbed type. For example, in gravel-bed rivers, the TSL is mainly transported as bed load, whereas in sandy-bed rivers, both the suspended and bed load phases contribute to the TSL [27]. Therefore, the decision on the appropriateness and robustness of the two approaches for TSL quantification is complex, given that the addition of hydro-morphological factors to input-output relationships can introduce different levels of uncertainty in the outcomes of numerical models. Overall, the TSL model that considers the bed and suspended load components together has been widely used by researchers, given that detecting and differentiating between bed and suspended loads is not readily possible [23–26,28,29].
Several laboratory and field-based studies have been carried out to model river sediment loads using dimensional variables such as the flow rate and geometric characteristics of the riverbed [30–34]. In controlled laboratory flumes, a wide range of hydraulic and morphological variables can be modelled, enabling detailed analysis of the effects of different variables influencing sediment transport, and the derivation of empirical formulae for TSL estimation. In recent years, applications of advanced data-driven and machine learning (ML) approaches have been explored to provide more accurate estimates in hydrological applications, especially for suspended loads [28,35–41]. However, due to the difficulty in measuring some of the effective parameters that influence the transport of sediments in rivers (hereafter referred to as "drivers"), the available empirical equations and data-driven models for the estimation of TSL rely on only a few easily measurable variables in rivers. Ignoring the main drivers of the TSL in rivers can reduce the accuracy of the existing methods for predicting the sediment concentration in rivers. Therefore, a comprehensive review of the drivers of the TSL is necessary to derive more robust formulae for the estimation of the TSL for a range of hydro-environmental and geomorphological conditions. Another challenge associated with the existing empirical models is their applicability to extreme TSL conditions, given that such models are derived from a limited range of sediment concentrations in rivers [30]. Although empirical models provide reasonable accuracy for TSL estimation under the usual mild environmental conditions, the main concern is always associated with the extreme events that introduce high concentrations of sediments into rivers.
In this study, a dimensional analysis is carried out based on the literature to determine the main drivers of the TSL in rivers. Following the dimensional analysis, two robust models are developed for the estimation of the TSL, using multiple linear regression (MLR) and support vector regression (SVR) techniques modified by principal component analysis (PCA) (PCA-based MLR and PCA-based SVR models). Applying PCA to the main drivers of the TSL in rivers can improve the predictive performance of the MLR and SVR models by removing the dependency among the inputs. Our modeling results suggest the robustness of the proposed PCA-based MLR and PCA-based SVR models for the prediction of cases with high sediment concentrations, where the existing sediment estimation models usually perform poorly.

Dimensional Analysis
River flow properties, sediment load characteristics, and geometrical configurations are the main effective parameters determining the variation of TSL in rivers [23,24,42–44]. River flow properties include the flow rate (q), velocity (u), shear velocity (u*), flow depth (H), kinematic viscosity (ϑ), and flow temperature (T). The average particle diameter by mass (d50), sediment particle settling velocity (ω), and specific density of sediment particles (G) can be considered as the key sediment load characteristics. The literature suggests that the river longitudinal slope (S) is a key geometrical configuration that impacts the TSL in rivers [26]. Mathematically, the concentration of sediments in rivers, C, can be described as a function of these parameters (Equation (1)). In Equation (1), the sediment particle's settling velocity, ω, is determined by Equation (2), where g is the gravitational acceleration constant.
The existing data-driven models for the estimation of the TSL in rivers have used different combinations of the parameters described in Equation (1), in both dimensional and dimensionless forms. A review of the literature suggests that the dimensionless parameters adopted in previous studies can be summarized as in Equation (3). Equation (3) describes the ten main drivers of TSL, which are used as predictors of sediment concentration in this study.

Database
This study adopts sediment measurement data from large-scale physical modelling tests. The database includes 4759 laboratory experiments performed by [25], including measurements of the C, q, H, G, S, T, and d50 parameters, as the most effective factors contributing to the TSL in rivers. The flow velocity and shear velocity were calculated for the database as u = q/W (W is the flow width) and $u_* = \sqrt{gSH}$, respectively. The kinematic viscosity, ϑ, was determined for all the data using the methodology proposed by [45] and the T measurements. Figure 1 illustrates the variation of the parameters in the raw database used in this study. The statistical characteristics of the raw database were determined and are presented in Table S1 of the Supplementary Materials.
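As an illustration of the derived quantities, the following minimal sketch computes the shear velocity from the slope and flow depth. The function name, the value of g, and the example flume conditions are assumptions for illustration only and are not taken from the database of [25].

```python
import math

G_ACCEL = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def shear_velocity(slope: float, depth_m: float, g: float = G_ACCEL) -> float:
    """Shear velocity u* = sqrt(g * S * H), in m/s, using the flow depth H."""
    if slope < 0 or depth_m < 0:
        raise ValueError("slope and depth must be non-negative")
    return math.sqrt(g * slope * depth_m)


# Hypothetical flume run (not from the database): S = 0.002, H = 0.15 m.
u_star = shear_velocity(0.002, 0.15)
print(f"u* = {u_star:.4f} m/s")
```

The depth-based form is used here because it matches the variables measured in the database; a hydraulic-radius form would need the channel width as well.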

Development of the TSL Regression-Based Models
All ten dimensionless drivers given in Equation (3) are considered as the predictors for development of the TSL regression-based predictive models. The literature suggests that these drivers have a major influence on the TSL in rivers [23,24,42–44]. Following the standardized protocols for development of data-driven regression models, approximately 75% of the database (3569 datapoints) was used to calibrate the TSL regression models, and the remaining 1190 datapoints (25% of the database) were used to verify the calibrated models.
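The calibration and verification counts quoted above can be reproduced with a short sketch. The text does not state how the datapoints were assigned to each subset, so the random shuffle and the seed below are assumptions, and `split_indices` is a hypothetical helper name.

```python
import random

N_TOTAL = 4759            # datapoints in the database (from the text)
CALIBRATION_SHARE = 0.75  # approximate calibration share (from the text)


def split_indices(n_total: int, share: float, seed: int = 0):
    """Return shuffled (calibration, verification) index lists."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # random assignment is an assumption
    n_cal = int(n_total * share)          # 0.75 * 4759 = 3569.25 -> 3569
    return indices[:n_cal], indices[n_cal:]


cal_idx, ver_idx = split_indices(N_TOTAL, CALIBRATION_SHARE)
print(len(cal_idx), len(ver_idx))  # 3569 1190
```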
Suppose $P = \{(C_i, x_i);\ i = 1, \dots, n\}$ consists of the calibration datapoints, where the index i labels the n calibration datapoints (n = 3569), and C and x are the vectors of sediment concentration and the drivers of the TSL in rivers (Equation (3)), respectively. To develop the regression-based predictive models for TSL, the functional dependence of the sediment concentration, C, on the drivers, x, should be estimated using the calibration datapoints. The relationship between C and x can be described by a deterministic function ℵ as:

$$C = \aleph(x) + h \qquad (4)$$

where h is additive noise. In this study, the functional form ℵ is explored to enable accurate prediction of the verification datapoints, i.e., unseen data that the developed regression-based TSL models have not experienced before. The functional form ℵ can be reached by tuning the developed models during the calibration process, and by considering a mechanism for optimization of the defined error function.
Hydrology 2022, 9, x
Figure 1. Variation of the parameters tested in [25], used as the raw dataset in this study. q: river flow rate; H: flow depth; S: river longitudinal slope; d50: average particle diameter by mass; and G: specific density of sediment particles.


Development of PCA-Based MLR Model for TSL Prediction
MLR is a linear tool to model the relationship between a scalar response (known as a dependent variable) and multiple explanatory drivers (known as independent variables). Considering the database P, the theory of MLR supposes that the deterministic function ℵ can be represented as a linear relationship described by Equation (5):

$$C_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_r x_{ir} + h_i = x_i^{T}\beta + h_i, \qquad i = 1, \dots, n \qquad (5)$$

where r = 10 (the number of drivers), β is the vector of coefficients, the superscript T denotes the transpose of a matrix or a vector, and $h_i$ is assumed to be an independent parameter with a Gaussian distribution [46]. In this study, there are n calibration datapoints (n = 3569), leading to a system of 3569 linear equations described in compact form by Equation (6):

$$C = X\beta + h \qquad (6)$$
Referring to Equations (4) and (6), the deterministic function ℵ is defined as Xβ. Thus, the aim is to determine the vector β using the ordinary least squares procedure in the MLR model. This can be carried out by minimizing $\|X\beta - C\|^{2}$, the squared Euclidean norm of the error function. Finally, the regression coefficient vector β can be determined as [46,47]:

$$\beta = (X^{T}X)^{-1}X^{T}C \qquad (7)$$

In the case of strong correlations among the explanatory drivers, inversion of the matrix $X^{T}X$ introduces a large error to the tuned MLR model [48]. To address this problem in the calibration dataset, the PCA method was applied to remove multicollinearity between the drivers. The PCA method converts the drivers to new and uncorrelated principal components (PCs) [49,50]. These new uncorrelated inputs, i.e., the PCs, can be used as independent variables for the MLR model (PCA-based MLR model) [46]. The variance inflation factor (VIF), a universal criterion for the detection of multicollinearity in the MLR model, is adopted in this study [51]. The VIF varies from 1 to ∞, with larger values corresponding to a higher probability of multicollinearity in the input data. A VIF > 2 can be a sign of multicollinearity in the data; however, this study adopted VIF > 10 as the condition that can introduce a significant error in the modeling process [46,52].
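The effect of PCA on multicollinearity can be sketched with NumPy on synthetic data. Everything below is an illustrative assumption rather than the authors' implementation: the three synthetic drivers stand in for the ten real ones, and the helper names `vif`, `pca_scores`, and `ols` are hypothetical. The VIF is computed as the diagonal of the inverse correlation matrix, and the PCs are obtained from the eigenvectors of the correlation matrix.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


def pca_scores(X: np.ndarray) -> np.ndarray:
    """Project standardized columns onto eigenvectors of the correlation matrix."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # largest explained variance first
    return Z @ eigvecs[:, order]


def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients for y = [1, X] beta (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


# Synthetic, deliberately collinear stand-in for the drivers (not the real data).
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 1))
X = np.hstack([
    base + 0.05 * rng.normal(size=(500, 1)),  # two nearly identical columns
    base + 0.05 * rng.normal(size=(500, 1)),
    rng.normal(size=(500, 1)),                # one independent column
])
y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.normal(size=500)

pcs = pca_scores(X)
print("VIF, raw drivers:", np.round(vif(X), 1))
print("VIF, PCs:        ", np.round(vif(pcs), 3))
beta_pc = ols(pcs, y)  # regression on the uncorrelated PCs
```

In this toy case the two collinear columns have VIF values far above 10, while every PC has a VIF of 1, which mirrors the behaviour the text describes for the real drivers.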

Development of PCA-Based SVR Model for TSL Prediction
SVR is a robust machine learning model for problems with complex nonlinear relationships between the input and output variables. The computational intricacies of the SVR model do not depend on the input space dimensionality [53]. Therefore, the SVR model has successfully been applied to river engineering problems, where multidimensional drivers impact the water quality in the rivers. However, the occurrence of a strong correlation between the independent variables can result in a poor prediction of the sediment concentration using the SVR model [54]. Adoption of PCA can help reduce the effects of dependency between the drivers. Therefore, this study replaced the ten influential drivers with the corresponding uncorrelated PCs as inputs to the SVR model (PCA-based SVR model).
The error function in the PCA-based SVR model is determined according to Equation (8):

$$\frac{1}{2}\tau^{T}\tau + m\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \qquad (8)$$

where τ is the vector of coefficients, m is the capacity constant, and $\xi_i$ and $\xi_i^{*}$ are slack variables that handle non-separable input data.
The SVR model is capable of handling highly complex datasets using an efficient kernel-type function, $K(PC_i, PC_j)$, that reorganizes the datasets so that a linear solution can be applied [57,58]. In this study, the kernel-type function is used for transforming the input parameters, i.e., the PCs, to the feature space. SVR models have the flexibility of using a range of kernel-type functions, including the linear-type (Equation (10)), polynomial-type (Equation (11)), sigmoid-type (Equation (12)), and radial basis function, i.e., RBF-type (Equation (13)), kernels [55,56].
In this study, we examined the performance of the developed PCA-based SVR model with different kernel-type functions to propose an accurate and robust TSL estimation model for rivers based on the ten uncorrelated PCs obtained from the conversion of the highly influential drivers given in Equation (3). The PCs and the output variable C were first scaled to the range −1 to 1, and subsequently fed to the developed SVR model. A two-step grid search algorithm was implemented to tune the PCA-based SVR model. Detailed information on the kernel-type functions and the two-step grid search algorithm is given in [54,59].
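For orientation, the four kernel types named above and the scaling to the range −1 to 1 can be sketched as follows. The kernel expressions are the common textbook forms and may differ in detail from Equations (10) to (13); the parameter names `gamma`, `coef0`, and `degree`, their default values, and the helper `scale_to_unit_range` are assumptions for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(a, b):
    return dot(a, b)


def polynomial_kernel(a, b, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(a, b) + coef0) ** degree


def sigmoid_kernel(a, b, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(a, b) + coef0)


def rbf_kernel(a, b, gamma=1.0):
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def scale_to_unit_range(values):
    """Linearly map a sequence onto [-1, 1] (minimum -> -1, maximum -> +1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant sequence")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Hypothetical PC score vectors for two datapoints.
pc_i, pc_j = [0.2, -0.5], [0.1, 0.4]
print(linear_kernel(pc_i, pc_j), rbf_kernel(pc_i, pc_j, gamma=0.019))
print(scale_to_unit_range([2.0, 4.0, 6.0]))
```

The RBF kernel depends only on the distance between two points and equals 1 when they coincide, so γ controls how quickly the similarity decays with distance.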

Statistical Measures
The Nash–Sutcliffe coefficient of efficiency (NSE) index (Equation (14)) [60] and the root-mean-square error (RMSE) (Equation (15)) are used to evaluate the performance of the developed models:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(C_i - \hat{C}_i\right)^{2}}{\sum_{i=1}^{n}\left(C_i - \bar{C}\right)^{2}} \qquad (14)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(C_i - \hat{C}_i\right)^{2}} \qquad (15)$$

where $C_i$ and $\hat{C}_i$ are the measured and predicted sediment concentrations, respectively, $\bar{C}$ is the average value of the measured sediment concentration in the database, and n is the number of datapoints evaluated. A perfect model has an NSE of one (1) and an RMSE of zero (0) [60,61]. However, the RMSE expresses the deviation of the model outputs from the observations in the units of the variable used. Therefore, normalizing the RMSE with a statistic that describes the spread of the variable, such as the standard deviation (StD), can help readers to better judge the model's performance. In this regard, [13,19] concluded that RMSE values smaller than half of the standard deviation of the observations are acceptable for the evaluation of a model. Here, we use the NSE and RMSE/StD to evaluate the performance of the developed PCA-based MLR and PCA-based SVR models.
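The two measures can be computed as in the minimal sketch below. The function names and the toy values are hypothetical, and the use of the population standard deviation (division by n) for StD is an assumption, since the text does not specify it.

```python
import math


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squares about the mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(observed, predicted):
    """Root-mean-square error in the units of the observations."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / len(observed))


def rmse_over_std(observed, predicted):
    """RMSE normalized by the population StD of the observations (assumption)."""
    mean_obs = sum(observed) / len(observed)
    std = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / len(observed))
    return rmse(observed, predicted) / std


# Toy values for illustration only (not from the database).
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(round(nse(obs, pred), 3), round(rmse_over_std(obs, pred), 3))
```

Under the population-StD assumption the two measures are linked by (RMSE/StD)² = 1 − NSE, so an RMSE/StD below 0.5 corresponds to an NSE above 0.75.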

Results and Discussion
Pre-Processing Data Using PCA
Figure 2 shows the results of the dependence test for the ten input parameters used in this study. The results suggest a significant dependency (correlation coefficient > 0.6) between some of the drivers (e.g., u/ω and H/d50, u³/(gHω) and u/√((G−1)gd50), u/ω and u³/(gHω), ω/u* and ωd50/ϑ, u*d50/ϑ and ωd50/ϑ, and u³/(gHω) and uS/ω). These strong correlations violate the assumption of independence between the drivers, leading to multicollinearity problems in fitting the TSL regression models and in interpreting the results. Additionally, a high correlation between the drivers hinders the generalization performance of ML models such as SVR [62].
To apply the PCA to the drivers, a symmetrical correlation matrix was created (Table S2). The matrix's eigenvalues and their corresponding eigenvectors were determined.
Statistical analysis shows KMO (Kaiser–Meyer–Olkin) = 0.72, where KMO values larger than 0.5 denote the suitability of a database for PCA implementation [46,63]. Bartlett's test of sphericity was conducted to understand the redundancy amongst the input parameters, and to further confirm the appropriateness of using PCA for the database investigated in this study [46]. Following the statistical evaluations, the ten uncorrelated PCs corresponding to the drivers were determined using PCA (Figures S1–S10). Table 1 shows the general characteristics of each PC determined for the input parameters, where the first PC (i.e., PC1), which conserves more than 42% of the drivers' variance, is the most important component. A larger index s of a PC corresponds to a lower importance of that PC in conserving the drivers' variance. According to Table 1, the cumulative contribution of all PCs to the conservation of the drivers' variance reaches 100%, indicating that the ten uncorrelated PCs can be used instead of the ten drivers in the development of the TSL regression models.
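The variance share of a PC can be related to its eigenvalue with a short worked expression. This assumes, as the text indicates, that the PCA was applied to the 10 × 10 correlation matrix of the drivers, whose trace (and therefore the sum of its eigenvalues $\lambda_j$) equals 10; the symbol $\text{share}_k$ is introduced here only for illustration.

```latex
\[
\text{share}_k \;=\; \frac{\lambda_k}{\sum_{j=1}^{10}\lambda_j} \;=\; \frac{\lambda_k}{10},
\qquad
\text{share}_1 > 0.42 \;\Longrightarrow\; \lambda_1 > 4.2 ,
\qquad
\sum_{k=1}^{10}\text{share}_k = 1 .
\]
```

Under this assumption, the first PC alone carries as much variance as more than four of the original standardized drivers, and retaining all ten PCs conserves the full variance.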

PCA-Based MLR Results
Following the computation of the uncorrelated PCs, the PCA-based MLR model was developed for TSL prediction in rivers using a stepwise approach. In this approach, the PCA-based MLR model was constructed step by step by adding the potential PCs in succession, provided that they satisfied the statistical significance test after each iteration. The results suggest that all ten uncorrelated PCs satisfy the conditions required by the stepwise algorithm for the developed PCA-based MLR model (Table 2). The VIF values equal to 1 confirm that no multicollinearity occurs in the developed model. To better understand the impact of strong dependency between drivers on the TSL regression model, the results of the MLR model developed with the ten raw drivers (Equation (3)) are also computed and presented in Table 2. The VIF values > 2, and especially those greater than 10 for some of the drivers (i.e., ω/u*, ωd50/ϑ, and u*d50/ϑ), denote the occurrence of multicollinearity in the developed MLR model, which can lead to poor performance of the model. In addition, the stepwise algorithm excluded the HS/((G − 1)d50) parameter from the MLR model with the raw drivers, even though this parameter is a main driver used in other well-established sediment estimation equations [23].
(Figure 3b,c), denoting the acceptable accuracy of the model for the TSL prediction in rivers. However, the PCA-based MLR model is less accurate in the prediction of extreme sediment concentrations (Figure 3b,c). This can be related to the complex processes that govern sediment transport in rivers, influenced by a wide range of drivers that fluctuate strongly in both time and space.
Although this study includes the main drivers of the TSL in rivers, the effects of additional factors such as the river aspect ratio, friction term, and channel sinuosity, that may contribute to change in sediment concentration in rivers [64][65][66][67][68], are not considered in this study.


PCA-Based SVR Results
Similar to the MLR model development, the ten uncorrelated PCs were fed to the SVR model (PCA-based SVR) for prediction of TSL in rivers. Then, a range of kernel-type functions was examined for the PCA-based SVR models. No optimization algorithm was used to derive the SVR model's parameters during the kernel-type function evaluation process. Comparison of the prediction results for all the kernel-type functions tested within this study suggests that the SVR model with each of the four kernel types (Equations (10)–(13)) estimates the sediment concentration with an acceptable accuracy. The PCA-based SVR model with the RBF-type kernel shows the best predictive performance for the sediment concentration, followed by the sigmoid-type and polynomial-type kernels. The model with the linear-type kernel showed the lowest accuracy in the sediment concentration estimation when compared with the PCA-based SVR model with the RBF-type kernel (Figure 4). However, detailed analysis of the results shows a significant difference between the calibration and verification results of the PCA-based SVR model with the polynomial-type kernel. The absolute value of this difference is more than 39% for the NSE, which can be associated with over-fitting that could lead to weak performance of the model in real-life applications. Therefore, it can be concluded that the PCA-based SVR model with the sigmoid-type kernel has the second-best performance for sediment concentration estimation in rivers.
Given the superior performance of the PCA-based SVR model with the RBF-type kernel (PCA-based RBF-SVR) in terms of the NSE and RMSE/StD statistical error measures, this model was selected for a detailed calibration process to estimate the sediment concentration in rivers. A 10-fold cross-validation approach was used to optimize the PCA-based RBF-SVR model's parameters (m, ε, and γ) using a two-step grid search algorithm.
Accordingly, a coarse grid search was carried out with the selected minimum and maximum values of m equal to $2^{-11}$ and $2^{11}$, respectively, and a coarse increment of $2^{2}$. The minimum, maximum, and coarse increment values for ε were set as $2^{-19}$, $2^{3}$, and $2^{2}$, respectively. Additionally, the minimum, maximum, and coarse increment values for γ were set as $2^{-15}$, $2^{7}$, and $2^{2}$, respectively. The optimal values of m, ε, and γ were determined as $2^{6}$, $2^{-7}$, and $2^{-5}$, respectively. Following the coarse grid search, a finer grid search with an increment of $2^{0.25}$ was carried out in the neighborhood of the optimal values of m, ε, and γ. In this step, the search was implemented with the minimum (maximum) values of m, ε, and γ as $2^{5}$ ($2^{7}$), $2^{-8}$ ($2^{-6}$), and $2^{-6}$ ($2^{-4}$), respectively. The final optimal values were determined as 38.055, 0.006, and 0.019 for m, ε, and γ, respectively, yielding a PCA-based RBF-SVR model with NSE values of 0.89 (Figure 5a) and 0.86 (Figure 5b).
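The candidate values of the two-step search can be sketched as powers of two. This reads each "increment" as a multiplicative factor between neighbouring candidates (an exponent step of 2 for the coarse search and 0.25 for the fine search), which is an interpretation rather than a detail confirmed by the text; `power_of_two_grid` is a hypothetical helper, and the authors' exact grids may differ.

```python
def power_of_two_grid(exp_min, exp_max, exp_step):
    """Return [2**e for e = exp_min, exp_min + exp_step, ..., exp_max]."""
    n_steps = int(round((exp_max - exp_min) / exp_step))
    return [2.0 ** (exp_min + k * exp_step) for k in range(n_steps + 1)]


# Coarse grids: exponent ranges from the text, assumed exponent step of 2.
coarse_m = power_of_two_grid(-11, 11, 2)
coarse_eps = power_of_two_grid(-19, 3, 2)
coarse_gamma = power_of_two_grid(-15, 7, 2)

# Fine grids around the reported coarse optima, exponent step of 0.25.
fine_m = power_of_two_grid(5, 7, 0.25)
fine_eps = power_of_two_grid(-8, -6, 0.25)
fine_gamma = power_of_two_grid(-6, -4, 0.25)

print(len(coarse_m), len(fine_m))  # 12 9
# Fine-grid candidates that round to the reported final optimum.
print(round(2.0 ** 5.25, 3), round(2.0 ** -7.5, 3), round(2.0 ** -5.75, 3))
# 38.055 0.006 0.019
```

Under this reading, the reported final optimum corresponds to the fine-grid candidates $2^{5.25}$, $2^{-7.5}$, and $2^{-5.75}$, and each step evaluates every combination of the three parameters within the 10-fold cross-validation.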
The predictions obtained from the PCA-based MLR and PCA-based RBF-SVR models are compared with the existing empirical relationships to evaluate the performance of the proposed models for estimating the sediment concentration. Table 3 compares the statistical error measures between the models developed in this study and the existing empirical models. The results show that the proposed PCA-based RBF-SVR model has much better accuracy in terms of NSE and RMSE/StD than the models suggested by [23,30,42,44]. Additionally, the PCA-based MLR is the second-best model with regard to the NSE and RMSE/StD. Analysis of the results presented in Table 3 highlights that the empirical relations of [42,44] have the lowest accuracy for estimating sediment concentration in rivers. The poor accuracy of the equation suggested by [42] can be associated with the limitation of this formula for estimating the total load of fine-grained sediments [30].
Table 3. Comparison between the performance of PCA-based MLR and PCA-based RBF-SVR models and the well-established empirical equations for sediment concentration estimation.
To better understand the robustness of the PCA-based MLR and PCA-based RBF-SVR models developed in this study, the models' performance for extreme events with high sediment concentration in rivers is examined (Figure 6a,b). In this regard, the observational data comprising the highest 5% and 1% of the sediment concentrations in the verification step, and the corresponding predictions of the developed models, were selected for further statistical analysis. The results obtained for the highest 5% of sediment concentrations confirm the appropriateness and robustness of the PCA-based RBF-SVR model, with NSE and RMSE/StD values of 0.68 and 0.56, respectively, followed by the PCA-based MLR model (Table 3). Both models also outperform the other empirical models for the highest 1% of the sediment concentrations (Table 3). It should be noted that existing models for the study of sediment concentration in rivers usually have poor accuracy in the estimation of high sediment concentrations [59,69]. The scarcity of observational data for large sediment loads, and the different underlying processes that govern the behavior of bed-load sediment transport in low and high turbulent flows, contribute to the poor performance of the existing models in the estimation of high sediment concentrations. For highly turbulent flows, river flow properties are the main drivers of the sediment concentration in rivers, with large spatial and temporal fluctuations.
This increases the spatiotemporal variation in flow properties and the randomness of the particles' size, shape, and position, adding to the complexity of quantifying bed load transport during high flows [70]. Turbulent flow conditions decrease the coarsening of the bed sediments, leading to poor performance of the existing models in strong currents compared with weak currents [71].
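The selection of extreme cases described above can be sketched as follows. The helper names, the truncation rule for the subset size, and the synthetic data are assumptions for illustration; the text does not state how many datapoints fall in each subset.

```python
def top_fraction_pairs(observed, predicted, fraction):
    """Pairs whose observed value lies in the highest `fraction` of observations."""
    n_keep = max(1, int(len(observed) * fraction))  # truncation is an assumption
    ranked = sorted(zip(observed, predicted), key=lambda pair: pair[0], reverse=True)
    return ranked[:n_keep]


def nse_of_pairs(pairs):
    """Nash-Sutcliffe efficiency evaluated on (observed, predicted) pairs."""
    obs = [o for o, _ in pairs]
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in pairs)
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Synthetic example: 100 observations, predictions 10% too low everywhere.
observed = [float(v) for v in range(1, 101)]
predicted = [0.9 * v for v in observed]

all_pairs = list(zip(observed, predicted))
top5 = top_fraction_pairs(observed, predicted, 0.05)
print(len(top5), round(nse_of_pairs(all_pairs), 3), round(nse_of_pairs(top5), 2))
```

In this synthetic case the same relative error gives a high NSE over the full range but a strongly negative NSE on the top 5%, because the observed variance within the extreme subset is small. This suggests why an NSE computed only on extreme events is a demanding test.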

Conclusions
Two predictive models were developed using multiple linear regression (MLR) and support vector regression (SVR) models, fed by the outputs of PCA, to estimate the total sediment load (TSL) in open channel streams. A large database of physical modelling tests, including 4759 data records from previous studies, was used for the model development, calibration, and verification. Dimensional analysis was performed to determine the ten main drivers of the TSL in rivers. Given the strong dependency between some of the drivers, these input parameters were converted to uncorrelated PCs to feed both the MLR and SVR models. The PCA-based SVR model was tested with four kernel-type functions, and the model with the radial basis function kernel (RBF-type) was selected as the best model for detailed calibration. The PCA-based RBF-SVR model developed in this study was adopted for the prediction of TSL in rivers across a wide range of test conditions. Statistical error measures indicated the robust performance of the proposed PCA-based RBF-SVR model for estimating TSL. Both models developed in this study were then compared with the well-established empirical relations, including for the case of large sediment concentrations. The comparison shows the superior performance of the PCA-based MLR and PCA-based RBF-SVR models compared to the empirical estimation models. The statistical error analysis shows that the proposed models outperform the predictions obtained from the empirical formulae, specifically for extreme events, where the existing models performed poorly. Although this study includes the main drivers of the TSL in rivers using laboratory data, the effect of additional factors such as the river aspect ratio, friction term, and channel sinuosity on the change in sediment concentration in rivers remains unresolved. Therefore, further investigations must be carried out to better understand the complex nature of TSL in rivers.

Supplementary Materials:
The following are available online at www.mdpi.com/xxx/s1, Figure S1; Table S1: The main statistical characteristics of the raw database used in this study (StD is the standard deviation); and Table S2: Symmetrical correlation matrix among the drivers.


Conflicts of Interest:
The authors declare no conflict of interest.