Application of Soft Computing Models with Input Vectors of Snow Cover Area in Addition to Hydro-Climatic Data to Predict the Sediment Loads

: The accurate estimate of sediment load is important for management of the river ecosystem, designing of water infrastructures, and planning of reservoir operations. The direct measurement of sediment is the most credible method to estimate the sediments. However, this requires a lot of time and resources. Because of these two constraints, most often, it is not possible to continuously measure the daily sediments for most of the gauging sites. Nowadays, data-based sediment prediction models are famous for bridging the data gaps in the estimation of sediment loads. In data-driven sediment predictions models, the selection of input vectors is critical in determining the best structure of models for the accurate estimation of sediment yields. In this study, time series inputs of snow cover area, basin e ﬀ ective rainfall, mean basin average temperature, and mean basin evapotranspiration in addition to the ﬂows were assessed for the prediction of sediment loads. The input vectors were assessed with artiﬁcial neural network (ANN), adaptive neuro-fuzzy logic inference system with grid partition (ANFIS-GP), adaptive neuro-fuzzy logic inference system with subtractive clustering (ANFIS-SC), adaptive neuro-fuzzy logic inference system with fuzzy c-means clustering (ANFIS-FCM), multiple adaptive regression splines (MARS), and sediment rating curve (SRC) models for the Gilgit River, the tributary of the Indus River in Pakistan. The comparison of di ﬀ erent input vectors showed improvements in the prediction of sediments by using the snow cover area in addition to ﬂows, e ﬀ ective rainfall, temperature, and evapotranspiration. Overall, the ANN model performed better than all other models. However, as regards sediment load peak time series, the sediment loads predicted using the ANN, ANFIS-FCM, and MARS models were found to be closer to the measured sediment loads. The ANFIS-FCM performed better in the estimation of peak sediment yields with a relative accuracy of 81.31% in comparison to the ANN and MARS models with 80.17% and 80.16% of relative accuracies, respectively. The developed multiple linear regression equation of all models show an R 2 value of 0.85 and 0.74 during the training and testing period, respectively.


Introduction
Eroded sediment originating from drainage basins due to hydrometeorological processes like rainfall, snow melt, and ice melting, etc., is transported in the form of suspended loads and bed loads [1][2][3]. The bed loads are transported in the form of coarse particles of different shapes and sizes continuously in contact with the river bed [4]. The suspended load, transports in suspension state formed due to the erosion of fine particles from the sheet and gully, runoff in the catchment, river banks, and channel beds [5]. The increased runoff due to the rising rainfall, snow cover depletion,

Background
In the recent few decades, many researchers have used several black-box models for the prediction of sediment yield. The most widely used models among these black-box models include artificial neural networks (ANN), support vector machines (SVM), artificial neuro-fuzzy logic inference systems (ANFIS), and genetic programming (GP). Mostly, more than two models were used to compare the results for finding the best model for the prediction of sediment yields along with the rating curve (RC) model. For example, in some studies [20][21][22], ANN was found to be better for the prediction of sediment yields than the sediment rating curve (SRC) model. Similarly, ANN and multiple linear regression (MLR) models were used in some studies [23,24] for the estimation of sediment yields. In these studies, the sediment prediction results of ANN were found to be better than the sediment prediction results obtained by MLR. In yet another study [25], the grid rainfall and measured flows are used to predict sediment yields with ANNs by Levenberg-Marquardt (LM), scaled conjugated gradient (SCG), and Bayesian regulation (BR) algorithms. It was concluded that ANN with Levenberg-Marquardt algorithm performed fairly better than the other two ANN algorithms for sparsely distributed catchments with limited climatic recorded data. The results of ANN and ANFIS were compared by [26,27] for the prediction of sediment yield. In these studies, researchers found that ANFIS models show a higher accuracy than the ANN and SRC models. It was found in studies by [28,29] that gene expression algorithms are better than ANN and ANFIS models for predictions of sediment loads.

Study Area
The present study was carried out in the Gilgit River basin situated in the Hindukush Mountains of the Upper Indus Basin (UIB). The Gilgit River originates from Shandoor Lake north of the Gilgit-Baltistan region in Pakistan. The Baha Lake is the right tributary of the Gilgit River with small tributaries being e.g., Yasin, Ishkoman, and Phandar. The Phandar Lake is located in Ghizer. The Yasin tributary joins the main Giglit River near Gupis. Figures 1 and 2 show the hydrological characteristics of the Gilgit basin that has a drainage area of 12,095 km 2 . The geographical location of the Gilgit basin is between latitude 35 • Table S2 in supplementary materials shows the key features of the Gilgit basin. About 10% of the total catchment area is covered with glaciers and lies above an elevation of 5000 m. During the winter season, approximately 87% of the catchment area is covered with snow cover which reduces to 11% during the ablation period in summer. The mean annual discharge and suspended sediment concentrations (SSC) of the Gilgit basin are 291 m 3 /sec and 448 mg/L, respectively. The ablation period starts in July after seasonal snow melts. The melting of the glacier is slow and continues until the month of October. Then, the accumulation period of snow starts at the end of October. The Gilgit basin receives 75% of its rainfall starting from the mid of spring (April) to the end of summer (October). The mean annual basin rainfall from grid data in the Gilgit basin is approximately 670 mm. The mean monthly basin average temperature for the Gilgit basin ranges from −19.8 to 7.20 • C.
The Water and Development Authority (WAPDA) of Pakistan had also installed stream gauging stations at an altitude of 1430 m a.m. sea level for measuring flows and suspended sediment concentrations (SSC). The climatic stations installed in the Gilgit basin are sparsely distributed in the catchment. The climatic stations installed in the valley by the Pakistan Meteorological Department (PMD) at Gilgit and Gupis have at their disposal long-term daily climatic data collected from 1981-2010. However, the climatic stations of Uskhkore, Yasin, and Shendure located on higher altitudes are sparsely distributed and have short-term recorded data accumulated from 1996-2010 which are available from WAPDA. However, the suspended sediment concentrations (SSC) are recorded on intermittent days per week. Table 1 shows the detailed information on the data used in this study. The flows, temperature, and rainfall are recorded on a daily basis. Because of the scarcity of climatic information and the sparse distribution of climatic stations in the Gilgit catchment (see Figures 1 and 2), the information of grid climatic, snow cover fractions, and grid evapotranspiration datasets of Table 1 were used for the period 1981-2010 during analysis of this research work. These grid datasets were extracted using the Shuttle Radar Topography Mission's (SRTM) Digital elevation model (DEM) of 30 m for Gilgit catchment.   HI-AWARE project [47,48] Note * The variable of discharge (Q) and suspended sediment concentrations (SSC) are measured at Gilgit gauging station and variables of SCF, T, P, and Evap are basin averages grid datasets.
The Moderate Resolution Imaging Spectroradiometer (MODIS) MOD10A2 product was downloaded on a weekly basis for the period of 2000-2010 from the National Snow and Ice Data Center (NSIDC) online server. The MODIS data with 500 m resolution was used for estimating the snow cover area and snow melt runoff [49,50]. The same procedure was adopted in other studies to estimate and linearly interpolate the snow cover fractions for daily snow cover fractions of the Gilgit basin for the period of 2000-2010 [49,50]. The temperature-index snow model was further used to estimate the snow cover fraction for the period of 1981-2010 after calibration and validation of the snow model with MODIS snow cover. Table 2 shows the Pearson's correlations of input variables used in this study. Generally, correlation analysis such as cross correlation, auto-correlation, and partial auto-correlation are also used to determine the input combinations of various variables with lag times. However, the main deficiency of these methods is the inability to cover the nonlinear relationship between the input and   [47,48] Note * The variable of discharge (Q) and suspended sediment concentrations (SSC) are measured at Gilgit gauging station and variables of SCF, T, P, and Evap are basin averages grid datasets.
The Moderate Resolution Imaging Spectroradiometer (MODIS) MOD10A2 product was downloaded on a weekly basis for the period of 2000-2010 from the National Snow and Ice Data Center (NSIDC) online server. The MODIS data with 500 m resolution was used for estimating the snow cover area and snow melt runoff [49,50]. The same procedure was adopted in other studies to estimate and linearly interpolate the snow cover fractions for daily snow cover fractions of the Gilgit basin for the period of 2000-2010 [49,50]. The temperature-index snow model was further used to estimate the snow cover fraction for the period of 1981-2010 after calibration and validation of the snow model with MODIS snow cover. Table 2 shows the Pearson's correlations of input variables used in this study. Generally, correlation analysis such as cross correlation, auto-correlation, and partial auto-correlation are also used to determine the input combinations of various variables with lag times. However, the main deficiency of these methods is the inability to cover the nonlinear relationship between the input and output variables like discharge sediment, etc. For this reason, in the current study, the various input combinations were identified by examining the test accuracy of the model output.
In general, the discharges trigger the channel erosion. However, in addition to discharges, the temperature and snow cover area of the snow-and ice-dominated basin also triggers hillslope erosion, snow melt erosion and glacier melt erosion. The evapotranspiration also has an indirect relationship with erosion processes in the form of the vegetative cover of the plants and forests. Keeping in view the importance of direct and indirect factors controlling the erosion of catchments, different variables other than discharges such as snow cover area, effective rainfall, and evapotranspiration were also chosen in this study for the prediction of sediment yields. Prior to the analysis for prediction of sediment yields, the flows and suspended sediment load (SSL) were transferred into a log transformation form to compensate the biases and very high values in datasets. The datasets were divided into 70% and 30% for training and testing of the model, respectively. Shahin et al. [51] suggested that for optimum performance of soft computing methods datasets should be divided into training (i.e., 70%) and testing (i.e., 30%) phases. The daily datasets of measured SSC were not available for continuous days. The measured SSC values were available for total 767 days during the period 1981-2010. For the sediment rating curve (SRC) the flows and SSC values for the period 1981-2003 (i.e., 1-537 days) and 2003-2010 (i.e., 538-6767 days) were used for training and testing respectively. However, the random sampling [52] of whole datasets for training (70%) and remainder datasets as testing (30%) were conducted in MATLAB to reduce over and under fitting of network. Then ANN, ANFIS, and MARS models were trained and tested in MATLAB with various input combinations.

Application of Temperature-Index Snow Model for Snow Cover Estimates
The climatic stations in the Gilgit basin have less availability of long-term climatic records for the catchment. Previous studies [53][54][55] reported that the rainfall on higher elevations starting above 5000 m in the Upper Indus Basin (UIB) is 5-10 times higher than the rainfall recorded in the valley. For this reason, the grid data for rainfall and temperature from the HI-AWARE project [47,48] was used in this study. Keeping in view the above-mentioned constraints, the temperature-index snow model is used in this study. The temperature-index snow model is a simple and spatially distributed model which, in addition, has less data requirements. In this study, this method is used to simulate the long-term snow melts and snow cover fractions after calibration and validation of the simulated snow cover fraction with the MODIS snow cover fractions for the period of 2000-2010.
In the temp-index snow melt model [56,57], precipitation P is first separated into snow and liquid rain on a daily time scale. The threshold temperature T RS ( • C), daily maximum temperature ( • C), and daily minimum temperature ( • C) separate the snow and liquid rainfall as: where, Precipitation factor C p proportionate to temperature difference is calculated as: The threshold temperature T RS is used to define the type of precipitation into rain/snow and the threshold temperature T SM for the snow melt process which depends on numerous factors like the boundary layer condition of atmosphere, temperature, and air humidity, etc.
Then, daily rates of snow melt, i.e., M snow (mm/day) are estimated as: Here, the K snow (mm/day • C) is the degree day factor for snow melts, T mean ( • C) is the mean daily air temperature, and T SM ( • C) is the threshold temperature.
After this, the snow model simulates the snow water equivalent or snow depth SD (mm) for each grid number of i as: Finally, the snow cover fraction SCF for i = 1, 2, 3, 4, . . . , N number of grids for the whole basin is estimated for calibration and validation with the MODIS snow cover fraction as: Here, H = unit step function; when H = 0, SD = 0 and H = 1 then SD > 0. The area of integration N is the entire basin, sub-basins, and elevation bands, etc.

Artificial Neural Networks (ANN)
Artificial neural networks (ANNs) are data-based black box models primarily inspired by the concept of functioning of the biological nervous system. ANNs consist of a set of processing elements referred to as neurons. These neurons work in the parallel systems for acquiring the information and storing the knowledge for computational use. ANNs consist of three layers as their basic structure. These layers are the input layer, the hidden layer (processed layer), and the output layer. Each layer is connected by networks of neurons with preceding layers. This system of networks connected with neurons is called multilayer perceptron (MLP). There are various types of ANNs that perform various assignments in science and engineering. Among these ANNs of MLP, feed-forward back propagation FFBP-ANN is most popular. The literature [58][59][60][61][62][63][64] explains the details of the ANN model and its application to water resources with FFBP-MLP algorithm. In FFBP-MLP, the input data are learned in forward direction of network from input nodes to the hidden nodes with some transfer functions in the hidden layer. Then, the information is forwarded from the hidden layer to the output nodes. Figure S1 in supplementary materials explains the architectures of the FFBP ANN. In the output layer, an output is generated by the network, and the error between predicted and model output is computed. This output error of the network is back-propagated through the network to correct the connection weights of neurons in the hidden layer. This learning process of the network is performed until the minimum error is optimized to avoid overfitting as well underfitting of the network.
A neural network is described with (1) architectures of layers connected with networks of neurons, (2) transfer functions, and (3) training methods for estimation of weights in nodes. In general, the performance of ANN depends on its model network, learning complexity, and problem complexity. The performance of ANN depends on the number of neurons in hidden layers and the number of hidden layers to avoid the over-and underfittings of the network. The literature suggests the optimum neurons to be in the range of 2 √ N 1 + N 0 , where N 1 and N 0 are the number of input and output neurons, respectively.
For this study, ANN with FFNN-MLP with Levenberg-Marquardt has been used with one hidden layer as more than one hidden layer increases the complexity of the network and does not improve the results, either. The FFNN-MLP with Levenberg-Marquardt is a robust and powerful tool. It has a high and fast ability of data convergence, and produces more accurate results than other ANN algorithms.

Adaptive Neuro-Fuzzy Logic Inference System (ANFIS)
The adaptive neuro-fuzzy logic inference system (ANFIS) is a novel architecture with combinations of neural networks and fuzzy inference systems (FIS). A basic ANFIS [65] structure is shown in Figure S2 in section of supplementary materials. The ANFIS works by tuning the parameters of FIS applying the neural network learning method. The ANFIS builds a network structure connected with a number of nodes. These nodes are characterized by fixed or adjustable parameters. The ANFIS uses neural networks with fuzzy logic if-then rules with appropriate membership functions to translate the input parameters into output values. Three inference systems are classified as Tsukamoto's, Mamdani's, and Sugenos's systems. The Mamdani's system [66] was mostly used in the past. The Sugeno's system [67] is more efficient than other systems. In this study, Sugeno's fuzzy logic structures were used.
As an example, it is assumed that a FIS has two inputs x 1 and x 2 with target values of z. Here, input of discharge and snow cover can be supposed as x 1 and x 2 with output z as sediment yield for a particular time t. Then, in Sugeno's fuzzy logic structures, typical rule sets with two IF/THEN rules are expressed as: where p i , q i , and r i are parameters corresponding to Rule 1, Rule 2 . . . Rule n. The ANFIS consist of five layers. Layer 1: In the first layer, each node generates a membership grade for the variable of each input. The output of ith node with generalized bell membership function in the first layer is expressed as: where, {a i , c i , N i } are the parameter sets for x 1 input in ith node. These parameters change the shape of the bell function in the range of 0-1. Layer 2: Layer 2 is labeled with II in each node. In this layer, each node multiplies the incoming signals coming from layer 1 as: Layer 3: In layer 3, each node calculates the normalized firing strength as its relationship between firing strength of i th rule to the sum of all rules: Layer 4: In layer 4, the sums of signals from second-and third-layer networks are calculated for each ith node toward the model output as: Here, w is the output from layer 3 in this equation.
Layer 5: Layer 5 calculates the overall output in the form of a single node as the ANFIS model output against each target value as: In the ANFIS model, to obtain the model parameters, a hybrid learning method is used for this study. Further details about the ANFIS model can be found in [68].
In this study, three strategies are used to produce the initial fuzzy inference system for the ANFIS model. These strategies are grid partition (ANFIS-GP), subtractive clustering (ANFIS-SC), and fuzzy c-means clustering (ANFIS-FCM). The ANFIS-GP is a combination of ANFIS and grid partition. In grid partition, the input linguistic variables are partitioned by fuzzy numbers and their membership functions (MFs). The grid partition uses predefined numbers of MFs to optimize the MFs according to input-output datasets. The quantitative characteristics of datasets are separated into n partitions (n = 2, 3, 4 . . . ). In this study, eight MFs were used, such as gaussmf, gauss2mf, trimf, trapmf, gbellmf, pimf, dsig, mf, and psigmf. In the AFNIS-GP model, the number of rules increases exponentially with the increase in the number of input variables. For details about the ANFIS-GP, see [65].
The ANFIS-SC model is the extended model derived from the mountain clustering model [69] with combination of the ANFIS model by using the subtractive clustering strategy. This model was modified by Chiu [70]. This method has an advantage over the mountain clustering method. It eliminates the grid resolution to reduce the complex computations in the mountain clustering method. In the ANFIS-SC model, each dataset is considered as potential cluster. Then, the potential of each data point of a given dataset is calculated by its distance from all other data points. These data points having many neighboring data points show a high potential value. The influential radius decides the number of clusters in the ANFIS-SC model. The small value of influential radius has many numbers of clusters with more rules in comparison to its large value [71]. Using a hit-and-trial procedure, the suitable critical value of influential radius is sorted out during the data space clustering procedure. [70,72] further explain the detailed procedure of the ANFIS-SC model.
The ANFIS-FCM model was proposed in the literature [73][74][75][76][77] and enhanced by Zhang and Chen [78]. The ANFIS-FCM minimizes the errors by partitioning the X datasets into C clusters. This method reduces the errors regarding the weighted distance of each data point xi toward all centroids of the C clusters. After this, the ANFIS-FCM model minimizes the objective function as: where C, N, w ic , v, and x are the number of clusters, number of data points, degree belongs to ith data point of Cith clusters data points, and input data sets. The p (p > 1) entitles to the fuzzifier exponent.
In ANFIS-FCM, w ic is calculated as: In the FCM model after initialization of the center vectors, centers are recomputed as: The algorithm is run until the convergence condition is completed.

Multivariate Adaptive Regression Splines (MARS)
MARS is a non-parametric technique for the prediction of nonlinear processes developed in 1991 by Friedman [79]. The MARS model is a flexible and precise prediction model. It has been successfully applied in different studies [80][81][82] for prediction and forecasting purposes. In the MARS model, the MARS function develops a series of linear segments having different slopes from the input-output relationships of given datasets. Each linear segment of MARS is then fitted with a linear basis function. For this study, the datasets were separated into break values between different regions or segments referred to as knots. Each region has its own regressions line. The shape of a piecewise linear basis function is expressed as: Here, x represents the predictor variable and k explains about the threshold value of the knots. In general, MARS consists of combinations of basis functions (BFs) given as: In the above Equation (12), the variable y is dependent on the estimated values of function f (x) with the error ε. In Equation (13), β o is a constant value, BF m is the basis function, and β m represents the coefficient for the maximum number of basis functions (BFs) depending on the input's datasets.
In the MARS model with polynomial knots, there exist two phases called forward step phase and backward step phase. The forward step phase generates all possible BFs. After generation of the BFs, the generalized cross validation criterion (GCV) is used for determining the BFs and appropriate nodes. After this forward step phase, the backward step phase of the MARS model works to reduce the number of BFs for improving the predictions and avoiding overfitting of the model. [79] gives detailed information about the MARS model.

Sediment Rating Curve (SRC)
The sediment rating curve is an empirical relationship of flows and sediment load or concentrations described as: where Q (m 3 /day) is discharge, SSL (tons/day) both in log transformation form, and a and b are the constants that depend on the characteristics of a river and its catchments.

Performance Measurement Metrics for Model Evaluation
The performance of models was measured and assed using the following statistics: Root-mean-square error (RMSE): Nash-Sutcliffe efficiency (NSE): Pearson's correlation coefficient (R 2 ): (22) where N refers to the data quantity, S io is the observed sediment, S is is the simulated sediments, and S is is the mean of the simulated sediments. Relative accuracy: The relative accuracy is the %age of accuracy expressed as: Relative where S po is the observed peak value of SSY, S ps is the simulated peak value of SSY.

Simulation of Snow Melts and Snow Cover Area
The results of the calibrated temperature-index snow melt model are shown in Table 3. The model was calibrated and validated to simulate the snow cover by using the degree day factor for the snow model. Table 3 shows the value of the degree day factor k snow = 4.2 (mm/day/ • C) for the Gilgit basin. The literature review [49,50,[83][84][85][86] for the regional case studies shows that the value of K snow ranges from 3-7 (mm/day/ • C) in the Upper Indus Basin (UIB). Thus, during calibration and validation of the temperature-index snow model for this study, the value of k snow = 4.2 (mm/day/ • C) lies within the range of the values of previous studies carried out for snow melt runoff modeling in the UIB. The difference between the K snow value in the current study and that of previous studies is probably due to the use of different resolutions of input datasets, lengths of calibration datasets, threshold temperatures for separating rainfall and snow, threshold temperatures for snow melts, and characteristics of the catchment.  Table 3 also shows the performance measurement statistics for the snow model during the calibration and validation periods. The value of R 2 is found at 0.90 between the MODIS-observed snow covered area and model simulated snow cover area both during the calibration and validation periods. The performance evaluation criteria using the three criteria of R 2 , NSE, and RMSE show that goodness of fit between the model and observed MODIS snow cover maps is more than 70% which is satisfactory in estimation of both the snow melts and snow cover area.   Table 3 also shows the performance measurement statistics for the snow model during the calibration and validation periods. The value of R 2 is found at 0.90 between the MODIS-observed snow covered area and model simulated snow cover area both during the calibration and validation periods. The performance evaluation criteria using the three criteria of R 2 , NSE, and RMSE show that goodness of fit between the model and observed MODIS snow cover maps is more than 70% which is satisfactory in estimation of both the snow melts and snow cover area.  For application of the ANN model, the transfer functions logsig, purelin, tansig, and radbas were used in the hidden layers. The network was trained by using 16 combinations of four transfer functions for input and output layers. The optimum number of neurons was determined ranging from 3-8 in single hidden layers for overall input scenarios giving best results at the end. Table 4 shows the results of various input combinations using ANN model. For the ANFIS-GP, ANFIS-SC, and ANFIS-FCM models, the hybrid algorithm was used in this study.
For the ANFIS-GP model application, the gaussmf, gauss2mf, trimf, trapmf, gbellmf, pimf, dsigmf, and psigmf membership functions were used. In ANFIS-GP, the type of membership functions and number of member functions are important for training the network. Table 5 shows the results of all scenarios using the ANFIS-GP model with optimal number and type of membership functions. The optimal number of functions ranges between 2 and 4 for all scenarios.
For application of the ANFIS-SC model, the network is trained with an optimal range of the radius of clusters which give a minimum value of RMSE and highest values of R 2 and NSE. The optimal value of the cluster radius represents the influence of the cluster radius on the dataset clusters. If the cluster radius is small, then there are numerous small cluster datasets.
On the other hand, a large value of the cluster radius means that there are a few large cluster datasets for training the network. During training of the network, the hit-and-trial method was used to find out the optimum value of the cluster radius with the smallest value of RMSE for all scenarios during the testing period. Table 6 shows the results of the ANFIS-SC model for all scenarios. It was For application of the ANN model, the transfer functions logsig, purelin, tansig, and radbas were used in the hidden layers. The network was trained by using 16 combinations of four transfer functions for input and output layers. The optimum number of neurons was determined ranging from 3-8 in single hidden layers for overall input scenarios giving best results at the end. Table 4 shows the results of various input combinations using ANN model. For the ANFIS-GP, ANFIS-SC, and ANFIS-FCM models, the hybrid algorithm was used in this study.
For the ANFIS-GP model application, the gaussmf, gauss2mf, trimf, trapmf, gbellmf, pimf, dsigmf, and psigmf membership functions were used. In ANFIS-GP, the type of membership functions and number of member functions are important for training the network. Table 5 shows the results of all scenarios using the ANFIS-GP model with optimal number and type of membership functions. The optimal number of functions ranges between 2 and 4 for all scenarios.
For application of the ANFIS-SC model, the network is trained with an optimal range of the radius of clusters which give a minimum value of RMSE and highest values of R 2 and NSE. The optimal value of the cluster radius represents the influence of the cluster radius on the dataset clusters. If the cluster radius is small, then there are numerous small cluster datasets.
On the other hand, a large value of the cluster radius means that there are a few large cluster datasets for training the network. During training of the network, the hit-and-trial method was used to find out the optimum value of the cluster radius with the smallest value of RMSE for all scenarios during the testing period. Table 6 shows the results of the ANFIS-SC model for all scenarios. It was found that the optimal range of the cluster radius is from 0.5-0.9 for all scenarios.
For application of the ANFIS-FCM model, the various numbers of clusters were used to train and test the network for all scenarios. Table 7 shows the results of the ANFIS-FCM model for all input combinations. The optimal number of clusters ranges between 2 and 6 for this study with the lowest value of RMSE and highest value of R 2 during testing of the network for all input combinations.
For application of the MARS model, the controlling parameters generally include the maximum basis functions, maximum interaction, speed factor, minimum number of observations between knots, penalty of variable, and degree of freedom. However, for this study, the hit-and-trial method was used to train the model with an optimal number of maximum basis functions ranging from 5 to 25 for all input scenarios with the remaining parameters being default values in the model. Table 8 shows the results of the MARS model for various input scenarios used in this study.
For application of the sediment rating curve (SRC) model, the power law function was used to train the model with 70% of the datasets after transformation of flows and sediment yields into logarithm form.
After training of SRC with 70% of the data sets, the model was tested with 30% of the remaining data. Figure 4 shows the plot of the sediment rating curve using the power law functions. Table 9 also shows the results of training and testing of the sediment rating curve (SRC) model and compares its model performance statistics with other models used for predictions of sediment yields used in this study. For application of the sediment rating curve (SRC) model, the power law function was used to train the model with 70% of the datasets after transformation of flows and sediment yields into logarithm form.
After training of SRC with 70% of the data sets, the model was tested with 30% of the remaining data. Figure 4 shows the plot of the sediment rating curve using the power law functions. Table 9 also shows the results of training and testing of the sediment rating curve (SRC) model and compares its model performance statistics with other models used for predictions of sediment yields used in this study.

Comparison of the ANN, ANFIS-GP, ANFIS-SC, ANFIS-FCM, MARS, and SRC Models
The results of the training and validation of the various scenarios are shown in Tables 4-9 for the ANN, ANFIS-GP, ANFIS-SC, ANFIS-FCM, MARS, and SRC models for predictions of the sediment yields for the Gilgit basin. In Table 4, the ANN shows the best performance of S10 scenarios with model inputs of Qt, Tt-1, Evapt-1, SCAt, and SCAt-4. In the ANN, the model parameters having radbas and tansig as input and output transfer functions along with five numbers of neurons performed best with S10 input scenarios during the training and validation phases. Table 5 shows the results of the ANFIS-GP for all input scenarios. Here, the ANFIS-GP shows the best performance of the model with S7 scenarios consisting of inputs of Qt, SCAt, and SCAt-1. The ANFIS-GP model performs best with model parameters consisting of triangular (trimf) membership functions along

Comparison of the ANN, ANFIS-GP, ANFIS-SC, ANFIS-FCM, MARS, and SRC Models
The results of the training and validation of the various scenarios are shown in Tables 4-9 for the ANN, ANFIS-GP, ANFIS-SC, ANFIS-FCM, MARS, and SRC models for predictions of the sediment yields for the Gilgit basin. In Table 4, the ANN shows the best performance of S 10 scenarios with model inputs of Q t , T t−1 , Evapt −1 , SCA t , and SCA t−4 . In the ANN, the model parameters having radbas and tansig as input and output transfer functions along with five numbers of neurons performed best with S 10 input scenarios during the training and validation phases. Table 5 shows the results of the ANFIS-GP for all input scenarios. Here, the ANFIS-GP shows the best performance of the model with S 7 scenarios consisting of inputs of Qt, SCA t , and SCA t−1 . The ANFIS-GP model performs best with model parameters consisting of triangular (trimf) membership functions along with two numbers of membership functions (MFs). The results of the ANFIS-SC model are shown in Table 6.   From Table 6, the input scenario S 10 involving the inputs of Q t , T t−1 , Evap t−1 , SCA t , and SCA t−4 gives the best performance of the ANFIS-SC model. The ANFIS-SC uses the model parameters having the value of a cluster radius of 0.90 to perform best with S 10 input combinations. Table 7 shows the results of input scenarios by using the ANFIS-FCM model. It is evident that the best performance of the ANFIS-FCM model, too, was obtained with S 10 scenarios having inputs of Q t , T t−1 , Evap t−1 , SCA t , and SCA t−4 . In the ANFIS-FCM model, the best network was developed by using the model parameter having two numbers of clusters with S10 input scenario. Table 8 represents the results of the MARS model used in this study for prediction of the sediment yield of the Gilgit River basin. As shown in Table 8, again the input scenario S 10 involving the inputs of Q t , T t−1 , Evap t−1 , SCA t , and SCA t−4 developed the best-performing network in the MARS model. The MARS model performed best with its basis function (BF) parameter having the value of 10 with the S 10 scenario. Table 9 shows the overall results of the best networks of the ANN, ANFIS-GP, ANFIS-SC, ANFIS-FCM, and MARS models compared with the sediment rating curve performance for the Gilgit basin. Table 8 shows that the ANN model performs better than all other models with the least values of the RMSE errors of 0.42 and 0.43 during the training and testing phase.
Similarly, Figure 5 shows the scatter plot between the observed and predicted SSY by using ANN, ANFIS-GP, ANFIS-SC, ANFIS-FCM, MARS, and SRC during the testing phase for overall best input scenarios. From the scatter plot graphs, it can be observed that the ANN-based model has the least scatters with the highest value of R 2 during the testing phase. The ANN has improved the results of the scatter plot of the R 2 value to up to 0.82 in comparison to the rating curve R 2 value of 0.71 during the testing period. Figure 6 shows the annual time series variation graphs of the observed and estimated SSY by using the ANN, ANFIS-GP, ANFIS-SC, ANFIS-FCM, MARS, and SRC models with best-performed input combinations. This Figure 6 also includes the one detailed graph derived from the main time series plot to compare all model performances during the peak annual suspended sediment yields (SSY) period of the year 2005.
It is illustrated in Figure 6 that during the peak SSY period of the year 2005, the estimated SSY of the models ANN, MARS, and ANFIS-FCM are relatively closer to the observed SSY than those of the other models. However, the models ANFIS-GP and ANFIS-SC significantly underestimated the SSY during this peak year period of 2005. Similarly, the SRC model significantly overestimated the SSY during that period. Figure 7 shows an overall comparison of different input variable scenarios developed from flows Q (m 3 /day), snow cover area SCA (fractions), effective mean basin rainfall R (mm/day), mean basin average temperatures T ( • C/day), and mean basin evapotranspiration Evap (mm/day) for predictions of SSY during the testing period in the Gilgit basin. The model performance of R 2 was improved up to the value of 0.82 by introducing the combinations of the snow cover area along with flows, effective rainfall, temperatures, and evapotranspiration. The input combinations consisting of only the mean basin average temperature T perform less than other combinations consisting of flows, snow covers, effective rainfall etc. However, the mean basin average temperature T variable scenarios' performance with an R 2 value of 0.76 is better than the rating curve with an R 2 value of 0.71.
Rajaee et al. [22] applied artificial neural networks (ANNs), neuro-fuzzy (NF), multiple linear regression (MLR), and sediment rating curve (SRC) for prediction of suspended sediment concentrations (SSC) for Little Black River and Salt River in United states of America (USA). For example, in Little Black River gauging station, the value of R 2 was 0.69 for NF model, while it was 0.45, 0.25, and 0.23 for ANN, MLR, and SRC models respectively. In the present study, the value of R 2 ranges from 0.78-0.82 using ANN and ANFIS models. It suggests that the soft computing models could be successfully applied for daily prediction sediment yields.      The mean values of SSY and relative accuracies of the ANFIS-GP, ANFIS-SC, ANFIS-SC, ANFIS-FCM, MARS, and SRC models at Gilgit gauging station are shown in Table 10. The ANN model predicted the means of the peak sediment fluxes to be 6613 (tons/day) and 5186 (tons/day), while the ANFIS-GP, ANFIS-SC, ANFIS-FCM, MARS, and SRC models resulted in less accurate outcomes. However, Table 10 also shows that the ANFIS-FCM model with a relative accuracy of 81.31% has a superior accuracy in predicting the peak values of sediment yields compared to the ANN (80.17%), ANFIS-GP (78.45%), ANFIS-SC (75.49%), MARS (80.16%), and SRC (66.33%) models.

Deveoplement of Multiple Linear Regression Equation
The relationships between the measured sediment yields and the best-performing scenarios of the ANN, ANFIS-GP, ANFIS-SC, ANFIS-FCM, and MARS models have been developed using 70% of the data. The remaining 30% of the data was used to test the equation of multiple linear regression developed between the measured sediments and data-based model outputs. Equation (24) represents the relation between log-transferred measured sediments loads and data-based log-transferred modeled sediment loads as: (24) where y = observed/measured sediment load in log form (tons/day),

Conclusions
This study was designed to improve the predictions of sediment yields by using different input variables applying the ANN, ANFIS-GP, ANFIS-SC, ANFIS-FCM, and MARS models in addition to the SRC model to the snow-and ice melt-dominated Gilgit basin. The objective of the study was to compare and examine the appropriate input variables based on the knowledge of hydrological process-and snow-and ice melt-dominated factors controlling erosion and sediment transport for predictions of sediment yields. To accomplish this objective, we investigated the input such as flows affecting channel erosion; temperature and snow cover area as snow melt erosion, glacier melt erosion and hillslope erosions; effective rainfall as mass wasting erosion, hillslope erosions and channel erosion; and evapotranspiration as effect of vegetation cover controlling catchment erosion for the prediction of sediment yields. It was concluded that for the prediction of sediment yields, the inputs of snow cover area, effective rainfall, and evapotranspiration significantly improve the accuracy of the ANN model when used in addition to flows and temperature as inputs. Combining the snow cover maps, effective rainfall, temperature, and evapotranspiration as inputs slightly increased the model performance (0.80 and 0.82) of R 2 when using the ANN model during the testing phase for the Gilgit River basin. It was concluded that the estimated snow cover area on land use maps and spatially distributed climatic information can improve the prediction of sediment yields when using data-based models.
It was also concluded that predictions of the peak values of sediment yields by means of the ANN, ANFIS-FCM, and MARS models are relatively closer to the values of the observed sediments than when using the SRC, ANFIS-GP, and ANFIS-SC models. The ANFIS-FCM, ANN, and MARS models predicted the sediment with relative accuracies of 81.31%, 80.17%, and 80.16%, respectively, against the peak values of the observed time series. Overall, the ANFIS-FCM model was found to be more successful than the other models for predicting the peak values of sediments in the Gilgit basin.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Figure S1: Schematic diagram of the ANN model for prediction of sediment yields with one hidden layer, Figure S2: Schematic diagram of the ANFIS model for prediction of sediment yields with two inputs., Table S1: Summary of the reviewed publications of data-based models sorted by year and input variables, Table S2: Characteristics of the Gilgit River basin in the Upper Indus River.
Author Contributions: W.U.H. designed the work, highlighted the problem, formulated the work plan, analyzed the data and write up the paper. M.K.S. assisted in improving the work methodology, improving the write up and reviewing the paper. F.S. and F.N assisted with correction of the methodology and practical knowledge.

Conclusions
This study was designed to improve the predictions of sediment yields by using different input variables applying the ANN, ANFIS-GP, ANFIS-SC, ANFIS-FCM, and MARS models in addition to the SRC model to the snow-and ice melt-dominated Gilgit basin. The objective of the study was to compare and examine the appropriate input variables based on the knowledge of hydrological processand snow-and ice melt-dominated factors controlling erosion and sediment transport for predictions of sediment yields. To accomplish this objective, we investigated the input such as flows affecting channel erosion; temperature and snow cover area as snow melt erosion, glacier melt erosion and hillslope erosions; effective rainfall as mass wasting erosion, hillslope erosions and channel erosion; and evapotranspiration as effect of vegetation cover controlling catchment erosion for the prediction of sediment yields. It was concluded that for the prediction of sediment yields, the inputs of snow cover area, effective rainfall, and evapotranspiration significantly improve the accuracy of the ANN model when used in addition to flows and temperature as inputs. Combining the snow cover maps, effective rainfall, temperature, and evapotranspiration as inputs slightly increased the model performance (0.80 and 0.82) of R 2 when using the ANN model during the testing phase for the Gilgit River basin. It was concluded that the estimated snow cover area on land use maps and spatially distributed climatic information can improve the prediction of sediment yields when using data-based models.
It was also concluded that predictions of the peak values of sediment yields by means of the ANN, ANFIS-FCM, and MARS models are relatively closer to the values of the observed sediments than when using the SRC, ANFIS-GP, and ANFIS-SC models. The ANFIS-FCM, ANN, and MARS models predicted the sediment with relative accuracies of 81.31%, 80.17%, and 80.16%, respectively, against the peak values of the observed time series. Overall, the ANFIS-FCM model was found to be more successful than the other models for predicting the peak values of sediments in the Gilgit basin.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4441/12/5/1481/s1, Figure S1: Schematic diagram of the ANN model for prediction of sediment yields with one hidden layer, Figure S2: Schematic diagram of the ANFIS model for prediction of sediment yields with two inputs., Table S1: Summary of the reviewed publications of data-based models sorted by year and input variables, Table S2: Characteristics of the Gilgit River basin in the Upper Indus River.
Author Contributions: W.U.H. designed the work, highlighted the problem, formulated the work plan, analyzed the data and write up the paper. M.K.S. assisted in improving the work methodology, improving the write up and reviewing the paper. F.S. and F.N. assisted with correction of the methodology and practical knowledge. All authors have read and agreed to the published version of the manuscript.