Using Machine Learning-Based Algorithms to Analyze Erosion Rates of a Watershed in Northern Taiwan

: This study continues a previous study with further analysis of watershed-scale erosion pin measurements. Three machine learning (ML) algorithms—Support Vector Machine (SVM), Adaptive Neuro-Fuzzy Inference System (ANFIS)


Background and Introduction
Soil erosion is of major concern to agriculture and has had a detrimental long-term effect on both soil productivity and the sustainability of agriculture in particular. Soil erosion can lead to water pollution, increased flooding, and sedimentation, which damage the environment [1]. This has influenced the introduction of erosion control practices and policies as a necessity in almost every country of the world and under virtually every type of land use. Soil erosion causes both on-site and off-site consequences [2]. The on-site effects, such as soil losses from a field and depleted organic matter or nutrients of the soil, are particularly relevant on agricultural land. Off-site, downstream study aims to extend the investigation to the application of other ML techniques. Our main goal is to further our understanding of soil erosion rates in the study area.

Dataset and Research Method
The research design involves the use of site-specific data collected in the study area. They are described in the sections below.

Area of Study
The Shihmen reservoir dam is situated in the northern part of Taiwan on the banks of the Tahan River. Its watershed has an area of approximately 759.53 km² (Figure 1), and elevation rises towards the south. Hills extend over most of the area, and the slope exceeds 55% over more than 60% of the watershed [28]. The yearly average temperature is 19 °C, and the average humidity is 82%. The typical rainy season is from May to October, and the dry season is from November to April. The average annual precipitation is approximately 2500 mm/yr [28].

Data Preparation
We collected various data about the locations of the installed erosion pins as well as the measurements of the erosion pins themselves, as described in the following sections.

Predictors
A total of 14 environmental factors were used as the predictors (independent factors or input variables) in the models, namely distance to river, distance to road, type of slope, sub-watershed, slope direction, elevation, slope class, rainfall, epoch, lithology, and the amounts of organic content, clay, sand, and silt in the soil. These factors were gathered from different sources and compiled into a geospatial database, as described in Nguyen et al. [27].

Target
The specification and installation of the erosion pins were described in Lin et al. [12]. Within the boundary of the Shihmen reservoir watershed, a total of 550 pins were installed on 55 slopes (10 pins per slope). The measurement data were collected from 8 September 2008 to 10 October 2011. The annual erosion depths were averaged at each slope, and the resulting values range from 2.17 to 13.03 mm/yr. The metal rods (pins) used in this analysis were mounted on slopes without any signs of a landslide, collapse, or gully erosion. Therefore, our findings cannot be generalized beyond sheet and rill erosion.

Model Configuration
In this study, ML algorithms were used to predict the erosion rates of sheet erosion and rill erosion. The overall framework of the study consisted of three main parts (Figure 2), as summarized below.

First, the entire dataset, which includes 14 predictors and one target, was divided into a training set (70%, or 38 samples) and a test set (30%, or 17 samples), as is commonly done in the literature [29][30][31][32]. This step was repeated three times (Grouping #1, Grouping #2, and Grouping #3) to reduce the sensitivity of the results to sampling variability. Because stratified random sampling has been shown to produce better outcomes than simple random sampling [27], only stratified random sampling was used in this study to ensure proper representation of the population. In stratified random sampling, the dataset is divided into several strata, and each stratum is sampled proportionately (70/30). Sub-watershed was selected as the stratification variable because the erosion depths in different sub-watersheds have been shown to be statistically different [33]. Hence, stratified random sampling with sub-watershed as the stratification variable can provide a better estimate of the sample statistics and therefore a better ML model.
Second, the ANN, ANFIS, and SVM models were built using the 70% training data, and the resulting models were tested using the test data.
Third, the performance metrics R², NSE (Nash-Sutcliffe Efficiency), RMSE (root mean square error), and MAE (mean absolute error) were calculated on the training and test data. The Wilcoxon signed-rank test was conducted to determine whether statistically significant differences exist between the three models. Finally, the results of ANN, ANFIS, and SVM were tabulated and compared with the results from our previous research [27].
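The proportionate 70/30 stratified split described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function name `stratified_split`, the five made-up sub-watershed labels, and the seeds are all assumptions introduced here. Because each stratum is rounded separately, the totals need not equal the 38/17 split reported above.

```python
import random
from collections import defaultdict


def stratified_split(samples, stratum_of, train_fraction=0.7, seed=0):
    """Split samples into train/test lists, sampling each stratum proportionately."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for sample in samples:
        strata[stratum_of(sample)].append(sample)

    train, test = [], []
    for key in sorted(strata):
        members = strata[key][:]
        rng.shuffle(members)  # random order within the stratum
        n_train = round(train_fraction * len(members))
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return train, test


# Hypothetical data: 55 slopes spread evenly over 5 made-up sub-watersheds.
slopes = [{"slope_id": i, "sub_watershed": f"SW{i % 5}"} for i in range(55)]

# Three independent groupings, mirroring Grouping #1 to #3 (seeds are arbitrary).
groupings = [
    stratified_split(slopes, lambda s: s["sub_watershed"], seed=k)
    for k in (1, 2, 3)
]
```

Each element of `groupings` is a `(train, test)` pair in which every sub-watershed contributes roughly 70% of its slopes to the training set.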

Machine Learning Algorithms
Three widely used and potentially applicable ML algorithms are used in this study. They are described below.

Artificial Neural Network
An Artificial Neural Network (ANN) mimics how the human brain processes information. The purpose of the ANN model is to predict a target outcome from input data through a back-propagation learning algorithm [34]. A typical ANN model has a multi-layer feed-forward structure of connected nodes with three main layers, namely the input layer, the hidden layer(s), and the output layer. The ANN determines the weight for each node and builds its results through training. In this study, the ANN model was created using the 'nntool' in the MATLAB 2016 software. A three-layer feed-forward back-propagation network was used. It consists of an input layer (14 neurons representing the 14 environmental factors), one hidden layer (29 neurons), and one output layer (erosion rate), as shown in Figure 3. The number of neurons in the hidden layer is determined based on the following equation [35]:

$$N = 2x + 1 \qquad (1)$$

where N is the number of hidden nodes, and x is the number of input nodes. With x = 14 input nodes, Equation (1) gives N = 29 hidden nodes.
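To make the 14-29-1 architecture concrete, the following sketch builds a network of that shape and runs a single forward pass. It is only an illustration under stated assumptions: the sizing rule N = 2x + 1 is taken as the rule that reproduces 29 hidden neurons from 14 inputs, the tanh hidden activation and the random, untrained weights are choices made here, and no back-propagation training is shown. It is not the MATLAB 'nntool' model used in the study.

```python
import math
import random


def hidden_size(n_inputs):
    # Assumed sizing rule N = 2x + 1 (14 inputs -> 29 hidden neurons).
    return 2 * n_inputs + 1


def init_network(n_inputs, n_hidden, seed=0):
    """Create random (untrained) weights for an n_inputs -> n_hidden -> 1 network."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def forward(x, network):
    """Feed-forward pass: tanh hidden layer (an assumption), linear output."""
    w_hidden, b_hidden, w_out, b_out = network
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


n_in = 14                      # one input neuron per environmental factor
n_hid = hidden_size(n_in)      # 29 hidden neurons
net = init_network(n_in, n_hid)
y = forward([0.1] * n_in, net)  # hypothetical, already-scaled input vector
```

Training would adjust `w_hidden`, `b_hidden`, `w_out`, and `b_out` by back-propagating the prediction error; that step is omitted here.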

Adaptive Neuro-Fuzzy Inference System
An Adaptive Neuro-Fuzzy Inference System (ANFIS) is a combination of ANN and fuzzy logic, which utilizes the strengths of both techniques. Jang [14] introduced the concept of ANFIS in 1993. ANFIS contains five layers connected by directed links. These five layers are the Fuzzification layer, the Product layer, the Normalized layer, the Defuzzification layer, and the Output layer. The main purpose of ANFIS is to define the optimum parameter values of an equivalent fuzzy inference system by applying a learning algorithm. ANFIS can be constructed using two different methods, namely Genfis 1 (grid partitioning) and Genfis 2 (subtractive clustering). Genfis 1 has a limitation of six input variables. Since we have 14 factors, we used Genfis 2 in our analysis instead of Genfis 1. In this research, the ANFIS model was constructed using the "anfisedit" tool in the MATLAB 2016 software following the steps of loading data, generating FIS (sub clustering), and testing.
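The five layers listed above can be traced in a short forward-pass sketch. It assumes a first-order Sugeno-type system with Gaussian membership functions and one rule per cluster; the rule parameters below are invented for illustration, and the learning step that tunes them is not shown. The names `gaussian_mf` and `anfis_forward` are hypothetical and do not correspond to any MATLAB function.

```python
import math


def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of x (an assumed membership function shape)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(x, rules):
    """Evaluate a first-order Sugeno fuzzy system layer by layer.

    Each rule is a dict with 'centers' and 'sigmas' (one per input),
    'coeffs' (one per input), and 'bias' for its linear consequent.
    """
    # Layer 1 (fuzzification) and Layer 2 (product): firing strength per rule.
    strengths = []
    for rule in rules:
        memberships = [
            gaussian_mf(xi, c, s)
            for xi, c, s in zip(x, rule["centers"], rule["sigmas"])
        ]
        strengths.append(math.prod(memberships))

    # Layer 3 (normalization): firing strengths scaled to sum to one.
    total = sum(strengths)
    normalized = [w / total for w in strengths]

    # Layer 4 (defuzzification): normalized strength times linear consequent.
    weighted = [
        nw * (sum(p * xi for p, xi in zip(rule["coeffs"], x)) + rule["bias"])
        for nw, rule in zip(normalized, rules)
    ]

    # Layer 5 (output): sum of all rule contributions.
    return sum(weighted)


# Two hypothetical rules on two inputs, with constant consequents.
rules = [
    {"centers": [0.0, 0.0], "sigmas": [1.0, 1.0], "coeffs": [0.0, 0.0], "bias": 2.0},
    {"centers": [10.0, 10.0], "sigmas": [1.0, 1.0], "coeffs": [0.0, 0.0], "bias": 12.0},
]
near_first_rule = anfis_forward([0.0, 0.0], rules)
midway = anfis_forward([5.0, 5.0], rules)
```

An input close to the first rule's center returns almost exactly that rule's consequent, while an input midway between the two centers returns their average. With one rule per training cluster, such a system can reproduce the training targets very closely, which is relevant to the over-fitting discussed in the results.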

Support Vector Machine
A Support Vector Machine (SVM) is a supervised learning model developed by Schölkopf et al. [36] that can be used for regression and classification. In its classification form, the SVM algorithm creates a line or a hyperplane that divides a dataset into two classes. The distance from the hyperplane to the nearest data points on both sides is defined as the margin. The purpose is to select the hyperplane with the greatest possible margin, thus giving a greater chance of new data being classified correctly into these two classes. In this study, the SVM model was implemented using custom code and the 'fitrsvm' function in the MATLAB 2016 software.
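The notion of margin used above can be illustrated with a few lines of code. The hyperplane coefficients and the three points are made up for the example, and `signed_distance` is a hypothetical helper; the sketch only measures the margin of a given hyperplane and does not perform the optimization that an SVM solver carries out.

```python
import math


def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0 and three sample points.
w, b = [3.0, 4.0], -5.0
points = [(3.0, 4.0), (1.0, 1.0), (0.0, 0.0)]

distances = [signed_distance(p, w, b) for p in points]
# The sign tells which side of the hyperplane a point falls on.
sides = [1 if d > 0 else -1 for d in distances]
# The margin is the distance to the nearest point, regardless of side.
margin = min(abs(d) for d in distances)
```

Among all hyperplanes that separate the two classes, the SVM prefers the one for which this `margin` value is largest.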

Evaluation Criteria of Model Performance
Model evaluation is an indispensable part of developing a useful model. It supports the discovery and selection of a good model that can be used in the future. Several statistical indices are commonly used to estimate the performance and the validity of ML algorithms. Here, the statistical parameters employed to measure the errors between the predicted and the observed values are R², NSE, RMSE, and MAE [37][38][39].
The R² value indicates how consistently the predicted values versus the measured values follow a regression line [27]. It ranges from zero to one. If the value is equal to one, the predictive model is considered "perfect." The definition of R² is as follows:

$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n} (Y_i - Y_{1,i})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \qquad (2)$$

where SST represents the total sum of squares, SSE is the error sum of squares, $Y$ is the prediction of the model, $Y_1$ is the prediction of the regression line, and $\bar{Y}$ is the average of the predicted values (Figure 4). It is worth noting that although R² has been widely used for model evaluation, the statistic is highly sensitive to extreme values and is insensitive to additive and proportional differences between model and measured data [40]. More importantly, R² is calculated against the regression line (Figure 4), not the 1:1 line. A high R² only means a good fit to the regression line, not necessarily a set of good predictions with respect to the observations (see the distinction made between the regression line and the 1:1 line in Figure 4). Therefore, we retain R² in this study only for completeness. Instead of relying on R², we computed RMSE, MAE, and NSE, which are more appropriate, to compare the model performance.
RMSE and MAE are 'dimensioned' statistical parameters. They express the average errors of the model in the unit of the output variable (Equations (3) and (4)). The RMSE is of particular importance because it is one of the most commonly reported parameters in the climatic and environmental literature [41]. Smaller values of RMSE and MAE indicate that the models approximate the observed values more closely. The RMSE and MAE are widely used basic metrics for assessing the performance of predictive models [42]. They are defined as follows:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_i - Y_{2,i})^2} \qquad (3)$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - Y_{2,i} \right| \qquad (4)$$

where $Y_2$ is the predicted value of the 1:1 line, and n is the number of samples.
Sustainability 2020, 12, 2022
In addition to R², RMSE, and MAE, all of which were used in our previous study [27], an additional parameter (NSE) is included in this study. As shown in Equation (5), NSE is a normalized statistical parameter that defines the relative magnitude of the residual variance compared to the measured data variance [43]. It shows how well the plot of the predicted values versus the observed values fits the 1:1 line. The value of NSE ranges from −∞ to one, with NSE = 1 being the optimal value. If the value of NSE is between zero and one, the model is said to have acceptable performance (the higher, the better). On the other hand, if the value of NSE is smaller than zero, the average of the observed values is a better predictor than the model, and the model performance is not acceptable [43].

$$NSE = 1 - \frac{\sum_{i=1}^{n} (X_i - Y_i)^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \qquad (5)$$

where $X$ is the observed value and $\bar{X}$ is the average of the observed values.
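The difference between R² and the 1:1-line metrics can be checked numerically. The sketch below is illustrative only: the four observed and predicted values are invented, RMSE, MAE, and NSE are computed directly against the observations (the 1:1 line), and R² is computed as the squared Pearson correlation, which equals the R² of a least-squares regression line between predictions and observations.

```python
import math


def rmse(observed, predicted):
    """Root mean square error against the 1:1 line."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error against the 1:1 line."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def nse(observed, predicted):
    """Nash-Sutcliffe Efficiency: 1 - residual variance / observed variance."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def r_squared(observed, predicted):
    """R^2 of the least-squares regression line (squared Pearson correlation)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical erosion rates (mm/yr): every prediction is 2 mm/yr too high.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [4.0, 6.0, 8.0, 10.0]

scores = {
    "R2": r_squared(observed, predicted),
    "NSE": nse(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
}
```

In this made-up case the constant bias leaves R² at one, while NSE drops to about 0.2 and both RMSE and MAE equal 2 mm/yr, which is the insensitivity to additive differences noted above.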

Wilcoxon Signed-Rank Test
In addition to using NSE, RMSE, and MAE to evaluate the effectiveness of the models, it is necessary to examine whether the differences in NSE, RMSE, and MAE are statistically significant. In this study, we used the Wilcoxon signed-rank test to compare the errors generated by different predictive models because the test is non-parametric (distribution-free). The steps of the Wilcoxon signed-rank test are as follows. Let $Y$ denote the observed value, $M_1$ the value predicted by the first model, and $M_2$ the value predicted by the second model. The absolute errors of the two models, $E_1$ and $E_2$, are given (Equations (6) and (7)) by:

$$E_1 = \left| Y - M_1 \right| \qquad (6)$$

$$E_2 = \left| Y - M_2 \right| \qquad (7)$$

To determine whether one model predicts more accurately than another, we perform a one-tailed hypothesis test. The null hypothesis ($H_0$) is that "the two models have the same predictive error" ($E_1 = E_2$). The alternative hypothesis ($H_a$) is that "the first model has a smaller error than the second model" ($E_1 < E_2$). We chose a significance level of 0.05, and the decision whether to reject the null hypothesis is based on the resulting p-value. If the p-value is greater than 0.05, we fail to reject the null hypothesis. Otherwise, if the p-value is less than 0.05, the null hypothesis is rejected at a confidence level of 95%. In this study, the Wilcoxon signed-rank test was run using the R function wilcox.test.
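The one-tailed test described above can be sketched from first principles. The code below is not the R routine used in the study: it is a small exact version that enumerates all sign assignments, so it is only practical for small samples, and the two lists of absolute errors are invented for illustration. The function names `average_ranks` and `wilcoxon_less` are hypothetical.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_less(e1, e2):
    """Exact one-tailed signed-rank test of H0: E1 = E2 against Ha: E1 < E2.

    Returns (W+, p-value), where W+ is the rank sum of positive differences.
    """
    diffs = [a - b for a, b in zip(e1, e2) if a != b]  # zero differences dropped
    if len(diffs) > 20:
        raise ValueError("exact enumeration is only intended for small samples")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Under H0 every difference is positive or negative with probability 1/2,
    # so the p-value is the share of sign patterns with a rank sum <= W+.
    count = 0
    total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        total += 1
        if sum(r for r, s in zip(ranks, signs) if s) <= w_plus:
            count += 1
    return w_plus, count / total


# Hypothetical absolute errors (mm/yr) of two models on the same six slopes.
e_model_1 = [0.2, 0.5, 0.1, 0.9, 0.3, 0.4]
e_model_2 = [1.0, 0.9, 0.7, 0.8, 1.5, 1.1]
w_plus, p_value = wilcoxon_less(e_model_1, e_model_2)
reject_h0 = p_value < 0.05
```

For these made-up errors only one difference is positive and it has the smallest rank, so the p-value is 2/64 and the null hypothesis would be rejected at the 0.05 level.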

Results and Discussions
We conducted all analyses using the dataset described in Section 2. Our findings are presented in the following sections.

Evaluation of Predictive Models
This study used quantitative criteria to assess the performance of the predictive models. Table 1 shows the computed R², NSE, RMSE, and MAE statistics for ANFIS, ANN, and SVM under three different stratified random samplings. Among the four parameters, R² and NSE are statistics used to evaluate the predictive performance of the models under consideration. The higher the R² and NSE, the better the models simulate the results. However, as shown in Figure 4, R² is a measure of the goodness-of-fit of the regression line, not the 1:1 line. Therefore, a high R² does not necessarily mean that the model predicts the soil erosion rate well. In this regard, NSE is a much more suitable index because it is based on the differences between the predicted and observed values (1:1 line). Therefore, in the following, we restrict our discussion to NSE and ignore R².
On the other hand, MAE and RMSE are evaluation metrics commonly used to gauge the performance of models that output continuous numbers. Both RMSE and MAE express the average errors of the models in the units of the model output variable. However, it is worth noting that RMSE and MAE could be calculated against either the 1:1 line or the regression line [27], and different results could be obtained. In this study, we computed both RMSE and MAE based on the 1:1 line to reflect the differences between the predicted and observed values.
As can be seen from Table 1, the average NSE of the training data for the ANN, ANFIS, and SVM models is 0.62, 1.00, and 0.51, respectively. Theoretically, NSE ranges from −∞ to one, and the higher the NSE, the better the model performs. Thus, on the training data, ANFIS is the best model, with a perfect NSE of 1.00, and it outperforms the other two models substantially. However, the average NSE deteriorates markedly when the test data are used to judge the true performance of the models: it is 0.32, 0.49, and 0.17 for the ANN, ANFIS, and SVM models, respectively. The ANFIS model remains the best-performing model, and the ANN model still beats the SVM model. It is worth noting that a negative NSE was obtained for Grouping #2 of SVM. This means that, for this grouping, the model did not improve the prediction, and the average of the observed values is a better predictor than the model [44].
According to Table 1, the RMSE values also differ considerably across the three models. The average RMSE of the training data for the ANN, ANFIS, and SVM models is 1.23, 0.01, and 1.43 mm/yr, respectively. Again, the ANFIS model outperforms the other two models by a substantial margin. Its predictions on the training data were so accurate that there was almost no difference between the predicted and the observed values. However, the model performance falls sharply when the test data are used. The average RMSE of the test data for the ANN, ANFIS, and SVM models increases to 2.36 mm/yr, 2.05 mm/yr, and 2.61 mm/yr, respectively. The disparity in model performance between the training data and the test data indicates that over-fitting has occurred in the training of the ANFIS model. The other two models do not show signs of overtraining, but their predictive performances are inferior to that of the ANFIS model.
Lastly, a comparison of the MAE results of the three models leads to a largely similar conclusion. The average MAE of the training data for the ANN, ANFIS, and SVM models is 0.75 mm/yr, 0.01 mm/yr, and 0.99 mm/yr, respectively. The ANFIS model performs much better than the ANN and SVM models. The average MAE of the test data for the ANN, ANFIS, and SVM models increases to 1.85, 1.67, and 2.14 mm/yr, respectively. The errors of the test data are larger than the errors of the training data, which again shows that the ANFIS model has been over-fitted. Overall, the general picture emerging from the analysis of NSE, RMSE, and MAE is that the models rank as follows, from best to worst: ANFIS, ANN, SVM.

Visual Comparison of Models
The data in Table 1 provide convincing evidence that ANFIS is the best-performing model. We further plotted the predicted values versus the observed values, together with the regression line and the 1:1 line, in the scatter plots of Figure 5. The left-side panels show the training datasets, while the right-side panels show the test datasets. Three different sampling results (groupings) are presented in Figure 5. The first row is Grouping #1 (Figure 5a,b), the second row is Grouping #2 (Figure 5c,d), and the bottom row is Grouping #3 (Figure 5e,f). Figure 5 shows that during the training stage, ANFIS successfully tunes its parameters to minimize the errors associated with its predictions. Therefore, the regression line of ANFIS (red) almost coincides with the 1:1 line, which results in an average NSE of 1.00. The regression line of ANN (blue) is also very close to the 1:1 line in all three cases. However, the ANN data points (blue) are much more scattered around the 1:1 line than those of ANFIS (red). Therefore, ANN has a much lower average NSE of 0.62.
As for the test data on the right-hand side of Figure 5, all three models show substantially more scatter than for the training data, which indicates substantially larger errors between the predictions and the observations. The regression line of ANFIS is no longer the closest to the 1:1 line. However, the ANFIS model still has the highest average NSE (0.49) and the lowest average RMSE (2.05 mm/yr). It appears that ANFIS gains its performance advantage from its hybrid learning approach, which combines the ANN and the Fuzzy Inference System.

Results of Wilcoxon Signed-Rank Test
This study was undertaken to determine the best ML model using NSE, RMSE, and MAE. To determine whether the differences between the model predictions were statistically significant, we used the Wilcoxon signed-rank test to compare the errors of the different models. With the absolute errors of the models denoted by $E_{\mathrm{ANFIS}}$, $E_{\mathrm{ANN}}$, and $E_{\mathrm{SVM}}$, respectively, the results of the Wilcoxon signed-rank test on the training data for each combination of the three models are summarized in Table 2. The results for the test data are shown in Table 3.
It can be seen from Table 2 that for the training data, the p-values between ANFIS and ANN are all very small and less than the threshold value of 0.05. The same can be said for the p-values between ANFIS and SVM. Taken together, the data presented here provide evidence that ANFIS was a statistically better model than both ANN and SVM when the models were trained with the training data. By contrast, the p-values between ANN and SVM are not small enough to reject the null hypothesis. Therefore, it is inconclusive whether a statistical difference exists in the predictive performance between the two models.
In terms of the test data, however, no statistically significant difference exists between the three models. As shown in Table 3, the p-values are all higher than the threshold value of 0.05. Therefore, we cannot reject the null hypothesis in favor of the alternative hypothesis.
Table 3. The results of the Wilcoxon signed-rank test (test data).

Comparison with Other ML Algorithms
In our previous research, DT, RF, and MR were used to predict the depths of erosion at the Shihmen reservoir watershed study site [27]. The results showed that RF outperformed the other ML algorithms. In this study, we further compared the performance of ANN, ANFIS, and SVM in predicting soil erosion depths (rates). The results of all six models are shown together in Figure 6.
Figure 6. Radar plots of six ML models: (a) training data; (b) test data.
Figure 6 compares the six ML models in terms of the average RMSE (green points) and the average MAE (red points). Figure 6a is based on the training data, and Figure 6b is based on the test data. The closer a point is to the center of the radar web, the better the predictive performance of the model. It can be seen from Figure 6a that in terms of RMSE, from the best to the worst, the six models can be ranked as follows: ANFIS (0.01 mm/yr) < RF (0.93 mm/yr) < ANN (1.23 mm/yr) < MR (1.25 mm/yr) < SVM (1.43 mm/yr) < DT (1.73 mm/yr). ANFIS is clearly the best model on the training data. However, because models can perform very well during training but poorly when tested against new data (over-fitting), it is the results on the test data that distinguish a good model from a poor one. To evaluate model performance, we should therefore focus on the metrics of the test data. It can be seen from Figure 6 that although ANFIS is the best of the three ML models compared in this study, it is still not quite as good as RF from the previous study. If we rank the six ML models again using the average RMSE of the test data, we obtain a different ranking (from the best to the worst): RF (1.75 mm/yr) < ANFIS (2.05 mm/yr) < ANN (2.36 mm/yr) < DT (2.45 mm/yr) < SVM (2.61 mm/yr) < MR (3.47 mm/yr).
In other words, RF replaces ANFIS and becomes the favored choice. Similar conclusions can be drawn from the MAE values, also shown in Figure 6.
The results on the test data are critical and are emphasized here because evaluating the predictive performance of models on training data alone could lead to over-optimistic assessments and over-fitted models. As a result, although ANFIS performs better in training, RF is still considered the best model for predicting the soil erosion rate in the study area. Hannan et al. [45] and Barenboim et al. [46] also compared the performance of RF and ANFIS in their studies. Both studies indicated that the ANFIS and RF models were effective; however, Hannan et al. [45] preferred RF to ANFIS and ANN, and Barenboim et al. [46] recommended both RF and ANFIS.
Figure 7 shows the interpolated distribution of erosion rates (mm/yr) in the study watershed using the Inverse Distance Weighting (IDW) method. Figure 7a was obtained from the erosion pin measurements, whereas Figures 7b and 7c were predicted by ANFIS and RF, respectively. The three maps show the same distribution pattern: the erosion rate is the highest on the east side of the study area, the lowest on the west side, and in between on the north and south sides.
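For readers unfamiliar with IDW, the interpolation behind maps such as Figure 7 can be sketched as below. This is a generic illustration, not the configuration used to produce the figure: the power parameter of 2, the coordinates, and the two erosion rates are assumptions made for the example, and the helper name `idw` is hypothetical.

```python
import math


def idw(x, y, points, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    points is a list of (px, py, value) tuples; power=2 is an assumed default.
    """
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            return value  # exactly on a measured location
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two hypothetical slopes with measured erosion rates (mm/yr).
measurements = [(0.0, 0.0, 4.0), (2.0, 0.0, 8.0)]

halfway = idw(1.0, 0.0, measurements)   # equally far from both slopes
at_site = idw(0.0, 0.0, measurements)   # coincides with the first slope
closer_to_second = idw(1.5, 0.0, measurements)
```

Nearby measurements dominate the estimate, so the interpolated surface reproduces each measured or predicted value at its own location and blends smoothly between neighboring slopes.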

Conclusions
In this study, the ANN, ANFIS, and SVM algorithms were used to create predictive models of soil erosion rates in the study area of the Shihmen reservoir. The soil erosion rates were measured by 550 erosion pins installed on 55 slopes, and the measurements reflect the sheet and rill erosion that took place within the study area. After the dataset was divided by a 70/30 ratio into training and test datasets using stratified random sampling, ANN, ANFIS, and SVM were used to generate the respective models based on the 14 types of factors included in the training data. The models were then applied to the test data, and the discrepancies from the real measurements were evaluated by R², NSE, RMSE, and MAE.
Without making an ex-ante choice of soil erosion model, the ex-post outcomes of the ML models were quite satisfactory. The average RMSE of the training data ranges from a mere 0.01 to 1.43 mm/yr. Among the three models, ANFIS performs considerably better than ANN and SVM, as indicated by its RMSE of 0.01 mm/yr. However, the performance of all three models degraded when they were applied to the test data. The average RMSE of the test data varies from 2.05 to 2.61 mm/yr, with ANFIS still the best among the three models. To examine whether the differences in prediction are statistically significant, the Wilcoxon signed-rank test was used to conduct pairwise comparisons of the three models. The results indicate that the ANFIS model is better than both the ANN and SVM models for the training data. However, no statistically significant difference exists between the three models when they are applied to the test data. Moreover, the advantage of ANFIS disappeared when it was compared with the ML models (DT, RF, and MR) developed in our previous study. Although the average RMSE of ANFIS on the training data is still unmatched, the average RMSE of ANFIS on the test data was worse than that of RF. This shows that ANFIS may have been over-trained, and RF is still considered the best model for predicting the soil erosion rate in the study area.
In this and previous studies, we have made substantial progress in applying ML algorithms to the prediction of soil erosion rates without resorting to any soil erosion models. Nevertheless, many other ML algorithms remain untested that may yield better results than those obtained to date. It remains to be seen whether ML algorithms are truly viable alternatives to traditional soil erosion models. Future research will have to address this issue in more detail.
Finally, because of the easy installation and wide availability of erosion pins, we believe that the approach presented here is generally applicable to other regions of the world. It would be desirable to obtain such measurements and carry out similar analyses for comparison.
