Soil Erosion Prediction Based on Moth-Flame Optimizer-Evolved Kernel Extreme Learning Machine

Soil erosion control is a complex, integrated management process. It is built on unified planning that adjusts the land use structure and reasonably configures engineering, vegetation, and farming measures to form a complete erosion control system, while respecting the laws of soil erosion, economic and social development, and ecological and environmental security. Accurate, quantitative prediction of soil erosion is a critical reference for comprehensive erosion control. This paper applies a new swarm intelligence optimization algorithm, an enhanced moth-flame optimizer with sine-cosine mechanisms (SMFO), to the soil erosion classification and prediction problem. The sine-cosine strategy is used to improve the exploration and exploitation capability of the optimizer, which in turn tunes the penalty parameter and the kernel parameter of the kernel extreme learning machine (KELM) for the rainfall-induced soil erosion classification problem, yielding more-accurate classification and prediction results. A dataset from the Son La province of Vietnam was used for model evaluation and testing, and the experimental results show that the SMFO-KELM method predicts accurately, with significant advantages in classification accuracy (ACC), Matthews correlation coefficient (MCC), sensitivity, and specificity. Compared with other optimizer-based models, the adopted method is better suited to the accurate classification of soil erosion, and can provide new solutions for natural soil supply capacity analysis, integrated erosion management, and environmental sustainability assessment.


Introduction
With over 98.8% of the world's human food coming from the land and less than 1.2% from marine and aquatic ecosystems, protecting arable land and maintaining soil fertility are vital to human well-being [1,2]. Soil erosion is one of the most critical threats to the world's food production [3][4][5]. Globally, about ten million hectares of arable land are lost each year to soil erosion, leaving less arable land available for world food production [6]. The loss of arable land is a serious problem; according to reports of the World Health Organization and the Food and Agriculture Organization of the United Nations (FAO), two-thirds of the global population is still undernourished [7].
The basic MFO is prone to falling into local or deceptive optima (LO) during successive iterations. Many researchers have recently worked on adding improved mechanisms to the MFO algorithm to address these problems. This paper uses a novel SMFO [94] algorithm that introduces a sine-cosine strategy into MFO, further improving its search capability. The exploratory and exploitative behavior of the method and its convergence pattern are significantly improved, as validated on engineering optimization problems.
The framework of this paper is shown schematically in Figure 1 below. This paper uses the SMFO-KELM approach, choosing a dataset of 236 samples from the Son La province of Vietnam and constructing a prediction and validation model with ten explanatory factors as features. Firstly, we use five-fold cross-validation to optimize the parameter settings of the KELM; secondly, we use ten-fold cross-validation to evaluate the soil erosion classification predictions; and finally, we compare six original algorithm models, such as BA-KELM, and four improved algorithm models, such as CLOFOA-KELM. The experimental results show that the adopted SMFO-KELM method obtains much better soil erosion classification results. In summary, the main contributions of this paper are as follows:
1. A new and improved swarm intelligence optimization algorithm, SMFO, combined with the machine learning model KELM, is proposed.
2. SMFO is applied for the first time to optimize and determine the key parameters of KELM.
3. SMFO-KELM is applied to a soil erosion classification prediction model for the first time.
4. The soil erosion classification results of SMFO-KELM are significantly better than those of other algorithms in four aspects: accuracy, Matthews correlation coefficient, sensitivity, and specificity.
The remainder of this paper is structured as follows: Section 2 presents the materials and methods, mainly including SMFO, KELM, and the dataset. Section 3 presents the experimental results and evaluation indicators. Section 4 gives the discussion and outlook.


Dataset
This paper uses a soil dataset from two limited experimental areas eroded by heavy rainfall in Son La Province, in north-western Vietnam, collected over the three years from 2009 to 2011 [26] and shown in Appendix A Table A1. The area has a tropical monsoon climate with higher levels of soil erosion from heavy rainfall than at other latitudes, so the chosen experimental site provides clearly contrasting data. Plots of 4 m × 18 m were selected as unit plots, and there are 24 such plots; the shape of the plots was chosen randomly without restriction. To ensure the accuracy of data acquisition, the explanatory factor data were acquired using surface runoff and subsurface water collection pipes, OC measurements (with the carbonate component removed) followed by a C/N analyzer, transect methods for coverage, and interpolation techniques for residues, respectively. Cultivation methods, fertilizer application, and soil conservation measures all followed the traditional farming practices of local farmers. Based on a multi-model reference for soil erosion prediction and the experimentally obtained data, the following ten explanatory variables were identified as explanatory factors for soil erosion classification, as shown in Table 1 below.
EI30 is the extended peak rate of detachment and runoff over 30 min. The storm energy E is given as a function of i, the maximum 30-min intensity. The product of E and I30 gives the dynamic rainfall energy (EI), a combination of the total and peak intensities of each storm; this number represents the combined particle detachment and transport capability. Slope degree is the degree of slope in the terrain, i.e., the length and gradient of the slope. It is a key factor in soil erosion: the steeper and longer the slope, the more runoff accumulates and the greater the probability of soil erosion.
These data were collected using a Nikon Forestry 550 inclinometer to measure the slope of the plots.
Soil erodibility is also affected by permeability, structure, organic materials, and pH value. Two simple soil characteristics, OC (organic carbon) and pH, were used as interpretive parameters for soil erodibility. OC was obtained with a C/N analyzer (after HCl treatment to remove carbonates), and pH was measured with a glass electrode using a water-to-soil ratio of 2.5:1.
Bulk density, topsoil porosity, topsoil texture (silt fraction, clay fraction, sand fraction), and soil cover rate are important influencing factors normally used by traditional models [95].
In the model prediction of soil erodibility, we ultimately aim to classify the samples (each corresponding to a vector of explanatory variable values) into two categories: the "erosion category" or the "non-erosion category". To ensure the accuracy of the experimental classification, we used the same criteria for soil loss as in [96]: samples that lost more than 3 tons of soil per hectare were labeled as the "erosion category", and the rest as the "non-erosion category". The experimental data consisted of 236 data samples, with 50% erosion and 50% non-erosion.
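As a minimal illustration of this labeling rule (the function and variable names here are ours for illustration, not from the original study), the 3 t/ha threshold can be coded as:

```python
def label_erosion(soil_loss_t_per_ha, threshold=3.0):
    # 1 = "erosion category" (loss strictly above 3 t/ha), 0 = "non-erosion category"
    return 1 if soil_loss_t_per_ha > threshold else 0

# Example: four hypothetical plot measurements (t/ha)
labels = [label_erosion(x) for x in [0.5, 2.9, 3.1, 7.8]]
```

A sample losing exactly 3 t/ha falls in the non-erosion category under this strict-inequality reading of the criterion.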

KELM
The kernel function-based extreme learning machine (KELM) combines the extreme learning machine (ELM) algorithm with a kernel function, replacing the feature mapping of the hidden layer in ELM with a kernel function. Because of the kernel function, the data features are mapped into a higher-dimensional space and can therefore be separated more precisely.
ELM is a fast learning algorithm for single hidden-layer feedforward neural networks that randomly initializes the input weights and biases and solves for the corresponding output weights. Assume there are N arbitrary distinct samples (X_j, t_j), where X_j = [x_{j1}, x_{j2}, ..., x_{jn}]^T ∈ R^n and t_j = [t_{j1}, t_{j2}, ..., t_{jm}]^T ∈ R^m. A single hidden-layer neural network with L hidden nodes can be represented as

\sum_{i=1}^{L} \beta_i g(W_i \cdot X_j + b_i) = o_j, \quad j = 1, ..., N,

where g(x) is the activation function, W_i = [w_{i,1}, w_{i,2}, ..., w_{i,n}]^T is the input weight vector, β_i is the output weight, and b_i is the bias of the i-th hidden-layer unit; W_i · X_j denotes the inner product of W_i and X_j. The goal of single hidden-layer network learning is minimal output error, i.e., \sum_{j=1}^{N} \| o_j - t_j \| = 0, meaning that β_i, W_i, and b_i exist such that

\sum_{i=1}^{L} \beta_i g(W_i \cdot X_j + b_i) = t_j, \quad j = 1, ..., N.

In matrix form, this is Hβ = T, where H is the hidden-layer output matrix, β is the output weight matrix, and T is the desired output.
Some traditional gradient descent-based algorithms can solve such problems, but basic gradient-based learning requires all parameters to be adjusted during the iterative process. In the ELM algorithm, once the input weights W_i and the hidden-layer biases b_i are determined randomly, the hidden-layer output matrix H is determined uniquely. Training a single hidden-layer neural network thus reduces to solving the linear system Hβ = T, and the output weights β can be determined as

\beta = H^{+} T,

where H^{+} is the Moore-Penrose generalized inverse of H, with H^{+} = H^T (H H^T)^{-1}. It can be shown that the norm of the resulting solution β is minimal and unique. As a result, ELM achieves powerful generalization performance and a significantly increased learning speed. Kernel-based ELM was proposed to improve the generalization ability of ELM beyond the least squares-based ELM: a positive constant C is added to the diagonal of H H^T when calculating the output weights β, giving

\beta = H^T \left( \frac{I}{C} + H H^T \right)^{-1} T,

where the coefficient C is the penalty parameter and I is the identity matrix. Hence, the output function is defined as

f(x) = h(x) H^T \left( \frac{I}{C} + H H^T \right)^{-1} T.

A kernel matrix for the ELM is obtained as

\Omega_{ELM} = H H^T, \quad \Omega_{i,j} = h(x_i) \cdot h(x_j) = K(x_i, x_j),

where K(x_i, x_j) is a kernel function. The output function then becomes

f(x) = [K(x, x_1), ..., K(x, x_N)] \left( \frac{I}{C} + \Omega_{ELM} \right)^{-1} T.

The kernel implementation of the ELM, called KELM, has better stability and generalization capability than the basic ELM. The structure of the KELM model is shown schematically in Figure 2, where the kernel function acts as an alternative feature mapping function that achieves the same mapping from the input to the feature space. Hence, the network's output does not depend on the feature mapping of the hidden layer, but on the kernel function, which is explicitly provided.
Neither the feature mapping of the hidden layer nor the dimensionality of the feature space needs to be pre-defined.

(Figure 2 depicts the KELM structure: input layer, hidden-layer feature space, and output layer.)
In this paper, we use the Gaussian kernel function as the kernel function of KELM:

K(x_i, x_j) = \exp\left( -\gamma \| x_i - x_j \|^2 \right),

where the penalty parameter C and the kernel parameter γ are the two critical parameters of the KELM model. The penalty parameter C balances the minimal fitting error against the model's complexity, and the kernel parameter γ determines the nonlinear mapping from the input space to a specific high-dimensional hidden-layer feature space. In general, these two parameters can be tuned by an appropriate optimization algorithm to further improve the performance of KELM. KELM is widely used for parameter optimization and model prediction problems because of its significant advantages in learning speed and generalization ability.
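As a sketch of how KELM training and prediction follow from these formulas, here is a minimal NumPy implementation of the Gaussian-kernel KELM (the function names are ours, not the authors' code):

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma):
    # K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), evaluated pairwise
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kelm_fit(X, T, C, gamma):
    # Output weights: beta = (I/C + Omega)^(-1) T, with Omega = K(X, X)
    Omega = gaussian_kernel(X, X, gamma)
    n = Omega.shape[0]
    return np.linalg.solve(np.eye(n) / C + Omega, T)

def kelm_predict(X_new, X_train, beta, gamma):
    # f(x) = [K(x, x_1), ..., K(x, x_N)] beta
    return gaussian_kernel(X_new, X_train, gamma) @ beta
```

For example, fitting the XOR pattern with C = 100 and γ = 1 reproduces the sign of the targets on the training points, illustrating the nonlinear separation the kernel provides.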
The basic MFO was proposed in 2015 [47] as a swarm intelligence optimization algorithm based on the spiral flight path mechanism of moths.
MFO is modeled on the lateral-positioning navigation mechanism of moths found in nature. At night, moths fly using the distant moon as a reference; its light can be considered parallel, and moths adjust their flight direction according to the angle between themselves and the light. Near an artificial flame, however, a moth flying at a fixed angle to the flame continuously changes its distance to it, eventually producing a flight path that spirals closer to the flame [97]. The MFO algorithm has strong parallel optimization capability and good overall properties for non-convex functions; it can explore the search space extensively and find regions with a greater probability of containing the global optimum, since non-convex functions have a large number of local optima [98][99][100].
By definition, moths and flames are the two important components of the MFO algorithm. We can observe this from the figure, where M_i indicates the i-th moth, F_j is the j-th flame after sorting, and S is the spiral function. The spiral function should fulfill the conditions below:
(1) The vector position of the initial point of the S function needs to be given before the MFO algorithm can perform the corresponding calculation.
(2) Before the end of each iteration of the MFO algorithm, the S function should preserve the location of the optimal solution found in that iteration.
(3) The function's magnitude lies between the upper bound vector ub and the lower bound vector lb.
Considering these points, the spiral is defined as follows:

S(M_i, F_j) = D_i \cdot e^{bt} \cdot \cos(2\pi t) + F_j,

where b is the shape constant of the logarithmic spiral, t is a random value in the range [−1, 1], and D_i is the distance of the i-th moth to the j-th flame, calculated as

D_i = | F_j - M_i |.

The parameter t determines the step size of the moth's next movement. Equation (16) has a limitation: it only defines how the moth flies towards the flame, which makes the MFO algorithm easily fall into a local optimum. To avoid this problem, an adaptive update of the flames is required, and the number of flames is gradually reduced, reducing the computation time and improving efficiency. The flame-number update is shown in Equation (18):

flame\_no = \mathrm{round}\left( N - k \cdot \frac{N - 1}{T} \right),

where k is the number of the current iteration, N is the maximum flame count, and T indicates the maximum number of iterations. When the end-of-iteration condition is satisfied, the best moth is returned as the best obtained value.
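The spiral movement and the flame-number schedule described above can be sketched as follows (an illustrative implementation of the two MFO update rules, not the authors' code):

```python
import math
import random

def spiral_move(moth, flame, b=1.0, rng=random):
    # S(M_i, F_j) = D_i * e^(b*t) * cos(2*pi*t) + F_j, with D_i = |F_j - M_i|
    # and t drawn uniformly from [-1, 1], applied per dimension.
    new_pos = []
    for m, f in zip(moth, flame):
        d = abs(f - m)
        t = rng.uniform(-1.0, 1.0)
        new_pos.append(d * math.exp(b * t) * math.cos(2 * math.pi * t) + f)
    return new_pos

def flame_count(k, N, T):
    # Eq. (18): the number of flames shrinks linearly from N to 1 over T iterations.
    return round(N - k * (N - 1) / T)
```

With N = 30 and T = 100, the flame count starts at 30 and reaches 1 at the final iteration, so later moths increasingly update against only the best flames.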

SMFO
SMFO [94] improves the global exploration capability of MFO by incorporating the SCA, which increases the diversity of initial solutions and helps solutions escape local optima. At the same time, the adjustment parameters of the sine-cosine strategy increase the accuracy of the optimal solution.
The core of the sine and cosine strategy (SCA) is to modify positions through changes in the values of mathematical functions [101][102][103], as shown in Figure 4. The update of individual positions in the population relies on the sine and cosine functions to randomly update each individual's position in each iteration through multi-parameter adjustment, ensuring that the population remains diverse in the early stages and that individuals converge in the later stages, eventually reaching the optimal solution. During each iteration, the state of an individual is updated using the following formula:

X_i^{t+1} = X_i^t + r_1 \cdot \sin(r_2) \cdot | r_3 P_i^t - X_i^t |, \quad r_4 < 0.5,
X_i^{t+1} = X_i^t + r_1 \cdot \cos(r_2) \cdot | r_3 P_i^t - X_i^t |, \quad r_4 \ge 0.5,

where X_i^t is the position of the current solution in the i-th dimension at the t-th iteration, P_i^t is the position of the current optimal solution (the destination) in the i-th dimension at the t-th iteration, and | | denotes the absolute value. The parameter r_1 defines whether the next position is searched between the solution and the destination or beyond them; it enhances the global exploration capability of the MFO algorithm. Parameter r_2 determines the step of the next position update. r_3 is a random weight that decides the impact of the destination on the current solution. r_4 is the random probability of switching between the sine and cosine functions. The cyclic pattern of the sine and cosine functions lets one solution relocate around another.
To ensure the use of the space identified between the two solutions, Equation (20) is introduced as follows:

r_1 = a - t \cdot \frac{a}{T},

where t is the current number of iterations, T is the maximum number of iterations, and a is a constant, generally set to 2. This formula adaptively shrinks the parameter so that exploration gradually gives way to convergence to the global optimum.
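The sine-cosine update, together with the linearly decreasing r1 of Equation (20), can be sketched as follows (illustrative code, not the authors' implementation):

```python
import math
import random

def sca_update(x, p, t, T, a=2.0, rng=random):
    # Position update of the sine-cosine strategy:
    #   x_i <- x_i + r1 * sin(r2) * |r3 * p_i - x_i|   if r4 < 0.5
    #   x_i <- x_i + r1 * cos(r2) * |r3 * p_i - x_i|   otherwise
    # with r1 = a - t * (a / T) decreasing linearly (Eq. (20), a = 2).
    r1 = a - t * (a / T)
    out = []
    for xi, pi in zip(x, p):
        r2 = rng.uniform(0.0, 2.0 * math.pi)
        r3 = rng.uniform(0.0, 2.0)
        r4 = rng.random()
        trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        out.append(xi + r1 * trig * abs(r3 * pi - xi))
    return out
```

Note that at the final iteration (t = T) the step factor r1 is 0, so positions no longer move: the schedule enforces pure exploitation at the end of the run.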

SMFO-KELM for Soil Erosion Prediction Method
The performance of machine learning models is often closely tied to their hyperparameters. Initially, the "optimal" hyperparameters were usually found by manual trial. However, that approach is inefficient, so swarm intelligence optimization has been proposed to find the optimal hyperparameters. From the experiments comparing SMFO with other swarm intelligence optimization algorithms, the proposed SMFO is significantly better than similar algorithms in exploration and exploitation capability, with competitive convergence and balance, and it has clear advantages in engineering optimization problems [94]. SMFO is used to optimize the penalty parameter and kernel parameter of the kernel extreme learning machine, to make more-accurate predictions for the soil erosion classification problem. Figure 5 shows the flowchart of the proposed SMFO-KELM soil erosion classification prediction model, which comprises two main processes: model optimization and classification evaluation. As is standard in machine learning validation, to obtain reliable and unbiased results, the classification evaluation uses ten-fold cross-validation, in which nine folds form the training set and one fold the test set, while the optimization of the two classifier parameters uses five-fold cross-validation within the training data. This experimental scheme helps obtain unbiased estimates of generalization accuracy and reliable results. The final evaluation uses accuracy (ACC), Matthews correlation coefficient (MCC), sensitivity, and specificity. Because of random sampling, a single run of 10-fold cross-validation is not representative of the classification accuracy.
Therefore, the 10-fold cross-validation procedure was repeated 10 times for every method, and the averaged results are reported as the final evaluation.
Two hundred and thirty-six soil erosion binary classification samples from the Son La province of Vietnam were used to evaluate the SMFO-KELM model, with ten explanatory factors used for classification assessment: EI30, slope degree, OC topsoil, pH topsoil, bulk density, topsoil porosity, topsoil texture (silt fraction), topsoil texture (clay fraction), topsoil texture (sand fraction), and soil cover rate. SMFO was used in the KELM hyperparameter selection stage to optimize the penalty parameter C and the kernel parameter γ.
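The hyperparameter search stage can be sketched as follows. This is a deliberately simplified stand-in: plain random search replaces SMFO, and a toy convex surface replaces the five-fold cross-validation error of KELM, but the encoding of C and γ as base-2 exponents over [−15, 15] matches the search space described in the experimental setup:

```python
import random

def cv_error(log2_C, log2_gamma):
    # Toy stand-in for the five-fold cross-validation error of KELM;
    # a convex surface with its optimum at (log2 C, log2 gamma) = (5, -3).
    return (log2_C - 5.0) ** 2 + (log2_gamma + 3.0) ** 2

def tune_kelm(n_evals=500, seed=0):
    # Search the exponents of C and gamma over [-15, 15], i.e.
    # C, gamma in [2^-15, 2^15], keeping the best candidate found.
    rng = random.Random(seed)
    best_cand, best_err = None, float("inf")
    for _ in range(n_evals):
        cand = (rng.uniform(-15.0, 15.0), rng.uniform(-15.0, 15.0))
        err = cv_error(*cand)
        if err < best_err:
            best_cand, best_err = cand, err
    C, gamma = 2.0 ** best_cand[0], 2.0 ** best_cand[1]
    return C, gamma, best_err
```

In the actual method, the candidate generation step is driven by the SMFO update rules rather than uniform sampling, and the fitness is the cross-validated classification error of KELM on the training folds.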


Experimental Environment
To ensure the fairness and validity of the experiments [62,64,104,105], all the algorithms involved in the comparison were run under the same experimental conditions. The population size was set to 20, the maximum number of evaluations MaxFEs was uniformly set to 100, and all algorithms were tested 30 times independently to reduce the influence of random conditions. The search spaces of the two hyperparameters in KELM were set to C ∈ {2^{-15}, ..., 2^{15}} and γ ∈ {2^{-15}, ..., 2^{15}}. All experimental results were evaluated using box plots with the following four metrics: ACC, MCC, sensitivity, and specificity.
All experiments were performed on a computer with a 3.40 GHz Intel® Core i7 processor and 16 GB of RAM; the code was implemented in MATLAB R2018b.

Measures for Performance Evaluation
For a binary classification problem, the actual values are only positive and negative, and the predicted results likewise take only two values. If an instance is positive and is predicted positive, it is a true positive (TP); if it is negative and predicted positive, a false positive (FP); if it is negative and predicted negative, a true negative (TN); and if it is positive and predicted negative, a false negative (FN). The most widely used metrics based on these counts are ACC, MCC, sensitivity, and specificity [106,107], which are used to assess the quality of the binary classification and evaluate the proposed method's performance.
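The four metrics can be computed directly from the confusion-matrix counts as follows (a minimal sketch; the function name is ours):

```python
import math

def binary_metrics(y_true, y_pred):
    # Confusion-matrix counts, then ACC, MCC, sensitivity, and specificity.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = (tp + tn) / len(y_true)
    sens = tp / (tp + fn)  # true positive rate
    spec = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return acc, mcc, sens, spec
```

For example, with true labels [1, 1, 1, 0, 0, 0] and predictions [1, 1, 0, 0, 0, 1], the counts are TP = 2, FN = 1, TN = 2, FP = 1, giving ACC = 2/3, sensitivity = 2/3, specificity = 2/3, and MCC = 1/3.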
The MCC is essentially a correlation coefficient between the actual and predicted classifications: a value of 1 indicates a perfect prediction, a value of 0 indicates a prediction no better than random, and −1 means the predicted and actual classifications disagree completely. Sensitivity (also known as the true positive rate) is the proportion of actually positive samples judged positive, and specificity (also known as the true negative rate) is the proportion of actually negative samples judged negative.
It is clear from Figures 6-8 that the SMFO-KELM classifier generally outperforms the other competing models, both classical optimized and advanced algorithmic classifiers, since the SMFO optimizer has the highest optimization power. All experimental results can be viewed in Appendix A Table A2. The improved KELM using HGWO does not achieve the best classification performance; nevertheless, it is the second-best optimizer in the comparison. Given the features of the proposed model, the efficacy of the proposed MFO method can be further investigated on more complex problems, such as social recommendation and QoS-aware service composition [108][109][110], energy storage planning and scheduling [111], image editing [112][113][114], service ecosystems [115,116], epidemic prevention and control [117,118], active surveillance [119], large-scale network analysis [120], pedestrian dead reckoning [121], and evaluation of human lower limb motions [122].

Conclusions
The use of swarm intelligence optimization algorithms to tune machine learning parameters is becoming more widespread in the study of classification problems, and such optimized models perform better than the original machine learning models. This paper proposes a robust and accurate machine learning method, SMFO-KELM, that effectively solves the soil erosion prediction problem. The model's main idea is to apply a new and improved MFO algorithm, SMFO, to optimize the penalty parameter C and the kernel parameter γ of KELM, thereby improving the classifier's generalization capability. The improved SMFO is obtained by integrating the sine-cosine mechanism into the original MFO. This approach provides more consistent global optimization, improves the balance between exploration and exploitation, and increases the convergence speed.
From the results of the experiments in this paper, it can be concluded that, for the discrete soil data classification problem, the SMFO-KELM model is significantly superior to the MFO-KELM model. The sine-cosine strategy used in the SMFO improvement has a positive effect on the parameter optimization of the kernel extreme learning machine. Compared with other algorithms, such as the BA-KELM, CLOFOA-KELM, IGWO-KELM, and OBLGWO-KELM models, SMFO-KELM outperforms the other classifier models in solving the soil erosion classification problem on all four commonly used performance metrics. Therefore, the usability of SMFO-KELM has been extended, and the proposed method can be considered a valuable early warning tool for soil erosion prediction systems, helping land management agencies to make scientifically sound decisions.
In addition, soil erosion prediction models can be combined with other optimization algorithms, and the SMFO used here can also tune the parameters of other machine learning models, such as KNN, support vector machines, and convolutional neural networks; it can likewise be applied to pest and disease image segmentation and feature selection problems. Other potential applications, such as fertilizer response function optimization, reservoir regulation optimization, and the combined optimal allocation of irrigation and groundwater, are also exciting topics for green sustainability in agricultural engineering. More agricultural engineering optimization problems will be investigated in the future.

Institutional Review Board Statement:
Not applicable; the study did not involve humans or animals.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data used in this study can be obtained from the published paper.

Acknowledgments:
We acknowledge the comments of the editor and anonymous reviewers that enhanced this research significantly.

Conflicts of Interest:
The authors declare no conflict of interest.