Bio-Inspired Hybridization of Artiﬁcial Neural Networks: An Application for Mapping the Spatial Distribution of Soil Texture Fractions

: Soil texture and particle size fractions (PSFs) are a critical characteristic of soil that inﬂuences most physical, chemical, and biological properties of soil; furthermore, reliable spatial predictions of PSFs are crucial for agro-ecological modeling. Here, series of hybridized artiﬁcial neural network (ANN) models with bio-inspired metaheuristic optimization algorithms such as a genetic algorithm (GA-ANN), particle swarm optimization (PSO-ANN), bat (BAT-ANN), and monarch butterﬂy optimization (MBO-ANN) algorithms, were built for predicting PSFs for the Mazandaran Province of northern Iran. In total, 1595 composite surﬁcial soil samples were collected, and 64 environmental covariates derived from terrain, climatic, remotely sensed, and categorical datasets were used as predictors. Models were tested using a repeated 10-fold nested cross-validation approach. The results indicate that the hybridized ANN methods were far superior to the reference approach using ANN with a backpropagation training algorithm (BP-ANN). Furthermore, the MBO-ANN approach was consistently determined to be the best approach and yielded the lowest error and uncertainty. The MBO-ANN model improved the predictions in terms of RMSE by 20% for clay, 10% for silt, and 24% for sand when compared to BP-ANN. The physiographical units, soil types, geology maps, rainfall, and temperature were the most important predictors of PSFs, followed by the terrain and remotely sensed data. This study demonstrates the effectiveness of bio-inspired algorithms for improving ANN models. The outputs of this study will support and inform sustainable soil management practices, agro-ecological modeling, and hydrological modeling for the Mazandaran Province of Iran.


Introduction
Soil texture is one of the most critical physical properties of soil and is classified based on the relative proportions of sand, silt, and clay. These individual particle-size fractions

Study Area
The Mazandaran Province located in the northern region of Iran, within 50 • 31 21 E to 53 • 56 52 E longitudes and 36 • 38 06 N to 36 • 54 59 N latitudes, has an areal extent of 23,833 km 2 ( Figure 1). Forestlands, rangelands, croplands, and orchards are the dominant landcover types, occupying 46%, 24%, 10% and 4% of the total area, respectively. The soils of the province are mostly fertile and, due to the suitable climatic conditions, play a significant role in producing various crops (e.g., rice, wheat, and corn) and fruit trees (e.g., citrus and kiwi) in Iran. The elevation ranges from <5 m above mean sea level in the northern region, along the shoreline of the Caspian Sea, to >3000 m above mean sea level in the southern region, along the Alborz Mountain chains. According to the De Martonne methodology for climate classification [36], the Mazandaran Province is described as having a mixture of very humid, humid, Mediterranean, and semi-humid climatic conditions. Within the province, the rainfall in the eastern part is about 400 mm and increases along a westward gradient to 1400 mm; hence, the soil moisture regime is classified as xeric, ustic, and udic along that same gradient. The predominant soil temperature regime class is classified as being thermic and followed by mesic and cryic [37]. The most important soil environmental factor, which influences the soil development and variability at the provincial scale, is climate, followed by the relief, parent material, and vegetation, which control local-scale soil variation [38,39]. The dominant physiographic unit in this landscape are mountains (76%), followed by piedmont plains (6%), flood plains (6%), flood basins (5%), hills (2%), alluvial plains (2%), plateaus (0.2%), and alluvial fans (0.2%); in addition, miscellaneous land types occupy 3% of the total area. The main soil orders include Mollisols (39%), Entisols (17%), Inceptisols (7%), Alfisols (5%), and Ultisols (0.002%) and the remaining parts of the landscape consist of rocky outcrops, badlands, and waterbodies (32%).

Soil Data
The soil texture data for this study were acquired by the Mazandaran Agricultural Research, Education and Extension Organization (AREEO) in Sari, Iran and consists of 1595 surficial soil samples collected between 2014 and 2018. All samples were collected from the 0-20 cm depth increment and their corresponding geographical coordinates were recorded using a global positioning system (GPS) device. Apart from those samples that were randomly sampled in uncultivated areas, more than half of the samples were acquired from croplands and orchards using a gridded sample design with a 2 km × 2 km grid spacing ( Figure 1). All samples were air-dried and passed through a 2 mm sieve for measuring the PSFs using the Bouyoucos hydrometer method [40]. A computer program was used for converting the PSFs to soil textural classes based on the classification system of the United States Department of Agriculture (USDA) [41].

Soil Data
The soil texture data for this study were acquired by the Mazandaran Agricultural Research, Education and Extension Organization (AREEO) in Sari, Iran and consists of 1595 surficial soil samples collected between 2014 and 2018. All samples were collected from the 0-20 cm depth increment and their corresponding geographical coordinates were recorded using a global positioning system (GPS) device. Apart from those samples that were randomly sampled in uncultivated areas, more than half of the samples were acquired from croplands and orchards using a gridded sample design with a 2 km × 2 km grid spacing ( Figure 1). All samples were air-dried and passed through a 2 mm sieve for measuring the PSFs using the Bouyoucos hydrometer method [40]. A computer program was used for converting the PSFs to soil textural classes based on the classification system of the United States Department of Agriculture (USDA) [41].

Environmental Covariates
To represent the possible pedogenetic and hydro-ecological process involved in controlling the spatial distribution of PSFs, a large set of environmental covariates was generated for Mazandaran Province. The four main data sources included terrain-derived at-

Environmental Covariates
To represent the possible pedogenetic and hydro-ecological process involved in controlling the spatial distribution of PSFs, a large set of environmental covariates was generated for Mazandaran Province. The four main data sources included terrain-derived attributes, RS reflectance data, climatic characteristics, and digitized polygon maps. As tabulated in Table A1, 64 environmental covariates that represented the SCORPAN factors [42] were used to predict soil PSFs.
Geologic, soil taxonomic, and physiographic maps in the polygon format were included as categorical covariates and the remaining 61 variables were based on continuous data. Using the SAGA GIS software, 18 topographic covariates were derived from a digital elevation model (DEM) at a 30 m spatial resolution. The terrain attributes included slope, aspect, catchment area, catchment slope, channel networks base level, elevation, flow accumulation, multi-resolution valley bottom flatness index (MrVBF) [43], topographic openness (NegOpen and PoOpen), plane curvature, relative slope position, slope-length (LS) factor, topographic wetness index (TWI), total insolation, upslope curvature, valley depth, vertical distance to channel networks, and wind effect.
All environmental covariates were aggregated (average resampling) or disaggregated (bilinear resampling) to a common grid of a 30 m × 30 m spatial resolution and rescaled using Z-score standardization. With respect to the categorical covariates, they were all transformed into dummy variables.

Selection of Environmental Covariates
A feature selection technique was applied to identify the most suitable covariates for implementation in the model [54]. Here, a correlation-based feature selection method, which was fully automated. Here, an optimal covariate should be highly correlated with the soil response variables and not be correlated to any other relevant covariates [55]. The correlation-based feature selection method consists of two steps: (1) the selection of relevant covariates; and (2) the identification and removal of irrelevant covariates from the original database [55]. The correlation-based feature selection method removed more than half of the environmental covariates and selected only 23 covariates. It should be noted that the calibration and testing of the ML models was carried out based on the 23 selected environmental covariates.

Predictive Models
The proposed ANN-based algorithms, which are among the most efficient neural computing methods, include genetic algorithm (GA-ANN), particle swarm optimization (PSO-ANN), bat (Bat-ANN), and monarch butterfly optimization (MBO-ANN). The training for ANN was performed with various evolutionary algorithms to unveil the most efficient method. A multi-layered feed-forward perceptron neural network with three main layers (one input, one hidden, and one output) was used to identify the non-linear relationships between PSFs and environmental covariates. The neurons between the three layers are interconnected weighted connection (i.e., synapses) [56]. By running the ANN model, the weighted predictors from an input layer are added to a bias coefficient, and the results pass through the transfer function in the hidden and output layers to produce an output (i.e., PSFs). Figure 2 depicts a flowchart of the training process via the proposed hybridization methods. In this study, 2-20 neurons in the hidden layers were tested and the best architecture was selected considering the lowest mean squared error (MSE). In brief, the proposed ANNs consist of 23 neurons (i.e., covariates) in the input layer and three neurons (i.e., clay, silt, and sand contents) in the output layer. Sigmoid functions were used as the transfer functions in the hidden layers and softmax transfer functions were used in the output layers. sum up to 100%. The strict requirement of summing to 100% is not met in most unsampled locations without pre-treatment of the PSFs data. Therefore, in this study, the predicted sand, silt, and clay fractions in the output layer of ANNs are passed through a softmax transfer function to address the compositional constrains [12]. The softmax transfer function takes, as input, a vector (i.e., the predicted clay, silt, and sand fractions), and normalizes it into values ranging from 0 to 1. whereby the vector adds up to 1. Finally, the returned values from the softmax transfer function are multiplied by 100. The local optima, convergence rate, and training time of the neural networks are greatly affected by their weights and biases, thus resulting in the need for optimizing the ANN parameters during the iteration process [57]. The hybridization of ANN with bioinspired, optimization algorithms has the potential to be a powerful tool, which may overcome these challenges [28,57]. Hence, four bio-inspired, metaheuristic algorithms, namely, GA, BAT, PSO, and MBO, were trained to predict the PSFs as the target variables. Table 1 shows the hyper-parameter settings of the four metaheuristic algorithms, which were hybridized with ANN. The performances of these bio-inspired training algorithms were compared with the commonly used backpropagation algorithm (BP-ANN) for predicting PSFs. A brief description of each training algorithms is provided in the following subsections. The soil texture data are multivariate data with clay, silt, and sand fractions that must sum up to 100%. The strict requirement of summing to 100% is not met in most unsampled locations without pre-treatment of the PSFs data. Therefore, in this study, the predicted sand, silt, and clay fractions in the output layer of ANNs are passed through a softmax transfer function to address the compositional constrains [12]. The softmax transfer function takes, as input, a vector (i.e., the predicted clay, silt, and sand fractions), and normalizes it into values ranging from 0 to 1. whereby the vector adds up to 1. Finally, the returned values from the softmax transfer function are multiplied by 100.
The local optima, convergence rate, and training time of the neural networks are greatly affected by their weights and biases, thus resulting in the need for optimizing the ANN parameters during the iteration process [57]. The hybridization of ANN with bio-inspired, optimization algorithms has the potential to be a powerful tool, which may overcome these challenges [28,57]. Hence, four bio-inspired, metaheuristic algorithms, namely, GA, BAT, PSO, and MBO, were trained to predict the PSFs as the target variables. Table 1 shows the hyper-parameter settings of the four metaheuristic algorithms, which were hybridized with ANN. The performances of these bio-inspired training algorithms were compared with the commonly used backpropagation algorithm (BP-ANN) for predicting PSFs. A brief description of each training algorithms is provided in the following subsections. The BP neural network, which uses the Levenberg-Marquardt algorithm, is the simplest in terms of implementation and is also the most popular approach used for training neural networks [58]. BP is a type of supervised learning algorithm, which adjusts the weights and biases according to the target value(s). BP starts by randomly selecting initial weights and then comparing the outputs with the given target value(s) to calculate the difference in MSE [59]. The errors are then backpropagated through the earlier layers via a negative gradient descent direction for adjusting the weights and bias until at least one stopping criteria is reached. The maximum number of epochs and the MSE of the network output for each target PSFs were the two stopping criteria and were set as 500 and 0 in this study, respectively. It should be noted that the BP algorithm is a type of hill-climbing algorithm, whereby the algorithm may get "trapped" in local minima; furthermore, BP algorithms also tend to overfit the data [56].

Genetic Algorithm (GA)
A GA is frequently used for hybridization with an ANN. This algorithm mimics the process of natural selection and the survival-of-the-fittest for optimizing the weights and bias of neural network connections [60]. GA commences with a random population with individual chromosomes as solutions, whereby the genes of the chromosomes are the random values. Each solution/generation's conditions are calculated with fitness values to select the parents to produce the children in the next solution/generation. Upon sequential generations, the chromosomes of the population tend to achieve the best solutions. The crossover and mutation are two operators mostly used by the GA algorithm to generate new offspring. In this study, the hyperparameter settings are described in Table 1.

Particle Swarm Optimization (PSO)
PSO is a metaheuristic algorithm, which was first proposed in 1995 [61]. It is modeled based on the swarming nature of insects and birds. By tuning the model parameters, PSO algorithms are used for hybridization with different machine learning techniques-Remote Sens. 2021, 13, 1025 8 of 23 particularly with ANN models. To optimize an ANN with PSO, a recently outlined procedure was used [62]. PSO starts with random populations for a given fitness function. The individual solutions are "particles" in a population with random velocities and positions. The fitness value of particles is then calculated to get the required value for the best fitness function. Particles adjust their movement to the best local position with the best fitness, accompanied by other particles' best position in global optimization. Here, the local velocity adjustment of particles on their best fitness and the other particles' experience is considered in the PSO algorithm. The first step of the algorithm terminates following the mass movement. This process is repeated until the stopping criteria are met. By changing the particle's position and velocity from their solution space, PSO optimizes the weights and biases of ANN [57]. The cost function (fitness function) of the ith particle can be defined in the terms of RMSE in Equation (1): where, wi is the weights; bi is the biases; E is the cost value; Tpls is the target value; Ppls is the predicted output; O is the neuron numbers; and S is the number of training samples.

Bat Algorithms (BAT)
The BAT algorithm is inspired by the echolocation ability of bats in finding their prey with global optimization [63]. The BAT algorithm begins the local position's optimization with random fly followed by the sequential iterations to achieve the best local solution along with the optimum global location. Bats use echolocation for sensing the distances in search space for a global optimum. In this algorithm, bats at a position, X i , fly randomly with velocity, V i , to emit pulses at a frequency, f i , loudness, A i , and emission rate, r i . In this study, the minimum (f min ) and maximum (f max ) frequencies were set as 0 and 1, respectively. Bats automatically adjust f i , and r i is changed based on the adjacency of their prey. A i changes between 1, as a large positive value (A 0 ), and 0 as a minimum value (A min ). The rules apply an initial population of bats to update X i and V i with initial random values of f i for n virtual bats to select the best solution in local search with following equations: where 8is a vector with randomly generated values (0 to 1) and X best is the optimal local position of current bats. Subsequently, the new solution is generated using a random walk algorithm that selects the best local solution for a globally-optimal position of the bat in each iteration and balancing the A i (loudness) and r i (pulse emission rate) [63,64]. The algorithm initiates by randomly selecting an initial population of bats, and the weights and biases in an ANN. Afterward, the BAT algorithms assign the best solution for the output of the trained ANN. Subsequently, by searching in local space, a new solution is generated. Each local space with a satisfactory new solution will be substituted with an optimal global solution. These procedures are repeated until the maximum number of iterations (e.g., 100) is met and whereby the optimal values of weights and biases are found.

Monarch Butterfly Optimization (MBO) Algorithm
The MBO algorithm is a global-solving, metaheuristic algorithm with a high convergence rate for high-dimensional data [65] and was initially proposed in 2019 [66]. This algorithm mimics the migratory behavior of individual monarch butterflies between northern USA (Land1) to southern Mexico (Land2). MBO initiates with random generation of the population to form a local solution. Afterward, all populations are ranked by fitness Remote Sens. 2021, 13, 1025 9 of 23 functions. The ranked populations are separated into subpopulations, identified as Land1 and Land2, to consider the pre-defined fixed ratio of population p. The original population is kept unchanged by these two operators independently and where the objective is to simultaneously minimize the fitness functions. The migration operator generates the new candidate solutions/individuals and is adjusted by the migration ratio using the following equation: where x i is the position of a butterfly i; t is the generation number; k is the element of x i at a generation of t + 1; r 1 is the randomly generated position in Land1; r 2 is the randomly generated position in Land2; p is the ratio of butterfly in Lands; and r i is equal to the product of the period (peri) parameter, which was set to 1.2, and a random number (rand) between 0 and 1. The butterfly adjusting operator tunes the positioning behavior of other butterflies, j, in global best positions using the following equation: where x best is the best monarch butterfly position in Land1 and Land2; t is the current generation number; r 3 is the randomly generated from Land2; BAR is the butterfly adjusting ratio; dx k is kth element of the walk step; and α= (S max /t 2 ) is the weighing factor with maximum walk step (S max ). The models' weights and biases are encoded as the k elements of a D-dimensional vectors. In the process of optimization using MBO, the initial random weights and biases are updated at each generation with the best position of the monarch butterflies that achieve the optimal fitness functions. Here, MBO calculates the best weights and biases by allocating the best monarch butterflies' positions in a forward path and obtains the mean square error of fitness functions for ANN training. This process periodically continues until the lowest RMSE of fitness is achieved or the maximum generation (e.g., 100) is met.

Accuracy Assessment and Uncertainty Analysis
For evaluating the performances of the hybridized ANN models in predicting the PSFs for Mazandaran Province, the soil data were randomly partitioned into two sets. Here, 70% of the data were randomly selected to calibrate the ANN models, and the remaining 30% of the data were used to test the models and determine the accuracy metrics. When training the ANN models, the calibration dataset was further subdivided into training and validating datasets, whereby 80% of the data were used to train the ANN models, and the remaining 20% were used to validate the model. The nested cross-validation process was repeated 10 times to account for the effects of the internal (bootstrapping) and external randomness within the modeling procedure, thereby ensuring stability in the accuracy metrics [67].
The metrics used to evaluate model accuracy included mean absolute error (MAE), RMSE, coefficient of determination (R 2 ), and Lin's concordance correlation coefficient (CCC), which are formulated as follows: where n is the number of samples in the cross-validation dataset; Mi and Pi are the measured and predicted PSFs for each sampling points, respectively; M and P are the means for the measured and predicted PSFs values, respectively; and σ M and σ p are the variances of measured and predicted PSFs values. The model with the highest R 2 and CCC and the lowest MAE and RMSE values was determined to be the best hybridization of ANN for predicting PSFs and soil textural classes.
For analyzing the uncertainty of hybridized ANN, the PSF maps generated by each iteration of the models were used to calculate the mean and standard deviation (SD) of the PSF for each pixel. Then, the confidence interval (CI) was calculated as the prediction mean ± 1.64SD (standard deviation) for the given 90% CI. It should be noted that only the mean of the PSF contents for each pixel is presented. Table 2 shows the descriptive statistics of the measured topsoil sand, silt, and clay contents for the study area. The silt contents were the highest in this area, with an average of 45.2%, whereas the clay and sand contents were 31.4% and 23.4%, respectively. Considering the high silt contents, the soils were typically classified as clay loam, according to the USDA system of soil texture classification. The coefficient of variation (CV) was the highest for sand values (63%), which indicates high heterogeneity with a wide range of sand spread across Mazandaran Province; in contrast, clay and silt had a CV of 39% and 25%, respectively. The sand and clay values had high variability over the study area with CV > 35% [68], while the silt values had moderate variability (15% < CV < 35%). Regardless of the PSFs, the mean and medium values were similar, which suggests that the PSFs closely resembled a normal distribution and was further confirmed by the low skewness and kurtosis. The distribution of the soil textural classes for all sample points, according to the USDA soil texture triangle, is plotted in Figure 3. Here, the plot indicates that there was a wide range of soil texture classes over the study area (Figure 3), which included eight out of the 12 possible texture classes: sand, loamy sand, sandy loam, loam, silt loam, silt, sandy clay loam, clay loam, silty clay loam, sandy clay, silty clay, and clay.

Summary Statistics
The high variability in PSFs could be attributed to the variability in parent materials, topography, elevations, soil weathering rates, land uses and management practices, and other environmental elements [11]. The selective removal of particles that were <63 μm in diameter on steep slopes in the study area via soil erosion processes may have been a controlling factor on the accumulation of finer soil particles in the low-lying areas [69].

Accuracy Assessment and Uncertainty Analysis
The performances of the five ANN models (i.e., BP-ANN, GA-ANN, PSO-ANN, Bat-ANN, and MBO-ANN) for predicting three components of PSFs were compared in terms of MAE, RMSE, R 2 , and CCC. Once the ANN models were trained using the aforementioned algorithms, the optimal weights and biases of each input covariates were assigned to predict the PSFs [57,70]. Table 3 indicates the mean values of the accuracy metrics, calculated using the testing data from the repeated, nested 10-fold cross-validation procedure. As shown in Table 3, the accuracy of the ANN models varied amongst the different optimization learning algorithms. In terms of R 2 and CCC statistics, MBO-ANN was the most effective in predicting clay contents with mean accuracy values of MAE=5.6%, RMSE = 6.4%, R 2 = 0.68 and CCC = 0.69. In comparison, BP-ANN, the reference approach, had substantially lower R 2 (0.50) and CCC (0.51) and slightly higher MAE (6.5%) and RMSE (8%). Furthermore, BP-ANN was also considerably outperformed by the GA-ANN, PSO-ANN, and Bat-ANN models for clay predictions with CCC values of 0.68, 0.62, and 0.66, respectively. Across all ANN hybridizations, there were considerable gains in CCC values for clay predictions, whereby an increase of 35%, 33%, 29%, and 22% was observed for The high variability in PSFs could be attributed to the variability in parent materials, topography, elevations, soil weathering rates, land uses and management practices, and other environmental elements [11]. The selective removal of particles that were <63 µm in diameter on steep slopes in the study area via soil erosion processes may have been a controlling factor on the accumulation of finer soil particles in the low-lying areas [69].

Accuracy Assessment and Uncertainty Analysis
The performances of the five ANN models (i.e., BP-ANN, GA-ANN, PSO-ANN, Bat-ANN, and MBO-ANN) for predicting three components of PSFs were compared in terms of MAE, RMSE, R 2 , and CCC. Once the ANN models were trained using the aforementioned algorithms, the optimal weights and biases of each input covariates were assigned to predict the PSFs [57,70]. Table 3 indicates the mean values of the accuracy metrics, calculated using the testing data from the repeated, nested 10-fold cross-validation procedure. As shown in Table 3, the accuracy of the ANN models varied amongst the different optimization learning algorithms. In terms of R 2 and CCC statistics, MBO-ANN was the most effective in predicting clay contents with mean accuracy values of MAE = 5.6%, RMSE = 6.4%, R 2 = 0.68 and CCC = 0.69. In comparison, BP-ANN, the reference approach, had substantially lower R 2 (0.50) and CCC (0.51) and slightly higher MAE (6.5%) and RMSE (8%). Furthermore, BP-ANN was also considerably outperformed by the GA-ANN, PSO-ANN, and Bat-ANN models for clay predictions with CCC values of 0.68, 0.62, and 0.66, respectively. Across all ANN hybridizations, there were considerable gains in CCC values for clay predictions, whereby an increase of 35%, 33%, 29%, and 22% was observed for MBO-ANN, GA-ANN, Bat-ANN, and PSO-ANN, respectively, when compared to the reference BP-ANN model. These results indicate that the bio-spired algorithms hybridized with neural network greatly enhanced the accuracy of clay prediction. Similar trends of R 2 , CCC, MAE, and RMSE values were observed for the silt and sand predictions. With these PSFs, the results also showed that the BP-ANN model had the lowest R 2 and CCC and highest errors (MAE and RMSE). Again, the MBO-ANN model resulted in silt and sand predictions that had the highest accuracy across all five hybridized models. Furthermore, the improvement of prediction using the hybridization of ANN with bio-inspired algorithms was most pronounced for the silt prediction in comparison to the clay and sand predictions. With respect to the silt predictions, the MBO-ANN algorithm was an improvement on the BP-ANN algorithm by increasing R 2 from 0.32 to 0.62 (93.8% increase) and CCC from 0.25 to 0.65 (160% increase) and decreasing MAE from 5.76% to 4.81% (16.5% decrease) and RMSE from 7.34% to 6.60% (10.1% decrease). Again, similar to the clay predictions, MBO-ANN was the most effective bio-inspired hybridization of ANN followed by GA-ANN, PSO-ANN, and Bat-ANN, respectively.
With respect to the sand predictions, the MBO-ANN model was, again, the most effective approach amongst the hybridized models, yielding the highest R 2 (0.70) and CCC (0.71) values and lowest RMSE (6.59%) and MAE (5.74%) values. In comparison with the traditional BP-ANN model, the bio-inspired hybridized algorithms improved the sand predictions across all accuracy metrics when compared to BP-ANN. Specifically, with respect to CCC, the improvements for the MBO-ANN, GA-ANN, Bat-ANN, and PSO-ANN were 45%, 43%, 29%, and 29%, respectively.
When evaluating the uncertainty estimates, ideally, 90% of test data with corresponding observed values should fall within the confidence interval (CI) range for each of the PSFs (Table 4). Here, the evaluation of the uncertainty estimates showed a similar trend with respect to how the various ANN models performed, whereby the MBO-ANN results were the most optimal in generating uncertainty estimates because 85%, 83%, and 88% of the clay, silt, and sand observations, respectively, fell within the defined CI. Conversely, the bootstrapping of the BP-ANN model was the least effective in generating uncertainty estimates because only 78%, 72%, and 81% of the observed values fell within the 90% confidence interval range.

Covariate Importance
The weight values assigned to each input parameter by the ANN model were used to identify the most important covariates for predicting the PSFs. In this study, the relative importance (RI) of the covariates, were categorized as follows: weakly relevant (RI < 2%), slightly relevant (2% < RI < 4%), moderately relevant (4% < RI < 6%), and strongly relevant (RI > 6%). The RI values of all covariates are shown in Figure 4, where it shows that there was a wide range of RI amongst the covariates when predicting PSFs.

Covariate Importance
The weight values assigned to each input parameter by the ANN model were used to identify the most important covariates for predicting the PSFs. In this study, the relative importance (RI) of the covariates, were categorized as follows: weakly relevant (RI<2%), slightly relevant (2%<RI<4%), moderately relevant (4%<RI<6%), and strongly relevant (RI>6%). The RI values of all covariates are shown in Figure 4, where it shows that there was a wide range of RI amongst the covariates when predicting PSFs.  Table A1 for covariate codes).
The number of covariates with weakly, slightly, moderately, and strongly relevant contribution for PSFs prediction are shown in Table 5. The most critical factor controlling the PSFs was represented by the categorical data, which represented the study area's soil, geologic, and physiographic variability. These covariates were then followed by climatic, terrain-derived, and RS-derived data, respectively. The categorical maps, along with the two climatic datasets (rainfall and temperature), were the most strongly relevant covariates across all PSFs predictions for the study area.   Table A1 for covariate codes).
The number of covariates with weakly, slightly, moderately, and strongly relevant contribution for PSFs prediction are shown in Table 5. The most critical factor controlling the PSFs was represented by the categorical data, which represented the study area's soil, geologic, and physiographic variability. These covariates were then followed by climatic, terrain-derived, and RS-derived data, respectively. The categorical maps, along with the two climatic datasets (rainfall and temperature), were the most strongly relevant covariates across all PSFs predictions for the study area.
The importance of the covariates for predicting silt content were also similar, whereby the climatic data and categorical maps were ranked the highest and yielded a mean RI of 8.4% and 8.2%, respectively. The importance of these data sources was then followed by terrain-derived data, which had a mean RI of 4.1% and the remote sensing data, which had a mean RI of 3.0%, 2.6%, and 2.5% for the Sentinel-2-, MODIS, and Landsat-derived data, respectively. Similar to the clay predictions, the physiographic map was determined to be the most important covariate for silt predictions with RI = 9.8%, and followed by rainfall (8.4%), temperature (8.3%), geologic map (7.8%), soil map (7.0%), and MRVBF (6.1%), respectively ( Figure 4).
The ranking of the mean RI values by data source for the sand predictions were the same as for the clay predictions where the categorical maps had the highest mean RI of 8.5% and was subsequently followed by the climatic data (7.9%), terrain-derived data (3.9%), and remote sensing data, where the MODIS-, Sentinel-2-, and Landsat-derived data had a mean RI of 3.4%, 2.9%, and 2.5%, respectively. Again, the physiographic map, geologic map, temperature, rainfall, soil type map, MRVBF, and TWI were the most important covariates for sand predictions with the RI values of 9.6%, 8.6%, 7.9%, 7.8%, 7.2%, 5.5%, and 5.4%, respectively (Figure 4).

Spatial Prediction of Soil PSFs
The continuous map of clay, silt, and sand fractions and USDA soil texture classes for the whole region was produced using the MBO-ANN model due to its superior performance in comparison to the other approaches ( Figure 5). Furthermore, the MBO-ANN model, when coupled with the softmax transfer function in its output layer guarantees the summation of PSFs to 100%, which is necessary for precise predictions of PSFs (see the Supplementary Materials). Based on a visual assessment of the predictions, the silt contents were relatively higher compared with the clay and sand contents along the central strip of the study area. The predicted PSF maps, produced using the MBO-ANN model, were also reclassified in accordance to the USDA soil texture classification system ( Figure 5).  Eight main USDA textural classes were identified: clay, silty clay, and silty clay loam classes were mapped in the northeastern region; the silty clay loam class dominated the central region; the silt loam, loam, and sandy loam texture classes dominated the southern regions of the province; and sandy clay loam soil texture class had a limited distribution in the study areas ( Figure 5).
With respect to the predicted clay contents, there was a relatively higher abundance of clay in the northeastern region of the province, which corresponds with a flood basin on the low-lying area and where the fine-textured materials are dominant. Here, the clay contents were >40% and were mainly distributed at low elevations (<10 m) and where fine-textured parent materials are present ( Figure 5).
In comparison, the sand content was low in the northeastern; however, was it substantially higher in the southern and southeastern regions, where the soils are mostly degraded and the coarse-textured parent materials are exposed. In addition, the thin coastline (about 10-20 m from shorelines), adjacent to the Mazandaran Province, includes a sand dune that was omitted from this study given that these areas were not mapped as soils by surveyors. With respect to the silt predictions, the map showed highest amounts of silt along the central region of the study area along an east to west direction. This area of high silt contents corresponds with the presence of a loess deposit and loess-derived soils, which are highest in the eastern region of the province [71].

Environmental Covariates
The categorical maps, along with the two climatic datasets (rainfall and temperature), were the most strongly relevant covariates across all PSFs predictions for the study area. Soil taxonomic type, represented as great soil groups, was strongly relevant in explaining the total variation of PSFs soil in Mazandaran Province. The soil survey data were traditionally derived from the physiographic units in Iran and provide information directly related to the spatial patterns of PSFs; furthermore, the geological map also provided information on the soil parent material and its associated mineralogy-a key determinant of soil texture. By using these datasets as covariates, there is the potential to renew and improve upon existing soil textural and PSF maps for the Mazandaran Province using DSM approaches. The relationship between these datasets and PSFs was apparent in this study; for example, the spatial distribution of alluvial plains and flood basins were two physiographic units of the northeastern region of the study areas, which also coincided with the high clay contents that were observed. The information such as soil type and land cover could explain the considerable variation of PSFs [22,24,72,73]. Therefore, these results would seem to indicate that, for study areas where detailed and accurate soil maps exist, it would be beneficial to include them as a covariate in predicting PSFs [18]. Of the terrain-derived data, MRVBF was the most important-especially for the prediction of clay and silt. Here, MRVBF was effective in characterizing the low-lying terrains and the depositional features of the landscape, where fine particles tend to accumulate [43]. In cultivated alluvial/flood plains of low-lying areas, the use of land surface temperature data from MODIS was a promising covariate, particularly where the soils were clearly bare over a year [74].
RS-derived covariates were not strongly relevant for predicting PSFs in this study. The main reason was attributed to the effect of surficial soil moisture and roughness, the presence of vegetation canopy, and, in some cases, due to the low spatial resolution of the RS data [75,76]. All covariates derived from Landsat and MODIS imagery were classified as being weakly to slightly relevant covariates, whereas two covariates derived from the Sentinel-2 were moderately important. Specifically, the clay index was moderately relevant for predicting all PSFs, while NDVI was moderately relevant for predicting the clay and silt fractions (Table 4). With respect to the clay index of Sentinel-2, there was a negative relationship with soil clay content [77]. The soil texture mapping using Sentinel-2 data performance had a good discrimination for extremely different soil texture classes (e.g., sandy loam versus clay), whereas neighboring soil texture classes (e.g., sandy clay loam and clay class) had high uncertainty [75].
The weak relationship between the Landsat data and PSFs was most likely associated with lower spatial resolution of the images (i.e., 90 m) and the time of imagery acquisition [78]. A poor relationship between Landsat data and the sand and silt fractions was also reported for PSFs prediction in large-scale studies [79,80]. However, high correlations between clay contents and Landsat 7 images, particularly with Band 7, have been reported in other studies [81]. In comparison, the higher spectral and spatial resolution (10 m) of the Sentinel-2 data may have been the main reason the covariates derived from this data source had higher RI. Similar to this study, the covariates derived from Sentinel-2 data resulted in satisfactory/good predictors of clay contents for soils in the Czech Republic [82].
Although several variables were determined to not be important, accounting for them in the prediction process still has the potential to enhance the spatial predictions of PSFs [11,83,84]. PSFs have a complex, nonlinear hierarchal relationship between many covariates that were determined to be effective, and machine learning models are designed to discover these relationships. It could be concluded that the coarse-resolution categorical maps could potentially capture the average soil PSFs at large spatial extents and scales other covariates related to local terrain and vegetation would be more effective in capturing soil erosional and depositional processes.

Performance of Hybridized Models and Output Maps
The poorest model performance for the PSF predictions resulted from the BP-ANN (i.e., traditional implementation of ANN) and was substantially improved upon by using bio-inspired optimization algorithms. The MBO algorithm was the most effective approach within the studied bio-inspired algorithms while PSO was the lowest-performing hybridization for clay predictions. The prediction accuracy of clay in this large-scale study for the proposed model was better than that of other studies [20,85,86] ranging 9.2-12.2% for RMSE and 0.33-0.54 for R 2 . By considering the multispectral (VNIR-SWIR) ASTER data as covariates for mapping the surface clay content at large scales, the results yielded the R 2 of 0.61% [87] that is close but still lower than our findings. The prediction accuracy was close to another finding [20] with R 2 of 0.71 and RMSE of 7.3% but much better than many other studies [85,86,88,89]. The slightly lower model performance for predicting silt contents in the MBO-ANN model with lower R 2 and CCC compared with clay and sand contents may be due to the poorer relationship between the auxiliary covariates and silt content that is in agreement with other findings [24,85,86], which declared the low performance was due to high variability.
Regarding to the accuracy and uncertainty analysis, it was concluded that the hybridized ANN using bio-inspired algorithms resulted in a vast improvement in comparison to the standard reference approach using BP-ANN by potentially overcoming challenges such as the local minima and low convergence issues [90,91]. Consistently, the MBO-ANN model outperformed all other hybridizations when mapping PSFs, which was likely attributed to its ability to optimize the initial input parameters (e.g., thresholds and weights) for ANN by the MBO algorithm, thereby resulting in the suitable convergence of the neural network towards a global optimum solution and effectively capturing the nonlinear relationships between variables [92]. Although the MBO-ANN approach was able to handle the large set of input variables in predicting the spatial distribution of PSFs, its processing time was longer than that of BP-ANN; furthermore, the use of the other bio-inspired hybridizations with ANN were also longer, but acceptable. From this study, the increased computational demand and processing time was warranted given the level of improvement when compared to the BP-ANN approach, as evidenced by all accuracy metrics.
Theses soil textural class maps reflected the contribution of environmental covariates that were used to generate the PSFs prediction maps. These maps can also potentially improve land surface modeling for future studies. From a soil management perspective, information on the spatial distribution of clay in the northern region is important for decision makers in order to improve agricultural practices and the management of paddy fields and orchards. In particular, the application of potassium fertilizer is greatly dependent on the clay contents of the study area due to the ability of clay minerals to bind/fix the potassium in interlayers spaces and cause its unavailability for plant uptake.

Conclusions
Provincial-scale information on soil texture is crucial for the management of soil resources and agricultural production planning by decision makers. By using a large set of environmental covariates derived from terrain, remotely sensed, climatic, and categorical data as the main sources of data, series of hybridized ANN models were built for predicting PSFs and tested using a 10-fold cross-validation. The softmax transfer function was used in the output layer of the ANN models to normalize the predicted clay, silt, and sand contents, leading to the summation of PSFs to 100% in unsampled locations. Of the 64 environmental covariates used, categorical data, which consist of physiographic maps, maps of soil types, and geologic maps, were the most effective variables for predicting the spatial distribution of PSFs. The importance of these categorical inputs was subsequently followed by the climatic data, terrain data, and RS-derived covariates, respectively. Where available, existing map products generated using conventional approaches (i.e., existing categorical maps) could have the potential of being refined using DSM approaches for future mapping of PSFs at the provincial scale.
When compared to the reference ANN approach, BP-ANN, the bio-inspired optimization algorithms substantially improved the predictions of PSFs, whereby the results indicate that MBO-ANN resulted in the best performance across all accuracy metrics and PSFs. Furthermore, the MBO-ANN approach also resulted in the lowest uncertainty in comparison to BP-ANN. It could be concluded that the application of bio-inspired computational algorithms is a promising alternative to the conventional BP-ANN-especially given that the computational demands were not substantially greater. The high-resolution PSFs maps generated in this study can be used for integrated management of soils along with sustainable development strategies at the provincial scale. Furthermore. The maps can be used as inputs to support future environmental and hydrological modeling activities.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions. Brightness Index of Sentinel-2 Bright.In.S ((RED)2 + (NIR)2)0.5