Improving Soil Thickness Estimations Based on Multiple Environmental Variables with Stacking Ensemble Methods

: Spatially continuous soil thickness data at large scales are usually not readily available and are often di ﬃ cult and expensive to acquire. Various machine learning algorithms have become very popular in digital soil mapping to predict and map the spatial distribution of soil properties. Identifying the controlling environmental variables of soil thickness and selecting suitable machine learning algorithms are vitally important in modeling. In this study, 11 quantitative and four qualitative environmental variables were selected to explore the main variables that a ﬀ ect soil thickness. Four commonly used machine learning algorithms (multiple linear regression (MLR), support vector regression (SVR), random forest (RF), and extreme gradient boosting (XGBoost) were evaluated as individual models to separately predict and obtain a soil thickness distribution map in Henan Province, China. In addition, the two stacking ensemble models using least absolute shrinkage and selection operator (LASSO) and generalized boosted regression model (GBM) were tested and applied to build the most reliable and accurate estimation model. The results showed that variable selection was a very important part of soil thickness modeling. Topographic wetness index (TWI), slope, elevation, land use and enhanced vegetation index (EVI) were the most inﬂuential environmental variables in soil thickness modeling. Comparative results showed that the XGBoost model outperformed the MLR, RF and SVR models. Importantly, the two stacking models achieved higher performance than the single model, especially when using GBM. In terms of accuracy, the proposed stacking method explained 64.0% of the variation for soil thickness. The results of our study provide useful alternative approaches for mapping soil thickness, with potential for use with other soil properties.


Introduction
Soil thickness is considered to play an important role in numerous areas, such as soil structure and function [1], vegetation growth [2], land surface energy flux [3], hydrology [4] and ecological land classification [5]. Traditional soil sampling is laborious, time consuming and difficult to carry out at large scales [6]. Currently, most modeling works require continuous and quantitative spatial soil information [7,8].
average annual temperature ranging from 12.7° to 16.2° and an annual precipitation from 478 mm to 1 167 mm. The soil types consist of cinnamon soil, fluro-aquic soil, brown soil and yellow-cinnamon soil [42]. The plain basin and hilly areas account for a total of 55.7% and 44.3% of the region of Henan Province, respectively. Forest and farmland are the dominant land uses in the study area.

Soil Dataset
Soil thickness is defined, in this study, as the distance from surface to bedrock or material that contains >75% volume of >2 mm gravel [43]. Three available soil datasets were extracted from the work of Wei on soils of Henan Province [44,45]. Detailed information on the landscape, soil profile, soil type, land use and parent material about the sampling sites was employed. A total of 246 soil thickness sample points were collected. The soil thickness values ranged from 7 to 250 cm with a mean of 101.5 cm. The standard deviation and coefficient of variation were 47.5 cm and 46.8%, respectively. The results of the Shapiro-Wilk test (P > 0.05) indicated that soil thickness samples followed a normal distribution.

Environmental Factors
All quantitative and qualitative environmental covariates assumed to influence soil formation were selected (Tables 1 and 2), including topography, climate, organisms, parent material and soil properties.

Soil Dataset
Soil thickness is defined, in this study, as the distance from surface to bedrock or material that contains >75% volume of >2 mm gravel [43]. Three available soil datasets were extracted from the work of Wei on soils of Henan Province [44,45]. Detailed information on the landscape, soil profile, soil type, land use and parent material about the sampling sites was employed. A total of 246 soil thickness sample points were collected. The soil thickness values ranged from 7 to 250 cm with a mean of 101.5 cm. The standard deviation and coefficient of variation were 47.5 cm and 46.8%, respectively. The results of the Shapiro-Wilk test (p > 0.05) indicated that soil thickness samples followed a normal distribution.

Environmental Factors
All quantitative and qualitative environmental covariates assumed to influence soil formation were selected (Tables 1 and 2), including topography, climate, organisms, parent material and soil properties. Shuttle Radar Topography Mission (SRTM) digital elevation datasets at 1 arc second (30 m) spatial resolution are widely available from EarthExplorer. Five topographic attributes derived from SRTM DEM were used, including slope, aspect, topographic wetness index (TWI), pan curvature (PLC) and profile curvature (PRC). The System for Automated Geoscientific Analysis (SAGA) was used to derive these topographic attributes [46].

Climate
Both mean annual temperature (MAT) and mean annual precipitation (MAP) for the study area from 1980 to 2010 were obtained from China Meteorological Data Sharing Service System (http://data.cma.cn/). MAT and MAP were interpolated over a 250 m grid for 176 meteorological stations using the inverse distance weighting (IDW) interpolation procedure. Mean annual precipitation (MAP) ranges from 518 to 1378 mm, with the average being 766 mm. Mean annual temperature (MAT) ranges from 11.6 • to 16.1 • mm, with the average being 14.7 • .

Vegetation
The normalized difference vegetation index (NDVI), enhanced vegetation index (EVI) and leaf area index (LAI) were used to represent the vegetation conditions of the soil properties. Both NDVI and EVI data were extracted from the MOD13Q1 product from the Moderate-resolution Imaging Spectroradiometer (MODIS) with 250 m spatial resolution and 16-day composites. LAI data were extracted from MOD15A2 with a spatial resolution of 1000 m and 8-day composite. All data were obtained from the Earth Observing System Data and Information System (EOSDIS) (https: //search.earthdata.nasa.gov/). This study collected three vegetation products in the period from 2010-2015. First, monthly vegetation data were produced using the maximum value composite (MVC) method [47]. Then, the mean annual NDVI, mean annual EVI and mean annual LAI from 2001 to 2015 were calculated based on monthly vegetation data.

Geological and Soil Environment
We obtained the 1:1,500,000 geological resources and soil environment database of Henan Province from National Earth System Science Data Center (http://www.geodata.cn/). The four quantitative environmental variables used in this work included landform, parent material, soil texture, and land use ( Table 2). In order to have sufficient soil thickness samples in each land use category, land use was grouped into forest and farmland.
All the covariates were available as raster files. As a processing step, the covariates were resampled to match a 250 m grid using the nearest neighbor strategy. The spatial distributions of 15 environmental variables in this study area are presented in Figure 2.

Individual Machine Learning Techniques
Numerous machine learning algorithms for digital soil mapping have been provided and discussed [23]. Four commonly used machine learning techniques (MLR, SVR, RF and XGBoost) were compared as individual models in this study.

Multiple Linear Regression
Multiple linear regression (MLR) is one of the simplest machine learning methods to digitally create soil maps [48][49][50], due to its simplicity and efficiency in computation, and easy interpretation. This algorithm finds the best covariates to predict the primary variable (soil thickness) from the explanatory variables (environmental variables) by fitting a linear equation. The MLR was implemented using the lm function in R [51].

Support Vector Regression
Support vector machine (SVM) is a supervised learning method that has recently gained some popularity for predicting soil properties [52]. Support vector regression (SVR) is an extension of SVM, and is used as a regression technique. SVR generates an optimal separating hyperplane to differentiate classes that overlap and are not separable in a linear way. In this case, a large transformed feature space is created to map the data with the help of kernel functions to separate it along a linear boundary. More detailed explanations about SVR can be found in [53]. The commonly used radial basis kernel function was applied in this study. Two parameters of cost and sigma should be defined. After several experiments, the optimal set of cost and sigma is 0.5 and 0.07, respectively. The SVR was implemented using the R package e1071 version 1.7-3 [54].

Random Forest
Random forest (RF) is currently one of the most successful data mining methods in DSM [49]. The principle of RF, randomly selects two-thirds of the training dataset (bootstrapping) to build a decision tree, and the remainder of the training dataset is left out to test the model error using an out-of-bag (OOB) strategy. The RF defines the percent increase in the mean square error (%IncMSE) as the indicator to quantify the importance of the input variables. More detailed explanations about RF can be found in [53]. In RF modeling, the number of trees to grow in the forest (n tree ) and the number of variables used per tree (m try ) should be defined. One third of the total number of predictor variables for m try and n tree = 500 provided stable and visually meaningful results. The RF was implemented using the R package randomForest version 4.6-14 [55].
We obtained the 1:1,500,000 geological resources and soil environment database of Henan Province from National Earth System Science Data Center (http://www.geodata.cn/). The four quantitative environmental variables used in this work included landform, parent material, soil texture, and land use ( Table 2). In order to have sufficient soil thickness samples in each land use category, land use was grouped into forest and farmland.
All the covariates were available as raster files. As a processing step, the covariates were resampled to match a 250 m grid using the nearest neighbor strategy. The spatial distributions of 15 environmental variables in this study area are presented in Figure 2.

Individual Machine Learning Techniques
Numerous machine learning algorithms for digital soil mapping have been provided and discussed [23]. Four commonly used machine learning techniques (MLR, SVR, RF and XGBoost) were compared as individual models in this study.

Multiple Linear Regression
Multiple linear regression (MLR) is one of the simplest machine learning methods to digitally create soil maps [48][49][50], due to its simplicity and efficiency in computation, and easy interpretation.

Extreme Gradient Boosting
Extreme gradient boosting (XGBoost) is a novel tree-based ensemble model proposed by Chen and Guestrin [56]. Based on the boosting strategy, XGBoost can obtain a strong learner from weak learners. The XGBoost algorithm can improve computing speed by parallel learning, prevent over-fitting and improve performance. XGBoost is also able to measure the importance of input variables using the weight. More detailed information about XGBoost can be found in Chen and Guestrin [56]. The XGBoost was implemented using the R package xgboost version 1.1.1.1 [57]. After several experiments, the number of trees (n_estimators) is set as 180, the learning rate is 0.02, the maximum tree depth (max_depth) is 2 and for remaining variables, default parameters were used.

Stacking Ensemble Learning Models
Four outcomes of MLR, SVR, RF and XGBoost served as an input dataset to the stacking technique that generated the final output. Two stacking ensemble learning models were applied. The first was the least absolute shrinkage and selection operator (LASSO) model (Stacking1), which is a linear model with regularization and avoids over-fitting in the prediction model [58]. The second is the Generalized Boosted Regression Models (GBM) model (Stacking2), which deals with non-linear systems and provides great predictive performance [59]. The glmnet [60] and the gbm [61] packages in R were used to implement the stacking ensemble learning models.

Model Evaluation
To achieve stable model results, a ten-fold cross validation with 100 repetitions was used to evaluate the prediction performance of different models. Four validation criteria were used to evaluate the prediction accuracy of models: (1) coefficient of determination (R 2 ), (2) root mean square error (RMSE), (3) the mean absolute error (MAE) and (4) Lin's concordance correlation coefficient (LCCC). These indices were calculated as follows: where P i and O i are the predicted and observed soil thickness for the i-th observation; n is the number of sample points; P and O are the means of the predicted and observed soil thickness values; σ 2 o and σ 2 p are the variances of the predicted and observed values; and r is the Pearson correlation coefficient between the predicted and observed values.
The workflow for digital mapping of soil thickness in this study was shown in Figure 3. Remote Sens. 2020, 12, x FOR PEER REVIEW 8 of 22 The workflow for digital mapping of soil thickness in this study was shown in Figure 3.

Performances of Four Individual Machine Learning (ML) Models
Eleven quantitative environmental variables and four qualitative variables (LF, LU, PM and ST) were used to train four individual ML models: MLR, SVR, RF and XGBoost. To further illustrate the stable performances of these four models, we used boxplots of the R 2 , LCCC, RMSE and MAE distributions based on 100 iterations ( Figure 6). The results showed that the average R 2 values of four models using eleven quantitative variables ranged from 0.43 to 0.53; LCCC ranged from 0.63 to 0.68; RMSE was between 32.7 and 35.8 cm; MAE ranged from 24.2 to 27.6 cm. The XGBoost model obtained the best performance to predict soil thickness (R 2 = 0.53, LCCC = 0.68, RMSE = 32.7 cm and MAE = 24.2 cm), followed by RF, SVR and MLR.

Performances of Four Individual Machine Learning (ML) Models
Eleven quantitative environmental variables and four qualitative variables (LF, LU, PM and ST) were used to train four individual ML models: MLR, SVR, RF and XGBoost. To further illustrate the stable performances of these four models, we used boxplots of the R 2 , LCCC, RMSE and MAE distributions based on 100 iterations ( Figure 6). The results showed that the average R 2 values of four models using eleven quantitative variables ranged from 0.43 to 0.53; LCCC ranged from 0.63 to 0.68; RMSE was between 32.7 and 35.8 cm; MAE ranged from 24.2 to 27.6 cm. The XGBoost model obtained the best performance to predict soil thickness (R 2 =0.53, LCCC=0.68, RMSE = 32.7 cm and MAE = 24.2 cm), followed by RF, SVR and MLR.
Incorporation of four qualitative variables improved soil thickness predictions in these four individual ML models. For example, the XGBoost model improved the predictive performance by increasing the R 2 and LCCC by 9.15% and 7.57%, respectively, while also reducing the RMSE and MAE by 5.72% and 3.9%, respectively.

Best Environmental Variables
The relative importance of 15 environmental variables in the XGBoost, RF and SVR models were assessed by normalizing the environmental variables of each model to 100% (Figure 7). Three models showed similar environmental variable importance. The five most important predictor variables in the XGBoost model were slope, TWI, LU, LF and elevation. According to the RF model, they were Incorporation of four qualitative variables improved soil thickness predictions in these four individual ML models. For example, the XGBoost model improved the predictive performance by increasing the R 2 and LCCC by 9.15% and 7.57%, respectively, while also reducing the RMSE and MAE by 5.72% and 3.9%, respectively.

Best Environmental Variables
The relative importance of 15 environmental variables in the XGBoost, RF and SVR models were assessed by normalizing the environmental variables of each model to 100% (Figure 7). Three models showed similar environmental variable importance. The five most important predictor variables in the XGBoost model were slope, TWI, LU, LF and elevation. According to the RF model, they were TWI, slope, elevation, LU, and EVI. In the SVR model, they were TWI, lope, elevation, EVI and LU. Generally, TWI and slope were the most critical variables for determining soil thickness. The following important predictor variables varied, but their difference was insignificant. Based on the average rankings of three methods, the variables were ranked in the order of TWI > Slope > Elevation In total, the topography exhibited the highest explanation (57.5%, with seven variables) in the soil thickness model, followed by organisms (27.0%, with four variates), climate (8.8%, with two variables), parent material (5.0%, with one variable); and soil (1.7%, with one variable).  To identify the variations in the predictive accuracies with different numbers of environmental variables using MLR, SVR, RF and XGBoost, we used a forward variable selection approach based on the ranked variables above. Figure 8 shows how R 2 values vary with the number of input environmental variables for the MLR, SVR, RF and XGBoost models. The four models had similar variations in that their R 2 values increased progressively for each additional input variable. The models comprising the first five variables (TWI, slope, LU, EVI and elevation) were sufficient to reach the highest R 2 values of the graph, and the subsequent additional input variables resulted in only very small increases in R 2 . The inclusion of some redundant variables (such as LF) may cause noise that reduced the robustness of the predicted models. This result indicated that the use of only a few important variables was sufficient to achieve a high level of estimation accuracy. To identify the variations in the predictive accuracies with different numbers of environmental variables using MLR, SVR, RF and XGBoost, we used a forward variable selection approach based on the ranked variables above. Figure 8 shows how R 2 values vary with the number of input environmental variables for the MLR, SVR, RF and XGBoost models. The four models had similar variations in that their R 2 values increased progressively for each additional input variable. The models comprising the first five variables (TWI, slope, LU, EVI and elevation) were sufficient to reach the highest R 2 values of the graph, and the subsequent additional input variables resulted in only very small increases in R 2 . The inclusion of some redundant variables (such as LF) may cause noise that reduced the robustness of the predicted models. This result indicated that the use of only a few important variables was sufficient to achieve a high level of estimation accuracy. Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 22

Performances of Stacking Ensemble Models
On the basis of the selection of the five best variables (TWI, slope, LU, EVI and elevation), the performances of four individual ML models (MLR, SVR, RF and XGBoost) and two stacking models (Stacking1 and Stacking2) are presented in Table 3. The model accuracy was as follows: Stacking2 > Stacking1 > XGBoost > RF > SVR > MLR. Two stacking approaches had higher accuracy than individual ML models. For instance, the R 2 values of Stacking2 and Stacking1 were 7.4% and 3.3% higher than XGBoost, respectively. The Stacking2 model exhibited the greatest performance and could explain 58.7% of the soil thickness variation, with the highest R 2 (0.640) and LCCC (0.763) values and the lowest RMSE (29.2 cm) and MAE (22.2 cm) values.

Spatial Distribution of Soil Thickness
The spatial distribution maps and corresponding descriptive statistics of soil thickness predicted by different models (MLR, SVR, RF, XGBoost, Stacking1 and Satcking2) are shown in Figure 9 and Table 4, respectively. These predicted soil thickness maps had similar spatial distributional trends of soil thickness (Figure 9). Thin soil thickness values were mainly distributed in the western and southern parts of the research area where mountains dominated the landscape. Thick soil thickness values were mainly distributed in the wide plain in the central-eastern and southwest regions. These results showed that the spatial distributions of soil thickness varied significantly with topographic position. The mean values of soil thickness predicted ranged from 99.6 to 105.5 cm, respectively, which were close to the measured values (Table 4). Figure 10 and Table 5

Performances of Stacking Ensemble Models
On the basis of the selection of the five best variables (TWI, slope, LU, EVI and elevation), the performances of four individual ML models (MLR, SVR, RF and XGBoost) and two stacking models (Stacking1 and Stacking2) are presented in Table 3. The model accuracy was as follows: Stacking2 > Stacking1 > XGBoost > RF > SVR > MLR. Two stacking approaches had higher accuracy than individual ML models. For instance, the R 2 values of Stacking2 and Stacking1 were 7.4% and 3.3% higher than XGBoost, respectively. The Stacking2 model exhibited the greatest performance and could explain 58.7% of the soil thickness variation, with the highest R 2 (0.640) and LCCC (0.763) values and the lowest RMSE (29.2 cm) and MAE (22.2 cm) values.

Spatial Distribution of Soil Thickness
The spatial distribution maps and corresponding descriptive statistics of soil thickness predicted by different models (MLR, SVR, RF, XGBoost, Stacking1 and Stacking2) are shown in Figure 9 and Table 4, respectively. These predicted soil thickness maps had similar spatial distributional trends of soil thickness (Figure 9). Thin soil thickness values were mainly distributed in the western and southern parts of the research area where mountains dominated the landscape. Thick soil thickness values were mainly distributed in the wide plain in the central-eastern and southwest regions. These results showed that the spatial distributions of soil thickness varied significantly with topographic position. The mean values of soil thickness predicted ranged from 99.6 to 105.5 cm, respectively, which were close to the measured values (Table 4).
were less than 50 %. Most of the differences between the Stacking2 model and the RF and the XGBoost models were less than 10 cm. The Stacking2 model and the Stacking1 model had the smallest spatial distribution differences, of these 91.3% were less than 10 cm. In general, the larger differences of predicted soil thickness maps were mainly located in relatively large topographic variation areas such as steep hillsides, mountain ridges and river valleys, whereas the regions with gentle slopes or low elevations had smaller differences.     Figure 10 and Table 5 display the differences in the predicted soil thickness maps between the Stacking2 model and the other five models (MLR, SVR, RF, XGBoost, Stacking1). The MLR model and the SVR model had the most significant differences with the Stacking2 model in the predictions results, where their standard deviations were more than 19.1 cm and their differences less than 10 cm were less than 50 %. Most of the differences between the Stacking2 model and the RF and the XGBoost models were less than 10 cm. The Stacking2 model and the Stacking1 model had the smallest spatial distribution differences, of these 91.3% were less than 10 cm. In general, the larger differences of predicted soil thickness maps were mainly located in relatively large topographic variation areas such as steep hillsides, mountain ridges and river valleys, whereas the regions with gentle slopes or low elevations had smaller differences.

The Performance of Environmental Variables
According to soil formative factors of the SCORPAN model [12], a total of fifteen relative environmental variables from five factors (soil, climate, organism, topography and parent material) were assembled to explain spatial variability of soil thickness in Henan Province, China.
Findings in previous works coincide with our results in the sense that topography was the main predictor in soil thickness modeling [21,24,62]. Of the topographical variables used in the study, TWI, elevation and slope had highly significant correlations with soil thickness and were the most influential variables in the soil thickness models. We also found that LF was a combination of topography and could reflect the trend of soil thickness variation, which showed significant correlations with soil thickness. The spatial distribution of thick soil thickness was mainly in the floodplains or the bottom of the valleys with low elevations and gentle slopes in this study area. Shallow soil was found in areas of high elevation, steep slopes or rugged relief, where the rate of soil

The Performance of Environmental Variables
According to soil formative factors of the SCORPAN model [12], a total of fifteen relative environmental variables from five factors (soil, climate, organism, topography and parent material) were assembled to explain spatial variability of soil thickness in Henan Province, China.
Findings in previous works coincide with our results in the sense that topography was the main predictor in soil thickness modeling [21,24,62]. Of the topographical variables used in the study, TWI, elevation and slope had highly significant correlations with soil thickness and were the most influential variables in the soil thickness models. We also found that LF was a combination of topography and could reflect the trend of soil thickness variation, which showed significant correlations with soil thickness. The spatial distribution of thick soil thickness was mainly in the floodplains or the bottom of the valleys with low elevations and gentle slopes in this study area. Shallow soil was found in areas of high elevation, steep slopes or rugged relief, where the rate of soil erosion was high. The spatial pattern of soil thickness is a result of the interaction of soil deposition, soil erosion and surface runoff [49]. Soil deposition can be intensified in low plains and valley bottoms and soil erosion is accelerated under increasing elevation in steep areas [29]. TWI reflects the spatial distribution of water flow and thus the accumulation processes in a closed catchment. Yang et al. [50] reported that TWI, slope and elevation explained approximately 61.4% of the variation on soil thickness in karst regions in Southwest China. Ho et al. [63] also showed that TWI was an important variable reflecting the spatial distribution of soil thickness.
The main soil-forming or altering organisms are natural vegetation or humans (e.g., land use) [12]. The soil-vegetation feedback system indicates that the distribution of vegetation is influenced by soil properties [64]. We also demonstrated that vegetation was the main factor affecting soil thickness. EVI, NDVI and LAI have positive correlations with soil thickness. The high vegetation fractional coverage shows that soil can be well protected against soil erosion, resulting in a large soil thickness. EVI provides improved vegetation monitoring through increased sensitivity in areas with dense biomass and reduced atmospheric influences [65]. EVI captured vegetation variations more sensitively compared to NDVI and LAI. Land use was classified as forest and farmland in this study, and it was another indicator of vegetation. The soil thickness values of forest were significantly lower than those of farmland, where the high fractional coverage of vegetation mostly occurred in farmland. The models that used LU made a large contribution to soil thickness modeling. Kuriakose et al. [11] indicated that land use had a strong relationship with soil thickness in the Aruvikkal catchment. Sarkar et al. (2013) stated that land use was identified as the most dominant predictor of soil thickness in the regression analysis.
The nature of the parent material can be a significant influence on soil thickness. We also found that soil thickness values varied with different parent material categories. Additionally, the spatial distributions of topography and parent material were highly consistent in this study area. The soil thickness values increased with an increase from residual and slope deposits to loess and laterite. Chen et al. [66] also indicated that parent material was the most important variable in the soil thickness modeling of mainland France.
Climate may control soil distribution over a continental scale [8]. Malone and Searle [67] found that climatic variables were consistently used and important in mapping soil thickness across Australia. In this study, MAP and MAT had no correlation with soil thickness (p > 0.05) and were not selected as predictors in the modeling. This could be due to the distributions of MAP and MAT that decreased from south to north, which reflects the characteristics of latitude, and these variables have relatively small impacts on the weathering of parent rocks and soil formation in this area [21]. In this study, soil texture showed only weak predictive power in the models. Soil texture was slightly relevant to infer soil thickness variations.

The Importance of Variable Selection
Variable selection is one of the most important processes in modeling, which can reduce the number of predictor variables to several important variables and make it easier to interpret the model [18]. The inclusion of additional environmental variables potentially adds additional information and is usually expected to improve predictions in DSM. However, different variables showed different predictive capabilities in the soil thickness modeling. This increased dimensionality and complexity may actually result in a decrease in accuracy. We generally observed improvements when using more variables, but the improvements were not large and may be outweighed by the costs. This is because the number of training data may be insufficient to characterize the increased complexity associated with the larger dimensionality of the feature space. The contributions of the other variables were less important and could reduce the model's performance and capability [68]. RF, SVR and XGBoost models have their own procedures for calculating the importance of each variable. Their importance scores may be different, but most optimal variables selected were similar. This result is in line with other studies [69]. In this study, TWI, slope, LU, EVI and elevation played more important roles than the other variables and achieved the best accuracy, which is consistent with the result of Lu et al. [21] who reported that the performance of four environmental covariates selected by the proposed method was better than that of all 17 variables.

The Performance of Ensemble Methods
RF and XGBoost are the most popular ensemble methods for bagging and boosting, respectively. They are both based on decision trees and are different with respect to the tree ensemble methods. In this study, the XGBoost model performed slightly better than the RF model based on the comparison of four statistical indicators (R 2 , LCCC, RMSE and MAE). This is likely due to the differences in the tree ensemble methods. RF uses the bagging (bootstrap resampling) method to build different training sets, and average the forecast results of decisions trees. XGBoost uses previous ideas from gradient boosting, creating tree models based on residuals from previously created trees. Dietterich [70] compared bagging and boosting ensembles and found that boosting outperformed bagging in the dataset with little noise.
These two tree-based models performed much better than SVR and MLR, as they are more capable of dealing with the non-linear problem, resistance to overfitting, and capturing complex interactions in the variables. MLR exhibited poorly performance in this study area because it cannot solve the non-linear relationships between soil attributes and environmental variables. SVR is able to model highly non-linear dimensional relationships, but its performance is still susceptible to overfitting and finding optimal parameters. Scarpone et al. [29] used different models to map the soil thickness in the Tulameen study area in the southern interior of British Columbia and found that the predictive performance of the RF model was significantly better than the generalized linear model. Li et al. [49] compared the accuracy of MLR, geographically weighted regression (GWR), SVR, and RF models to predict active-layer soil thickness, and found that RF performed best and MLR performed worst.
All available information is not always efficiently used by a single accurate model. Each algorithm has specific advantages and disadvantages. The errors of the single model have different uncertainties. The application of the stacking ensemble model used multiple learning models' strengths to achieve more robust performance, reduce estimation uncertainties and improve prediction accuracy. The stacking strategy was more accurate than individual models in the current research. The predictions from four individual models with different principles were combined using two stacking approaches: LASSO and GBM. The GBM stacking model (Stacking2) achieves better predictive performance than the LASSO stacking model (Stacking1). GBM is more capable of dealing with the non-linear problem, resistance to overfitting, and capturing complex interactions in the variables [71,72].
Several studies have found this advantage for predicting soil properties. For instance, Taghizadeh-Mehrjardi et al. [36] proposed the stacking approach to have the best performances in soil organic carbon (SOC) content predictions in comparison to six ML models. They also found that the RMSE values of stacking models using SVR for SOC content were lower those of stacking models using LASSO. Chen et al. [40] couplied RF and XGBoost models to predict soil pH at the national scale. Román Dobarco et al. [41], however, found that the ensemble predictions did not improve for silt and sand, and improved only for clay. The performance of the stacking models may depend on the quality of input datasets or primary maps and the diversification of the input single models [73,74].

The Performance of Predictions
The proposed model (Stacking2) found in this study used the five best variables and the stacking ensemble method to explain 64.0% of soil thickness variation. Tesfa et al. [24] used the developed random forest model with 11 input variables to explain approximately 50% of the variation of soil thickness in a semiarid mountainous watershed. Sarkar et al. [23] showed that the regression kriging model with seven landscape variables could explain 67% of the variability of soil thickness in a Himalayan terrain. Lacoste et al. [8] proposed three methods to explain 50-58% of the soil thickness variation in France. Lu et al. [21] used five environmental covariates and the RF method to explain 76% of the variation in soil thickness. Malone and Searle [67] developed an integrated data mining approach to create soil thickness maps across Australia with a concordance coefficient of 0.77. The results described in our research were within the range of previous studies.
The soil thickness maps predicted by different models exhibited similar distribution patterns, and topography was the most notable variable. Thick soil was mostly concentrated in valleys, wide plains and areas with gentle slopes, while thin soil was mainly found in steep slopes and areas of high elevation. Topography was shown to be the dominant factor of soil thickness in these landscapes. This result was also consistent with other studies [8,11,49]. Our results indicated that the predicted soil thickness was less variable than the measured soil thickness. Similar findings were also reported in soil thickness mapping studies by Yang et al. [50]

Limitations and Future Research
First, it was difficult to assess the uncertainty of the quantitative environmental variables with coarse resolutions, which were surveyed during the Second National Soil Survey of China [75]. Therefore, it is necessary to explore the influence of more detailed environmental variables in soil thickness mapping [76]. Other environmental variables such as vegetation types, bedrock geology and soil type should be explored [77,78]. Second, machine learning approaches discover and quantify only the relationships between soil thickness and environmental variables and ignore the neighbouring spatial information of soil sample sites for DSM. The combination of regression models and geostatistical methods such as kriging could account for more of the variability in the landscape, and thus has led to improvements in DSM [29]. Different validation strategies (such as block cross validation) should be applied to test model performance because of the over-fitting in machine learning models [79][80][81]. Third, the main factors controlling soil thickness may not be the same at different spatial and temporal scales [75] and the optimal spatial resolution for soil thickness prediction should be discussed [25].

Conclusions
In this study, exhaustive covariates and machine learning methods were applied to build the most reliable and accurate estimation model to provide the spatial distribution of soil thickness for Henan Province in China. The results suggested that using qualitative environmental variables could improve the accuracy of soil thickness estimations; in particular, each qualitative variable category showed significant differences with soil thickness values. The results also demonstrated the importance of variable selection in soil thickness modeling. Topography and organisms were the most important environmental soil-forming factors in predicting soil thickness. After variable selection, TWI, slope, elevation, land use and EVI were the most important environmental variables explaining the observed variability in soil thickness. Out of four individual machine learning methods, XGBoost achieved the best performance, followed by RF, SVR, and MLR. However, compared with these individual methods, two stacking models showed better performance. The best model (Stacking2) exhibited the greatest performance with the highest R 2 (0.640) and LCCC (0.763) values and the lowest RMSE (29.2 cm) and MAE (22.2 cm) values. From the soil thickness spatial distribution maps, thick soils were mainly concentrated in valleys and low elevations and shallow soils mostly occurred on steep slopes and high elevations. This study provided an example of producing regional soil thickness maps. It is recommended to select appropriate variables and stacking models when using machine learning algorithms for digital soil mapping.