Modeling and Predictive Mapping of Soil Organic Carbon Density in a Small-Scale Area Using Geographically Weighted Regression Kriging Approach

Different natural environmental variables affect the spatial distribution of soil organic carbon (SOC), which has strong spatial heterogeneity and non-stationarity. Additionally, the soil organic carbon density (SOCD) has strong spatial varying relationships with the environmental factors, and the residuals should keep independent. This is one hard and challenging study in digital soil mapping. This study was designed to explore the different impacts of natural environmental factors and construct spatial prediction models of SOC in the junction region (with an area of 2130.37 km2) between Enshi City and Yidu City, Hubei Province, China. Multiple spatial interpolation models, such as stepwise linear regression (STR), geographically weighted regression (GWR), regression kriging (RK), and geographically weighted regression kriging (GWRK), were built using different natural environmental variables (e.g., terrain, environmental, and human factors) as auxiliary variables. The goodness of fit (R2), root mean square error, and improving accuracy were used to evaluate the predicted results of the spatial interpolation models. Results from Pearson correlation coefficient analysis and STR showed that SOCD was strongly correlated with elevation, topographic position index (TPI), topographic wetness index (TWI), slope, and normalized difference vegetation index (NDVI). GWRK had the highest simulation accuracy, followed by RK, whereas STR was the weakest. A theoretical scientific basis is, therefore, provided for exploring the relationship between SOCD and the corresponding environmental variables as well as for modeling and estimating the regional soil carbon pool.


Introduction
Soil organic carbon (SOC) plays an important role in soil productivity and the global carbon cycle [1]. SOC stock is not only related to soil, ecosystem, and land-use types, but it also depends on the amount of net biological input, farming methods, organic carbon stability, and many other factors [2][3][4]. The change in SOC directly impacts CO 2 concentration, which then influences solar heat

Study Area
The junction region between Enshi City and Yidu City has an area of 2130.37 km 2 and is located in the middle reaches of the Yangtze River, China ( Figure 1). This region has a northern subtropical climate and belongs to the subtropical monsoon zone, which has high annual rainfall and hills with yellow brown loam, purple soil, lime rock soil, damp soil, and paddy soil. Data from a 1:10,000 scale topographic map and the second soil census data were used, and a representative area with diverse landforms, large topographical changes, and typical soil types was selected. The mesh uniform distribution pattern was adopted with a sampling area of 200 m × 200 m. A differential global positioning system was used for field sample spatial orientation, and 129 of the 329 samples collected were from plain and arable lands, while the rest were from hilly and uneven terrain regions that have a planting farmland surface 20 cm deep and an orchard planting area 0-30 cm deep, respectively.

Soil Organic Carbon Density
The model of SOCD is described as where stands for the SOCD of the soil surface (0-20 cm), i represents the soil horizon (1, 2, 3…n), θi denotes the gravel concentration (>2 mm) in the ith horizon (volume percentage), pi represents the soil bulk density in the ith horizon (g·cm −3 ), is the SOC stock (g·kg −1 ) obtained by multiplying soil organic matter (SOM) content with 0.58 (Bemmelen conversion fraction), and is the soil thickness. Samples in this study were collected from surface soil of 30 cm deep.

Environmental Factors
The terrain attributes were acquired through a digital terrain analysis with the ASTER GDEM v2 with ArcGIS 9.3 (Esri Inc., Redlands, CA, USA, 1999-2006) by 1:50,000 topographic maps and sample information. The obtained terrain indices were the elevation, the slope, the aspect, the surface

Soil Organic Carbon Density
The model of SOCD is described as where ρ SOC stands for the SOCD of the soil surface (0-20 cm), i represents the soil horizon (1, 2, 3, . . . , n), θ i denotes the gravel concentration (>2 mm) in the ith horizon (volume percentage), p i represents the soil bulk density in the ith horizon (g·cm −3 ), C i is the SOC stock (g·kg −1 ) obtained by multiplying soil organic matter (SOM) content with 0.58 (Bemmelen conversion fraction), and T i is the soil thickness. Samples in this study were collected from surface soil of 30 cm deep.

Environmental Factors
The terrain attributes were acquired through a digital terrain analysis with the ASTER GDEM v2 with ArcGIS 9.3 (Esri Inc., Redlands, CA, USA, 1999-2006) by 1:50,000 topographic maps and Sustainability 2020, 12, 9330 4 of 12 sample information. The obtained terrain indices were the elevation, the slope, the aspect, the surface roughness, the texture, the topographic position index (TPI), the stream power index (SPI), and the topographic wetness index (TWI). The vegetation indices were extracted from the Landsat 8 OLI image (LC08_L1TP_125039_20180929_20181009_01_T1) including the normalized difference vegetation index (NDVI), the blue normalized difference vegetation index (BNDVI), the difference vegetation index (DVI), the green normalized difference vegetation index (GNDVI), and the transformed vegetation index (TVI). The detailed information of these vegetation indices can be found from the index database (https://www.indexdatabase.de/).

Methods
First, the Kennard-Stone algorithm was used to divide the 329 soil samples into different datasets: 229 samples for calibration and 100 samples for validation. Then, stepwise linear regression (STR) was used to choose the most suitable environmental factors for constructing the prediction models, and then GWR, RK, and GWRK were used as differential value models to simulate and predict the spatial distribution of small-scale SOCD. Finally, root mean square error (RMSE), goodness of fit (R 2 ), and improving accuracy (PIA) were chosen as evaluation indices to compare the accuracy of the SOCD prediction models.

Geographically Weighted Regression Kriging Model
OK and RK models have already been discussed in previous studies; thus, details regarding the principles and formulae of these approaches are not provided in the current study [10]. GWRK is an extension of the GWR model and explores the effects of topographic factors and environmental variables on the SOC spatial distribution based on the local spatial heterogeneity of independent and dependent variables [14]. GWRK eliminates the residual effects on the accuracy of the model by reviewing the OK interpolation on residual errors. GWR intends to embed spatial locations into regression parameters based on the principle of local smoothing and uses locally weighted least squares method for pointwise parameter estimation. GWR is calculated as follows: whereĈ gwr (s 0 ) is the SOCD at the geographical location of s 0 , X k (s 0 ) is the independent variable of the terrain factor and environmental variable at the geographical location of s 0 , β k (s 0 ) is the coefficient of the different independent variables in this model, and p is the total number of soil samples. Residuals appeared after the simulation of SOCD through GWR, affecting the fitting accuracy of the whole model. OK interpolation was necessary to remove the effects of the residuals from GWR and construct the GWRK model.Ĉ whereĈ gwrk (s 0 ) is the SOCD value of the sampling point at the location of s 0 , which was estimated through GWRK;Ĉ gwr (s 0 ) is the SOCD value of the sampling point at the location of s 0 , which was estimated through GWR; andε ok (s 0 ) is the result of using OK to interpolate the differences between the SOCD values simulated through GWR and the measured SOCD values.

Model Validation and Evaluation
The advantages and disadvantages of the different interpolation methods can be tested on the calibration and validation data. The calibration data can be used to construct a statistical model, and it can reflect the spatial distribution of SOCD, whereas validation data can be used to evaluate the accuracy of predictions. The prediction accuracy was evaluated by comparing the R 2 and RMSE of the Sustainability 2020, 12, 9330 5 of 12 measured and predicted values on checkpoints. The RMSE value reflects the valuation sensitivity and extreme effects using the sample data. The smaller the RMSE, the more accurate the prediction.
where V oi is the measured value of SOCD, V pi is the predicted value of SOCD, and n is the number of samples.
The RMSE was considered the evaluation criterion, and the maximal value of RMSE the cardinal number to assess the percentage of PIA between different models. PIA is calculated as: where PIA is the abbreviation of the improving accuracy percentage, RMSE_maximal is the maximal value of RMSE, and RMSE_evaluated is the RMSE of the evaluated model.

Descriptive Statistical Analysis
The basic statistics of soil organic matter, SOCD, environmental variables, and variation coefficients are shown in Table 1. SOCD was selected to explore the SOC spatial distribution features, whereas terrain factors were used to exhibit the effects of various terrain conditions on SOCD. The vegetation indices of NDVI, BNDVI, DVI, GNDVI, and TVI were used to represent the land surface vegetation status. The results of the experimental zone show remarkable differences in the SOM with mean, minimum, and maximum values of 22.541, 5, and 55.4 g·kg −1 as well as a full-pitch range of 50.4 g·kg −1 . The mean value of SOCD was 5.081 kg·m −2 with a range of 11.152 kg·m −2 , implying that SOCD changed remarkably in the study area. The coefficients of variation showed the degree of dispersion in the soil properties and environmental variables: <10%, weak variability; values between 10% and 100%, medium variability; and >100% shows strong variability. The absolute values of TPI and TWI were considered to determine the variability because the mean values of these two indices were negative. TPI and SPI had strong spatial variabilities, indicating a strong degree of dispersion; whereas the SOM, SOCD, texture, slope, TWI, aspect, BNDVI, DVI, GNDVI, and NDVI have the median spatial variations; and others have the weak variability.

Relationship between SOCD and Environmental Variables
Pearson correlation coefficient (Pearson's r) is frequently used to measure the relationship between data sets, by checking the linear relationship of the interval variables. The closer a correlation coefficient is to 1 or −1, the stronger the relationship of a correlation. By contrast, the closer a correlation coefficient is to 0, the weaker the relationship. The order of the absolute value of Pearson's r for the terrain variables was elevation > texture > slope > TPI > roughness > SPI, which shows that these variables have stronger correlations with SOCD than other variables ( Table 2). In addition, the vegetation indices of DVI, GNDVI, TVI, and NDVI have significant correlations with SOCD. However, among these vegetation indices, strong correlations existed between them, thus, the multilinearly should be removed when construing the prediction models.

The Model Parameters of SOCD
The STR model was used to select the most suitable environmental factors for constructing the prediction models, and the TPI, slope, TWI, elevation, and NDVI were used ( Table 3). The different weighting coefficients indicated the contributions of different factors to SOCD, which showed that the effect of NDVI on SOCD was the most significant, with a weighting coefficient of −2.721.
Negative correlations were observed between TPI, NDVI, and SOCD. The higher the NDVI, the more vegetation was covered, and the lower the SOCD. Positive correlations existed between the elevation, slope, and TWI with SOCD. The variance inflation factor (VIF) values were used to check the multicollinearity between these factors, and all of these values were smaller than 7.0; thus, there was no multicollinearity between them. Additionally, t-statistic test was used to check the significance of the environmental factors to SOCD, and these environmental factors went through the t-statistic test.
GWR was used to construct the SOCD prediction models, and the spatial variability relationships between the environmental factors and SOCD are shown in Table 3. The influence weights of TPI ranged from −0.52 to 0.15, and the mean value was −0.186. In addition, the mean coefficient values of other environmental factors were 0.017, 0.052, 0.001, and −2.334 for slope, TWI, elevation, and NDVI, respectively. The spatial distribution characteristics of these factors are shown in Figure 2. The elevation has negative correlations with SOCD in the northeast part of the region, and has positive correlations in the southwest pattern. The TWI has various spatial weights with SOCD in the study region, and the influence degrees were changed with the geographical locations. Other environmental factors also have different influence degrees in the study region. GWR can reflect the spatial distribution characteristics of the weights between the auxiliary variables and SOCD.
The residuals of SOCD are the differences between the measured and predicted values through the STR and GWR approaches. The traditional geostatistical interpolation model, OK, was used to map the residual spatial distribution of SOCD because the sampling points have spatial discontinuity and unsustainability The circular model was chosen for the semivariogram functions, and the detailed information of the model parameters is shown in Figure 3. The nugget values of RK and GWRK were 1.336 and 1.056, and the sill values were 2.129 and 2.094. The ratios of the nugget and sill were 62.74% and 50.43%, respectively, which showed that most of the spatial characteristics of SOCD residuals predicted by STR and GWR cannot be estimated by the semivariogram functions.    GWR was used to construct the SOCD prediction models, and the spatial variability relationships between the environmental factors and SOCD are shown in Table 3. The influence weights of TPI ranged from −0.52 to 0.15, and the mean value was −0.186. In addition, the mean coefficient values of other environmental factors were 0.017, 0.052, 0.001, and −2.334 for slope, TWI, elevation, and NDVI, respectively. The spatial distribution characteristics of these factors are shown in Figure 2. The elevation has negative correlations with SOCD in the northeast part of the region, and has positive correlations in the southwest pattern. The TWI has various spatial weights with SOCD in the study region, and the influence degrees were changed with the geographical locations. Other environmental factors also have different influence degrees in the study region. GWR can reflect the spatial distribution characteristics of the weights between the auxiliary variables and SOCD. The residuals of SOCD are the differences between the measured and predicted values through the STR and GWR approaches. The traditional geostatistical interpolation model, OK, was used to map the residual spatial distribution of SOCD because the sampling points have spatial discontinuity and unsustainability The circular model was chosen for the semivariogram functions, and the detailed information of the model parameters is shown in Figure 3. The nugget values of RK and GWRK were 1.336 and 1.056, and the sill values were 2.129 and 2.094. The ratios of the nugget and sill were 62.74% and 50.43%, respectively, which showed that most of the spatial characteristics of SOCD residuals predicted by STR and GWR cannot be estimated by the semivariogram functions.

Model Performance
A total of 229 samples were selected as the calibration dataset to build the SOCD models and another 100 samples as the validation data set to test the spatial simulation and prediction abilities of these approaches. R 2 and RMSE were selected as the evaluation indices. Table 4 shows the predictive accuracy of the testing points in four approaches using the two indices. The validated RMSE of STR, GWR, RK, and GWRK models were 1.666, 1.603, 1.658, and 1.552 kg·m −2 , respectively. These results showed that the values of the GWRK model were smaller than the others, indicating that the GWRK model had the best prediction accuracy and results, followed by GWR, RK, and STR. GWR, RK, and GWRK compared with STR increased by 3.76%, 0.51%, and 6.84%, respectively. The values showed that the prediction accuracy of GWRK compared with STR significantly increased by 6.84%, and the predictive accuracy of GWR and RK were better than that of STR, although their accuracy was less than that of GWRK. Compared with the STR model, GWR highly considered local spatial characteristics of SOCD and its impact factors. Thus, the GWR prediction is more accurate than the STR prediction. RK eliminated the STR residuals of SOCD as much as possible. Meanwhile, GWRK not only considered the local spatial characteristics of the independent and dependent variables, but also eliminated the simulating SOCD residuals. The GWRK simulation and prediction precision was preferred, and the results from the spatial distribution of SOCD through GWRK were more consistent with the actual situation.

Model Performance
A total of 229 samples were selected as the calibration dataset to build the SOCD models and another 100 samples as the validation data set to test the spatial simulation and prediction abilities of these approaches. R 2 and RMSE were selected as the evaluation indices. Table 4 shows the predictive accuracy of the testing points in four approaches using the two indices. The validated RMSE of STR, GWR, RK, and GWRK models were 1.666, 1.603, 1.658, and 1.552 kg·m −2 , respectively. These results showed that the values of the GWRK model were smaller than the others, indicating that the GWRK model had the best prediction accuracy and results, followed by GWR, RK, and STR. GWR, RK, and GWRK compared with STR increased by 3.76%, 0.51%, and 6.84%, respectively. The values showed that the prediction accuracy of GWRK compared with STR significantly increased by 6.84%, and the predictive accuracy of GWR and RK were better than that of STR, although their accuracy was less than that of GWRK. Compared with the STR model, GWR highly considered local spatial characteristics of SOCD and its impact factors. Thus, the GWR prediction is more accurate than the STR prediction. RK eliminated the STR residuals of SOCD as much as possible. Meanwhile, GWRK not only considered the local spatial characteristics of the independent and dependent variables, but also eliminated the simulating SOCD residuals. The GWRK simulation and prediction precision was preferred, and the results from the spatial distribution of SOCD through GWRK were more consistent with the actual situation.

SOCD Maps
The spatial distributions of SOCD based on STR, RK, GWR, and GWRK are shown in Figure 4a-d, respectively. The SOCD ranged from 2.3 to 9.3 g/kg, and the lower values were accumulated in the eastern area of the study region, and the higher values gathered at the western area of the study region. The spatial distribution of SOCD predicted by GWR and STR have line striped features, and the SOCD values increased from the east to the west. The SOCD predicted by GWR was smoother than the SOCD predicted by STR, and there were many irregular patches distributed. In Figure 4b,d, the spatial distribution characteristics of SOCD predicted by RK and GWRK were very trivial relative to STR and GWR. However, the global spatial distribution characteristics have similar distribution trends. From east to west, SOCD values increased.

SOCD Maps
The spatial distributions of SOCD based on STR, RK, GWR, and GWRK are shown in Figure 4ad, respectively. The SOCD ranged from 2.3 to 9.3 g/kg, and the lower values were accumulated in the eastern area of the study region, and the higher values gathered at the western area of the study region. The spatial distribution of SOCD predicted by GWR and STR have line striped features, and the SOCD values increased from the east to the west. The SOCD predicted by GWR was smoother than the SOCD predicted by STR, and there were many irregular patches distributed. In Figure 4b,d, the spatial distribution characteristics of SOCD predicted by RK and GWRK were very trivial relative to STR and GWR. However, the global spatial distribution characteristics have similar distribution trends. From east to west, SOCD values increased.

Discussion
The spatial distribution of SOCD had strong spatial heterogeneity and spatial dependence, and the development and evolution of SOCD were influenced by many different environmental factors, so it is important and hard work to accurately map the spatial distribution of SOCD. In this paper, two basal models of stepwise linear regression (STR) and geographically weighted regression (GWR) and two extended models of regression kriging (RK) and geographically weighted regression kriging (GWRK) were used to map the spatial distribution of SOCD. The five environmental factors of elevation, TPI, slope, TWI, and NDVI were used as the auxiliary variables for constructing the

Discussion
The spatial distribution of SOCD had strong spatial heterogeneity and spatial dependence, and the development and evolution of SOCD were influenced by many different environmental factors, so it is important and hard work to accurately map the spatial distribution of SOCD. In this paper, two basal models of stepwise linear regression (STR) and geographically weighted regression (GWR) and two extended models of regression kriging (RK) and geographically weighted regression kriging (GWRK) were used to map the spatial distribution of SOCD. The five environmental factors of elevation, TPI, slope, TWI, and NDVI were used as the auxiliary variables for constructing the prediction models. GWRK is a combination of GWR and OK: GWR is a locally geographical regression model that considers the spatial autocorrelation and dependence between the SOCD and its impact factors and OK is a spatial interpolation model which can be used to reduce the random residuals of regression models. As GWRK has the advantages from GWR and OK, and ignores the deficiencies of them, it had the best evaluation indexes in these four methods: RMSE had the smallest value of 1.552 and the improving accuracy was 6.84% greater than STR. RK is one of the combination methods of STR and OK, although it considers the influences from the environmental factors and reduces the random residuals of regression models. STR is one global regression model and it ignores the spatial characteristics of SOCD and its environmental factors. So, the prediction accuracy of RK was lower than GWRK, but was higher than GWR and STR, and the improving accuracy was 0.51% greater than STR. There were three important points to improve the prediction accuracy of simulation models: the impact factors, the residuals of regression models, and the spatial characteristics of SOCD. It is one important precondition that considers the above three problems to construct the higher prediction accuracy model of SOCD.
SOCD has strong spatial heterogeneity in the study region, and it has spatial varying relationships with the environmental factors. GWR can simulate these relationships and clearly map the spatial distribution of SOCD. The main reason can be due to GWR being a local spatial regression model. The basic assumption of the regression model was that there was no dependency between the environmental factors, and the residuals should be independent in the spatial distribution. Thus, STR can select the suitable environmental factors, and OK can simulate the spatial distribution of the predicted residuals, and these can be removed from the predicted values; then, GWRK and RK can obtain better-predicted values relative to the traditional regression models. Similar research was reported by Zhu and Lin [18], who compared the prediction accuracy between OK and RK to predict soil properties, and the terrain indices and electromagnetic induction surveys were used as the auxiliary variables. The results showed when spatial structure could not be well captured, or when a strong relationship existed between target soil properties and auxiliary variables, RK was more accurate for interpolating soil properties in both landscapes studied, otherwise OK was better. Wang et al. [10] compared GWR and ordinary cokriging (OCK) in predictive mapping of soil total nitrogen (TN) using multiple environmental variables, and the results showed that GWR is a more promising spatial interpolation method compared to OCK in predicting soil TN and potentially other soil properties. Kumar and Lal [19] used GWRK to predict the spatial distribution of SOC based on the other soil properties, and the results showed that the GWRK captures spatial dependent relationships, and addresses spatial non-stationarity issues; hence, this approach improves the estimations of SOC stock. All of these research studies showed that the environmental factors can be used as auxiliary variables, and the spatial geostatistical approaches can be used to predict the soil properties. In these papers, these models (STR, GWR, RK, and GWRK) were used to predict the spatial distribution of SOCD and the differences between these models were compared comprehensively and in detail.
Although GWRK was the best suitable model in estimating the spatial distribution of SOCD in this study area, the variation of environmental factors in other geographical locations may lead to different prediction results and model performance. The limitation and uneven distribution of the soil samples may be also inevitable problems when estimating the spatial distribution of SOCD [20]. Therefore, more studies from different geographical landscapes are strongly encouraged for a better understanding of the spatial relationships between environmental variables and SOCD.

Conclusions
The spatial distribution of soil properties has strong interdependence and heterogeneity at different geographical locations with different magnitudes. Previous studies have focused on building spatial interpolation models of soil attributes using traditional geostatistical models. These models consider the spatial heterogeneity and interdependence of soil properties, but ignore the impact of environmental factors on SOCD. The RK, GWR, and GWRK approaches have been successively employed to predict and simulate soil properties because environmental variables, such as topography, climate, and human factors, affect soil properties. These approaches surpass the traditional statistical models. Although the GWR approach considers the regional features of soil properties and various environmental variables, the combination of this approach with the OK approach can compensate for the defect of larger residuals. The spatial distribution of SOCD was modeled and predicted in the present study using four approaches in the same study area. Two evaluation indices were used to test the simulation accuracies of the spatial interpolation models. The results showed that the RMSE of GWRK was minimal, and the prediction accuracy of GWRK was 6.84% higher than that of STR. Therefore, the GWRK model of SOCD can reflect the regional spatial distribution of SOCD and help study the spatial distribution of SOCD. The GWR model aids in implementing the prediction and simulation of SOCD by selecting different environmental factors as auxiliary variables and studying the influences of these variables on SOCD. Additionally, it also provides a theoretical and scientific basis for estimating SOCD in a large area.