Predicting Soil Organic Carbon and Soil Nitrogen Stocks in Topsoil of Forest Ecosystems in Northeastern China Using Remote Sensing Data

: Forest ecosystems play an important role in regional carbon and nitrogen cycling. Accurate and e ﬀ ective monitoring of their soil organic carbon (SOC) and soil total nitrogen (STN) stocks provides important information for soil quality assessment, sustainable forestry management and climate change policy making. In this study, a geographical weighted regression (GWR) model, a multiple stepwise regression (MLSR) model


Introduction
Carbon and nitrogen are two important chemical elements to maintain the structure and functioning of forest ecosystems [1].Their cycling processes and interactions play a key role in regulating plant productivity, carbon sequestration potential, and stability of ecosystems [2].Forests are a main component of the terrestrial biosphere, and store large amounts of carbon and nitrogen in soils [3].Forest soil carbon stocks account for about 73% of global soil carbon, and have 3.5 × 10 5 -5.5 × 10 5 Tg of nitrogen [1].Their storage and dynamical changes are of a great significance to forest productivity, global carbon and nitrogen balance and global climate change [4].Climate can affect carbon and nitrogen changes in forest soils [5], plant distribution and productivity [6], and the change in soil organic carbon (SOC) and soil total nitrogen (STN) by changing the input of aboveground and underground litter [7].Furthermore, climate can affect the decomposition and transformation of organic carbon by changing soil temperature and water status, affecting greenhouse gas emissions [8], in turn impacting our climate.Mapping soil carbon and nitrogen pools have become one of the core research topics in soil science, ecology and global climate change.
Natural ecological processes control the spatial heterogeneity of SOC and STN stocks in forest ecosystems [9].To date, it is still a challenge to accurately predict the spatial distribution of SOC and STN stocks at regional scales.It is not realistic to use intensive sampling methods to estimate SOC and STN stocks for a large region.A simple and low-cost digital soil mapping (DSM) technology, which can estimate the distribution of regional SOC and STN stocks based on a small amount of sampling data and environmental variables is desirable [10].
DSM is based on the principle that natural soil forming factors control soil formation, which include climate, biota, time, topography and parent material.Jenny [11] further elaborated this principle and laid a foundation for the development of DSM.In forest ecosystems, all SOC comes from plants [12] and can be classified into two main sources [13].First, soil organic matter is formed by humification of the dead remains of roots or branches.Second, root exudates or separations released to rhizosphere during plant growth, such as root hairs and metabolized fine roots.Soil nitrogen mainly comes from litters and changes through nitrification and other processes.Therefore, plants in forest ecosystems are closely linked to SOC and STN stocks and their changes.Vegetation has been identified as the most important environmental variable that affects the spatial distribution of SOC and STN in DSM research, especially in the areas with good and dense vegetation coverage [14].Based on remote sensing data, vegetation type map, vegetation cover index, biomass map, SOC and STN stocks can be mapped using various DSM methods [15].For instance, Landsat, WorldView, SPOT, KOMPSAT, and IKONOS images have been effectively used for this purpose, for different soil layers [16].In homogeneous or bare soils, simple linear regression models and remote sensing band values are usually used to estimate SOC and STN stocks across space [12,17].To date, various DSM techniques have been used for the mapping, such as multiple stepwise regression (MLSR) model [13], regression Kriging [18], ordinary Kriging [19], random forest model [20], boosted regression trees (BRT) [21], geographic weighted regression (GWR) [22], and principal component regression [23].However, depending on different study regions, sampling schemes, and experimental methods in collecting data, the specific method used to map SOC and STN in surface forest soils could vary.
Remote sensing data have been used to predict SOC and STN stocks with DSM [12].Furthermore, the topsoil SOC and STN stocks in the natural environment proved to have a good correlation with the topsoil biomass [9,24].In this regard, Landsat 8 [25], which is the most important free access spatial data source, provided various vegetation indices to make our study possible.
Combining environmental variables, in particular, satellite vegetation data, and DSM technology can be powerful.In this study, SOC and STN stocks of topsoil of forests in northeastern China were mapped by combining DSM and remote sensing technology.The objectives of this research were to: 1) Compare the performance of GWR, MLSR and BRT models in mapping topsoil (0-30 cm) SOC and STN stocks by using 513 soil samples and 9 satellite-based variables (Landsat TM green band (B GREEN ), Landsat TM red band (B RED ), Landsat TM near-infrared band (B NIR ), difference vegetation index (DVI), enhanced vegetation index (EVI), ratio vegetation index (RVI), normalized difference vegetation index (NDVI), renormalization difference vegetation index (RDVI), and soil adjusted vegetation index (SAVI)); 2) identify the key remote sensing variables in predicting these stocks; and 3) analyze the mapping uncertainties.

Study Area
Liaoning Province (38  1), with a total area of 148,000 km 2 [9].The main land use types are cultivated land, forest land, grassland, construction land, water conservancy facility land, and unused land, accounting for 43.5%, 42.4%, 2.2%, 0.6%, 1%, and 9.3% of the total area, respectively (Figure 2a).The altitude ranges from 1 m in the southwest of the coastal area to 1288 m in the northeastern mountainous area.The terrain is generally reduced from north to south and inclines from east to west to the middle.The study area had a continental monsoon climate in the North Temperate Zone with four distinct seasons.The annual average precipitation is 400-1100 mm, and 65-75% is concentrated from June to August, accompanied by heavy rain.The annual average temperature ranges from 7 °C to 11 °C, with the highest temperature of 30 °C in summer and the lowest temperature of −40 °C in winter.According to the soil system classification of China, the dominant soil types covering the whole study area are Cambosols and Argosols [9], followed by Primosols, Anthrosols, Halosols, Gleyosols, Isohumosols, and Histosols, accounting for 54.44%, 27.72%, 10.34%, 4.07%, 2.53%, 0.79%, 0.10%, and 0.01%, respectively (Figure 2b).Cambosols is a soil with cambic horizon.The main process of its formation is the low degree of mineralization and leaching of base ions.This type of soil is widely distributed in the whole region (Figure 2b).The variation of SOC content in Cambosols is large, and affected by soil parent material.Argosols refers to the soil with obvious clay particles and fully leached lime under the condition of moist soil moisture.Its clay content is high, and it is composed of silicate clay mineral which is not completely weathered.Its texture is relatively sticky, mostly in the form of prismatic block structure, with brown glue film.The SOC content in Argosols is relatively high (20-40 g kg −1 in the topsoil layer) [9].This type of soil is mainly distributed in the middle Liaohe alluvial plain area in our study region.Primosols is a kind of young soil and its soil properties basically keep the characteristics of soil parent material, with only an ochric epipedon, which is widely distributed in the river banks, estuarine deltas, alluvial plains and sand accumulation areas.Its existence is closely related to the short soil forming time, extreme drought, cold climate, high quartz content with strong weathering resistance, continuous soil erosion and accumulation, and artificial disturbance, so its SOC content is relatively low.Anthrosols is a new kind of soil type.Its original soils have changed through cultivation, fertilization, irrigation and drainage due to human activities.This type of soil is mainly distributed in the middle of Liaoning Province.This area is the main farming area of Liaoning Province, with frequent and intense human activities.Gleyosols, Isohumosols, and Histosols are relatively less distributed in Liaoning Province, but they have the highest SOC content.

Soil Sampling Collection and Measurement
Field intensive sampling is a time and money consuming activity, especially for a large forest area.In order to accurately reflect the spatial heterogeneity of SOC and STN stocks in those regions, a purposeful sampling method [26] was used.First, we selected main environmental factors that affect the spatial variability of topsoil SOC and STN, including mean annual precipitation, mean annual temperature, elevation, slope gradient, and slope aspect.In addition, we also recognized that parent material, groundwater and vegetation are important environmental factors that determine the spatial variation of SOC and STN [10,15,21,26].However, it was difficult to obtain them in this study.Consequently, we only used soil type map, topographic wetness index and normalized difference vegetation index (NDVI) to approximate their characterization at sampling sites.Second, a fuzzy c-means clustering method was used to cluster the whole research area and generated 64 clustering units.Finally, combining with road network information, 8-10 soil samples were collected in each cluster unit and a total of 513 topsoil (0-30 cm) samples were collected in 2015.Meanwhile, we also recorded gravel content greater than 2 mm for later calculation of SOC and STN density.Among all these samples, 80% of the samples were randomly selected as a training dataset, and the remaining 20% of the samples were used as an independent dataset to test the prediction performance of the three models.Limited by funds and personnel, it was very difficult to sample deeper, in such a large region.In our next research, we plan to select a relatively small forest area to carry out deeper sampling so as to obtain more accurate SOC and STN stocks of forest soils.In addition, 100 cm 3 of undisturbed soil cores were collected from the topsoil layer for subsequent laboratory determination of soil bulk density.The longitude and latitude information of each sample site was recorded by a hand-held GPS, and the positioning accuracy was 5 m.
SOC and STN contents were measured using a dry combustion method and a Vario EL III elemental analyzer in the Central Laboratory at Shenyang Agricultural University.The 100 cm 3 core samples were dried in an oven at 105 °C for 48 hours to determine the soil bulk density.SOC and STN stocks were calculated using the following formulas: where SOC density and STN density are the SOC and STN density at i soil layer (kg m −2 ), respectively; SOC content and STN content are the SOC and STN contents (g kg −1 ), respectively; BD i is the soil bulk density (g cm −3 ); D i and S i are the soil layer thickness (m) and the gravel content greater than 2 mm (%), respectively; i is specific soil layer, in this study, it represents topsoil 30 cm; k represents the number of sampling sites.

Remote Sensing Related Data
In this study, the remote sensing data were obtained from Landsat 8 satellite, downloaded from the United States Geological Survey (USGS).We obtained 11 satellite images from July to September of 2015, with cloud cover less than 10%.In MATLAB software, we used a homomorphic filtering method [27] to remove the cloud from the remote sensing images.The spatial resolution of the remote sensing data was 30 meters, and the data level was L1T, which went through geometric precision correction.Therefore, it was not necessary to use the ground control points or digital elevation model (DEM) data to do geometric precision correction again.In addition, many previous studies have shown that the Bottom-Of-Atmosphere (BOA) reflectance is the most appropriate remote sensing variable to predict the spatial distribution of SOC and STN [28].However, to obtain the BOA reflectance, atmospheric correction shall be performed for the remote sensing data.In this study, we used the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) atmospheric correction method [29] to calibrate the atmosphere in Environment for Visualizing Images (ENVI) 5.1 software.Due to the sun's altitude angle, some remote sensing images appeared to have mountain shadow, so we used a ratio method [12] to eliminate it in ENVI 5.1 software.Furthermore, we also carried out topographic correction because the study region is a mountainous area.We used a cosine correction method for the topographic correction of remote sensing images [30].This method is based on the modeling of illumination (IL) condition, so a DEM with the same spatial resolution as Landsat 8 images was needed.Subsequently, the IL conditions were modeled using the ground slope and aspect, as well as solar and satellite parameters.DEM is required to calculate the incident angle (λ i ), defined as the angle perpendicular to the ground and the sun light.The IL parameter is between −1 and +1, indicating the minimum and maximum illuminance, respectively, which can be calculated with Equation (3): where ϕ p is the slope angle; ϕ z is the solar zenith angle; φ a is the solar azimuth angle; and φ b is the aspect angle.In the cosine correction method, the reflectance of the surface is calculated using Equation (4): where ρ h is the surface reflectance; ρ t is the reflectance of an inclined surface; and θ z is the solar zenith angle.This method does not require any external parameters.Then, the region of interest (ROI) (forest boundary vector layer which was obtained from Liaoning Provincial Department of Natural Resources, China) was used to cut the remote sensing imagery data, remove the overlapped parts, and then to assemble the multiple images to form a mosaic of the region.The selected remote sensing data with 9 variables, included Landsat TM green band reflectance (B GREEN ), Landsat TM red band reflectance (B RED ), Landsat TM near-infrared band reflectance (B NIR ), difference vegetation index (DVI), enhanced vegetation index (EVI), ratio vegetation index (RVI), normalized difference vegetation index (NDVI), renormalization difference vegetation index (RDVI), and soil adjusted vegetation index (SAVI).B GREEN , B RED , and B NIR represent the growth and biomass of vegetation, respectively, and the ground feature images are rich, clear and well-organized [15,21].In the field of remote sensing science, NDVI is widely used in crop growth monitoring and yield prediction and serves as the best indicator of vegetation growth [31].SAVI explains the change in optical characteristics of the background [32].Different from NDVI, SAVI introduces a soil-adjusted coefficient to modify NDVI, which mainly reflects the influence of soil background on vegetation coverage, and its value ranges from 0 to 1.When SAVI is 0, vegetation coverage is 0; when SAVI is 1, the influence of soil background value is 0, mainly indicating that the influence of soil background is 0 in areas with dense canopy.Huete's research [33], conducted in a well-vegetated area, found that when the coefficient of soil adjusted L is 0.5, SAVI had a better effect on eliminating soil reflectance.RVI can better reflect the difference of vegetation coverage and growth status, especially suitable for vegetation monitoring with vigorous growth and high coverage [34].DVI, proposed by Richardson et al. [35], is very sensitive to the change of soil background.It can be used to identify water bodies well, and it increases rapidly with an increase in vegetation.RDVI increases rapidly with the increase of vegetation coverage when the vegetation is within the middle and low coverage, and increases slowly when the vegetation reaches certain coverage.So RDVI is suitable for monitoring the early and middle growth stages of vegetation [36].The specific definitions of these indices are: where B NIR , B RED , and B BLUE represented Landsat TM near-infrared band reflectance, Landsat TM red band reflectance, and Landsat TM blue band reflectance, respectively; L is the coefficient of soil adjusted, and its value range is from 0 to 1.In this study, we set it to 0.5.

Geographically Weighted Regression
Geographically weighted regression (GWR), multiple stepwise linear regressions (MSLR), and boosted regression trees (BRT) models were compared to obtain the best prediction of SOC and STN stocks in the forest area.GWR is the most classical spatial analysis technology and has been widely used in geographic science and spatial analysis [22].GWR could be used for prediction by establishing a local regression equation at each point in space [37].Because it considers the local effect of spatial objects, its advantage is of high accuracy.GWR is developed on the basis of the least square (OLS) model.The basic assumption of OLS is that the direct relationship between independent variables and dependent variables in a region is stable and uniform.In the OLS model, the regression coefficient of the model is for the whole research area and is calculated as an average value, which does not reflect the spatial characteristics of the regression parameters [38].To overcome this problem, a spatial variable parameter regression model was used [39].The parameters in the global model are a function of geographical location, so the change trend of parameters in space can be measured [22].GWR is an extension of ordinary linear regression model, which embeds the spatial position of data into regression [37].The GWR model was expressed as: where (ν i , δ i ) are the coordinates for the i location; ξ o (ν i , δ i ), is the intercept, ξ k is a regression coefficient, and x i,k is an environmental variable at the i location, and k is the number of environmental variables.
The regression parameters of this equation are estimated at each location i (ν i , δ i ).ω i is the residual values at i point.In this study, we used the "spgwr" package [22] in R software environment.

Multiple Stepwise Linear Regressions
As a classical prediction method, MLSR is widely used to predict soil properties and the interaction between analysis variables by considering many factors [40,41].For instance, Ishii et al.
[42] used the MLSR method by considering all relevant information, eliminating irrelevant factors, simplifying equations, reducing errors, adding variables to the model one by one, and analyzing the explanatory variables with F-test.The MLSR model introduced each variable into the model one by one, and F-test was used to analyze and explain the contribution of each variable to the prediction [43].In the process of modeling, environment variables were introduced into the model one by one, and unimportant environment variables were proposed by using t-test [41].During the whole iteration process, the iteration was stopped until there were no environment variables that could be added and deleted in the modeling process.Therefore, the final model only included the important environmental variables that affected the spatial distribution of soil attributes.The MLSR models can be expressed as: STN density = 1.39 − (0.18 ± 0.02) * B GREEN +(0.26 ± 0.04) * SAVI − (0.28 ± 0.05) * RVI+(0.14 ± 0.03) * EVI (13) R 2 values of Equations ( 12) and (13) were 0.46 and 0.41, respectively, while the errors were 9.32 kg m −2 (±0.52) for SOC stocks and 1.26 kg m −2 (±0.31) for STN stocks.

Boosted Regression Trees
BRT model was developed by Friedman et al. [44].It is composed of boosting and regression trees [15].Based on the random gradient of decision tree, boosting technology introduces all samples into the model at one time, and corrects the model by constantly changing the weight.In the process of multiple iterations, the goal of the next iteration is to find a function to fit the residuals of the previous round.The iteration can only be stopped when the residuals are small enough or when they reach the number set by the user [9].BRT models can easily deal with soil environment problems in complex landscape areas, and avoid nonlinear and interaction problems [15].Compared with the traditional regression model, the BRT model has better prediction performance, especially in the spatial prediction of soil properties (such as SOC, STN, and pH) [9].In this study, we used the "gbm" software package developed by Elith et al. [45] to build the model in the R language environment.The fitting of BRT model was controlled by four parameters: Learning rate (LR), tree complexity (TC), bag fraction (BF), and tree number (NT) [46].LR represents the contribution of each tree in the model to the final fit model.TC is a direct predictor of tree depth and maximum interaction level [21].BF represents the scale of data used in each model [9].NT is determined by LR and TC [15].To obtain the best prediction performance of the BRT model, we tested different combinations of LR (0.0025, 0.025, 0.25, and 0.50), TC (3, 6, 8, 9 and 10), BF (0.45-0.75) and NT (500, 800, 1000, 1200, and 1500) through 10-fold cross-validation technology.Finally, we found the model has the smallest error when LR, TC, BF and NT are 0.025, 9, 0.65 and 1200, respectively.

Prediction Accuracy
Randomly selected, 80% of the total samples were used as model training data (n = 410), and the remaining 20% for model validation and accuracy assessment (n = 103).The mean absolute error (MAE), the root mean square error (RMSE), the coefficient of determination (R 2 ) and the concordance correlation coefficient (LCCC) [47] were selected to test the performance of the GWR, MLSR and BRT models.Previous studies [44,47] have revealed that error estimation is an effective method to test different models and datasets.The specific indices used for the validation were: where Y and X represent the predicted value and measured value at sampling point i; Y and X represent the average value of the predicted value and measured value at sampling sites; ∂ Y and ∂ X represent the change of the predicted value and measured value at sampling point; n represents the number of samples; r represents the Pearson correlation coefficient.

Exploratory Data Analysis
Table 1 listed the descriptive statistics of the measured topsoil (0-30 cm) SOC and STN stocks, and remote sensing environment variables at sample sites.The average SOC and STN stocks of topsoils were 10.32 kg m −2 (±0.53) and 1.21kg m −2 (±0.32), respectively.Under the generalized skew distribution with skew coefficients of 0.54 and 0.63, the measured topsoil SOC and STN stocks can be well described in this area.Histosols (the average SOC and STN stocks were 13.14 kg m −2 and 1.43 kg m −2 , respectively) and Isohumosols (the average SOC and STN stocks were 10.71 kg m −2 and 1.37 kg m −2 , respectively) have the highest SOC and STN stocks, while Primosols (the average SOC and STN stocks were 4.32 kg m −2 and 0.93 kg m −2 , respectively) has the lowest SOC and STN stocks in 2015 (Table 2).The correlation coefficients between topsoil SOC and STN stocks, and selected remote sensing environmental variables are shown in Table 3. SOC and STN stocks were negatively correlated with B GREEN , B NIR , and RDVI, and positively correlated with SAVI, NDVI and EVI.We found that there was a multicollinearity between the environmental variables of remote sensing data (Table 3).It was not reliable to use a simple linear method to predict topsoil SOC and STN stocks in space.Comparing multiple, more sophisticated models helps identify the best model and obtain accurate spatial distribution of topsoil SOC and STN stocks.

Model Performance and Uncertainty
In forest ecosystems in northeastern China, the spatial prediction performance of GWR, MLSR and BRT models for topsoil SOC and STN stocks was compared.The summary statistical results of model validation are shown in Table 4. Comparing the three models in topsoil SOC and STN stock prediction performance, BRT was the best, followed by GWR and MLSR models.The BRT model with the best performance could explain the spatial variation of 56% and 51% SOC and STN stocks in the region, respectively.To further examine the three models, we created density plots of the predicted and measured values of SOC and STN stocks at the sampling sites (Figure 3).We also presented scatter plots of the predicted and measured values, which also indicated that the BRT model has the best performance (Figure 4).Overall, BRT performs the best, matching with the measured topsoil SOC and STN stocks.We iterated the BRT model 100 times and calculated the average standard deviation (SDs) to analyze the uncertainty of the BRT model in predicting topsoil SOC and STN stocks (Figure 5).We found that the BRT model had a lower uncertainty compared with GWR and MLSR models.

Importance of Remotely Sensed Environmental Variables
Through 100 iterations of BRT model, the average relative importance of each remotely-sensed environmental variable in predicting topsoil SOC and STN stocks was calculated.To facilitate the analysis of the model, we combined the relative importance of all environments to 100% (Figure 6).We found that the main remotely-sensed environmental variables affecting the spatial variability of topsoil SOC stocks were NDVI, SAVI, B GREEN , EVI, and RDVI (77% of the total relative importance).Correspondingly, SAVI, NDVI, B GREEN , EVI and DVI were the main environmental variables (accounting for 80% of the total relative importance) that affect the spatial variability of topsoil STN stocks in forest dominated areas.

Spatial Prediction of SOC and STN Stocks
Our model verification showed that BRT model performs best, so it was selected as the final model to predict the spatial distribution of topsoil SOC and STN stocks (Table 5).The topsoil SOC and STN were mainly stored in Argosols and Cambosols in forest ecosystems Liaoning Province, accounting for 88.9% and 85.8% of topsoil SOC and STN stocks, respectively.In addition, the BRT model showed that topsoil SOC and STN stocks had a similar spatial distribution pattern (Figures 7 and 8), increasing gradually from southwest to northeast, and the average topsoil SOC stocks increase from 0.51 to 28.12 kg m −2 estimated with BRT model.Correspondingly, the average topsoil STN stocks increased from 0.22 to 2.73 kg m −2 .

Role of Remotely-Sensed Enviroment Variables in Predicting Topsoil SOC and STN Stocks
In forest ecosystems, the combination of remotely-sensed data and BRT model provided a better prediction (Table 3) of SOC and STN stocks in topsoil, which was consistent with previous findings [21,46].For instance, Chen et al. [48] used remote sensing images to predict the topsoil SOC content of a 115-hectare land in crisp County, Georgia.They found that the remote sensing data showed a significant linear relationship with the topsoil SOC content.Through the study on the semi-arid grassland in eastern Australia, Wang et al. [49] concluded that spatial prediction of SOC stocks was a very challenging task in the semi-arid grassland area, especially in those areas lacking basic data and vegetation coverage.Compared with direct measurement in the field, DSM is an economical method to predict SOC.Previous studies also indicated that the vegetation is correlated with carbon and nitrogen stocks in topsoil [50].The spectral reflectance of remote sensing data and derived vegetation index reflect vegetation coverage well, which are valuable for mapping SOC and STN.
To compare the contribution of various remotely-sensed environmental variables to the prediction of topsoil SOC and STN stocks, we iterated the BRT model 100 times to calculate their average relative importance, and then merged them into a percentage.We found that NDVI, SAVI and B GREEN were the most effective variables affecting topsoil SOC and STN stocks in the forest dominated areas.This conclusion was consistent with previous studies.For example, Gong et al. [51] used 8227 soil profiles and a cubist model, a generalized linear model, support vector machines, and random forest models to simulate the SOC stocks in Brazil, by using NDVI as an effective predictor.In the coastal forest area of northeastern China, Qi et al. [52] concluded that SAVI was an important remotely-sensed variable to predict SOC stocks.In the same study area, Wang et al. [21] recommended that SAVI should be introduced into the spatial prediction model of topsoil SOC stocks, especially in dense forest ecosystems.B GREEN has also been used as an efficient predictor in estimating SOC and STN stocks.According to Yang et al. [53] and Wang et al. [49], B NIR and B RED reflected the land-use situation in a region to a certain extent, but our study was limited to the study of SOC and STN stocks of surface forest soils, not all land use types in the region.Therefore, in mapping SOC and STN stocks, B NIR and B RED showed the lowest relative importance.DVI could better reflect the difference of vegetation coverage and growth status, especially suitable for vegetation monitoring of dense forests.Similarly, RVI could better reflect the difference of vegetation coverage and growth status, especially suitable for dense forests [54].DVI was sensitive to the change of soil background and could better identify water bodies.Its value increased with the increase of vegetation density.Therefore, both DVI and RVI were introduced into SOC and STN mapping [55].Furthermore, in the Jutland peninsula, Denmark, Pouladi et al. [56] used NDVI, SAVI, DVI, RVI and terrain related variables combined with Kriging, cubist model, random forest and regression-Kriging models to predict the SOC content in the topsoil layer.They found these remotely-sensed variables explained the spatial variation of 89% SOC content on average.They also found that, there is no need to introduce other environment variables into the model construction when the sampling sites were dense enough.However, our study suggests that environmental variables including remotely-sensed data are necessary to map these stocks, especially for dense forest areas.
A few studies have also pointed out the importance of using remote sensing data in mapping SOC.For instance, in the coastal forest areas of northeastern China, Wang et al. [46] constructed a BRT model using three multispectral bands, NDVI and SAVI to predict the SOC stocks in topsoil, and achieved high prediction accuracy (R 2 = 59%).Yang et al. [53] used B GREEN , B RED , B NIR and a random forest model to predict SOC stocks and found that these variables could explain the spatial variation of 72% SOC stocks in this region.In Dalian, China, Wang et al. [15] used three Landsat 5 satellite bands and NDVI data, combining a random forest model, to predict STN content in the topsoil layer.They found that remotely-sensed data should be used as key environmental variables for the prediction in densely forest dominated areas.

Uncertainty in Current Research
BRT performed best in comparison with GWR and MLSR models to predict SOC and STN in forest areas of northeastern China.However, there were still uncertainties in our prediction.First, data collection and lab analysis were conducted by several groups.Thus, there might be some sampling errors and experimental errors.Second, due to the influence of terrain and clouds, high altitude areas were prone to produce shadows in the process of image segmentation, leading to large reflectivity errors in satellite imagery data.Finally, this study was limited to estimating the SOC and STN stocks of the forest topsoil (0-30 cm), which might lead to underestimation of SOC and STN stocks in the region since SOC and STN are also stored in deep soil layers of forest ecosystems.

Figure 1 .
Figure 1.Soil sampling locations in digital elevation model map (c) at a 30 m spatial resolution in Liaoning Province (b) of China (a).

Figure 2 .
Figure 2. Land use types (a) and soil types (b) in the study area.

Figure 6 .
Figure 6.Relative importance of remotely-sensed environment variables as determined from 100 iterations using the boosted regression trees (BRT) in predicting topsoil (0-30 cm) SOC stocks (a) and STN stocks (b) in 2015, which are combined into percentage.Note: B GREEN , Landsat TM green band reflectance; B RED , Landsat TM red band reflectance; B NIR , Landsat TM near-infrared band reflectance; SAVI, soil adjusted vegetation index; NDVI, normalized difference vegetation index; RVI, ratio vegetation index; DVI, difference vegetation index; EVI, enhanced vegetation index; RDVI, renormalization difference vegetation index.

Table 2 .
Summary statistics of topsoil (0-30 cm) soil organic carbon (SOC) stocks (kg m −2 ) and soil total nitrogen (STN) stocks (kg m −2 ) of each soil group according to the Chinese soil system classification at sampling sites.

Table 4 .
Summary statistics of the predictive quality of geographically weighted regression (GWR), multiple stepwise linear regression (MSLR), and boosted regression trees (BRT) for topsoil (0-30 cm) SOC and STN stocks based on an independent dataset (n = 103) in 2015.

Table 5 .
Statistical description of soil organic carbon (SOC) and soil total nitrogen (STN) stocks of each soil groups according to the Chinese soil system classification estimated with boosted regression model (BRT) in forest topsoil (0-30 cm).