Drivers of Organic Carbon Stocks in Different LULC History and along Soil Depth for a 30 Years Image Time Series

: Soil organic carbon (SOC) stocks are a remarkable property for soil and environmental monitoring. The understanding of their dynamics in crop soils must go forward. The objective of this study was to determine the impact of temporal environmental controlling factors obtained by satellite images over the SOC stocks along soil depth, using machine learning algorithms. The work was carried out in S ã o Paulo state (Brazil) in an area of 2577 km 2 . We obtained a dataset of boreholes with soil analyses from topsoil to subsoil (0–100 cm). Additionally, remote sensing covariates (30 years of land use history, vegetation indexes), soil properties (i.e., clay, sand, mineralogy), soil types (classiﬁcation), geology, climate and relief information were used. All covariates were confronted with SOC stocks contents, to identify their impact. Afterwards, the abilities of the predictive models were tested by splitting soil samples into two random groups (70 for training and 30% for model testing). We observed that the mean values of SOC stocks decreased by increasing the depth in all land use and land cover (LULC) historical classes. The results indicated that the random forest with recursive features elimination (RFE) was an accurate technique for predicting SOC stocks and ﬁnding controlling factors. We also found that the soil properties (especially clay and CEC), terrain attributes, geology, bioclimatic parameters and land use history were the most critical factors in controlling the SOC stocks in all LULC history and soil depths. We concluded that random forest coupled with RFE could be a functional approach to detect, map and monitor SOC stocks using environmental and remote sensing data.


Introduction
Soil organic carbon (SOC) plays an important role in soil physical, chemical and biological processes, as well as to mitigate global climate change [1,2]. The SOC increases soil aggregate stability, water retention capacity and infiltration, plant available water, cation exchange capacity, and soil microbial and macrofauna biomass [3]. Moreover, the SOC is vital for soil productivity and can improve soil health [4,5]. Soil stores about 750 Pg of SOC in the top 30 cm and nearly 1500 Pg SOC in the first 1-meter depth [6]. Therefore, evaluating SOC et al. [76] and Mendes et al. [77] predicted soil mineralogy at the surface and subsurface using RS data and soil spectra which can assist on SOC stocks dynamic studies.
To better understand the controlling factors of SOC stocks, Random Forest (RF) has been applied as an attractive machine-learning technique [21,78,79]. RF can deal with a high number of input variables both categorical and numerical simultaneously [16,28]. Besides, it has the ability to predict spatial distribution of the SOC stocks with environmental covariates such as digital elevation model (DEM), climate data, RS and soil information [80]. Mayer et al. [16] applied RF to find the controlling factors of SOC stocks in agricultural topsoils and subsoils of Bavaria, Germany. Mahmoudzadeh et al. [79] and Gomes et al. [21] implemented machine learning techniques to predict the spatial distribution of SOC in western Iran and Brazil. Their results showed that RF had the best performance in predicting the spatial distribution of SOC. Nabiollahi et al. [78] used RF models to evaluate soil SOC stocks under different land use change scenarios.
Identifying the most important factors affecting the SOC stocks at various land uses and soil depths is required for an effective management decision to increase soil C levels, especially in the Brazilian tropical soils where agriculture has been intensive. In fact, high temperatures, precipitation, photosynthetic activity and deforestation in tropical region caused intensive decomposition of soil organic matter (SOM) and reduced the SOC content and CEC [81][82][83][84][85]. Despite this, Brazil is one of the most important biofuel producers (sugarcane ethanol) [86] with 29.8 million tons of sugar and 35.6 billion liters of ethanol from 642.8 million tons of sugarcane) in the 2019/2020 harvest [87]. Therefore, there is a gap on SOC stocks evaluation in depth for this important culture, as determined by sensing techniques.
The objectives of this work were to find the controlling factors of SOC stocks along depth and space derived from a 30-year satellite time series to recover soil properties and land use. Therefore, we considered the following steps: (1) assess the land use and land cover (LULC) changes during the 1985-2015 period; (2) evaluate the distribution of SOC stocks both across the LULC classes and soil profile (0-100 cm); (3) identify the main controlling factors of SOC stocks up to 1 m depth in different LULC history classes using remote sensing data and machine learning algorithms; and (4) predict the spatial distribution of SOC stocks and its uncertainty in an area at São Paulo State, Brazil. The results of this can assist decision-makers to understand the effects of anthropic practices on the SOC stocks in tropical areas as well as to create strategies to increase soil C levels to face climate change.

Study Area
This work was conducted in Piracicaba region, which covers around 2577 km 2 at Northwest of the São Paulo State, Brazil ( Figure 1). This region has a subtropical climate with dry winters and rainy summers, which is classified as Cwa by the Köppen system [88]. The annual precipitation average is 1440 mm, while the maximum and minimum temperatures are 25 and 13 • C, respectively. The topography varies between 450 and 943 m above the sea level. The soils of the region, classified according to the Word Reference Base for Soil Resources (WRB) [89] and the Brazilian Soil Classification System (SiBCS) [90], encompass Luvisols (Luvissolos), Chernozems (Chernossolos), Acrisols/Lixisols (Argissolos), Cambisols (Cambissolos), Cryosols (Gleissolos), Latosols (Latossolos), Arenosols (Neossolos) and Planosols (Planossolos), developed mainly from crystalline and sedimentary rocks [91]. The main land uses are sugarcane, pasture, urban area, remaining forestland, water bodies and riparian forests [92].

Methods
In this study, the following steps were conducted ( Figure 1): (1) Determine bulk density and SOC content, followed by the calculation of SOC stocks for each geographical sample position; (2) Determination of environmental covariates; (3) Generate LULC (land use and land cover) history map along 1985-2015; (4) Overlaying and extracting the variables for each sample; (5) Reducing covariates by applying the Recursive Feature Elimination (RFE) for each layer and LULC history class; (6) Performing Random Forest (RF) using 10-fold cross-validation to identify important variables; (7) Select the best model and compute variable importance; (8) Provide spatial prediction of SOC stocks for each depth and uncertainty map.

Soil Survey Data
The soil dataset was obtained from the study of Mendes et al. [93]. This dataset has soil samples at 0-20 (2229 samples), 40-60 (1796 samples) and 80-100 (1664 samples) cm depth including chemical (carbon) and granulometric analysis. These samples were collected following the toposequence method which considers the geological and relief variation along the landscape, taking into account soil variation and LULC changes from the highest to the lowest parts of the relief [94].
We used pedotransfer equations developed for tropical areas [96] to estimate soil bulk density from our soil dataset. As our soil dataset did not report the percent of stoniness, we did not consider it for the calculation of the SOC stocks through the profile, according to Equation (1) [21]: where SOC stocksi represented the soil organic carbon stocks (gm −2 ) for layer i, SOCi is the content of soil organic carbon (gkg −1 ) for layer i, BDi is bulk density (gcm −3 ) for layer i and Di is soil thickness of layer i (cm).

Land Use Time Series and Land Cover (LULC History)
Time series of LULC information is important for assessing the SOC stocks [14,37]. Each LULC map represents just a LULC at a special time and usually does not show the temporal variation of LULC. Therefore, evaluating the history of LULC can help scientists to study SOC stocks precisely.
The map of temporal frequency of the LULC classes from 1985 to 2015 defined as LULC history map was generated using LULC maps obtained from the published database of Sayão et al. [97] with the average kappa coefficient value of 0.74. The authors used a time series of Landsat between 1985 and 2015 (every 5 years) restricted to two climatic periods, the dry (June-August) and moist (September-January) seasons. They excluded urban areas from the images, due to similar spectrum characteristics with bare soil to avoid confusion. The authors applied maximum-likelihood supervised classification using four classes: agriculture, forest, water and pasture. Landsat 8 OLI images were selected as a base and all the maps were co-registered with it. Afterward, the frequency of land use alteration along time was calculated for each LULC class and in each pixel along 1985 to 2015 according to Equation (2) in ArcGIS10.3: where PLCi is the temporal frequency of a specific LULC class between 1985-2015; a i is the count of a specific LULC class that occurred across the period of time and n is the total number of LULC maps for the same period. If the PLCi is closer to 100 (highest frequency), it means that the class i did not change during the period of study and if PLCi is equal to zero (lowest frequency), it means that the class i did not exist during the period of study.

Covariates Preparation
In order to identify the controlling factors of SOC stocks at each depth increment, different variables to act as proxies were selected based on the SCORPAN (Soil, Climate, Organisms, Relief, Parent material, Age and Space) model [98]. Thus, we included soil parameters, soil mineralogy, soil types, geology, topographical factors, climatic factors, vegetation indexes and land use information ( Table 2). The maps of soil properties including sand, silt, clay and CEC were obtained from the study of Mendes et al. [93] and soil minerals including kaolinite, goethite, gibbsite and hematite from Poppiel et al. [76] and Mendes et al [77].
Terrain attributes were generated from a 30 m Digital Elevation Model (DEM) using the Terrain Analysis in Google Earth Engine (TAGEE) package [99]. Additional terrain attributes were derived from a 5 m DEM [93] using ArcGIS 10.3 and SAGA GIS and resampled to 30 m pixel size.
Bare soil fragments were mapped between 1985 and 2015 from surface reflectance Landsat images using classification rules and were aggregated by calculating the median values into the denominated Synthetic Soil Image (SYSI) described in detail by the Demattê et al. [100]. The SYSI has six spectral bands (Blue, Green, Red, NIR, SWIR1 and SWIR2) and a Bare Soil Frequency (SF) band. The SF was calculated by dividing the number of pixels detected as bare soil by the total number of pixels [100].
We also used the geologic map performed by Bonfatti et al. [101] 30 m resolution (about 1:100,000 scale). In the Piracicaba region, six lithology units including Ct (sandstones, siltstones, varvites, tillites and conglomerates of Tubarão Group), H (unconsolidated sand, gravel and clay of the Holocene), Jbp (sandstones, siltstones and shales of the Botucatu and Piramboia Formation), Ksg (Basalts, sandstones and diabases of Serra Geral Formation), Pc (siltstones, shales, limestones and quartzites of the Corumbatai Formation) and Pi (Oil shales, dolomites, siltstones and quartzites of the Irati Formation) exist. Soil type was extracted from the published database of Rossi [91] and concluded 9 classes (see Section 2.1). Time series of climatic factors  were obtained from TerraClimate [102] dataset at 4 km resolution. Meanwhile, a mean annual precipitation was determined from CHIRPS Daily (Climate Hazards Group InfraRed Precipitation with Station Data, version 2.0 final) dataset at 4.8 km resolution [103]. All of these climatic factors were determined in GEE, while the bioclimatic ones were obtained from the 1 km resolution WorldClim2 dataset   [104]. These climatic and bioclimatic maps were upscaled to 30 m pixel size by Spline interpolation in ArcGIS10.3 to have the same resolution of the other covariates. Besides, the land surface temperature (LST) images were obtained between 1985 to 2015 from Sayão et al. [97] and were aggregated into a single image by calculating the mean values of LST. The LST was estimated using the inversion of Planck's function in the thermal band. Landsat 5, 7 and 8 Collection 1 Tier 1 8-day NDVI, EVI and NDWI composites were used to calculate the following spectral indexes over a period of 1985 to 2015 in GEE at 30 m resolution: normalized difference vegetation index (NDVI = (NIR-Red)/(NIR + Red)), Enhanced Vegetation Index (EVI = 2.5((NIR-Red)/(NIR + 6Red-7.5Blue + 1))) and normalized difference water index (NDWI = (NIR-SWIR)/(NIR + SWIR)).

Covariates Selection
Using a large number (67) of covariates as inputs machine learning approaches is time and resources intensive [21,105]. The Feature selection can help to deal with these computational limitations by reducing the number of input variables for the machine learning algorithm [106] and also have a significant impact on the model performance. In this study, spearman correlation analysis was used to avoid multicollinearity effects. For this topic, among high correlated (>0.8) parameters in each category, the variables with the weakest correlation were omitted. Thus, some variables including sand (from soil physical and chemical properties), Green, Red, NIR and SWIR1 (from SYSI), EVI (from Mean Annual Vegetation Index), MAP, MaxTemp, MinTemp, BIO1, BIO12, BIO13, BIO15 (from Climate and Bioclimatic) were not included into the Recursive feature elimination (RFE). Then RFE algorithms from the Caret package [107] were applied in R using all covariates with eight subsets: 1, 5, 10, 15, 20, 25, 30, 40 s based on the 5-fold cross-validation. Co-variates were chosen based on the lowest RMSE.

Random Forest Models
RF is one of the machine learning approaches that generate multiple trees without pruning to boost the prediction accuracy [108]. This method has the capability of reducing noise, resisting overfitting, increasing the performance of prediction by aggregating various predictions, and handling both quantitative and qualitative predictors [21,80,109,110]. In addition, RF has the ability to measure the importance of variables based on mean decrease in accuracy and in Gini index [111]. RF has been widely used to predict the spatial distribution of the SOC stocks because it is fast and simple to use, has high accuracy and also provides the importance of variables [79,109]. The selected covariates in the previous section were used to map the SOC stocks and to evaluate their influence at each depth increment and LULC history classes using RF regression from the Caret package in R software [112]. The controlling factors of SOC stocks were evaluated in four LULC history classes of Ag = 100, Pas = 100, Ag > 50-Pas < 50 and Ag > 50-Fo < 50, as well as total samples in each depth. These LULC history classes were chosen because the number of soil samples were not enough for modeling in all classes.
For these purposes, the soil dataset of each LULC history and depth was randomly split into 70% and 30% for training and testing, respectively. Moreover, ten-folds crossvalidation was predefined to optimize the models with the training dataset. The number of randomly selected predictor variables (mtry) was tested by considering 5, 10, 15, 20, 25 and the number of total variables that selected with the best model of feature selection methods as mtry. Additionally, ntree was fixed on 500 iterations for stable predictions. Afterwards, the ability of the models to correctly predict SOC stocks were evaluated by applying the fitted models to the testing dataset (30%) and calculating the coefficient of determination (R 2 ), root mean square error (RMSE) and residual prediction deviation (RPD). The R 2 represents variation in the dependent variable, while RMSE is the square root of the variance of the residuals and indicates the model's absolute fit to the data. The RPD is the ratio of standard deviation to RMSE that classifies the predicted model to good (RPD > 2), fair (1.4 < RPD < 2) or unreliable (RPD < 1.4). The importance of variables also was also determined based on mean decrease in accuracy.

Spatial Predictions and Uncertainty Maps
The soil data was randomly split into training (70%) and validation (30%) sets to obtain optimal models. This can provide spatially robust predictions of SOC stocks and their associated uncertainties at each depth increment by a bootstrapping method [113] with 100 interactions using the Caret package in R. The mean values upper (Q95) and lower (Q5) percentiles for each pixel were calculated. The mean value considered as the final map of the SOC stocks and the coefficient of variation (CV) were computed as spatial uncertainty [114]. The higher amount of CV represents more uncertainty and the lower CV shows more accurate estimates.

LULC History
The frequency of alteration in agriculture and forest class from 1985 to 2015 is shown in Figure 2a and b. These maps show that the study area has been used with high frequency for agriculture ( Figure 2a) and less for forest (Figure 2b). The high temporal use frequency of agriculture demonstrates that there have been few changes in LULC classes in the indicated period. Generally, the LULC history map presented differences along temporal analysis (Figure 2c). Pure agriculture use covered about 29% of study area while mixed use reached more than 50% agriculture and less than 50% pasture (Ag > 50-Pas < 50) with 34% of the region. The other dominant LULC history classes comprised only 37% of the area. Figure 2d shows the location of soil samples in each LULC history. Table 3 shows the summary statistics of soil attributes from our dataset measured over different LULC history classes and in depth. SOC contents varied from 0 to 17.47 gkg −1 at 0-10 cm depth, and between 0.06 and 7.77 gkg −1 at 90-100 cm. Soil bulk density varied from 0.05 to 1.57 gcm −3 . As expected, the mean values of SOC stocks decreased with depth from 198.82 gm −2 to 125.18 gm −2 from 0-10 to 90-100 cm, respectively. Results showed that the mean values of SOC content and SOC stocks decreased with depth in all LULC history classes. The     Since then, the percentage of agricultural land has declined. In contrast, the percentage of forest and pasture areas has decreased steadily until the 2000 and then increased. Overall, in about 37% of the study site, land use has not changed during 1985-2015. 28.6% of the area is used only for agriculture, 2% for forest and 0.02% for pasture (Figure 4c). Meanwhile, about 0.7% and 6% of the study area are composed of water and urban-road, respectively. In the rest of the study area (63%), land use has changed during 1985-2015. The highest and the lowest percentages of the area belong to the Ag > 50-Pas <50 class (34%) and Pas < 50-Fo < 50 class (0.0008%) (Figure 4c).

Covariate Selection
Figure 5a-e provides the RMSE values from the five-fold cross-validation in the process of selecting important variables in each LULC and depth with the RFE algorithm. The number of important variables was selected by the lowest RMSE repeated from the five-fold cross-validation using the RFE algorithm. The bar chart (Figure 5f) compares the number of variables selected as important in each depth and LULC history class. It can be observed that the numbers of important variables varied in different depths and decreased on the contrary (increasing depth) in each LULC history class and from layer 4 to 10 it stabilized (Figure 5a-e). However, some soil depths presented different trends. Overall, more variables were selected in total samples than in other LULC throughout the depth. In this case, the number of important variables from layer 1 to 5 were 15, 10, 10, 15, 10, respectively, and from layer 6 to 10 were 15 variables.

Performance of the Random Forest Models
The result of the RF prediction models for the SOC stocks for each LULC history and depth is presented in Table 4. It is noticeable that the models are strongest in total data, with R 2 test of 0.78 to 0.92, RPD of 2.13 to 3.42. Despite this, they performed weakest in Ag > 50-Fo < 50 class with R 2 tests of 0.12 to 0.43 and RPD of 0.73 to 1.48. Indeed, the performance of prediction models in all depth based on R 2 , RMSE and RDP decreased on the sequence, total samples, Pas = 100, Ag > 50-Fo < 50 class, Ag = 100 class, Ag > 50-Fo < 50 class, respectively. Results demonstrated the best performance of the RF for Ag = 100 class, Ag > 50-Pas < 50 class and total samples with the mtry of 5. The values of mtry varied from 5 to 20 for different depths of Pas = 100 classes and Ag > 50-Fo < 50 classes.

Ag = 100
The percentage of importance for each variable used in the RF model is provided in Figure 6 to 10. The results illustrate the influence of each variable on SOC stocks values over each depth and LULC history class. Figure 6 gives information about the most important variables for predicting SOC stocks in all layers, in Ag = 100 class. As observed (Figure 6), soil properties including clay, silt and CEC were important in all depths. Clay presented the greatest importance among the soil properties and its relative importance increased with depth. Among Vegetation indexes, the NDVI was the best at top layers (1 and 2). Although, NDWI had influence in all layers except for layer 6. Bioclimatic parameters were important in all depths ( Figure 6). Moreover, the importance of parameters related to precipitation (BIO14) increased in all depth. Among terrain attributes, elevation was important at all depths and others only at top layers. Minerals were important along allprofiles, but hematite showed greater relevance in layers 1-3 and 9-10. Gibbsite was important in layers 4 to 7 and goethite in layer 8. In addition, SYSI bands were important in all depths so that SF was important at surface layers and SWIR2 was important at all layers. Soil type was important in layers 1, 2 and 6.

Pas = 100
In Pas = 100 class (Figure 7), soil parameters including clay, silt and CEC were important in all depths and their cumulative influence increased with depth. Vegetation indexes (NDVI, NDWI) were important at layers 1 to 7. Precipitation parameter was important at surface layers and temperature in deeper ones. Meanwhile, LST was more important in upper layers. From terrain attributes, Eastness, AnalyticalHillshading, SlopeHeight, were important at first layer and other attributes such as MRVBF, Vally Depth and Hillshade impacted more in deeper layers. Minerals including kaolinite, gibbsite, goethite and hematite were important in most of the depths. In addition, SWIR2 was important in upper layers, while Geology was important in layer 10.

Ag > 50-Pas < 50
Moreover, in Ag > 50-Pas < 50 class (Figure 8), clay was important in layers 2 to 10, while CEC was for 1, 2, 5 and 7 to 10, although their relative importance increased with depth. From Vegetation indexes, NDWI was noticed only in layer 2, while Climate parameters in all depths. In this case, the importance of BiO5 and BIO6, parameters related to temperature decreased with depth and BIO14 and LST increased with depth. From Terrain parameters, elevation was important in all depths, TFD500 was important in layers 1 to 7, Northness in layer 1 to 4 and 6 and Analytical Hillshading in layer 2. From SYSI bands, SF was important only in layer 2. Geology and soil type were important in all depths.

Ag > 50-Fo < 50
In Ag > 50-Fo < 50 class (Figure 9), clay and CEC wererelevant in most depths and the relative importance of these factors tended to increase with depth. NDVI was more related at top layers. Among the Climate parameters, just BIO14 and LST were relevant in some layers. Terrain parameters had higher influence on SOC stocks, mainly elevation, aspect, valley depth, hill shade, MRRTF, TST, TFD500, minimal curvature, maximal curvature, Gaussian curvature, Analytical Hillshading. Gibbsite and hematite impacted in layers 6 and 1, respectively, and goethite greatly influenced layers 5 and 8. In addition, from SYSI bands, SF influenced layers 3 and 8, whileSWIR2 was important for layer 10. Additionally, geology was mostly related with layer 1, while soil type was not important in any depth.

Predicting SOC Stocks
Total data in each depth were used to predict SOC stocks in the Piracicaba region. Different variables including soil properties, climate factors, SYSI bands, terrain attributes, soil mineralogy, geology and soil type were used to produce SOC stocks maps. The predicted maps of SOC stocks in layers 1 and 10 are shown in Figure 12. The south, center and eastern parts of the study area had shown the highest values of SOC stocks in both layers. Comparing the spatial distribution of lower, upper and mean values of SOC stocks, each layer shows a similar trend. The uncertainties varied from 0.41 to 31.97% and 0.41 to 52.35% in layer 1 and 10, respectively.

LULC History and Distribution of SOC Stocks
Most areas (about 63%) in Piracicaba region were mainly affected by anthropogenic activities during 1985 to 2015 (Figure 4 c). The highest SOC stocks were found in Pas = 100% class and the lowest value in Ag > 50-Pas < 50. Thus, the land use is an important factor that controls the SOC stocks. This is in agreement with Boddey et al. [115], Rezende et al. [116] and Cerri et al. [117] for whom pasture grasses have the potential to introduce large amounts of organic matter into the soil. This happens because carbon flows from plants to the soil through the inlets above and into it. The amount of SOC on the surface is significant (13.17%), and will impact the environment. This will impact in observations of Pausch and Kuzyakov [118] and Freschet et al. [119], for whom two-thirds of the photoassimilates produced by the plants remain above the ground and are used for the production of biomass and respiration, as the rest goes direct to the roots.
Higher SOC stocks in pasture (Figure 4a, Table 3), is explained by the type of roots. Plants have root systems with different characteristics and functions (e.g., architecture, morphology, mutualistic associations), which determine its influence on SOC. Pasture remains more time in the field, since their management on soil disturbance occurs from a minimum of about 10 years and going until 30 years or more. This situation associated with the fasciculate type of roots brings light to the results. As a consequence, the roots go deeper to reach water and nutrients. Carbon derived from roots has a residence time in the soil 2.4 times greater than the one derived from the aerial part [120]. Despite this, the fasciculate roots of pasture have a better distribution along all profiles, and will impact in depth SOC. Indeed, Tognon et al. [121] observed this same behavior when comparing fasciculated roots from Cerrado bioma with pivoting (in general) from the Amazon region. The authors observed that in Amazonia clay was the main driver of OM in the surface layer. In cerrado, OM are lower in surface but greater than amazon in deeper layers which was related with the type of vegetation. Mayer et al. [16] found similar results to us that the constant grassland has the highest SOC stocks in the upper-most 30 cm of the soils in comparison to periodical grassland.
Lower SOC stocks in other LULC history classes can be explained by human activities and different soil management. In Ag = 100, Ag > 50-Fo < 50 and Ag > 50-Pas < 50 class, the lands have been cultivated for more than 15 or even 30 years. Therefore, cultivating may cause loss in SOC stocks by tilling and ploughing (break down the aggregates), harvesting (crop and residual removal) and erosion [16,[122][123][124]. In fact, this region has reported great erosion risks with soil loss of amount around 58 Mg ha −1 year −1 in areas with agriculture [125].
The higher amount of SOC stocks in Ag > 50-Fo < 50 in all depth than Ag = 100 and Ag > 50-Pas < 50 classes can be attributed to the role of forest in inputting more C to the soil. In fact, forest soil can be richer in SOC than other land uses because of deep root distribution and higher input of SOC [126] (Figure 4a). Sheikh et al. [127] and Martín et al. [128] stated that forests maintain higher value of carbon in soils than agriculture due to deep root systems. When we compare pasture with forestry, still, the second maintains more carbon along time [125], and thus explains why agriculture with forestry presented higher SOC than agriculture-pasture (Figure 4a). Their results show an intense soil degradation in areas with agriculture (soil loss rate-58 Mg ha −1 year −1 ) while the ones with forest, riparian vegetation, afforestation and pasture had rates 30 times lower.
There were important differences in the amount of SOC stocks content for Ag > 50-Fo < 50 and Ag > 50-Pas < 50 (Figure 4a). These anthropic alterations can impact on gas emission as stated by Don et al. [42], Sharma et al. [129] and Wijesekara et al. [130]. For these authors LULC alteration and management practices are significant sources of human induced greenhouse gas emission and changes in SOC stocks. Other researchers have also found similar results, reporting that the SOC stocks were mainly controlled by the LULC and LULC history [16,35,42,131], and thus may occur in our studied region.
Comparing the results in soil depth SOC stocks decreased in all LULC history classes (Figures 3 and 4a). This is in agreement with other studies [4,132]. However, the amount of carbon alteration on the surface is greater than along deeper layers. At the deeper layers (>80 cm), the amount of SOC stocks is similar in all cultivated areas. These results could be due to more exposing SOC stocks to anthropogenic disturbances at topsoils. On the other hand, the changes of SOC need hundreds or thousands of years at deeper layers [132,133] due to stabilization processes. Actually, more water accumulation and low oxygen in deeper layers can lead to reduced microbial decomposition of SOC [134]. Although the processes and mechanisms involved in stabilizing and consequently accumulating SOC are widely investigated and described by the scientific community [135][136][137][138], these studies lack description of the mechanistic dynamics on how biotic and abiotic factors affect C in the tropics. On the other hand, there is plenty of knowledge that addresses quantitative changes in SOC stocks, especially in areas of land use change and soil management [42,117,139].
In the emerging knowledge of the mechanisms responsible for protecting SOC against microbial decomposition, it is assumed that the persistence of organic material, that is, its stabilization, is a property of the surrounding ecosystem [140]. Thus, it cannot be explained solely by the molecular composition of organic compounds that in theory would guarantee selective preservation [141,142]. The spatial inaccessibility conferred by the formation of soil aggregates [135], or more intimate associations such as the formation of chemical bonds between OM and the surface of soil minerals [136,143] are considered key SOC protection mechanisms [137]. However, what makes organic material associated with minerals unavailable to microbial communities, or the factors that promote spatial compartmentalization between microorganisms and enzymes, and the organic substrate, are still partially answered questions.
It is a fact that the soils are extremely heterogeneous in materials and environments, with a range of organic and mineral constituents, a wide variety of living organisms, presenting a complex spatial arrangement of these constituents at different scales. The soil microbiome directly influences the decomposition and stabilization of SOC in the soil for a long time, in addition to governing the biogeochemical cycling of various nutrients [144]. The regulatory power of microorganisms in C dynamics cannot be overlooked. Therefore, an integrated understanding of the factors that govern the dynamics of SOC in its microenvironment is essential to better elucidate the mechanisms involved in the stabilization/destabilization of SOC [145].

Variable Selection and Model Performance
The number of influential variables on SOC stocks varied in different depths from 5 to 15 variables (Figures 6-10). The RFE method not only determined the best subset of influential variables for each soil depth but also was effective in reducing the processing time by decreasing the total variables in agreement with [21].
The performance of prediction models in all depths were achieved with different results in various LULC history and depth (Table 3). Overall, R 2 varied from 0.11 to 0.93 and 0.12 to 0.94 in top layers (1 to 5) and sub layers (6 to 10), respectively. The performance of prediction models for LULC history classes based on R 2 from the test data decreased in Pas = 100, Total samples, agr > 50-pas < 50, agr = 100, and ag > 50-for < 50, respectively. In other studies, the performance of RF in predicting SOC stocks varied as well. Hounkpatin et al. [146], Dharumarajan et al. [80] and Nabiollahi et al. [78] computed R 2 of about 0.14, 0.23 and 0.7 using RF, respectively. Taghizadeh

Controlling Factors
A survey of 67 predictors in RF showed that in all land use history classes about 5 to 20 variables had an impact on the prediction of SOC stocks ( Figure 5). The parameters of soil and terrain were common in all land use history classes and depths although with different importance. In all LULC history classes, soil parameters including clay, silt and CEC were achieved as important variables in almost all depths.
Clay was revealed as the most prominent variable as its relative importance increased with soil depth. Many studies reported that soil texture and especially fine particles (clay and fine silt) have influence on stabilization of SOC via formation of organo-minerals [24,27,[147][148][149]. Moreover, specific surface areas (SSA), surface charges and cation exchange capacity of clay are effective in stabilization of SOC [27,147,[150][151][152]. The importance of CEC increased with depth because this parameter is directly related to soil mineralogy.
Terrain attribute was also the key factor in variation of SOC stocks in Piracicaba region, in agreement with studies of Davy and Koen [25], Kozłowski and Komisarek et al. [153], Taghizadeh-Mehrjardi et al. [8], Fissore et al. [23], Zhu et al. [22], Li et al. [4]. They stated that terrain attributes are important in variation of SOC stocks in local scales where the climate is more uniform.
The mean annual temperature and precipitation were the most important climatic variables in most studies but they were omitted because of high correlation with the Bioclimatic variables which reflect seasonal variability. Indeed, spearman correlation coefficients were used to avoid multicollinearity effects similar to the study of Mayer et al. [16]. Luo et al. [154] also described that Bioclimatic variables and the changes in the patterns of temperature and precipitation under climate change were more important because this parameter has significant effect on the transport and availability of water and ecosystem functional processes [155].
Bioclimatic parameters were important just in all depths of Total samples, Ag > 50-Pas < 50 and Ag = 100 classes which have more samples and more distribution in the region. However, Bioclimatic parameters were also important in some depths of other classes.
Geology was another prominent factor that impacted SOC stocks in all depths of Total samples, Ag > 50-Pas < 50 classes due to influence on basic soil parameters [156]. In our region Jbp, a magmatic rock had the highest mean value of SOC stocks. Indeed, geology and parent material has an indirect role in protecting SOC from mineralization by affecting soil texture and clay content [157][158][159].
Soil type was important in all depths of Total samples and Ag > 50-Pas < 50. The most important soil types in the Piracicaba region are Latossolos (Oxisols), and Argissolos (Ultisols) which comprise about 50% and 25%, respectively. These soil types are the most important classes found in tropical region. Maximum value of SOC stocks were found in these soils in agreement with Marques et al. [160]. In tropical clayey soils including Oxisols and Ultisols, Fe and Al oxides play a significant role in stabilization and accumulation of soil organic matter [161,162]. Therefore, soil type should be considered as a factor in predictions of SOC stocks similar to study of Paz et al. [82], Grimm et al. [109], Matteodo et al. [163], Ottoy et al. [46] and Mayer et al. [16].
Minerals react differently in LULC history and are quite important for SOC dynamics because they provide surfaces that can adsorb organic molecules. They are represented by the fine mineral particles smaller than 2 µm that form the clay-sized fraction [164]. However, in general, minerals that exhibit particle size < 53 µm are mainly responsible for the formation and persistence of organo-mineral associations in soils [165]. The finest fraction of the soil contains three main types of minerals: phyllosilicates (known as clay fraction minerals), metal oxides and hydroxides, and primary minerals (quartz, feldspar). All of these types of minerals have been shown to effectively protect SOM from decomposition. SOC is associated with soil minerals through various mechanisms, such as ligand exchange, polyvalent cation bridge, hydrogen bonds and van der Waals forces [137]. Kopittke et al. [166] described two key processes by which SOC is chemically stabilized: (i) part of SOC is associated with existing links between organic material and minerals, and (ii) compounds rich in N can bind directly to mineral surfaces and form new associations. Evidence suggests that chemically stabilized SOC has its mineralization rate reduced [167], since microorganisms and enzymes are not able to break the chemical bonds existing between organic materials and soil minerals, making the substrate organic unavailable [164].
Oxisols and Ultisols with high clay content and Fe and Al oxides have a greater capacity to SOC which was similar to the studies of Six et al. [135] and Carter et al. [168]. This trend was recently verified by Poirier et al. [169], who observed that soils and horizons with a high content of clay and reactive minerals, with a low initial concentration of SOC, quickly stabilize the C and N from vegetable residues recently added through the formation of organo-organic associations. Some studies do not consider clay content to be a good predictor of soil capacity to form organo-mineral associations [170]. Although, literature also shows positive correlations between clay and SOC [171]. In this sense, several authors define the mineralogy and the initial content of MOS as the two key factors that determine whether the vegetal residue added to the soil will be protected inside aggregates or transformed into mineral associated organic matter [135,136,172,173].
Mineralogy has been poorly addressed to its impact in SOC. Gibbsite, goethite and hematite presented more effect than kaolinite on SOC stocks (Figures 6-10). Clay minerals with large SSA and Fe/Al oxides with flocculating and reducing the surface area available for SOM adsorption have an effect on SOC stabilization [85,135]. Moreover, Al/Fe-(hydr) oxides can protect organic matter from microbial decomposition by limiting the activity of soil microorganisms [174]. Therefore, minerals have an important role in stabilization of SOC [175].
In total samples the percent of agriculture, pasture and LULC history classes effect on SOC in all depths. It shows that land use conversions and soil management can have effect on SOC stocks in all depth which is in agreement with study of Guo and Gifford [38], Deng et al. [31] and Mayer et al. [16]. In the study region the main practice is soil conventional tillage and not no-till. This takes us to underline that the management practices may have significant impact, although it was not possible to quantify. Additionally, many studies show that long term LULC has an effect on distribution of SOC stocks [14,16,176,177].
In addition, Vegetation indexes just affected SOC stocks in top soils of all land use history classes, which can explain their importance as drivers of SOC [21,178]. Historically, it was assumed that the quantity and quality of structural plant residues deposited aboveground were responsible for maintaining and/or increasing SOC contents. However, scientific advances have shown that the contribution of inputs that occur inside the soil exceeds those above the ground for the formation of stable SOC [120,179]. Most of the organic material provided above ground in the form of leaves, trunks and branches. These remains are retained in the top layer of the soil in the form of particulate organic matter, which is quickly lost in a short time, moves little to deeper layers of the profile and has a limited contribution to the accumulation of SOC [180].
Moreover, SYSI bands were also selected as one of the significant covariates for digital mapping of SOC stocks. This is because SYSI detects bare soil and has advanced images related to lithological and pedological variability, such as mineralogy, texture, color and OC [100]. Other researcher also identified that SYSI was important for DSM of soil attributes in top soils and subsoils [75,[181][182][183]. Overall, the variation of SOC stocks in soil depths depended on the interaction of different environmental variables which is in agreement with the study of Gray et al. [50], Mayer et al. [16], Gomes et al. [21] and Ramesh et al. [184].
The results produced by this research showed that not only environmental factors can affect SOC stocks levels along depth but also human interventions with land use change over time. Indeed, changes in SOC stocks in different Land use history indicate that land use change can have devastating effects on SOC stocks levels and global climate. Thus, soil management can be one of the significant factors in changes of SOC stocks and climate change. Moreover, the combination of remote sensed variables with RF model can be an accurate approach to monitor SOC stocks in soil depth at tropical regions. Therefore, this research has significant implications in identifying the controlling factors of SOC stocks and mapping SOC stocks soils in different LULC history and depth with remote sensed data. However, more studies should be carried out in humid and sub-humid areas to find the controlling factors of SOC stocks and appropriate management strategies. Moreover, in the present study the border pixels of each LULC class were not corrected which might have an effect on the area of each LULC class. Therefore, it is recommended to consider this process prior to making a time series of LULC history maps.

Conclusions
Remote sensing technology was essential to deliver the object information (soil properties and land use history). With the results it was possible to map and evaluate the impacts of co-variables on SOC stocks. The RF model with RFF proved to be a reliable tool for identifying the driving factors of SOC stocks over depth and land use class based on a 30-year time series. Meanwhile, the model performed well in predicting the spatial distribution of SOC stocks.
The SOC stocks varied in different LULC history classes and soil depth. The highest and the lowest mean values of SOC stocks were observed in areas with Pas = 100 and Ag > 50-Pas < 50, respectively. Moreover, the mean values of SOC stocks decreased by increasing the depth in all LULC history classes. The root type and land use indicated differences of SOC stocks in depth, being greater for pasture (fasciculated roots) than for the others.
The RF model with RFE algorithm revealed to be a suitable approach to finding the factors controlling SOC stocks and for monitoring SOC stocks with environmental and remote sensing covariates.
Soil properties, terrain attributes, geology, bioclimatic and land use history parameters were identified as the most important factors in controlling the SOC stocks in all depths. Natural and anthropogenic factors affected SOC stocks in all depths.
Geology > BIO14 > clay > BIO6 > BIO5 > soil type > land use history > percent of agriculture > TFD500 > Elevation> silt > CEC > LST > Pasture > SF, in this order, are presented from the most to the least important variables that impacted SOC stocks in layer 1. Additionally, goethite > elevation > Clay > land use history > BIO6 > Geology > percent of agriculture > BIO14 > CEC > percent of Pasture > Soil type > BIO5 > silt > SF >LST were the most important to the least important variables that impacted SOC stocks in layer 10.
In summary, from soil attributes, clay and mineralogy were the most important factors which impacted SOC stocks. The results can bring light to SOC management and its environmental monitoring by public policies, farmers and agriculture industries.