Predictive Models to Estimate Carbon Stocks in Agroforestry Systems

Abstract: This study aims to assess the carbon stock in a pasture area and a forest fragment in natural regeneration, given the importance of agroforestry systems in mitigating emissions of gases that contribute to the greenhouse effect, as well as in maintaining agricultural productivity. Our other goal was to predict the carbon stock, according to different land use systems, from physical and chemical soil variables using the Random Forest algorithm. We carried out our study in an Entisols Quartzipsamments area with a completely randomized experimental design: four treatments and six replicates. The treatments consisted of the following: (i) an agroforestry system developed for livestock, (ii) an agroforestry system developed for fruit culture, (iii) a conventional pasture, and (iv) a forest fragment. Deformed and undeformed soil samples were collected in order to analyze their physical and chemical properties across two consecutive agricultural years. The response variable, carbon stock, was subjected to a boxplot analysis, and the complete database was used for predictive modeling with the Random Forest algorithm. The results led to the conclusion that the agroforestry systems developed for both fruit culture and livestock are more efficient at stocking carbon in the soil than the pasture area and the forest fragment undergoing natural regeneration. Nitrogen stock and land use system are the most important variables for estimating carbon stock from the physical and chemical variables of soil using the Random Forest algorithm. The predictive models generated from the physical and chemical variables of soil with the Random Forest algorithm showed a high potential for predicting soil carbon stock and are sensitive to different land use systems.


Introduction
The use of agroforestry systems to achieve optimum agronomic benefits through the efficient use of resources (nutrients, light, water collection, and utilization) has received great attention for its contribution to mitigating climate change through organic carbon sequestration [1]. In this context, understanding the dynamics and storage of soil carbon, especially in agroforestry systems, is essential for informing public policies focused on disseminating these agricultural practices [2].
Four land use systems were assessed: (i) an agroforestry system developed for livestock (AFS1), (ii) an agroforestry system developed for fruit culture (AFS2), (iii) an area used as pasture (Pasture), and (iv) a forest fragment undergoing natural regeneration (Forest). All land use systems were located on the same soil type, classified as Neossolo Quartzarênico according to the Brazilian Soil Classification System [14], with a sandy texture, corresponding to Entisols Quartzipsamments according to the Soil Taxonomy [15]. Table 1 presents the granulometry performed in June 2016 to characterize the soil texture of the areas. The land use systems were implanted in areas that had been occupied by crops until 2011, except for the forest fragment (tree species), which had been undergoing natural regeneration for more than 35 years. This area was used as a reference for the natural recovery strategy in this transitional region between the Atlantic Forest and Cerrado biomes.
The management systems adopted to implement the AFS1 and AFS2 areas were similar to each other and followed this sequence of steps. (1st) Grasses (Brachiaria sp.) and pigeon pea (Cajanus cajan) were planted to decompact the soil and produce biomass. (2nd) After 2 years, the area was cleared, preserving the grass stems for regrowth, after which tracks for soil tillage were opened using a rotary hoe (machinery that removes biomass and allows soil tillage for planting in windrows). (3rd) The planting windrows were prepared (1.2 m wide tracks at a distance of 5 m from each other) with rotary tillers and fertilized with rock dust (500 g per linear meter). (4th) The hoed material was laid on the tracks prepared for planting using the same machinery (rotary hoe) to form a thick layer of mulch, which controls grass growth in the windrows and promotes nutrient cycling and the accumulation of organic matter in the soil of this region. (5th) Species of interest were introduced according to the productive focus of each system.
AFS1 was implanted in June 2015 in a total area of 15 ha and focused on producing livestock and wood (Eucalyptus pellita); 1.2 m wide rows spaced 5 m from each other were prepared to plant eucalyptus. The inter-rows were occupied with Marandu grass (Urochloa brizantha cv. Marandu) to supply organic residue to cover the soil of the planting tracks. Considering the need for a greater spacing of the pasture, this model included 12 m wide tracks of Marandu grass for every three rows of planted eucalyptus. In turn, AFS2 was planted in June 2014 in a total area of 5.2 ha and focused on fruit production (bananas, citrus, and mango) and wood (Acacia mangium and Eucalyptus sp.), following the same pattern as AFS1, with 1.2 m wide windrows spaced 5 m from each other. The inter-rows were also occupied with Marandu grass.
In addition to the use of rock dust during the system's implantation, biofertilizer (principal active compound of this material is fresh cattle manure), organic compound, or castor oil meal were applied sporadically in the planting windrows. After hoeing the inter-rows, Azospirillum was applied to the grass.
The pasture area was used in this study as a reference that demonstrated the common trend of agricultural occupation of the region. Planting was carried out in 2012, using grass (Urochloa brizantha cv. Marandu). The farm functioned until April 2016, when it ceased activities in the dairy and the area was abandoned. Therefore, the soil sampling was taken from the 4-year-old grasses.

Experimental Design, Soil Collection and Analyzed Physical and Chemical Properties
We used a completely randomized experimental design with six repetitions and four land use systems: AFS1, AFS2, pasture, and forest.
Soil samples were collected to analyze the physical and chemical variables during two agricultural years: the first collection took place in the second quarter of 2016 and the second in the second quarter of 2017. The soil was sampled at the depths of 0.00-0.05, 0.05-0.10, 0.10-0.20, and 0.20-0.40 m. Sampling in AFS1 and AFS2 was performed at three sampling regions: (i) planting windrows (L), (ii) windrow buffers (I), and (iii) inter-rows (E). In AFS1, an additional collection was performed in the 12 m inter-rows designed for animal pasture (E12). Because of their homogeneity, the pasture and forest areas were not subdivided into sampling regions, and the soil was sampled in six repetitions at each depth, randomly distributed throughout the areas (Figure 2).
Organic C and total N contents were determined from samples of air-dried fine earth (ADFE), ground in a mortar and passed through a 100-mesh sieve (0.149 mm), by dry combustion using an elemental analyzer (Leco CHN-600 instrument) [19].
C and N stocks were calculated (in Mg ha⁻¹) for each sampled soil layer by multiplying the concentration of each element (%) by Ds (g cm⁻³) and layer thickness (cm). As the samples were always collected in fixed layers, the carbon stock values were corrected based on their equivalent mass, a methodology proposed in [20], using the equivalent soil mass found in the forest area as reference.
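The stock calculation described above can be illustrated with a minimal Python sketch. The function names and the numeric values are hypothetical and serve only as illustration. The first function is the direct product stated in the text. The second is a simplified, single-layer reading of an equivalent-mass correction (the layer is expressed on the soil mass of the reference area) and is an assumption, not necessarily the exact procedure of [20].

```python
def element_stock(concentration_pct, bulk_density, thickness_cm):
    """Stock (Mg/ha) of one fixed layer: content (%) x Ds (g/cm^3) x thickness (cm).

    The unit factors cancel: 1 ha x 1 cm of soil at 1 g/cm^3 weighs 100 Mg,
    and 1 % of that mass is 1 Mg.
    """
    return concentration_pct * bulk_density * thickness_cm


def equivalent_mass_stock(concentration_pct, thickness_cm, reference_bulk_density):
    """ASSUMPTION: simplified single-layer equivalent-mass correction.

    The element content is applied to the soil mass that the same layer holds
    in the reference area (here, the forest), so that areas with different
    bulk densities are compared on an equal soil mass.
    """
    return concentration_pct * reference_bulk_density * thickness_cm


# Hypothetical layer: 1.0 % C, Ds = 1.5 g/cm^3, 5 cm thick (0.00-0.05 m).
fixed_layer = element_stock(1.0, 1.5, 5.0)
# Same layer expressed on a hypothetical forest reference Ds of 1.4 g/cm^3.
corrected = equivalent_mass_stock(1.0, 5.0, 1.4)
print(fixed_layer, corrected)
```

The comparison of the two values shows why the correction matters: a denser soil holds more mass in the same fixed layer, which inflates its stock unless the layers are compared on an equal mass basis.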

Predictive Modeling
The complete database was composed of 21 variables, of which 20 were predictive variables and 1 was the target (response) variable, the carbon stock in the soil (Table 2). To induce the models, the complete database was subdivided into four data subsets, one for each sampling depth: 0.00-0.05, 0.05-0.10, 0.10-0.20, and 0.20-0.40 m.
The response variable (soil carbon stock) was initially submitted to descriptive analysis through boxplot graphs in which the following measures were identified: lower limit, first quartile, median, third quartile, and upper limit.
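As an illustration of the five measures read from a boxplot, the following Python sketch computes them with the standard library. The source does not state how the lower and upper limits were defined, so the common Tukey convention (most extreme observations within 1.5 times the interquartile range of the quartiles) is assumed here, as is the quartile interpolation method. The function name and the data are hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Return the five boxplot measures for a list of numbers."""
    # Quartiles by linear interpolation over the sample (method is an assumption).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # ASSUMPTION: Tukey fences at 1.5 x IQR define the whisker limits.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "lower_limit": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_limit": max(inside),
    }


# Hypothetical values; 40 lies beyond the upper fence and is treated as an outlier.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(summary)
```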
Afterwards, to select only the variables that contributed to the model, a correlation matrix was used to eliminate variables that were highly correlated with each other, and a separate check ensured that no variable had null variance. When two variables were highly correlated, one was randomly retained and the other was eliminated because it added no practical information to the model. Following this, we modeled the soil carbon stock with the Random Forest algorithm implemented in R [21]. In the Random Forest algorithm, at each split in each tree, the improvement in the split criterion is the importance measure attributed to the splitting variable, and it is accumulated over all trees in the forest for each variable. Thus, to assess the importance of the selected variables, each tree was trained on a bootstrap sample, and the optimum variable at each split was identified from a random subset of all variables. Different split criteria apply to classification and regression problems: the former uses the Gini index and the latter employs variance reduction [22].
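The variable selection step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' R code: the 0.70 cutoff is taken from the Results, the first variable of each correlated pair is kept (the study retained one at random), and the function names and data values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_variables(columns, threshold=0.70):
    """Drop null-variance variables, then one of each highly correlated pair.

    columns: dict mapping variable name -> list of numeric values.
    ASSUMPTION: the first variable of a correlated pair is kept, whereas the
    study kept one of the two at random.
    """
    # A variable with a single distinct value has null variance.
    varying = {name: vals for name, vals in columns.items() if len(set(vals)) > 1}
    kept = []
    for name, vals in varying.items():
        if all(abs(pearson(vals, varying[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values; SB tracks Ca closely, so only one of them survives.
data = {
    "Ca": [1, 2, 3, 4, 5],
    "SB": [2, 4, 6, 8, 10.5],
    "Fe": [5, 1, 4, 2, 3],
    "constant": [7, 7, 7, 7, 7],
}
print(select_variables(data))
```

Keeping the first variable makes the sketch deterministic; a random choice, as in the study, could be obtained by shuffling the column order beforehand.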
Model validation was performed using the hold-out method, in which 70% of the data were used for training and 30% for testing. The results were then expressed graphically through a regression in which the final prediction was the mean of the predictions of all regression trees forming the Random Forest [23]. Model performance was assessed through the coefficient of determination (R²), the Pearson correlation coefficient (r), and the root-mean-square error (RMSE), computed between the observed values and the values predicted by the generated models.
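A minimal Python sketch of the hold-out split and the three performance measures is given below. The random seed, the function names, and the example values are hypothetical. The source does not state which definition of R² was used; the sketch assumes 1 minus the ratio of residual to total sum of squares, which can differ from the squared Pearson coefficient.

```python
import math
import random


def holdout_split(rows, train_fraction=0.70, seed=42):
    """Shuffle the rows and split them into training and test subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # the seed is a hypothetical choice
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def regression_metrics(observed, predicted):
    """Return R2, Pearson r and RMSE between observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    return {
        # ASSUMPTION: R2 defined as 1 - SS_res / SS_tot.
        "r2": 1 - ss_res / ss_tot,
        "r": cov / math.sqrt(ss_tot * ss_pred),
        "rmse": math.sqrt(ss_res / n),
    }


# Hypothetical use: 10 samples give 7 for training and 3 for testing.
train, test = holdout_split(range(10))
print(len(train), len(test))
print(regression_metrics([7.1, 6.4, 10.2, 15.0], [7.0, 6.9, 9.8, 14.1]))
```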

Results
First, a descriptive analysis was conducted using boxplot graphs to understand the behavior of the soil carbon stock (response variable) in each land use system assessed (Figure 3).
In general, the pasture area had the lowest carbon stock among the land use systems, followed by the forest area (Figure 3). In addition, AFS2 stood out from the other land use systems by showing higher mean values of soil carbon stock in the superficial layers; values of 7.73 and 7.09 Mg ha⁻¹ were observed in the 0.00-0.05 and 0.05-0.10 m layers, respectively. In contrast, for the deeper layers, AFS1 stood out with a carbon stock of 10.66 Mg ha⁻¹ in the 0.10-0.20 m layer in AFS1 L and 15.40 Mg ha⁻¹ in the 0.20-0.40 m layer in AFS1 I.
Subsequently, we generated a correlation matrix to identify variables with null variance or high correlation (Figure 4). We found that variables such as SB, CEC and V were highly correlated among themselves and with the exchangeable bases Ca, Mg and K, as well as with pH and m. Furthermore, with the increase in depth, the interactions between the variables became less intense and a larger number of variables was selected for model construction.
The database was tested for data dispersion, frequency distribution, and Pearson correlation coefficient (Figure 6) to verify the absence of high correlations in the database after the variable selection process. The results confirmed that variables with null variance and a high correlation were completely eliminated from the database, and only those with a correlation coefficient below 70% were used.
According to Figure 7, nitrogen stock was the most important variable in predicting soil carbon stock, followed by land use system, a pattern observed in all soil layers. However, from the third most important variable onward, the selected variable differed according to the soil layer: Fe and m were the third most important variables for the layers 0.00-0.05 and 0.05-0.10 m, respectively, and SOM for the layers 0.10-0.20 and 0.20-0.40 m.

Discussion
Nitrogen stock was the most important variable for predicting soil carbon stock in different land use systems, a result similar to the findings of [24], where several predictive modeling approaches were used and total nitrogen concentration contributed the most to explaining the spatial patterns of soil carbon stocks. Nitrogen availability can affect soil carbon stock in two opposite ways. First, it can increase soil C stock by enhancing primary production and consequently raising the amount of aboveground biomass. Second, it can decrease soil carbon stock, since higher N availability can also accelerate the SOM mineralization rate [7].
Land use was the variable with the second strongest influence on predicting carbon stock. Previous work indicates that land use and land use change are the most important factors in determining carbon stocks and sequestration in the short term, since soil carbon stocks can take anywhere from decades to centuries to accumulate, whereas carbon losses resulting from changes in land use can occur rapidly, within a few years, and are extremely difficult to reverse [25]. Therefore, considering the importance of land use for predicting carbon stocks, along with the possibility of mapping land use through satellite images, it is possible to quantify the impact on soil carbon stock associated with future changes in land use [26]. This result is relevant because it supports the development of public policies focused on rational land use that prioritize land use systems promoting increased carbon sequestration and enhanced soil carbon stocks.
Another important result is that models generated from the physical and chemical variables of soil with the Random Forest algorithm had a high potential to predict carbon stock and were sensitive to different land use systems. However, a study conducted by [24] using support vector regression (SVR), an artificial neural network (ANN), and Random Forest to predict and map soil carbon stocks found that the Random Forest algorithm achieved the worst result, generating a model with a coefficient of determination of only 0.53. Nevertheless, the authors argued that the performance of the Random Forest algorithm may have been compromised by differences in the extent of the study areas, topography, sampling density, or the quantity and quality of the auxiliary data used.
In the 0-40 cm layer of soil, the AFS1 and AFS2 areas had higher carbon stocks than the pasture and forest areas. These results agree with [27], who demonstrated the high potential of agroforestry systems to increase carbon stocks, both in soil and in the biomass of trees, in different pedoclimatic conditions in France. However, these findings contrast with those of [28] and [2], in which agroforestry systems had a soil carbon stock similar to that of a natural forest.
Even though the forest fragment used as a reference in this study had been undergoing a natural regeneration process for over 35 years, the carbon storage process tended to be slow as it was located in a soil classified as Quartzarenic Neosol, which tended to show few mechanisms to stabilize soil carbon. Soils with limited capacity to protect organic carbon, either chemically, biochemically or through aggregation, leave the carbon in an unprotected form and vulnerable to decomposition [29].
The analysis of carbon stock along the soil profile revealed that the AFS2 planting windrows were more efficient at stocking carbon in the topsoil layers. In contrast, AFS1 demonstrated a better ability to stock carbon in deeper layers (0.10-0.20 and 0.20-0.40 m), both in planting windrows and in windrow buffers. According to [27], soil carbon stocks were larger particularly in the tree rows of AFSs and mainly in the upper 0.30 m of the soil. It was also possible to increase carbon stocks in deeper layers in some silvicultural systems.
The intensive application of plant residues from hoeing the inter-rows, combined with pruning management, favored the formation of a thick layer of plant residues in the planting windrows in AFS2. This likely contributed to the larger soil carbon stock found, especially in the most superficial layers of the system. In addition, AFS2 was characterized by a high diversity of plants and high biomass production, and therefore had more potential to contribute plant residues to the planting windrows.
By contrast, AFS1 had lower plant diversity and less biomass accumulation on the soil surface, but had an important carbon source at depth, which is connected to the root biomass. A study conducted by [30] demonstrated that 26-year-old agroforestry systems significantly increased the total organic carbon content in the soil, as well as carbon storage, through a two-way carbon input: rhizodeposition and the deposition of residues above the ground. However, according to [31], the contributions of roots to carbon stocks are smaller at depth (below 0.30 m) than in the most superficial layers of the soil, just as the effect of residues in wooded areas is restricted to the superficial layer.

Conclusions
Agroforestry systems developed for both fruit culture and livestock were more efficient at stocking carbon in the soil than a pasture area and a forest fragment in natural regeneration, due to the greater addition of crop residues to the soil surface and the management of organic fertilizer in agroforestry systems. Nitrogen stock and land use system are the most important variables for estimating carbon stock from the physical and chemical variables of soil using the Random Forest algorithm. The predictive models generated from the physical and chemical variables of soil with the Random Forest algorithm had a high potential to predict soil carbon stocks and could be applied in different land use systems.