Prediction of Soil Organic Carbon under Different Land Use Types Using Sentinel-1/-2 Data in a Small Watershed

: Soil organic carbon (SOC) is a key property for evaluating soil quality. SOC is thus an important parameter of agricultural soils and needs to be regularly monitored. The aim of this study is to explore the potential of synthetic aperture radar (SAR) satellite imagery (Sentinel-1), optical satellite imagery (Sentinel-2), and digital elevation model (DEM) data to estimate the SOC content under different land use types. The extreme gradient boosting (XGboost) algorithm was used to predict the SOC content and evaluate the importance of feature variables under different land use types. For this purpose, 290 topsoil samples were collected and 49 features were derived from remote sensing images and DEM. Feature selection was carried out to prevent data redundancy. Coefﬁcient of determination (R 2 ), mean absolute error (MAE), mean squared error (MSE), percent root mean squared error (%RMSE), ratio of performance to interquartile range (RPIQ), and corrected akaike information criterion (AICc) were employed for evaluating model performance. The results showed that Sentinel-1 and Sentinel-2 data were both important for the prediction of SOC and the prediction accuracy of the model differed with land use types. Among them, the prediction accuracy of this model is the best for orchard (R 2 = 0.86 and MSE = 0.004%), good for dry land (R 2 = 0.74 and MSE = 0.008%) and paddy ﬁeld (R 2 = 0.66 and MSE = 0.009%). The prediction model of SOC content is effective and can provide support for the application of remote sensing data to soil property monitoring.


Introduction
Terrestrial ecosystem is an important carbon pool, most of which is stored in soil [1]. There is an exchange of carbon dioxide (CO 2 ) between the atmosphere and soil, which leads to the sequestration or emission of CO 2 by soil [2]. Moreover, soil organic carbon (SOC) is closely related to soil fertility, plant growth, conservation of soil and water, aggregate stability, and soil nutrient cycling capability [3]. Both natural factors (soil type, climate, and topography) and human activities (land use and farming practices) affect the content of SOC [4]. However, the carbon storage in soil is limited and the carbon content in the soil will gradually reach saturation with the increase of SOC storage [5]. Therefore, it is necessary to monitor the content of SOC. However, the traditional method of measuring sample points is time-consuming and costly [6], and thus is not suitable for large-scale monitoring. In order to solve these problems, scholars gradually began to explore the use of robust and cost-effective approaches to predict SOC content.
In recent studies, scholars have mostly used laboratory-based soil reflectance spectroscopy to predict SOC content [7][8][9][10][11][12][13]. They selected several spectral bands (visible, mid-infrared, near-infrared, and short-wave infrared spectroscopy) for prediction. Although these methods have a good prediction accuracy, with coefficient of determination (R 2 ) approximately greater than 0.83, the laboratory spectrum has some limitations; that is, it can only provide spectral data at the point scale, but not at the landscape scale [14]. The remote sensing data are easy to obtain, covering a wide range, and can be used for ground monitoring throughout the year. Therefore, it is necessary to explore the contribution of remote sensing data to SOC prediction. There have been some studies on SOC prediction based on remote sensing data that achieved good prediction results, especially optical data (e.g., Sentinel-2 [15][16][17][18][19][20][21], Landsat [22][23][24][25][26][27][28], and MODIS satellite data [29][30][31]); their bands cover from visible to short-wave infrared, providing more information. However, the application of optical data is susceptible to weather conditions, especially in the Sichuan Basin where clouds occur most frequently [32], so the available optical data are very limited. Moreover, soil surface conditions (e.g., vegetation coverage, debris, soil moisture, and soil roughness) also affect SOC prediction [19].
Synthetic aperture radar (SAR) technology has the advantages of all-weather, day and night imaging [33]. Therefore, the spectral range from remote sensing used for SOC prediction is not limited to the visible and infrared; radar can and have also be used [34][35][36]. Recent studies have found that multitemporal SAR data could capture the soil-vegetation relationship to predict soil chemical properties, and its ability depends on capturing vegetation characteristics from remote sensing images that indicate variation of soil properties [37]. Yang and Guo [38] reported fluctuation correlations between the backscattering coefficients of multitemporal Sentinel-1 images and SOC. SAR data have broad application prospects for the prediction of soil properties [33,38], but its application in predicting SOC is still limited and rarely reported in the literature due to its complexity, diversity, and availability (e.g., the preprocessing of SAR data is more complicated and has less information than optical data) [36,39]. The newly released Sentinel satellites developed by the European Space Agency (ESA) have the advantages of short revisit time (Sentinel-1: 6 days; Sentinel-2: 5 days), high spatial resolution (Sentinel-1: 5-20 m; Sentinel-2: 10-60 m), free download, and high-quality physical calibration data. They can provide a large amount of free optical and SAR remote sensing data with high spatial resolution for SOC prediction.
The hilly areas in the Sichuan Basin have diverse land use types, with small and scattered plots. Complex ground conditions add uncertainties to the SOC prediction with remote sensing data. There have been many studies on the influence of land use change on SOC content, but fewer on the prediction accuracy of SOC content under different land use types. Studies have shown that land use has a significant effect on the SOC content [40,41]. Land use effects on SOC are related to the land use conversion that has led to the exposure of soils and the rapid decomposition of SOC [42], so the prediction accuracy of SOC content may be affected by land use types. Zhou et al. [36], Zhang et al. [43], and Paul et al. [44] used land use type as a variable to predict SOC content, but the results showed that it did not contribute much to the model. In recent studies, remote sensing data were used to predict SOC content based on multiple land use types, while the results showed that the accuracy of SOC prediction using satellite remote sensing data was lower than that using airborne remote sensing and laboratory reflectance spectra data [19,45]. Therefore, it is possible to improve the prediction accuracy of SOC based on satellite remote sensing data under a single land use type.
The main purposes of this study were to (1) explore the joint application of radar and optical remote sensing in SOC prediction and (2) compare the prediction accuracy of SOC under different land use types. The SOC content was predicted by considering the relationship between SOC and environmental auxiliary variables in the vegetationcovered area. Specifically, SAR satellite imagery (Sentinel-1), optical satellite imagery (Sentinel-2), and digital elevation model (DEM) data were used as input variables. The extreme gradient boosting (XGboost) algorithm has been used to predict SOC with good prediction results [17,22,29,30]. Therefore, we used XGboost to build prediction models of SOC content under different land use types and analyze the importance of feature variables. The results of this study would contribute to the application of remote sensing data for the effective monitoring of SOC content.

Study Area
The study area is located in Ganning Town, Wanzhou District, Chongqing (108 • 13 10 -108 • 17 40 E, 30 • 39 1 -30 • 41 30 N) and covers 1800 hectares ( Figure 1). The soil developed from purple shale is characterized by 0.5 to 0.8 m depth, relatively light texture, and susceptibility to water erosion [46,47]. The pH value of the soil varies from 4.63 to 8.43, and acid soils mainly dominate the study area. The landform type is tectonic denudation hills in eastern Sichuan. The altitude ranges from 248 to 657 m, with an average of 433 m. The slope is between 0 and 79 • and the average is 13 • . The study site is located in subtropical monsoon humid area, with annual average temperature of 17 • C and annual average rainfall of 1293.3 mm.
The main land use types in the study area are orchard, dry land, and paddy field, which account for 12%, 13%, and 30% of the area, respectively. Dry land is mainly planted with wheat (Triticum aestivum L.), corn (Zea mays Linn.), sweet potato (Ipomoea batatas (L.) Lam.), and peanut (Arachis hypogaea Linn.). Orchard grows blood oranges (Citrus sinensis (L.) Osbeck). Single cropping rice (Oryza sativa L.) is planted in paddy field; the growth period of rice in the study area is from April to September, and it is used for water storage in winter. The land use types in the study area are representative of southwestern China.

Sample Collection and Preparation
Soil sampling was conducted in March 2019. According to the requirements of soil sampling point layout in DZ/T 0295-2016 Specification of Land Quality Geochemical Assessment, 290 topsoil samples were arranged based on parcel of land and grid, and the sampling medium was 0-20 cm soil column on the surface ( Figure 1). About 3-5 subsoil columns were collected within a radius of 10 m around the sampling points to form one sample, and the land use type and soil type of each subsampling point were basically the same. The original weight of the samples was generally more than 1.5 kg, which was stored in a special sample bag. Samples were air-dried at room temperature and passed through a 2 mm soil sieve before soil chemical analyses. After that, soil samples weighing 0.5 g were taken, then potassium dichromate-sulfuric acid solution was added, and it was heated on the electric heating plate for digestion. Soil organic carbon (percent of content) was determined by the redox volumetric method [48].

Sentinel-1 Images
Sentinel-1 satellite is an earth observation system in the Copernicus Program of ESA. It consists of two satellites and carries C-band synthetic aperture radar, which can provide continuous images and is not affected by the weather [49]. Six ground range detected (GRD) format images of Interferometric Wide Swath Mode (IW) from October 2018 to September 2019 were selected in this paper. Single temporal Sentinel-1 image contains less information, so we chose multitemporal Sentinel-1 images with a growth period (one year). All images were downloaded from Sentinel Scientific Data Center of ESA. Images are VH (vertical-horizontal) and VV (vertical-vertical) dual polarization, resulting in 12 images with a resolution of 20 m (Table 1). In this study, the Sentinel-1 data were preprocessed by SNAP 7.0, including calibration, thermal noise removal, multilooking, speckle filtering, and terrain correction. Finally, it was converted into VV and VH polarization in dB format without further conversion; the mean values of VV and VH at six time points are shown in Figure 2. The mean values of VV and VH differ with time and land use type, indicating that multitemporal Sentinel-1 data may be helpful for SOC prediction [38].

Sentinel-2 Images
Sentinel-2 image acquired on 12 August 2019 was also from the official website of ESA. This is because only this image has no cloud cover in the study area, and its acquisition time is close to the sampling time. The image is of L2A level, so it does not require atmospheric correction. The data of each band was resampled to 10 m resolution in SNAP 7.0. Ten bands (B2, B3, B4, B5, B6, B7, B8, B8a, B11, B12) with higher resolution were reserved, and the four soil radiometric indices and twenty vegetation radiometric indices were calculated on this basis (Appendix A: Table A1).

DEM and Land Use
The elevation, slope, and aspect data of the study area were extracted by ArcGIS 10.5, based on the 5 m resolution DEM data produced by the national surveying and mapping department that are available at https://www.webmap.cn/ (accessed on 30 June 2020). Land use data were derived from the 1:10,000 land use map in 2019, which can be obtained from the Ministry of Natural Resources of the People's Republic of China, are available at http://www.mnr.gov.cn/ (accessed on 21 June 2020). This land use map came from the land use database of Wanzhou District, which was established in 2008 and is updated by the local Ministry of Natural Resources every year [50].

Statistical Analysis
One-way analysis of variance (ANOVA) and multiple comparisons (least significant difference test, LSD) were applied to analyze the differences in SOC content under different land use types. The confidence interval was 0.05. Pearson correlation was used to analyze the correlation between features and SOC content. These statistical analyses were performed with SPSS V. 25.

XGboost Algorithm
The extreme gradient boosting is an improved algorithm based on the gradient boosting decision tree (GBDT), which can run in parallel based on the constructed boosting tree [51]. The core of the algorithm is to optimize the value of the objective function. In order to prevent overfitting, XGboost not only adds a regularization term to the cost function to control the complexity of the model, but also uses shrinkage and column subsampling. Shrinkage means that a nonnegative factor less than 1 is used to scale the newly added score after each iteration of tree boosting. Column subsampling means that only the ratio of training data samples is used to determine the XGboost model [52]. In short, the prediction process is as follows: (1) the function of projecting a set of predictor variables into an output variable is estimated by minimizing the loss function. (2) The model uses this loss function to iteratively fit the regression tree model for the calibration data. (3) The predicted values of each iteration are combined and multiplied by the weight of regression tree model to obtain the final prediction results [31]. The XGboost model implemented in Python 3.7.4 was used in this study.
The feature selection of XGboost is to establish a model based on the initial feature set, test the performance of the feature in the model, and obtain the importance of the feature. The more a feature is used to make critical decisions on the boosting tree, the higher its score [53]. The algorithm calculates the importance of features by "gain", "frequency", and "coverage". Gain is the main reference factor that determines the importance of branch features. Frequency is a simplification of gain and is measured by the number of times the feature appears in all construction trees. Coverage is the relative value of feature observation [54]. The importance of features in this study is determined by gain.
For a single tree T, if J is selected as the split variable on this tree, the sum of mean square errors on all branch nodes t is calculated [54]. For example, the importance of feature j on the tree is as follows: where i t 2 is the improvement of squared error of a node t, and υ t is the feature associated with node t. By summing and averaging the importance of feature j on each tree, the final importance of the forest with M trees can be obtained:

Assessing Model Performance
The prediction models under total area, orchard, dry land, and paddy field were compared to explore the difference in SOC prediction under different land use types. All models were based on mixed signal including both soil and vegetation. In the previous experiment, we found that the sole use of DEM, Sentinel-1, and Sentinel-2 were relatively poor in predicting SOC (Appendix A: Table A2), so we explored their combined effect. Four models with different input combinations were built from the attribute values (corresponding to soil data points) of the predictor variables generated in Section 2.3; namely, Model A: DEM and Sentinel-1-derived features, Model B: DEM and Sentinel-2-derived features, Model C: DEM and Sentinel-1/-2-derived features, and Model D: the selected features from DEM, Sentinel-1, and Sentinel-2. We designed a set of candidate combinations to adjust the value of learning rate (0.01, 0.05, 0.1), maximum depth of tree (2,20), minimum child weight (0, 20), and regression lambda (0, 20), number of boost round (0, 1000) for each model under total area, orchard, dry land, and paddy field. We randomly selected 80% of the sample points of each land use type as the calibration data set and the remaining 20% as the validation data set [31]. The optimal parameters for the models were calculated based on the calibration set (Appendix A: Table A3). The validation data set was employed to evaluate model performance under total area, orchard, dry land, and paddy field (Appendix A: Table A4).
Six common performance indicators including: R 2 , mean absolute error (MAE), mean squared error (MSE), percent root mean square error (%RMSE), ratio of performance to interquartile range (RPIQ), and corrected akaike information criterion (AICc) were used to assess model performance under different land use types. R 2 varies between 0 and 1, which indicates the closeness between the observed value and the fitting regression line or the variance ratio explained by independent predictors [17]. The model has great fitting degree and stability when R 2 closes to 1. The smaller the MSE and MAE, the better the predictive ability and robustness of the model [55]. %RMSE is dimensionless, and the accuracy of the model is excellent when %RMSE < 10%, good if 10% ≤ %RMSE < 20%, fair if 20% ≤ %RMSE < 30%, and poor if %RMSE ≥ 30% [56]. Generally, a larger value of RPIQ indicates better model performance. The model with the lowest AICc is the most concise.
where m is the number of samples, y i is the true value,ŷ i is the predicted value,ӯ is the average value of y i , IQ is the interquartile range (IQ = Q3 − Q1) of the true value, Q1 and Q3 denote the first and third quartiles, respectively, SS is the sum squared residual, and K is the number of features.

Statistical Analysis of SOC
The descriptive statistics of SOC under total area, orchard, dry land, and paddy field are shown in Table 2. The mean value of SOC content in total area was relatively high. For different land use types, the average contents of SOC in orchard and paddy field were higher. This is because the time of soil sampling in orchard is in the fertilization period after blood orange picking (mainly organic fertilizer), whereas paddy field is conducive to the accumulation of SOC in the process of cultivation and fertilization [57]. The mean, minimum, and maximum values of SOC in dry land were all low. The reason is that the physical protection of organic matter in dry land is weak [58], farming periodically decomposes large aggregates, exposing the previously protected organic matter in large soil aggregates [59], which is not conducive to the accumulation of SOC. According to one-way ANOVA and multiple comparisons, the orchard, dry land, and paddy field had significant difference in SOC content (p < 0.05). The results showed that SOC content was affected by land use types to some extent [60]. Pearson correlation analysis showed that all the features had significant or slight correlation with SOC content under different land use types. In orchard, SOC content had significant correlation with Sentinel-2 data set and DEM data set. SOC content in dry land was only strongly correlated with some features of Sentinel-2 data set. In the paddy field, the three data sets all had features that had a significant correlation with the SOC content. See Table A5 (Appendix A) for details.

Feature Selection and Model Evaluation
In order to explore the role of different remote sensing data in SOC prediction, we used twelve Sentinel-1-derived predictors, thirty-four Sentinel-2-derived predictors, and three DEM-derived predictors (Appendix A: Table A5) to establish the following three models: Model A, Model B, and Model C (Table 3). From the prediction results, the prediction accuracy of SOC under total area was significantly lower than that under three land use types. Among Model A, Model B, and Model C, the prediction accuracy of Model B was higher than that of Model A; it showed that Sentinel-1 and DEM-derived predictors were not as effective as Sentinel-2 and DEM-derived predictors in SOC prediction. However, Model C had the highest R 2 (0.35, 0.85, 0.67, and 0.66, respectively) and the smallest error (e.g., MAE was 0.143%, 0.053%, 0.066%, and 0.077%, respectively). It showed that prediction model could get higher accuracy under the synergistic effect of Sentinel-1, Sentinel-2, and DEM data. Although Model C had the best prediction accuracy, the value of AICc was too large. For the sake of reducing the complexity of XGboost model and preventing data redundancy, feature selection was done for each land use type and total area. Firstly, the collected 49 features were input to the models. Then, the feature importance ranking was output. Finally, according to this ranking, features were added to the model from high to low. We selected those features that produced maximal R 2 . For example, the relationship between the number of features and R 2 under total area is shown in Figure 3a. When the number of features was 39, R 2 reached the largest value and then declined. The addition of following features with lower importance reduced model predictive accuracy. Therefore, the top 39 features with higher feature importance were selected as the input variables of the model under the total area. Similarly, the numbers of features were 25 for orchard, 26 for dry land, and 43 for paddy field (Figure 3). In this case, Model D was established by the selected features (detailed features and their importance scores are given in Section 3.2.2).  The prediction accuracy of Model D under different land use types and total area is shown in Table 4. The AICc of Model D was lower than that of Model C and the value of R 2 did not decrease, indicating that feature selection was effective in reducing complexity of the model. For total area, the prediction accuracy was still lower than others after feature selection. However, for each land use type (Figure 4), the %RMSE in orchard, dry land, and paddy field was 7.87%, 13.11%, and 10.57%, respectively, showing that the model was excellent for orchard and good for dry land and paddy field. In addition, orchard had the highest R 2 and lowest MAE and MSE, which further proves that the model had excellent prediction accuracy under this land use type. In dry land, R 2 , MAE, and MSE were 0.74, 0.057%, and 0.008%, respectively. Although the prediction result was not as good as that in orchard, it still showed that this model had good prediction accuracy in dry land. The value of %RMSE indicated that the prediction result of paddy field was as good as that of dry land, but the R 2 of paddy field was lower than that of dry land, and MAE and MSE were higher than that of dry land, so there was a certain difference between paddy field and dry land. Although the prediction accuracy of paddy field was worse than that of orchard and dry land, the model of paddy field had better performance (RPIQ = 2.54). Comparing the prediction accuracy of the three models, it can be found that there were significant differences in the prediction accuracy of SOC for different land use types. Based on Model D, the map of SOC is presented in Figure 5.

Importance of Predictor Variables
The feature importance score under orchard, dry land, and paddy field are shown in Figure 6. In orchard, the VV_5 (0.03) had the highest importance score, followed by CI (0.02), and BI2 (0.02) with similar scores. In dry land, BI2 (0.23), TSAVI (0.19), B8 (0.17), and MSAVI2 (0.15) of the Sentinel-2 data set were the most important. In paddy field, the B11 (0.09) band of the Sentinel-2 data set contributed the most. VV_5 (0.09) and VH_1 (0.08) of the Sentinel-1 data set also had a great contribution. For terrain indicators, only aspect (0.06) was slightly more important in paddy field, and it had little contribution to the other land use types.   (Table 1); VH_1 to VH_6 are the backscatter coefficients of VH polarization of six Sentinel-1 images, respectively (Table 1); other abbreviations are soil radiometric indices and vegetation radiometric indices of Sentinel-2 image (the meanings of these indices can be found in Appendix A: Table A1); the relative importance of the features above the red dotted line is greater than 60%.

Performance of Prediction Models under Orchard, Dry Land, and Paddy Field
The results of Table 3 show that neither SAR data nor optical data alone can be used to make effective SOC prediction, and only the combination of the two can play a more comprehensive role. The usefulness of Sentinel-1 radar data in predicting SOC is proved [33]. Radar data can make up for the application defect of optical data in the hilly area of southwest China; that is, it can provide ground information in cloudy weather. Compared with existing studies, the results of SOC prediction based on multisource remote sensing data in this study are better than those based on single Sentinel-2 images (R 2 = 0.56) [16] or Landsat images (R 2 = 0.55) [25]. In addition, the application of highresolution satellite images is also helpful for improving the prediction accuracy of SOC. The highest resolution of this image can reach 10 m, while Mahmoudzadeh et al. [23] and Hengl et al. [29] have reported general results by using Landsat-5 satellite data (30 m) and MODIS satellite data (250 m), respectively (R 2 is about 0.5).
The effects of different land use types on the prediction of SOC content were explored. We found that the prediction accuracy under orchard, dry land, and paddy field was significantly higher than that under total area, and there were significant differences in the prediction accuracy among the three land use types (orchard > dry land > paddy field) ( Table 4), which indicates that land use types have certain influence on the prediction accuracy of SOC content [12,61]. The reasons for this difference may be that: (1) the large area of orchard in the study area and the relatively uniform vegetation cover (blood orange) avoids the error of SAR data intensity value and optical data spectral information caused by the complex ground environment. (2) In the study area, the types of crop planted in dry land are diverse, which affects the feature variables, and the number of sample points was relatively insufficient, resulting in lower prediction accuracy than orchard. (3) In paddy field, soil moisture has a greater impact on SAR data, and due to the low spatial resolution and spectral resolution, as well as the effect of soil moisture [15], the performance of the short-wave infrared (SWIR) band was not good, resulting in poor prediction of SOC in paddy field. (4) Studies have shown that, in general, the prediction accuracy of the model will increase with the increase of coefficient of variation [62]. In this study, the coefficient of variation of dry land was higher than that of paddy field (Table 2), which may have had some influence on the prediction accuracy of SOC content.
It should be noted that there are still some limitations in using remote sensing data to predict SOC under a single land use type, which are mainly associated with the locationspecific land use type, soil texture, and vegetation cover conditions. We used samples located in a small watershed for predictions, which may be regional (e.g., areas with the same land use type have similar vegetation and soil conditions). Further work is needed to prove whether it can be applied to other land use types and other larger areas. Furthermore, since we basically used all vegetation indices that currently exist, there is a correlation between the indices. Some excluded features (e.g., spectral bands of Sentinel-2) may have a greater correlation with SOC, but their effects have been replaced by other indices.

Variable Importance
The terrain controls the flow of solute, water, and sediment, which in turn affects the development of soil and the spatial distribution of soil properties [63]. Although terrain factors have a significant correlation with SOC content [64], the prediction ability of the model can be significantly improved by adding vegetation index and backscatter coefficient compared with those using only terrain variables [65]. On the one hand, the vegetation features have enough heterogeneity, and there is a strong interaction between vegetation features and soil properties [66]. On the other hand, the backscatter coefficient of SAR data is sensitive to the changes of surface conditions and soil moisture [67]. We found that there was a weak correlation between the polarization value of the Sentinel-1 image and the SOC content (Appendix A: Table A5), and Yang and Guo [38] have also proven that this correlation can be successfully used for SOC prediction. Therefore, both optical data (Sentinel-2) and radar data (Sentinel-1) can be used to build a predictive model of SOC content. In this study, the aspect made an important contribution to paddy field, and the other terrain variables data had a lower contribution than Sentinel-1 and Sentinel-2 data sets in orchard and dry land. Taghizadeh-Mehrjardi et al. [17] also found that Sentinel-2 bands have a greater correlation with SOC content than elevation while Zhou et al. [33] concluded that terrain variables are more important than Sentinel data in a study area with plains and plateaus. The difference may be because the range of altitude changes in this study was relatively small.
Due to the differences in the ground conditions of the three land use types, the contributions of remote sensing data in these models were different. In orchard, VV_5 had the highest score ( Figure 6a); this is because the time of VV_5 (22 August 2019) was after summer pruning and weeding in orchard and it was the autumn shoot germination period of blood orange. At this time, the bare soil area in orchard increases, the contribution of ground scattering increases, and the contribution of canopy scattering decreases [68], which is more conducive to radar satellite reflecting the real situation of soil surface. VV polarization is mainly directly affected by the ground and canopy, whereas VH includes stem-ground double scattering and volume scattering [69] that has more interference information, so VV_5 is more important than VH_5 in orchard. The results in Figure 6a also indicate that CI and BI2 have made great contributions in orchard, due to soil color and soil brightness index having a high correlation with SOC content [19,70,71]. For dry land, BI2 also has a great influence on the prediction accuracy of SOC (Figure 6b). Because TSAVI and MSAVI2 have a significant correlation with SOC content (Appendix A: Table A5), and the close relationship between soil properties and vegetation coverage [66], TSAVI and MSAVI2 can effectively reflect the changes of SOC content. B8 band, which is the near-infrared band (NIR), ranks third in importance for dry land. Studies have shown that the near-infrared band has a good predictive effect on the carbon and nitrogen content in the soil [72]. In paddy field, B11 is the most important (Figure 6c). It may be that B11 band belongs to the short-wave infrared (SWIR) band, there is a strong correlation between reflectivity and SOC content in the SWIR region [15]. The importance of VH_5 (22 August 2019) and VH_1 (26 October 2018) follows B11. Firstly, VH is more sensitive to vegetation and VV is more sensitive to soil moisture [73], so VH is slightly more important than VV in paddy field. Secondly, since VH_5 is in the drainage time before rice harvest, it strengthens the contribution of stem-ground scattering in VH polarization; VH_1 is in the water storage period of paddy field, when the ground and canopy scattering of VV polarization is weakened and concentrated on the scattering of water surface, which is not conducive to reflecting SOC content in soil. It can be seen that in the three types of land use, both Sentinel-2 data set and Sentinel-1 data set have a great impact on the prediction of SOC.

Conclusions
This study presented and evaluated a method for predicting SOC content under different land use types based on remote sensing data. For this purpose, SAR satellite imagery (Sentinel-1), optical satellite imagery (Sentinel-2), and auxiliary DEM data were used for modeling. The influence of land use type on SOC prediction and the importance of optical and radar data under different land use types were analyzed. The overall prediction result of the model is better than some existing researches based on satellite remote sensing data [16,21,[23][24][25]33], and is similar to the result of Taghizadeh-Mehrjardi et al. [17]. Among the three land use types, the prediction results of orchard (R 2 = 0.86 and MSE = 0.004%) are better than dry land and paddy field. Both optical satellite imagery and radar satellite imagery have made important contributions to the prediction of SOC content. Although commonly used optical satellites, such as SPOT, Landsat 8 OLI, and Sentinel-2, have a large amount of information, it is difficult to obtain useful images because of the influence of cloud and fog. Whereas the radar satellites, such as GF-3, ALOS PALSAR, and Sentinel-1 SAR, have less information, but can obtain the continuous image. Therefore, it is necessary to combine the two images to study the real condition of the ground. Since the land use type has some influence on the prediction accuracy of SOC, in the future, analysis based on single land use type may improve the prediction results of SOC content.

Conflicts of Interest:
The authors declare no conflict of interest.  L is a correction factor which ranges from 0 for very high vegetation cover to 1 for very low vegetation cover; a is the soil line intercept; s is the soil line slope; X is the adjustment factor to minimize soil noise; M = 1 − 2 × s × NDVI × WDVI; b is the angle between the soil line and the NIR axis, in degrees; eta = (2 × (B8A × B8A − B4 × B4) + 1.5 × B8A + 0.5 × B4)/(B8A + B4 + 0.5); rb = B4 − (B2 − B4). In this paper, L, a, s, X, and b are 0.5, 0.5, 0.5, 0.08, and 45, respectively.