Modeling the Spatial Dynamics of Soil Organic Carbon Using Remotely-Sensed Predictors in Fuzhou City, China

: Assessing the spatial dynamics of soil organic carbon (SOC) is essential for carbon monitoring. Since variability of SOC is mainly attributed to biophysical land surface variables, integrating a compressive set of such indices may support the pursuit of an optimum set of predictor variables. Therefore, this study was aimed at predicting the spatial distribution of SOC in relation to remotely sensed variables and other covariates. Hence, the land surface variables were combined from remote sensing, topographic, and soil spectral sources. Moreover, the most inﬂuential variables for prediction were selected using the random forest (RF) and classiﬁcation and regression tree (CART). The results indicated that the RF model has good prediction performance with corresponding R 2 and root-mean-square error (RMSE) values of 0.96 and 0.91 mg · g − 1 , respectively. The distribution of SOC content showed variability across landforms (CV = 78.67%), land use (CV = 93%), and lithology (CV = 64.67%). Forestland had the highest SOC (13.60 mg · g − 1 ) followed by agriculture (10.43 mg · g − 1 ), urban (9.74 mg · g − 1 ), and water body (4.55 mg · g − 1 ) land uses. Furthermore, soils developed in bauxite and laterite lithology had the highest SOC content (14.69 mg · g − 1 ). The SOC content was remarkably lower in soils developed in sandstones; however, the values obtained in soils from the rest of the lithologies could not be signiﬁcantly differentiated. The mean SOC concentration was 11.70 mg · g − 1 , where the majority of soils in the study area were classiﬁed as highly humus and extremely humus. The soils with the highest SOC content (extremely humus) were distributed in the mountainous regions of the study area. The biophysical land surface indices, brightness removed vegetation indices, topographic indices, and soil spectral bands were the most inﬂuential predictors of SOC in the study area. The spatial variability of SOC may be inﬂuenced by landform, land use, and lithology of the study area. Remotely sensed predictors including land moisture, land surface temperature, and built-up indices added valuable information for the prediction of SOC. Hence, the land surface indices may provide new insights into SOC modeling in complex landscapes of warm subtropical urban regions.


Introduction
Soil organic carbon (SOC) is essential for the normal functioning of ecosystems [1]. It also plays a significant role in global C dynamics and climate change study as it stores the largest total carbon pool of terrestrial ecosystems [2]. In the urban context, the existence of an optimum SOC content is a critical factor for greening projects and is also a good indicator of the state of urban ecosystems and soil quality [3]. In general, SOC spatial distribution is influenced by climatic, geomorphological, lithological, and land-use factors.
In particular, land-use change due to intensive human activities including urbanization and industrialization processes hugely impacts its spatial distribution [4].
In line with this, estimation of the spatial variability of SOC in urban areas has attracted the interests of many studies. For instance, Chen et al. [5] studied the SOC densities in urban built-up areas of 35 Chinese cities. The study reported the carbon storage of 35 cities in china including the Fuzhou city (i.e., the current study area). Similarly, Wang et al. [6] determined the spatial variation of SOC on a hilly coastal landscape of Wafangdian, Liaoning Province, and reported higher contents toward the mountainous areas. Additionally, they confirmed strong influence of land use on the spatial variation of SOC. Raciti et al. [7] also made a comprehensive assessment of SOC contents of residential areas and stated that soils in residential areas with past agricultural use had a higher capability to sequester carbon. Likewise, Xia et al. [8] studied the spatial variations of SOC in relation to land-use change in eastern regions of China and confirmed that land-use change from and into a paddy field had a high impact on SOC variability.
Even though many previous researches estimated SOC in urban areas globally and nationally, the current study area has limited data. Moreover, there is a need to find the optimal set of environmental predictors to remotely determine the spatial variability of topsoil organic carbon; thus, assessing the suitability of such predictors, as well as other environmental covariates, is an imperative task to SOC mapping in urban environments. To that end, McBratney et al. [9] also highly recommended the need for new studies so as to select suitable environmental covariates for digital soil mapping. Furthermore, even though the previous studies used multisource datasets, various biophysical land surface variables, which can provide information on the SOC distribution, were generally ignored. For instance, soil moisture, land surface temperature, and built-up index were omitted. Since SOC is highly influenced by long-term mean soil temperature and moisture than any other factors, integrating them may provide new insight for SOC mapping [10]. The previous SOC prediction studies overlooked soil moisture and temperature indices perhaps due to a shortage of high-resolution soil moisture and temperature data. In particular, the traditional ground-based soil-moisture observation networks produce sparse soil-moisture data for smaller regions. Meanwhile, some studies used atmospheric temperature and precipitation as a proxy to measure soil temperature and moisture [11,12]. However, they also had very coarse resolution for smaller geographic scales. In the meantime, the remote sensing products of soil moisture such as Advanced Scatterometer (ASCAT) and Soil Moisture and Ocean Salinity (SMOS) have coarse resolutions (i.e., tens of kilometers) [13]. However, optical/thermal infrared (TIR) sensor products have higher spatial resolutions (meters to kilometers) and could be a good solution for regional and local applications [14]. Variables derived from optical sensor products including vegetation temperature condition index (VTCI) and land surface temperature (LST) can be a better choice to obtain the soil moisture and temperature information.
Therefore, the main aim of this study was to identify the role of remotely sensed variables for SOC prediction and to understand the contribution of the landform, land use, and lithology on the spatial variation of SOC in the coastal city of Fuzhou, China. The soil samples were collected using purposively distributed sample points that represent the dominant land cover, lithology, and landform of the study area. To that end, biophysical land surface variables such as vegetation temperature condition index (VTCI), land surface temperature (LST), and normalized difference built-up index (NDBI) were integrated with other environmental covariates obtained from multiple sources including remote sensing, Remote Sens. 2021, 13, 1682 3 of 23 topography, and proximal sensing. Furthermore, the most important environmental variables were selected and used to estimate the spatial distribution of SOC using a random forest and CART model.

Description of the Study Area and Sampling Locations
Fuzhou district is located in the southeastern coastal area of China in the estuary of the Minjiang River. It is the capital city of the Fujian province ( Figure 1) and serves as a central city for the Western Taiwan Straits Economic Zone with a total area of 11,462. 41  sensing, topography, and proximal sensing. Furthermore, the most important environmental variables were selected and used to estimate the spatial distribution of SOC using a random forest and CART model.

Description of the Study Area and Sampling Locations
Fuzhou district is located in the southeastern coastal area of China in the estuary of the Minjiang River. It is the capital city of the Fujian province ( Figure 1) and serves as a central city for the Western Taiwan Straits Economic Zone with a total area of 11,462.41 km 2 . It is geographically located at 118°08′E to 120°31′E longitude and 25°15′N to 26°29′N latitude in the southeast of China. It neighbors Ningde and Nanping to the north, Quanzhou and Putian to the south, and Sanming to the west. The city has 13 administrative regions comprising six districts, one county-level city, and six counties. The current study site includes all regions except Pingtan County. The climatic condition of the area belongs to the humid subtropical maritime climate, with an annual average temperature of 16-20 °C and an annual average precipitation of 900-1200 mm.
It is also covered by acidic volcanic rocks and Cretaceous sandstones from the Jurassic Period [15]. The soil of the upper part is dominated by red soil [16], whereas mountainous regions have mainly red and laterite soil. The climatic condition of the area belongs to the humid subtropical maritime climate, with an annual average temperature of 16-20 • C and an annual average precipitation of 900-1200 mm.
It is also covered by acidic volcanic rocks and Cretaceous sandstones from the Jurassic Period [15]. The soil of the upper part is dominated by red soil [16], whereas mountainous regions have mainly red and laterite soil. Additionally, the area is characterized by complex topographic features. The northern, western, and southern parts of the study area are dominated by mountains, whereas the eastern part is mainly plain landform [16].

Soil Sampling and Laboratory Analysis
Topsoil samples (0-20 cm depth) were collected from all administrative counties of Fuzhou City except Pingtan Island in February and March. The soil samples were collected using purposively distributed sample points that represent the dominant land cover, lithology, and landform of the study area. The sampling sites were spread out across the whole study area (Figure 1). The land use and the topographic and lithology maps were superimposed using ArcGIS 10.3 to identify sample locations. A total of 244 sample sites were selected as predetermined sampling points. However, due to complex topography, land use, lack of budget, and accessibility problems, 121 samples were collected. A handheld global position system (GPS) receiver was used to identify sample points and to capture the location information. Then, the samples were transported to the laboratory, air-dried, and sieved. The samples were separated for spectroscopic measurements and chemical analysis. The SOC was determined using a dry combustion method using a CNS elemental analyzer (Flash EA 1112 NC-Soil, Thermo Fisher Scientific, Pittsburgh, PA, USA). Spectral soil data were measured using an indoor spectral measurement in the wavelength ranges of 350 nm to 2500 nm using the FieldSpec-3 spectroradiometer. The sampling intervals were set to be 1 nm, and, as a result, a total of 2151 bands were produced.

Description and Preprocessing of Landsat-8 Images, and Soil Spectral Data Transformations
Landsat images are commonly used for SOC predictions due to their high spectral resolution [17]. This study also used Landsat-8 data of Operational Land Imager (OLI) Level 1 and Thermal Infrared Sensor (TIRS). The images acquired in December 2013 having less cloud cover (i.e., <10%) with path and raw numbers of 119 and 42 were downloaded from the United States Geological Survey archives (https://earthexplorer.usgs.gov/; accessed on 12 June 2018). The radiometric calibration was performed using ENVI. Radiometric calibration was done by converting the images into the top-of-atmosphere (TOA) reflectance using radiometric rescaling coefficients provided in the product metadata (MTL) file [18].
Moreover, bandwidth of the spectral soil data ranged from 350 nm to 2500 nm. Different spectral transformations such as multiplicative scatter correction (MSC), first derivative, second derivative, normalization, and continuum removal were applied to remove background noise. Moreover, smoothing approaches of the Savitzky-Golay filtering algorithm (SG) with a second-order polynomial and averaging were performed across a 10-band window to remove the complexity of bandwidths by eliminating redundancy between adjacent bands and compressing band data without losing information [19].

Data Sources, Software, and Extraction of Environmental Covariates
The study integrated covariates obtained from multiple sources including remote sensing, digital elevation models (DEMs), soil maps, soil spectral data, and other sources (e.g., OSM layers). The environmental covariates were extracted using the equations provided in Table 1.
The land use/cover of the study area was classified using a supervised image classification technique with the maximum-likelihood classification method. Representative ground control points (i.e., 50 points for each class) were captured to compare the class signatures. Out of 50 ground control points, 40 were used for classification calibration, whereas the remaining 10 were used for validation. The land use was classified into four predominant land use/cover types of urban, vegetation (forest), agriculture, and water bodies ( Figure 3).
According to the classification results, 34.44% of the area is covered by vegetation mainly consisting of forestland, whereas the remaining area is covered by urban land (29.40%), agricultural land (21.80%), and water bodies (14.30%). The detailed classification results and the accuracy of classification are given in Table 2. Table 2. Land-cover classification of the study area. Additional landform characteristics of the study area were described using different landscape metrics including slope, relief, curvature, TWI, and TPI (Table 1). Furthermore, proximity to essential features such as industries, landfill sites, water-points, and port points was calculated as the Euclidean distance from sampling points [43].
The land use/cover of the study area was classified using a supervised image classification technique with the maximum-likelihood classification method. Representative ground control points (i.e., 50 points for each class) were captured to compare the class signatures. Out of 50 ground control points, 40 were used for classification calibration, whereas the remaining 10 were used for validation. The land use was classified into four predominant land use/cover types of urban, vegetation (forest), agriculture, and water bodies ( Figure 3). Remote Sens. 2021, 13, x FOR PEER REVIEW 8 of 23  The LST had minimum, maximum, and average values of 6.25 °C, 15.15 °C, and 10.65 °C with a CV of 47% for the given season. The LST distribution pattern shows that the higher values were observed toward the urbanized areas (e.g., downtown), whereas it decreased toward the mountains and vegetated land. The VTCI was calculated as stated in Table 1, and the minimum, maximum, and mean VTCI values were 0.04, 0.99, and 0.51, respectively. The lowest VTCI value (0.04) was extracted from the highly urbanized area (downtown), whereas the highest value (0.99) was obtained from the forestland of the study area ( Figure 5). According to the classification results, 34.44% of the area is covered by vegetation mainly consisting of forestland, whereas the remaining area is covered by urban land (29.40%), agricultural land (21.80%), and water bodies (14.30%). The detailed classification results and the accuracy of classification are given in Table 2. The biophysical indices including land surface temperature (LST), vegetation temperature condition index (VTCI), and normalized difference built-up index (NDBI) were used as a proxy to measure land temperature, moisture, and the extent of built-up areas, respectively. The land surface temperature (LST) of the study area was calculated using TIR bands of Landsat 8 (Figure 4), as described in Table 1 [29].
The LST had minimum, maximum, and average values of 6.25 • C, 15.15 • C, and 10.65 • C with a CV of 47% for the given season. The LST distribution pattern shows that the higher values were observed toward the urbanized areas (e.g., downtown), whereas it decreased toward the mountains and vegetated land. The VTCI was calculated as stated in Table 1, and the minimum, maximum, and mean VTCI values were 0.04, 0.99, and 0.51, respectively. The lowest VTCI value (0.04) was extracted from the highly urbanized area (downtown), whereas the highest value (0.99) was obtained from the forestland of the study area ( Figure 5).  The LST had minimum, maximum, and average values of 6.25 °C, 15.15 °C, and 10.65 °C with a CV of 47% for the given season. The LST distribution pattern shows that the higher values were observed toward the urbanized areas (e.g., downtown), whereas it decreased toward the mountains and vegetated land. The VTCI was calculated as stated in Table 1, and the minimum, maximum, and mean VTCI values were 0.04, 0.99, and 0.51 respectively. The lowest VTCI value (0.04) was extracted from the highly urbanized area (downtown), whereas the highest value (0.99) was obtained from the forestland of the study area ( Figure 5). Additionally, the normalized differences built-up index (NDBI) was used to extrac the extent of built-up areas ( Figure 6) using a threshold value of 0.038 [47]. Additionally the spectral bands of the Landsat-8 image were used along with soil spectral data. Additionally, the normalized differences built-up index (NDBI) was used to extract the extent of built-up areas ( Figure 6) using a threshold value of 0.038 [47]. Additionally, the spectral bands of the Landsat-8 image were used along with soil spectral data. Additionally, the normalized differences built-up index (NDBI) was used to extract the extent of built-up areas ( Figure 6) using a threshold value of 0.038 [47]. Additionally, the spectral bands of the Landsat-8 image were used along with soil spectral data. All environmental covariate maps and point datasets were converted into similar formats and stacked using ArcGIS 10.3 software. All datasets were resampled into 30 m pixels using 'bilinear resampling' for continuous data and 'nearest' for categorical data, All environmental covariate maps and point datasets were converted into similar formats and stacked using ArcGIS 10.3 software. All datasets were resampled into 30 m pixels using 'bilinear resampling' for continuous data and 'nearest' for categorical data, before converting them into the same coordinate and projection systems in the R environment [48].

Statistical Analysis, Spatial Modeling, and Validation
The summary statistics were calculated using standard descriptive statistics for the variates. The coefficient of variation (CV) and the Pearson correlation coefficients were used to measure the spread of the mean and the linear dependence between SOC values and environmental covariates, respectively. Similarly, Wilding (1985) used CV values to characterize the variability into classes of low (0-15%), moderate (16-35%), and high (>36%) SOC variability [49].
A combination of random forest and CART was used for prediction, performance evaluation, and selection of the essential variables. Random forest avoids overfitting and provides reliable out-of-bag (OOB) error estimates, avoiding the need for an independent validation dataset [50].
Here, 75% of soil samples were used for the training of a model and identifying essential variables for prediction, and then the model was applied to the validation set. Additionally, CART, was implemented using the R package "rpart" to improve interpretability [51]. The regression forest was generated with a minimum split factor of 4. Similarly, Wiesmeier et al. [52] used a combination of CART and RF for SOM determination using limited samples in semi-arid steppes in northern China.
The model parameters (i.e., ntree, mtry, and nodesize) were optimized using both tuning and manual adjustment, which were done iteratively on training datasets in R [53]. Liaw and Wiener [54] confirmed that applying the tuning function can improve the results of the model. Hence, according to previous studies, the tuning function was applied to training data and used to select the number of mtry and number of the trees grown. Finally, the mtry and the total number of trees selected by tune function to grow forest were crosschecked with manual mtry entries. The trained and developed model was first applied to the training dataset and used to identify essential variables for the prediction.
The variable importance curve was implemented recursively 15 times, and the highly influential variables that frequently appeared at the top rank were selected accordingly [55].
A tree was built from a bootstrap sample of the original dataset, which allows for robust error estimation with the remaining test of out-of-bag (OOB) samples. The OOB samples were predicted from the bootstrap samples, and the mean square error (MSE OOB ) that depends on the samples omitted from the bootstrapped OOB samples was computed as stated in Equation (1).
where n is the number of observations, z i is the average prediction of the i-th observation, and z i OOB is the average prediction of the i-th observation from all trees for which the observation was OOB.
Additionally, the percentage of explained variance (Var ex ) was calculated as where, Var z is the total variance of the response variable. The n tree parameter (the number of trees in the forest) was adjusted using the mean squared error (MSE values) as a measure of the prediction accuracy of the RF model ( Figure 7). al. [58] reviewed studies that focused on SOC prediction and reported that more than half of the studies did not show validation. The values of ntree and mtry were selected to be 500 and 52, respectively, as a large number of trees is recommended for datasets with sophisticated features and when the emphasis is on identifying essential variables (Figure 8). Similar to this study, the MSE error estimate was used in the validation procedure of another study [56]. Wiesmeier et al. [52] also used MSE oob for validation of the RF model in the prediction of soil organic matter. The OOB estimate was used to evaluate prediction performance as it solves the problem of collecting an independent validation dataset [57]. Many previous studies that used RF for SOC prediction rarely used cross-validation since RF internally estimates errors during the running of the model. For instance, Minasny et al. [58] reviewed studies that focused on SOC prediction and reported that more than half of the studies did not show validation.
The values of ntree and mtry were selected to be 500 and 52, respectively, as a large number of trees is recommended for datasets with sophisticated features and when the emphasis is on identifying essential variables (Figure 8). The values of ntree and mtry were selected to be 500 and 52, respectively, as a large number of trees is recommended for datasets with sophisticated features and when the emphasis is on identifying essential variables (Figure 8).

Figure 8.
The optimal value of mtry with respect to the out-of-bag error estimate (OOB error) for random forest (ntree = 500) with a tuning function applied to the model.
The statistical indices of root-mean-square error (RMSE) and mean error (ME) were used, as stated in Equations (3) and (4), to evaluate the performances of the model. The statistical indices of root-mean-square error (RMSE) and mean error (ME) were used, as stated in Equations (3) and (4), to evaluate the performances of the model.
where n is the number of data points, x i represents the measured values, y i represents the predicted values, x i is the mean measured value, and y i is the mean predicted values.

Spatial Prediction of SOC and Model Validation
The result of prediction was evaluated on the training and test datasets. The statistics of validation are provided in Table 3. The ME of the training set varied from 0.05 mg·g −1 to 0.06 mg·g −1 with a mean value of 0.04 mg·g −1 , while the RMSE values ranged from 1.35 mg·g −1 to 1.38 mg·g −1 with a mean value of 1.37 mg·g −1 . The ME of the test set varied from 0.3 mg·g −1 to 0.43 mg·g −1 with a mean value of 0.4 mg·g −1 , while the RMSE values ranged from 0.94 mg·g −1 to 0.97 mg·g −1 with a mean value of 0.96 mg·g −1 . In addition, the tuned random forest model applied to select optimum number of mtry and number of trees was verified. Accordingly, the out-of-bag (OOB) errors were 0.44, 0.22, 0.11, and 0.09 for mtry values of 9, 17, 34, and 52, respectively, implying that the use of more predictor variables led to a lower OOB error ( Figure 9) and better accuracy for prediction of SOC in the study area. mean value of 0.4 mg·g −1 , while the RMSE values ranged from 0.94 mg·g −1 to 0.97 mg·g −1 with a mean value of 0.96 mg·g −1 . In addition, the tuned random forest model applied to select optimum number of mtry and number of trees was verified. Accordingly, the outof-bag (OOB) errors were 0.44, 0.22, 0.11, and 0.09 for mtry values of 9, 17, 34, and 52, respectively, implying that the use of more predictor variables led to a lower OOB error ( Figure 9) and better accuracy for prediction of SOC in the study area.  The results indicate that the RF had excellent performance for SOC prediction. The SOC variability map predicted by the RF model using selected important predictors is shown in Figure 10. The results indicate that the RF had excellent performance for SOC prediction. The SOC variability map predicted by the RF model using selected important predictors is shown in Figure 10. The mean concentration of SOC was 11.70 mg·g −1 , with a minimum value of 0.74 mg·g −1 and a maximum value of 45.80 mg·g −1 . It was highly variable with a coefficient of variation (CV) of 81% and a standard deviation of 8.92. The SOC concentration was grouped into six levels (i.e., low, moderate, high, humus, and organo-humus classes.) [59]. The mean concentration of SOC was 11.70 mg·g −1 , with a minimum value of 0.74 mg·g −1 and a maximum value of 45.80 mg·g −1 . It was highly variable with a coefficient of variation (CV) of 81% and a standard deviation of 8.92. The SOC concentration was grouped into six levels (i.e., low, moderate, high, humus, and organo-humus classes.) [59]. As can be portrayed from the reclassification map ( Figure 11) of SOC level distribution, a shallow level of SOC content was concentrated in the middle of the study area, where the highly urbanized downtown with high NDBI values is located, comprising about 2.01% of the total area. Remote Sens. 2021, 13, x FOR PEER REVIEW 14 of 23 Figure 11. The classes or levels of SOC distribution in the study area.

Importance of Environmental Variables
Based on the IncNodePurity index model, the 15 most important variables for prediction of SOC distribution in the study area were VTCI, ASTAVI, EVI, lithology, NDBI, TPI, slope, X2224, LST, Band4, X424., RVI, TWI, Band7, and NDVI ( Figure 12).  A moderate SOC level was predicted in the surroundings of the urbanized area, whereas a high SOC level was predicted in forested (vegetated) areas. According to the reclassification results, 0.01% of the study area had organic SOC levels. The remaining area was covered by moderate, high, humus, and organo-humus SOC levels, with proportions of 5.11%, 2.17%, 47.51%, and 43.19%, respectively.
The humus SOC level covered about half of the study area (47.51%), spreading from east to west through the middle of the study area. The organo-humus SOC level also constituted almost half of the study area (43.19%), mainly concentrated in mountainous areas of the southern and northern parts, whereas a moderate SOC level was located in the central parts.

Importance of Environmental Variables
Based on the IncNodePurity index model, the 15 most important variables for prediction of SOC distribution in the study area were VTCI, ASTAVI, EVI, lithology, NDBI, TPI, slope, X2224, LST, Band4, X424., RVI, TWI, Band7, and NDVI ( Figure 12). Figure 11. The classes or levels of SOC distribution in the study area.

Figure 12. Variable importance measured as a function of IncNodePurity index.
According to the result, the VTCI index was the most crucial variable for the prediction of SOC among all 52 predictors, followed by VIs, lithology, and NDBI. The remaining variables, such as TPI, slope, X2224, LST, Band4, X424, RVI, TWI, Band7, and NDVI, According to the result, the VTCI index was the most crucial variable for the prediction of SOC among all 52 predictors, followed by VIs, lithology, and NDBI. The remaining variables, such as TPI, slope, X2224, LST, Band4, X424, RVI, TWI, Band7, and NDVI, occupied the remaining top 15 positions as influential variables. NDBI and LST were among the influential variables with rankings of fifth and ninth.
Similarly, the topographic variables of TPI and slope had a high impact ranking in sixth and seventh, respectively, while TWI was 13th.

Influence of Topographic and Vegetation Indices
The topographic and hydrological factors such as slope, curvature, aspect, TPI, MBI, and TWI of the study area had high variability with increased CV (Table 4). Slope, aspect, and MBI were more variable (with >50 % CV) than curvature, TPI, and TWI, which had CV values < 50%. Additionally, the Pearson correlation values of the slope, curvature, aspect, TPI, MBI, and TWI were variable. Slope, curvature, and MBI were negatively correlated to SOC, while aspect, TPI, and TWI were positively correlated.  However, only MBI and curvature were significantly correlated with SOC at (p < 0.05). Moreover, the LS factor was highly variable, with a CV of 68%. On the other hand, all VIs had a very low CV except for RVI and BI indices, which had slight variability. Additionally, the Pearson correlation values showed that NDVI, ASTAVI, GI, EVI, SAVI, CTVI, TVI, NRVI, RVI, and TSAVI_91 had a significant positive correlation with SOC concentration at p < 0.05 (Table 4).

SOC Distribution across Landform, Land Use, and Lithology
The distribution of SOC content across landforms, land use, and lithology was examined. The result shows that its distribution was highly variable across landforms with a mean CV of 78.67%. The highest SOC variation was recorded within medium-gradient mountains (SM) with a CV of 122.08% (Table 5), followed by high-gradient mountains (TM) (12.64 mg·g −1 ). Considerably high SOC contents were predicted in plain, medium-gradient hills (SH), and medium-gradient mountains (SM) with mean values of 11.94 mg·g −1 , 11.88 mg·g −1 , and 10.90 mg·g −1 , respectively. On the contrary, a lower proportion of SOC was predicted in high-gradient hills (TH) and shoreline (WR) with mean values of 7.47 mg·g −1 and 1.67 mg·g −1 , respectively.

Spatial Variability of SOC
The results of this study indicated that the RF model has good prediction performance with corresponding R 2 and RMSE values of 0.96 and 0.91 mg·g −1 , respectively. This result is consistent with previous studies, which reported RF as an accurate model for prediction of SOC. For instance, Nawar and Mouazen [60] reported high accuracies of RF with R 2 = 0.98 and RMSE = 0.10% for SOC prediction. Likewise, Forkuor et al. [55] reported a better prediction accuracy for RF than other machine learning models with R 2 of 0.26 and RMSE of 0.51. Moreover, the spatial distribution map of SOC had a similar trend with distribution of selected environmental factors including soil temperature, soil moisture, and the extent of impervious surface (see  suggesting their strong relationships. The built-up area had the lowest SOC content, but the forested and mountainous regions had the highest SOC content suggesting a high influence of impervious surface for such disparities.

Importance of Environmental Variables
VTCI was the most essential variable for the prediction of SOC among the 52 environmental predictors. The main reason for the superior influence of VTCI on the spatial prediction of SOC distribution may be attributed to its precision in measuring the crop water status and the subsequent impacts of soil moisture on the aboveground biomass [39]. Moreover, since VTCI is derived on the basis of the relationship between land surface temperature (LST) and vegetation index (NDVI), it could provide more information for the spatial prediction of SOC. Alvarez and Lavado [61] also reported that the SOC contents of the topsoil were highly correlated with moisture to temperature ratios. However, the wetness index (WI), derived as tasseled cap transformation (TCT), provided little information for prediction of SOC distribution in the study area. This result suggests that an understanding of soil moisture derived in combination with the vegetation and temperature ratio (VTCI) may be a better predictor for SOC mapping in a similar environment.
The reason for the high importance of VTCI for SOC mapping may be due to its control on the extent of vegetation cover (i.e., quantity and quality of OM entering into the soil), the rate of mineralization, and litter decomposition [62]. Moreover, since soil moisture is an essential element for microbial growth, it may facilitate the degradation of plant and animal residues that improve SOC contents [10], even though a continuous microbial activity may increase mineralization, causing depletion of SOC in the long term [20].
ASTAVI and EVI were the second and third most influential variables for the prediction of SOC distribution. The reason for their influence may be due to their ability to provide proxies for measuring aboveground biomass, which may influence SOC contents stored in the soil as a litter [63]. Additionally, the characteristics of ASTAVI (i.e., low sensitivity to soil backgrounds) may contribute to deriving more information related to SOC contents than other vegetation indices [64]. Similarly, the enhanced vegetation index (EVI) has lower soil and atmospheric effects than the other VIs used in this study. The exclusively stronger influences of the ASTAVI and EVI than the other vegetation indices suggest that VIs with minimized brightness-related soil effects (i.e., ASTAVI and EVI) may perform better than RVI, NDVI, and others (Table 1) for SOC prediction in complex landscapes such as Fuzhou city. However, previous studies did not separately use VIs on the basis of their strength in reducing background effects for predictions of SOC; rather, they used a mixture of both. For instance, Peng et al. [65] also confirmed that EVI was one of the top predictors. Compared to remote sensing raw-bands, vegetation indices performed better perhaps due to their ability to accurately inferring crop/vegetation/bare soil characteristics. This result suggested that VIs were good predictors, but VIs that significantly remove soil color effects were better predictors for SOC contents.
NDBI was one of the top predictors for SOC distribution since impervious surfaces may impact the spatial distribution of SOC. This result shows that the NDBI index can be used for SOC prediction at the city scale in a complex urban landscape where the land-use change into built-up areas is prominent. Similarly, previous studies reported that land-use change associated with urbanization processes was profoundly influencing total carbon fluxes. For instance, Raciti et al. [66] compared the carbon (C) pools in residential areas with similar soil type to forested reference sites and reported substantial variability. Hence, the selection of the NDBI variable among the top predictors suggests that land surface variables may be among highly influential predictors for modeling soil properties in complex urban landscapes. This result also suggests that residential and nonresidential areas may have a diverse pattern of SOC distribution.
Additionally, NDBI and SOC had a significant positive correlation (r = 0.26, p < 0.05), showing their relationship. The high extent of impervious surface may impact SOC contents since increased human disturbances in combination with artificial materials from buildings cause variability. Similarly, Yan et al. [3] confirmed the importance of impervious surface on SOC quantification in their study. In line with this, a few previous studies reported the negative impacts of impervious surface on SOC and microbial activities. For instance, Wei et al. [20] reported substantially low concentrations of SOC in impervious areas than open soils.
Likewise, the LST was among the most influential variables, whereby SOC showed decreasing trends when LST increased. This result is in agreement with previous research that confirmed temperature as an important variable for SOC contents [67].
The soil spectral data with 2224 nm and 424 nm wavelength, along with LANDSAT-8 raw spectral data (red and shortwave infrared 2), were among influential predictors. Similarly, other studies confirmed hyperspectral remote sensing data as key predictors for SOC [68,69]. Peng et al. [65] also reported that Landsat bands combined with VIS-NIR were efficient for the prediction of SOC in the topsoil.
Generally, biophysical land surface indices (i.e., VTCI, LST, and NDBI), brightness removed vegetation indices (i.e., ASTAVI and EVI), topographic indices (i.e., TPI and Slope), soil spectral bands (i.e., 424 nm and 2224 nm) were the most influential variables for SOC prediction in the study area, suggesting their potential importance in similar complex urban landscapes.

Influence of Topographic and Vegetation Indices
MBI (r = −0.18, p < 0.05) and curvature (r = −0.19, p < 0.05) were significantly negatively correlated with SOC contents, emphasizing the influence of the prominent variability of the topographic and hydrological factors of the study area. Another study also confirmed the high influence of landscape and hydrological variables on the SOC patchiness [70]. Furthermore, there was a negative correlation between slope and SOC that can be explained by erosion and deposition processes [71]. Similarly, Stevens [72] reported a positive correlation between SOC contents and aspect, TPI, and TWI. The positive relationship between aspect and SOC contents is explained by the direction of slopes with respect to the sun, which may cause variations in temperature, leading to decomposition. Shaded slopes may have a lower decomposition due to a low soil temperature. Low soil wetness may result in a decrease in SOC since it affects microbial activity [72], which may have a positive impact on the composition of organic materials. Variability of the LS factor may also impact the amount of deposit [73]. Compared with topographic variables, vegetation indices were better correlated with SOC contents.

SOC Distribution across the Landform, Land Use, and Lithology
The prediction of increased SOC contents in high-gradient mountains (TM), mediumgradient hills (SH), and medium-gradient mountains (SM) may be due to the availability of a large share of vegetation cover in these landforms. Additionally, landform-related lithological, moisture, and temperature variations may have influenced the SOC distribution along the altitudinal gradient [74]. The reason for decent SOC contents in plain landform may be associated with the abundance of fluvial materials. Additionally, the dominant soils were paddy soil (paddy field), with a small proportion of red soil and plaster fields (i.e., cultivate fields covered by plaster sheet). In particular, the paddy field of the area has undergone intensive agricultural practices where the application of organic fertilizers may have influenced the SOC content of the landform [8].
Low SOC contents in the high-gradient hills (TH) may be attributed to the impact of complex land use in the area and disturbance posed by intensive human activities. Kemen port, transportation hubs, and large development projects in proximity may have contributed to the decrease in SOC contents [75]. Additionally, Luoyuan Bay located in this area may also have contributed to the low SOC distribution in this landform. Wu et al. [76] reported that the soil sediments in Luoyuan Bay had a high level of eutrophication, suggesting that the SOC contents of this area may be washed into the surrounding water body. Another reason could be the increased use of nitrogen fertilizers and fossil fuels in the surrounding areas, which may have stimulated the loss of organic carbon from terrestrial soils into the surrounding water bodies through erosion [76].
In general, landform variations may contribute to changes in soil properties (clay content), human activities (i.e., land use), vegetation quality and quantity, and alteration of climatic variables (temperature and precipitations), which may affect the SOC distribution. Similarly, a previous study reported that landform elements play a significant role in the variability of SOC [77]. Even though the variability of SOC depends on geologic, climatic, topographic, vegetation, and land-use variables, landform plays a crucial role in modifying all these factors.
Additionally, the SOC distribution was highly variable across the land use. The study area was characterized by a highly urbanized downtown, which was mainly covered by impervious surfaces, to a hilly and mountainous area, which was dominated by plantation forests. Therefore, these land use/cover dynamics might contribute to SOC variations in the area. Similarly, Chuai et al. [35] reported a higher SOC density in towns, woodland, paddy land, and shallow water areas due to industrial and human influence.
Additionally, weathered residuum had the highest content of SOC. This may be due to possession of aluminosilicate red soil (acid red soil), Fe, and permeate paddy soil. Fe leaching, coupled with the high activities of microorganisms, may lead to the increased content of SOC in the weathered residuum, bauxite, and laterite lithology. The increase in soil acidity limits the mineralization of organic matter, reducing microbial activity and the rate of mineralization. On the other hand, the low oxygenation of soils such as "paddy soils" also limits microbial activity, slowing down the mineralization of organic matter. A previous study also reported increased SOC in bauxite lithology due to increased pyrite, owing to redox reactions [78].
Moreover, a considerable amount of the area dominated by laterite lithology has been used for agriculture, where there is cultivation of paddy rice and a resulting long-term application of feedlot manure to cropland. Therefore, the farm management systems may have affected the SOC contents. Similar results were reported by Lui et al. [79], who stated that long-term fertilization practices profoundly influenced the SOC content of red soil of southern China. However, the lowest content of SOC in the sandstone-dominated area may be attributed to its minimal weathering [80]. Similarly, a high concentration of SOC in siliceous red soil, red clay, and yellow-red soils can be related to their high binding capability to organic matter. Additionally, the high content of silicon material may have influenced the chemical and physical properties, since the soils of this area originated from arenaceous rock. The fluvial lithology was mainly located in the eastern parts of the Mingjian riverbanks of the study area, where the landform was dominated by plain (LP) surface. The reason for the lower concentration of SOC contents in acid red soil, silicon aluminum red soil, and red sand soils may be related to the higher content of sand and the low decomposition rate. This result is consistent with Zhang et al. [81], who reported similar results in the hilly red soil region of South China.
Similarly, previous studies reported consistent results regarding the contribution of landform, land use, and lithology to the spatial variability of SOC [50,55].

Conclusions
This study was aimed at predicting the spatial distribution of SOC in relation to environmental covariates, including land temperature, soil moisture, and extent of urbanization using the RF and CART models in the coastal city of Fuzhou city, China. To that end, a compressive set of biophysical land surface variables such as LST, VTCI, and NDBI was combined with other environmental covariates. The environmental covariates extracted from remote sensing, topography, and soil spectral sources were used to predict the SOC distribution and to select the most influential variables for the spatial prediction.
The results indicated that the RF had excellent performance for SOC prediction. The SOC content of the study area was highly variable owing to the heterogeneity of the landform, land use, lithology, surface temperature, soil moisture contents, and the rate of built-up area. Biophysical variables including soil moisture status index (VTCI), adjusted transformed soil-adjusted vegetation index (ASTAVI), enhanced vegetation index (EVI), lithology, and built-up index (NDBI) were the five most influential predictors contributing to the prediction of SOC in the study area. The results suggested that the biophysical land variables of VTCI, LST, and NDBI were good predictors. Additionally, the selection of NDBI as one of the essential predictors may provide further insight to predict SOC in residential areas.
The current study derived biophysical land variables such as soil moisture, land surface temperature, and human influence using substantially improved indices, which were often ignored in the prediction of SOC in previous studies. However, these indices were among the most influential variables for the prediction of the spatial distribution of SOC in a complex coastal urban environment of Fuzhou city. Even though the variables were derived from high-resolution Landsat-8 images, the result might be further improved in future studies by using better-resolution images such as Sentinel products.
This result shows that similar approaches and biophysical land variables can be used in other regional-and local-level SOC prediction studies in similar subtropical coastal environments.