Prediction of Soil Organic Carbon Content in Complex Vegetation Areas Based on CNN-LSTM Model

: Synthesizing bare soil pictures in regions with complex vegetation is challenging, which hinders the accuracy of predicting soil organic carbon (SOC) in specific areas. An SOC prediction model was developed in this study by integrating the convolutional neural network and long and short-term memory network (CNN-LSTM) algorithms, taking into consideration soil-forming factors such as climate, vegetation, and topography in Hainan. Compared with common algorithmic models (random forest, CNN, LSTM), the SOC prediction model based on the CNN-LSTM algorithm achieved high accuracy (R 2 = 0.69, RMSE = 6.06 g kg − 1 , RPIQ = 1.96). The model predicted that the SOC content ranged from 5.49 to 36.68 g kg − 1 , with Hainan in the central and southern parts of the region with high SOC values and the surrounding areas with low SOC values, and that the SOC was roughly distributed as follows: high in the mountainous areas and low in the flat areas. Among the four models, CNN-LSTM outperformed LSTM, CNN, and random forest models in terms of R 2 accuracy by 11.3%, 23.2%, and 53.3%, respectively. The CNN-LSTM model demonstrates its applicability in predicting SOC content and shows great potential in complex areas where obtaining sample data is challenging and where SOC is influenced by multiple interacting factors. Furthermore, it shows significant potential for advancing the broader field of digital soil mapping.


Introduction
Soil organic carbon (SOC) is intricately connected to the worldwide carbon cycle and climate change [1].SOC is one of the major controls of soil health and soil quality, as well as soil productivity, yield, and food security, so precise monitoring of SOC is crucial for evaluating soil health, improving agricultural methods, and studying variations in soil carbon reserves [2][3][4].Accurately estimating SOC is crucial for comprehending the soil organic composition and promoting effective land use and sustainable development [5,6].
Digital soil mapping (DSM) is a technology that uses spatially available covariates and geographic information systems to describe the geographical and temporal distribution of soil properties [7].It aims to map, assess, and monitor the condition and health of the soil [8][9][10].Currently, soil-forming elements such as climate, soil, organism, and topography are widely considered in predicting soil properties using DSM [11][12][13][14][15]. Climatic factors play an essential role in soil formation and development, providing an environmental context for prediction [12].Organism factors impact the accumulation and alteration of organic matter in the soil [13].Topographic factors significantly influence hydrological Land 2024, 13, 915 2 of 19 processes, affecting soil erosion and soil moisture distribution [14].Soil properties indicate the material source of soil and aid in identifying soil heterogeneity [15].
Compared to areas where bare soil period images are easily accessible, Hainan is limited by vegetation shading, which makes it difficult to obtain bare soil images covering the entire study area, limiting the prediction accuracy of SOC and making DSM more challenging [16].The results of previous studies confirmed that the DSM of complex areas using multispectral images and high-resolution digital elevation models (DEMs) as input data can achieve good results.However, there are still two main limitations in the current study [17,18].First, most of the predictive inputs used in complex area studies are short-temporal or even single-temporal data, and the slow rate of change in soil properties makes it difficult for short-term data to accurately predict soil properties [19][20][21].The lack of long time-series information may lead to an incomplete understanding of soil properties; second, in complex areas, existing models have limited learning ability for time-series inputs, which restricts the potential to quantify SOC content using multi-source remote sensing information.
Machine learning usually relies on sample and environmental covariate data to fit relationships [22].However, it may not be beneficial for intricate areas to capture temporal and geographical features of input data.Deep learning models excel in characterizing massive, complicated, and abstract data by effectively extracting nonlinear relationships, making them highly adaptable and promising for applications in the field of DSM [22].Convolutional neural networks (CNNs) and long short-term memory (LSTM) are two prominent network architectures in deep learning that specialize in handling spatial and temporal input, respectively [23,24].Utilizing CNN allows for the precise extraction of intricate and abstract characteristics from soil spatial data to address the challenge of identifying complicated soil patterns and linkages [25,26].LSTM excels in handling sequential time series data, and it can better utilize the rich surface information provided by long time series satellite images to analyze the long-term relationships existing in the time series data [27].The convolutional neural network and long short-term memory (CNN-LSTM) model, which combines the two, can simultaneously process both spatial and temporal dimensional information, which helps to accurately capture the complex nonlinear relationship between soil properties and surface features and enhance the prediction accuracy of soil properties in complex areas.In recent years, many studies have applied deep learning and machine learning algorithms to DSM.Padarian et al. used CNN and environmental data to predict SOC depth throughout Chile, and the results showed that the prediction error of the CNN model was reduced by 30% compared to the traditional technique that used only covariate point information [25].Liu et al. added volumetric weight and soil thickness based on readily accessible environmental variables such as vegetation elements and climate factors.Based on the soil attributes, such as vegetation elements and climate factors, the random forest (RF) algorithm simulated the SOC density of grassland in northern China, and the results showed that the R 2 reached 0.73 [28].Some scholars used the long-term MODIS MCD12Q2 phenology variables to compare and study the effectiveness of the CNN model and the RF model in the prediction of the SOC in Anhui Province of China, and the results showed that the prediction accuracy of the CNN model was better than that of the RF model [29].In addition, some studies have used LSTM to estimate soil properties based on hyperspectral data inputs, which can be regarded as a type of sequence data [26,30].However, the CNN-LSTM model has not been used for SOC estimation in complex areas, and its performance in predicting SOC in such areas has yet to be assessed.Additionally, its ability to predict SOC content in complex areas compared to RF, CNN, and LSTM remains unexplored.
In summary, we developed an SOC content prediction method using long-time series satellite data combined with the CNN-LSTM model.Our objectives for this study are (1) to explore the potential of the CNN-LSTM model for DSM in complex regions; (2) to assess the comparative performance of this model against RF, CNN, and LSTM models in such contexts; and (3) clarify the spatial correlation between soil-forming environmental variables and SOC content.

Study Area
The study area is situated on Hainan Island, Hainan Province, China (Figure 1), with latitudes between 18 • 09 ′ and 20 • 10 ′ N and longitudes from 108 • 37 ′ to 111 • 03 ′ E, covering a total area of 33,920 km 2 [31].The region has a tropical monsoon climate with an average annual temperature range of 22 to 26 • C and an average annual rainfall of over 1600 mm.Precipitation decreases from east to west with notable year-to-year variations [32].Hainan Island's topography is characterized by high elevation in the center and lower elevation in the surrounding regions, mostly mountains and plains, with elevations ranging from 0 to 1813 m.Rice and other crops are predominantly cultivated in the coastal lowlands, while natural forests dominate the mountainous areas in the south-central section [33].Hainan is the only tropical province in China with over 60% forest coverage, which complicates the monitoring of SOC due to the extensive vegetation [34].Therefore, an accurate estimation of SOC in Hainan is particularly necessary.

Study Area
The study area is situated on Hainan Island, Hainan Province, China (Figure 1), with latitudes between 18°09′ and 20°10′ N and longitudes from 108°37′ to 111°03′ E, covering a total area of 33,920 km 2 [31].The region has a tropical monsoon climate with an average annual temperature range of 22 to 26 °C and an average annual rainfall of over 1600 mm.Precipitation decreases from east to west with notable year-to-year variations [32].Hainan Island's topography is characterized by high elevation in the center and lower elevation in the surrounding regions, mostly mountains and plains, with elevations ranging from 0 to 1813 m.Rice and other crops are predominantly cultivated in the coastal lowlands, while natural forests dominate the mountainous areas in the south-central section [33].Hainan is the only tropical province in China with over 60% forest coverage, which complicates the monitoring of SOC due to the extensive vegetation [34].Therefore, an accurate estimation of SOC in Hainan is particularly necessary.

Data Collection and Pre-Processing
The CLORPT soil formation model explains the process of soil formation and its final characteristics and predicts soil properties through several key factors that influence soil formation.The study thoroughly incorporated climate, vegetation, soil, and topography based on the CLORPT model [11].Given the slow rate of soil change and a long time series of environmental information aids in characterizing the spatial details of soil properties, environmental covariate data before and after five years of soil sampling (2005 to 2015)

Data Collection and Pre-Processing
The CLORPT soil formation model explains the process of soil formation and its final characteristics and predicts soil properties through several key factors that influence soil formation.The study thoroughly incorporated climate, vegetation, soil, and topography based on the CLORPT model [11].Given the slow rate of soil change and a long time series of environmental information aids in characterizing the spatial details of soil properties, environmental covariate data before and after five years of soil sampling (2005 to 2015) were selected as input data in this study (Table 1) [35].This study produced monthly averages Land 2024, 13, 915 4 of 19 over ten years using meteorological and vegetation data.After conducting covariance and independence analysis, variables exhibiting a Pearson correlation coefficient greater than 0.8 with any other variable were considered redundant and subsequently removed; only factors that showed a significant connection (p < 0.05) with SOC content were kept [36].Seventeen environmental factors were selected as predictors of SOC.The data collected in this study were resampled to a 500 m resolution using bilinear interpolation and then translated to the WGS 1984 UTM 50 N projection system as input.

Soil Sample Data
The study acquired the geographic locations and SOC contents of 73 soil samples from the study region through the Center for Resource and Environmental Science and Data (www.resdc.cn,accessed on 13 March 2023) of the Chinese Academy of Sciences [37].These samples were taken from the top soil layer (0 to 20 cm), and data were collected around 2010.

Climate Data
The TerraClimate dataset combines detailed climate mean data from World-Clim (https://www.worldclim.org,accessed on 13 August 2023) with more precise monthly data from several sources to create monthly statistics of precipitation, maximum and minimum temperatures, wind speeds, and solar radiation [38].The study acquired monthly average temperature and precipitation data from the TerraClimate dataset for the study area.

Vegetation Data
The data utilized comprised the normalized difference vegetation index (NDVI), the enhanced vegetation index (EVI), and the leaf area index (LAI) to describe the density and coverage of plants supported by soil [39,40].All vegetation data were calculated from Terra MODIS (MOD09GA) imagery from the NASA Open Data Platform (https://earthdata.nasa.gov/,accessed on 20 August 2023) [41].

Soil Attribute Data
Soil sand, silt, and clay data at a spatial resolution of 250 m for the research region were acquired via the SoilGrids platform (https://soilgrids.org,accessed on 03 September 2023).To maintain spatial resolution consistency in the data, the soil normalized differential water index (NDWI) was computed from Terra MODIS (MOD09GA) imagery obtained from the NASA Open Data Platform (https://earthdata.nasa.gov/,accessed on 20 August 2023) [41].
where ρ green, ρ nir represent the green and near-infrared bands, respectively.

Topographic Data
Elevation model data were obtained from the shuttle radar topography mission-digital elevation model (SRTM-DEM) with a spatial resolution of 30 m. Slope and aspect were calculated in SAGA GIS.

Methodology
This study introduces a CNN-LSTM model designed to extract spatial feature information from topographic and soil data and capture time-varying features from climate, vegetation, and NDWI data.These are the CNN, LSTM, and CNN-LSTM models.

Convolutional Neural Network (CNN)
The CNN is a deep learning model designed for handling tasks involving gridstructured data.It extracts features from convolutional, pooling, and fully connected layers [42,43].The structure is seen in Figure 2.

Soil Attribute Data
Soil sand, silt, and clay data at a spatial resolution of 250 m for the research region were acquired via the SoilGrids platform (https://soilgrids.org,accessed on 03 September 2023).To maintain spatial resolution consistency in the data, the soil normalized differential water index (NDWI) was computed from Terra MODIS (MOD09GA) imagery obtained from the NASA Open Data Platform (https://earthdata.nasa.gov/,accessed on 20 August 2023) [41].
where  ,  represent the green and near-infrared bands, respectively.

Topographic Data
Elevation model data were obtained from the shuttle radar topography mission-digital elevation model (SRTM-DEM) with a spatial resolution of 30 m. Slope and aspect were calculated in SAGA GIS.

Methodology
This study introduces a CNN-LSTM model designed to extract spatial feature information from topographic and soil data and capture time-varying features from climate, vegetation, and NDWI data.These are the CNN, LSTM, and CNN-LSTM models.

Convolutional Neural Network (CNN)
The CNN is a deep learning model designed for handling tasks involving grid-structured data.It extracts features from convolutional, pooling, and fully connected layers [42,43].The structure is seen in Figure 2. The first layer of a CNN is a convolutional layer, which extracts local features of the data by sliding a specific convolutional kernel over the input data.Once the convolution operation is complete, an activation function (e.g., ReLU) is applied to introduce a nonlinear mapping, allowing the network to learn more complex features [44].The CNN's subsequent layers usually employ a pattern of convolutional and pooling layers, which aid in diminishing the spatial distances of local features and decreasing computing expenses [44].The fully connected layer includes an attention mechanism that focuses on processing important input variables by calculating pairwise covariances and assigning weights to highlight useful variables and suppress unimportant ones for SOC prediction [45].The first layer of a CNN is a convolutional layer, which extracts local features of the data by sliding a specific convolutional kernel over the input data.Once the convolution operation is complete, an activation function (e.g., ReLU) is applied to introduce a nonlinear mapping, allowing the network to learn more complex features [44].The CNN's subsequent layers usually employ a pattern of convolutional and pooling layers, which aid in diminishing the spatial distances of local features and decreasing computing expenses [44].The fully connected layer includes an attention mechanism that focuses on processing important input variables by calculating pairwise covariances and assigning weights to highlight useful variables and suppress unimportant ones for SOC prediction [45].
A CNN model is created in this study using two convolutional layers, two max-pooling layers, and two fully connected layers within the PyTorch framework.The study employs a convolutional layer with 2 × 2 convolutional kernels and a max-pooling layer with 2 × 2 filters.The study area's complexity may result in significant spatial heterogeneity Land 2024, 13, 915 6 of 19 in the SOC distribution.Using a 2 × 2 convolutional kernel for predicting SOC content can decrease error rates in convolutional computation and effectively utilize correlations between neighboring images [46].Additionally, maximal pooling is superior to other types of pooling operations to some degree [47].

Long Short-Term Memory (LSTM)
As shown in Figure 3, LSTM is a deep learning model designed for handling sequence data, especially well-suited for processing lengthy sequences and data with long-term correlations [48].LSTM effectively addresses the issue of forgetting and prevents gradient vanishing and explosion when dealing with long sequences [49].
of pooling operations to some degree [47].

Long Short-Term Memory (LSTM)
As shown in Figure 3, LSTM is a deep learning model designed for handling sequence data, especially well-suited for processing lengthy sequences and data with longterm correlations [48].LSTM effectively addresses the issue of forgetting and prevents gradient vanishing and explosion when dealing with long sequences [49].
LSTM consists of memory cells (c), input gates (i), forgetting gates (f), and output gates (o), and each component adds unique weights, biases, and activation functions to the inputs to achieve certain functions [50].The memory cell (c) functions as a unit responsible for transmitting and storing crucial information.The input gate (i) controls the input information at the present moment.The forgetting gate (f) determines which information to retain or discard by considering the output from the previous time step and the input from the current time step.The previous moment's output (ht−1) and the current moment's input (Xt) are combined and processed through the forgetting gate (f), input gate (i), and output gate (o) to generate the memorized information of the current moment (ct) and the output information (ht).This output is then fed back as input for the next moment.

Convolutional Neural Network-Long Short-Term Memory Model (CNN-LSTM)
As shown in Figure 4, the CNN-LSTM is a deep learning model that integrates the characteristics of CNN and LSTM.The design concept aims to maximize the utilization of CNN's spatial processing and LSTM's temporal modeling to effectively collect input data features and enhance the model's capability to address intricate issues.The design concept aims to maximize the utilization of CNN's spatial processing and LSTM's temporal modeling to effectively collect input data features and enhance the model's capability to address intricate issues.LSTM consists of memory cells (c), input gates (i), forgetting gates (f), and output gates (o), and each component adds unique weights, biases, and activation functions to the inputs to achieve certain functions [50].The memory cell (c) functions as a unit responsible for transmitting and storing crucial information.The input gate (i) controls the input information at the present moment.The forgetting gate (f) determines which information to retain or discard by considering the output from the previous time step and the input from the current time step.The previous moment's output (h t−1 ) and the current moment's input (X t ) are combined and processed through the forgetting gate (f), input gate (i), and output gate (o) to generate the memorized information of the current moment (c t ) and the output information (h t ).This output is then fed back as input for the next moment.

Convolutional Neural Network-Long Short-Term Memory Model (CNN-LSTM)
As shown in Figure 4, the CNN-LSTM is a deep learning model that integrates the characteristics of CNN and LSTM.The design concept aims to maximize the utilization of CNN's spatial processing and LSTM's temporal modeling to effectively collect input data features and enhance the model's capability to address intricate issues.The design concept aims to maximize the utilization of CNN's spatial processing and LSTM's temporal modeling to effectively collect input data features and enhance the model's capability to address intricate issues.CNN processes topographic and soil data in the study area by convolution, pooling, and other operations to extract spatial features; LSTM analyzes climate vegetation data in the study area using its distinctive gating mechanism to obtain its temporal features.Subsequently, the results processed by CNN and LSTM are aggregated together and further computed through the fully connected layer to generate the final prediction results.Integrating the unique structural benefits of CNN and LSTM enhances the model's ability to capture the factors impacting SOC content and boosts prediction accuracy.
The accuracy of the CNN-LSTM model is limited by various factors.Adjustment of the learning parameter weights (W, U) and bias (b) can enhance prediction accuracy to a certain degree.In this study, the backpropagation algorithm is used to calculate the gradient of the loss function concerning the network parameters, and the gradient information is used to update the network parameters so that the loss function is minimized [23].This study uses dropout in the fully connected layer and sets the dropout value to 0.25.By randomly dropping a small number of outputs during training, this technique not only effectively avoids the problem of overfitting but also prevents too much information loss that may lead to degradation of learning accuracy.This setup enhances the model's ability to generalize while ensuring effective learning from the data [51].CNN processes topographic and soil data in the study area by convolution, pooling, and other operations to extract spatial features; LSTM analyzes climate vegetation data in the study area using its distinctive gating mechanism to obtain its temporal features.Subsequently, the results processed by CNN and LSTM are aggregated together and further computed through the fully connected layer to generate the final prediction results.Integrating the unique structural benefits of CNN and LSTM enhances the model's ability to capture the factors impacting SOC content and boosts prediction accuracy.
The accuracy of the CNN-LSTM model is limited by various factors.Adjustment of the learning parameter weights (W, U) and bias (b) can enhance prediction accuracy to a certain degree.In this study, the backpropagation algorithm is used to calculate the gradient of the loss function concerning the network parameters, and the gradient information is used to update the network parameters so that the loss function is minimized [23].This study uses dropout in the fully connected layer and sets the dropout value to 0.25.By randomly dropping a small number of outputs during training, this technique not only effectively avoids the problem of overfitting but also prevents too much information loss that may lead to degradation of learning accuracy.This setup enhances the model's ability to generalize while ensuring effective learning from the data [51].

Correlation Analysis and Significance Test
This study used the correlation coefficient r to assess the level of linear correlation between the predictor and the predicted variables.The results of the analyses were conducted to determine statistical significance, with a threshold of p < 0.05 indicating significance.
where n represents the number of sample points, x i is the value of the predicted quantity at point sample i, y i is the measured SOC content of sample i, and x, y are the average of n predicted and observed SOC values.

Evaluation of SOC Predictions
The sample data were split into a 2/3 training set and a 1/3 validation set.The root mean square error (RMSE), ratio of performance to interquartile distance (RPIQ), and coefficient of determination (R 2 ) were used as accuracy evaluation metrics.Among them, RMSE is used to evaluate the absolute error between predicted and true values, RPIQ is used to evaluate the robustness and generalization of the model, and R 2 is used to evaluate the fitting degree of the model.They are calculated as follows: where n is the number of samples, y i is the measured SOC content of sample i, and ŷi is the predicted SOC content of sample i. IQ is the interquartile range, that is, the difference between the third quartile and the first quartile (IQ = Q 3 − Q 1 ).In general, well-performing models usually exhibit high R 2 and RPIQ values and low RMSE values.

Uncertainty Analysis of Mapping Results
To assess the uncertainty of the prediction model in mapping SOC content prediction results, we used the random forest algorithm to calculate the model uncertainty.Uncertainty reflects the robustness of the prediction model and is key to determining the confidence level of the predictions [52].In this study, the 10-fold cross-validation method was used to determine the magnitude of uncertainty by predicting the SOC content 10 times and calculating the standard deviation.The smaller the uncertainty, the more stable the model proved to be.

SOC Sample Characterization
From Table 2, the statistics of SOC content in the study area spanned a wide range (2.12 to 66.27 g kg −1 ), with a total sample mean of 15.07 g kg −1 and a coefficient of variation of 74.64%.Most of the sampling sites were situated in the flat regions surrounding the research area, with a few sites in the central mountainous sections (Figure 1).

Correlation between Predictor Variables and SOC Content
All predictor variables and SOC content were analyzed for correlation, and only those that were statistically significant (p < 0.05) were retained; the results are shown in Figure 5.The pairs of variables that were not significantly correlated were mainly centered between LAI and clay, silt, slope, precipitation, and temperature, and most of the NDVI also showed non-significant correlations.Among the selected indicators, NDVI, temperature, and LAI data of different months showed a higher correlation with SOC content.This study found that data of the same type from different time periods exhibit highly significant positive correlations while showing weak correlations with other types of data.For example, NDVI values from different months displayed strong positive correlations (r all above 0.7) and weak correlations with other datasets (|r| all below 0.5).There may be consistent patterns in the variation of similar data kinds, while the patterns between distinct data types may not follow a linear trend [53].

Correlation between Predictor Variables and SOC Content
All predictor variables and SOC content were analyzed for correlation, and only those that were statistically significant (p < 0.05) were retained; the results are shown in Figure 5.The pairs of variables that were not significantly correlated were mainly centered between LAI and clay, silt, slope, precipitation, and temperature, and most of the NDVI also showed non-significant correlations.Among the selected indicators, NDVI, temperature, and LAI data of different months showed a higher correlation with SOC content.This study found that data of the same type from different time periods exhibit highly significant positive correlations while showing weak correlations with other types of data.For example, NDVI values from different months displayed strong positive correlations (r all above 0.7) and weak correlations with other datasets (|r| all below 0.5).There may be consistent patterns in the variation of similar data kinds, while the patterns between distinct data types may not follow a linear trend [53].The definitions of all variable acronyms may be found in Table 1.The definitions of all variable acronyms may be found in Table 1.
Among the inputs, NDVI showed the strongest association with SOC, with temperature following closely behind.Furthermore, there was a significant link between precipitation and SOC.The study revealed that most input variables were temporally distributed from March to September, coinciding with the seasonal rainfall and plant growing season.It has been shown that the importance of SOC controls varies with depth, with climate dominating the controls of shallow SOC [54], while Cen et al. found that carbon inputs from vegetation can also have a significant impact on surface SOC [55].

CNN-LSTM Model Prediction Results
As shown in Figure 6, the CNN-LSTM model prediction achieved high accuracy, which reached an R 2 of 0.69, RMSE of 6.06 g kg −1 , and RPIQ of 1.96.The fit to the actual values indicates that the model can effectively predict SOC content.The error is mainly in the overestimation of low SOC content.

CNN-LSTM Model Prediction Results
As shown in Figure 6, the CNN-LSTM model prediction achieved high accuracy, which reached an R 2 of 0.69, RMSE of 6.06 g kg −1 , and RPIQ of 1.96.The fit to the actual values indicates that the model can effectively predict SOC content.The error is mainly in the overestimation of low SOC content.
The SOC prediction plot based on all covariates and the CNN-LSTM model is shown in Figure 6.The SOC content in the prediction map showed obvious spatial variability in distribution.The prediction results show that in the central part of the study area, the SOC is higher in the southern mountainous area, while the surrounding areas with gentle terrain have lower SOC content and a small number of high values are also distributed in the northern urban, mixed forest area.In the central and southern mountainous areas, the main land cover type is forest, with dense vegetation cover and complex topography; less human intervention makes the vegetation continuously input organic carbon to the soil, and the mountainous topography may slow down the water and soil loss, which makes the rate of accumulation of SOC in the soil higher.The surrounding gentle areas are dominated by arable land, and more agricultural activities, such as long-term plowing and fertilizer application, may accelerate the decomposition of the SOC content.Moreover, regions with gentle terrain are more prone to rainfall and wind erosion, further reducing SOC content.

Model Accuracy Comparison
The prediction accuracies of CNN, LSTM, and RF models are shown in Figure 7.Among these three models, the LSTM model (R 2 = 0.62, RMSE = 6.84 g kg −1 , RPIQ = 1.55) performs the best, followed by CNN (R 2 = 0.56, RMSE = 7.02 g kg −1 , RPIQ = 1.54), and the RF model (R 2 = 0.45, RMSE = 7.74 g kg −1 , RPIQ = 1.09) performs the worst in terms of both the absolute error and the goodness of fit.Overall, all three models share a common problem of overestimation at low values and underestimation at high values, with the CNN and RF models having particularly pronounced overestimation in the low-value region and the CNN and LSTM having more severe underestimation in the high-value The SOC prediction plot based on all covariates and the CNN-LSTM model is shown in Figure 6.The SOC content in the prediction map showed obvious spatial variability in distribution.The prediction results show that in the central part of the study area, the SOC is higher in the southern mountainous area, while the surrounding areas with gentle terrain have lower SOC content and a small number of high values are also distributed in the northern urban, mixed forest area.In the central and southern mountainous areas, the main land cover type is forest, with dense vegetation cover and complex topography; less human intervention makes the vegetation continuously input organic carbon to the soil, and the mountainous topography may slow down the water and soil loss, which makes the rate of accumulation of SOC in the soil higher.The surrounding gentle areas are dominated by arable land, and more agricultural activities, such as long-term plowing and fertilizer application, may accelerate the decomposition of the SOC content.Moreover, regions with gentle terrain are more prone to rainfall and wind erosion, further reducing SOC content.

Model Accuracy Comparison
The prediction accuracies of CNN, LSTM, and RF models are shown in Figure 7.Among these three models, the LSTM model (R 2 = 0.62, RMSE = 6.84 g kg −1 , RPIQ = 1.55) performs the best, followed by CNN (R 2 = 0.56, RMSE = 7.02 g kg −1 , RPIQ = 1.54), and the RF model (R 2 = 0.45, RMSE = 7.74 g kg −1 , RPIQ = 1.09) performs the worst in terms of both the absolute error and the goodness of fit.Overall, all three models share a common problem of overestimation at low values and underestimation at high values, with the CNN and RF models having particularly pronounced overestimation in the low-value region and the CNN and LSTM having more severe underestimation in the high-value region.The reason may be that the relatively simple structure of these models cannot effectively handle the complex nonlinear relationship between SOC and environmental covariates.
However, the deep learning model still achieves higher accuracy than RF even when some environmental covariates are missing, reflecting the advantage of the deep learning model in considering the changing features of spatial contextual information.The CNN-LSTM model achieved the optimal model accuracy, indicating that combining CNN and LSTM makes the model more powerful in extracting complex spatiotemporal information to deal with spatially and temporally complex features.The reason may be that the relatively simple structure of these models cannot effectively handle the complex nonlinear relationship between SOC and environmental covariates.However, the deep learning model still achieves higher accuracy than RF even when some environmental covariates are missing, reflecting the advantage of the deep learning model in considering the changing features of spatial contextual information.The CNN-LSTM model achieved the optimal model accuracy, indicating that combining CNN and LSTM makes the model more powerful in extracting complex spatiotemporal information to deal with spatially and temporally complex features.

SOC Distribution Characteristics
The spatial distribution of SOC content in Hainan based on CNN, LSTM, and RF models is shown in Figure 8. Combining Figures 7 and 8 with the CNN-LSTM model shows that the above model reveals clear shortcomings in the high SOC value part.Usually, the SOC is higher in the central and southern mountainous areas where forests are the main land cover type, while it is lower in the adjacent plains with mild terrain.However, there are some differences between these prediction maps: the CNN model can discriminate between high and low-value zones, but there is an obvious underestimation of high-value zones.The LSTM model has an overestimation of some low-value zones, and there is also a problem of not perceiving the boundary of the low and high-value zones in the south-central part of the study area, and the RF model performs poorly in general and cannot effectively distinguish the low-value zones from the high-value zones.The limitations of CNN, which focuses on capturing spatial features, and LSTM, which focuses on temporal features, prevent them from utilizing effective information more extensively, which may be the main reason for the inaccurate prediction.The RF model's approach of fitting prediction functions using existing data may need to adequately address the spatial heterogeneity in complex areas, resulting in unsatisfactory predictions for SOC.
The maps predicted using CNN and LSTM models appear to be more fragmented compared to CNN-LSTM and RF models.This may be due to the limited number of environmental variables that the model can handle, which results in the model preferring to fit the spatial features of some of the data.Compared to the other predicted maps, the maps generated using the CNN-LSTM model are smoother, suggesting that the model has a significant advantage when dealing with time-series data.

SOC Distribution Characteristics
The spatial distribution of SOC content in Hainan based on CNN, LSTM, and RF models is shown in Figure 8. Combining Figures 7 and 8 with the CNN-LSTM model shows that the above model reveals clear shortcomings in the high SOC value part.Usually, the SOC is higher in the central and southern mountainous areas where forests are the main land cover type, while it is lower in the adjacent plains with mild terrain.However, there are some differences between these prediction maps: the CNN model can discriminate between high and low-value zones, but there is an obvious underestimation of high-value zones.The LSTM model has an overestimation of some low-value zones, and there is also a problem of not perceiving the boundary of the low and high-value zones in the southcentral part of the study area, and the RF model performs poorly in general and cannot effectively distinguish the low-value zones from the high-value zones.The limitations of CNN, which focuses on capturing spatial features, and LSTM, which focuses on temporal features, prevent them from utilizing effective information more extensively, which may be the main reason for the inaccurate prediction.The RF model's approach of fitting prediction functions using existing data may need to adequately address the spatial heterogeneity in complex areas, resulting in unsatisfactory predictions for SOC.
when some environmental covariates are missing, reflecting the advantage of the deep learning model in considering the changing features of spatial contextual information.The CNN-LSTM model achieved the optimal model accuracy, indicating that combining CNN and LSTM makes the model more powerful in extracting complex spatiotemporal information to deal with spatially and temporally complex features.

SOC Distribution Characteristics
The spatial distribution of SOC content in Hainan based on CNN, LSTM, and RF models is shown in Figure 8. Combining Figure 7 and Figure 8 with the CNN-LSTM model shows that the above model reveals clear shortcomings in the high SOC value part.Usually, the SOC is higher in the central and southern mountainous areas where forests are the main land cover type, while it is lower in the adjacent plains with mild terrain.However, there are some differences between these prediction maps: the CNN model can discriminate between high and low-value zones, but there is an obvious underestimation of high-value zones.The LSTM model has an overestimation of some low-value zones, and there is also a problem of not perceiving the boundary of the low and high-value zones in the south-central part of the study area, and the RF model performs poorly in general and cannot effectively distinguish the low-value zones from the high-value zones.The limitations of CNN, which focuses on capturing spatial features, and LSTM, which focuses on temporal features, prevent them from utilizing effective information more extensively, which may be the main reason for the inaccurate prediction.The RF model's approach of fitting prediction functions using existing data may need to adequately address the spatial heterogeneity in complex areas, resulting in unsatisfactory predictions for SOC.
The maps predicted using CNN and LSTM models appear to be more fragmented compared to CNN-LSTM and RF models.This may be due to the limited number of environmental variables that the model can handle, which results in the model preferring to fit the spatial features of some of the data.Compared to the other predicted maps, the maps generated using the CNN-LSTM model are smoother, suggesting that the model has a significant advantage when dealing with time-series data.The maps predicted using CNN and LSTM models appear to be more fragmented compared to CNN-LSTM and RF models.This may be due to the limited number of environmental variables that the model can handle, which results in the model preferring to fit the spatial features of some of the data.Compared to the other predicted maps, the maps generated using the CNN-LSTM model are smoother, suggesting that the model has a significant advantage when dealing with time-series data.

Uncertainty of Prediction
Figure 9 shows the uncertainty in the results of CNN-LSTM-based prediction of SOC content.We observe that the uncertainties are mainly concentrated in the lower range.Relatively high uncertainties were observed in the central and southwestern parts of the study area.In contrast, the uncertainty is generally lower in the surrounding regions of the study area, with higher uncertainty observed in only a very small portion of the region.It is noteworthy that when observed in conjunction with Figure 6, it can be observed that there are higher uncertainties in areas with high SOC content and lower uncertainties in areas with low SOC content.

Uncertainty of Prediction
Figure 9 shows the uncertainty in the results of CNN-LSTM-based prediction content.We observe that the uncertainties are mainly concentrated in the lowe Relatively high uncertainties were observed in the central and southwestern par study area.In contrast, the uncertainty is generally lower in the surrounding re the study area, with higher uncertainty observed in only a very small portion of th It is noteworthy that when observed in conjunction with Figure 6, it can be obser there are higher uncertainties in areas with high SOC content and lower uncerta areas with low SOC content.

Importance of Variables
The importance of different input variables in the CNN-LSTM model is s Figure 10.Our results show that the five input variables with the highest import Silt, Clay, TA_1m, TA_2m, and LaI_7m.In addition, we compared the importanc ferent types of input variables.Our results showed that vegetation had the hig portance among the input data we used (50.78%),followed by meteorology (30.our study area, Silt became the variable with the highest weight, possibly due t gion's considerable spatial variability of soil texture [56].Soil texture can show c able differences between regions and soil types.At the same time, the vegetation a whole has the highest weight.This is mainly attributed to the fact that the stud located in the tropics and has a wide distribution of tropical rainforests, resulting etation being a key factor influencing SOC content [57].

Importance of Variables
The importance of different input variables in the CNN-LSTM model is shown in Figure 10.Our results show that the five input variables with the highest importance are Silt, Clay, TA_1m, TA_2m, and LaI_7m.In addition, we compared the importance of different types of input variables.Our results showed that vegetation had the highest importance among the input data we used (50.78%),followed by meteorology (30.70%).In our study area, Silt became the variable with the highest weight, possibly due to the region's considerable spatial variability of soil texture [56].Soil texture can show considerable differences between regions and soil types.At the same time, the vegetation type as a whole has the highest weight.This is mainly attributed to the fact that the study area is located in the tropics and has a wide distribution of tropical rainforests, resulting in vegetation being a key factor influencing SOC content [57].

Advantages of CNN-LSTM Model
In comparison to conventional methods, the approach in this study provides the following advantages.First, the model in this study addresses spatial heterogeneity in complex areas and enhances the ability to understand the characteristics of the data by simultaneously processing information in both spatial and temporal dimensions, taking advantage of the joint use of spatiotemporal information in the research [27].Compared to a previous SOC estimation study, including Hainan (R 2 = 0.53, RMSE = 6.74 g kg −1 ), better fitting accuracy was achieved even without distinguishing land use types [58].Mahmoudzadeh et al. [59] utilized numerous soil samples, auxiliary variables, and five machine learning techniques (K-approximation, RF, extreme gradient boosting, and support vector machine, etc.) to predict SOC for the western region of Iran (29,043 km 2 ) achieving R 2 values between 0.5 and 0.6.Therefore, it is difficult to estimate the spatial distribution of SOC accurately by relying only on machine learning models with relatively simple structures.The model in this study has a significant advantage in terms of accuracy compared to previous SOC prediction models.
In addition, the model has a significant advantage over CNN, LSTM, and RF methods for SOC prediction in complex areas.The model combines the advantages of CNN and LSTM.The LSTM model can handle more environmental covariates compared to CNN [58] and has a more complex structure and stronger feature learning and representation capabilities compared to the RF model [60].By using CNN in concert, the LSTM model significantly improves the accuracy of SOC prediction compared with using either model alone.

Environmental Variables and SOC
By observing Figures 5 and 10 and disaggregating the statistics of the predicted results (Tables A1-A4), a strong correlation can be found between vegetation compared to SOC.The use of vegetation indices is effective in predicting SOC in comparison.Thus, vegetation indices show great potential in improving the accuracy of digital soil mapping.

Advantages of CNN-LSTM Model
In comparison to conventional methods, the approach in this study provides the following advantages.First, the model in this study addresses spatial heterogeneity in complex areas and enhances the ability to understand the characteristics of the data by simultaneously processing information in both spatial and temporal dimensions, taking advantage of the joint use of spatiotemporal information in the research [27].Compared to a previous SOC estimation study, including Hainan (R 2 = 0.53, RMSE = 6.74 g kg −1 ), better fitting accuracy was achieved even without distinguishing land use types [58].Mahmoudzadeh et al. [59] utilized numerous soil samples, auxiliary variables, and five machine learning techniques (K-approximation, RF, extreme gradient boosting, and support vector machine, etc.) to predict SOC for the western region of Iran (29,043 km 2 ) achieving R 2 values between 0.5 and 0.6.Therefore, it is difficult to estimate the spatial distribution of SOC accurately by relying only on machine learning models with relatively simple structures.The model in this study has a significant advantage in terms of accuracy compared to previous SOC prediction models.
In addition, the model has a significant advantage over CNN, LSTM, and RF methods for SOC prediction in complex areas.The model combines the advantages of CNN and LSTM.The LSTM model can handle more environmental covariates compared to CNN [58] and has a more complex structure and stronger feature learning and representation capabilities compared to the RF model [60].By using CNN in concert, the LSTM model significantly improves the accuracy of SOC prediction compared with using either model alone.

Environmental Variables and SOC
By observing Figures 5 and 10 and disaggregating the statistics of the predicted results (Tables A1-A4), a strong correlation can be found between vegetation compared to SOC.The use of vegetation indices is effective in predicting SOC in comparison.Thus, vegetation indices show great potential in improving the accuracy of digital soil mapping.Critical pheno-logical periods were used for SOC prediction in some studies [19,61], suggesting that plant growth may strongly influence soil carbon inputs.Therefore, long time series of vegetation may be beneficial in improving the accuracy of predicting SOC; the biggest challenge in SOC prediction using long-time vegetation (ten years as used in this study) is the effective extraction of temporal features contained in long time series.The LSTM model can address this challenge as it is specifically designed for time series data.
Studies have shown that changes in soil use type can also significantly alter SOC [62].Other studies have shown that in a humid climate near the Mediterranean Sea, total soil carbon in the near-surface 10 cm of soil increased by 23% after 20 years of afforestation [63].Wang et al. showed that in the hilly, semi-arid climate of the Loess Plateau of China, only at the soil surface of the 5-cm depth, different land the organic carbon content of the overburden is only significant [64].This suggests that climatic conditions can have an important effect on SOC accumulation.In addition, topographic factors can have a profound effect on SOC, with soil erosion in rugged terrain altering the spatial distribution of SOC fixation effects [65] and vegetation regrowth proving to be a significant carbon sink, while erosion may be a source of carbon in erosion-sensitive areas.

Limitations and Potential Improvements
Although the proposed model has many advantages, as mentioned above, some disadvantages must be considered.There are some shortcomings in the sample data used in this paper.First, the sample dataset is small, resulting in limited feature and geospatial coverage, and a sufficiently large and well-designed number of sample points can more effectively capture geospatial and feature space.For example, Brungard et al. constructed a regionally integrated model for the Upper Colorado River Basin (432,000 km 2 ) in the western United States using 12,194 soil depth category observations with similar accuracy (56.1% to 75.0%) to the global model accuracy (62.8%) but with less uncertainty [66].Holmes et al. used nearly 40,000 geographically aligned sites as sample data to focus on improved spatial distribution prediction of the soil coarse fragment layer using digital soil mapping methods in the southwestern region of Australia (1 million km 2 ), with an overall accuracy of 0.82 to 0.92 [67].Loiseau et al. improved the spatial distribution prediction of the soil coarse fragment layer by examining the Mayenne region of France (~5000 km 2 ) by gradually reducing the number of training sites (from 7500 to 400, simulating different observation densities) to test the performance of the ordinary kriging and quartile random forest algorithms.In general, the performance metrics improve with increasing observation density [68].The results indicate that the main limiting factor in making DSM predictions is the amount of data collected in the field.
Second, it is shown that clustering the sample sets can lead to close samples in the validation and calibration sets, leading to a higher test accuracy of the model than the actual accuracy.The use of methods such as leave-alone and cross-validation can mitigate the overestimation of model accuracy due to the sample sets, to some extent, but they are often ineffective when the data are significant.Therefore, future research should emphasize the acquisition of sample data to obtain higher prediction accuracy.In this paper, the test set range and coefficient of variation are smaller than the validation set, which means this prediction does not predict the data outside the valid range.However, it will also cause the extreme value to not be verified.In addition, some studies have shown that SoilGrids2.0 may produce some bias in the local area [69,70], but the addition of this data can improve the generalization ability of the model to a certain extent [71]; of course, we will also try to use regional data in the future [36].
Finally, seventeen covariates were used for SOC prediction in this paper, which is not in line with the principle of DSM's economy.However, we removed all the data with low correlation and those that did not pass the significance test, and the remaining covariates were retained in order to ensure prediction accuracy.

Conclusions
A CNN-LSTM model was created in this research to forecast SOC in Hainan using environmental factors.Utilizing CNN and LSTM concurrently can enhance the accuracy of SOC prediction by extracting spatiotemporal features more effectively than previous models, making it a promising tool for SOC prediction in complex regions.These findings highlight the potential of CNN-LSTM models to be applied in various geographic and environmental contexts.By accurately predicting SOC, they can make a significant contribution to soil management practices and inform decisions that promote sustainable land use and agricultural productivity.In addition, they provide valuable insights into the carbon cycle, which is critical for understanding and mitigating the effects of climate change.Future research should address the limitations identified in this study.Increasing the number and density of in situ observations is essential to improve prediction accuracy.Overall, its strong data processing and precise prediction capabilities make it a valuable tool in soil carbon cycle research and management, providing crucial data for understanding soil ecosystem structure and function for soil management and conservation.

Figure 1 .
Figure 1.Location of the study area (a) and land use (b), along with elevation and distribution of sample points (c).

Figure 1 .
Figure 1.Location of the study area (a) and land use (b), along with elevation and distribution of sample points (c).

Figure 2 .
Figure 2. Schematic diagram of the CNN model structure.

Figure 2 .
Figure 2. Schematic diagram of the CNN model structure.

Figure 3 .
Figure 3. Schematic illustration depicting the architecture of the LSTM model.Where ht−2, ht−1, ht, ht+1 represent the output from time t − 2 to t + 1; ct−2, ct−1, ct, ct+1 represent the memorized information; Xt−1, Xt, Xt+1 represent the input data; f stands for the forgetting gate; i stands for the input gate; and o stands for the output gate.

Figure 3 .
Figure 3. Schematic illustration depicting the architecture of the LSTM model.Where h t−2 , h t−1 , h t , h t+1 represent the output from time t − 2 to t + 1; c t−2 , c t−1 , c t , c t+1 represent the memorized information; X t−1 , X t , X t+1 represent the input data; f stands for the forgetting gate; i stands for the input gate; and o stands for the output gate.

Figure 4 .
Figure 4. Iterative process flowchart of data in the CNN-LSTM model.

Figure 4 .
Figure 4. Iterative process flowchart of data in the CNN-LSTM model.

Figure 5 .
Figure 5. Examining the relationship between SOC in 2010 and the predicted variables in the study region.Dynamic variables are named by combining the variable name with the corresponding month.For example, EVI_9 m represents the EVI data for September following the fusion from 2005 to 2015.The definitions of all variable acronyms may be found in Table1.

Figure 5 .
Figure 5. Examining the relationship between SOC in 2010 and the predicted variables in the study region.Dynamic variables are named by combining the variable name with the corresponding month.For example, EVI_9 m represents the EVI data for September following the fusion from 2005 to 2015.The definitions of all variable acronyms may be found in Table1.

Figure 6 .
Figure 6.Spatial distribution map of prediction accuracy of the CNN-LSTM model for SOC.

Figure 6 .
Figure 6.Spatial distribution map of prediction accuracy of the CNN-LSTM model for SOC.

Figure 7 .
Figure 7. Accuracy comparison of SOC prediction models with CNN, LSTM, and RF methods.

Figure 7 .
Figure 7. Accuracy comparison of SOC prediction models with CNN, LSTM, and RF methods.

Figure 7 .
Figure 7. Accuracy comparison of SOC prediction models with CNN, LSTM, and RF methods.

Figure 9 .
Figure 9. Uncertainty distribution of SOC content based on CNN-LSTM modeling.

Figure 9 .
Figure 9. Uncertainty distribution of SOC content based on CNN-LSTM modeling.

Figure 10 .
Figure 10.Importance of different input variables for CNN-LSTM models.

Figure 10 .
Figure 10.Importance of different input variables for CNN-LSTM models.

Table 1 .
Input variables used in this study.

Table 2 .
Statistical description of soil samples.

Table 2 .
Statistical description of soil samples.

Table A2 .
Slope statistics of CNN-LSTM model prediction results.

Table A3 .
Aspect statistics of CNN-LSTM model prediction results.

Table A4 .
Land-use of CNN-LSTM model prediction results.