Article

Mapping Soil Organic Carbon by Integrating Time-Series Sentinel-2 Data, Environmental Covariates and Multiple Ensemble Models

1 College of Agriculture, Tarim University, Alar 843300, China
2 Research Center of Oasis Agricultural Resources and Environment in Southern Xinjiang, Tarim University, Alar 843300, China
3 ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 311215, China
4 College of Environment and Resources Sciences, Zhejiang University, Hangzhou 310058, China
5 Department of Land Resource Management, Jiangxi University of Finance and Economics, Nanchang 330013, China
6 Key Laboratory of Data Science in Finance and Economics of Jiangxi Province, Jiangxi University of Finance and Economics, Nanchang 330013, China
7 Department of Earth System Science, Ministry of Education Key Laboratory for Earth System Modeling, Institute for Global Change Studies, Tsinghua University, Beijing 100084, China
8 College of Horticulture and Forestry, Tarim University, Alar 843300, China
9 Key Laboratory of Tarim Oasis Agriculture, Tarim University, Ministry of Education, Alar 843300, China
10 Key Laboratory of Genetic Improvement and Efficient Production for Specialty Crops in Arid Southern Xinjiang of Xinjiang Corps, Alar 843300, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(7), 2184; https://doi.org/10.3390/s25072184
Submission received: 17 February 2025 / Revised: 26 March 2025 / Accepted: 27 March 2025 / Published: 30 March 2025
(This article belongs to the Section Optical Sensors)

Abstract

Despite extensive use of Sentinel-2 (S-2) data for mapping soil organic carbon (SOC), how to fully mine the potential of time-series S-2 data remains unclear. To fill this gap, this study introduced an innovative approach for mining time-series data. Using 200 topsoil SOC samples as an example, we revealed temporal variation patterns in the correlation between SOC and time-series S-2 data and subsequently identified the optimal monitoring time window for SOC. The integration of environmental covariates with multiple ensemble models enabled precise mapping of SOC in the arid region of southern Xinjiang, China (6109 km2). Our results indicated the following: (a) the correlation between SOC and time-series S-2 data exhibited both interannual and monthly variations, and July to August is the optimal monitoring time window for SOC; (b) adding soil properties and S-2 texture information could greatly improve the accuracy of SOC prediction models, with soil properties and S-2 texture information contributing 8.85% and 61.78% to the best model, respectively; (c) among different ensemble models, the stacking ensemble model outperformed both the weight averaging and simple averaging ensemble models in terms of prediction performance. Therefore, our study showed that mining spectral and texture information from the optimal monitoring time window, integrated with environmental covariates and ensemble models, has a high potential for accurate SOC mapping.

1. Introduction

Soil organic carbon (SOC) is a key element in the global carbon cycle and plays a vital role in addressing climate change [1,2]. The global stock of SOC in arid zones is as high as 646 Pg, which exceeds the organic carbon stock of all vegetation on earth and occupies a crucial position in the global carbon sink [3]. However, as land reclamation in arid regions expands, water scarcity has become increasingly prominent, further exacerbating aridification [4]. Against the backdrop of intensifying aridification, SOC stocks are highly susceptible to impact [5]. Therefore, high-accuracy mapping of SOC in arid zones is essential for ensuring soil health, protecting the ecological environment, managing regional ecosystem carbon sinks and achieving the ‘4 per 1000’ soil initiative [6,7,8,9].
Digital soil mapping (DSM) has been significantly enhanced by the widespread adoption of remote sensing technology, which has emerged as a critical tool for monitoring soil properties on a large scale [10]. The Sentinel-2 (S-2) satellite offers multispectral images with high temporal and spatial resolution, which provide valuable covariate data for fine mapping of SOC at regional, national and global levels [11,12]. Currently, research on mapping SOC using S-2 data, often relying on single or multiple temporal data, primarily focuses on agricultural ecosystems in temperate regions [13]. The monitoring accuracy is generally low, with the prediction model coefficient of determination ranging from 0.02 to 0.62 [13,14,15]. Single temporal data represent a single acquisition at a specific moment, while multiple temporal data consist of data collected at multiple discrete time points [16]. Single temporal data therefore capture only static surface information and cannot represent the dynamic processes of environmental shifts and vegetation development [16]. While observations at different times enable multiple temporal data to overcome the limitations of single temporal data, their discrete nature or focus on specific periods still limits their ability to fully capture vegetation phenology dynamics or short-term variation [17]. Time-series data, in contrast, are a sequence of data collected over a continuous period, enabling the capture of dynamic changes [16]. For monitoring soil properties, research has demonstrated that time-series data generally achieve better accuracy than single or multiple temporal data [17]. Reference [18] employed 2018–2019 time-series S-2 images to map SOC distribution across croplands in Xuanzhou, Anhui Province, China. Reference [19] predicted surface SOC in cropland across soil types in the Northern Hemisphere using time-series S-2 images for 2019–2021.
Time-series S-2 data provide richer temporal information than single or multiple temporal data. However, the size of these datasets can grow sharply, occasionally by an order of magnitude, leading to significant computational demand. Despite the richness of these datasets, not all temporal features are relevant for monitoring SOC. Therefore, effectively utilizing the intricate time-series S-2 data and extracting the most relevant temporal features remains a critical challenge in current research.
Most previous DSM studies have used spectral information as auxiliary datasets, or have mined these data further to develop multidimensional spectral indices for monitoring soil properties [20,21]. However, the spatial dimensional information and geometric shapes of soil are often neglected [22]. Remote sensing images provide various spatial features, with texture features being important variables as they effectively reveal the spatial characteristics of objects [23]. By reflecting spatial variation in image brightness, texture features can reveal the unique structural information of different land cover types, thereby providing a more intuitive depiction of spatial information in images [11]. Research has demonstrated that incorporating texture features as auxiliary variables in DSM can improve the accuracy of soil property monitoring [23]. Reference [22] improved soil type mapping accuracy by combining multiple temporal Landsat 8 spectral indices with texture features. Reference [20] improved soil salinity prediction model accuracy by combining Gaofen-2 spectral indices with texture features. However, texture features have rarely been used for monitoring SOC. SOC is a key factor influencing the spatial configuration of the soil’s surface structure, with its heterogeneity reflected in the variation in image pixel greyness in remote sensing images and expressed through distinct texture features [11,22]. Therefore, texture features can provide valuable information for SOC prediction. Mining S-2 texture features and developing multidimensional texture indices have a promising future in high-accuracy SOC mapping.
The choice of environmental covariates is critical to the high accuracy of SOC mapping [24]. Topography is essential for soil formation and significantly affects the spatial distribution of surface SOC [6]. It has become a commonly used environmental covariate for SOC mapping [25]. Reference [10] reported that nearly 50% of SOC mapping studies used spectral information combined with topography as covariates and the combination of spectral information and topography has become a routine covariate for SOC prediction. The frequency of application of the SCORPAN factor is usually constrained by the accessibility of environmental covariates [25]. Therefore, soil properties that influence SOC are often neglected. These properties significantly affect the processes of SOC decomposition and accumulation, so the available information on soil layers is an important basis for highly accurate SOC mapping [26]. Reference [27] included 15 soil properties to predict SOC in agricultural fields in Jiangxi Province, China. Reference [28] showed that adding soil properties could enhance the precision of SOC prediction models. Thus, using open-source soil properties data as auxiliary variables can improve the accuracy of SOC mapping. For example, as a global open-source soil property database, SoilGrids has great potential for SOC mapping and holds significant global value for broader applications [27]. However, the time-series S-2 data and environmental covariates still contain relatively rich feature variables. Here, feature selection algorithms are important for dealing with this challenge [27]. Among the many feature selection algorithms, Boruta can identify the most relevant features and remove redundant variables, thereby improving the accuracy of SOC mapping [29].
The predictive model is another crucial aspect of DSM research [24]. Current research has primarily explored single models, particularly deep learning and machine learning methods [10]. By integrating the outputs of multiple base learners, ensemble models improve generalization and robustness, demonstrating strong potential for DSM applications [27]. For instance, Reference [30] applied a stacking ensemble model to map soil moisture in Ningxia, China, while Reference [28] utilized a weight averaging ensemble model to estimate SOC in northern Iran’s forests. Similarly, Reference [26] employed the weight averaging ensemble approach to map SOC density across China. Nevertheless, the comparative performance of different ensemble models in SOC mapping remains unexplored.
Therefore, this study seeks to explore the capability of time-series S-2 data for digital SOC mapping, by developing a high-accuracy SOC mapping strategy that integrates time-series S-2 data, environmental covariates and ensemble models. The primary goals of this study are threefold: (a) to develop a novel idea for mining the important temporal features of time-series S-2 data for mapping SOC; (b) to explore the capability of S-2 texture information and soil properties for enhancing SOC mapping; (c) to assess the comparative effectiveness of different ensemble models in SOC mapping.

2. Materials and Methods

2.1. Study Area and Soil Sample Collection

Our study site is situated in Aksu Prefecture, in the southern region of Xinjiang, China (Figure 1). The study area experiences a characteristic temperate continental arid climate, with the ratio of annual evaporation to precipitation close to 30 (1948 mm year−1:65.4 mm year−1) [4]. It has lower elevations to the south and higher elevations to the north (993–2121 m). A substantial volume of sediment transported by the Tianshan snowmelt has been deposited at the mountain’s outlet, giving rise to a characteristic alluvial fan. Soil types mainly include meadow saline soils and brown desert soils, with high degrees of salinization and sand content. The vegetation types mainly include natural species such as Halocnemum strobilaceum, Tamarix chinensis Lour and Haloxylon ammodendron, as well as cotton, a salt-tolerant crop. In recent years, desert areas have been converted into cropland to varying degrees to address the increasing food demands of a growing population. The continuous expansion of agricultural activities has profoundly impacted soil aggregate stability and SOC stocks, while also posing a threat to local agricultural resources and the ecological environment. In 2021, we collected 200 samples from the 0–20 cm topsoil layer along the main and secondary roads using a mixed sampling method, ensuring that the landscape characteristics were consistent within the sampling area of each mixed sampling point. After laboratory pretreatment, SOC was analyzed using the externally heated potassium dichromate oxidation method [17]. The findings showed that SOC content varied between 0.74 g kg−1 and 13.41 g kg−1. The average SOC content was 4.82 g kg−1, with a standard deviation (SD) of 2.50 g kg−1 and a coefficient of variation of 51.86%. These measured SOC values were used to construct a ground-reference dataset for training and validating the SOC mapping models in this study.
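As a quick consistency check on the descriptive statistics above, the coefficient of variation is the SD divided by the mean. Using the rounded values reported here:

```latex
\mathrm{CV} = \frac{\mathrm{SD}}{\bar{x}} \times 100\% = \frac{2.50}{4.82} \times 100\% \approx 51.87\%
```

The small difference from the reported 51.86% is presumably due to rounding of the mean and SD before reporting.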

2.2. Data Sources and Processing

2.2.1. Topographic Data

The digital elevation model utilized in our study is the 30 m resolution SRTM product provided by NASA. We calculated a total of 10 topographic variables: elevation, longitudinal curvature, aspect, valley depth, slope, curvature, flow direction, topographic wetness index, convergence index and channel network base level (Table 1). In ArcGIS 10.8.1, the nearest neighbor method was employed to resample topographic variables to a 10 m resolution.

2.2.2. Soil Properties Data

The soil properties data used in this study were obtained from SoilGrids 2.0, a global dataset with a 250 m resolution, freely provided by the International Soil Information Center. We downloaded a total of 11 soil properties: volumetric water content at −10 kPa, −33 kPa and −1500 kPa, nitrogen, coarse fragments, sand, silt, clay, cation exchange capacity (at pH 7), pH in water and bulk density (Table 1). In ArcGIS 10.8.1, the nearest neighbor method was employed to resample the soil properties data to a 10 m resolution.

2.2.3. Time-Series S-2 Data

The S-2 satellite is equipped with a multispectral imager featuring 13 spectral bands and spatial resolutions of 10 m, 20 m and 60 m [12]. In this study, we acquired imagery from 2017 to 2021 through the S2_HARMONIZED dataset on the Google Earth Engine platform. To correct atmospheric effects, the Semi-Automatic Classification Plugin Atmospheric Correction tool was applied, while cloud masking was conducted with the QA60 mask band [31]. To mitigate the impact of weather conditions on a single temporal S-2 image, monthly median composites free of clouds were generated, with all spectral bands resampled to a 10 m resolution [32]. Although the S2_SR_HARMONIZED dataset provides Level-2A S-2 imagery, it lacks complete global coverage from 2017 to 2018, including our study area. Therefore, this dataset was not used in the study. This study calculated 28 spectral indices, including triangle vegetation index (TVI), redness index (RI), brightness index (BI) and enhanced vegetation index (EVI), among others. After performing collinearity analysis and assessing correlation strength, 10 spectral indices were selected (Table 2).
The study employed the Grey Level Co-occurrence Matrix with a 3 × 3 window size, to extract 8 texture features from 10 S-2 spectral bands (excluding Band 10-SWIR-Cirrus, Band 9-Water vapor and Band 1-Coastal aerosol). The advantage of a 3 × 3 window size is that it can capture the heterogeneity of pixel values within a small area [11]. Based on existing empirical formulas for spectral indices, 3 two-dimensional and 3 three-dimensional texture indices were developed (Table 2).
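To make the texture workflow more concrete, the following is a minimal sketch, not the implementation used in this study. It computes the GLCM Mean for one small window and combines two texture values into difference, normalized difference and ratio indices. The horizontal one-pixel offset, the symmetric counting, the number of grey levels and the three index formulas (chosen by analogy with DVI, NDVI and RVI) are assumptions; the actual definitions are those given in Table 2.

```python
def glcm_mean(window, levels):
    """GLCM Mean of a small window of integer grey levels in [0, levels).

    Assumption: pairs are taken with a horizontal offset of one pixel
    and counted symmetrically, which is one common GLCM convention.
    """
    counts = [[0] * levels for _ in range(levels)]
    for row in window:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            counts[b][a] += 1
    total = sum(sum(r) for r in counts)
    # Mean = sum over (i, j) of i * P(i, j), with P the normalized counts.
    weighted = sum(i * counts[i][j] for i in range(levels) for j in range(levels))
    return weighted / total


def dtei(t1, t2):
    """Difference texture index (assumed form: T1 - T2)."""
    return t1 - t2


def ndtei(t1, t2):
    """Normalized difference texture index (assumed form)."""
    total = t1 + t2
    return (t1 - t2) / total if total != 0 else float("nan")


def rtei(t1, t2):
    """Ratio texture index (assumed form: T1 / T2)."""
    return t1 / t2 if t2 != 0 else float("nan")


# Hypothetical 3 x 3 window quantized to 4 grey levels.
window = [[0, 0, 1],
          [1, 2, 2],
          [3, 3, 3]]
t_mean = glcm_mean(window, levels=4)
```

In practice T1 and T2 would be per-pixel texture rasters from two bands (for example B2-Mean and B4-Mean), and the same functions would be applied element-wise.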
This study proposed a new idea for mining time-series data. It emphasizes the temporal variation in the correlation between soil properties and time-series data [33]. The variation in the correlation between SOC and time-series data arises from the differing relationships between SOC and the data at each time point. Specifically, correlation analysis was conducted to explore temporal variation patterns between SOC and time-series S-2 data. At the monthly scale, the relationship between SOC and S-2 data exhibits significant shifts, indicating that annually, there exists a specific month where the correlation between SOC and S-2 data reaches its peak. This month was considered the optimal monitoring month for SOC. However, because of variations in the rainy season’s timing or annual differences in plant phenology, the optimal monitoring month can vary between years. This range of variation was termed the optimal monitoring time window for SOC. By determining the optimal monitoring time window, we can eliminate irrelevant temporal features, thereby improving the efficiency of SOC monitoring.
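The procedure described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names and the data layout (a dictionary keyed by year and month) are assumptions, and the sample values are invented. For each year, the month with the strongest absolute Pearson correlation with SOC is taken as the optimal monitoring month, and the span of those months across years gives the optimal monitoring time window.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def optimal_months(soc, series):
    """series maps (year, month) to index values at the sampling sites.

    Returns {year: month with the largest |r| against SOC}.
    """
    best = {}
    for (year, month), values in series.items():
        r = abs(pearson(soc, values))
        if year not in best or r > best[year][1]:
            best[year] = (month, r)
    return {year: month for year, (month, _) in best.items()}


def monitoring_window(best_by_year):
    """Earliest and latest optimal month observed across years."""
    months = sorted(set(best_by_year.values()))
    return months[0], months[-1]


# Invented toy data: five sites, two years, two candidate months each.
soc = [1.0, 2.0, 3.0, 4.0, 5.0]
series = {
    (2020, 6): [5.0, 1.0, 4.0, 2.0, 3.0],
    (2020, 7): [1.1, 2.0, 2.9, 4.2, 5.0],
    (2021, 7): [1.0, 3.0, 2.0, 5.0, 4.0],
    (2021, 8): [2.0, 4.0, 6.0, 8.0, 10.0],
}
best = optimal_months(soc, series)
window = monitoring_window(best)
```

Taking the minimum and maximum month is sufficient when the window lies within a single calendar year, as it does here; a window that wraps around the year end would need separate handling.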

2.3. Scenario Construction

This study integrated spectral indices, terrain variables, texture indices and soil properties to construct four scenarios (Table 3). Building upon the conventional modeling approach that combines spectral indices with terrain variables, we assessed the improvements in SOC prediction models by adding soil properties, texture indices and a combination of soil properties and texture indices. Finally, the contribution of these four types of variables to model performance was analyzed.

2.4. Feature Selection Algorithm

Boruta is a feature selection algorithm that leverages Random Forest to identify significant features by comparing them with randomized shadow features [32]. Feature variable selection was performed with the ggplot2, Boruta and randomForest packages in RStudio. For Scenario A, Scenario B, Scenario C and Scenario D, a total of 40, 42, 43 and 45 variables were selected, respectively. Compared with the original dataset, the data volume was reduced by 63.64%, 65.29%, 78.50% and 78.72%, respectively.
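The shadow-feature idea behind Boruta can be illustrated with a much simplified sketch. It is not the Boruta R package used in this study: absolute Pearson correlation stands in for Random Forest importance, and a simple majority vote over rounds replaces Boruta's statistical test. The function names, the number of rounds and the seed are illustrative assumptions.

```python
import random
from math import sqrt


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 when either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / sqrt(sxx * syy))


def shadow_screen(features, target, rounds=20, seed=0):
    """Keep features that beat the best shuffled (shadow) copy in most rounds."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(rounds):
        # Shadow features: each real feature with its values shuffled,
        # which destroys any relationship with the target.
        shadow_scores = []
        for values in features.values():
            shuffled = list(values)
            rng.shuffle(shuffled)
            shadow_scores.append(abs_corr(shuffled, target))
        threshold = max(shadow_scores)
        for name, values in features.items():
            if abs_corr(values, target) > threshold:
                hits[name] += 1
    return [name for name, count in hits.items() if count > rounds / 2]


# Invented toy data: one informative feature and one constant feature.
target = [float(i) for i in range(1, 21)]
features = {
    "signal": [2.0 * t + 1.0 for t in target],
    "flat": [3.0] * 20,
}
selected = shadow_screen(features, target)
```

The design point carried over from Boruta is that the acceptance threshold is derived from the data themselves, through the shuffled copies, rather than being fixed in advance.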

2.5. Modeling Approaches and Performance Evaluation

This study utilized five base learners from three categories. Partial least squares regression (PLSR) represented the linear regression model, while the machine learning models included eXtreme gradient boosting (XGBoost), gradient boosting regression tree (GBRT) and random forest (RF). The deep learning approach was the Multilayer Perceptron (MLP). To determine the best parameter combinations, GridSearchCV was applied for hyperparameter tuning (Table 4). Using 10-fold cross-validation, the performance of three ensemble techniques (stacking, weight averaging and simple averaging) was assessed. The evaluation metrics included the coefficient of determination (R2), mean absolute error (MAE) and root mean square error (RMSE). Model uncertainty was also quantified as the SD of cross-validation predictions, which complements the predictive performance assessment by indicating forecast reliability [34].
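The evaluation metrics and the two averaging ensembles can be written down in a few lines. The sketch below is illustrative only: the function names are hypothetical, the observations and predictions are invented, and the inverse-RMSE weighting is an assumption, since the weighting scheme is not specified here. Stacking is not shown; it differs in that a meta-learner is trained on the base learners' predictions instead of applying fixed weights.

```python
from math import sqrt


def r2(obs, pred):
    """Coefficient of determination."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def simple_average(preds):
    """preds: one list of predictions per base learner, same sample order."""
    return [sum(col) / len(col) for col in zip(*preds)]


def weighted_average(preds, weights):
    """Weighted mean of base-learner predictions; weights need not sum to 1."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, col)) / total for col in zip(*preds)]


def inverse_rmse_weights(obs, preds):
    """Assumed scheme: weight each learner by 1 / RMSE on validation data."""
    return [1.0 / rmse(obs, p) for p in preds]


# Invented toy data: two base learners, four samples.
obs = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.0, 2.0, 3.0, 5.0]
pred_b = [1.0, 2.0, 3.0, 3.0]
avg = simple_average([pred_a, pred_b])
wavg = weighted_average([pred_a, pred_b], [1.0, 3.0])
```

In this toy case the two learners err in opposite directions, so the simple average recovers the observations exactly; real base learners rarely cancel so cleanly.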

3. Results

3.1. Correlation Analysis of Covariates with SOC

3.1.1. Correlation Between SOC and S-2 Texture Features

Figure 2 and Figure A1 show the correlation between SOC and eight texture features extracted from 10 bands of a single temporal S-2 image. Among different texture features, the correlation between SOC and Mean is higher than other texture features in all the bands and the highest value of correlation coefficient is 0.72. For texture features in different bands, the correlation between SOC and texture features derived from Band 7 (B7), B8 and B8A is significantly higher than that of other bands. In summary, the texture feature Mean had the best correlation with SOC and the texture features extracted on the B7, B8 and B8A bands had an overall better correlation with SOC. Therefore, we selected texture feature Mean on all the bands and all texture features on bands B7, B8 and B8A to develop the multidimensional texture indices.
Figure 3 shows the correlation between SOC and the newly developed multidimensional texture indices. The multidimensional texture indices could greatly enhance the correlation between SOC and texture features. The correlation coefficients of the best combination of two-dimensional texture indices DTeI, NDTeI and RTeI with SOC were 0.77, 0.74 and 0.73, respectively, which were higher than the maximum value of correlation coefficients of the one-dimensional texture features with SOC of 0.72 and the T1 and T2 in the optimal combinations were both B2-Mean and B4-Mean. The correlation coefficients of the best three-dimensional texture indices, TDTeI1 and TDTeI2, were higher than the maximum value of correlation coefficient between one-dimensional texture features and SOC, which was 0.72. TDTeI3 was lower than the maximum value of 0.72, and the correlation did not improve, so we did not consider TDTeI3 in the subsequent study. The T1, T2 and T3 in the optimal combination of three-dimensional texture indices TDTeI1 and TDTeI2 were B2-Mean, B4-Mean, B6-Mean and B2-Mean, B3-Mean, B4-Mean, respectively. Therefore, the two-dimensional texture indices of DTeI, NDTeI and RTeI and the three-dimensional texture indices of TDTeI1 and TDTeI2 could improve the correlation between SOC and texture features. We performed further mining of time-series data for these five multidimensional texture indices and the texture features involved in the development of these multidimensional texture indices.

3.1.2. Correlation Between SOC and S-2 Texture Indices

Figure 4 and Figure 5 show the temporal variation pattern of correlation between SOC and time-series S-2 data. The correlation between SOC and time-series S-2 data exhibited an annual cyclic variation. As shown in Figure 4, the correlation between SOC and time-series S-2 spectral indices showed an increase followed by a decrease from January to April, a similar pattern from May to September, and a continuous increase from October to December. As shown in Figure 5, the correlation between SOC and time-series S-2 texture indices showed an increase followed by a decrease from January to April, and a similar pattern from May to December. Thus, throughout the annual cycle, there was always a month when the correlation between SOC and time-series S-2 data peaked, defining the optimal monitoring month for SOC. However, the optimal monitoring month varied between years, with a concentration in July and August. We therefore determined July and August as the optimal monitoring time window for SOC. Identifying the optimal monitoring time window led to an 83.33% reduction in data volume relative to the initial time-series dataset.
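One reading of the 83.33% figure, assuming that data volume scales with the number of monthly composites and that two of the twelve months are retained in each year:

```latex
1 - \frac{2}{12} = \frac{10}{12} \approx 83.33\%
```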
The interannual variation pattern revealed a gradual decline in the correlation between SOC and time-series S-2 data, with the rate of decrease varying across different indices. However, because the time-series S-2 data cover only a limited period, the correlation between SOC and the data within the optimal monitoring time window remained strongly significant throughout. The maximum valid monitoring year for SOC could therefore not yet be identified.

3.2. Assessment and Comparison of Multiple Scenarios by Different Ensemble Models

Table 5 displays the results of multiple ensemble models developed based on four distinct scenarios. The results indicated that the choice of scenario and ensemble model significantly impacted modeling accuracy. First, in terms of the comparison among different ensemble models, the stacking ensemble model outperformed the weight averaging and simple averaging ensemble models. Second, adding soil properties and texture indices enhanced the predictive capability of the models. Compared to Scenario A, adding soil properties in Scenario B increased R2 by 0.02, and decreased MAE and RMSE by 0.06 g kg−1 and 0.07–0.08 g kg−1, respectively. Similarly, compared to Scenario A, adding texture information in Scenario C increased R2 by 0.04–0.05, and decreased MAE and RMSE by 0.12 g kg−1 and 0.13–0.14 g kg−1, respectively. Finally, Scenario D, which added both soil properties and texture information, attained the greatest accuracy across all three ensemble models. Compared to Scenario A, Scenario D showed an increase in R2 of 0.06–0.07, and a decrease in MAE and RMSE of 0.18 g kg−1 and 0.19 g kg−1, respectively.

3.3. The Significance of Feature Variables

Figure 6 displays the contribution of each feature variable to the optimal predictive model. Normalizing feature variable importance to a total of 100% improved comparability, providing a more accurate depiction of their relative roles in the SOC prediction model. B4-Mean-2021.07 was the most important individual variable, with a relative importance of 6.17%. NDWI-2020.08 made the least contribution, representing just 0.68% of the total impact. Among similar feature variables, texture indices exhibited the highest relative importance, followed by spectral indices and soil properties, whereas topographic factors contributed the least. Among the texture indices, the multidimensional texture indices had a relative importance of 28.96%, indicating a significant contribution to the optimal predictive model. Among the soil properties, sand was the most important. Topography showed the lowest contribution among all the variables, reflecting the flat terrain and minimal influence of topographic factors on SOC patterns in the region. Most feature variables were concentrated in years closer to the sampling period (2019–2021), while fewer were selected from 2017–2018, as indicated by their temporal distribution. This pattern aligns with the observed correlation trend between SOC and time-series S-2 data.
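The normalization and grouping of importance values described above can be sketched as follows. The variable names, raw importance values and group labels in this example are invented for illustration and are not taken from Figure 6.

```python
def normalize_importance(raw):
    """Rescale raw importance scores so that they sum to 100%."""
    total = sum(raw.values())
    return {name: 100.0 * value / total for name, value in raw.items()}


def group_share(normalized, groups):
    """Sum normalized importance within each covariate group."""
    shares = {}
    for name, pct in normalized.items():
        group = groups[name]
        shares[group] = shares.get(group, 0.0) + pct
    return shares


# Hypothetical raw scores and group labels.
raw = {"texture_a": 2.0, "texture_b": 1.0, "soil_a": 1.0}
groups = {"texture_a": "texture", "texture_b": "texture", "soil_a": "soil"}
normalized = normalize_importance(raw)
shares = group_share(normalized, groups)
```

Group shares computed this way are what allow statements such as texture information contributing a given percentage of the overall relative importance.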

3.4. Spatial Distribution of SOC and Its Uncertainty

Figure 7 illustrates the predicted SOC distribution along with the associated uncertainty, based on the three ensemble models derived from Scenario D. The SOC distribution trends predicted by the three ensemble models were generally consistent. SOC displayed significant spatial heterogeneity, with long-term cultivated farmland exhibiting higher SOC levels, while newly cultivated fields, desert regions, and mountainous areas had lower SOC content. Sustained fertilization and the practice of returning straw in long-term cultivated fields contributed to an enhanced supply of SOC, leading to higher levels. In contrast, desert and mountainous regions had lower SOC levels, primarily because of limited vegetation cover, which resulted in insufficient SOC inputs. Newly cultivated farmland, originally part of the desert, inherently had low SOC content, and due to the short cultivation period, the enhancement of soil fertility has been constrained and remains relatively limited. Despite the three ensemble models producing similar overall SOC distribution patterns, significant variations were evident in specific spatial details. In the northern mountainous and desert regions, the simple averaging and weight averaging ensemble models tended to overestimate low SOC areas compared to the stacking ensemble model. The predicted mean SOC values for the three ensemble models were 4.76 g kg−1, 4.68 g kg−1 and 4.62 g kg−1, respectively. The stacking ensemble model provided SOC predictions with a mean and range that were most similar to the ground survey data, outperforming both the simple averaging and weight averaging ensemble models. As a result, the stacking ensemble approach outperformed other methods in SOC prediction and effectively captured the overall spatial distribution pattern.
Digital SOC mapping inherently involves a certain degree of uncertainty. The three ensemble models exhibited a largely consistent pattern in SOC uncertainty distribution. SOC uncertainty was generally higher in mountainous and desert regions with low SOC content, whereas areas like farmland, where SOC levels were higher, exhibited lower uncertainty. The stacking ensemble model exhibited a lower average SOC uncertainty (0.20 g kg−1) compared to the weight averaging ensemble model (0.26 g kg−1) and the simple averaging ensemble model (0.29 g kg−1). Furthermore, the stacking ensemble model effectively reduced SOC uncertainty in areas where high uncertainty persisted in the simple averaging and weight averaging models. The heightened uncertainty in desert and mountainous regions stemmed primarily from the scarcity of sampling points, a consequence of challenging terrain and limited accessibility.
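Uncertainty expressed as the SD of cross-validation predictions can be sketched as below. This is an assumption-laden illustration: it supposes that each cross-validation model yields one prediction per map location, and it uses the sample SD, whereas the exact estimator is not stated here. The function name and values are hypothetical.

```python
from statistics import mean, stdev


def prediction_uncertainty(fold_predictions):
    """Per-location SD across cross-validation models.

    fold_predictions: one list of predictions per fold model,
    all in the same location order.
    """
    return [stdev(column) for column in zip(*fold_predictions)]


# Invented predictions (g kg-1) from three fold models at two locations.
fold_predictions = [
    [4.0, 2.0],
    [5.0, 2.0],
    [6.0, 2.0],
]
uncertainty = prediction_uncertainty(fold_predictions)
average_uncertainty = mean(uncertainty)
```

Locations where the fold models disagree receive a high SD, which is consistent with the higher uncertainty reported for sparsely sampled desert and mountainous areas.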

4. Discussion

4.1. Enhancing SOC Mapping with Time-Series S-2 Data

The effectiveness of remote sensing technology in predicting SOC relies on both the accessibility and quality of the imagery [13]. Compared to time-series images, both single temporal and multiple temporal images are more susceptible to the influence of weather conditions [32]. Moreover, the ability to predict SOC by analyzing soil-vegetation relationships relies on the capture of vegetation characteristics that reflect changes in SOC through remote sensing imagery [35]. However, SOC content is influenced not only by the vegetation data of the current year but also by the vegetation conditions of previous years [36]. This is because SOC represents the accumulation and decomposition of plant litter over several years in the soil [27]. As a result, long-term vegetation data have a more significant impact on SOC predictions than short-term vegetation data. Time-series images can track the continuous dynamic changes in vegetation, thus providing a more effective way to monitor SOC fluctuations over time [6]. However, redundant information in time-series data can lead to higher computational complexity, ultimately reducing model efficiency and performance. To overcome this challenge, the study introduces a time-series data mining method that determines the optimal monitoring time window and the maximum valid monitoring year for SOC by analyzing the temporal changes in the relationship between SOC and time-series S-2 data. Through feature variable selection, the most relevant temporal features for SOC monitoring were identified. The primary objective of our study is to identify the optimal monitoring time window by examining the periodic correlation trends between SOC and time-series S-2 data, subsequently extracting critical temporal features to facilitate the efficient use of complex time-series data.
Most research using satellite remote sensing for SOC monitoring and mapping has predominantly relied on either single or multiple temporal data, yet the precision of these assessments often fluctuates with the specific timing of the observations [19]. Identifying the optimal monitoring time window for SOC is crucial when utilizing single or multiple temporal data, as it offers a theoretical basis for choosing the most suitable period for acquiring remote sensing data and acts as a useful guideline for analogous areas. Some studies have also investigated the optimal monitoring period for soil properties. Reference [37] analyzed the accuracy of soil organic matter (SOM) predictions during the bare soil period, revealing that the most reliable results were obtained in mid-May in China’s Songnen Plain. Reference [32] assessed SOM prediction accuracy from April to October, identifying the April–June period as yielding the highest accuracy. However, these studies primarily considered the annual scale, overlooking the year-to-year fluctuations in the optimal monitoring period caused by phenological variations. Unlike previous studies, our research investigated the optimal SOC monitoring window by tracking temporal variations over multiple years, allowing for a more precise assessment of both seasonal and interannual SOC dynamics. For southern Xinjiang, China, the optimal monitoring time window for SOC was identified as July–August. This period corresponds to the wet season and the peak vegetation growth phase in the study area. The wet season significantly increases precipitation, leading to a substantial rise in soil moisture. The region’s dry climate and high evaporation rates lead to salt buildup on the soil surface, which subsequently affects the soil’s spectral reflectance [4]. However, during periods of increased rainfall, the elevated precipitation facilitates the downward movement of salts into deeper soil layers, thereby diminishing their influence on the spectral reflectance associated with SOC [4]. During this time, different plant species reach their peak growth stages, and maximum biomass is produced across the various plants in this region. Given the essential role of SOC in plant growth, S-2 indices effectively capture canopy characteristics during peak vegetation periods, enhancing SOC monitoring [17].
Although we did not directly mine the maximum valid monitoring year for SOC using time-series S-2 data, the decreasing trend in the correlation between SOC and data from the optimal monitoring time window allowed us to infer it. Figure 8 and Figure 9 show the decreasing trend lines of the correlation between SOC and the optimal monitoring month. Based on the decreasing trends of the different indices, it can be inferred that the maximum valid monitoring year for SOC ranges from 13 to 49 years for S-2 spectral indices and from 7 to 18 years for texture indices.

4.2. The Importance of Mining S-2 Texture Information for Mapping SOC

In digital SOC mapping, it is crucial to identify remote sensing indicators related to SOC [24]. This is especially important in arid regions, where highly salinized soil surfaces can form salt patches or salt crusts that weaken the soil’s spectral characteristics and produce the ‘different objects, same spectra’ and ‘same object, different spectra’ phenomena [23]. This poses a major challenge for SOC mapping studies that rely only on spectral features.
Remote sensing images provide abundant and distinct spatial information [10]. Texture features reflect the structural characteristics and spatial variation of grey values in an image, supplying information that compensates for limited soil spectral information [23]. By fully exploiting the structural information in image grey-level distributions, differences between land cover types can be enhanced, reducing the confusion caused by mixed pixels and better characterizing the spatial distribution of the soil surface [22,23]. Our results indicated that adding texture information improved the accuracy of the SOC prediction models: R² increased by 0.04–0.05, while MAE and RMSE decreased by 0.12 g kg⁻¹ and 0.13–0.14 g kg⁻¹, respectively. In the optimal model, texture information contributed 61.78% of the overall relative importance. These results are consistent with previous studies [22,23]. Beyond those studies, we not only mined the potential of temporal texture features for SOC monitoring but also systematically compared the correlation between SOC and texture features across the S-2 bands. Our results indicated that B7, B8 and B8A, which span the red-edge and near-infrared region, were the sensitive bands for texture-based SOC monitoring, and that Mean was the sensitive feature. This is consistent with [38], who found that texture features extracted from red-edge bands were more strongly correlated with the aboveground biomass of potato. The link is plausible because SOC is directly related to aboveground biomass and represents the cumulative result of plant litter accumulating and humifying in the soil over the years [27]. The sensitive bands and features identified here offer a theoretical foundation and reference for monitoring SOC using S-2 texture features.
Currently, research on monitoring land cover using texture information typically relies on single texture features and has not fully used the texture information in images. The multidimensional texture indices allow for the integrated application of various texture features, enabling the effective exploration of their combined potential and enhancing the precision of SOC monitoring. The multidimensional texture indices we developed enhanced the correlation between SOC and texture features and accounted for 28.96% of the relative importance. By identifying sensitive bands and developing multidimensional texture indices, a more comprehensive reflection of SOC spatial distribution can be achieved.
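As an illustration of how texture features from different bands are combined, the sketch below computes five of the multidimensional texture indices listed in Table 2. It assumes that the subtraction signs lost in the extracted formulas are restored as shown in the comments, and that T1, T2 and T3 are the same texture feature (for example Mean) taken from three different bands. TDTeI3 is left out because its formula cannot be read unambiguously. The function name and input values are hypothetical.

```python
def texture_indices(t1, t2, t3):
    """Combine one texture feature from three bands into multidimensional indices.

    Assumed formulas (signs restored from the garbled table):
      DTeI   = T1 - T2
      NDTeI  = (T1 - T2) / (T1 + T2)
      RTeI   = T1 / T2
      TDTeI1 = (T1 - T2) / (T1 + T3)
      TDTeI2 = T1 / (T2 + T3)
    """
    return {
        "DTeI": t1 - t2,
        "NDTeI": (t1 - t2) / (t1 + t2),
        "RTeI": t1 / t2,
        "TDTeI1": (t1 - t2) / (t1 + t3),
        "TDTeI2": t1 / (t2 + t3),
    }


# Hypothetical per-pixel Mean texture values from three bands.
indices = texture_indices(t1=0.6, t2=0.2, t3=0.4)
```

In practice the same function would be applied to every ordered band combination, and the combinations most strongly correlated with SOC would be retained as candidate predictors.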

4.3. Evaluating the Predictive Capability of Different Ensemble Models for SOC

By integrating the strengths of multiple base learners, ensemble models significantly enhance the reliability and precision of SOC prediction [26]. Our findings indicate that the ensemble model achieves a notable improvement in accuracy over individual base learners, aligning with findings from earlier studies [28,39]. Furthermore, most prior studies have primarily focused on ensemble machine learning models [30,39]. Unlike these studies, our research not only incorporated ensembles of deep learning, machine learning, and linear regression models, but also conducted a comparison among various ensemble approaches. The findings indicated that the stacking ensemble model outperformed the others, with the weighted averaging approach ranking second, whereas the simple averaging method showed comparatively lower performance.
The variations in accuracy among the ensemble models can likely be attributed to their underlying ensemble strategies. By averaging the outputs of multiple base learners, the simple averaging ensemble model enhances prediction accuracy, addressing challenges such as the underestimation of high values and overestimation of low values, thereby lowering overall prediction variance [40]. This model serves as a basic benchmark for evaluating the performance of more complex ensemble methods. Using ‘R² normalization’, the weighted averaging ensemble model assigns specific weights to individual base learners, combining their outputs by multiplying predictions with their assigned weights and summing the results to generate the final prediction [26]. The stacking ensemble model, in contrast, uses cross-validation to generate predictions from the various base learners and a meta-learner to integrate these outcomes [30]. This layered approach captures complex relationships and overcomes the limitations of individual models, boosting accuracy and generalization, which is the likely reason stacking outperformed both simple averaging and weighted averaging. The relationship between SOC and covariates can be influenced by factors such as the study area, the type of remote sensing data, and the environmental covariates used. Additionally, differences in dataset size and sample distribution across regions make it unlikely for any single model to perform optimally in all scenarios [24]. Future studies could refine ensemble models by incorporating additional base learners and layers to improve their generalization capability and forecasting accuracy.
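The two averaging strategies described above can be sketched in a few lines. This is a minimal illustration that assumes ‘R² normalization’ means dividing each base learner’s R² by the sum of all R² values; the exact weighting used in the study is not spelled out in the text. Stacking is omitted because it needs a fitted meta-learner. Function names, learner labels and numbers are hypothetical.

```python
def simple_average(preds):
    """Average the predictions of all base learners with equal weight."""
    n_learners = len(preds)
    return [sum(column) / n_learners for column in zip(*preds.values())]


def r2_weights(r2_scores):
    """Assumed 'R2 normalization': each weight is R2 divided by the sum of all R2."""
    total = sum(r2_scores.values())
    return {name: r2 / total for name, r2 in r2_scores.items()}


def weighted_average(preds, weights):
    """Multiply each learner's predictions by its weight and sum across learners."""
    names = list(preds)
    columns = zip(*(preds[name] for name in names))
    return [sum(weights[name] * p for name, p in zip(names, column)) for column in columns]


# Hypothetical predictions (g/kg) for two sites and validation R2 of two learners.
preds = {"RF": [6.0, 8.0], "PLSR": [4.0, 10.0]}
weights = r2_weights({"RF": 0.8, "PLSR": 0.2})
avg = simple_average(preds)
wavg = weighted_average(preds, weights)
```

The weighted result leans toward the learner with the higher R², whereas the simple average treats both learners alike, which mirrors the ranking of the two averaging approaches reported above.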

5. Conclusions

This study used time-series S-2 data, environmental covariates, and multiple ensemble models to create a digital SOC map for the arid regions of southern Xinjiang, China. Our results showed that: (a) the optimal monitoring time window for SOC using time-series S-2 data is July–August, and the maximum valid monitoring year is inferred to be 7–49 years; (b) the sensitive bands for monitoring SOC using S-2 texture features are B7, B8 and B8A, and the sensitive feature is Mean. The newly developed multidimensional texture indices not only improved the correlation between SOC and texture features but also accounted for 28.96% of the relative importance in the optimal model; (c) among the soil properties, sand is the most important for the SOC prediction model. We explored the significant potential of S-2 texture features in monitoring SOC and introduced a novel approach for mining time-series data. These advancements enhance our ability to monitor SOC, which is crucial for mitigating climate change, improving soil management practices, and promoting sustainable land use strategies.

Author Contributions

Conceptualization, C.F. and S.C.; methodology, Z.C.; software, Z.C. and S.C.; validation, Z.C., S.C. and B.H.; formal analysis, N.W.; investigation, Z.C. and S.C.; resources, C.F.; data curation, Z.C. and J.P.; writing—original draft preparation, Z.C.; writing—review and editing, N.W., B.H., S.C., J.P. and C.F.; visualization, N.W.; supervision, C.F.; project administration, B.H., S.C. and J.P.; funding acquisition, B.H., S.C. and J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Tarim University President’s Fund (Grant Nos. TDZKCX202205, TDZKSS202227, and TDZKSS202350), the National Natural Science Foundation of China (Grant Nos. 42201073 and 42201054), the Jiangxi “Double Thousand Plan” (No. jxsq202301091) and the Open Funding of the Key Laboratory of Data Science in Finance and Economics, Jiangxi University of Finance and Economics.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author.

Acknowledgments

We acknowledge the data support from “SoilGrids” (https://soilgrids.org, accessed on 24 May 2024) and NASA (https://earthdata.nasa.gov/, accessed on 10 May 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. The correlation heatmaps between SOC and texture features of the 10 S-2 bands.

References

  1. Padarian, J.; Stockmann, U.; Minasny, B.; McBratney, A.B. Monitoring Changes in Global Soil Organic Carbon Stocks from Space. Remote Sens. Environ. 2022, 281, 113260.
  2. Ugbemuna Ugbaje, S.; Karunaratne, S.; Bishop, T.; Gregory, L.; Searle, R.; Coelli, K.; Farrell, M. Space-Time Mapping of Soil Organic Carbon Stock and Its Local Drivers: Potential for Use in Carbon Accounting. Geoderma 2024, 441, 116771.
  3. Díaz-Martínez, P.; Maestre, F.T.; Moreno-Jiménez, E.; Delgado-Baquerizo, M.; Eldridge, D.J.; Saiz, H.; Gross, N.; Le Bagousse-Pinguet, Y.; Gozalo, B.; Ochoa, V.; et al. Vulnerability of Mineral-Associated Soil Organic Carbon to Climate across Global Drylands. Nat. Clim. Change 2024, 14, 976–982.
  4. Peng, J.; Biswas, A.; Jiang, Q.; Zhao, R.; Hu, J.; Hu, B.; Shi, Z. Estimating Soil Salinity from Remote Sensing and Terrain Data in Southern Xinjiang Province, China. Geoderma 2019, 337, 1309–1319.
  5. Ren, Z.; Li, C.; Fu, B.; Wang, S.; Stringer, L.C. Effects of Aridification on Soil Total Carbon Pools in China’s Drylands. Glob. Change Biol. 2023, 30, e17091.
  6. Huang, H.; Yang, L.; Zhang, L.; Pu, Y.; Yang, C.; Wu, Q.; Cai, Y.; Shen, F.; Zhou, C. A Review on Digital Mapping of Soil Carbon in Cropland: Progress, Challenge, and Prospect. Environ. Res. Lett. 2022, 17, 123004.
  7. Kang, Y.; Li, X.; Mao, D.; Wang, Z.; Liang, M. Combining Artificial Neural Network and Ordinary Kriging to Predict Wetland Soil Organic Carbon Concentration in China’s Liao River Basin. Sensors 2020, 20, 7005.
  8. Xie, B.; Ding, J.; Ge, X.; Li, X.; Han, L.; Wang, Z. Estimation of Soil Organic Carbon Content in the Ebinur Lake Wetland, Xinjiang, China, Based on Multisource Remote Sensing Data and Ensemble Learning Algorithms. Sensors 2022, 22, 2685.
  9. Vazirani, H.; Wu, X.; Srivastava, A.; Dhar, D.; Pathak, D. Highly Efficient JR Optimization Technique for Solving Prediction Problem of Soil Organic Carbon on Large Scale. Sensors 2024, 24, 7317.
  10. Pouladi, N.; Gholizadeh, A.; Khosravi, V.; Borůvka, L. Digital Mapping of Soil Organic Carbon Using Remote Sensing Data: A Systematic Review. Catena 2023, 232, 107409.
  11. Xiang, X.; Du, J.; Jacinthe, P.A.; Zhao, B.; Zhou, H.; Liu, H.; Song, K. Integration of Tillage Indices and Textural Features of Sentinel-2A Multispectral Images for Maize Residue Cover Estimation. Soil Tillage Res. 2022, 221, 105405.
  12. Vaudour, E.; Gholizadeh, A.; Castaldi, F.; Saberioon, M.; Borůvka, L.; Urbina-Salazar, D.; Fouad, Y.; Arrouays, D.; Richer-de-Forges, A.C.; Biney, J.; et al. Satellite Imagery to Map Topsoil Organic Carbon Content over Cultivated Areas: An Overview. Remote Sens. 2022, 14, 2917.
  13. Shi, P.; Six, J.; Sila, A.; Vanlauwe, B.; Van Oost, K. Towards Spatially Continuous Mapping of Soil Organic Carbon in Croplands Using Multitemporal Sentinel-2 Remote Sensing. ISPRS J. Photogramm. Remote Sens. 2022, 193, 187–199.
  14. Vaudour, E.; Gomez, C.; Lagacherie, P.; Loiseau, T.; Baghdadi, N.; Urbina-Salazar, D.; Loubet, B.; Arrouays, D. Temporal Mosaicking Approaches of Sentinel-2 Images for Extending Topsoil Organic Carbon Content Mapping in Croplands. Int. J. Appl. Earth Obs. Geoinf. 2021, 96, 102277.
  15. Vanongeval, F.; Van Orshoven, J.; Gobin, A. Contribution of Sentinel-2 Spring Seedbed Spectra to the Digital Mapping of Soil Organic Carbon Concentration. Geoderma 2024, 449, 116984.
  16. Guo, B.; Yang, X.; Yang, M.; Sun, D.; Zhu, W.; Zhu, D.; Wang, J. Mapping Soil Salinity Using a Combination of Vegetation Index Time Series and Single-Temporal Remote Sensing Images in the Yellow River Delta, China. Catena 2023, 231, 107313.
  17. Wang, J.; Feng, C.; Hu, B.; Chen, S.; Hong, Y.; Arrouays, D.; Peng, J.; Shi, Z. A Novel Framework for Improving Soil Organic Matter Prediction Accuracy in Cropland by Integrating Soil, Vegetation and Human Activity Information. Sci. Total Environ. 2023, 903, 166112.
  18. He, X.; Yang, L.; Li, A.; Zhang, L.; Shen, F.; Cai, Y.; Zhou, C. Soil Organic Carbon Prediction Using Phenological Parameters and Remote Sensing Variables Generated from Sentinel-2 Images. Catena 2021, 205, 105442.
  19. Castaldi, F.; Halil Koparan, M.; Wetterlind, J.; Žydelis, R.; Vinci, I.; Özge Savaş, A.; Kıvrak, C.; Tunçay, T.; Volungevičius, J.; Obber, S.; et al. Assessing the Capability of Sentinel-2 Time-Series to Estimate Soil Organic Carbon and Clay Content at Local Scale in Croplands. ISPRS J. Photogramm. Remote Sens. 2023, 199, 40–60.
  20. Yang, H.; Wang, Z.; Cao, J.; Wu, Q.; Zhang, B. Estimating Soil Salinity Using Gaofen-2 Imagery: A Novel Application of Combined Spectral and Textural Features. Environ. Res. 2023, 217, 114870.
  21. Cao, X.; Chen, W.; Ge, X.; Chen, X.; Wang, J.; Ding, J. Multidimensional Soil Salinity Data Mining and Evaluation from Different Satellites. Sci. Total Environ. 2022, 846, 157416.
  22. Duan, M.; Song, X.; Liu, X.; Cui, D.; Zhang, X. Mapping the Soil Types Combining Multi-Temporal Remote Sensing Data with Texture Features. Comput. Electron. Agric. 2022, 200, 107230.
  23. Duan, M.; Song, X.; Li, Z.; Zhang, X.; Ding, X.; Cui, D. Identifying Soil Groups and Selecting a High-Accuracy Classification Method Based on Multi-Textural Features with Optimal Window Sizes Using Remote Sensing Images. Ecol. Inform. 2024, 81, 102563.
  24. Lamichhane, S.; Kumar, L.; Wilson, B. Digital Soil Mapping Algorithms and Covariates for Soil Organic Carbon Mapping and Their Implications: A Review. Geoderma 2019, 352, 395–413.
  25. Chen, S.; Arrouays, D.; Leatitia Mulder, V.; Poggio, L.; Minasny, B.; Roudier, P.; Libohova, Z.; Lagacherie, P.; Shi, Z.; Hannam, J.; et al. Digital Mapping of GlobalSoilMap Soil Properties at a Broad Scale: A Review. Geoderma 2022, 409, 115567.
  26. Sun, Y.; Ma, J.; Zhao, W.; Qu, Y.; Gou, Z.; Chen, H.; Tian, Y.; Wu, F. Digital Mapping of Soil Organic Carbon Density in China Using an Ensemble Model. Environ. Res. 2023, 231, 116131.
  27. Hu, B.; Xie, M.; Zhou, Y.; Chen, S.; Zhou, Y.; Ni, H.; Peng, J.; Ji, W.; Hong, Y.; Li, H.; et al. A High-Resolution Map of Soil Organic Carbon in Cropland of Southern China. Catena 2024, 237, 107813.
  28. Tajik, S.; Ayoubi, S.; Zeraatpisheh, M. Digital Mapping of Soil Organic Carbon Using Ensemble Learning Model in Mollisols of Hyrcanian Forests, Northern Iran. Geoderma Reg. 2020, 20, e00256.
  29. Li, Z.; Liu, F.; Peng, X.; Hu, B.; Song, X. Synergetic Use of DEM Derivatives, Sentinel-1 and Sentinel-2 Data for Mapping Soil Properties of a Sloped Cropland Based on a Two-Step Ensemble Learning Method. Sci. Total Environ. 2023, 866, 161421.
  30. Tao, S.; Zhang, X.; Feng, R.; Qi, W.; Wang, Y.; Shrestha, B. Retrieving Soil Moisture from Grape Growing Areas Using Multi-Feature and Stacking-Based Ensemble Learning Modeling. Comput. Electron. Agric. 2023, 204, 107537.
  31. Dvorakova, K.; Heiden, U.; Pepers, K.; Staats, G.; van Os, G.; van Wesemael, B. Improving Soil Organic Carbon Predictions from a Sentinel–2 Soil Composite by Assessing Surface Conditions and Uncertainties. Geoderma 2023, 429, 116128.
  32. Luo, C.; Zhang, W.; Zhang, X.; Liu, H. Mapping of Soil Organic Matter in a Typical Black Soil Area Using Landsat-8 Synthetic Images at Different Time Periods. Catena 2023, 231, 107336.
  33. Zhang, T.T.; Qi, J.G.; Gao, Y.; Ouyang, Z.T.; Zeng, S.L.; Zhao, B. Detecting Soil Salinity with MODIS Time Series VI Data. Ecol. Indic. 2015, 52, 480–489.
  34. Zhou, T.; Geng, Y.; Lv, W.; Xiao, S.; Zhang, P.; Xu, X.; Chen, J.; Wu, Z.; Pan, J.; Si, B.; et al. Effects of Optical and Radar Satellite Observations within Google Earth Engine on Soil Organic Carbon Prediction Models in Spain. J. Environ. Manage. 2023, 338, 117810.
  35. Yang, R.M.; Guo, W.W. Modelling of Soil Organic Carbon and Bulk Density in Invaded Coastal Wetlands Using Sentinel-1 Imagery. Int. J. Appl. Earth Obs. Geoinf. 2019, 82, 101906.
  36. Zhang, L.; Cai, Y.; Huang, H.; Li, A.; Yang, L.; Zhou, C. A CNN-LSTM Model for Soil Organic Carbon Content Prediction with Long Time Series of MODIS-Based Phenological Variables. Remote Sens. 2022, 14, 4441.
  37. Luo, C.; Zhang, X.; Wang, Y.; Men, Z.; Liu, H. Regional Soil Organic Matter Mapping Models Based on the Optimal Time Window, Feature Selection Algorithm and Google Earth Engine. Soil Tillage Res. 2022, 219, 105325.
  38. Liu, Y.; Fan, Y.; Feng, H.; Chen, R.; Bian, M.; Ma, Y.; Yue, J.; Yang, G. Estimating Potato Above-Ground Biomass Based on Vegetation Indices and Texture Features Constructed from Sensitive Bands of UAV Hyperspectral Imagery. Comput. Electron. Agric. 2024, 220, 108918.
  39. Chen, Z.; Xue, J.; Wang, Z.; Zhou, Y.; Deng, X.; Liu, F.; Song, X.; Zhang, G.; Su, Y.; Zhu, P.; et al. Ensemble Modelling-Based Pedotransfer Functions for Predicting Soil Bulk Density in China. Geoderma 2024, 448, 116969.
  40. Swain, S.R.; Chakraborty, P.; Panigrahi, N.; Vasava, H.B.; Reddy, N.N.; Roy, S.; Majeed, I.; Das, B.S. Estimation of Soil Texture Using Sentinel-2 Multispectral Imaging Data: An Ensemble Modeling Approach. Soil Tillage Res. 2021, 213, 105134.
Figure 1. Study area overview: (a) spatial distribution of soil sampling locations; (b) study area location; (c) desert region; (d) cotton field.
Figure 2. The correlation heatmaps between SOC and texture features of the 10 S-2 bands.
Figure 3. The correlations between SOC and the newly developed multi-dimensional texture indices.
Figure 4. Patterns of change between SOC and time-series S-2 spectral indices. (a–j) represent the correlation changes between SOC and different time-series S-2 spectral indices. Note: The black solid line denotes the correlation coefficient at a significant level (P0.05 = 0.139), whereas the red solid line denotes the correlation coefficient at a highly significant level (P0.01 = 0.182). The red dashed line shows the overall trend between SOC and time-series S-2 spectral indices.
Figure 5. Patterns of change between SOC and time-series S-2 texture indices. (a–i) represent the correlation changes between SOC and different time-series S-2 texture indices. Note: The red dashed line shows the overall trend between SOC and time-series S-2 texture indices.
Figure 6. The significance of variables in the optimal predictive model.
Figure 7. Predicted SOC (g kg⁻¹) and SD (g kg⁻¹) distributions derived from three ensemble models.
Figure 8. The correlation between SOC and the optimal monitoring month of spectral indices. (a–j) show the variation trends of correlations between SOC and different spectral indices across optimal monitoring months, respectively.
Figure 9. The correlation between SOC and the optimal monitoring month of texture indices. (a–i) show the variation trends of correlations between SOC and different texture indices across optimal monitoring months, respectively.
Table 1. Topographic variables and soil properties utilized in this study.

| Variable Categories | Variables | Sources |
| --- | --- | --- |
| Topographic variables | Elevation | https://earthdata.nasa.gov/ (accessed on 10 May 2024) |
| | Longitudinal curvature | |
| | Aspect | |
| | Valley depth | |
| | Slope | |
| | Curvature | |
| | Flow direction | |
| | Topographic wetness index | |
| | Convergence index | |
| | Channel network base level | |
| Soil properties | Vol. water content at −33 kPa | https://soilgrids.org (accessed on 24 May 2024) |
| | Nitrogen | |
| | Coarse fragments | |
| | Sand | |
| | Silt | |
| | Vol. water content at −10 kPa | |
| | Clay | |
| | Cation exchange capacity (at pH 7) | |
| | pH water | |
| | Bulk density | |
| | Vol. water content at −1500 kPa | |
Table 2. Remote sensing variables utilized in this study for SOC prediction, including their acronyms, calculation formulas and corresponding references.

| Variables | Predictors | Acronyms | Formulas | Reference |
| --- | --- | --- | --- | --- |
| Spectral indices | Normalized Difference Vegetation Index | NDVI | $(NIR - R)/(NIR + R)$ | [16] |
| | Difference Vegetation Index | DVI | $NIR - R$ | [16] |
| | Enhanced Normalized Difference Vegetation Index | ENDVI | $(NIR + SWIR2 - R)/(NIR + SWIR2 + R)$ | [16] |
| | Ratio Vegetation Index | RVI | $NIR/R$ | [16] |
| | Green-Red Vegetation Index | GRVI | $(G - R)/(G + R)$ | [16] |
| | Generalized Difference Vegetation Index | GDVI | $(NIR^2 - R^2)/(NIR^2 + R^2)$ | [16] |
| | Soil-Adjusted Vegetation Index | SAVI | $1.5 \times (NIR - R)/(NIR + R + 0.5)$ | [16] |
| | Enhanced Vegetation Index | EVI | $2.5 \times (NIR - R)/(NIR + 6 \times R - 7.5 \times B + 0.5)$ | [16] |
| | Enhanced Environment Vegetation Index | EEVI | $1.5 \times (NIR - R)/(NIR + R + 0.5)$ | [16] |
| | Normalized Difference Water Index | NDWI | $(G - NIR)/(G + NIR)$ | [16] |
| Texture indices | Angular Second Moment | ASM | $\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} P(i,j)^2$ | [11] |
| | Contrast | CON | $\sum_{i,j=0}^{N-1} iP_{i,j}(i-j)^2$ | [11] |
| | Correlation | COR | $\sum_{i,j=0}^{N-1} iP_{i,j}\,(i-\mathrm{mean})(j-\mathrm{mean})/\sqrt{\mathrm{var}_i\,\mathrm{var}_j}$ | [11] |
| | Dissimilarity | DIS | $\sum_{i,j=0}^{N-1} iP_{i,j}\lvert i-j\rvert$ | [11] |
| | Homogeneity | HOM | $\sum_{i,j=0}^{N-1} iP_{i,j}/(1+(i-j)^2)$ | [11] |
| | Entropy | ENT | $\sum_{i,j=0}^{N-1} iP_{i,j}(-\ln P_{i,j})$ | [11] |
| | Variance | VAR | $\sum_{i,j=0}^{N-1} p_{i,j}(i-\mu_i)^2$ | [11] |
| | Mean | MEA | $\sum_{i,j=0}^{N-1} p_{i,j}$ | [11] |
| | Difference Texture Index | DTeI | $T_1 - T_2$ | this study |
| | Normalized Difference Texture Index | NDTeI | $(T_1 - T_2)/(T_1 + T_2)$ | this study |
| | Ratio Texture Index | RTeI | $T_1/T_2$ | this study |
| | Three-Dimensional Texture Index 1 | TDTeI1 | $(T_1 - T_2)/(T_1 + T_3)$ | this study |
| | Three-Dimensional Texture Index 2 | TDTeI2 | $T_1/(T_2 + T_3)$ | this study |
| | Three-Dimensional Texture Index 3 | TDTeI3 | T1 T2 / T1 T2 T2 + T3 | this study |

Note: P(i, j) represents the joint probability density function of grey levels i and j in the image. T1, T2 and T3 in the constructed multidimensional texture indices represent the texture features of different bands, with T1 ≠ T2 ≠ T3.
Table 3. Different combinations of S-2 spectral indices, texture indices, topographic properties, and soil properties.

| Scenarios | Variables |
| --- | --- |
| Scenario A | Spectral indices + Topographic |
| Scenario B | Spectral indices + Topographic + Soil properties |
| Scenario C | Spectral indices + Topographic + Texture indices |
| Scenario D | Spectral indices + Topographic + Soil properties + Texture indices |
Table 4. Hyperparameter ranges of the base learners.

| Base Learners | Parameters (range given as [start, stop, step]) |
| --- | --- |
| MLP | hidden_layer_sizes [(50), (100), (50, 50), (100, 50), (50, 100)]; learning_rate [0.01, 0.1, 0.01]; solver [Adam]; activation [relu] |
| GBRT | min_samples_leaf [1, 10, 1]; n_estimators [10, 500, 10]; max_depth [1, 10, 1]; learning_rate [0.01, 1, 0.01]; min_samples_split [1, 10, 1] |
| RF | min_samples_split [1, 10, 1]; max_depth [1, 10, 1]; n_estimators [10, 500, 10]; min_samples_leaf [1, 10, 1] |
| XGBoost | learning_rate [0.01, 0.1, 0.01]; max_depth [1, 10, 1]; n_estimators [10, 500, 10] |
| PLSR | n_components [1, 50, 1] |
Table 5. Comparison of accuracy for four SOC predictive scenarios with three ensemble models. MAE and RMSE are in g kg⁻¹.

| Scenario | Stacking R² | Stacking MAE | Stacking RMSE | Weight Averaging R² | Weight Averaging MAE | Weight Averaging RMSE | Simple Averaging R² | Simple Averaging MAE | Simple Averaging RMSE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Scenario A | 0.82 | 1.04 | 1.28 | 0.81 | 1.07 | 1.31 | 0.80 | 1.10 | 1.34 |
| Scenario B | 0.84 | 0.98 | 1.20 | 0.83 | 1.01 | 1.24 | 0.82 | 1.04 | 1.27 |
| Scenario C | 0.87 | 0.92 | 1.14 | 0.85 | 0.95 | 1.18 | 0.84 | 0.98 | 1.21 |
| Scenario D | 0.89 | 0.86 | 1.09 | 0.87 | 0.89 | 1.12 | 0.86 | 0.92 | 1.15 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cui, Z.; Chen, S.; Hu, B.; Wang, N.; Feng, C.; Peng, J. Mapping Soil Organic Carbon by Integrating Time-Series Sentinel-2 Data, Environmental Covariates and Multiple Ensemble Models. Sensors 2025, 25, 2184. https://doi.org/10.3390/s25072184
