1. Introduction
Soil Organic Matter (SOM) and Soil Organic Carbon (SOC) are crucial for agricultural and forest environments [1,2] and for several ecosystem functions [3,4], and they have potential for mitigating climate change [5,6,7].
In the agricultural sector, SOM and SOC are essential for maintaining soil fertility [8], increasing crop production [9], and sustaining soil water content [10,11].
In ref. [12], the authors evaluate the short-term effects of repeated application of solid anaerobic digestate on the fertility of clay soil in an olive grove, highlighting a gradual improvement in soil chemical and biological parameters and increased microbial efficiency.
Regarding soil fertility, the authors of ref. [13] highlight that introducing mixed-species grasslands, particularly those including legumes, into crop rotations can mitigate the negative effects of agricultural simplification. This approach improves SOC content, biodiversity, and nutrient use efficiency, enhancing ecosystem services, reducing dependence on external fertilizers, and benefiting subsequent crops. However, they emphasize the need for further research, especially multi-site studies, to optimize species composition and maximize these services and agronomic performance.
In recent years, the integration of remote sensing data and machine learning techniques has significantly improved the assessment of SOC and carbon dynamics at multiple spatial scales. Multispectral satellite data, such as those provided by Sentinel and Landsat missions, are widely used to derive spectral indices related to vegetation, moisture, and land surface properties, enabling indirect estimation of SOC and carbon stocks [14,15,16].
Machine learning algorithms, such as Random Forest, Support Vector Machines, and deep learning approaches, effectively capture non-linear relationships between field observations and remotely sensed data, resulting in improved prediction accuracy and spatial generalization [17,18,19]. Several studies have shown that these methods can outperform traditional statistical approaches in modeling SOC variability, particularly in heterogeneous and data-scarce environments [20,21].
Furthermore, remote sensing-based approaches enable continuous monitoring of land-use and land-cover (LULC) changes, which are key drivers of carbon dynamics [22,23]. These techniques reduce reliance on destructive laboratory methods and provide cost-effective tools for large-scale and long-term monitoring of soil carbon stocks [24,25]. Despite these advances, important gaps remain. Most existing studies focus on temperate or humid regions, while semi-arid Mediterranean environments such as Tunisia remain underrepresented [26,27]. Furthermore, few studies have combined remote sensing classification, ecosystem modeling (e.g., InVEST), and spatial simulation (e.g., Cellular Automata) to provide an integrated assessment of carbon dynamics under different land-use scenarios. This study addresses these gaps by applying a combined framework of remote sensing, machine learning, and ecosystem modeling to evaluate past and future carbon dynamics in the Sfax Governorate, Tunisia. By doing so, it provides new insights into the drivers of carbon change in semi-arid regions and supports sustainable land-use planning and climate mitigation strategies.
2. Materials and Methods
2.1. Study Area
The study area corresponds to the Sfax Governorate, located in central-eastern Tunisia along the Mediterranean coast (Figure 1). It covers approximately 7545 km², representing about 5% of the national territory, and hosts nearly one million inhabitants, making it the second-largest urban agglomeration in Tunisia after the capital. The region has experienced rapid urbanization in recent decades, with significant expansion of built-up areas and increasing pressure on natural and agricultural land. The climate is arid to semi-arid Mediterranean, characterized by hot, dry summers and mild winters. Annual rainfall is low and highly variable, with an average of 215 mm between 1963 and 2017, and the mean annual temperature is 19.4 °C [28]. This climatic variability, combined with recurrent droughts, strongly influences vegetation cover and soil carbon dynamics, making the ecosystem particularly vulnerable to degradation.
Topographically, the governorate is dominated by flat coastal plains and inland steppe landscapes, with altitudes ranging from 0 to 250 m. The soils are mainly sandy and sandy clay, often saline or gypseous, with low organic matter content, making them highly vulnerable to degradation. The geology consists of Mio–Pliocene outcrops and Quaternary deposits composed of gravels, sands, conglomerates, and silts [29]. Land use is dominated by agriculture, including olives, almonds, vineyards, cereals, and greenhouse crops, while natural vegetation is sparse and mainly consists of steppic shrubland and degraded grassland [30]. These environmental and socio-economic characteristics highlight the importance of monitoring carbon dynamics in the region, where land-use change and climatic stressors jointly affect soil fertility and ecosystem resilience.
2.2. Random Forest and LULC
Multispectral satellite data from the Sentinel-2 mission (Level-2A surface reflectance) were used for the LULC classification. Sentinel-2 provides high-resolution optical imagery (10–20 m) with a revisit time of 5 days, making it suitable for land-cover mapping and environmental monitoring. Table 1 summarizes, for each classification year, the satellite sensor, processing level, temporal range, number of images used, and cloud coverage threshold.
For each reference year (2019, 2020, 2022, and 2024), a median composite image was generated within the Google Earth Engine (GEE) platform using all available images with cloud coverage lower than 20%. This approach reduces atmospheric noise and ensures consistent spatial representation across the study period. A cloud masking procedure was applied to further improve data quality.
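The compositing step can be illustrated outside GEE with a minimal NumPy sketch. The function name `median_composite`, the array layout, and the toy reflectance values are assumptions made for illustration; the study used GEE, whose internal implementation is not reproduced here.

```python
import numpy as np

def median_composite(stack, cloud_mask):
    """Per-pixel median over time, ignoring cloud-flagged observations.

    stack:      float array of shape (time, rows, cols) for one band
    cloud_mask: bool array of the same shape, True where a pixel is cloudy
    """
    # Replace cloudy observations with NaN so they are skipped below
    masked = np.where(cloud_mask, np.nan, stack)
    return np.nanmedian(masked, axis=0)

# Toy example: three acquisitions of a 1 x 2 image (hypothetical values)
stack = np.array([[[0.10, 0.30]],
                  [[0.12, 0.90]],   # second pixel is cloud-contaminated
                  [[0.14, 0.32]]])
clouds = np.array([[[False, False]],
                   [[False, True]],
                   [[False, False]]])
composite = median_composite(stack, clouds)
```

In this toy case the bright cloudy value (0.90) is excluded, so the second pixel takes the median of the two clear observations. A pixel that is cloudy in every acquisition would be returned as NaN.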
To enhance class separability, the composite included Red, Green, Blue (RGB) bands and the main spectral indices:
Normalized Difference Vegetation Index (NDVI);
Modified Normalized Difference Water Index (MNDWI);
Normalized Built-up Index (NBI);
Red-Edge bands.
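As a minimal sketch of how two of these indices are computed, NDVI and MNDWI are both normalized differences of two reflectance bands. The function names and the reflectance values below are hypothetical. For Sentinel-2 the bands commonly used are B8 (NIR), B4 (red), B3 (green), and B11 (SWIR), although the text does not state which bands were adopted. The built-up index is omitted because its formula is not given in the text.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0

def ndvi(nir, red):
    # Vegetation index: high for dense green vegetation
    return normalized_difference(nir, red)

def mndwi(green, swir):
    # Water index: positive values typically indicate open water
    return normalized_difference(green, swir)

# Hypothetical surface reflectances for one vegetated pixel
pixel = {"green": 0.08, "red": 0.05, "nir": 0.45, "swir": 0.20}
print(round(ndvi(pixel["nir"], pixel["red"]), 3))
print(round(mndwi(pixel["green"], pixel["swir"]), 3))
```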
LULC classification was performed using the Random Forest (RF) algorithm, configured with 100 decision trees. Five mutually exclusive classes were defined: built-up, agricultural soil, natural vegetation, bare soil, and water.
The RF algorithm was selected due to its demonstrated robustness and high performance [31,32], particularly for LULC classification based on multispectral satellite data [33]. Previous studies have shown that RF effectively handles non-linear relationships, reduces overfitting, and integrates multiple spectral features, making it well-suited for complex and heterogeneous environments [34,35]. For example, in ref. [36], the RF model achieved an overall accuracy (OA) of 0.79 and a macro F1-score of 0.72 for the third level of a LULC classification. Similarly, ref. [37] showed that the RF algorithm outperformed both SVM and ANN algorithms, with an average overall accuracy of 0.97, kappa coefficient of 0.98, producer’s accuracy of 0.99, and user’s accuracy of 0.97, compared with the values achieved by SVM (0.96, 0.97, 0.98, and 0.97) and ANN (0.89, 0.81, 0.94, and 0.88). In ref. [38], three classification algorithms were applied to Sentinel-2 imagery: Classification and Regression Trees (CART), Support Vector Machine (SVM), and Random Forest (RF). The Kappa coefficients obtained were 94%, 95%, and 97%, respectively, while the average overall accuracies were 96.25%, 97%, and 98.68%, indicating that RF outperformed both SVM and CART in that comparison.
The “natural vegetation” class includes both shrubland and sparse forested areas typical of semi-arid environments. These land-cover types are characterized by discontinuous vegetation cover and relatively lower biomass compared to dense forests.
Model training and validation were based on a balanced ground truth dataset consisting of 600 manually labeled points (approximately 120 per class). The dataset was split into 70% for training and 30% for independent testing to evaluate classification accuracy and model generalization. The samples were spatially distributed to capture the variability of land-use classes across the study area. The dataset represents a common sample size for supervised classification in medium-resolution remote sensing studies.
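The 70/30 partition can be sketched as follows. The text does not say whether the split was stratified by class, so the per-class splitting used here is an assumption, as are the function name `stratified_split`, the seed, and the synthetic point identifiers.

```python
import random
from collections import defaultdict

def stratified_split(samples, train_fraction=0.7, seed=42):
    """Split (point_id, label) pairs into train/test sets class by class."""
    by_class = defaultdict(list)
    for point_id, label in samples:
        by_class[label].append((point_id, label))
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)                       # random order within the class
        cut = round(len(items) * train_fraction)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test

classes = ["built-up", "agricultural soil", "natural vegetation",
           "bare soil", "water"]
# 600 synthetic labeled points, 120 per class
samples = [(i, classes[i % 5]) for i in range(600)]
train, test = stratified_split(samples)
print(len(train), len(test))
```

With 120 points per class this yields 84 training and 36 test points for each class, which keeps the test set balanced as well.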
2.3. Carbon Stock, Carbon Emissions and Net Ecosystem Carbon Balance
Carbon storage was quantified using the InVEST [39] Carbon Storage and Sequestration model, run through its graphical user interface in a Windows 10 environment.
The model estimates aboveground, belowground, soil, and dead organic carbon pools based on land-cover data and class-specific carbon coefficients from the literature. The model requires two main inputs: a LULC raster map and a biophysical table of carbon densities for each LULC class.
The biophysical table was constructed using the values reported in Table 2 of [40], assigning specific carbon stocks (Mg C ha⁻¹) for each carbon pool (C_above, C_below, C_soil, and C_dead) to all mapped LULC categories (urban, forest, agriculture, water, and bare land). The generated LULC rasters were imported into InVEST and processed individually to produce a carbon stock map for each year. The adopted carbon density values should be interpreted as representative average values for each land-cover class, rather than site-specific measurements, and are considered suitable for regional-scale analysis.
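The table-lookup logic behind this step can be sketched in a few lines of NumPy. This is not the InVEST implementation. The class codes, the carbon densities, and the pixel size are hypothetical placeholders and are not the values of ref. [40].

```python
import numpy as np

# Hypothetical carbon densities (Mg C per ha) per LULC code; NOT the values of ref. [40]
# Tuple order: C_above, C_below, C_soil, C_dead
biophysical = {
    1: (0.0, 0.0, 10.0, 0.0),    # built-up
    2: (5.0, 2.0, 30.0, 1.0),    # agricultural soil
    3: (20.0, 8.0, 40.0, 2.0),   # natural vegetation
    4: (0.0, 0.0, 8.0, 0.0),     # bare soil
    5: (0.0, 0.0, 0.0, 0.0),     # water
}

def carbon_stock_map(lulc, table, pixel_area_ha):
    """Return carbon stock per pixel (Mg C) for an integer LULC raster."""
    lookup = np.zeros(max(table) + 1)
    for code, pools in table.items():
        lookup[code] = sum(pools)        # total density over the four pools
    return lookup[lulc] * pixel_area_ha  # density x area = stock per pixel

lulc = np.array([[3, 3, 2],
                 [2, 1, 4]])
pixel_area_ha = 0.01                     # a 10 m x 10 m pixel
stock = carbon_stock_map(lulc, biophysical, pixel_area_ha)
total_stock = stock.sum()
```

Summing the per-pixel map gives the total stock for the raster. Grouping the same map by LULC code would give the class-specific contributions.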
For the “natural vegetation” class, carbon density values were assigned based on references from the literature representative of semi-arid ecosystems. Although shrublands and sparse forests may exhibit different carbon storage capacities, they were aggregated into a single class due to the limitations of spectral separability in medium-resolution satellite imagery.
To assess temporal variations in carbon storage, the InVEST model was also run in sequestration mode, which requires a current and a future land-cover input. The LULC 2020 map was used as the “current” scenario and the LULC 2024 map as the “future” scenario. The resulting sequestration output quantifies the net amount of carbon gained or lost between the two dates, as expressed in Equation (1):
$$\Delta C_{\text{stock}} = C_{2024} - C_{2020} \quad (1)$$
where $C_{2020}$ and $C_{2024}$ represent carbon storage values for each pixel in the current and future LULC maps, respectively.
This output identifies spatial patterns of carbon accumulation (positive values) and carbon release (negative values) directly driven by land-use and land-cover (LULC) changes.
To quantify carbon dynamics within the study area, the Net Ecosystem Carbon Balance (NECB) was computed. This metric integrates two main components: (i) the change in carbon stock between two time steps, estimated using the InVEST Carbon Storage and Sequestration model, and (ii) carbon emissions and removals associated with different LULC classes, derived from class-specific emission factors expressed in Mg C ha⁻¹ yr⁻¹.
The emissions term represents net carbon fluxes associated with different LULC classes, expressed in Mg C ha⁻¹ yr⁻¹. These values include both carbon emissions (e.g., from built-up or degraded areas) and carbon removals (e.g., uptake by natural vegetation). Emission factors were assigned to each LULC class based on values reported in the literature and consistent with IPCC guidelines for land-based carbon accounting [41,42].
These factors are simplified representations of complex carbon processes and are used to estimate the contribution of each land-use class to the overall carbon balance within the study area.
Positive values represent carbon emissions, while negative values indicate carbon uptake.
The NECB is defined as in Equation (2):
$$\text{NECB} = \Delta C_{\text{stock}} - \text{Emissions} \quad (2)$$
where $\Delta C_{\text{stock}}$ represents the change in total carbon stock between the initial and final year (Mg C pixel⁻¹), and Emissions represents the cumulative carbon flux associated with LULC classes over the study period (Mg C pixel⁻¹).
A positive NECB indicates that the ecosystem acts as a carbon sink, whereas a negative NECB denotes a net carbon source.
The NECB was computed through raster-based operations in Python (version 3.10) by subtracting cumulative emissions from spatially explicit carbon stock changes. All raster outputs generated by the InVEST model were further processed to derive total carbon stock (Mg C), class-specific contributions, and spatial patterns of carbon gains and losses.
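The raster operation described above can be sketched as follows. The emission factors, class codes, and pixel size are hypothetical. Applying the factors to the final-year LULC map over the whole period is also an assumption, since the text does not specify which map the factors were applied to.

```python
import numpy as np

def necb(stock_start, stock_end, lulc, emission_factor, pixel_area_ha, years):
    """NECB per pixel (Mg C): stock change minus cumulative emissions."""
    delta_stock = stock_end - stock_start
    factors = np.zeros(max(emission_factor) + 1)
    for code, value in emission_factor.items():
        factors[code] = value
    # Convert Mg C per ha per year into Mg C per pixel over the whole period
    emissions = factors[lulc] * pixel_area_ha * years
    return delta_stock - emissions

# Hypothetical factors (Mg C per ha per year): positive = emission, negative = uptake
emission_factor = {1: 2.0, 2: 0.5, 3: -1.0, 4: 0.0, 5: 0.0}
stock_2020 = np.array([[0.70, 0.38]])   # Mg C per pixel
stock_2024 = np.array([[0.70, 0.10]])   # second pixel converted to built-up
lulc_2024 = np.array([[3, 1]])          # 3 = natural vegetation, 1 = built-up
balance = necb(stock_2020, stock_2024, lulc_2024, emission_factor,
               pixel_area_ha=0.01, years=4)
```

In this toy case the unchanged vegetated pixel has a small positive balance (sink), while the pixel converted to built-up has a negative balance (source) from both the stock loss and the emissions.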
2.4. Future Scenarios
To explore the potential development of LULC and its implications for carbon dynamics, three scenarios were simulated for the year 2030: business-as-usual (BAU), which follows recent urbanization trends; conservation-oriented (CONS), which represents a restrictive development policy; and urban expansion (URB+), which describes accelerated urban growth.
The three future scenarios were defined according to the total relative increase in built-up area by 2030 compared with the 2024 baseline: CONS = +2%, BAU = +5%, and URB+ = +10%. These values represent scenario-based growth fractions, not absolute annual expansion rates expressed in hectares per year. The corresponding annual urban demand was automatically calculated by the CA model by converting the total additional built-up area into yearly increments over the 2024–2030 period. Future LULC maps for each scenario were generated using a Cellular Automata model, implemented in Python. The model was based on three parameters:
Transition pressure: the number of neighboring urban cells in a 3 × 3 moving window is used to quantify a pixel’s suitability for becoming urban.
Specific land demand: the number of new urban pixels is specific for each scenario.
Exclusion rules: water class is never converted.
At each step, the model selects the eligible pixels that will be converted into urban, resulting in a realistic, spatially explicit simulation of probable urban growth patterns.
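The three rules above can be sketched as a minimal NumPy implementation. This is not the authors' code. The class codes, the choice to convert the cells with the highest neighbor count first, the requirement of at least one urban neighbor, and the way the total demand is split into yearly increments are all assumptions.

```python
import numpy as np

URBAN, WATER = 1, 5   # hypothetical class codes

def urban_neighbors(urban):
    """Count urban cells in the 3 x 3 window around each cell (centre excluded)."""
    padded = np.pad(urban.astype(int), 1)
    rows, cols = urban.shape
    total = np.zeros((rows, cols), dtype=int)
    for dr in (0, 1, 2):
        for dc in (0, 1, 2):
            if (dr, dc) != (1, 1):
                total += padded[dr:dr + rows, dc:dc + cols]
    return total

def ca_step(lulc, n_new):
    """Convert the n_new eligible cells with the highest transition pressure."""
    pressure = urban_neighbors(lulc == URBAN)
    # Exclusion rule: water never converts; existing urban cells are skipped
    eligible = (lulc != URBAN) & (lulc != WATER) & (pressure > 0)
    candidates = np.flatnonzero(eligible)
    order = np.argsort(-pressure.ravel()[candidates], kind="stable")
    out = lulc.copy()
    out.flat[candidates[order[:n_new]]] = URBAN
    return out

def annual_demand(lulc, growth_fraction, years):
    """Yearly number of new urban cells for a total relative built-up increase."""
    total_new = round(np.count_nonzero(lulc == URBAN) * growth_fraction)
    base, extra = divmod(total_new, years)
    return [base + (1 if y < extra else 0) for y in range(years)]

lulc = np.array([[1, 1, 2, 2],
                 [1, 2, 2, 5],
                 [2, 2, 3, 5],
                 [2, 3, 3, 5]])
simulated = lulc.copy()
# Toy demand: +100% built-up over 2 steps (the scenarios use +2%, +5%, +10%)
for n_new in annual_demand(lulc, growth_fraction=1.0, years=2):
    simulated = ca_step(simulated, n_new)
```

Because the pressure is recomputed at every step, newly converted cells raise the suitability of their neighbors, which produces the contiguous, edge-driven growth typical of neighborhood-based CA models.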
To assess the performance of the CA model, a backcasting validation was conducted by simulating the 2024 LULC map using 2022 as the initial condition and the observed urban growth between 2022 and 2024.
Finally, carbon stock, emissions, and NECB were calculated for each scenario as described in the previous section.
3. Results
3.1. Random Forest and LULC
The RF classification achieved consistently high performance across all years, indicating the robustness of the feature set and the stability of the spectral indices used in the study. Overall accuracy ranged from 0.8448 to 0.9018 and Kappa values from 0.8010 to 0.8744, indicating strong agreement between reference and classified samples. Metrics per year are reported in Table 3.
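For reference, both metrics follow directly from a confusion matrix. The sketch below uses a hypothetical three-class matrix, not the matrices of this study, and the function name is illustrative.

```python
def overall_accuracy_and_kappa(matrix):
    """matrix[i][j] = number of reference samples of class i classified as j."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical 3-class confusion matrix (not taken from this study)
matrix = [[30, 4, 2],
          [3, 28, 5],
          [1, 2, 33]]
oa, kappa = overall_accuracy_and_kappa(matrix)
print(round(oa, 4), round(kappa, 4))
```

Kappa is lower than overall accuracy because it discounts the agreement that would be expected by chance given the class totals.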
The confusion matrices (Figure 2) reveal that most LULC classes were correctly identified, with particularly high accuracy for water bodies and natural vegetation, which showed minimal misclassification across all years. Built-up areas and agriculture also exhibited good classification performance, although some confusion was observed between agriculture and bare soil, as well as between built-up and bare surfaces, especially in the most recent years.
3.2. LULC and Change Detection
Significant changes were observed during the study period, as can be seen from the LULC maps in Figure 3.
Agricultural land remained the dominant class by extent but decreased steadily over time, while built-up areas increased continuously and natural vegetation areas declined, suggesting a trend of growing anthropogenic pressure. Results are shown in Table 4.
3.3. Carbon Stock, Carbon Emission and Net Ecosystem Balance
The carbon stock analysis showed a progressive decline between 2019 and 2024. The largest contribution to carbon storage comes from natural areas, while the built-up and bare soil classes show lower carbon densities. Carbon stock values and their variations between consecutive reference years are shown in Table 5.
Figure 4 shows the variation in carbon stock over the years for each LULC class.
Figure 5 shows the spatial distribution of carbon stock over the study period.
Regarding the emissions, positive values were associated with emission-dominated classes, i.e., built-up, while negative values were assigned to natural vegetation to represent carbon uptake. Estimated carbon emissions increased over the analyzed period, primarily driven by the expansion of built-up areas and the reduction in vegetated surfaces. Built-up areas contributed the highest emissions, followed by agricultural land, while natural vegetation acted as a carbon sink.
The spatial distribution of emissions, as shown in Figure 6, highlights hotspots of carbon release in rapidly urbanizing zones, particularly around peri-urban areas. These findings emphasize the role of land-use change as a key driver of carbon emissions in the study area.
The NECB analysis revealed a negative carbon balance over the study period, indicating that the study area acted as a net carbon source. This condition is primarily driven by increasing emissions associated with urban expansion and decreasing carbon storage due to vegetation loss.
Spatially, negative NECB values were concentrated in urban and transitional areas, while limited positive values were observed in zones dominated by natural vegetation, as shown in Figure 7. Overall, the results confirm a shift towards reduced ecosystem capacity to store carbon.
3.4. Future Scenarios
The Cellular Automata model was evaluated through the backcasting validation described in Section 2.4, in which the 2024 LULC map was simulated from the 2022 map and the observed urban growth between 2022 and 2024.
The comparison between simulated and observed maps, after reclassification into urban and non-urban categories, yielded a Kappa coefficient of 0.712, indicating substantial agreement in the overall spatial pattern. The Cellular Automata simulation produced distinct land-use configurations for the year 2030 under three scenarios: conservation-oriented (CONS), business-as-usual (BAU), and urban expansion (URB+). The future LULC maps are shown in Figure 8.
Figure 9 shows the spatial distribution of the carbon stock for the future scenarios.
These land-use changes have a direct impact on carbon stock and carbon emissions. Total annual emissions increased progressively from CONS to URB+, reaching approximately 5.19 × 10⁵ Mg C yr⁻¹, 5.34 × 10⁵ Mg C yr⁻¹, and 5.59 × 10⁵ Mg C yr⁻¹ for CONS, BAU, and URB+, respectively. Similarly, cumulative emissions for the period 2024–2030 ranged from 8.24 × 10⁵ Mg C in the CONS scenario to 9.43 × 10⁵ Mg C in the URB+ scenario, confirming the strong influence of urban expansion on carbon release. Emission data for future scenarios are shown in the following figure.
The Net Ecosystem Carbon Balance (NECB) under future scenarios is summarized in Figure 10, which integrates carbon sequestration, cumulative emissions, and the resulting net balance for each scenario.
All scenarios exhibit a negative NECB, indicating that the study area is projected to remain a net carbon source under future conditions. However, significant differences emerge among the scenarios. The CONS scenario shows the lowest magnitude of carbon loss, reflecting the reduced urban expansion and a relatively more stable balance between carbon storage and emissions. In contrast, the BAU scenario presents a moderate increase in carbon deficit, consistent with ongoing urban growth trends.
The URB+ scenario shows the most critical condition, with the highest emissions and the lowest sequestration capacity, resulting in the most negative NECB values. This outcome highlights the strong impact of accelerated urban expansion on carbon dynamics, leading to a substantial reduction in the ecosystem’s ability to retain carbon.
4. Discussion
This study provides a comprehensive assessment of carbon dynamics in the Sfax Governorate by integrating remote sensing data, machine learning classification, ecosystem modeling, and spatial simulation. This framework emphasizes the use of geospatial technologies and artificial intelligence to monitor and predict carbon-related processes at regional scales.
The LULC analysis revealed a clear trend of urban expansion and vegetation loss between 2019 and 2024. These changes significantly influenced carbon dynamics, as natural vegetation was identified as the main contributor to carbon storage, while built-up and bare soil classes exhibited low carbon densities. This confirms the critical role of vegetation in maintaining soil carbon balance, particularly in semi-arid environments where ecosystem resilience is already limited. The observed carbon dynamics can be explained by the interaction of multiple driving mechanisms. The main factors include urban expansion, which increases carbon emissions and reduces vegetated areas, while the decline of natural vegetation decreases carbon sequestration capacity. In addition, climatic variability, particularly drought conditions, limits biomass production. Soil characteristics typical of semi-arid environments, such as low organic matter content and salinity, contribute to reduced carbon storage potential. These interacting drivers collectively explain the negative carbon balance observed in the study area.
The integration of the InVEST model enabled spatial quantification of carbon stock and its temporal variation, providing a robust tool to link land-use patterns with ecosystem services. The observed decline in carbon stock reflects the ongoing degradation of vegetated areas and highlights the vulnerability of soil carbon to land-use change. These findings are consistent with recent studies that underline the importance of monitoring carbon stocks using remote sensing and modeling approaches.
The emission analysis further demonstrates that land-use change is a primary driver of carbon fluxes. The increase in emissions over time is directly associated with urban growth and the conversion of agricultural and natural areas into impervious surfaces. This process not only increases carbon emissions but also reduces the capacity of the ecosystem to sequester carbon, amplifying the overall carbon imbalance.
NECB provides a synthetic indicator of these dynamics, clearly showing a transition towards a net carbon source. All future scenarios exhibit negative NECB values, although with different magnitudes. The conservation scenario (CONS) shows a reduced carbon deficit compared to BAU and URB+, demonstrating that limiting urban expansion can partially mitigate carbon losses. Conversely, the URB+ scenario highlights the potential consequences of uncontrolled urban growth, leading to a significant increase in emissions and a substantial reduction in carbon storage capacity.
These findings also have policy implications, as urban expansion in semi-arid regions directly affects carbon management strategies. Integrating land-use planning with carbon mitigation policies could help limit emissions and preserve ecosystem resilience.
From a methodological perspective, the integration of CA modeling with carbon assessment tools provides a useful approach for simulating future land-use scenarios and their environmental impacts. This framework enables spatially explicit projections that support decision-making in land management and climate mitigation. The CA model adopted in this study is intentionally simplified and primarily based on neighborhood interactions, scenario-specific land demand, and exclusion rules. While this approach allows for a clear and computationally efficient representation of urban expansion, it does not explicitly account for other important spatial drivers commonly used in urban growth modeling, such as proximity to road networks, urban centers, or topographic constraints. As a result, the simulated urban expansion patterns may tend to be more spatially homogeneous and may underestimate the influence of accessibility and infrastructure.
Despite these strengths, some limitations should be acknowledged. The use of class-based emission factors represents a simplified approximation of carbon fluxes and does not account for local variability in land management practices or soil properties. Furthermore, the simplified structure of the CA model does not incorporate several important spatial and socio-economic drivers of urban expansion, such as distance to transportation networks, accessibility to urban centers, land suitability, and economic development factors. The omission of these variables may affect the spatial realism of the simulated patterns, potentially leading to an overestimation of neighborhood-driven growth dynamics.
The main limitations are discussed in depth in the following subsection.
Main Limitations
The decline in carbon stock observed between 2019 and 2024 cannot be attributed solely to land-use and land-cover changes. In semi-arid Mediterranean regions such as Sfax, climatic variability plays a decisive role in shaping carbon dynamics. The governorate experiences recurrent droughts and highly irregular rainfall, with long dry seasons and occasional extreme events that reduce vegetation cover and limit biomass production. These conditions accelerate soil organic matter decomposition and reduce the capacity of ecosystems to act as carbon sinks. Furthermore, the predominance of sandy and saline soils with low organic matter content increases vulnerability to degradation, as these soils have limited capacity to retain carbon compared to more fertile clay-rich soils. Combined with rapid urban expansion, these climatic and pedological constraints amplify the negative carbon balance, highlighting the need to integrate climate variability and soil properties into future modeling efforts.
Another important limitation concerns the carbon stock estimation based on the InVEST model. The carbon density values were derived from sources in the literature and were not validated using in situ measurements of Soil Organic Carbon or biomass. As a result, the estimated carbon values may not fully reflect local variability in soil properties, vegetation structure, and land management practices.
In addition, the accuracy of carbon estimates depends on the quality of the LULC classification. Although the Random Forest model achieved high classification accuracy, misclassification errors may propagate into carbon stock calculations, potentially leading to overestimation or underestimation of carbon values in specific areas.
A further limitation is related to the simplified structure of the Cellular Automata model, which does not incorporate several important spatial drivers such as transportation networks, accessibility, or socio-economic factors. This simplification may affect the realism of simulated urban expansion patterns.
A further possible limitation is the use of a random train–test split, which does not account for spatial autocorrelation between training and test samples and may therefore lead to optimistic accuracy estimates. Spatially independent validation approaches, such as spatial cross-validation, could provide a more robust assessment of model generalization but were not implemented due to sample size constraints and class balance requirements.
These limitations may influence the quantitative accuracy of the estimated carbon stocks and the spatial realism of the simulated scenarios. Therefore, the results should be interpreted as indicative of general trends rather than precise absolute values.
Future research could improve the modeling framework by integrating additional spatial predictors, coupling Cellular Automata with statistical or machine learning approaches to better capture the complexity of urban expansion processes, and adopting spatial cross-validation.
Overall, the proposed approach demonstrates the potential of integrating remote sensing, machine learning, and ecosystem modeling to analyze and predict carbon dynamics. Such integrated methodologies are essential for supporting sustainable land-use planning and for addressing climate change challenges in rapidly transforming landscapes.
5. Conclusions
This study develops an integrated framework to analyze past and future carbon dynamics in the Sfax Governorate by combining remote sensing data, machine learning classification, ecosystem modeling, and spatial simulation. The results demonstrate that LULC changes significantly impact carbon storage and emissions, particularly in semi-arid environments.
The analysis of historical LULC maps (2019–2024) revealed a clear trend of urban expansion and loss of natural vegetation, leading to a progressive decline in carbon stock and an increase in carbon emissions. The NECB results confirmed a shift towards a net carbon source, highlighting the imbalance between carbon losses and the declining capacity of ecosystems to store carbon.
Future scenario simulations for 2030 further emphasized the role of land-use planning in shaping carbon dynamics. All scenarios showed a negative NECB, indicating persistent carbon loss, although significant differences were observed among scenarios. The conservation-oriented scenario (CONS) resulted in the lowest carbon deficit, while the urban expansion scenario (URB+) produced the highest emissions and the most critical carbon imbalance.
These findings underline the importance of sustainable land management strategies aimed at limiting urban sprawl and preserving vegetated areas, which play a key role in maintaining ecosystem carbon balance. The results also demonstrate the effectiveness of integrating remote sensing, machine learning, and spatial modeling approaches to support carbon monitoring and environmental decision-making.
The observed carbon dynamics may also have been influenced by climate variability in addition to variations in land use. The Sfax Governorate and other semi-arid areas are particularly vulnerable to interannual variations in temperature and precipitation, which have a direct impact on biomass production, vegetation development, and Soil Organic Carbon processes. Reduced plant cover can result from drought or periods of decreased precipitation, which may impair carbon storage and affect the spectral response recorded by satellite imagery. Therefore, rather than being solely caused by land-use change, some of the observed variations in LULC and carbon stock may be indirectly driven by climate variability. These factors should be taken into account when interpreting the results, even though a thorough climate investigation was outside the scope of this study.
Future research could integrate climate data to better disentangle the relative contributions of land-use change and climatic variability to carbon dynamics.
Overall, the proposed methodology provides a scalable and reproducible framework that can be applied to other regions experiencing rapid land-use changes, contributing to improved understanding and management of carbon dynamics under future scenarios.