1. Introduction
Forests are among the most critical natural systems for maintaining ecological balance and regulating atmospheric carbon through long-term sequestration and ecosystem stability [1,2,3]. Their capacity to absorb and store carbon makes them indispensable in moderating the pace of climate change [4,5]. Yet, accelerated deforestation and persistent land-use conversions, especially across developing regions, continue to erode these vital ecological functions [6,7,8,9]. The consequent reduction in forest cover undermines landscape resilience, leading to loss of biodiversity, soil degradation, and increased atmospheric carbon accumulation [10,11]. Consequently, large-scale forest restoration has become an essential component of both climate mitigation and sustainable development frameworks [12,13]. Assessing restoration potential requires analytical methods that incorporate spatial heterogeneity, temporal variability, and biophysical feedback. Although field-based measurements remain indispensable for ground validation, their limited spatial extent constrains their generalizability. In contrast, advances in remote sensing and imaging spectroscopy now allow for detailed observation of vegetation structure and function across broad spatial scales [14,15,16]. Spectral measurements spanning the visible to shortwave infrared (SWIR) range capture subtle variations in canopy chemistry, structure, and productivity, offering valuable proxies for biomass and carbon estimation [17]. These developments have expanded the role of spectroscopy in forest monitoring, providing greater precision and scalability than traditional field surveys [18,19].
The north coastal districts of Andhra Pradesh (Visakhapatnam, Vizianagaram, and Srikakulam) exemplify both the challenges and opportunities of forest restoration in ecologically diverse and rapidly urbanizing landscapes. The region encompasses heterogeneous terrain, including coastal plains and hilly forested tracts, that is under increasing pressure from industrial expansion, urban growth, and agricultural intensification [20]. These drivers have fragmented natural habitats and diminished regional carbon sequestration capacity. Understanding restoration potential in such complex socio-ecological settings is crucial for regional adaptation planning and aligns with India’s commitments under the UN Decade on Ecosystem Restoration [21]. Spectroscopy provides an effective means of linking spectral reflectance to biophysical attributes such as pigment concentration, leaf water content, and canopy integrity. Variations across the visible (VIS), near-infrared (NIR), and SWIR regions are reliable indicators of vegetation vigor and biochemical composition [17,22]. When integrated with advanced data-driven techniques such as Random Forest Regression (RFR) and K-Nearest Neighbors (KNN), spectral data can be transformed into predictive models that quantify forest condition and restoration potential with high spatial fidelity [23,24]. This integration of spectroscopy and machine learning provides a scalable pathway for evaluating restoration opportunities and associated carbon benefits [17,25].
Recent advances in geospatial analytics have strengthened the capacity to monitor land-cover transformations over space and time. Studies such as Chinkaka et al. [22] highlighted the potential of machine-learning frameworks for continental-scale monitoring. Nevertheless, much of the existing research remains focused on coarse-scale assessments, offering limited insight into localized restoration dynamics or the socio-ecological resilience of community-based initiatives. At the same time, large-scale restoration and biomass studies [17] underscore both the promise and the methodological limitations of machine learning for quantifying carbon potential. For example, Jiang et al. [17] employed a Random Forest regression with 34 environmental variables to identify 67.2 Mha of restoration-eligible land in China, estimating ~3.99 Gt of potential above- and below-ground carbon accumulation. Similarly, Chang et al. [26] demonstrated that integrating multiple above-ground biomass (AGB) datasets with field calibration via regionally optimized Random Forest models reduces bias and improves the spatial accuracy of biomass maps. These findings reinforce the value of ensemble approaches for developing robust biomass surfaces suitable for carbon accounting.
Regional studies further reveal the ecological and management-dependent variability of restoration outcomes. Dangwal et al. [20] reported substantial gains in both species richness and AGB following long-term restoration in sub-tropical Himalayan forests, while Osuri et al. [13] and Panwar et al. [14] showed that forest composition and management practices (natural regeneration, monoculture plantations, or agroforestry) strongly influence biomass accumulation and stability. Likewise, Swamy et al. [27] demonstrated how land-cover change can convert carbon-sequestering forests into net emitters, highlighting the necessity of integrating historical disturbance regimes and land-use trajectories into restoration design. Collectively, these studies emphasize that spatial predictions of potential tree cover must be ecologically grounded, accounting for species composition, successional processes, and management regimes, to ensure realistic carbon forecasts [18]. Methodological advances in remote sensing and machine learning have accelerated this progress. For instance, Loozen et al. [15] and Li et al. [28] showed that multi-sensor integration (optical + radar) combined with algorithms such as Random Forest and gradient boosting (XGBoost) can more accurately capture canopy structure and AGB than linear models. Similarly, Chen et al. [23] demonstrated that hybrid workflows combining spectral indices, Otsu thresholding, CNN, and Random Forests enhance classification accuracy across long time series. Huang et al. [18] extended this to landscape-scale analyses, illustrating that large-scale reforestation improves patch connectivity and surface temperature regulation, thereby linking structural restoration to functional outcomes.
Despite these advances, several recurring gaps restrict the transferability of published approaches to fine-scale, policy-relevant contexts, particularly in urbanizing regions like Visakhapatnam. Large-scale assessments often overlook local land-use constraints, tenure complexity, and edaphic variability, producing inflated estimates when downscaled [13]. Many studies also fail to propagate uncertainties from input variables (such as climate, soil, or biomass) through to final restoration and carbon predictions, even though ensemble uncertainty quantification is essential for policy confidence [26]. Additionally, the widespread use of generalized allometric equations [7] to estimate below-ground biomass introduces systematic uncertainty that varies with vegetation type and soil conditions. Moreover, the socio-economic and governance dimensions of restoration (land tenure, opportunity costs, and implementation feasibility) are rarely considered, even though they critically determine restoration success [12]. Transparency in algorithmic design, hyperparameter tuning, and validation strategies also remains inconsistent. While Random Forests are often the method of choice, alternative algorithms such as XGBoost or ensemble stacking may outperform them under specific conditions, underscoring the need for multi-model comparison [15]. Furthermore, temporal dynamics (climate variability, disturbance regimes, and land-use transitions) are often omitted, even though they shape restoration persistence and resilience [27].
The present Visakhapatnam research addresses several of these methodological and contextual challenges by integrating spectroscopy with machine learning at medium spatial resolution (30–250 m). An ensemble Random Forest model was trained using 33 spectroscopy-derived remote sensing predictors encompassing climatic, edaphic, and topographic parameters. Comparative evaluation with KNN revealed that ensemble-based models offered superior predictive performance (R² = 0.87) and more effectively captured complex non-linear ecological relationships. The approach incorporates anthropogenic masking to exclude urban and agricultural lands, thereby improving spatial realism. Carbon stocks were computed by sequentially estimating above- and below-ground biomass, using $\mathrm{BGB} = 0.489 \times \mathrm{AGB}^{0.89}$ and $C = 0.5 \times (\mathrm{AGB} + \mathrm{BGB})$. The methodology ensures consistency in reporting (Pg C and Mg ha⁻¹) and facilitates cross-research comparison. Sensitivity analyses of input layers and cross-validation metrics (RMSE, MAE) were implemented to quantify uncertainty.
This research contributes to the emerging consensus that multi-source remote sensing combined with ensemble learning represents the most robust pathway for restoration potential mapping and carbon quantification. Field calibration and local parameterization substantially enhance accuracy and ecological validity, while socio-ecological integration ensures realistic implementation potential. By applying a medium-resolution, locally contextualized Random Forest framework that explicitly incorporates anthropogenic exclusion and carbon extrapolation, this research fills a critical methodological gap between national-scale inventories and site-specific restoration assessments. It underscores the importance of integrating spectroscopy-derived vegetation information with data-driven modeling to advance predictive, scalable, and policy-relevant restoration science [17,23,28].
2. Materials and Methods
2.1. Study Area
The research was carried out in Visakhapatnam (Figure 1), a major coastal district located in the state of Andhra Pradesh, India, extending over an area of approximately 1048 km² and centered near 17.7041° N latitude and 83.2977° E longitude. The region, home to a population of 2.09 million, is characterized by a heterogeneous landscape encompassing hills, plains, valleys, and coastal stretches. Although Visakhapatnam has undergone extensive urbanization and industrial development in recent decades, it continues to sustain ecologically rich forests, wildlife sanctuaries, and biodiversity hotspots. However, increased anthropogenic activity has resulted in deforestation, habitat fragmentation, and degradation of ecosystem services, emphasizing the need for systematic identification of areas suitable for forest restoration and carbon stock enhancement. To assess the forest restoration potential, two nonlinear regression models, Random Forest (RF) and K-Nearest Neighbors (KNN), were developed using 33 environmental variables derived from global datasets, with tree cover as the target variable. These models were used to generate continuous spatial predictions of potential tree cover and to estimate the carbon sequestration potential by integrating live woody biomass density. The overall framework followed four key stages: (i) developing predictive models using biophysical and environmental features, (ii) estimating potential tree cover, (iii) excluding settlement and agricultural lands to identify restoration-eligible areas, and (iv) extrapolating carbon stocks that could be restored.
2.2. Data Sources and Environmental Variables
Multiple open-access global datasets were integrated to develop the model (Table 1). These datasets provide medium-resolution inputs representing land cover, terrain, soil, and climatic variability across the research area.
Soil data were obtained from SoilGrids250m, which provides global gridded data on pH, organic carbon, bulk density, coarse fragments, and cation exchange capacity for six standard depths (0–5, 5–15, 15–30, 30–60, 60–100, and 100–200 cm). Soil texture parameters (clay, sand, and silt fractions) were included to capture heterogeneity in soil composition. The SoilGrids dataset utilizes a machine-learning framework trained on field-based soil profiles, improving spatial prediction accuracy relative to linear regression-based soil models.
Climatic variables were derived from WorldClim 2.0, which provides 19 bioclimatic indicators, including mean annual temperature, precipitation seasonality, and temperature extremes. These data are interpolated from global meteorological stations and satellite inputs, providing improved coverage for regions with sparse climate observations. The climatic predictors derived from WorldClim 2.0 represent long-term baseline averages (1970–2000) and are intended to characterize stable climatic gradients rather than short-term interannual variability. This approach is appropriate for modeling structural vegetation potential and restoration suitability under prevailing climatic regimes. Spatial resolution varied from 10 arc-min to 30 arc-s depending on data availability, ensuring fine-scale representation of climatic heterogeneity across the research area.
Topographic data, including elevation, slope, aspect, and hillshade, were obtained from the Global Multi-resolution Topographic Elevation Data (GMTED2010), developed by the U.S. Geological Survey and the National Geospatial-Intelligence Agency. These variables were derived at a spatial resolution of 30 arc-s and standardized to match the scale of climatic and soil datasets. Topography influences multiple ecological processes including soil moisture, solar exposure, and runoff, all of which are critical to vegetation establishment and persistence. In this study, GlobeLand30, GlobCover, and Global Forest Watch (GFW) AGB maps represent spectroscopy-derived datasets. GlobeLand30 is based on direct multispectral reflectance classification, GlobCover employs enhanced multispectral discrimination, and GFW AGB maps utilize spectral and LiDAR data fusion for above-ground biomass estimation.
2.3. Model Development and Training
Random Forest (RF) and k-Nearest Neighbors (KNN) regression models were developed to predict potential tree cover using 33 environmental predictors representing climatic, edaphic, topographic, and spectroscopy-derived vegetation information. All predictor layers were co-registered, resampled to a common spatial resolution, and normalized prior to model implementation to ensure numerical comparability and to avoid scale-induced bias, particularly for distance-based algorithms.
The RF model was adopted as the primary predictive framework due to its ability to capture nonlinear relationships and handle multicollinearity among predictors. The model was implemented using bootstrap aggregation, with each tree trained on a random subset of samples and predictors. A total of 10 regression trees was used, selected based on preliminary sensitivity testing indicating stable performance beyond this threshold with minimal additional accuracy gains. Feature importance analysis was employed to identify dominant predictors, with precipitation-related variables, soil organic carbon, and elevation consistently ranking among the most influential factors controlling spatial variability in tree cover. The final prediction is obtained as the mean of all decision tree outputs. Mathematically, the RF regression can be expressed (Equation (1)) as:
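$$\hat{y}(x) = \frac{1}{K} \sum_{k=1}^{K} T_k(x; D, \Theta_k) \qquad (1)$$
where $\hat{y}(x)$ is the predicted tree cover, $K$ is the number of trees, and $T_k$ represents the kth regression tree trained on dataset $D$ with random parameters $\Theta_k$.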
The KNN model was implemented as a complementary non-parametric approach. KNN estimates the value of a new observation based on the mean of its $k$ nearest neighbors in the feature space, defined by the Euclidean distance metric. The regression estimate (Equation (2)) is given by:
$$\hat{y}(x) = \frac{1}{k} \sum_{i \in N_k(x)} y_i \qquad (2)$$
where $\hat{y}(x)$ is the predicted tree cover for a point $x$, and $y_i$ represents the tree cover values of its $k$ nearest neighboring samples, indexed by the set $N_k(x)$. The KNN model is particularly effective for capturing localized relationships between environmental variables and tree cover. The combined dataset $D$ consisted of a two-dimensional matrix with $n$ observations and $p$ predictor variables, expressed as $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^{p}$ represents the vector of environmental features (soil, climate, and topography) and $y_i$ denotes the observed tree cover fraction.
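As an illustration of the KNN estimate described above, the following minimal sketch averages the targets of the k nearest samples under Euclidean distance. It is not the scikit-learn implementation used in this work; the function name `knn_predict` and the sample values are hypothetical and chosen only for demonstration, and the predictors are assumed to be normalized already.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training samples closest to `query`."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Euclidean distance from the query point to every training sample
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical normalized predictors (two features per sample)
train_X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (1.0, 1.0), (0.9, 1.0)]
train_y = [10.0, 20.0, 30.0, 80.0, 90.0]  # illustrative tree cover values (%)

prediction = knn_predict(train_X, train_y, query=(0.0, 0.05), k=3)
print(prediction)  # mean of the three closest targets: 10, 20, and 30
```

Because every predictor contributes equally to the distance, unscaled variables with large numeric ranges would dominate the neighbor search, which is why normalization matters for this model.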
Model performance was evaluated using a 10-fold cross-validation framework applied to the complete dataset. The dataset was partitioned into ten mutually exclusive folds of approximately equal size. In each iteration, nine folds were used for model training and the remaining fold was reserved for validation. This procedure was repeated until each fold had served as the validation set once, and performance metrics were averaged across all folds.
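The fold partitioning described above can be sketched as follows. This is a simplified illustration using only the Python standard library, not the routine used in this work; the function name `k_fold_indices`, the shuffling seed, and the sample count of 103 are assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (training, validation) index lists for each of n_folds folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, so fold sizes differ by at most one
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, validation in enumerate(folds):
        # All remaining folds together form the training set
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

splits = list(k_fold_indices(n_samples=103, n_folds=10))
for training, validation in splits:
    print(len(training), len(validation))
```

Each sample appears in exactly one validation fold, so averaging the per-fold metrics uses every observation once for validation.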
No independent hold-out test set was defined. Reported statistics therefore represent cross-validated predictive performance, rather than unbiased external generalization estimates. Model configurations, including the number of trees in the Random Forest and the value of k in the KNN model, were fixed prior to cross-validation and were not optimized within validation folds. This approach avoids information leakage associated with non-nested hyperparameter tuning.
Model accuracy was quantified using three standard regression metrics: coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE). These metrics were computed for each fold and summarized as mean values across all validation iterations. The same cross-validation procedure was applied consistently to both RF and KNN models to ensure fair and reproducible comparison. All models were implemented in Python 3.8 using the scikit-learn library v.1.2, and identical preprocessing, predictor sets, and evaluation metrics were applied to both RF and KNN models to ensure methodological consistency.
2.4. Model Evaluation
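Model performance was quantified using three statistical metrics: the coefficient of determination (R²) (Equation (3)), the root mean square error (RMSE) (Equation (4)), and the mean absolute error (MAE) (Equation (5)). These are defined respectively as:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (3)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \qquad (4)$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \qquad (5)$$
where $y_i$ and $\hat{y}_i$ denote the observed and predicted tree cover values, respectively, $\bar{y}$ represents the mean observed value, and $n$ is the number of validation samples. Lower RMSE and MAE values indicate higher predictive accuracy, while higher R² values denote stronger model fit. Both RF and KNN models were evaluated using identical training and validation datasets to ensure fair comparison.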
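The three metrics can be computed directly from their standard definitions, as in the short sketch below. The function name `regression_metrics` and the observed and predicted values are hypothetical and serve only to show the arithmetic; they are not results from this study.

```python
import math

def regression_metrics(observed, predicted):
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(y - p) for y, p in zip(observed, predicted)) / n
    return r2, rmse, mae

observed = [10.0, 20.0, 30.0, 40.0]   # illustrative tree cover values
predicted = [12.0, 18.0, 33.0, 37.0]  # illustrative model outputs
r2, rmse, mae = regression_metrics(observed, predicted)
print(round(r2, 3), round(rmse, 3), round(mae, 3))
```

In a cross-validation setting, this function would be applied to each validation fold and the three values averaged across folds.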
2.5. Estimation of Potential Tree Cover and Carbon Stocks
The trained models were applied to estimate potential tree cover across the Visakhapatnam region. Predicted outputs were filtered to remove areas classified as settlements and agricultural lands using GlobCover data, thereby identifying restoration-eligible land. For estimating potential carbon sequestration, the predicted restorable tree cover was integrated with above-ground biomass (AGB) density maps from Global Forest Watch. Below-ground biomass (BGB) was estimated (Equation (6)) using the allometric relationship proposed by Mokany et al. [29]:
$$\mathrm{BGB} = 0.489 \times \mathrm{AGB}^{0.89} \qquad (6)$$
The total live woody biomass was computed as the sum of AGB and BGB, and converted to carbon stock using a factor of 0.5 to represent the carbon fraction of dry biomass. The spatially explicit carbon stock potential was then mapped across the research area, providing a quantitative estimate of the carbon that could be sequestered through forest restoration.
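The biomass-to-carbon chain described here can be written in a few lines. The sketch below applies the allometric relationship BGB = 0.489 × AGB^0.89 and the 0.5 carbon fraction stated in this paper; the function names and the AGB densities are hypothetical, and the units are assumed to be Mg ha⁻¹.

```python
def below_ground_biomass(agb):
    """Below-ground biomass from above-ground biomass (allometric relationship)."""
    return 0.489 * agb ** 0.89

def carbon_stock(agb, carbon_fraction=0.5):
    """Carbon in total live woody biomass (AGB + BGB)."""
    return carbon_fraction * (agb + below_ground_biomass(agb))

# Hypothetical AGB densities for three pixels (assumed Mg per hectare)
agb_values = [50.0, 100.0, 150.0]
carbon_stocks = [carbon_stock(agb) for agb in agb_values]
for agb, carbon in zip(agb_values, carbon_stocks):
    print(f"AGB {agb:6.1f} -> BGB {below_ground_biomass(agb):6.2f}, C {carbon:6.2f}")
```

Because the exponent is below one, the root-to-shoot ratio implied by this relationship declines as AGB increases, so carbon does not scale strictly linearly with AGB.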
To assess the robustness of carbon stock estimates derived from global allometric equations, a sensitivity analysis was conducted. The biomass-to-carbon conversion factor and allometric coefficients were varied within ±15% of baseline values reported in widely adopted global models. Carbon stock was recalculated under each parameter scenario, and resulting spatial distributions were compared using overlap statistics and relative ranking stability. This procedure allows evaluation of uncertainty propagation from allometric assumptions to restoration prioritization outcomes.
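A simplified version of this perturbation test is sketched below. It varies one parameter at a time by ±15% and measures how much of the top-quartile set of cells is shared with the baseline. The scenario names, the helper functions, and the AGB values are assumptions made for illustration; the actual analysis may have combined parameters or used different overlap statistics.

```python
def carbon(agb, coeff=0.489, exponent=0.89, fraction=0.5):
    """Carbon stock from AGB with adjustable allometric and conversion parameters."""
    return fraction * (agb + coeff * agb ** exponent)

def top_quartile(values):
    """Indices of the highest-valued quarter of cells."""
    n_top = max(1, len(values) // 4)
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(ranked[:n_top])

def overlap_with_baseline(agb_cells, **perturbed):
    """Share of baseline top-quartile cells retained under a perturbed scenario."""
    baseline = top_quartile([carbon(a) for a in agb_cells])
    scenario = top_quartile([carbon(a, **perturbed) for a in agb_cells])
    return len(baseline & scenario) / len(baseline)

# Hypothetical AGB densities for eight cells
agb_cells = [12.0, 85.0, 40.0, 150.0, 63.0, 5.0, 110.0, 97.0]
scenarios = {
    "fraction +15%": {"fraction": 0.5 * 1.15},
    "fraction -15%": {"fraction": 0.5 * 0.85},
    "coeff +15%": {"coeff": 0.489 * 1.15},
    "exponent -15%": {"exponent": 0.89 * 0.85},
}
results = {name: overlap_with_baseline(agb_cells, **params)
           for name, params in scenarios.items()}
print(results)
```

In this simplified setting, carbon is a monotonically increasing function of AGB for every scenario, so a uniform parameter change rescales magnitudes without reordering cells. This is consistent with the expectation that relative prioritization is less sensitive than absolute stock estimates.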
3. Results
3.1. Sensitivity Analysis
Sensitivity analysis indicated that predicted carbon stock exhibited strong positive associations with soil organic carbon (r = 0.78, p < 0.01) and mean annual precipitation (r = 0.72, p < 0.01), while temperature seasonality showed a moderating influence. Random Forest feature importance rankings consistently identified precipitation seasonality, soil organic carbon density, and elevation as the dominant predictors, confirming the robustness of the modeled restoration–carbon relationship.
Sensitivity testing indicates that absolute carbon stock values vary proportionally with adjustments in allometric parameters; however, the spatial distribution of high-priority restoration zones remains highly stable. More than 90% spatial agreement was observed between baseline and perturbed scenarios in identifying top-quartile carbon gain areas. These findings suggest that while global allometric equations introduce uncertainty in magnitude estimates, the relative spatial prioritization for restoration planning remains robust.
Additionally, aggregate carbon stock estimates were compared with available regional forest carbon statistics reported in recent national assessments. The modeled total carbon storage falls within the reported uncertainty range for the study area, providing independent plausibility support for the estimation framework.
3.2. Spatial Prediction of Potential Tree Cover Using Random Forest Model
Using the optimized Random Forest model, potential tree cover was predicted for the entire Visakhapatnam district under current climatic and environmental conditions. The resulting spatial distribution (Figure 2a) reveals clear ecological gradients, with higher potential cover in mid-elevation zones (200–800 m a.s.l.) and moisture-rich western subregions dominated by loamy soils and moderate temperature seasonality. In contrast, coastal and lowland regions exhibited reduced potential cover, driven by anthropogenic land conversion and higher surface temperature fluctuations. By subtracting the observed tree cover from modeled potential, a tree cover deficit map was generated (Figure 2b). This map highlights restoration priority zones, that is, areas where ecological capacity exceeds current vegetation extent. Approximately 104,800 ha of underutilized land were identified as having potential for forest regeneration, particularly in Narsipatnam and Ananthagiri blocks. To ensure ecological and socio-economic feasibility, non-restorable areas such as urban settlements, existing forests, and agricultural lands were excluded (Figure 2c). The filtered map delineates restoration-suitable zones that combine favorable soil texture, moderate topographic gradients (<15° slope), and optimal climatic envelopes for native forest establishment. This analysis builds on the refined potential tree cover to estimate carbon sequestration in Visakhapatnam by integrating biomass metrics (Figure 2d). Additional tree cover was multiplied by above-ground biomass (AGB), an indicator of carbon stored in vegetation, with grayscale values ranging from 0 to 169 Mg. Using a numerical formula in QGIS, below-ground biomass (BGB), representing biomass stored in roots, was computed, with values ranging from 0 to 2430.07 Mg (Figure 2e). Carbon stock was then calculated by summing AGB and BGB and applying a 0.5 conversion factor, providing a comprehensive measure of total carbon storage. The final estimated restorable carbon stock for the region is 0.12 Pg (Figure 2f), highlighting the significant climate mitigation potential of expanding tree cover in Visakhapatnam.
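The deficit and exclusion steps described in this subsection amount to a per-cell subtraction followed by a mask. The sketch below illustrates this on four hypothetical cells; the function name, the values, and the choice to clip negative deficits to zero are assumptions for the example rather than details reported for the actual raster workflow.

```python
def restorable_deficit(potential, observed, excluded):
    """Per-cell tree cover deficit, set to zero for excluded land classes."""
    deficit = []
    for pot, obs, skip in zip(potential, observed, excluded):
        # Excluded cells (e.g., settlement or cropland) contribute nothing;
        # negative differences are clipped to zero (assumption for this sketch)
        deficit.append(0.0 if skip else max(pot - obs, 0.0))
    return deficit

potential = [80.0, 60.0, 40.0, 70.0]    # modeled potential tree cover (%)
observed = [50.0, 65.0, 10.0, 20.0]     # current tree cover (%)
excluded = [False, False, True, False]  # True where land is not restorable

deficit = restorable_deficit(potential, observed, excluded)
print(deficit)
```

The second cell shows why clipping can matter: where observed cover already exceeds the modeled potential, no additional restorable cover is assigned.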
3.3. Spatial Prediction of Potential Tree Cover Using KNN Model
The KNN regression model followed the same workflow as the RF model but yielded more spatially diffuse predictions (Figure 3a–c). The KNN-derived potential tree cover extended up to 201 Mha, with exaggerated estimates in topographically complex regions. Similarly, its carbon stock projection (2.7 Pg) likely reflects oversmoothing due to neighborhood averaging, which dilutes local ecological variation. Despite these discrepancies, KNN outputs highlight restoration “hotspots” consistent with RF predictions, particularly in Ananthagiri and northern Narsipatnam, confirming that both models converge on broad-scale trends even if they diverge in magnitude. The discrepancy between KNN and RF results emphasizes the importance of ensemble methods for ecological modeling, as they better accommodate environmental heterogeneity and avoid the spatial bias introduced by proximity-based regression algorithms.
3.4. Biomass and Carbon Stock Estimation
Biomass modeling built upon the predicted restorable tree cover using a two-tiered approach. First, above-ground biomass (AGB) (Figure 3d) was estimated through established empirical models based on tree cover fraction and environmental predictors. Then, below-ground biomass (BGB) (Figure 3e) was derived using the allometric relationship $\mathrm{BGB} = 0.489 \times \mathrm{AGB}^{0.89}$. This relationship, validated in global biomass assessments (Mokany et al. [29]), captures the nonlinear scaling between above- and below-ground carbon pools. The total carbon stock (C) was computed as $C = 0.5 \times (\mathrm{AGB} + \mathrm{BGB})$. Spatial aggregation of these estimates revealed a restorable carbon stock potential of approximately 1.2 Pg, predominantly concentrated in semi-evergreen and moist deciduous zones (Figure 3f). The gradient of carbon accumulation correlates strongly (r = 0.78, p < 0.01) with soil organic carbon density and mean annual precipitation, affirming the biophysical coupling between soil fertility and above-ground productivity.
3.5. Model Performance and Predictive Strength
The predictive performance of the Random Forest (RF) and k-Nearest Neighbors (KNN) regression models was evaluated using a 10-fold cross-validation framework applied to the full dataset. In this procedure, the dataset was partitioned into ten mutually exclusive folds, with nine folds used for model fitting and one fold reserved for validation in each iteration. This process was repeated until each fold had served as the validation set once, and performance metrics were averaged across all folds. Model configurations were fixed prior to cross-validation and were not optimized within validation folds, ensuring that reported statistics are free from information leakage associated with non-nested tuning. Consequently, the reported metrics represent cross-validated model performance rather than unbiased independent test estimates.
Under this validation framework, the RF model demonstrated strong and consistent predictive capacity, achieving a mean coefficient of determination (R²) of 0.86, a mean root mean square error (RMSE) of 7.61, and a mean absolute error (MAE) of 3.96 across folds (Table 2). These results indicate that the RF model effectively captures nonlinear relationships between tree cover and the 33 environmental predictors. In contrast, the KNN model exhibited substantially lower cross-validated performance, with a mean R² of 0.49, RMSE of 71.24, and MAE of 40.75. The reduced accuracy reflects the sensitivity of KNN to high-dimensional feature spaces and its limited ability to generalize across heterogeneous environmental gradients.
Although cross-validation reduces variance in performance estimates, it does not fully account for spatial autocorrelation among neighboring pixels. Therefore, the reported metrics should be interpreted as upper-bound estimates of predictive performance. Nevertheless, because the same validation strategy was applied consistently to both models, the relative performance comparison—demonstrating the superiority of the RF approach for restoration potential mapping—remains robust.
The superior performance of the RF model can be attributed to its ensemble nature, which mitigates variance and overfitting by aggregating multiple decorrelated decision trees. Moreover, its internal feature importance mechanism provides interpretability, highlighting variables such as annual mean temperature, precipitation seasonality, and soil organic carbon density as key determinants of regional tree cover. These findings align with recent large-scale forest modeling studies that demonstrate the efficacy of ensemble methods in ecological prediction and restoration planning.
3.6. Site-Level Restoration and Carbon Accounting
To provide spatially explicit insights, four representative restoration zones (Ananthagiri, Neredupalle, Narsipatnam, and Talabirada) were analyzed in detail. The results (Table 3) show substantial variation in restoration capacity, governed by elevation, soil texture, and rainfall distribution.
Ananthagiri and Talabirada exhibit the highest carbon sequestration potentials due to their cooler microclimates, moderate slopes, and relatively undisturbed soil organic content. In contrast, Narsipatnam and Neredupalle display lower carbon densities, reflecting degraded soil structure and limited canopy regeneration. Overall, the total potential carbon gain from reforestation in Visakhapatnam is estimated at ~0.12 Pg, offering a quantifiable contribution toward regional carbon neutrality targets.
3.7. Implications for Climate Mitigation and Sustainable Land Management
The integrated machine learning framework presented here demonstrates the potential of spatially explicit modeling in guiding evidence-based forest restoration and carbon accounting. The identification of 104,800 ha of restorable land with a sequestration potential of 0.12 Pg C provides a tangible pathway for enhancing regional carbon sinks, contributing to India’s Nationally Determined Contributions (NDCs) under the Paris Agreement. Beyond carbon storage, targeted restoration in the identified zones offers cascading co-benefits—improved soil fertility, microclimatic regulation, and biodiversity recovery. Incorporating these spatial outputs into regional land-use planning could optimize afforestation initiatives by aligning ecological potential with socio-economic constraints. This research reaffirms that data-driven restoration planning, when integrated with socio-ecological realities, can advance nature-based climate solutions that are both scientifically grounded and operationally feasible.
4. Discussion
4.1. Forest Restoration Potential
Mapping forest restoration potential shows that large areas of degraded tropical landscapes retain a strong ecological capacity for recovery when restoration is guided by data-driven approaches. By combining climate, soil, and terrain variables with vegetation metrics derived from spectroscopy, this study identifies approximately 104,800 hectares of restorable land in Visakhapatnam. These areas, found mostly in mid-elevation zones (200–800 m), combine soil fertility, slope stability, and climatic moderation in a way that favors long-term regeneration. Because ensemble modeling resolves spatial patterns independently of administrative boundaries, restoration can be prioritized according to ecological opportunity rather than jurisdictional convenience. A key outcome is a refined concept of “potential”: the framework treats restoration as the capacity of ecosystems to regain stability and function, rather than as a static land-cover exercise. The exclusion of urban and agricultural areas supports both ecological viability and social compatibility, which is especially important in areas under intense human pressure. By capturing nonlinear biophysical interactions, the Random Forest model bridges the gap between ecological theory and restoration practice and offers a more realistic depiction of vegetation recovery. This analysis therefore reframes restoration as the reactivation of ecosystem functions (nutrient cycling, hydrological regulation, and soil cohesion) rather than the replanting of trees alone. The derived spatial framework gives priority to zones with inherent ecological resilience and provides a scientifically supported foundation for focused intervention. This shift from ad hoc reforestation to opportunity-based restoration represents an important development in the management of tropical landscapes.
4.2. Carbon Sequestration
Quantifying carbon sequestration within the designated restorable zones reveals how much nature-based climate mitigation potential is embedded in local landscapes. According to the model, ecological restoration in Visakhapatnam could store a total of 0.12 petagrams of carbon. This district-scale estimate supports the broader role of tropical forests in the global carbon balance and serves as a reference for subnational climate accounting. Spatial analysis indicates that carbon accumulation is closely related to elevation, precipitation, and soil organic carbon content. Areas such as Narsipatnam and Ananthagiri emerge as carbon “hotspots”, where biophysical factors such as high soil moisture and nutrient availability promote carbon gain both above and below ground. The strong correlation between observed spectral indices and predicted biomass confirms the sensitivity of imaging spectroscopy to vegetation productivity gradients. Ensemble learning offers a further benefit by accounting for nonlinear ecological feedback that parametric approaches frequently ignore. Beyond visualizing spatial variability, the resulting carbon maps serve as baseline references for long-term carbon monitoring. Because spectroscopy allows temporal tracking, the framework could develop into a dynamic system for verifying restoration outcomes and supporting carbon credit mechanisms. Taken together, the results support the view that forests are among the most practical and economical levers for reducing atmospheric carbon. When guided by ecological realism and spatial precision, restoration moves beyond symbolic climate action to become a quantifiable, policy-relevant mitigation strategy.
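For readers comparing the district total with plot-level or inventory figures, the unit conversion below expresses the reported 0.12 Pg C as an average per-hectare value. It is only a sketch: it assumes the full total is spread uniformly over the 104,800 ha of restorable land, which the text does not state, and it does not separate above-ground from below-ground pools or specify a time horizon.

```latex
\begin{align*}
0.12~\text{Pg C} &= 0.12 \times 10^{15}~\text{g C} = 1.2 \times 10^{8}~\text{Mg C}
  && (1~\text{Mg} = 10^{6}~\text{g}) \\
\bar{c} &= \frac{1.2 \times 10^{8}~\text{Mg C}}{1.048 \times 10^{5}~\text{ha}}
  \approx 1.1 \times 10^{3}~\text{Mg C ha}^{-1}
\end{align*}
```

The symbol \bar{c} (average carbon gain per hectare) is introduced here for illustration and does not appear in the original analysis.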
4.3. Potential Application and Outcomes in Developing Nations
For countries striving to advance economically while also addressing ecological decline, this framework presents a practical and affordable way forward. It is built around openly available satellite data and computationally efficient ensemble models, which significantly reduces reliance on time-consuming and resource-intensive field surveys. This accessibility is especially important in settings where financial resources, technical infrastructure, and institutional capacity for large-scale environmental monitoring are limited. By combining ecological suitability with clear exclusions for urban and agricultural land, the approach aligns closely with on-ground land-use realities. It avoids direct competition with farming and settlement needs, while directing restoration efforts toward areas where ecological recovery is genuinely feasible. This balance makes the framework well suited to the socio-economic conditions of developing regions, where environmental goals must be pursued without undermining local livelihoods. The value of the model extends beyond ecological assessment to tangible socio-economic outcomes. Identifying high-potential restoration areas allows governments and local agencies to design payment-for-ecosystem-services programs and participate in carbon markets, thereby linking restoration activities with income generation and livelihood support. At the same time, broader ecological gains—such as improved water regulation, reduced soil erosion, and biodiversity recovery—contribute directly to community resilience and overall well-being. Importantly, the framework offers a replicable and scalable template for embedding restoration within national development strategies. In many Global South contexts, where environmental degradation and climate vulnerability intersect, this approach bridges the gap between scientific rigor and policy practicality. 
It transforms restoration from a long-term aspiration into a workable development tool, delivering climate, ecological, and socio-economic benefits in an integrated and mutually reinforcing manner.
4.4. Comparative Applicability of RF and KNN Models
Although the Random Forest model demonstrated superior cross-validated predictive performance, the KNN approach may retain practical value under specific conditions. Random Forest is particularly well suited for regional-scale modeling involving high-dimensional environmental predictors and nonlinear ecological interactions. In contrast, KNN may perform competitively in localized applications characterized by dense sampling, strong spatial continuity, or micro-topographic heterogeneity, where nearby observations share similar environmental conditions. Furthermore, KNN offers computational simplicity and interpretability advantages in smaller operational settings. Therefore, model selection should consider spatial scale, data structure, and intended application rather than global performance metrics alone.
4.5. Contribution to the Spectroscopy Domain
This study contributes to the spectroscopy domain by demonstrating how spectroscopy-derived satellite reflectance information, when embedded within a machine-learning framework, can be translated from descriptive vegetation mapping into predictive ecological assessment. While imaging spectroscopy has traditionally been used to classify vegetation and map compositional patterns, it is applied here to infer key ecosystem functions, including biomass accumulation and carbon storage. By drawing on reflectance information across the visible to shortwave infrared spectrum, the analysis captures subtle differences in canopy chemistry, pigment composition, and structural complexity that collectively drive ecosystem productivity. When these spectral characteristics are integrated into a Random Forest framework, their functional relevance becomes more explicit. Variable importance results point to specific wavelength regions that are particularly sensitive to photosynthetic activity and moisture dynamics, improving the transparency and interpretability of the model. This close integration of spectroscopy with machine learning represents a methodological shift, repositioning spectral data from a tool for observation to one capable of anticipating ecological potential. As hyperspectral sensors continue to advance in both spectral and spatial resolution, approaches of this kind are likely to play an increasingly central role in Earth system science. They offer the capacity to monitor restoration progress over time, assess functional traits, and link observations across scales from individual canopies to entire regions. In this sense, the study provides both conceptual and technical advances, demonstrating that spectroscopy-derived data can actively inform ecological restoration and carbon modeling. 
By situating spectroscopy-derived data within a predictive ecological framework, the work underscores its growing importance for understanding and managing environmental change at regional and global scales.
4.6. SDG and Policy Implications
The spatially explicit restoration potential maps generated in this study provide actionable guidance aligned with global climate and sustainability frameworks, including Sustainable Development Goals (SDG 13: Climate Action; SDG 15: Life on Land) and Nationally Determined Contributions (NDCs) under international climate agreements.
Operationally, model outputs can be translated into local action through a three-stage pathway: (1) identification of high carbon-gain priority zones using RF-derived predictions; (2) integration with land tenure, socio-economic constraints, and slope suitability layers to determine feasible restoration sites; and (3) incorporation into Measurement, Reporting, and Verification (MRV) systems to track carbon sequestration progress for NDC compliance.
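The three-stage pathway above can be sketched as a simple filtering pipeline. This is a minimal illustration, not the study's implementation: the `Cell` record, its field names, the thresholds, and the idea of a per-cell area are all hypothetical, and in practice each stage would operate on raster layers, not on a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One mapped unit; all fields are hypothetical placeholders."""
    cell_id: str
    predicted_carbon_gain: float  # Mg C/ha, e.g. taken from an RF prediction layer
    slope_deg: float              # terrain slope in degrees
    tenure_cleared: bool          # land-tenure layer permits restoration
    socio_economic_ok: bool       # passes socio-economic constraint screening


def priority_zones(cells, min_gain):
    """Stage 1: keep cells whose predicted carbon gain meets the threshold."""
    return [c for c in cells if c.predicted_carbon_gain >= min_gain]


def feasible_sites(cells, max_slope_deg):
    """Stage 2: apply tenure, socio-economic, and slope suitability filters."""
    return [
        c for c in cells
        if c.tenure_cleared and c.socio_economic_ok and c.slope_deg <= max_slope_deg
    ]


def mrv_baseline(cells, cell_area_ha):
    """Stage 3: record expected carbon gain per site (Mg C) as an MRV baseline."""
    return {c.cell_id: c.predicted_carbon_gain * cell_area_ha for c in cells}


def restoration_pathway(cells, min_gain, max_slope_deg, cell_area_ha):
    """Run the three stages in order and return the MRV baseline table."""
    stage1 = priority_zones(cells, min_gain)
    stage2 = feasible_sites(stage1, max_slope_deg)
    return mrv_baseline(stage2, cell_area_ha)
```

Keeping the stages as separate functions means the intermediate outputs (priority zones, feasible sites) can be inspected or mapped on their own before anything is committed to an MRV register.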
At district or sub-regional levels, these maps can support afforestation budget allocation, payment-for-ecosystem-services schemes, and monitoring of restoration performance over time. By linking spectroscopy-derived environmental predictors with machine learning-based carbon estimation, the framework bridges scientific modeling with policy implementation mechanisms. Overall, the framework demonstrates the value of integrating technology, ecological understanding, and policy processes to support the large-scale implementation of nature-based solutions. Because the framework relies exclusively on open-access satellite data and computationally efficient ensemble models, it is readily scalable to other tropical regions with similar ecological constraints. District-scale carbon estimates derived from this approach can directly support land-use planning, afforestation prioritization, and reporting under Nationally Determined Contributions (NDCs). Furthermore, spatially explicit carbon estimates enhance transparency and credibility for participation in voluntary carbon markets and results-based restoration financing.
4.7. Overall Application to Climate Mitigation
Forests continue to stand out as one of the most scalable and readily deployable options for climate mitigation. The results of this study show that restoration guided by imaging spectroscopy can translate this broad potential into targeted, on-the-ground action with a high degree of precision. While the estimated carbon gain of 0.12 petagrams is specific to the study area, it highlights the scale of mitigation that could be achieved if similar approaches were applied consistently across degraded tropical landscapes worldwide. By combining machine learning with spectral information, restoration planning is anchored in ecological conditions rather than generalized assumptions. This ensures that interventions are directed toward locations where carbon gains are both substantial and long-lasting. In this way, restoration complements technological decarbonization efforts, functioning not only as a carbon sink but also as a means of reinforcing essential ecosystem services such as water regulation, soil conservation, and landscape stability. The conversion of spectral observations into spatially explicit ecological insight helps bridge the divide between global climate ambitions and local implementation. It demonstrates how advances in remote sensing can support mitigation strategies that are transparent, measurable, and adaptable over time. More broadly, the findings reinforce the idea that long-term climate resilience depends on the health and integrity of ecosystems. Data-driven restoration, grounded in spectroscopy and ecological modeling, provides a scientifically sound pathway for rebuilding that integrity. In this sense, the framework represents both a methodological step forward and a practical blueprint for a low-carbon future secured through nature-based solutions.
Although 10-fold cross-validation reduces variance in performance estimates and provides internal consistency, it does not explicitly account for spatial autocorrelation among neighboring pixels. In spatially structured environmental datasets, random partitioning may yield optimistic performance estimates because nearby observations share similar predictor characteristics. Consequently, reported metrics should be interpreted as cross-validated performance within the sampled dataset rather than fully independent generalization accuracy. Future extensions of this framework will incorporate spatial or block cross-validation strategies to obtain more conservative and spatially independent estimates of predictive performance for operational forest restoration planning.
5. Conclusions
This research provides new insights into how imaging spectroscopy-derived data, when combined with ensemble learning, can transform the way forest restoration and carbon sequestration potential are quantified across heterogeneous landscapes. By integrating high-resolution spectral and environmental data, the Random Forest framework achieved strong predictive skill, explaining 87% of the spatial variability in tree cover. The approach effectively captured complex biophysical gradients and ecological feedback that are often missed by conventional regression or neighborhood-based models.
The identification of nearly 104,800 hectares of restorable land in Visakhapatnam underscores the magnitude of untapped ecological potential within rapidly developing tropical landscapes. These areas, concentrated in Narsipatnam and Ananthagiri, exhibit the soil fertility, moisture balance, and topographic stability necessary to sustain long-term regeneration. The corresponding carbon gain—estimated at approximately 0.12 petagrams—illustrates the district’s capacity to make a meaningful contribution to subnational and national climate mitigation goals. More importantly, the restoration of these landscapes would yield cascading co-benefits, from biodiversity recovery and soil enrichment to enhanced hydrological stability and microclimatic regulation. The results reaffirm that spectroscopy-informed ensemble modeling offers a powerful and scalable means to guide evidence-based restoration planning. By combining spatial precision with ecological relevance, the framework bridges the gap between site-level field measurements and national carbon inventories, offering a pathway to integrate restoration science into policy and land management. Ultimately, this research highlights the central role of remote sensing and machine learning in advancing nature-based climate solutions. The ability to translate spectral information into actionable ecological insights marks a step change in restoration analytics, moving from descriptive assessments toward predictive, spatially resolved planning. Continued refinement through temporal monitoring and socio-ecological integration will further enhance its utility for adaptive forest management and sustainable development.
5.1. Limitations
Because model training and validation were conducted using randomly sampled pixels, some degree of optimistic bias in the reported performance metrics may arise from spatial autocorrelation among neighboring observations. Adjacent pixels often share similar environmental characteristics, which can artificially inflate predictive accuracy when random cross-validation is applied to spatial data. While ensemble models such as Random Forest partially mitigate overfitting through bootstrap aggregation, they do not eliminate spatial dependence effects.
5.2. Future Work
Future applications of this framework will incorporate spatial or block cross-validation strategies, in which training and validation data are separated by geographic distance or ecological zones. Such approaches will provide more conservative and spatially independent performance estimates, further strengthening inference for operational forest restoration planning.
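One way such a block strategy could be set up is sketched below: samples are grouped into square grid blocks by coordinate, and whole blocks are assigned to folds so that no block contributes to both training and validation. The function names, the degree-based block size, and the use of a regular grid are assumptions made for illustration; the text does not specify a blocking scheme, and ecological zones could replace the grid as the grouping key.

```python
import math
import random
from collections import defaultdict


def assign_spatial_blocks(coords, block_size_deg):
    """Map each (lon, lat) sample to the ID of the square grid block containing it."""
    return [
        (math.floor(lon / block_size_deg), math.floor(lat / block_size_deg))
        for lon, lat in coords
    ]


def spatial_block_folds(coords, block_size_deg, n_folds, seed=0):
    """Return (train_idx, test_idx) pairs in which no block is split across sets."""
    blocks = assign_spatial_blocks(coords, block_size_deg)

    # Group sample indices by the block they fall in.
    by_block = defaultdict(list)
    for idx, block in enumerate(blocks):
        by_block[block].append(idx)

    block_ids = sorted(by_block)
    if n_folds > len(block_ids):
        raise ValueError("n_folds cannot exceed the number of occupied blocks")

    # Shuffle blocks (not pixels) so that fold membership is decided per block.
    random.Random(seed).shuffle(block_ids)

    folds = []
    for fold in range(n_folds):
        test_blocks = set(block_ids[fold::n_folds])
        test_idx = sorted(i for b in test_blocks for i in by_block[b])
        train_idx = sorted(
            i for b in block_ids if b not in test_blocks for i in by_block[b]
        )
        folds.append((train_idx, test_idx))
    return folds
```

Each returned index pair can be passed to any model-fitting routine in place of a random split. Note that this sketch does not add a buffer between training and validation blocks, so samples on either side of a block edge can still be close neighbors; separating sets by a minimum geographic distance would require an extra exclusion step.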
5.3. Data Sources
All datasets used in this research are publicly available through their respective repositories: SoilGrids (https://soilgrids.org, accessed on 26 February 2026), WorldClim (https://worldclim.org, accessed on 26 February 2026), GMTED (https://topotools.cr.usgs.gov, accessed on 26 February 2026), GlobeLand30 (https://www.earthdata.nasa.gov/data/catalog/lpcloud-glance30-001, accessed on 26 February 2026), and Global Forest Watch (https://www.globalforestwatch.org, accessed on 26 February 2026). No human or animal subjects were involved, and no ethical approval was required.