Evaluating Disturbance Regime Stratification for Aboveground Biomass Estimation in a Heterogeneous Forest Landscape: Insights from the Atewa Landscape, Ghana

Adams, Lukman B.; Hayakawa, Yuichi S.

doi:10.3390/rs18050765


by Lukman B. Adams and Yuichi S. Hayakawa *

Faculty of Environmental Earth Science, Hokkaido University, Sapporo 060-0810, Hokkaido, Japan

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(5), 765; https://doi.org/10.3390/rs18050765

Submission received: 30 November 2025 / Revised: 16 February 2026 / Accepted: 28 February 2026 / Published: 3 March 2026

(This article belongs to the Section Forest Remote Sensing)


Highlights

What are the main findings?

Excessive heterogeneity or homogeneity in regimes may affect forest aboveground biomass modeling.
Human-mediated disturbance factors exhibited weaker heteroscedastic behavior with increasing distance and showed intermediate importance in aboveground biomass modeling.

What are the implications of the main findings?

A combination of heterogeneous and homogeneous regimes overcomes the challenge of increased noise or reduced variance, thereby improving modeling accuracy.
Human-mediated disturbance factors counter model biases introduced by other predictor variables.

Abstract

Optical and passive remote sensing-based estimation of aboveground biomass (AGB) using forest structural stratification has shown improvements over global models. This study investigated whether stratification by human-mediated disturbances improves prediction accuracy. Disturbance variables included proximity to mines, roads, and settlements, evaluated across three regimes: the full Atewa landscape (“FSR”), the Atewa Range Forest Reserve (“FR”), and the surrounding disturbed area (“SR”). Predictors for each regime were selected using recursive feature elimination with cross-validation, applied to random forest (RF) and support vector machine (SVM) algorithms. AGB was then estimated using local, global, and retuned global models, and the results were compared using the coefficient of determination (r²) and root mean square error (RMSE). The global RF model achieved the best performance (r² = 0.54; RMSE = 57.71 Mg/ha), likely because it captured structured heterogeneity across the combined regimes. The “SR” models, however, performed poorly, indicating that excessive unstructured heterogeneity introduces noise and redundancy that weaken predictions. The low performance of the “FR” models was attributed to spectral saturation and limited variance in observed AGB. Although the disturbance factors added minimal bias, heteroscedasticity was evident in the “SR” and “FSR” regimes. Overall, this study indicates that disturbance-based stratification does not necessarily improve AGB estimation accuracy compared with global models. However, the results highlight the value of disturbance information for AGB modeling in heterogeneous forest landscapes.

Keywords:

disturbance regime stratification; GEDI; multi-sensor data; Atewa; forest aboveground biomass; machine learning

1. Introduction

The importance of forests in sustaining terrestrial ecosystems cannot be overemphasized as they serve as key reservoirs of biodiversity, regulators of local and regional climatic systems, and major sinks for atmospheric carbon [1,2]. Through carbon sequestration, forests effectively mitigate greenhouse gas accumulation by storing carbon within their biomass and soils, thereby reducing atmospheric CO₂ concentrations and moderating the intensity of the global greenhouse effect [3]. To ensure the sustained contribution of forests to climate change mitigation and the enhancement of ecosystem services, it is imperative that forest conservation and management practices receive high priority [4]. This is where the role of forest aboveground biomass (AGB) estimation comes to the forefront as it provides a quantitative basis for assessing carbon sequestration and emission dynamics over time, as well as for verifying the reporting processes essential to carbon accounting frameworks [5].

The application of remote sensing models to forest AGB estimation has substantially reduced the labor, time, and high costs associated with direct field estimates. It has also largely eliminated the need for destructive sampling, that is, felling trees and harvesting parts such as branches and stems; direct estimation, although more accurate, is a less sustainable approach [6]. In recent years, combining remotely sensed data with field-observed data has produced highly accurate AGB estimates at various spatial resolutions, mostly using freely accessible satellite data from missions such as Sentinel-1 and Sentinel-2, NASA’s Global Ecosystem Dynamics Investigation (GEDI), the Landsat series, and the Japan Aerospace Exploration Agency’s (JAXA) Phased Array type L-band Synthetic Aperture Radar (PALSAR) products [7]. Despite these advantages, remotely sensed data introduce inherent uncertainties into AGB modeling that arise from sensor limitations, atmospheric effects, and data processing errors, all of which can reduce modeling accuracy. Examples include spectral saturation, limited spatial resolution, and cloud cover in optical data; terrain effects on the quality of SAR backscatter polarizations; and the high cost and limited availability of LiDAR data [8,9,10]. To mitigate the limitations of individual remote sensing datasets, data fusion techniques have been increasingly employed, enabling the synergistic integration of multi-sensor data to improve model robustness and reduce uncertainty in AGB estimation [11].

NASA’s GEDI, the first spaceborne LiDAR system designed specifically to measure vegetation structure, provides footprint-level 3D vegetation structure data across temperate, subtropical, and tropical forests. It was launched to the International Space Station (ISS) in December 2018 and began full operational data acquisition in April 2019. The high precision of LiDAR allows vertical tree structure to be measured accurately even through small canopy gaps, ultimately producing accurate estimates of aboveground biomass density (AGBD; the spatial expression of AGB per unit area) [12]. Since GEDI’s launch, several studies have estimated AGB using its forest structural data as well as its derived biomass estimates [13,14,15,16]. However, despite their generally good correlation with National Forest Inventory (NFI)-derived AGBD and other AGBD models, GEDI-predicted values still contain uncertainties and inconsistencies [17], which necessitates further ground-truth validation in more tropical forests to improve model performance.

As efforts to enhance the accuracy of AGB estimation intensify, there has been an increasing application of machine learning algorithms alongside traditional linear regression models [18,19]. This growing adoption is driven by the recognition that the relationships between forest biomass and predictor variables, such as canopy structure, spectral reflectance, and topographic attributes, are often non-linear and complex [20]. Machine learning techniques, including Random Forests, Support Vector Regression, and Gradient Boosting, have proven effective in capturing these intricate relationships by modeling higher-order interactions and reducing the biases inherent in linear approaches. Moreover, these algorithms facilitate the integration of multisource datasets, such as multispectral, radar, and LiDAR-derived variables, by jointly capturing complementary spectral, structural, and scattering information, thereby enhancing model robustness and transferability across diverse forest types [21].

To further enhance modeling accuracy, researchers have increasingly adopted stratified approaches to AGB estimation. In this method, field data categorized into specific strata, such as forest structural types, stand age, or species compositions, are integrated with ancillary remote sensing variables to develop stratum-specific models. This stratification approach improves the precision of population parameter estimates and reduces variability within modeling units [22]. The goal of stratification is to partition the landscape into relatively homogeneous strata with reduced internal heterogeneity, thereby enabling more accurate representation and modeling of spatial variations in AGB and associated structural parameters [23]. Studies have demonstrated that forest type-based stratification yields marginally higher accuracy in AGB estimation compared to unstratified global models, which has contributed to its increasing application in biomass modeling frameworks [24,25,26,27].

Several studies have documented forest degradation and subsequent deforestation driven by human activities [28,29,30]. Anthropogenic degradation directly alters forest structure and composition, thereby influencing the spatial distribution of aboveground biomass [31]. Because spectral, SAR, and LiDAR observations are strongly linked to canopy structure and photosynthetic functioning, they are widely used as proxies for AGB estimation [32,33]. However, approaches that rely exclusively on these structural and spectral indicators may underrepresent the human disturbance processes that shape forest structure and biomass patterns, particularly in landscapes where anthropogenic pressures are a dominant driver of forest dynamics. Most studies on stratum-based AGB estimation have relied on stratification by forest type or structure, under the assumption that ecological heterogeneity can be better captured within relatively homogeneous classes [24,25,26,27]. However, in highly dynamic tropical environments such as the Atewa landscape, human-induced disturbances, particularly mining, agriculture, logging, and settlement expansion, are major drivers of the spatial patterns and heterogeneity of forest vegetation [34,35]. Even where forest type is relatively homogeneous, it does not guarantee a stable and reliable proxy for forest structure in human-modified landscapes, because disturbances introduce strong spatial and temporal variability through canopy modification, regeneration, and complete vegetation loss [36,37,38]. In such contexts, incorporating disturbance factors may improve the characterization of spatial variability in AGB.
This study departs from conventional forest type stratification by adopting a disturbance-regime framework that explicitly integrates human-mediated features, such as proximity to mines, roads, and settlements, as a complementary basis for biomass modeling. In many tropical regions where reliable forest inventory information is limited, stratification by forest type is difficult to implement robustly, because it typically relies on detailed knowledge of stand structure, age classes, or species composition. Under these constraints, disturbance-based stratification represents a practical alternative for AGB estimation in human-modified forest landscapes, where anthropogenic processes are a dominant driver of spatial variability in forest structure. By assessing the relative performance of regime-specific and global models, this study tests whether stratification by disturbance conditions improves AGB predictions and offers insights into the challenges of biomass estimation in human-modified tropical forest mosaics.

2. Materials and Methods

2.1. Study Area

The Atewa landscape, situated within the High Forest Zone (HFZ) of southern Ghana, represents a complex and heterogeneous environment encompassing the Atewa Range Forest Reserve (ARFR) and a surrounding mosaic of land uses including secondary forests, croplands, plantations, settlements, and mining sites. This spatial heterogeneity gives rise to two distinct disturbance regimes: the ARFR, which remains relatively undisturbed and ecologically intact (referred to as the “FR” regime in this study), and the adjacent landscape, which is highly dynamic and subjected to intensive anthropogenic activities, particularly agriculture and artisanal mining (referred to as the “SR” regime; Figure 1). The study area was stratified into two disturbance regimes based on the official forest reserve boundary, representing protected forest conditions (FR) and surrounding human-modified landscapes (SR). This stratification captures a clear contrast in disturbance intensity, land use pressure, and vegetation structure relevant to AGB modeling.

The ARFR itself is an upland evergreen forest situated along the Atewa mountain range, the principal highland feature in the region, reaching elevations of approximately 842 m above sea level. Identified as a major biological hotspot in Ghana, the reserve provides critical habitat for more than 100 globally threatened species, underscoring its ecological significance and conservation value [39]. In the surrounding landscape, which supports a range of human activities, mining has been reported to expand annually, with a rate of change of 12.3% and a mined area of up to 8005.2 ha between 2018 and 2023 [40]. The study area has wet semi-equatorial climatic conditions, with average monthly temperatures ranging from 24 to 29 °C. The rainfall pattern is bimodal: the major season runs from March to July and the minor season from September to November. Mean annual rainfall ranges between 1200 and 1600 mm [41].

2.2. Data Pre-Processing

Multi-source remote sensing data were used in the study, where optical data from Landsat 8, topographic data from the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM), L-band SAR data from ALOS/PALSAR 2, and proximity data for mines, roads, and settlements were fused to model AGB. Forest biomass data were acquired from GEDI’s Level 4A (L4A) predicted AGBD data. A summary of the predictor variables used in the development of the models is presented in Table 1.

2.2.1. GEDI AGBD Data

The GEDI Level 4A data for the study area were downloaded from NASA’s website [42]. To ensure adequate spatial coverage and to provide each regime-specific model with a sufficient number of training samples, AGBD data spanning January to December 2022 were selected. The data were then filtered to retain only high-quality observations. Observations with quality flag values other than 1 were deemed invalid, as this flag identifies footprints with unreliable waveform retrievals and increased uncertainty in biomass estimates [43]. Observations with a standard error > 50% were also excluded to remove highly uncertain estimates that can disproportionately influence model error [44]. Lastly, estimates on slopes > 30° were excluded because they have been associated with high errors [45]. These thresholds were chosen to balance the reliability of biomass estimates against sufficient spatial coverage across disturbance regimes, although stricter filtering could further improve data quality. When multiple GEDI footprints intersected a 25 m pixel, footprint-level values were aggregated using the mean. After filtering, a total of 13,008 GEDI AGBD data points were retained for modeling, with 2611 belonging to the “FR” regime and 10,397 belonging to the “SR” regime. GEDI footprints are spatially discontinuous due to orbital sampling constraints, resulting in uneven spatial coverage across the study area (Figure 1). Figure 2 shows the distribution of GEDI AGBD data in the two regimes.
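The filtering and per-pixel aggregation steps above can be sketched in Python with pandas. This is a minimal illustration, not the authors' code: the columns agbd, agbd_se, and l4_quality_flag follow GEDI L4A variable names, whereas slope_deg, x, and y (projected coordinates in metres) are hypothetical columns assumed to have been attached beforehand, and the 50% threshold is interpreted here as a relative standard error (agbd_se / agbd), which is an assumption.

```python
import numpy as np
import pandas as pd


def filter_and_grid_gedi(df, pixel_size=25.0, max_rel_se=50.0, max_slope=30.0,
                         x0=0.0, y0=0.0):
    """Keep high-quality GEDI L4A footprints, then average them per 25 m pixel.

    Assumed columns: agbd, agbd_se, l4_quality_flag (GEDI L4A names) and the
    hypothetical slope_deg, x, y (projected metres) added beforehand.
    """
    rel_se = 100.0 * df["agbd_se"] / df["agbd"]      # relative standard error (%)
    keep = (
        (df["l4_quality_flag"] == 1)                 # reliable retrievals only
        & (rel_se <= max_rel_se)                     # drop SE > 50 %
        & (df["slope_deg"] <= max_slope)             # drop slopes > 30 degrees
    )
    good = df.loc[keep].copy()
    # Row/column of the pixel containing each footprint centre; (x0, y0) is a
    # hypothetical upper-left grid origin, with rows counted downward.
    good["col"] = np.floor((good["x"] - x0) / pixel_size).astype(int)
    good["row"] = np.floor((y0 - good["y"]) / pixel_size).astype(int)
    # Several footprints falling in one pixel are aggregated with the mean.
    return good.groupby(["row", "col"], as_index=False)["agbd"].mean()


# Toy example: two valid footprints share a pixel; three fail one filter each.
footprints = pd.DataFrame({
    "agbd": [200.0, 180.0, 150.0, 90.0, 120.0],
    "agbd_se": [20.0, 30.0, 100.0, 10.0, 12.0],
    "l4_quality_flag": [1, 1, 1, 0, 1],
    "slope_deg": [10.0, 12.0, 5.0, 8.0, 35.0],
    "x": [5.0, 20.0, 60.0, 80.0, 110.0],
    "y": [-5.0, -10.0, -5.0, -5.0, -5.0],
})
gridded = filter_and_grid_gedi(footprints)
```

In the toy data, the third footprint fails the standard-error filter, the fourth the quality flag, and the fifth the slope limit, leaving one pixel with the mean of 200 and 180 Mg/ha.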

2.2.2. Earth Observation Data

The optical satellite data were 30 m resolution images from the Landsat 8 Operational Land Imager (OLI). The pre-processed level 2A product, which had already undergone geometric, radiometric, and atmospheric corrections, was downloaded for the study area from the Google Earth Engine (GEE) platform. To obtain a cloud-free annual representation of the study area, a median composite of the available 2022 scenes was generated. The Normalized Difference Vegetation Index (NDVI), Soil Adjusted Vegetation Index (SAVI), and Enhanced Vegetation Index (EVI) were then calculated from the Landsat bands.
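The median compositing and index calculations can be sketched with NumPy. The sketch assumes that cloud-masked observations are already set to NaN and that band values are surface reflectance on a 0 to 1 scale; the formulas are the standard NDVI, SAVI, and EVI definitions, with the SAVI soil factor of 0.5 taken as an assumed default. For Landsat 8 OLI, the blue, red, and near-infrared inputs correspond to bands 2, 4, and 5.

```python
import numpy as np


def median_composite(stack):
    """Per-pixel median over a (scenes, rows, cols) stack; clouds are NaN."""
    return np.nanmedian(stack, axis=0)


def vegetation_indices(blue, red, nir, soil_factor=0.5):
    """Standard NDVI, SAVI and EVI from surface reflectance on a 0-1 scale."""
    ndvi = (nir - red) / (nir + red)
    savi = (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    return ndvi, savi, evi


# Toy 1 x 1 pixel observed in three scenes, one of them cloud-masked (NaN).
nir_stack = np.array([[[0.38]], [[np.nan]], [[0.42]]])
nir = median_composite(nir_stack)            # median of the two clear values
ndvi, savi, evi = vegetation_indices(np.array([[0.05]]), np.array([[0.10]]), nir)
```

Applying the median per band before computing indices, as here, is one reasonable ordering; computing indices per scene and then taking the median would give slightly different values.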

The ALOS/PALSAR 2 SAR data were freely acquired from the JAXA website [46]. These 25 m resolution L-band data had already been ortho-rectified, slope-corrected, and radiometrically calibrated, ensuring geometric and radiometric consistency. The annual mosaics of the HH and HV polarizations are provided at 25 m resolution as digital numbers. To reduce speckle noise, which arises from the coherent combination of backscattered signals from multiple scatterers [47], the refined Lee speckle filter with a 5 × 5 window was applied to the images. The backscatter coefficient was then calculated using the conversion provided by JAXA in Equation (1).

σ^{0} = 10 \log_{10} (DN^{2}) + CF

(1)

where σ⁰ = radar backscatter (dB), DN = pixel amplitude (digital number), and CF = calibration factor = −83.0 dB.
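The DN-to-backscatter conversion can be applied per pixel as below. JAXA's conversion adds the calibration factor to 10·log10(DN²), so with CF = −83.0 dB typical amplitudes map to negative dB values. The sketch treats DN = 0 as no-data (an assumption) and does not reproduce the refined Lee filter, which in the workflow described above is applied to the images beforehand.

```python
import numpy as np

CF_DB = -83.0  # JAXA calibration factor (dB) for the PALSAR-2 mosaics


def dn_to_sigma0_db(dn):
    """JAXA conversion: sigma0 = 10*log10(DN^2) + CF, returned in dB."""
    dn = np.asarray(dn, dtype=np.float64)
    sigma0 = np.full(dn.shape, np.nan)
    valid = dn > 0                       # DN = 0 treated as no-data (assumption)
    sigma0[valid] = 10.0 * np.log10(dn[valid] ** 2) + CF_DB
    return sigma0


hv_db = dn_to_sigma0_db([10000, 1000, 0])
```

For example, DN = 10000 gives 80 − 83 = −3 dB and DN = 1000 gives 60 − 83 = −23 dB.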

The SRTM DEM data at 30 m resolution were downloaded from the USGS website [48] and used to generate the elevation, slope, and aspect data for the study area. To account for the effects of terrain on soil erosion and hydrological processes, which are correlated with vegetation growth, the Topographic Wetness Index (TWI) and Topographic Position Index (TPI) were derived from the DEM [49,50]. The topographic data were particularly useful because the two regimes differ in topography: the “FR” regime is relatively rugged and elevated, whereas the “SR” regime is predominantly flat to undulating.
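Slope and TPI can be derived from the DEM grid roughly as follows. This is a simplified NumPy/SciPy sketch, not the tool chain used in the study: it assumes the DEM has been projected to a metric grid with 30 m cells, computes slope from central differences, and defines TPI as elevation minus the mean of the surrounding 3 × 3 window (the window size is an assumption). TWI is omitted because it requires a flow-accumulation step.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def slope_degrees(dem, cell_size=30.0):
    """Slope in degrees from central-difference gradients of a metric DEM."""
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


def topographic_position_index(dem, window=3):
    """Elevation minus the mean of the surrounding window (centre cell excluded)."""
    dem = dem.astype(np.float64)
    n = window * window
    window_sum = uniform_filter(dem, size=window, mode="nearest") * n
    neighbour_mean = (window_sum - dem) / (n - 1)
    return dem - neighbour_mean


# Toy plane rising 15 m per 30 m cell eastward: slope = arctan(0.5), about 26.57 degrees.
plane = np.tile(15.0 * np.arange(6), (5, 1))
slope = slope_degrees(plane)
tpi = topographic_position_index(plane)
```

On a plane, interior TPI values are zero; positive TPI marks ridges and negative TPI marks valleys.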

To improve the quality of the data fed into the machine learning algorithms, all non-forested pixels were masked using a land cover map of the study area from [40], which was produced with a supervised Geographic Object-Based Image Analysis (GEOBIA) classification. The land cover classes of this map were reclassified into binary forest and non-forest classes. The source study reported an overall classification accuracy of 89.44% for 2022, with producer’s and user’s accuracies exceeding 87% for vegetated classes. In addition, because the observed AGBD data (derived from GEDI LiDAR footprints) have an approximate spatial resolution of 25 m, all other raster data were resampled to 25 m to maintain spatial alignment with the AGBD measurements.

2.2.3. Human-Induced Disturbance Data

The disturbance regimes in this study were delineated based on the dominant human-driven factors in the study area, such as mining activities, human settlements, and roads. These factors were therefore included in the AGB models as proximity variables describing their distance from the forested parts of the area. To quantify proximity, vector datasets of these features were digitized using OpenStreetMap [51] as a reference. The vector datasets were then rasterized, and the distance of each pixel from the nearest anthropogenic feature was computed using the raster distance tool in QGIS (version 3.34) [52]. The resulting maps were resampled to 25 m for spatial alignment and consistency. Like the Earth observation data, the proximity maps provide continuous spatial predictors for estimating AGB across the landscape.
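The proximity rasters can be approximated with a Euclidean distance transform. The study used the QGIS raster distance tool; the SciPy function below is a stand-in, assuming the digitized features have already been rasterized as a boolean mask on the 25 m grid.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def distance_to_features(feature_mask, pixel_size=25.0):
    """Euclidean distance (m) from each pixel to the nearest feature pixel."""
    # distance_transform_edt measures the distance to the nearest zero element,
    # so feature pixels (mines, roads, settlements) are passed in as zeros.
    return distance_transform_edt(~feature_mask.astype(bool), sampling=pixel_size)


# Toy 5 x 5 grid with a single road pixel in the top-left corner.
roads = np.zeros((5, 5), dtype=bool)
roads[0, 0] = True
dist_to_roads = distance_to_features(roads)
```

Here the pixel three columns to the right of the road lies 75 m away, and the pixel at row 3, column 4 lies 5 × 25 = 125 m away.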

2.3. Machine Learning Algorithms

Machine learning (ML) algorithms were used instead of traditional regression in this study because they can handle the non-linear relationships between forest biomass, tree structure, and the environment [53]. Two ML algorithms were employed to model AGB in each regime: random forest (RF) and support vector machine (SVM). RF was chosen for its ability to handle large amounts of data while effectively modeling non-linear relationships in heterogeneous environments [54,55]. SVM was included because of its ability to handle small, high-dimensional datasets [56], as in this study, where regime-specific models such as the “FR” model use fewer samples than the other models. The models were developed using the scikit-learn Python package (version 1.8.0) [57].

2.3.1. Random Forest

Random forest is an ensemble of decision trees that can be applied to classification and regression tasks. In classification, RF grows many decision trees using the Classification and Regression Trees (CART) technique; each tree is trained on a bootstrap sample of the data, with a random subset of features considered at each split, and produces a class prediction, and the final prediction is the class with the most votes. RF regression uses the same procedure, except that the final prediction is the average of the outputs of the individual trees. Individual deep trees have low bias but high variance; bootstrap aggregation and random feature selection decorrelate the trees so that averaging reduces variance, ultimately lowering the risk of overfitting seen in single decision trees [58,59]. The hyperparameters tuned for the RF models were as follows: (1) n_estimators, the number of trees in the forest; (2) min_samples_split, the minimum number of samples required to split an internal node; (3) min_samples_leaf, the minimum number of samples that must be in a leaf node after splitting; (4) max_features, the number of features considered when searching for the best split at a node; and (5) max_depth, the maximum depth of each tree.
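The five tuned hyperparameters map directly onto arguments of scikit-learn's RandomForestRegressor. The sketch below uses synthetic data and illustrative hyperparameter values (not those reported in Table 2) and checks that, for regression, the forest prediction equals the mean of the individual tree predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for a 19-predictor training table (not the study's data).
X, y = make_regression(n_samples=300, n_features=19, n_informative=6,
                       noise=10.0, random_state=0)

rf = RandomForestRegressor(
    n_estimators=200,        # number of trees in the forest
    min_samples_split=4,     # minimum samples needed to split an internal node
    min_samples_leaf=2,      # minimum samples required in each leaf
    max_features="sqrt",     # features considered when searching for a split
    max_depth=None,          # grow trees until the leaf constraints stop them
    bootstrap=True,          # each tree is trained on a bootstrap sample of rows
    random_state=0,
).fit(X, y)

# Regression forest output = average of the individual tree outputs.
tree_mean = np.mean([tree.predict(X[:5]) for tree in rf.estimators_], axis=0)
forest_pred = rf.predict(X[:5])
```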

2.3.2. Support Vector Machine

Support vector machine is a non-parametric statistical learning technique for classification and regression [60]. In classification, it finds the hyperplane that best separates the data into distinct classes in a high-dimensional space; in regression (support vector regression, SVR), it fits a function that deviates from the observed targets by no more than a tolerance margin while remaining as flat as possible. The SVM algorithm has been widely applied in remote sensing studies because of its ability to handle high-dimensional data [61]. Its kernel function makes it versatile in handling both linear and non-linear relationships: when the data are not linearly separable in the original space, the kernel implicitly maps them into a higher-dimensional space in which an optimal decision boundary can be determined. The hyperparameters used to tune the SVM models were as follows: (1) kernel, which defines the transformation of the data into a higher-dimensional feature space; (2) gamma, which defines the influence of a single training instance on the decision boundary; (3) the regularization parameter C, which balances maximization of the margin against minimization of errors on the training data; and (4) epsilon, which defines a margin of tolerance around the fitted hyperplane within which errors are not penalized. Table 2 shows the hyperparameter values used for the RF and SVM models in the different regimes.
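The four SVM hyperparameters correspond to arguments of scikit-learn's SVR. In this sketch, the feature standardization step and all parameter values are assumptions chosen for illustration; the study's values are those listed in Table 2. Training points that fall strictly inside the epsilon tube do not become support vectors, which is why epsilon also controls how sparse the fitted model is.

```python
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic predictors and targets standing in for one regime's training set.
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=1)

svr = make_pipeline(
    StandardScaler(),        # assumption: put predictors on comparable scales
    SVR(
        kernel="rbf",        # implicit mapping to a higher-dimensional space
        gamma="scale",       # reach of a single training sample
        C=100.0,             # penalty on errors beyond the tolerance margin
        epsilon=0.5,         # half-width of the error-insensitive tube
    ),
).fit(X, y)

# Only samples on or outside the epsilon tube become support vectors.
n_support = len(svr[-1].support_)
predictions = svr.predict(X)
```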

2.4. Model Building and Evaluation

RF and SVM models were developed for each of the two disturbance regimes considered in this study. The regime-specific models were compared with two global models developed from the combined “FR” and “SR” regimes: the global (“FSR”) model and the global model with retuned hyperparameters (“FSR_r” model). Rather than manually selecting features based on the prevailing conditions in a regime, such as mining, roads, settlements, or topography, the recursive feature elimination with cross-validation (RFECV) method in scikit-learn was used to select the features that produce the most accurate model. RFECV is a wrapper feature selection method based on recursive feature elimination (RFE), which identifies strongly relevant, relevant, redundant, or irrelevant features and retains the strongly relevant and relevant ones to improve model accuracy [5]. In RFECV, RFE is combined with cross-validation in a backward elimination process that begins with the complete feature set and progressively eliminates the least important features until reaching the subset that produces the highest accuracy [62]. With RF, RFECV uses the algorithm’s built-in feature_importances_ attribute to rank features during elimination. For SVM, RFECV was first combined with a linear-kernel SVM model. As with RF, RFECV uses the linear SVM model’s coefficient values (coef_ attribute) to rank features by importance during elimination [63], with the ranking based on the absolute values of the coefficients. After the most important features were identified through the linear SVM and RFECV, the radial basis function (rbf) kernel SVM was used to estimate forest AGB from the selected features because of its ability to capture non-linear relationships.
No additional feature elimination was performed when transferring the global SVM predictors to the local regimes, because the global model was applied without retraining. As a result, regime-specific coefficient-based rankings could not be derived, particularly since the final SVM models used the rbf kernel, which does not provide interpretable feature importance. This is a limitation, because transferring the global SVM model to the local regimes introduces a bias into the comparison of variable rankings. In contrast, within the RF framework, feature importance can still be derived after model transfer. For each model, the dataset was split into 80% for training and 20% for testing. The best-performing model(s) were used for wall-to-wall AGB mapping of the landscape.
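The two feature selection routes described above can be sketched with scikit-learn's RFECV on synthetic data. RFECV ranks features with the RF's feature_importances_ and with the coefficient magnitudes of a linear-kernel SVR, after which an RBF-kernel SVR is fitted on the retained columns only. The 80/20 split follows the text; the scaling step, the number of trees, and the C values are illustrative assumptions rather than the study's configuration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in: 12 candidate predictors, 4 of them informative.
X, y = make_regression(n_samples=300, n_features=12, n_informative=4,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)        # 80 % training, 20 % testing

# RF route: RFECV ranks features with the forest's feature_importances_.
rf_selector = RFECV(RandomForestRegressor(n_estimators=50, random_state=0),
                    step=1, cv=5, scoring="r2").fit(X_train, y_train)

# SVM route: rank by coefficient magnitude of a linear-kernel SVR ...
scaler = StandardScaler().fit(X_train)
Xs_train, Xs_test = scaler.transform(X_train), scaler.transform(X_test)
svm_selector = RFECV(SVR(kernel="linear", C=10.0), step=1, cv=5,
                     scoring="r2").fit(Xs_train, y_train)

# ... then fit the final RBF-kernel SVR on the retained columns only.
keep = svm_selector.support_
selected_columns = np.flatnonzero(keep)
rbf_svr = SVR(kernel="rbf", C=100.0).fit(Xs_train[:, keep], y_train)
test_r2 = rbf_svr.score(Xs_test[:, keep], y_test)
```

Because the final RBF model exposes no coefficients, the ranking in ranking_ comes entirely from the linear-kernel selection step.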

The optimal hyperparameter configuration for each model was determined through a randomized search (RandomizedSearchCV) with five-fold cross-validation (cv = 5) and ten random iterations (n_iter = 10). Randomized search was used to reduce computational time relative to an exhaustive search, while cross-validation reduced the risk of overfitting to the training sample.
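A randomized search with cv = 5 and n_iter = 10 can be written with scikit-learn's RandomizedSearchCV. The search space below is hypothetical and only illustrates the mechanism: each of the ten sampled configurations is scored by five-fold cross-validated r², and the best configuration is then refitted on all of the training data.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Hypothetical search space; the study's chosen values are reported in Table 2.
param_distributions = {
    "n_estimators": [50, 100],
    "min_samples_split": randint(2, 11),
    "min_samples_leaf": randint(1, 5),
    "max_features": ["sqrt", 0.5, 1.0],
    "max_depth": [None, 10, 20],
}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,        # ten randomly sampled configurations
    cv=5,             # five-fold cross-validation per configuration
    scoring="r2",
    random_state=0,
).fit(X, y)

best_params = search.best_params_
```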

Two stages of model evaluation were employed to ensure a robust performance assessment. The models were first evaluated using 5-fold cross-validation, with the training data randomly partitioned into k = 5 equal subsets. In each iteration, one of the k subsets is held out for validation while the remaining k−1 subsets are used for training; the process is repeated until every subset has been used once for validation [64]. Finally, the optimized models were evaluated on the independent test subset to assess their ability to generalize to unseen data. Figure 3 summarizes the model building process. The performances of the models were compared based on the mean coefficient of determination (r²) from cross-validation and the root mean square error (RMSE), computed using Equations (2) and (3).

r^{2} = 1 - \frac{\sum_{i = 1}^{n} {(B_{m} - B_{0})}^{2}}{\sum_{i = 1}^{n} {(B_{0} - {\bar{B}}_{0})}^{2}}

(2)

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} {(B_{m} - B_{0})}^{2}}{n}}

(3)

where B_{m} = modeled AGB, B_{0} = observed AGB, {\bar{B}}_{0} = mean observed AGB, and n = number of samples.
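The evaluation metrics and the five-fold cross-validation can be checked with a short NumPy and scikit-learn sketch. Equation (2) is the standard coefficient of determination, whose denominator is the total sum of squares of the observed values about their mean, and Equation (3) is the usual RMSE. The data and model settings below are synthetic placeholders.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def r_squared(observed, modeled):
    """Coefficient of determination: 1 - residual SS / total SS of observations."""
    observed = np.asarray(observed, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    ss_res = np.sum((modeled - observed) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)   # spread of observed AGB
    return 1.0 - ss_res / ss_tot


def rmse(observed, modeled):
    """Root mean square error between observed and modeled values."""
    observed = np.asarray(observed, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    return float(np.sqrt(np.mean((modeled - observed) ** 2)))


# Stage 1: five-fold cross-validation on (synthetic) training data.
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(RandomForestRegressor(n_estimators=50, random_state=0),
                            X, y, cv=kfold, scoring="r2")
mean_cv_r2 = cv_scores.mean()
```

Predicting the observed mean everywhere yields r² = 0, which is the baseline the metric compares against.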

3. Results

3.1. Feature Selection

Since the development of the regime-specific models was grounded on differences in the human-mediated disturbances that affect the distribution of forest AGB, it was essential to identify the most important predictors in each regime. Rather than manually selecting these regime-specific variables based on assumptions about their role in AGB distribution, feature selection using RFECV was employed. This also saved computational time and reduced the risk of overfitting. The feature selection results for the regime-specific and global models are shown in Figure 4a–c. Among the RF models in the “FR” regime, slope was most frequently ranked as the most important variable, except in the local model, where it ranked second after Landsat band 6. It was followed by other important predictors such as Landsat band 5, EVI, proximity to settlements, and elevation, with the relative ordering varying across models. The ALOS/PALSAR 2 HV polarization consistently ranked among the least important predictors in both the “FR” regime-specific model and the global models. For the SVM model in the “FR” regime, Landsat band 3, the lowest-ranked predictor in the FSR model for the “FR” regime, ranked highest, followed by slope and Landsat band 6. Some variables of low importance in the RF models, such as HV, aspect, and TWI, were eliminated. In the “SR” regime, slope, elevation, and Landsat bands 6 and 7 dominated the most important variables across all RF models, whereas the vegetation indices were mostly ranked low, in contrast to the RF models in the “FR” regime. Among the human factors, proximity to roads ranked immediately after these top variables in the RF models. The SVM model for the “SR” regime ranked Landsat band 4 as the most important variable, followed by NDVI and Landsat bands 6, 3, and 7.
The topographic features were ranked intermediate to low in importance, while the ALOS/PALSAR 2 data were among the least important. The five most important variables for the global (landscape) RF model were, in decreasing order: elevation, Landsat band 6, slope, Landsat band 7, and proximity to roads. The SVM model for the landscape ranked NDVI and Landsat bands 4 and 6 highest, and TWI and Landsat bands 2 and 5 lowest. The ALOS/PALSAR 2 data were eliminated.

The feature rankings show that slope was a consistently important variable in both the “FR” and “SR” regimes, except in the “SR” and global SVM models. In the “FR” regime, it showed high and stable importance across the RF models, with relative importance values ranging from 0.040 to 0.064. It ranked as the most important predictor in all RF models except the local RF, where it ranked second behind Landsat band 6. A similar pattern was observed in the FR-SVM model, where it ranked second (14.1), closely following Landsat band 7 (14.3). In the “SR” regime, it was consistently the top-ranked predictor across all RF models, with relative importance values ranging from 0.16 to 0.25, in each case one or two places above Landsat band 5 or 6 (Figure 4). Across the landscape as a whole, the performance of the RF models was driven mostly by topographic features (elevation and slope), whereas the predictive strength of the SVM came mainly from the vegetation-related NDVI. The disturbance metrics were mostly ranked intermediate to low in their contribution to AGB prediction. In addition, unlike the SVM, the RF models retained all 19 predictor variables in every model to achieve the best results.

3.2. AGB Model Comparisons

A summary of the performances of the regime-specific and global models is presented in Table 3. The RF models generally outperformed the SVM models in every regime scenario. Across all models, predictive performance generally declined with decreasing regime area. In the landscape, the RF model achieved the highest overall r² of 0.54, with an RMSE of 57.71 Mg/ha. The corresponding SVM model was the most accurate SVM model in all regimes, with an r² of 0.46 and an RMSE of 62.91 Mg/ha. All the RF models in the “SR” regime performed similarly, while the local (SR) and retuned global SVM models produced identical r² values of 0.23. The untuned global model in the “SR” regime produced the lowest r² of 0.21. Models in the “FR” regime performed worst overall. Although the best “FR” regime model was the retuned global model, it showed only a marginal RMSE improvement over the local model. Among the SVM models in the “FR” regime, the retuned global model was also the best, achieving an r² of 0.14. Generally, the global (FSR) RF and SVM models performed best across the regime scenarios. The FSR RF model performed best overall and was used for the wall-to-wall AGB estimation of the Atewa landscape. The retuned global (FSR_r) models achieved a slight improvement in accuracy over the global models when applied to the local regimes, as shown by slightly lower RMSE values, except for the “SR” SVM models. The consistently superior performance of the RF models relative to SVM across all regimes highlights the robustness of the algorithm in capturing the complex, non-linear relationships characteristic of highly heterogeneous landscapes [54].

3.3. Influence of Human-Induced Disturbances on Model Prediction Errors

Aside from the importance of the disturbance variables as predictors of AGB (Figure 4), it is also important to understand their role in error propagation during prediction. To assess heteroscedasticity, we plotted the residuals between predicted and observed AGB against the human-induced disturbance variables. The heteroscedasticity analysis was limited to disturbance-related factors because error behavior associated with spectral, structural, and topographic predictors in AGB modeling has been widely reported [65,66,67], whereas the contribution of human-mediated disturbance variables to error propagation remains less explored. Although the best-performing AGB models exhibited heteroscedasticity, with residual spread increasing at higher predicted AGB values (Figure 5), the residuals plotted against distance to disturbance factors in the landscape and “SR” regimes showed a different pattern: at short distances, residuals were, on average, densely clustered but widely spread, and this spread attenuated with increasing distance (Figure 6). This trend was more pronounced in the “SR” regime than in the “FSR” regime. The residuals plotted against distance to settlements, however, showed on average a relatively uniform clustering, with small residuals at all distances in both the “SR” and “FSR” regimes. In short, in the “FSR” and “SR” regimes, the spread of residuals associated with human-induced disturbance factors decreased on average with increasing distance, despite the presence of larger residuals at short distances. In the “FR” regime, by contrast, residuals showed a persistently wide spread with relatively large magnitudes at all distances to disturbance factors, indicating consistently high prediction uncertainty rather than distance-dependent attenuation.
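One simple way to put a number on the distance-dependent attenuation of residual spread seen in the residual plots is to bin samples by distance and compute the residual standard deviation in each bin. This is an illustrative complement to visual inspection, not the procedure used in the study; the bin count, the equal-count (quantile) binning, and the synthetic distance and residual data are assumptions.

```python
import numpy as np

def residual_spread_by_distance(distance, residual, n_bins=5):
    """Standard deviation of residuals within equal-count distance bins.

    A spread that shrinks from the nearest to the farthest bin matches the
    distance-dependent attenuation described for the "SR" and "FSR" regimes.
    """
    distance = np.asarray(distance, dtype=float)
    residual = np.asarray(residual, dtype=float)
    # Quantile edges give each bin roughly the same number of samples
    edges = np.quantile(distance, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, distance, side="right") - 1, 0, n_bins - 1)
    centers = np.array([distance[idx == b].mean() for b in range(n_bins)])
    spread = np.array([residual[idx == b].std(ddof=1) for b in range(n_bins)])
    return centers, spread

# Synthetic example (hypothetical, not the study's data): residual noise that
# decays with distance (m) to the nearest disturbance feature
rng = np.random.default_rng(2)
dist = rng.uniform(0, 5000, size=1000)
resid = rng.normal(scale=80 * np.exp(-dist / 1500) + 15)
centers, spread = residual_spread_by_distance(dist, resid)
for c, s in zip(centers, spread):
    print(f"~{c:6.0f} m: residual SD = {s:5.1f} Mg/ha")
```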

3.4. AGB Spatial Distribution

The landscape RF model was used to map wall-to-wall AGB across the study area (Figure 7) because it achieved the best performance. In the modeled AGB, the “FR” regime ranged from 22.30 Mg/ha to 342.27 Mg/ha, with a mean of 185.70 Mg/ha. Visual inspection shows that the fringes of the forest reserve (“FR” regime) mostly recorded the lowest AGB values. In the surrounding areas (“SR” regime), estimated AGB ranged from 13.24 Mg/ha to 322.32 Mg/ha, with a mean of 59.48 Mg/ha.
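Wall-to-wall mapping amounts to applying the fitted per-pixel model to every pixel of the predictor stack and then summarizing the predictions within each regime mask. The sketch below assumes the predictors are already co-registered in a NumPy array of shape (bands, rows, cols); the helper names `predict_raster` and `regime_summary`, the grid size, the no-data handling, and the “FR” mask are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def predict_raster(model, stack):
    """Apply a fitted per-pixel regressor to a (bands, rows, cols) predictor stack."""
    bands, rows, cols = stack.shape
    pixels = stack.reshape(bands, -1).T          # -> (rows*cols, bands), row-major order
    valid = np.all(np.isfinite(pixels), axis=1)  # skip pixels with any no-data band
    out = np.full(rows * cols, np.nan)
    out[valid] = model.predict(pixels[valid])
    return out.reshape(rows, cols)

def regime_summary(agb, mask):
    """Min, max, and mean of predicted AGB within a boolean regime mask."""
    vals = agb[mask & np.isfinite(agb)]
    return float(vals.min()), float(vals.max()), float(vals.mean())

# Hypothetical demo: 3 predictor bands on a 50 x 60 grid and random training samples
rng = np.random.default_rng(3)
stack = rng.normal(size=(3, 50, 60))
stack[:, 0, 0] = np.nan                          # one no-data pixel
X_train = rng.normal(size=(200, 3))
y_train = 120 + 50 * X_train[:, 0] + rng.normal(scale=10, size=200)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

agb = predict_raster(model, stack)
fr_mask = np.zeros((50, 60), dtype=bool)
fr_mask[:, :30] = True                           # hypothetical "FR" extent
print("FR min/max/mean:", regime_summary(agb, fr_mask))
print("SR min/max/mean:", regime_summary(agb, ~fr_mask))
```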

4. Discussion

4.1. The Effect of Regime Stratification on Predicting AGB

Most published studies that applied stratification to remote sensing-based estimation of forest AGB achieved only marginal accuracy gains over global models [24,25,26,27]. This was mostly attributed to the homogeneous grouping of forest types based mainly on structural attributes and species [26]. In contrast to these studies, this study stratified the landscape by human-induced disturbances. Because strata were defined by prevailing disturbances, Cochran’s ideal stratification, in which each class occupies a unique stratum, does not apply here [68], and the focus shifts instead to the role of human pressures in estimating forest AGB. The poor model performances in the local regimes, “FR” and “SR”, could be attributed to the regime stratification itself, which introduced pronounced homogeneity and heterogeneity, respectively, and may have weakened the generalization ability of the machine learning algorithms. The “FR” stratum comprises dense, structurally complex vegetation, which often leads to spectral and backscatter saturation, where reflectance and backscatter signals cease to increase proportionally with higher AGB. This saturation effect limits the sensitivity of optical and SAR sensors to biomass variability within high-biomass forest areas, ultimately degrading prediction performance [69,70]. In addition, uneven representation of target values in training data has been shown to degrade model performance by increasing uncertainty and limiting the variance explained [71]. Accordingly, the imbalanced AGB distribution in the “FR” regime, with fewer extreme high-biomass samples (Figure 2), likely contributed to the reduced r². A high level of internal variability was observed in the “SR” regime.
With its mix of vegetation (croplands, plantations, secondary forests, and regenerating vegetation patches) and human-induced disturbances such as active and abandoned mines, roads, and settlements, the “SR” regime would be expected to allow the machine learning model to explain more variance [24]. However, the unstructured, mosaicked nature of the predictor variables in the “SR” regime (excessive heterogeneity) might have impaired the model’s ability to capture meaningful relationships, resulting in poor predictions. This is evident from the correlation matrices in Figure 8, where the correlation between AGB and most of the predictors is weaker in the “SR” regime than in the landscape as a whole. Thus, the excessively complex data from the “SR” regime likely introduced considerable noise and redundancy into the dataset, reducing the model’s predictive performance [72]. Creating substrata within the “SR” regime to capture the different vegetation types could resolve this excessive heterogeneity, but at the cost of requiring many more training samples. In contrast to the poor predictions in the “FR” and “SR” regimes, the landscape models yielded the best performance for both RF and SVM. This could be attributed to the broader range of explainable relationships between predictors and AGB in the landscape. Specifically, in the landscape, areas of high AGBD are concentrated in the elevated and less disturbed portion (FR), whereas the surrounding regions (SR) exhibit a highly heterogeneous vegetation mosaic shaped by diverse anthropogenic activities. Considered in isolation, the regimes (strata) present either limited variability or excessive, unstructured heterogeneity.
However, when combined, they provide distinctive patterns of contrasting biophysical and disturbance gradients that the machine learning algorithm can discern to produce accurate predictions [73]; hence the term “structured heterogeneity” for the landscape. Although it explains only 54% of the variability in AGB, the best-performing global RF model is in line with the accuracies reported by several studies that estimate AGB in tropical regions, with r² values falling between 0.30 and 0.95 [74]. Although machine learning algorithms have widely enhanced AGB prediction accuracy [18,19,75], especially under multi-sensor data fusion, model performance may still be limited by data quality and size, as well as by sensor saturation effects [65,69]. In such cases, individual algorithms can exhibit reduced accuracy, whereas ensembles often improve robustness by compensating for algorithm- and sensor-specific weaknesses [76,77]. Notably, higher r² values are reported in studies that integrate active remote sensing techniques such as SAR and LiDAR. In particular, LiDAR-derived structural metrics are known to reduce the spectral and backscatter saturation that often limits optical- and SAR-based biomass estimation [20]. The absence of LiDAR data in the present study reflects both geographic unavailability and acquisition cost constraints. Incorporating such datasets in future work would likely improve model sensitivity to canopy structure and enhance predictive performance.
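The link between limited variance in observed AGB and a low r², invoked above for the “FR” regime, follows from the definition r² = 1 − SS_res/SS_tot, which equals 1 − MSE/Var(y) when Var(y) is the population variance of the observations. The sketch below uses synthetic values only: it holds the prediction errors fixed and shows that r² drops when observed AGB spans a narrower range. It is an arithmetic illustration, not a re-analysis of the study's data.

```python
import numpy as np

def r2(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_obs, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

rng = np.random.default_rng(4)
errors = rng.normal(0, 50, size=500)    # identical prediction errors in both cases
wide = rng.normal(150, 90, size=500)    # hypothetical broad AGB range (Mg/ha)
narrow = rng.normal(250, 60, size=500)  # hypothetical compressed, high-biomass range

for label, y in (("wide", wide), ("narrow", narrow)):
    pred = y + errors
    # Same RMSE, but r2 = 1 - MSE / Var(y) falls as the observed variance shrinks
    print(f"{label}: RMSE = {rmse(y, pred):.1f}, r2 = {r2(y, pred):.2f}")
```

Because RMSE is an absolute measure and r² is relative to the observed variance, the two metrics should be read together when comparing regimes with different AGB ranges.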

Overall, the results suggest that although optical and active (SAR) sensors capture disturbance effects, their responses are not specific to disturbance regimes. Thus, disturbance regime-based stratification may not benefit from multi-source data fusion. Instead, multi-source predictors within a global modeling framework allow these complementary sensor responses to be exploited more effectively than when models are confined to disturbance-based strata, particularly in heterogeneous forest landscapes.

4.2. Influence of Human-Induced Disturbances on Predicting AGB

Although the regime-specific models performed poorly, regime-specific characteristics played key roles in explaining the variations in forest AGB. For instance, differences between the “FR” and “SR” regimes were strongly reflected in spectral predictor importance: the dominance of dense, structurally complex vegetation in the “FR” regime likely enhanced the sensitivity of Landsat band 5 (NIR) and vegetation indices, which are particularly responsive to canopy structure and photosynthetic activity. This sensitivity reflects their strong coupling with biomass-related structural attributes, even in high-biomass tropical forests where saturation effects may limit biomass discrimination [32]. As a result, these spectral variables continue to play a dominant role in explaining AGB variability under closed-canopy conditions. In the “SR” regime, the disturbance features, specifically roads, though not the most important, consistently ranked after the topographic features and some spectral bands in the RF models. This suggests that because anthropogenic features exert an indirect influence on AGB distribution, their predictive strength is weaker than that of spectral reflectance variables, which more directly capture vegetation structure, density, and physiological condition. That is, disturbance features shape biomass patterns through mechanisms such as forest fragmentation, access-related degradation, and land use pressure, rather than through direct measurement of vegetation structure [31]. Disturbance variables therefore contribute complementary contextual information that helps explain spatial variability in AGB, even though their predictive strength remains weaker than that of the spectral variables.
It is, however, important to note that the feature importance rankings produced by the SVM model are based on the variable coefficients of a linear kernel, which was intentionally used within the RFECV framework to enable stable and interpretable feature ranking. In this context, variable importance reflects only linear relationships between predictors and AGB. Consequently, predictors that influence AGB through non-linear responses, such as the saturation of spectral reflectance and backscatter at high biomass, are likely to be underrepresented in the linear SVM importance rankings. In this study, however, the influence of such non-linearities appears limited, as key SAR variables (HH and HV) and vegetation indices exhibited broadly consistent importance rankings between the RF and SVM models (Figure 4). This consistency suggests that although saturation effects are widely reported in AGB estimation [20], their impact on feature importance was not dominant in this dataset. Nevertheless, non-linear signal-biomass interactions cannot be ruled out, particularly in structurally complex, high-biomass forest conditions, which justifies the subsequent use of a non-linear SVM with a radial basis function (RBF) kernel for AGB prediction following feature selection. The plots of residuals against the human-induced disturbances indicate that including the disturbance factors, especially in the “SR” regime and in the landscape in general, did not contribute noticeably to systematic prediction bias at high AGB values. That is, a major source of prediction bias in the disturbed regimes (SR and FSR) may be the spectral, structural, and topographic data, given the overall heteroscedastic nature of the AGB residuals. For instance, while some relief variability occurs in the “SR” regime, the “FR” regime constitutes the most rugged highland area of the study region.
In such terrain, SAR imaging geometry effects can reduce the sensitivity of backscatter to biomass [78], leading to bias propagation in the predicted AGB and contributing to the lower importance of SAR predictors. In addition, areas near mines, roads, and settlements are more likely to experience canopy removal or regeneration, which may affect the spatiotemporal distribution of AGB. Without these contextual variables, the model relies only on spectral and topographic signals, which must explain both natural forest structure and human damage at the same time, potentially increasing prediction uncertainty. By adding disturbance metrics, the model can distinguish intact forest from disturbance-influenced vegetation, which reduces unexplained variability and stabilizes prediction errors across the landscape and in areas with high AGB. In the “FR” regime, the plots of residuals against both predicted AGB and the human-induced disturbance variables showed similar patterns, characterized by widely dispersed residuals. This can be attributed to the dominance of dense tropical forest cover within the regime, where anthropogenic disturbances are few and spatially limited, resulting in minimal structural and spectral variation in vegetation across different distances from disturbance sources. Consequently, the residual distributions relative to disturbance features resemble those observed against predicted AGB, indicating that disturbance-related variables exert limited influence on reducing model bias within the “FR” regime. Contrary to results from studies that applied forest type stratification to increase homogeneity and thereby improve model performance [24,25,26,27], stratification by disturbance regime, as applied in this study, suggests the need for structured heterogeneity in complex landscapes to increase the generalizability of prediction models.
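The two-stage SVM design discussed above, a linear kernel inside RFECV for ranking followed by an RBF kernel for prediction, can be sketched with scikit-learn as follows. The synthetic data, the saturating tanh term standing in for a non-linear signal-biomass response, and the C and gamma values are assumptions for illustration, not the study's settings. Because RFECV ranks features by the magnitude of the linear SVR's `coef_`, the predictors are standardized first so that coefficients are comparable.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(5)
X = rng.normal(size=(250, 8))
# Hypothetical response: one linear predictor and one saturating (non-linear) predictor
y = 100 + 30 * X[:, 0] + 40 * np.tanh(2 * X[:, 1]) + rng.normal(scale=10, size=250)

# Standardize so that linear-SVR coefficient magnitudes are comparable across predictors
Xs = StandardScaler().fit_transform(X)

# Step 1: rank and select features with a linear-kernel SVR (RFECV uses |coef_|)
selector = RFECV(
    SVR(kernel="linear", C=10.0),
    step=1,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
selector.fit(Xs, y)
chosen = np.flatnonzero(selector.support_)

# Step 2: predict with a non-linear RBF-kernel SVR using only the selected features
rbf_model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, gamma="scale"))
rbf_model.fit(X[:, chosen], y)
print("selected feature indices:", chosen)
```

The saturating predictor is still retained here because its linear trend is strong, which mirrors the point above: linear rankings can under-weight, but need not discard, non-linear predictors.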

5. Conclusions

This study departed from the conventional forest type stratification, based on structural characteristics, age, and species, that is commonly applied in AGB estimation. We assessed the relevance of including human-mediated disturbance in estimating AGB by stratifying the landscape according to disturbance regimes. By combining disturbance-specific predictors, such as distance to mines, roads, and settlements, with conventional spectral, structural, and topographic variables, the research yielded results that depart from those prevailing in the literature. While forest type stratification often enhances model accuracy, the global (RF and SVM) models in this study outperformed the regime-specific models. The global RF model, the best-performing model overall, produced an r² of 0.54 with an RMSE of 57.71 Mg/ha, followed by the global SVM model (r² = 0.46, RMSE = 62.91 Mg/ha). None of the local models exceeded an r² of 0.26. This highlights the limitations of regime-specific modeling in highly heterogeneous, human-impacted landscapes and underscores the potential of global models to generalize better across mixed-disturbance environments. This study also shows that excessive homogeneity or heterogeneity, such as that found in the local regimes, may degrade model performance through reduced variance or excessive noise. Model residuals showed a reduced spread with increasing distance from disturbance factors, even though the overall AGB models remained heteroscedastic. While disturbance factors may not relate as directly to AGB as satellite-derived structural metrics, they provide essential contextual information that captures variability associated with human-mediated disturbances, variability that is not fully represented by spectral, topographic, or structural predictors alone.
Thus, in highly disturbed and heterogeneous landscapes, disturbance-related predictors, although not the most influential, can still play an important role in improving model behavior. This study therefore adds another perspective to AGB estimation research by demonstrating that stratification by disturbance factors is not universally beneficial, whereas disturbance-aware modeling may offer a more robust framework for monitoring and managing biomass in disturbed tropical forests. It also offers a methodological contribution to AGB estimation in disturbed landscapes, where human factors play a crucial role in the spatiotemporal distribution of AGB. The results further highlight the relevance of managing human-mediated disturbance factors in forested landscapes.

Author Contributions

Conceptualization, L.B.A.; methodology, L.B.A.; software, L.B.A.; validation, Y.S.H.; formal analysis, L.B.A.; investigation, L.B.A.; resources, Y.S.H.; data curation, L.B.A.; writing—original draft preparation, L.B.A.; writing—review and editing, Y.S.H.; visualization, L.B.A.; supervision, Y.S.H.; project administration, Y.S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

This research was partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number 23K20541 and the JSPS Program for Forming Japan’s Peak Research Universities (J-PEAKS) Grant Number JPJS00420230001. L.B.A. acknowledges support from a scholarship by the Ministry of Education, Culture, Sports, Science and Technology, Japan (MEXT). The authors express their gratitude to the Faculty of Environmental Earth Science for technical and administrative support. We also appreciate the editor and all five anonymous reviewers whose insightful suggestions greatly improved the quality of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AGB	Aboveground biomass
AGBD	Aboveground biomass density
ARFR	Atewa Range Forest Reserve
CART	Classification and Regression Trees
DEM	Digital Elevation Model
EVI	Enhanced Vegetation Index
GEDI	Global Ecosystem Dynamics Investigation
GEE	Google Earth Engine
GEOBIA	Geographic Object-based Image Analysis
HFZ	High Forest Zone
ISS	International Space Station
JAXA	Japan Aerospace Exploration Agency
ML	Machine learning
NFI	National Forest Inventory
NIR	Near-infrared
OLI	Operational Land Imager
PALSAR	Phased Array type L-band Synthetic Aperture Radar
RBF	Radial Basis Function
RF	Random forest
RFE	Recursive Feature Elimination
RFECV	Recursive Feature Elimination with cross-validation
RMSE	Root Mean Square Error
SAR	Synthetic Aperture Radar
SAVI	Soil Adjusted Vegetation Index
SRTM	Shuttle Radar Topography Mission
SVM	Support Vector Machine
TPI	Topographic Position Index
TWI	Topographic Wetness Index

References

O’Callaghan, C.J.; Irwin, S.; Byrne, K.A.; O’Halloran, J. The role of planted forests in the provision of habitat: An Irish perspective. Biodivers. Conserv. 2017, 26, 3103–3124. [Google Scholar] [CrossRef]
Thom, D.; Rammer, W.; Seidl, R. The impact of future forest dynamics on climate: Interactive effects of changing vegetation and disturbance regimes. Ecol. Monogr. 2017, 87, 665–684. [Google Scholar] [CrossRef]
Torre-Tojal, L.; Bastarrika, A.; Boyano, A.; Lopez-Guede, J.M.; Grana, M. Above-ground biomass estimation from LiDAR data using random forest algorithms. J. Comput. Sci. 2022, 58, 101517. [Google Scholar] [CrossRef]
Mehmood, K.; Anees, S.A.; Rehman, A.; Tariq, A.; Liu, Q.; Muhammad, S.; Rabbi, F.; Pan, S.; Hatamleh, W.A. Assessing forest cover changes and fragmentation in the Himalayan temperate region: Implications for forest conservation and management. J. For. Res. 2024, 35, 82. [Google Scholar] [CrossRef]
Ibrahim, S.A.; Balzter, H.; Tansey, K. Machine learning feature importance selection for predicting aboveground biomass in African savannah with landsat 8 and ALOS PALSAR data. Mach. Learn. Appl. 2024, 16, 100561. [Google Scholar] [CrossRef]
Wang, Z.; Ma, Y.; Zhang, Y.; Shang, J. Review of remote sensing applications in grassland monitoring. Remote Sens. 2022, 14, 2903. [Google Scholar] [CrossRef]
Khan, M.N.; Tan, Y.; Gul, A.A.; Abbas, S.; Wang, J. Forest aboveground biomass estimation and inventory: Evaluating remote sensing-based approaches. Forests 2024, 15, 1055. [Google Scholar] [CrossRef]
Hojo, A.; Avtar, R.; Nakaji, T.; Tadono, T.; Takagi, K. Modeling forest above-ground biomass using freely available satellite and multisource datasets. Ecol. Inform. 2023, 74, 101973. [Google Scholar] [CrossRef]
Tadese, S.; Soromessa, T.; Bekele, T.; Bereta, A.; Temesgen, F. Above Ground Biomass Estimation Methods and Challenges: A Review. Measurement 2019, 9, 12–25. [Google Scholar]
Ji, L.; Wylie, B.K.; Nossov, D.R.; Peterson, B.; Waldrop, M.P.; McFarland, J.W.; Rover, J.; Hollingsworth, T.N. Estimating aboveground biomass in interior Alaska with Landsat data and field measurements. Int. J. Appl. Earth Obs. Geoinf. 2012, 18, 451–461. [Google Scholar] [CrossRef]
Zhou, Y.; Liu, T.; Batelaan, O.; Duan, L.; Wang, Y.; Li, X.; Li, M. Spatiotemporal fusion of multi-source remote sensing data for estimating aboveground biomass of grassland. Ecol. Indic. 2023, 146, 109892. [Google Scholar] [CrossRef]
Duncanson, L.; Kellner, J.R.; Armston, J.; Dubayah, R.; Minor, D.M.; Hancock, S.; Healey, S.P.; Patterson, P.L.; Saarela, S.; Marselis, S.; et al. Aboveground biomass density models for NASA’s Global Ecosystem Dynamics Investigation (GEDI) lidar mission. Remote Sens. Environ. 2022, 270, 112845. [Google Scholar] [CrossRef]
Zhang, L.; Yang, L.; Sun, J.; Zhu, Q.; Wang, T.; Zhao, H. Estimation of Tree Species Diversity in Warm Temperate Forests via GEDI and GF-1 Imagery. Forests 2025, 16, 570. [Google Scholar] [CrossRef]
Alvites, C.; O’sullivan, H.; Francini, S.; Marchetti, M.; Santopuoli, G.; Chirici, G.; Lasserre, B.; Marignani, M.; Bazzato, E. High-resolution canopy height mapping: Integrating nasa’s global ecosystem dynamics investigation (gedi) with multi-source remote sensing data. Remote Sens. 2024, 16, 1281. [Google Scholar] [CrossRef]
Guo, Q.; Du, S.; Jiang, J.; Guo, W.; Zhao, H.; Yan, X.; Zhao, Y.; Xiao, W. Combining GEDI and sentinel data to estimate forest canopy mean height and aboveground biomass. Ecol. Inform. 2023, 78, 102348. [Google Scholar] [CrossRef]
Adrah, E.; Jaafar, W.S.W.M.; Bajaj, S.; Omar, H.; Leite, R.V.; Silva, C.A.; Cardil, A.; Mohan, M. Analyzing canopy height variations in secondary tropical forests of Malaysia using NASA GEDI. IOP Conf. Ser. Earth Environ. Sci. 2021, 880, 012031. [Google Scholar] [CrossRef]
Hunka, N.; Santoro, M.; Armston, J.; Dubayah, R.; McRoberts, R.E.; Næsset, E.; Quegan, S.; Urbazaev, M.; Pascual, A.; May, P.B.; et al. On the NASA GEDI and ESA CCI biomass maps: Aligning for uptake in the UNFCCC global stocktake. Environ. Res. Lett. 2023, 18, 124042. [Google Scholar] [CrossRef]
Zhang, Y.; Zou, Y.; Wang, Y. Remote Sensing of Forest Above-Ground Biomass Dynamics: A Review. Forests 2025, 16, 821. [Google Scholar] [CrossRef]
Thapa, B.; Lovell, S.; Wilson, J. Remote sensing and machine learning applications for aboveground biomass estimation in agroforestry systems: A review. Agrofor. Syst. 2023, 97, 1097–1111. [Google Scholar] [CrossRef]
Lu, D.; Chen, Q.; Wang, G.; Liu, L.; Li, G.; Moran, E. A survey of remote sensing-based aboveground biomass estimation methods in forest ecosystems. Int. J. Digit. Earth 2014, 9, 63–105. [Google Scholar] [CrossRef]
Yan, X.; Li, J.; Smith, A.R.; Yang, D.; Ma, T.; Su, Y.; Shao, J. Evaluation of machine learning methods and multi-source remote sensing data combinations to construct forest above-ground biomass models. Int. J. Digit. Earth 2023, 16, 4471–4491. [Google Scholar] [CrossRef]
McRoberts, R.E.; Tomppo, E.O. Remote sensing support for national forest inventories. Remote Sens. Environ. 2007, 110, 412–419. [Google Scholar] [CrossRef]
Haakana, H.; Heikkinen, J.; Katila, M.; Kangas, A. Efficiency of post-stratification for a large-scale forest inventory—Case Finnish NFI. Ann. For. Sci. 2019, 76, 9. [Google Scholar] [CrossRef]
Wu, Z.; Liu, X.; Cheng, S.; Yang, C.; Wang, Z.; Liu, Y.; Dong, L.; Li, F.; Hao, Y. Evaluating the effectiveness of forest type stratification for aboveground biomass inference. Int. J. Appl. Earth Obs. Geoinf. 2025, 143, 104829. [Google Scholar] [CrossRef]
Chen, L.; Ren, C.; Zhang, B.; Wang, Z.; Man, W.; Liu, M. Improved object-based mapping of aboveground biomass using geographic stratification with GEDI data and multi-sensor imagery. Remote Sens. 2023, 15, 2625. [Google Scholar] [CrossRef]
Jiang, X.; Li, G.; Lu, D.; Chen, E.; Wei, X. Stratification-based forest aboveground biomass estimation in a subtropical region using airborne lidar data. Remote Sens. 2020, 12, 1101. [Google Scholar] [CrossRef]
Latifi, H.; Fassnacht, F.E.; Hartig, F.; Berger, C.; Hernández, J.; Corvalán, P.; Koch, B. Stratified aboveground forest biomass estimation by remote sensing data. Int. J. Appl. Earth Obs. Geoinf. 2015, 38, 229–241. [Google Scholar] [CrossRef]
Kumar, A.; Gopal, R.; Sabnam, S.; Kumar, R.; Prasad, M.; Mahto, D.; Kumari, V. Impact of Forest Degradation on the Livelihood of Forest-Dependent Communities. In Forest Degradation and Management: An Indian Perspective; Springer Nature: Cham, Switzerland, 2025; pp. 255–265. [Google Scholar]
Kumar, R.; Kumar, A.; Saikia, P. Deforestation and forests degradation impacts on the environment. In Environmental Degradation: Challenges and Strategies for Mitigation; Springer International Publishing: Cham, Switzerland, 2022; pp. 19–46. [Google Scholar]
Souza, C., Jr.; Siqueira, J.; Sales, M.; Fonseca, A.; Ribeiro, J.; Numata, I.; Cochrane, M.A.; Barber, C.P.; Roberts, D.A.; Barlow, J. Ten-year Landsat classification of deforestation and forest degradation in the Brazilian Amazon. Remote Sens. 2013, 5, 5493–5513. [Google Scholar] [CrossRef]
Ioki, K.; Tsuyuki, S.; Hirata, Y.; Phua, M.-H.; Wong, W.V.C.; Ling, Z.-Y.; Saito, H.; Takao, G. Estimating above-ground biomass of tropical rainforest of different degradation levels in Northern Borneo using airborne LiDAR. For. Ecol. Manag. 2014, 328, 335–341. [Google Scholar] [CrossRef]
Muhe, S.; Argaw, M. Estimation of above-ground biomass in tropical afro-montane forest using Sentinel-2 derived indices. Environ. Syst. Res. 2022, 11, 5. [Google Scholar] [CrossRef]
Brede, B.; Calders, K.; Lau, A.; Raumonen, P.; Bartholomeus, H.M.; Herold, M.; Kooistra, L. Non-destructive tree volume estimation through quantitative structure modelling: Comparing UAV laser scanning with terrestrial LIDAR. Remote Sens. Environ. 2019, 233, 111355. [Google Scholar] [CrossRef]
Amponsah, A.; Nasare, L.I.; Tom-Dery, D.; Baatuuwie, B.N. Land cover changes of Atewa Range Forest Reserve, a biodiversity hotspot in Ghana. Trees For. People 2022, 9, 100301. [Google Scholar] [CrossRef]
Nyamekye, C.; Ghansah, B.; Agyapong, E.; Kwofie, S. Mapping changes in artisanal and small-scale mining (ASM) landscape using machine and deep learning algorithms—A proxy evaluation of the 2017 ban on ASM in Ghana. Environ. Chall. 2021, 3, 100053. [Google Scholar] [CrossRef]
Garcia-Montiel, D.C.; Scatena, F.N. The effect of human activity on the structure and composition of a tropical forest in Puerto Rico. For. Ecol. Manag. 1994, 63, 57–78. [Google Scholar] [CrossRef]
Milodowski, D.T.; Coomes, D.A.; Swinfield, T.; Jucker, T.; Riutta, T.; Malhi, Y.; Svátek, M.; Kvasnica, J.; Burslem, D.F.R.P.; Ewers, R.M.; et al. The impact of logging on vertical canopy structure across a gradient of tropical forest degradation intensity in Borneo. J. Appl. Ecol. 2021, 58, 1764–1775. [Google Scholar] [CrossRef]
Gobbi, B.; Van Rompaey, A.; Gasparri, N.I.; Vanacker, V. Forest degradation in the Dry Chaco: A detection based on 3D canopy reconstruction from UAV-SfM techniques. For. Ecol. Manag. 2022, 526, 120554. [Google Scholar] [CrossRef]
McCullough, J.; Alonso, L.E.; Naskrecki, P.; Wright, H.E.; Osei-Owusu, Y. A rapid biological assessment of the Atewa Range Forest Reserve, eastern Ghana. RAP Bull. Biol. Assess. 2007, 47, 180–191. [Google Scholar]
Adams, L.B.; Hayakawa, Y.S. Mapping alluvial mine dynamics in the Atewa landscape in Ghana using Geographic Object-Based Image Analysis (GEOBIA) and GIS. Environ. Monit. Assess. 2025, 197, 473. [Google Scholar] [CrossRef]
Kusimi, J.M. Characterizing land disturbance in Atewa range forest reserve and buffer zone. Land Use Policy 2015, 49, 471–482. [Google Scholar] [CrossRef]
NASA. Earthdata Search. Available online: https://search.earthdata.nasa.gov/ (accessed on 8 August 2025).
Dubayah, R.O.; Armston, J.; Kellner, J.R.; Duncanson, L.; Healey, S.P.; Patterson, P.L.; Hancock, S.; Tang, H.; Bruening, J.M.; Hofton, M.A.; et al. GEDI L4A Footprint Level Aboveground Biomass Density, Version 2.1; ORNL DAAC: Oak Ridge, TN, USA, 2022. [Google Scholar]
Shendryk, Y. Fusing GEDI with earth observation data for large area aboveground biomass mapping. Int. J. Appl. Earth Obs. Geoinf. 2022, 115, 103108. [Google Scholar] [CrossRef]
Liu, A.; Cheng, X.; Chen, Z. Performance evaluation of GEDI and ICESat-2 laser altimeter data for terrain and canopy height retrievals. Remote Sens. Environ. 2021, 264, 112571. [Google Scholar] [CrossRef]
JAXA. ALOS. Available online: https://www.eorc.jaxa.jp/ALOS (accessed on 8 July 2025).
Parhad, S.V.; Warhade, K.K.; Shitole, S.S. Speckle noise reduction in sar images using improved filtering and supervised classification. Multimed. Tools Appl. 2024, 83, 54615–54636. [Google Scholar] [CrossRef]
USGS. Earthexplorer. Available online: https://earthexplorer.usgs.gov/ (accessed on 23 August 2025).
Lemke, D.; Dimov, L.; Czech, H.; Knight, P.; Finch, W.; Condit, R. Relationship between topographic variables and live aboveground tree biomass on a large temperate forest plot. For. Ecosyst. 2025, 14, 100338. [Google Scholar] [CrossRef]
Salinas-Melgoza, M.A.; Skutsch, M.; Lovett, J.C. Predicting aboveground forest biomass with topographic variables in human-impacted tropical dry forest landscapes. Ecosphere 2018, 9, e02063. [Google Scholar] [CrossRef]
OpenStreetMap Contributors. Planet Dump. OpenStreetMap Foundation, 2022. Available online: https://www.openstreetmap.org (accessed on 18 August 2025).
QGIS Development Team. QGIS Geographic Information System (Version 3.34); Open Source Geospatial Foundation: Beaverton, OR, USA, 2023; Available online: https://qgis.org (accessed on 20 August 2025).
Liu, Z.; Peng, C.; Work, T.; Candau, J.N.; DesRochers, A.; Kneeshaw, D. Application of machine-learning methods in forest ecology: Recent progress and future challenges. Environ. Rev. 2018, 26, 339–350. [Google Scholar] [CrossRef]
Amin, G.; Imtiaz, I.; Haroon, E.; Saqib, N.U.; Shahzad, M.I.; Nazeer, M. Assessment of machine learning algorithms for land cover classification in a complex mountainous landscape. J. Geovisualization Spat. Anal. 2024, 8, 34. [Google Scholar] [CrossRef]
Dhanda, P.; Nandy, S.; Kushwaha, S.P.; Ghosh, S.; Murthy, Y.K.; Dadhwal, V.K. Optimizing spaceborne LiDAR and very high resolution optical sensor parameters for biomass estimation at ICESat/GLAS footprint level using regression algorithms. Prog. Phys. Geogr. 2017, 41, 247–267. [Google Scholar] [CrossRef]
Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. SPRS. J. Photogramm. Remote Sens. 2011, 66, 247–259. [Google Scholar] [CrossRef]
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Roy, A.D.; Debbarma, S. Comparing the allometric model to machine learning algorithms for aboveground biomass estimation in tropical forests. Ecol. Front. 2024, 44, 1069–1078.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013.
Mantero, P.; Moser, G.; Serpico, S.B. Partially supervised classification of remote sensing images through SVM-based probability density estimation. IEEE Trans. Geosci. Remote Sens. 2005, 43, 559–570.
Awad, M.; Fraihat, S. Recursive feature elimination with cross-validation with decision tree: Feature selection method for machine learning-based intrusion detection systems. J. Sens. Actuator Netw. 2023, 12, 67.
Qiu, A.; Yang, Y.; Wang, D.; Xu, S.; Wang, X. Exploring parameter selection for carbon monitoring based on Landsat-8 imagery of the aboveground forest biomass on Mount Tai. Eur. J. Remote Sens. 2020, 53, 4–15.
Ramezan, C.A.; Warner, T.E.; Maxwell, A. Evaluation of sampling and cross-validation tuning strategies for regional-scale machine learning classification. Remote Sens. 2019, 11, 185.
Lamahewage, S.H.; Witharana, C.; Riemann, R.F.R.; Worthley, T. Comparing Machine Learning and Statistical Models for Remote Sensing-Based Forest Aboveground Biomass Estimations. Forests 2025, 16, 1430.
Yang, C.; Liu, A.; Chen, Y. Seventeen-year reconstruction of tropical forest aboveground biomass dynamics in Borneo using GEDI L4B and multi-sensor data fusion. Remote Sens. 2025, 17, 3231.
Yu, X.; Ge, H.; Lu, D.; Zhang, M.; Lai, Z.; Yao, R. Comparative study on variable selection approaches in establishment of remote sensing model for forest biomass estimation. Remote Sens. 2019, 11, 1437.
Cochran, W. Sampling Techniques; John Wiley: New York, NY, USA, 1977.
Soja, M.J.; Quegan, S.; d’Alessandro, M.M.; Banda, F.; Scipal, K.; Tebaldini, S.; Ulander, L.M. Mapping above-ground biomass in tropical forests with ground-cancelled P-band SAR and limited reference data. Remote Sens. Environ. 2021, 253, 112153.
Zhao, P.; Lu, D.; Wang, G.; Wu, C.; Huang, Y.; Yu, S. Examining Spectral Reflectance Saturation in Landsat Imagery and Corresponding Solutions to Improve Forest Aboveground Biomass Estimation. Remote Sens. 2016, 8, 469.
Avelino, J.G.; Cavalcanti, G.D.; Cruz, R.M. Resampling strategies for imbalanced regression: A survey and empirical analysis. Artif. Intell. Rev. 2024, 57, 82.
Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
Gong, Z.; Zhong, P.; Hu, W. Diversity in machine learning. IEEE Access 2019, 7, 64323–64350.
Abbas, S.; Wong, M.S.; Wu, J.; Shahzad, N.; Muhammad Irteza, S. Approaches of satellite remote sensing for the assessment of above-ground biomass across tropical forests: Pan-tropical to national scales. Remote Sens. 2020, 12, 3351.
Li, Y.; Li, M.; Li, C.; Liu, Z. Forest aboveground biomass estimation using Landsat 8 and Sentinel-1A data with machine learning algorithms. Sci. Rep. 2020, 10, 9952.
Zhou, J.; Zan, M.; Zhai, L.; Yang, S.; Xue, C.; Li, R.; Wang, X. Remote sensing estimation of aboveground biomass of different forest types in Xinjiang based on machine learning. Sci. Rep. 2025, 15, 6187.
Chen, Z.; Sun, Z.; Zhang, H.; Zhang, H.; Qiu, H. Aboveground forest biomass estimation using tent mapping atom search optimized backpropagation neural network with Landsat 8 and Sentinel-1A data. Remote Sens. 2023, 15, 5653.
Wu, Y.; Chen, Y.; Tian, C.; Yun, T.; Li, M. Estimation of subtropical forest aboveground biomass using active and passive Sentinel data with canopy height. Remote Sens. 2025, 17, 2509.

Figure 1. Map of the study area showing the two regimes adopted in the study: the Atewa Range Forest Reserve (ARFR), referred to as the “FR” regime, and the surrounding area, referred to as the “SR” regime. Green points show the spatial distribution of GEDI AGBD observations across the study area. Satellite image: contains modified Copernicus Sentinel-2 data (2022).

Figure 2. GEDI AGBD distribution in the (a) “FR” regime and (b) “SR” regime. Dashed red line = mean AGBD.

Figure 3. Summary of model development process.

Figure 4. Variable importance for the (a) “FR” regime, (b) “SR” regime, and (c) landscape (FSR); n is the number of predictors used by the ML algorithm to develop the model; Band 2 = Landsat 8 blue band; Band 3 = Landsat 8 green band; Band 4 = Landsat 8 red band; Band 5 = Landsat 8 NIR band; Band 6 = Landsat 8 SWIR 1 band; Band 7 = Landsat 8 SWIR 2 band.
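Figure 4 reports the predictors retained for each regime after recursive feature elimination with cross-validation (RFECV). As a rough illustration only, the sketch below wraps scikit-learn's `RFECV` around a `RandomForestRegressor`; the fold count, scoring metric, number of trees, predictor names and synthetic data are all assumptions rather than the authors' settings. `RFECV` ranks predictors through the estimator's `feature_importances_` or `coef_` attribute, which an RBF-kernel `SVR` does not expose, so only the RF case is shown here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold


def select_predictors(X, y, names, seed=0):
    """Run RFECV around a random forest and return the retained predictor names."""
    rf = RandomForestRegressor(n_estimators=50, random_state=seed)
    selector = RFECV(
        estimator=rf,
        step=1,  # drop the least important predictor in each round
        cv=KFold(n_splits=5, shuffle=True, random_state=seed),
        scoring="neg_root_mean_squared_error",  # keep the subset size with the lowest CV RMSE
        min_features_to_select=1,
    )
    selector.fit(X, y)
    kept = [name for name, keep in zip(names, selector.support_) if keep]
    return kept, selector


# Hypothetical synthetic data: only the first two predictors carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
names = ["band5", "hv", "ndvi", "slope", "dist_roads"]  # hypothetical labels

kept, selector = select_predictors(X, y, names)
print(selector.n_features_, kept)
```

The retained count corresponds to the n shown in each panel of Figure 4; with real data the selected subset will of course differ by regime and algorithm.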

Figure 5. Residuals vs. predicted AGB for the best predictive models of the (a) landscape (FSR), (b) “SR” regime, and (c) “FR” regime. Solid red lines mark the range of the residuals; the dashed red line is the zero-residual reference.

Figure 6. Residuals vs. human-induced disturbance variables for the best predictive models of the (a) landscape (FSR), (b) “SR” regime, and (c) “FR” regime. Solid red lines mark the range of the residuals; the dashed red line is the zero-residual reference.
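Figures 5 and 6 read heteroscedasticity from how the spread of the residuals changes along an axis (predicted AGB, or distance to a disturbance). A minimal NumPy sketch of that informal check follows; it assumes residuals are defined as observed minus predicted and summarizes spread within equal-count bins. The sign convention, the bin count, the variable names and the synthetic data are all hypothetical.

```python
import numpy as np


def residuals(observed, predicted):
    # Assumed sign convention: a positive residual means the model underestimates AGB.
    return np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)


def binned_spread(x, resid, n_bins=4):
    """Mean x and residual standard deviation within equal-count bins of x."""
    x = np.asarray(x, dtype=float)
    resid = np.asarray(resid, dtype=float)
    order = np.argsort(x)                 # sort samples along the x axis
    bins = np.array_split(order, n_bins)  # equal-count groups of sample indices
    centers = np.array([x[b].mean() for b in bins])
    spreads = np.array([resid[b].std(ddof=1) for b in bins])
    return centers, spreads


# Hypothetical data whose error grows with distance to roads.
rng = np.random.default_rng(0)
distance_km = np.linspace(0.1, 10.0, 400)
predicted = np.full_like(distance_km, 150.0)  # hypothetical predicted AGB (Mg/ha)
observed = predicted + rng.normal(scale=5.0 + 4.0 * distance_km)

centers, spreads = binned_spread(distance_km, residuals(observed, predicted))
print(np.round(centers, 2), np.round(spreads, 1))
```

A spread that changes steadily from bin to bin suggests heteroscedastic errors; roughly constant spread around the zero line suggests the opposite.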

Figure 7. Modeled AGB of the Atewa landscape.

Figure 8. Correlation matrices of the response and predictor variables for the (a) landscape (FSR), (b) “SR” regime, and (c) “FR” regime.
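A correlation matrix like those in Figure 8 can be produced with pandas. The sketch assumes Pearson correlation (the method is not stated in this section) and uses a hypothetical toy table whose columns are exactly linear in AGB, so the expected coefficients are easy to check.

```python
import pandas as pd


def agb_correlations(df, response="agb", method="pearson"):
    """Full correlation matrix plus each predictor's correlation with AGB,
    ordered by absolute strength."""
    corr = df.corr(method=method)             # response and predictors together
    with_agb = corr[response].drop(response)  # each predictor vs. the response
    ranked = with_agb.reindex(with_agb.abs().sort_values(ascending=False).index)
    return corr, ranked


# Hypothetical toy data (column names are illustrative only).
df = pd.DataFrame({
    "agb": [100.0, 150.0, 200.0, 250.0, 300.0],
    "hv": [-15.0, -14.0, -13.0, -12.0, -11.0],    # increases linearly with AGB
    "dist_mines": [5.0, 4.0, 3.0, 2.0, 1.0],      # decreases linearly with AGB
    "slope": [10.0, 2.0, 7.0, 3.0, 8.0],          # weakly related
})

corr, ranked = agb_correlations(df)
print(ranked)
```

Passing `method="spearman"` would give rank correlations instead, which are less sensitive to skewed AGB distributions.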

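Table 1. Predictor variables used for model development.

Data Source	Feature
Landsat 8	Band 2 (Blue)
Landsat 8	Band 3 (Green)
Landsat 8	Band 4 (Red)
Landsat 8	Band 5 (NIR)
Landsat 8	Band 6 (SWIR 1)
Landsat 8	Band 7 (SWIR 2)
Landsat vegetation indices	NDVI (normalized difference vegetation index)
Landsat vegetation indices	SAVI (soil-adjusted vegetation index)
Landsat vegetation indices	EVI (enhanced vegetation index)
ALOS/PALSAR 2	HH (horizontal transmit, horizontal receive polarization)
ALOS/PALSAR 2	HV (horizontal transmit, vertical receive polarization)
SRTM DEM	Elevation
SRTM DEM	Slope
SRTM DEM	Aspect
Topographic indices	TPI (topographic position index)
Topographic indices	TWI (topographic wetness index)
Proximity data	Distance to mines
Proximity data	Distance to roads
Proximity data	Distance to settlements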
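Two groups of predictors in Table 1 are derived rather than read directly from the imagery. The sketch below computes the three vegetation indices with their commonly published formulations (SAVI with L = 0.5; EVI with coefficients 2.5, 6, 7.5 and 1) from Landsat 8 surface reflectance scaled to 0–1, and builds a distance-to-feature raster with SciPy's Euclidean distance transform. The coefficients, the 30 m pixel size and the rasterized feature mask are assumptions; the reference list cites QGIS and OpenStreetMap, so the authors' actual workflow may differ.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def vegetation_indices(blue, red, nir, L=0.5):
    """NDVI, SAVI and EVI from surface reflectance (0-1) of Landsat 8
    Band 2 (blue), Band 4 (red) and Band 5 (NIR)."""
    blue, red, nir = (np.asarray(a, dtype=float) for a in (blue, red, nir))
    ndvi = (nir - red) / (nir + red)
    savi = (1.0 + L) * (nir - red) / (nir + red + L)  # L = soil adjustment factor
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    return ndvi, savi, evi


def distance_to_features(feature_mask, pixel_size=30.0):
    """Euclidean distance (map units) from each cell to the nearest feature cell.

    feature_mask is a hypothetical boolean raster that is True where a mine,
    road or settlement has been rasterized."""
    # distance_transform_edt measures distance to the nearest zero-valued cell,
    # so the mask is inverted to make the feature cells zero.
    return distance_transform_edt(~np.asarray(feature_mask, dtype=bool),
                                  sampling=pixel_size)


ndvi, savi, evi = vegetation_indices(blue=0.02, red=0.04, nir=0.30)
print(round(float(ndvi), 3), round(float(savi), 3), round(float(evi), 3))

roads = np.array([[True, False, False, False]])  # hypothetical 1 x 4 raster
print(distance_to_features(roads))
```

The same functions apply unchanged to full 2-D band arrays, since every operation is element-wise or raster-wide.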

Table 2. Hyperparameter values used for tuning the models.

Algorithm	Hyperparameter	Test value range	FR (Local)	FR (LS_r)	SR (Local)	SR (LS_r)	LS
RF	n_estimators	1–200	173	287	173	275	181
RF	min_samples_split	2–6	4	5	4	4	3
RF	min_samples_leaf	2–6	4	2	4	4	7
RF	max_features	‘sqrt’, 0.5–1.0	‘sqrt’	‘sqrt’	‘sqrt’	8	0.7
RF	max_depth	1–100	5	5	7	7	5
SVM	kernel	–	‘rbf’	‘rbf’	‘rbf’	‘rbf’	‘rbf’
SVM	gamma	‘scale’, 0.01–1	‘scale’	‘scale’	‘scale’	‘scale’	‘scale’
SVM	C	1–100	100	100	100	100	10
SVM	epsilon	0.1–0.5	0.5	0.5	0.5	0.5	0.1

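‘sqrt’ limits the number of predictors considered at each split of the RF model to the square root of the total number of predictors; ‘rbf’ is the radial basis function kernel used in the SVM; ‘scale’ sets the SVM gamma parameter automatically from the number of input features and the variance of the input data. LS denotes the global (landscape, “FSR”) model and LS_r the global model retuned for the corresponding regime.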
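The Table 2 settings map directly onto scikit-learn estimators, the library cited in the reference list. The sketch below is illustrative only: it assumes the SVM is the regression variant `SVR` (consistent with the `epsilon` hyperparameter), adds `random_state` purely for reproducibility, and uses synthetic data. It also checks numerically that `gamma='scale'` is equivalent to gamma = 1 / (n_features × Var(X)).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

# "FR (Local)" column of Table 2; random_state is an added assumption.
rf_fr_local = RandomForestRegressor(
    n_estimators=173, min_samples_split=4, min_samples_leaf=4,
    max_features="sqrt", max_depth=5, random_state=0,
)
svr_fr_local = SVR(kernel="rbf", gamma="scale", C=100, epsilon=0.5)


def scale_gamma(X):
    """gamma that scikit-learn derives for gamma='scale': 1 / (n_features * X.var())."""
    X = np.asarray(X, dtype=float)
    return 1.0 / (X.shape[1] * X.var())


# Hypothetical synthetic predictors (80 samples, 4 features) and an AGB-like target.
rng = np.random.default_rng(1)
X = rng.normal(scale=2.0, size=(80, 4))
y = 150.0 + 20.0 * X[:, 0] + rng.normal(size=80)

auto = svr_fr_local.fit(X, y)
manual = SVR(kernel="rbf", gamma=scale_gamma(X), C=100, epsilon=0.5).fit(X, y)
print(np.abs(auto.predict(X) - manual.predict(X)).max())
```

One scikit-learn detail matters when reading the `max_features` row: an integer (such as the 8 in the SR LS_r column) is interpreted as a count of predictors, whereas a float (such as 0.7 in the LS column) is interpreted as a fraction of them.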

Table 3. Performance of the models, expressed as r² and RMSE (Mg/ha).

Regime	RF Local r²	RF Local RMSE	RF FSR_r r²	RF FSR_r RMSE	RF FSR r²	RF FSR RMSE	SVM Local r²	SVM Local RMSE	SVM FSR_r r²	SVM FSR_r RMSE	SVM FSR r²	SVM FSR RMSE
FSR	–	–	–	–	0.54	57.71	–	–	–	–	0.46	62.91
SR	0.26	41.73	0.26	41.72	0.26	41.83	0.23	42.54	0.23	42.68	0.21	43.00
FR	0.17	90.93	0.17	90.86	0.16	90.99	0.13	92.86	0.14	92.39	0.13	93.10

– = not applicable.
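The r² and RMSE values in Table 3 can be computed from observed and predicted AGB as sketched below. This assumes r² is the coefficient of determination (scikit-learn's `r2_score`, i.e., 1 − SS_res/SS_tot) rather than a squared Pearson correlation, which this section does not specify; the toy numbers are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def evaluate(observed, predicted):
    """Return (r2, RMSE); RMSE is in the units of the inputs (Mg/ha for AGB)."""
    r2 = r2_score(observed, predicted)
    rmse = float(np.sqrt(mean_squared_error(observed, predicted)))
    return r2, rmse


# Hypothetical observed vs. predicted AGB (Mg/ha).
r2, rmse = evaluate([100, 150, 200, 250], [110, 140, 210, 240])
print(round(r2, 3), rmse)  # 0.968 10.0
```

Taking the square root of `mean_squared_error` avoids depending on version-specific RMSE helpers in scikit-learn.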

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Adams, L.B.; Hayakawa, Y.S. Evaluating Disturbance Regime Stratification for Aboveground Biomass Estimation in a Heterogeneous Forest Landscape: Insights from the Atewa Landscape, Ghana. Remote Sens. 2026, 18, 765. https://doi.org/10.3390/rs18050765


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

