Geolocation-Corrected UAV–GEDI Bridging Samples and Stacking Ensemble Models for Regional AGB Mapping in Subtropical Mountainous Forests of Simao District, Yunnan

Yang, Haiyun; Dong, Wenquan; Zhang, Wangfei; Hu, Jiaqi; Ji, Yongjie

doi:10.3390/rs18111796


by Haiyun Yang ^1,2, Wenquan Dong ^3, Wangfei Zhang ^1,2, Jiaqi Hu ^1,2 and Yongjie Ji ^4,5,6,*

^1 College of Forestry, Southwest Forestry University, Kunming 650224, China
^2 Key Laboratory for Forest Resources Conservation and Utilization in the Southwest Mountains of China, Ministry of Education, Southwest Forestry University, Kunming 650224, China
^3 Department of Earth and Environmental Sciences, Lund University, 22100 Lund, Sweden
^4 College of Soil and Water Conservation, Southwest Forestry University, Kunming 650224, China
^5 Key Laboratory of Ecological Environment Evolution and Pollution Control in Mountainous & Rural Areas of Yunnan Province, Kunming 650224, China
^6 Zhanyi Karst Ecosystem Observation and Research Station, Kunming 650224, China
^* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(11), 1796; https://doi.org/10.3390/rs18111796

Submission received: 23 April 2026 / Revised: 27 May 2026 / Accepted: 29 May 2026 / Published: 1 June 2026

(This article belongs to the Special Issue Advances in Forest Aboveground Biomass Mapping Using Multi-Source Remote Sensing and Machine Learning)


Highlights

What are the main findings?

A geolocation-corrected UAV–GEDI bridging framework was developed to construct footprint-scale AGB labels, and the Stacking ensemble showed the best internal cross-validation performance among candidate footprint-level models.
For EBKRP-based continuous mapping, the Landsat TCW, Sentinel-2 IRECI, and Sentinel-1 VH/VV ratio covariate scheme achieved the best external validation performance, indicating the importance of complementary optical, red-edge, and radar information.

What are the implications of the main findings?

The proposed UAV–GEDI bridging workflow provides a feasible methodological reference for regional forest AGB mapping in complex mountainous environments supported by spaceborne LiDAR.
The EBKRP covariate comparison suggests that integrating complementary optical, red-edge, and radar information can improve continuous biomass mapping, while sample imbalance and uncertainty propagation should be considered when interpreting map accuracy.

Abstract

Accurate mapping of aboveground biomass (AGB) in mountainous forests is essential for carbon stock assessment and ecological management, yet remains challenging due to the difficulty of linking local high-precision observations with regionally continuous coverage. To address this issue, we developed a hierarchical framework integrating local reference construction, UAV–GEDI bridging, footprint-level modeling, and regional continuous mapping, applied to the mountainous forests of Simao District, Pu’er City, Yunnan Province, China. Field plot measurements and UAV-borne LiDAR data were first used to construct a local AGB reference product, which was then transferred to the GEDI footprint scale through geolocation correction and footprint-scale quality control, yielding 252 valid bridging samples across three UAV flight zones, with approximately 65% originating from the TYH zone. Among five candidate models evaluated for GEDI footprint-level AGB estimation, the Stacking ensemble model performed best, with a pooled out-of-fold R² of 0.736 and RMSE of 24.15 Mg ha⁻¹, and was subsequently applied to 89,579 GEDI footprints across the study area. For regional continuous mapping, the empirical Bayesian kriging regression prediction (EBKRP) scheme combining Landsat TCW, Sentinel-2 IRECI, and the Sentinel-1 polarization ratio achieved the best external validation performance, with R² of 0.622 and RMSE of 26.05 Mg ha⁻¹ based on 61 independent field plots. These results indicate that the proposed hierarchical framework effectively bridges local high-precision observations and regional continuous AGB mapping in complex mountainous forest environments, offering a systematic methodological reference for GEDI-based forest carbon monitoring.

Keywords:

forest above-ground biomass; GEDI; UAV LiDAR; bridging samples; Stacking ensemble model; empirical Bayesian kriging

1. Introduction

Forests are major reservoirs of biomass and carbon in terrestrial ecosystems and play critical roles in carbon cycling, biodiversity conservation, and climate regulation. Forest aboveground biomass (AGB) directly reflects the organic matter and carbon stored in aboveground components and serves as a key variable linking forest productivity, ecosystem functioning, and carbon dynamics. Reliable regional AGB maps are essential for forest resource monitoring, carbon accounting, and ecosystem management [1,2].

Traditional field-based AGB estimation mainly relies on plot surveys and allometric equations. Although these approaches are biologically interpretable, their application to regional-scale mapping is often constrained by high costs, limited spatial representativeness, and weak extrapolation capability under complex terrain. With the development of remote sensing, optical imagery, synthetic aperture radar (SAR), and light detection and ranging (LiDAR) have become indispensable for large-area AGB estimation, and the synergistic use of multi-source remote sensing data with machine learning has been widely recognized as an effective way to improve estimation accuracy [3,4,5,6].

Among the currently available datasets, GEDI (Global Ecosystem Dynamics Investigation), a spaceborne full-waveform LiDAR mission, provides footprint-scale observations of forest vertical structure and has created new opportunities for regional AGB estimation. However, GEDI observations are acquired as discrete along-track samples rather than wall-to-wall coverage and therefore cannot directly produce continuous biomass maps. In addition, GEDI footprints are subject to non-negligible geolocation errors, and these uncertainties may be further amplified in mountainous forests with strong topographic variation and pronounced structural heterogeneity [7,8]. Under such conditions, footprint positioning errors can substantially affect waveform matching, reference label extraction, and the reliability of structural variables. Consequently, geolocation correction and local calibration based on high-precision reference data are important prerequisites for reliable GEDI-based AGB mapping in complex mountainous environments [7,8,9,10].

To link local observations with regional wall-to-wall variables, previous studies have explored upscaling or bridging strategies in which airborne or UAV LiDAR serves as an intermediate layer for transferring field plot information to larger-scale remote sensing observations [11,12,13,14]. With the availability of GEDI observations, recent studies have further combined GEDI footprints with wall-to-wall optical imagery or other Earth observation variables for large-area forest canopy height and biomass mapping, such as global canopy-height mapping based on GEDI–Landsat or GEDI–Sentinel-2 fusion and regional AGB mapping using GEDI-derived estimates and continuous remote sensing covariates [15,16]. These studies have demonstrated the value of GEDI as an intermediate sampling layer for regional forest parameter estimation. However, in complex mountainous forests, associated issues such as footprint geolocation error, support-domain consistency between local reference data and GEDI observations, and reference-label reliability remain insufficiently integrated into a unified bridging framework. Therefore, how to stably transfer locally modeled AGB estimates to the GEDI footprint scale while ensuring spatial correspondence and label reliability remains a key challenge in GEDI-based regional AGB mapping.

At the GEDI footprint scale, machine learning methods have been widely used to capture the complex nonlinear relationships between remote sensing variables and forest biomass [5,17,18]. In particular, Stacking ensembles have shown potential for improving predictive robustness by integrating heterogeneous learners from different modeling paradigms [19,20]. Unlike single-family ensemble methods such as RF and XGBoost, Stacking combines the outputs of linear, kernel-based, and tree-based models through a meta-learner. For GEDI footprint-level AGB modeling, where the sample size is limited and predictors include structural, optical, and radar variables, this strategy may help reduce model-family-specific bias and improve robustness. Therefore, Stacking was included as one of the candidate models and was compared with MLR, RF, SVR, and XGBoost under the same feature-selection and cross-validation framework. Discrete GEDI-based AGB predictions need to be further extended to continuous raster surfaces for regional mapping. In this study, EBKRP was adopted because it can incorporate wall-to-wall covariates while accounting for local spatial autocorrelation and uncertainty in semivariogram estimation, which is relevant for AGB mapping in heterogeneous mountainous forests [21,22,23]. Compared with purely deterministic interpolation or non-spatial machine-learning prediction, EBKRP provides a geostatistical framework for combining point-based predictions with continuous background variables. Therefore, it was used here as the spatial upscaling method for regional continuous AGB mapping. Nevertheless, systematic benchmarking against alternative methods, such as regression kriging, geographically weighted regression, RF-based spatial prediction, and IDW, remains a valuable direction for future work. 
Moreover, the complementary contributions and redundancy effects of different wall-to-wall covariates at this stage still require systematic evaluation under independent field-plot constraints.

Simao District, Pu’er City, Yunnan Province, is a representative mountainous forest region characterized by complex terrain, high forest coverage, and strong spatial heterogeneity in forest structure and AGB. These characteristics make it well suited for testing the full methodological chain of local high-precision reference construction, GEDI footprint bridging, footprint-level modeling, and regional continuous mapping under complex mountainous conditions. Accordingly, this study develops a hierarchical AGB estimation framework for the mountainous forests of Simao District with three specific objectives: (1) to construct and evaluate a bridging framework for transferring locally modeled AGB estimates to the GEDI footprint scale, and to assess the effects of geolocation correction, support-domain consistency, and label quality control on bridging-sample reliability; (2) to compare the complementarity of GEDI structural variables and multi-source wall-to-wall features in footprint-level AGB estimation and identify the optimal model for GEDI footprint-level modeling; and (3) to compare different EBKRP covariate schemes based on study-area-wide GEDI predictions through external validation using independent field plots, so as to determine the optimal continuous mapping strategy for complex mountainous forests. Framework performance is primarily evaluated by the external validation R² and RMSE against 61 fully independent field plots, which provide the only accuracy assessment that is independent of the training process.

2. Materials and Methods

2.1. Study Area and Field Plot Data

The study area is located in Simao District, Pu’er City, southern Yunnan Province, China (22°27′–23°06′N, 100°19′–101°27′E; Figure 1). The region is characterized by complex topography and geomorphology, with an overall pattern of higher elevation in the northwest and lower elevation in the southeast, and an uplifted central part, exhibiting typical middle-mountain and deep-valley landforms. Elevation ranges from 547 m at Xiaoganlanba along the Lancang River to 2143 m at Dalu Mountain, with a maximum relative elevation difference of 1596 m, indicating pronounced terrain variation. Simao District has a low-latitude plateau subtropical monsoon climate, with a mean annual temperature of 18.9 °C and a mean annual precipitation of 1487.5 mm, providing favorable hydrothermal conditions [24]. The forest coverage of the study area exceeds 70%, and the main vegetation types include monsoon evergreen broad-leaved forest and warm coniferous forest. Among them, Pinus kesiya var. langbianensis is the dominant tree species, accounting for approximately 33% of the total area. The combined effects of complex topographic variation, pronounced elevational gradients, and diverse forest types result in strong spatial heterogeneity in forest structure and aboveground biomass (AGB), making this area well suited for regional-scale remote sensing estimation and continuous mapping of AGB in mountainous forests. Within the study area, UAV LiDAR data were acquired across four flight zones, designated as STZ, TYH, WZS, and PUER, which together cover the major forest types and terrain conditions representative of the broader study area.

Field surveys were conducted in December 2020, and a total of 96 plots were collected (Figure 2). Among them, 35 plots overlapped with the UAV flight area and were used to construct the local AGB reference product, while the remaining 61 plots were reserved exclusively for external validation of the final regional continuous mapping results.

Within each plot, all trees with diameter at breast height (DBH) ≥ 5 cm were measured, and the main recorded variables included DBH, tree height, and canopy closure. Individual-tree AGB was calculated using the species-specific allometric equations listed in Table 1, and plot-level AGB density was obtained by summing individual-tree biomass and converting it to an area-based basis [25]. The plots covered the major forest types in the study area, including coniferous, broad-leaved, and mixed coniferous–broad-leaved forests.

2.2. Data Acquisition and Preprocessing

2.2.1. UAV LiDAR Data

The UAV LiDAR data used in this study were acquired in December 2020 using a Riegl VUX-1UAV sensor (RIEGL Laser Measurement Systems GmbH, Horn, Austria), with a maximum pulse repetition rate of 550 kHz, a field of view of 330°, and an average point cloud density of approximately 38 pts m⁻². The raw point clouds were preprocessed through attitude correction, noise removal, coordinate transformation, strip mosaicking, and system error correction. Ground-based classification and elevation normalization were then performed before feature extraction to ensure that the derived height and canopy structural metrics represented forest vertical structure rather than topographic variation [26,27,28].

2.2.2. GEDI Data

GEDI L1B, L2A, and L2B data covering the study area from January to December 2020 were downloaded from Earthdata and reprojected to WGS 84/UTM Zone 47N. GEDI L1B waveforms were used for geolocation correction, L2A data were used for quality screening, and L2B data were used to extract footprint-level structural variables, including RH100, canopy cover, plant area index, and foliage height diversity (Table 2) [7,29]. Only footprints satisfying the following criteria were retained for subsequent analysis: data quality flag = 1, degrade flag = 0, sensitivity ≥ 0.9, waveform assessment flag = 0, and relative elevation difference of less than 50 m between the GEDI footprint center and the surrounding terrain within a 25 m neighborhood. For each qualified footprint, a circular buffer with a radius of 12.5 m was established to approximate the GEDI footprint extent and to serve as the common spatial support domain for subsequent reference-label extraction and feature summarization [7,9,29].
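The screening rules above can be expressed as a single predicate per footprint. The following is a minimal sketch, not the authors' code: the `Footprint` class, its field names, and the sample values are hypothetical, and `elev_diff_m` is assumed to hold the relative elevation difference already computed within the 25 m neighborhood.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    # Field names are illustrative, not the dataset names of the GEDI products.
    shot_id: str
    quality_flag: int
    degrade_flag: int
    sensitivity: float
    waveform_flag: int   # the "waveform assessment flag" named in the text
    elev_diff_m: float   # relative elevation difference to surrounding terrain


def passes_screening(fp, min_sensitivity=0.9, max_elev_diff_m=50.0):
    """Return True only if the footprint meets every criterion listed in the text."""
    return (
        fp.quality_flag == 1
        and fp.degrade_flag == 0
        and fp.sensitivity >= min_sensitivity
        and fp.waveform_flag == 0
        and abs(fp.elev_diff_m) < max_elev_diff_m
    )


# Made-up footprints that each exercise one rule.
footprints = [
    Footprint("a", 1, 0, 0.95, 0, 12.0),  # passes all checks
    Footprint("b", 1, 0, 0.85, 0, 5.0),   # sensitivity below 0.9
    Footprint("c", 1, 1, 0.97, 0, 3.0),   # degraded
    Footprint("d", 1, 0, 0.92, 0, 64.0),  # elevation difference of 50 m or more
]
kept = [fp.shot_id for fp in footprints if passes_screening(fp)]
print(kept)
```

Keeping the thresholds as keyword arguments makes it straightforward to test how sensitive the retained sample size is to each criterion.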

The four L2B variables were selected to represent complementary dimensions of canopy structure: RH100 for maximum canopy height, cover for vertical closure, PAI for total plant area, and FHD_norm for vertical structural diversity. Although RH98 is generally preferred over RH100 as a top-of-canopy height proxy due to its lower sensitivity to noise returns, RH100 exhibited stronger correlation with plot-level AGB in preliminary screening under the dense closed-canopy conditions of this study area, where high-sensitivity quality filtering (sensitivity ≥ 0.9) substantially reduced the occurrence of anomalous maximum returns. Other L2B metrics, including RH50, RH75, and cover_z profile statistics, were excluded based on lower and more redundant explanatory power relative to the four retained variables in preliminary correlation screening.

2.2.3. Multi-Source Wall-to-Wall Remote Sensing Data

To support study-area-wide extrapolation and continuous mapping, Sentinel-1, Sentinel-2, and Landsat 8/9 wall-to-wall data were acquired through Google Earth Engine (GEE). The growing season from May to October 2020 was used to construct multi-source image collections [30]. From Sentinel-1, radar backscatter, polarization derivatives, incidence angle, and texture variables were extracted to characterize scattering intensity, structural variation, and spatial heterogeneity [31]. From Sentinel-2, surface reflectance bands and vegetation indices such as NDVI, EVI, and IRECI were derived to represent canopy spectral and red-edge characteristics [32]. From Landsat 8/9, variables including land surface temperature, tasseled cap wetness, brightness, greenness, and NBR were extracted to describe environmental and spectral background conditions [33]. The multi-source wall-to-wall covariates used in this study are summarized in Table 3.

For optical imagery, cloud and cloud-shadow masking was applied before temporal compositing. Percentile statistics (10%, 50%, and 90%) were used to reduce the influence of residual noise and extreme observations, while radar variables were composited within the same period to generate stable wall-to-wall covariates. To ensure spatial consistency among field plots, UAV-derived rasters, GEDI footprints, and wall-to-wall imagery, all variables associated with GEDI footprints were summarized using the same 12.5 m radius buffer centered on each footprint. This unified support domain was used consistently for reference-label extraction, multi-source feature summarization, and subsequent model input construction, thereby reducing scale mismatch and improving comparability among different data sources [26,29].
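The masking and percentile compositing step can be illustrated offline with NumPy. This is only a sketch of the idea under stated assumptions: the study built its composites in Google Earth Engine, whereas the function name `percentile_composite`, the array layout (time, rows, columns), and the sample values below are hypothetical.

```python
import numpy as np


def percentile_composite(stack, clear_mask, percentiles=(10, 50, 90)):
    """Per-pixel temporal percentiles over clear observations only.

    stack      : float array, shape (time, rows, cols)
    clear_mask : bool array, same shape, True where the pixel is cloud-free
    Returns an array of shape (len(percentiles), rows, cols).
    """
    masked = np.where(clear_mask, stack, np.nan)  # drop cloudy observations
    return np.nanpercentile(masked, percentiles, axis=0)


# Five acquisitions over a 1 x 2 pixel scene (made-up index values).
stack = np.array([
    [[0.1, 0.5]],
    [[0.2, 0.5]],
    [[0.3, 0.5]],
    [[0.4, 0.5]],
    [[0.9, 0.5]],   # first pixel is cloud-contaminated on this date
])
clear = np.ones(stack.shape, dtype=bool)
clear[4, 0, 0] = False  # mask the contaminated observation

composite = percentile_composite(stack, clear)
print(composite[1])  # per-pixel median of the clear observations
```

Because the contaminated value is masked before the percentiles are taken, the median of the first pixel reflects only the four clear dates.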

May–October 2020 was selected as the compositing window to represent growing-season canopy conditions and reduce cloud contamination in the optical image collections. However, this window does not fully coincide with the December 2020 field survey, and temporal inconsistencies may therefore exist between the remote-sensing covariates and field-measured AGB, particularly for optical variables affected by seasonal canopy variation. This temporal inconsistency is further addressed in Section 4.4.

2.3. Methodology

To achieve regional-scale continuous mapping of AGB in mountainous forests, this study developed a hierarchical framework consisting of local reference construction, UAV–GEDI bridging, footprint-level modeling, and regional continuous extrapolation (Figure 3). First, a local AGB reference product was generated within the UAV-covered area from field plots and UAV LiDAR data. Second, GEDI geolocation correction, quality control, and footprint-scale label extraction were integrated to construct UAV–GEDI bridging samples containing reference AGB labels, GEDI structural variables, and multi-source wall-to-wall features. Third, GEDI footprint-level AGB models were developed and compared under a unified RF–RFECV feature-selection framework to identify the optimal point-level model. Finally, the optimal model was applied to all quality-controlled GEDI footprints across the study area to generate point-level AGB predictions, which were further combined with wall-to-wall covariates using empirical Bayesian kriging regression prediction (EBKRP) to produce a continuous AGB map. The final mapping schemes were evaluated and compared using 61 independent field plots.

2.3.1. Construction of the Local AGB Reference Product in the UAV-Covered Area

Because UAV LiDAR coverage was spatially limited, a local AGB reference product was first constructed within the UAV-covered area and used as the reference source for subsequent GEDI footprint-level modeling. Based on normalized UAV LiDAR point clouds, candidate variables describing canopy vertical structure were extracted, including height dispersion metrics, canopy relief ratio, accumulated interquartile height metrics, mean and extreme heights, percentile heights, skewness, and kurtosis (Table 4) [26,27,28]. Pearson correlation analysis and multicollinearity diagnosis were then used to identify variables with relatively strong relationships to plot-level AGB while reducing redundancy among candidate features.

Given the limited number of field plots available within the UAV-covered area and the fact that the local reference product served as an intermediate layer rather than the final mapping output, a simple linear regression model was adopted to construct the local AGB reference product. The selected UAV LiDAR feature(s) were used as predictors, and the measured AGB values from the 35 plots overlapping the UAV flight area were used as the response variable. The optimal linear relationship was fitted using ordinary least squares and then applied to generate a local AGB raster within the UAV-covered area. Model performance was assessed using R² and RMSE.
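As a rough illustration of this step, the sketch below fits a one-predictor ordinary least squares line and applies it to raster cells. It assumes a single UAV LiDAR metric as the predictor; the function names, the nodata convention (`None`), and all numbers are hypothetical and do not come from the study.

```python
def fit_ols(x, y):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_raster(metric_raster, intercept, slope):
    """Apply the fitted line cell by cell; None marks cells without LiDAR data."""
    return [
        [None if v is None else intercept + slope * v for v in row]
        for row in metric_raster
    ]


# Made-up plot data: a height metric (m) and plot AGB (Mg/ha) on an exact line.
plot_metric = [1.0, 2.0, 3.0, 4.0]
plot_agb = [5.0, 8.0, 11.0, 14.0]
intercept, slope = fit_ols(plot_metric, plot_agb)

agb_raster = predict_raster([[2.0, None], [10.0, 0.0]], intercept, slope)
print(intercept, slope, agb_raster)
```

With only 35 plots available, a single-predictor line of this kind keeps the number of fitted parameters small, which is consistent with the rationale given above.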

2.3.2. Construction of UAV–GEDI Bridging Samples

Because GEDI footprints lack directly corresponding field-measured or UAV-derived AGB reference values, a UAV–GEDI bridging procedure was introduced to construct supervised samples for footprint-level modeling. Each bridging sample was formed by spatially linking a GEDI footprint with the local AGB reference product within the UAV-covered area, and simultaneously associating it with the corresponding GEDI structural variables and multi-source wall-to-wall features.

Prior to AGB reference value extraction, GEDI footprint positions were corrected to reduce spatial mismatch caused by horizontal geolocation error. Orbit-scale systematic offsets were estimated by matching GEDI L1B waveforms against simulated waveforms derived from high-density UAV LiDAR data within a local search window of 10 m radius, which was chosen to balance the expected magnitude of GEDI horizontal error against the risk of false matching [9,10]. The mean offset estimated from valid matched footprints within each orbit track was then applied uniformly to correct all footprints along that track. Only orbit tracks with a sufficient number of valid matched footprints were used for offset estimation, and correction quality was assessed by comparing waveform similarity before and after correction.
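The track-level part of this correction can be sketched as follows. The waveform matching itself is not shown; the sketch assumes that each valid match has already produced a horizontal shift (dx, dy) in metres. The threshold `min_matches=3` is a hypothetical value, since the text does not state how many matches count as sufficient, and leaving tracks without an estimate unchanged is likewise an assumption.

```python
from collections import defaultdict


def estimate_track_offsets(matches, min_matches=3):
    """Mean (dx, dy) per orbit track from valid waveform matches.

    matches: iterable of (track_id, dx_m, dy_m).
    Tracks with fewer than min_matches entries get no estimate.
    """
    per_track = defaultdict(list)
    for track_id, dx, dy in matches:
        per_track[track_id].append((dx, dy))

    offsets = {}
    for track_id, shifts in per_track.items():
        if len(shifts) >= min_matches:
            n = len(shifts)
            offsets[track_id] = (
                sum(s[0] for s in shifts) / n,
                sum(s[1] for s in shifts) / n,
            )
    return offsets


def apply_track_offsets(footprints, offsets):
    """Shift every footprint by its track's mean offset.

    footprints: list of (track_id, x_m, y_m) in projected coordinates.
    Footprints on tracks without an estimate are returned unchanged (assumption).
    """
    corrected = []
    for track_id, x, y in footprints:
        dx, dy = offsets.get(track_id, (0.0, 0.0))
        corrected.append((track_id, x + dx, y + dy))
    return corrected


# Made-up matches: track T1 has three valid matches, T2 only one.
matches = [("T1", 4.0, -2.0), ("T1", 6.0, -4.0), ("T1", 5.0, -3.0), ("T2", 9.0, 9.0)]
offsets = estimate_track_offsets(matches)
corrected = apply_track_offsets([("T1", 100.0, 200.0), ("T2", 300.0, 400.0)], offsets)
print(offsets, corrected)
```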

After geolocation correction, a circular buffer with a radius of 12.5 m was established for each GEDI footprint to approximate its spatial support domain. The reference AGB label for each footprint was computed as the mean of all valid UAV pixels within the buffer, as shown in Equation (1) [7,12].

$$\mathrm{AGB}_{\mathrm{ref},i} = \frac{1}{n_i} \sum_{j=1}^{n_i} \mathrm{AGB}_{ij} \tag{1}$$

where $\mathrm{AGB}_{\mathrm{ref},i}$ is the reference AGB label of the i-th GEDI footprint, $\mathrm{AGB}_{ij}$ is the AGB value of the j-th valid UAV pixel within the footprint buffer, and $n_i$ is the number of valid pixels.

Because insufficient UAV coverage or excessive local heterogeneity within a footprint buffer may compromise label reliability, quality assessment was applied before finalizing the bridging sample set. Four quality indicators were calculated for each footprint as follows.

$$R_{\mathrm{valid},i} = \frac{n_i}{N_i} \tag{2}$$

$$R_{\mathrm{overlap},i} = \frac{A(F_i \cap U)}{A(F_i)} \tag{3}$$

$$\sigma_{\mathrm{AGB},i} = \sqrt{\frac{1}{n_i} \sum_{j=1}^{n_i} \left(\mathrm{AGB}_{ij} - \mathrm{AGB}_{\mathrm{ref},i}\right)^2} \tag{4}$$

$$\mathrm{CV}_{\mathrm{AGB},i} = \frac{\sigma_{\mathrm{AGB},i}}{\mathrm{AGB}_{\mathrm{ref},i}} \tag{5}$$

where $N_i$ denotes the total number of pixels within the clipped window of the i-th footprint, $F_i$ represents the buffer of the i-th footprint, $U$ denotes the valid UAV coverage area, and $A(\cdot)$ represents area. Specifically, $R_{\mathrm{valid},i}$ (Equation (2)) is the valid-pixel ratio, defined as the proportion of valid UAV pixels within the footprint window; $R_{\mathrm{overlap},i}$ (Equation (3)) is the geometric overlap ratio, measuring the fractional area of the footprint buffer covered by valid UAV data; $\sigma_{\mathrm{AGB},i}$ (Equation (4)) is the local standard deviation of UAV-derived AGB values within the buffer, characterizing internal AGB dispersion; and $\mathrm{CV}_{\mathrm{AGB},i}$ (Equation (5)) is the coefficient of variation, reflecting the relative variability of AGB within the footprint. Only footprints simultaneously satisfying the thresholds for $R_{\mathrm{overlap},i}$ and $R_{\mathrm{valid},i}$ were retained in the final bridging sample set [7,9]. Both thresholds were set to 0.5 based on preliminary sensitivity analysis and with reference to previous UAV–GEDI upscaling studies [12,13], and a minimum valid-pixel constraint was additionally imposed to avoid assigning reference values from only a few local pixels.
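Equations (1)–(5) and the retention rule can be sketched on a small raster window as below. Several details are assumptions made for illustration only: the window is a NumPy array with NaN as nodata, $N_i$ is taken as the number of cell centres inside the buffer, the area ratio in Equation (3) is approximated by cell counts against a separate coverage mask, the thresholds are applied as "greater than or equal to", and `min_valid=10` is a hypothetical value for the minimum valid-pixel constraint.

```python
import numpy as np


def footprint_label_quality(agb, covered, cell_size, cx, cy,
                            radius=12.5, min_ratio=0.5, min_valid=10):
    """Reference label and quality indicators for one footprint buffer.

    agb      : 2-D float array of UAV-derived AGB, NaN where no estimate exists
    covered  : 2-D bool array, True inside the UAV coverage area U
    cx, cy   : footprint centre, in metres from the window's upper-left corner
    """
    rows, cols = agb.shape
    xs = (np.arange(cols) + 0.5) * cell_size   # cell-centre x coordinates
    ys = (np.arange(rows) + 0.5) * cell_size   # cell-centre y coordinates
    xx, yy = np.meshgrid(xs, ys)
    inside = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2

    valid = inside & np.isfinite(agb)
    n_i = int(valid.sum())       # valid pixels in the buffer
    N_i = int(inside.sum())      # all pixels in the buffer (assumed meaning of N_i)
    if n_i == 0 or N_i == 0:
        return None

    values = agb[valid]
    agb_ref = float(values.mean())                    # Equation (1)
    r_valid = n_i / N_i                               # Equation (2)
    r_overlap = float((inside & covered).sum()) / N_i  # Equation (3), raster approximation
    sigma = float(values.std())                       # Equation (4), divides by n_i
    cv = sigma / agb_ref if agb_ref > 0 else float("nan")  # Equation (5)

    retained = r_valid >= min_ratio and r_overlap >= min_ratio and n_i >= min_valid
    return {"agb_ref": agb_ref, "r_valid": r_valid, "r_overlap": r_overlap,
            "sigma": sigma, "cv": cv, "n_valid": n_i, "retained": retained}


# Made-up 30 m x 30 m window with 1 m cells; the eastern strip has no UAV data.
agb = np.full((30, 30), 100.0)
agb[:, 20:] = np.nan
covered = np.isfinite(agb)
q = footprint_label_quality(agb, covered, cell_size=1.0, cx=15.0, cy=15.0)
print(q)
```

In this example roughly three quarters of the buffer is covered, so the footprint is retained; moving the nodata strip toward the centre would push both ratios below 0.5 and drop the sample.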

2.3.3. GEDI Footprint-Level AGB Modeling

After the bridging samples were established, GEDI structural variables and multi-source wall-to-wall remote sensing features were combined to form the candidate feature set for footprint-level AGB modeling. The candidate variables mainly included three groups: GEDI structural variables, Sentinel-1 radar features, and wall-to-wall optical features derived from Sentinel-2 and Landsat 8/9. To ensure consistency with the GEDI footprint scale, all wall-to-wall variables were summarized as mean values within the same 12.5 m radius footprint buffer used during bridging [7,30,31,32,33], as described in Equation (6).

$$X_i = \frac{1}{m_i} \sum_{k=1}^{m_i} x_{ik} \tag{6}$$

where $X_i$ is the summarized feature value of the i-th sample, $x_{ik}$ is the value of the k-th valid pixel within the corresponding buffer, and $m_i$ is the number of valid pixels.

Considering the limited number of bridging samples and the potential redundancy among candidate variables, a 10-fold cross-validation framework was adopted for model evaluation. In each outer split, nine folds were used for training and one fold for validation. Within each training fold, recursive feature elimination with cross-validation (RFECV) based on a random forest estimator was used to identify the optimal feature subset. Random forest was selected as the RFECV base estimator because it is insensitive to variable scale, can handle nonlinear relationships and heterogeneous inputs, and provides relatively stable feature-importance rankings [34,35]. This fold-wise feature-selection strategy allowed variable selection to be incorporated directly into model evaluation, thereby improving the robustness of model comparison and selection.
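One way to keep feature selection inside each training fold is to place RFECV in a scikit-learn pipeline, so that it is refit on every outer split. The sketch below is illustrative only: the data are synthetic, the downstream model is a plain linear regression chosen for brevity, and the inner `cv=3`, `n_estimators=25`, and the RMSE-based scoring are assumptions rather than settings reported in the text.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the bridging samples (NOT the study's data).
X, y = make_regression(n_samples=80, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)

# RFECV with a random forest ranks and prunes features. Because it sits inside
# the pipeline, it only ever sees the training part of each outer split.
selector = RFECV(
    estimator=RandomForestRegressor(n_estimators=25, random_state=0),
    step=1,
    cv=3,  # inner folds used by RFECV; illustrative value
    scoring="neg_root_mean_squared_error",
)
pipeline = Pipeline([("select", selector), ("model", LinearRegression())])

# Outer 10-fold split: nine folds train, one fold validates.
outer_cv = KFold(n_splits=10, shuffle=True, random_state=0)
oof_pred = cross_val_predict(pipeline, X, y, cv=outer_cv)

pooled_r2 = r2_score(y, oof_pred)
print(f"pooled out-of-fold R2: {pooled_r2:.3f}")
```

Selecting features once on the full sample and then cross-validating would let the validation folds influence the feature set, which is the leakage that the fold-wise design avoids.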

Following feature selection, five candidate models were compared to systematically evaluate the suitability of different modeling paradigms for GEDI footprint-level AGB estimation in complex mountainous forests. Multiple linear regression (MLR) was included as a linear parametric baseline to assess the linear explanatory power of the selected features. Random forest (RF) and extreme gradient boosting (XGBoost) were selected as representative tree-based nonlinear learners capable of handling high-dimensional inputs and complex variable interactions [36,37]. Support vector regression (SVR) was included for its strong generalization potential under small-sample, high-dimensional conditions through kernel-based implicit mapping [38]. A two-level Stacking ensemble model was further constructed to integrate the complementary strengths of the above heterogeneous base learners, with the expectation that secondary fusion could reduce individual model bias and improve overall predictive stability [19,20]. The corresponding model formulations are given in Equations (7)–(11).

$$y_i = \beta_0 + \sum_{j=1}^{p^*} \beta_j x_{ij} + \varepsilon_i \tag{7}$$

$$\hat{f}_{\mathrm{RF}}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x) \tag{8}$$

$$\min_{w,\, b,\, \xi_i,\, \xi_i^*} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^*\right) \tag{9}$$

$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{10}$$

$$\hat{y}_i^{\mathrm{stack}} = g\left(\hat{y}_{i1}, \hat{y}_{i2}, \hat{y}_{i3}, \hat{y}_{i4}\right) \tag{11}$$

where Equation (7) represents MLR, a linear additive model in which $\beta_j$ are regression coefficients and $p^*$ is the number of selected features; Equation (8) represents RF, which reduces estimation variance by averaging predictions across $B$ bootstrap-trained regression trees $T_b(x)$; Equation (9) represents SVR, which minimizes a margin-based loss under penalty coefficient $C$ with slack variables $\xi_i$, $\xi_i^*$; Equation (10) represents XGBoost, which iteratively adds regression trees $f_t$ with an explicit regularization term $\Omega(f_t)$ to control model complexity; and Equation (11) represents the Stacking model, where $g(\cdot)$ is the Ridge meta-learner and $\hat{y}_{i1}$–$\hat{y}_{i4}$ are out-of-fold predictions from the four base learners. For the Ridge meta-learner, the regularization parameter λ was tuned within each outer training fold, with λ ∈ {0.01, 0.1, 1, 10, 100}.

In the Stacking model, MLR, RF, SVR, and XGBoost served as first-level learners, and Ridge regression was adopted as the second-level meta-learner. The model-evaluation procedure was implemented within a nested cross-validation framework. At the outer level, 10-fold cross-validation was used to split the bridging samples into training and validation folds, and the validation fold was withheld from all model-training procedures. Within each outer training fold, RFECV-based feature selection and hyperparameter tuning for RF, SVR, and XGBoost were performed using only the training subset. For Stacking, an inner 5-fold cross-validation within the outer training subset was used to generate out-of-fold predictions from the base learners as meta-features for training the Ridge meta-learner. The fitted base learners and meta-learner were then applied to the withheld outer validation fold. This design ensured separation between model tuning and outer-fold evaluation within the random cross-validation framework, thereby reducing target leakage during model comparison.
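The nested design described above can be approximated with scikit-learn's `StackingRegressor`, whose `cv` argument generates the inner out-of-fold predictions used to train the meta-learner. The sketch below rests on several assumptions: the data are synthetic, `GradientBoostingRegressor` stands in for XGBoost to avoid an extra dependency, base-learner hyperparameters are fixed rather than tuned, feature selection is omitted, and `RidgeCV` selects the regularization strength from the grid given in the text using its default internal cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the bridging samples (NOT the study's data).
X, y = make_regression(n_samples=120, n_features=5, noise=15.0, random_state=1)

# First-level learners from different model families.
base_learners = [
    ("mlr", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("svr", make_pipeline(StandardScaler(), SVR(C=100.0, gamma="scale"))),
    ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=0)),  # XGBoost stand-in
]

# Second level: Ridge on inner 5-fold out-of-fold predictions of the base learners.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=RidgeCV(alphas=[0.01, 0.1, 1, 10, 100]),
    cv=5,
)

# Outer 10-fold loop: each validation fold is withheld from all fitting above.
outer_cv = KFold(n_splits=10, shuffle=True, random_state=0)
oof_pred = cross_val_predict(stack, X, y, cv=outer_cv)

rmse = float(np.sqrt(mean_squared_error(y, oof_pred)))
print(f"pooled out-of-fold RMSE: {rmse:.2f}")
```

After the inner folds produce the meta-features, `StackingRegressor` refits each base learner on the whole outer training subset, which corresponds to the step in which the fitted base learners and meta-learner are applied to the withheld fold.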

Key parameters included the number of trees and maximum depth for RF, the penalty coefficient $C$ and kernel width $\gamma$ for SVR, and the learning rate, maximum depth, and number of estimators for XGBoost.

To compare the performance of the five candidate models, R², RMSE, MAE, and Bias were adopted as evaluation metrics, with formulas given in Equations (12)–(15).

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2} \tag{12}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2} \tag{13}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{14}$$

$$\mathrm{Bias} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right) \tag{15}$$

where $y_i$ is the reference value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the reference values. Equations (12)–(15) quantify overall prediction error, explanatory power, average deviation, and systematic bias, respectively.
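Equations (12)–(15) translate directly into a few lines of code. The sketch below uses only the standard library; the function name and the three sample values are made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """RMSE, R2, MAE, and Bias as defined in Equations (12)-(15)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_true) ** 2 for yt in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / n,
        # Positive bias means the model overestimates on average.
        "bias": sum(yp - yt for yt, yp in zip(y_true, y_pred)) / n,
    }


# Made-up reference and predicted AGB values (Mg/ha).
metrics = regression_metrics([100.0, 150.0, 200.0], [110.0, 140.0, 210.0])
print(metrics)
```

Note the sign convention of Equation (15): Bias is prediction minus reference, so the two overestimates and one underestimate in the example give a positive value.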

The average RMSE across the 10 folds was used as the primary basis for model selection, while R², MAE, and Bias were considered jointly for comprehensive comparison. After the optimal model was identified, the same preprocessing and feature-selection procedure was reapplied to the full bridging sample set, and final model training was completed for subsequent study-area-wide GEDI point prediction.

2.3.4. Study-Area-Wide GEDI Point Prediction and Regional Continuous Mapping

EBKRP was selected because it can simultaneously incorporate wall-to-wall explanatory covariates and account for spatial autocorrelation in the residuals, making it well suited for extending discrete point predictions to regionally continuous raster surfaces [39,40]. Previous studies have demonstrated that combining regression-based prediction with kriging interpolation can improve continuous mapping accuracy compared with either approach applied independently, particularly in scenarios involving spatially sparse sample points and continuous background covariates [21,41].

To construct the EBKRP covariate schemes, wall-to-wall variables were selected based on their RFECV selection frequency, random forest importance ranking, and physical interpretability, with redundant or highly correlated variables excluded. A stepwise expansion strategy was then adopted, whereby variables were added sequentially by information-source type to examine the incremental contribution of different data sources to continuous mapping performance.

All EBKRP schemes were implemented in ArcGIS Pro 3.5.2 under a unified software and parameter framework. Internal cross-validation statistics provided by the software, including Mean Error, RMSE, average standard error (ASE), and root mean square standardized error (RMSSE), were used as supplementary references for comparing different schemes. To further evaluate external mapping performance, the 61 independent field plots were used as validation samples. These plots were not involved in local reference construction, UAV–GEDI bridging, or footprint-level model training, and therefore remained fully independent of the preceding modeling process. For each continuous AGB raster, predicted values were extracted at the plot locations and compared with the corresponding plot-based reference AGB values, with external performance evaluated using R², RMSE, MAE, and Bias. Because some support-domain mismatch may still exist between field plots and raster cells, the external validation results were treated as relative comparisons among schemes rather than as absolute accuracy estimates of the final product.

3. Results

3.1. Forest AGB Estimation Results in the UAV Flight Area

3.1.1. Univariate Sensitivity Analysis

The univariate sensitivity analysis showed that different UAV LiDAR height metrics responded differently to plot-level AGB (Figure 4). Percentile-based and extreme-value metrics directly related to canopy height exhibited relatively strong explanatory power, whereas distribution shape descriptors such as skewness and kurtosis showed comparatively weaker correlations. Among all variables, the 95th height percentile (percentile_95th) had the highest Pearson correlation coefficient with AGB (r = 0.69), indicating that upper-canopy boundary information was most sensitive to plot-scale biomass variation.

The correlation matrix further revealed substantial collinearity among height-related variables, particularly among maximum height, multiple height percentiles, and accumulated height metrics. For example, the correlation between max and percentile_95th approached 0.99, and percentile_90th and percentile_95th were similarly highly correlated. After jointly considering variable sensitivity to AGB and inter-variable collinearity, percentile_95th was selected as the core predictor for constructing the local AGB reference product.

3.1.2. Linear Regression Model for AGB Estimation

Based on percentile_95th, a univariate linear regression model was constructed for the 35 plots overlapping the UAV flight area. Leave-one-out cross-validation (LOOCV) of the n = 35 regression yielded R² = 0.632, RMSE = 24.02 Mg ha⁻¹, MAE = 19.68 Mg ha⁻¹, and Bias = −0.63 Mg ha⁻¹ (Table 5, Figure 5). The LOOCV scatterplot (Figure 5A) showed that predicted values were generally distributed around the 1:1 line with no pronounced systematic bias, and the LOWESS residual trend (Figure 5B) confirmed an approximately random residual distribution across the prediction range. Overall, the LOOCV R² of 0.632 confirms a robust predictive relationship between percentile_95th and plot-level AGB.
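The leave-one-out procedure for a univariate linear model can be sketched as follows. The height and AGB values are invented for illustration and are not the study's plot data; `fit_simple_ols` and `loocv_predictions` are hypothetical helper names.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def loocv_predictions(xs, ys):
    """Predict each sample from a model fitted on all other samples."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_simple_ols(train_x, train_y)
        preds.append(intercept + slope * xs[i])
    return preds

# Illustrative values only: 95th height percentile (m) and AGB (Mg/ha).
p95 = [12.0, 15.5, 18.0, 21.0, 24.5, 27.0]
agb = [70.0, 95.0, 110.0, 140.0, 160.0, 185.0]
preds = loocv_predictions(p95, agb)
loocv_bias = sum(p - o for p, o in zip(preds, agb)) / len(agb)
```

Each prediction comes from a model that never saw the corresponding plot, so the resulting R², RMSE, MAE, and Bias describe out-of-sample rather than fitting performance.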

3.2. Results of UAV–GEDI Bridging Sample Construction

3.2.1. GEDI Geolocation Correction

Two representative shot samples were selected to illustrate waveform matching results before and after geolocation correction (Figure 6). For Shot 214601100300311003, representing a stand with relatively complex canopy structure, the simulated waveform showed limited correspondence with the observed waveform prior to correction. After applying the optimal offset (ΔX = −7.39 m, ΔY = 3.06 m), the correlation coefficient increased from 0.399 to 0.777, RMSE decreased from 0.160 to 0.103, and cosine similarity improved from 0.461 to 0.824. For Shot 214601100300311024, representing a stand with more uniform canopy structure, the initial similarity was already reasonable (R = 0.714, RMSE = 0.183, cosine similarity = 0.768); following correction (ΔX = −5.66 m, ΔY = −5.66 m), the main peak alignment improved markedly, with R increasing to 0.946, RMSE decreasing to 0.095, and cosine similarity reaching 0.951. Across both cases, geolocation correction yielded consistent improvements regardless of canopy structural complexity, indicating that it can substantially improve waveform matching in both structurally simple and complex forest stands and thereby provide a more reliable spatial foundation for subsequent bridging sample construction.
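The three similarity measures reported above can be computed as in the sketch below. The `best_offset` helper is hypothetical: it assumes the optimal offset is the candidate that maximises the Pearson correlation between the observed and simulated waveforms, and it takes the waveform simulator as a caller-supplied function. The selection criterion actually used in the study may differ.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equally sampled waveforms."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def cosine_similarity(a, b):
    """Cosine of the angle between the two waveform vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def waveform_rmse(a, b):
    """Root mean square difference between the two waveforms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def best_offset(observed, simulate, candidates):
    """Assumed criterion: pick the (dx, dy) whose simulated waveform correlates best."""
    return max(candidates, key=lambda off: pearson_r(observed, simulate(*off)))

# Horizontal magnitude of the first reported offset (dx = -7.39 m, dy = 3.06 m).
shift_m = math.hypot(-7.39, 3.06)

# Toy example: two candidate offsets with precomputed simulated waveforms.
observed = [0.0, 1.0, 4.0, 1.0, 0.0]
toy_sims = {(0, 0): [4.0, 1.0, 0.0, 0.0, 0.0], (1, 0): [0.0, 1.0, 4.0, 1.0, 0.0]}
chosen = best_offset(observed, lambda dx, dy: toy_sims[(dx, dy)], list(toy_sims))
```

In practice the simulator would aggregate the UAV point cloud within the footprint at each candidate position, which is outside the scope of this sketch.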

3.2.2. Screening Results of Bridging Samples

After overlaying quality-controlled GEDI footprints with the valid UAV coverage area, 331 footprints were found to fall within the UAV-covered region. Following dual screening based on overlap ratio and valid-pixel ratio, 252 valid bridging samples were finally obtained (Table 6). The progressive reduction from 331 spatially co-located footprints to 252 final samples reflects the strict requirements of geometric consistency and local valid-pixel support in high-quality bridging sample construction.

The final bridging samples were distributed across the TYH, STZ, and WZS plot groups (Figure 7), with 164, 62, and 26 samples, respectively. No valid bridging samples were obtained from the PUER flight zone; this outcome is attributable to the relatively sparse forest biomass distribution within that zone, which produced high internal AGB heterogeneity within GEDI footprint buffers and caused the majority of PUER footprints to fail the dual quality-control thresholds during screening. Across all bridging samples, the AGB range was 69.93–232.14 Mg ha⁻¹ with a mean of 134.85 Mg ha⁻¹, indicating that the sample set covered a relatively complete biomass gradient. However, the current sample set still primarily reflects the existing plot-group distribution rather than strict cross-plot independent generalization, and this imbalance should be considered when interpreting subsequent modeling results.

3.3. Feature Selection Results

Under the unified RF–RFECV framework, the importance ranking and selection frequency of candidate variables exhibited a clear hierarchical pattern (Figure 8). GEDI structural variables dominated the final model: rh100_m and fhd_norm had the highest importance values at 34.6% and 27.8%, respectively, while cover and pai contributed 5.1% and 4.9%. Multi-source wall-to-wall variables provided complementary contributions, with Landsat8/9_TCW_p90 and Landsat8/9_TCW_p50 each at 3.2%, S2_IRECI_p10 at 2.9%, and Sentinel-1 features including S1_VH_VV_Ratio_p10, S1_VH_p50_var, and S1_VH_p50_contrast also ranking relatively high. These results indicate that the high-contribution variables spanned multiple data sources, forming a composite feature system with GEDI structural variables as the core, complemented by radar and optical variables [17,42], a pattern broadly consistent with previous studies combining GEDI structural observations with multi-source wall-to-wall remote sensing for forest AGB modeling.

The RFECV curve showed that cross-validated RMSE declined rapidly as the number of features increased to approximately 10 and then gradually stabilized, reaching the minimum RMSE of 22.676 Mg ha⁻¹ at 19 features (Figure 8b). Therefore, the 19-feature subset was selected according to the minimum cross-validated RMSE criterion. Although a more parsimonious subset of approximately 10 features would be defensible under the one-standard-error rule, retaining 19 features preserved repeatedly selected radar and optical variables that may complement GEDI-derived canopy-structure metrics. The final subset therefore reflects a trade-off between predictive performance, multi-source information complementarity, and model complexity. Feature selection frequency results (Figure 9) confirmed the stability of key variables across folds. Landsat8/9_TCW_p50, S1_angle_p50, rh100_m, fhd_norm, cover, pai, S1_VV_p50, S1_VH_p50_var, S1_VH_VV_Ratio_p10, and Landsat8/9_TCW_p90 were retained in all 10 folds, while S1_VH_p50_2 and S2_IRECI_p10 were selected in 9 folds, and S1_VH_p50_contrast and S1_VH_p50_ent in 8 folds. By contrast, Landsat8/9_NBR_p50, S1_VH_p10, and S2_B12_p50 showed relatively low selection frequencies. Taken together, the retained features formed a hybrid combination with GEDI structural variables as the core and Sentinel-1 radar and Landsat/Sentinel-2 optical variables as complementary components [43,44], providing a unified input space for subsequent multi-model comparison.
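The difference between the minimum-RMSE criterion and the one-standard-error rule can be illustrated with a short sketch. Only the minimum of 22.676 Mg ha⁻¹ at 19 features comes from the text; all other curve values and all standard errors below are invented for illustration, and `one_se_choice` is a hypothetical helper.

```python
def one_se_choice(n_features, mean_rmse, se_rmse):
    """Return (one-SE choice, minimum-RMSE choice) as numbers of features."""
    best = min(range(len(mean_rmse)), key=lambda i: mean_rmse[i])
    threshold = mean_rmse[best] + se_rmse[best]
    # Smallest feature count whose mean CV RMSE is within one SE of the minimum.
    for n, m in sorted(zip(n_features, mean_rmse)):
        if m <= threshold:
            return n, n_features[best]
    return n_features[best], n_features[best]

# Illustrative curve: only the (19, 22.676) point is taken from the text.
n_features = [5, 10, 15, 19, 25]
mean_rmse = [30.0, 23.5, 23.0, 22.676, 22.9]
se_rmse = [2.0, 1.0, 1.0, 1.0, 1.0]
parsimonious, minimum = one_se_choice(n_features, mean_rmse, se_rmse)
```

With these assumed values the rule selects 10 features while the minimum-RMSE criterion selects 19, mirroring the trade-off described above.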

3.4. GEDI Footprint-Level AGB Modeling Results

Based on the 252 bridging samples, five candidate models were compared using 10-fold cross-validation. The fold-mean R² and pooled out-of-fold R² describe different aspects of the cross-validation results. The fold-mean R² was calculated by averaging R² values obtained separately for each validation fold, whereas the pooled out-of-fold R² was calculated after concatenating all held-out predictions across the 10 folds. The latter summarizes overall out-of-fold agreement but may be influenced by the uneven spatial distribution of bridging samples. Therefore, both metrics were reported as internal cross-validation summaries rather than independent measures of regional generalization accuracy.
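The distinction between the two R² summaries can be made concrete with a small sketch. The fold contents are toy values chosen so that the folds differ in mean AGB, and `fold_mean_and_pooled_r2` is a hypothetical helper name.

```python
def r2(y, y_hat):
    """Coefficient of determination for one set of predictions."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def fold_mean_and_pooled_r2(folds):
    """folds is a list of (reference, predicted) pairs, one per validation fold."""
    per_fold = [r2(y, y_hat) for y, y_hat in folds]
    fold_mean = sum(per_fold) / len(per_fold)
    pooled_y = [v for y, _ in folds for v in y]
    pooled_hat = [v for _, y_hat in folds for v in y_hat]
    return fold_mean, r2(pooled_y, pooled_hat)

# Toy folds (Mg/ha): a low-biomass fold and a high-biomass fold.
folds = [
    ([100.0, 110.0], [101.0, 109.0]),
    ([200.0, 220.0], [205.0, 215.0]),
]
fold_mean, pooled = fold_mean_and_pooled_r2(folds)
```

Because the pooled calculation measures variance around the overall mean, between-fold differences in AGB level inflate its denominator, so the pooled value can exceed the fold mean when folds differ in composition.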

Overall, the five models showed varying degrees of predictive ability for GEDI footprint-level AGB estimation, with an overall ranking of Stacking > RF > XGBoost > SVR > MLR based on the internal cross-validation metrics (Table 7). Among them, Stacking showed the best overall internal cross-validation performance, with a mean R² of 0.702 ± 0.217, RMSE of 23.025 ± 5.127 Mg ha⁻¹, MAE of 17.152 ± 3.289 Mg ha⁻¹, and Bias of 0.798 ± 2.140 Mg ha⁻¹. RF ranked second, with an R² of 0.681 ± 0.115 and RMSE of 25.298 ± 4.857 Mg ha⁻¹. MLR performed weakest, with an R² of 0.521 ± 0.198 and RMSE of 30.755 ± 6.800 Mg ha⁻¹, suggesting that linear models were less able to capture the complex relationships between AGB and the multi-source feature set under the current data conditions. SVR yielded a mean Bias of −5.200 ± 4.386 Mg ha⁻¹, revealing a pronounced systematic underestimation tendency, whereas Stacking had the Bias closest to zero among all models.

The boxplot analysis (Figure 10) further showed that MLR and SVR were associated with higher RMSE levels and greater sensitivity to fold partitioning. XGBoost and RF shifted to lower error ranges, while Stacking showed the lowest median RMSE and a relatively narrow interquartile range among the five models, indicating comparatively stable RMSE performance across folds. However, its R² standard deviation reached 0.217, which was larger than that of RF (0.115), indicating considerable fold-level variability in explanatory power. This variability may be partly associated with the limited sample size and the uneven spatial distribution of bridging samples. Specifically, each validation fold contained approximately 25 samples, while about 65% of all bridging samples originated from TYH; therefore, differences in fold composition may have affected the representativeness of individual validation subsets and produced variable R² estimates.

The observed-versus-predicted scatterplots (Figure 11) confirmed these patterns. MLR exhibited substantial dispersion in the medium-to-high biomass range, and SVR showed consistent underestimation across the full range. XGBoost improved overall fitting trend but retained localized deviations. RF and Stacking produced fits closest to the 1:1 line, with Stacking achieving the most concentrated sample distribution and smallest overall dispersion. Based on per-sample reconstruction from pooled cross-validated predictions, Stacking achieved an R² of 0.736, RMSE of 24.15 Mg ha⁻¹, MAE of 17.60 Mg ha⁻¹, and Bias of 0.81 Mg ha⁻¹, confirming its best overall performance. RF ranked second with R² of 0.710, RMSE of 25.74 Mg ha⁻¹, MAE of 20.46 Mg ha⁻¹, and Bias of −2.10 Mg ha⁻¹. Stacking was therefore identified as the optimal model for subsequent study-area-wide GEDI point-level AGB prediction. These cross-validation results were used primarily for point-level model selection and should not be interpreted as the final accuracy of regional continuous mapping.

3.5. Regional Continuous Mapping Results and Validation

Following model selection, Stacking was applied to all 89,579 quality-controlled GEDI footprints across the study area. All footprints successfully completed feature extraction, preprocessing, and model inference, confirming the reliability of the processing pipeline for study-area-wide extrapolation. The resulting point-level AGB estimates ranged from 25.21 to 226.57 Mg ha⁻¹, with a mean of 116.77 Mg ha⁻¹.

3.5.1. External Validation and Comparison of Continuous AGB Rasters Generated by Different Wall-to-Wall Covariate Schemes

Four covariate schemes were developed and implemented using EBKRP to generate continuous AGB rasters. Rather than selecting variables solely by global importance ranking, covariates were chosen by first identifying the most representative variable within each information-source group (Landsat 8/9, Sentinel-2, and Sentinel-1) based on RFECV selection frequency, random forest importance, and physical interpretability, and then incorporating them sequentially across schemes. Scheme A included only Landsat8/9_TCW_p90 as the baseline optical moisture-background variable; Scheme B extended Scheme A by adding S2_IRECI_p10 as the representative Sentinel-2 red-edge variable; Scheme C further incorporated S1_VH_VV_Ratio_p10 as the representative Sentinel-1 radar variable; and Scheme D built upon Scheme C by additionally including Landsat8/9_TCW_p50 to test whether a second variable from the same source group could yield further gains. All four rasters were validated against 61 independent field plots (Table 8, Figure 12).

Scheme A performed the weakest, with R² of 0.417 and RMSE of 32.35 Mg ha⁻¹, indicating that a single optical variable was insufficient to represent AGB spatial heterogeneity across the study area. Scheme B showed substantial improvement upon the addition of S2_IRECI_p10, with R² increasing to 0.534 and RMSE decreasing to 28.92 Mg ha⁻¹, reflecting the added value of red-edge vegetation information in characterizing canopy chlorophyll content and physiological status [45]. Scheme C achieved the best overall performance, with R² of 0.622, RMSE of 26.05 Mg ha⁻¹, MAE of 20.29 Mg ha⁻¹, and Bias of −3.50 Mg ha⁻¹, demonstrating that the joint integration of optical moisture-background, red-edge, and radar scattering-structure information most effectively constrained the EBKRP prediction surface [43]. Scheme D, however, showed a measurable degradation relative to Scheme C, with R² declining to 0.581 and RMSE increasing to 27.42 Mg ha⁻¹, suggesting that the additional variable contributed limited new information and may have introduced redundancy into the mapping scheme, consistent with the broader finding that data redundancy can reduce rather than improve biomass model accuracy [46].

Overall, mapping accuracy improved consistently as the covariate set expanded from a single variable to a complementary multi-source combination, but declined when a variable with highly overlapping information content was introduced. Scheme C was therefore identified as the optimal configuration, achieving the highest R², lowest RMSE and MAE, and a relatively small absolute Bias among all candidates.
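The scheme comparison can be reproduced from the external validation values reported above. The short sketch below ranks the schemes by R² and computes the change in R² at each expansion step; the dictionary layout is illustrative.

```python
# External validation results for the four EBKRP covariate schemes (from the text).
schemes = {
    "A": {"r2": 0.417, "rmse": 32.35},
    "B": {"r2": 0.534, "rmse": 28.92},
    "C": {"r2": 0.622, "rmse": 26.05},
    "D": {"r2": 0.581, "rmse": 27.42},
}

# Rank by R2 (descending) and by RMSE (ascending).
rank_by_r2 = sorted(schemes, key=lambda s: schemes[s]["r2"], reverse=True)
rank_by_rmse = sorted(schemes, key=lambda s: schemes[s]["rmse"])

# Change in R2 when each additional covariate is introduced.
order = ["A", "B", "C", "D"]
delta_r2 = {
    f"{a}->{b}": round(schemes[b]["r2"] - schemes[a]["r2"], 3)
    for a, b in zip(order, order[1:])
}
```

Both rankings give C, D, B, A, and the step from C to D is the only one with a negative change in R².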

3.5.2. Internal Cross-Validation Results and Spatial Distribution of the Continuous AGB Raster for the Final Scheme

The point-level AGB prediction was generated for 89,579 quality-controlled GEDI footprints. During EBKRP fitting in ArcGIS Pro 3.5.2, 29 footprints did not meet the valid input requirements after covariate raster extraction and were therefore excluded from the internal diagnostic procedure. Consequently, EBKRP internal cross-validation diagnostics were calculated using 89,550 valid input samples and indicated stable local fitting behavior (Table 9). The Mean Error of 0.006 indicated negligible systematic bias. The RMSE of 14.7 Mg ha⁻¹ and ASE of 15.3 Mg ha⁻¹ were in close agreement, and the RMSSE of 0.9642 confirmed well-calibrated uncertainty estimates. Coverage rates of 91.3% and 95.8% within the 90% and 95% prediction intervals, respectively, along with a mean CRPS of 8.18, further indicated satisfactory probabilistic prediction performance. It should be noted that the EBKRP internal cross-validation RMSE of 14.74 Mg ha⁻¹ was used only as a supplementary diagnostic of interpolation stability within the GEDI-predicted point set. This value should not be interpreted as an independent measure of predictive accuracy against field-measured AGB.
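The internal diagnostics can be sketched from cross-validated predictions and their standard errors. This is an illustration under assumptions, not the software's implementation: ASE is taken here as the root of the mean squared standard error, and interval coverage assumes a normal predictive distribution (z = 1.96 for a nominal 95% interval). The input values are toy numbers.

```python
import math

def kriging_cv_diagnostics(obs, pred, se, z=1.96):
    """Summarise cross-validation errors against their predicted standard errors."""
    n = len(obs)
    err = [p - o for p, o in zip(pred, obs)]
    return {
        "mean_error": sum(err) / n,
        "rmse": math.sqrt(sum(e * e for e in err) / n),
        # Assumed definition: root of the mean squared standard error.
        "ase": math.sqrt(sum(s * s for s in se) / n),
        # Root mean square of errors scaled by their standard errors.
        "rmsse": math.sqrt(sum((e / s) ** 2 for e, s in zip(err, se)) / n),
        # Share of points whose error lies within +/- z standard errors.
        "coverage": sum(abs(e) <= z * s for e, s in zip(err, se)) / n,
    }

# Toy values in Mg/ha, for illustration only.
obs = [100.0, 120.0, 140.0, 160.0]
pred = [110.0, 118.0, 140.0, 150.0]
se = [10.0, 10.0, 10.0, 10.0]
diag = kriging_cv_diagnostics(obs, pred, se)
```

An RMSSE near 1 together with an RMSE close to the ASE suggests that the predicted standard errors are consistent with the observed errors; values below 1 are usually read as a slight overestimation of prediction variability.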

The final continuous AGB map revealed pronounced spatial heterogeneity across the study area (Figure 13). Low-to-medium AGB values accounted for a large proportion of the mapped area, while high-value regions were concentrated within localized forest patches, forming spatially clustered high-AGB zones. Considered alongside the external and internal validation diagnostics, these results indicate that the final continuous raster reasonably captured the broad spatial distribution pattern of forest AGB across the study area.

4. Discussion

4.1. Representativeness of Bridging Samples and the Limits of Model Generalization

The UAV–GEDI bridging mechanism constitutes the core step through which local high-precision reference information was transferred to the GEDI footprint scale. After rigorous quality control, 252 valid bridging samples were retained, spanning an AGB range of 69.93–232.14 Mg ha⁻¹ and covering a gradient from low to high biomass values. Nevertheless, the spatial sources of these samples were markedly imbalanced: 164 samples (65%) originated from the TYH plot group, while STZ and WZS contributed 62 and 26 samples, respectively, and no valid samples were obtained from the PUER group. This spatial concentration implies that model training was predominantly conditioned on forest characteristics represented by TYH, and that extending the trained model to all 89,579 GEDI footprints across the study area inevitably entailed extrapolation to a broader range of forest types and terrain conditions. This limitation fundamentally reflects the constrained spatial coverage of UAV data acquisition, whose considerable cost and restricted flight extent represent the primary bottleneck for sample spatial diversity within the current framework.

At the model evaluation stage, random 10-fold cross-validation was employed to assess footprint-level modeling performance. When training samples exhibit pronounced spatial clustering, however, random partitioning may preserve substantial spatial autocorrelation between training and validation sets, potentially yielding overly optimistic performance metrics. Ploton et al. demonstrated that standard random cross-validation ignoring spatial autocorrelation can substantially overestimate predictive performance in AGB mapping [47], and Wadoux et al. further argued that K-fold cross-validation tends to produce optimistic estimates under spatially clustered sampling conditions [48]. It should be acknowledged, however, that spatial cross-validation may yield overly conservative estimates when the sample distribution diverges substantially from the prediction domain [48]. Given the limited size and spatial concentration of the bridging samples, the random cross-validation results are more appropriately interpreted as an upper-bound estimate of modeling capability rather than a direct measure of study-area-wide generalization accuracy. Accordingly, external validation based on the 61 independent field plots (R² = 0.622, RMSE = 26.05 Mg ha⁻¹) was adopted as the primary evidence for regional mapping accuracy. Although these plots provide the most reliable available evidence under the current framework, their representativeness may still be influenced by spatial imbalance and plot–raster support-domain mismatch.

Among related studies employing UAV or airborne LiDAR as an intermediate bridging layer, Wang et al. demonstrated that UAV LiDAR coverage extent directly constrains the distributional breadth of bridging samples and is a critical determinant of regional mapping reliability [13], and Chen et al. similarly identified spatial representativeness of local observations as a key bottleneck in heterogeneous mountainous forests [37]. Relative to these studies, the present work additionally incorporated GEDI geolocation correction, footprint support-domain harmonization, and label quality control into the bridging mechanism, placing more systematic emphasis on footprint-level label reliability. Future work could improve sample spatial balance by expanding UAV coverage, incorporating multi-site or multi-temporal acquisitions, or integrating airborne LiDAR data.

4.2. Discussion of GEDI Footprint-Level AGB Modeling Results

Among the five candidate models, overall performance was ranked Stacking > RF > XGBoost > SVR > MLR. MLR performed the weakest (R² = 0.521), suggesting that the relationships between AGB and the current multi-source feature set are strongly nonlinear under the present data conditions, consistent with findings reported by Chen et al. in heterogeneous mountainous environments [37]. Tree-based ensemble methods RF (R² = 0.681) and XGBoost (R² = 0.627) demonstrated markedly stronger nonlinear fitting capability, broadly consistent with the general pattern in existing multi-source remote sensing AGB studies [37,49].

The Stacking ensemble model achieved the best overall performance (R² = 0.702, RMSE = 23.025 Mg ha⁻¹, Bias = 0.798 Mg ha⁻¹). Among the five candidate models, it showed the Bias value closest to zero and the smallest between-fold Bias variability (standard deviation of ±2.140 Mg ha⁻¹), indicating relatively stable systematic-error performance across the random cross-validation folds. Prior studies have validated the advantages of stacking ensemble learning in bias control and predictive stability for forest AGB estimation under multi-source remote sensing support [50], and the present results are broadly consistent with this understanding. SVR exhibited a pronounced systematic underestimation tendency (Bias = −5.200 Mg ha⁻¹), possibly related to the behavior of the ε-insensitive loss function in the high-value range; when training labels already carry upstream underestimation from the UAV-based modeling process, SVR may further compress predictions in this range, though this interpretation remains inferential and warrants verification in future work.

Regarding feature contributions, GEDI structural variables played the dominant role, with rh100_m and fhd_norm attaining importance values of 34.6% and 27.8%, respectively. rh100_m directly characterizes forest vertical structure and is closely associated with stand volume and biomass accumulation, with particularly strong predictive contributions in mountainous forests where stand height exhibits substantial spatial variation [51]. fhd_norm quantifies the diversity of canopy leaf area vertical distribution based on the Shannon diversity index, capturing the complexity of vegetation vertical stratification from ground to canopy top [52]. In structurally complex subtropical mixed forests, multilayered canopy organization is strongly associated with total stand biomass accumulation, and the high importance of fhd_norm indicates that vertical structural diversity provides meaningful explanatory power beyond canopy height alone—a capability representing one of the core structural observation advantages of GEDI relative to optical remote sensing.

Beyond GEDI structural variables, the stable retention of Sentinel-1 radar features and Landsat TCW variables confirms consistent complementary value from multi-source wall-to-wall features. The Sentinel-1 VH/VV polarization ratio is sensitive to canopy volume scattering and vertical structure, forming effective information-level complementarity with optical variables [53,54]. The Landsat TCW wetness component integrates near-infrared and shortwave infrared information and has demonstrated strong capacity for characterizing canopy moisture content and forest structural properties [55]. The resulting composite feature system—anchored by GEDI vertical structure and complemented by radar scattering and optical moisture-background information—jointly characterizes AGB spatial variation across three distinct sensing dimensions, further underscoring the advantages of multi-source remote sensing synergy in complex mountainous forest environments [18,42].

4.3. Accuracy Analysis of Regional Continuous Mapping and Sources of Error

Based on study-area-wide GEDI point-level predictions and representative wall-to-wall covariates, four EBKRP covariate schemes were compared, with external validation performance ranked C > D > B > A (R² = 0.622, 0.581, 0.534, and 0.417, respectively). By combining regression analysis with empirical Bayesian kriging, EBKRP can simultaneously constrain the prediction surface through wall-to-wall covariates and exploit the spatial autocorrelation structure of sample points, and in principle offers advantages over either approach applied alone [39,40]. The results confirm that covariate configuration substantially influenced mapping performance, consistent with prior findings [21].

Scheme A (Landsat8/9_TCW_p90 only) performed the weakest, confirming that a single optical moisture variable is insufficient to characterize AGB spatial differentiation across the study area. The progressive addition of S2_IRECI_p10 (Scheme B) and S1_VH_VV_Ratio_p10 (Scheme C) yielded consistent accuracy improvements. As a red-edge chlorophyll index, S2_IRECI is sensitive to canopy physiological status and reduces signal saturation under high-biomass conditions [56,57], while the S1 VH/VV ratio introduces radar scattering-structure information complementary to optical variables [53,54]. Under the joint constraint of these three variable types, the EBKRP prediction surface characterized AGB spatial distribution from optical moisture background, canopy physiological vigor, and radar scattering perspectives, likely representing one of the main reasons why Scheme C achieved the best external validation performance. The external validation accuracy of Scheme C (R² = 0.622, RMSE = 26.05 Mg ha⁻¹) is comparable to prior studies: Chen et al. reported R² = 0.62–0.72 using GEDI and multi-sensor fusion in heterogeneous mountainous regions [37], and Huang et al. reported R² = 0.60–0.65 across multiple forest types in Yunnan Province [38], indicating that the final mapping accuracy was broadly within the range reported by related AGB mapping studies in complex forest environments. However, this comparison does not isolate the marginal contribution of the UAV–GEDI bridging step, because previous studies differ in reference data sources, validation designs, study areas, and modeling strategies. Quantifying the independent contribution of UAV bridging would require a controlled ablation experiment comparing the full workflow with a baseline model trained using coarser or alternative footprint-level reference labels.

The addition of Landsat8/9_TCW_p50 in Scheme D led to a decline in external validation accuracy (R² = 0.581). Given that TCW_p50 and TCW_p90 share a high degree of overlapping information content as statistics derived from the same spectral transformation, the additional variable contributed limited independent spatial constraint information and may have increased model fitting instability. This outcome illustrates that when a newly introduced covariate is highly redundant with existing variables, expanding the covariate set does not necessarily improve, and may instead impair, continuous mapping performance, underscoring the importance of prioritizing information complementarity over variable quantity in covariate selection.

Across the entire methodological chain, errors accumulated at each stage: the UAV-based reference product had a fitting RMSE of 21.29 Mg ha⁻¹, the mean fold RMSE of footprint-level modeling was approximately 23.025 Mg ha⁻¹, and the external validation RMSE of the final continuous mapping was 26.05 Mg ha⁻¹. Although these metrics correspond to different targets and evaluation scenarios, their overall increasing trend reflects the cumulative propagation of local modeling error, bridging-label transfer error, point-level model generalization error, and spatial interpolation error across the multilevel workflow. The underestimation tendency in high-value regions during UAV-based modeling may have further propagated into high-biomass areas through bridging-label transfer, representing a plausible contributing factor to the negative Bias observed in certain models.

It should be noted that only the final stage of the pipeline (EBKRP regional mapping) is validated against fully independent data. The intermediate stages—UAV-based reference construction, bridging sample labeling, and footprint-level modeling—are evaluated using fitting or internal CV metrics only. Each stage contributes its own prediction uncertainty to the downstream pipeline, and the cumulative effect of these errors is only partially captured by the final external validation.

4.4. Limitations and Future Perspectives

Several limitations in the methodological framework and result interpretation warrant acknowledgment.

First, regarding temporal consistency, the field survey was conducted in December 2020, GEDI data spanned the full year of 2020, and remote sensing imagery was composited over the growing season from May to October 2020. These differing acquisition windows introduced temporal mismatches that may have generated seasonal inconsistencies between plot-based AGB measurements and remote sensing observations, particularly for optical imagery subject to strong seasonal variation.

Second, at the level of GEDI geolocation correction, footprints outside the UAV-covered area could not be directly corrected through waveform matching and were instead uniformly adjusted using the mean orbit-level offset. Under complex terrain conditions, the spatial variability of geolocation error may be substantial, and the associated uncertainty cannot be entirely eliminated. The uncertainty discussed here primarily stems from the limitations of the uniform orbit-offset assumption outside the UAV coverage zone, rather than from a fundamental failure of the correction procedure itself.

Third, a scale mismatch persisted between field plots and the continuous mapping raster, and both support-domain discrepancies and geolocation errors may have affected the representativeness of validation outcomes. The spatial distribution and AGB gradient coverage of the 61 independent validation plots may further limit the generalizability of the external validation results, which are more appropriately interpreted as a basis for relative comparison among mapping schemes than as a direct characterization of the absolute accuracy of the final regional product.

The gap between the EBKRP internal cross-validation RMSE (14.7; Table 9) and the external validation RMSE (26.05 Mg ha⁻¹; Table 8, Scheme C) indicates that internal kriging diagnostics may be overly optimistic relative to independent field-plot validation. Therefore, the final map accuracy in this study should be interpreted primarily based on the 61 independent validation plots.

Looking ahead, several directions merit further investigation. Improving the spatial representativeness of bridging samples (through expanded UAV coverage, multi-site acquisitions, or integration of airborne LiDAR) would strengthen the regional generalization capacity of the modeling framework. Adopting spatial cross-validation strategies that explicitly account for spatial dependence among samples could yield more conservative and realistic accuracy estimates. Alternative spatial extrapolation approaches such as geographically weighted regression and random forest-based spatial prediction could be systematically benchmarked against EBKRP. More broadly, evaluating the transferability of the proposed methodological chain across diverse mountainous forest types and geographic regions remains an important step toward providing a generally applicable reference for regional forest carbon stock monitoring.
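One form of spatial cross-validation is to hold out one UAV flight zone at a time. The sketch below is a hedged illustration on synthetic data: the zone labels reuse the four flight-zone names from Figure 1, but the zone assignment, the single height predictor, and the simple linear model are assumptions made for the example and do not reproduce the study's models.

```python
import random
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def leave_one_zone_out(x, y, zones):
    """Hold out each zone in turn; return {zone: RMSE on the held-out zone}."""
    rmse_by_zone = {}
    for held in sorted(set(zones)):
        train = [i for i, z in enumerate(zones) if z != held]
        test = [i for i, z in enumerate(zones) if z == held]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        sq_err = [(slope * x[i] + intercept - y[i]) ** 2 for i in test]
        rmse_by_zone[held] = sqrt(sum(sq_err) / len(sq_err))
    return rmse_by_zone


# Synthetic samples: a canopy-height predictor and a noisy AGB response.
random.seed(0)
zone_names = ["PUER", "STZ", "WZS", "TYH"]
zones = [zone_names[i % 4] for i in range(120)]
height = [random.uniform(5, 30) for _ in zones]
agb = [6.0 * h + random.gauss(0, 20) for h in height]

rmse_by_zone = leave_one_zone_out(height, agb, zones)
for zone, value in rmse_by_zone.items():
    print(f"{zone}: held-out RMSE = {value:.1f}")
```

Unlike random k-fold splitting, no sample from the held-out zone contributes to model fitting, so spatially clustered samples cannot inflate the accuracy estimate.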

5. Conclusions

This study developed a hierarchical forest AGB estimation framework integrating local high-precision reference construction, UAV–GEDI bridging, footprint-level modeling, and regional continuous extrapolation, with application to the mountainous forests of Simao District, Pu’er City, Yunnan Province, China.

The UAV–GEDI bridging framework provided a feasible pathway for transferring local reference information to the GEDI footprint scale. Through geolocation correction, footprint support-domain harmonization, and label quality control, 252 valid bridging samples were obtained. In representative complex and simple canopy cases, waveform-matching correlation coefficients increased from 0.399 to 0.777 and from 0.714 to 0.946, respectively, indicating that geolocation correction substantially improved spatial matching quality and provided a relatively reliable sample foundation for subsequent modeling.
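The waveform-matching idea can be sketched as a search over candidate horizontal offsets that maximizes the Pearson correlation between the observed GEDI waveform and a waveform simulated at the shifted position. In this illustration `toy_simulator` is a hypothetical stand-in for a simulator driven by the UAV point cloud, and the search grid and waveform shapes are invented for the example.

```python
from math import exp, sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def best_offset(observed, simulate, candidates):
    """Return (r, (dx, dy)) for the candidate offset with the highest correlation."""
    scored = [(pearson(observed, simulate(dx, dy)), (dx, dy)) for dx, dy in candidates]
    return max(scored)


def toy_simulator(dx, dy, n_bins=60):
    """Hypothetical simulator: canopy peak position and strength vary with the offset."""
    canopy_bin = 20 + 0.2 * dx
    canopy_amp = 1.0 + 0.02 * dy
    return [
        canopy_amp * exp(-0.5 * ((i - canopy_bin) / 4.0) ** 2)  # canopy return
        + exp(-0.5 * ((i - 45) / 2.0) ** 2)                     # ground return
        for i in range(n_bins)
    ]


# Pretend the true footprint centre lies 10 m east of the reported position.
observed = toy_simulator(10, 0)
grid = [(dx, dy) for dx in range(-20, 21, 5) for dy in range(-20, 21, 5)]

r_before = pearson(observed, toy_simulator(0, 0))
r_after, (dx_best, dy_best) = best_offset(observed, toy_simulator, grid)
print(f"r before = {r_before:.3f}, r after = {r_after:.3f}, offset = ({dx_best}, {dy_best})")
```

The increase from `r_before` to `r_after` plays the same role as the correlation gains reported above for the complex and simple canopy cases.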

At the GEDI footprint level, the Stacking model showed the best overall internal cross-validation performance among the five candidate models, with a pooled out-of-fold R² of 0.736 and RMSE of 24.15 Mg ha⁻¹. However, this result should be interpreted as an internal model-comparison metric rather than as an independent measure of regional generalization accuracy. Future work should incorporate spatially stratified or leave-one-zone-out validation based on a larger and more balanced UAV–GEDI bridging sample set.

For the final continuous AGB map, independent validation using 61 reserved field plots yielded R² = 0.622 and RMSE = 26.05 Mg ha⁻¹, which was used as the primary accuracy benchmark. The covariate scheme combining Landsat TCW, Sentinel-2 IRECI, and the Sentinel-1 polarization ratio achieved the best external validation performance, jointly constraining the EBKRP prediction surface from optical moisture-background, canopy physiological, and radar scattering perspectives. The decline in accuracy observed when a redundant covariate was introduced further underscores the importance of information complementarity over variable quantity in covariate selection.

Overall, the proposed framework provides a systematic methodological reference for regional AGB mapping and related carbon stock assessment in complex mountainous forest environments supported by GEDI spaceborne LiDAR.

Author Contributions

Conceptualization, H.Y. and W.Z.; methodology, H.Y.; software, H.Y.; validation, H.Y., W.D. and J.H.; formal analysis, H.Y.; investigation, H.Y. and J.H.; resources, W.Z. and Y.J.; data curation, H.Y. and J.H.; writing—original draft preparation, H.Y.; writing—review and editing, W.D., W.Z. and Y.J.; visualization, H.Y.; supervision, W.Z. and Y.J.; project administration, W.Z. and Y.J.; funding acquisition, W.Z. and Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant numbers 32371869, 32471865, and 32260720) and the Yunnan Fundamental Research Projects (grant number 202401BB070001-021). Additional support was provided by the Key Laboratory for Forest Resources Conservation and Utilization in the Southwest Mountains of China, Ministry of Education, and the Key Laboratory for Conservation and Utilization of In-forest Resource of Yunnan (grant number LXXK-2025D19).

Data Availability Statement

The GEDI L1B, L2A, and L2B data used in this study are publicly available through the NASA Earthdata platform at https://earthdata.nasa.gov (accessed on 28 May 2026). Sentinel-1 and Sentinel-2 data are freely accessible via the Copernicus Data Space Ecosystem at https://dataspace.copernicus.eu (accessed on 28 May 2026). Landsat 8/9 data are publicly available through the United States Geological Survey (USGS) EarthExplorer at https://earthexplorer.usgs.gov (accessed on 28 May 2026). All of the above datasets were also accessed and processed via Google Earth Engine (GEE) at https://earthengine.google.com (accessed on 28 May 2026). The UAV LiDAR data and field plot survey data used in this study are not publicly available due to data acquisition restrictions, but may be made available upon reasonable request to the corresponding author.

Acknowledgments

The authors sincerely thank the NASA Earthdata platform for providing GEDI L1B, L2A, and L2B data, whose footprint-scale observations of forest vertical structure formed the foundation of the proposed framework. The authors also gratefully acknowledge Google Earth Engine (GEE) for providing cloud-based access to multi-source remote sensing imagery and the computational infrastructure that enabled efficient large-area feature extraction. The European Space Agency (ESA) and the United States Geological Survey (USGS) are acknowledged for making Sentinel-1, Sentinel-2, and Landsat 8/9 data freely available, which were essential to the regional continuous AGB mapping component of this study. The authors sincerely thank the field data collection and UAV-LiDAR data acquisition teams at Southwest Forestry University for their support in data collection. During the preparation and revision of this manuscript, the authors used AI-assisted language tools to improve English readability, grammar, sentence structure, and clarity of expression. These tools were not used to generate research data, conduct statistical analyses, produce results, create figures or tables, interpret findings, or draw scientific conclusions. All AI-assisted outputs were carefully reviewed, edited, and verified by the authors. The authors take full responsibility for the accuracy, originality, and integrity of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AGB	Aboveground Biomass
UAV	Unmanned Aerial Vehicle
GEDI	Global Ecosystem Dynamics Investigation
LiDAR	Light Detection and Ranging
SAR	Synthetic Aperture Radar
EBKRP	Empirical Bayesian Kriging Regression Prediction
RF	Random Forest
RFECV	Recursive Feature Elimination with Cross-Validation
MLR	Multiple Linear Regression
SVR	Support Vector Regression
XGBoost	Extreme Gradient Boosting
DBH	Diameter at Breast Height
NDVI	Normalized Difference Vegetation Index
EVI	Enhanced Vegetation Index
IRECI	Inverted Red-edge Chlorophyll Index
NBR	Normalized Burn Ratio
TCW	Tasseled Cap Wetness
TCB	Tasseled Cap Brightness
TCG	Tasseled Cap Greenness
LST	Land Surface Temperature
GEE	Google Earth Engine
RMSE	Root Mean Square Error
MAE	Mean Absolute Error
CV	Coefficient of Variation
ASE	Average Standard Error
RMSSE	Root Mean Square Standardized Error
CRPS	Continuous Ranked Probability Score

Figure 1. Overview of the study area. Panels (a–d) correspond to the four UAV flight zones: (a) PUER, (b) STZ, (c) WZS, and (d) TYH.

Figure 2. Distribution of field plots.

Figure 3. Technical workflow.

Figure 4. Correlation coefficient matrix between UAV LiDAR point cloud features and plot-level AGB.

Figure 5. Fitted scatterplot and residual distribution of the linear regression model for overlapping plots.

Figure 6. Comparative analysis of GEDI waveform geolocation correction.

Figure 7. Distribution of bridging samples across plot groups.

Figure 8. Random forest feature importance ranking and RFECV feature number optimization results. (a) Random forest feature importance; (b) Variation in RFECV cross-validated RMSE with feature number.

Figure 9. Selection frequency of candidate features based on the RFECV process.

Figure 10. Accuracy of the five models under 10-fold cross-validation.

Figure 11. Comparison between predicted and observed values for the five models (10-fold cross-validation).

Figure 12. Scatterplot-based validation against independent field plots for the four schemes.

Figure 13. Remote sensing image and spatial distribution of forest aboveground biomass (AGB) in the study area. (a) High-resolution remote sensing image of the study area, with the red line indicating the study area boundary; (b) Spatial distribution map of forest AGB in the study area derived from inversion.

Table 1. Forest AGB models of various tree species in the study area.

Tree Species	Aboveground Biomass Model
Pinus kesiya var. langbianensis (A.Chev.) Gaussen ex Bui	$M = 0.0582\,\mathrm{DBH}^{2.1203} H^{0.4668}$
Schima superba Gardner & Champ.	$M = 0.12045\,\mathrm{DBH}^{2.06446} H^{0.38265}$
Eucalyptus robusta Sm.	$\lg M = 0.814 \lg(\mathrm{DBH}^{2} H) - 0.9816$
Cupressus funebris Endl.	$M = 0.010158\,\mathrm{DBH}^{2.94424} H^{0.41591}$
Cunninghamia lanceolata (Lamb.) Hook	$M = 0.10301\,(\mathrm{DBH}^{2} H)^{0.7773}$
Quercus × leana Nutt.	$M = 0.22999\,\mathrm{DBH}^{1.39183} H^{0.57393}$
Castanopsis fargesii Franch.	$M = 0.1355\,(\mathrm{DBH}^{2} H)^{0.817} + 0.0275\,(\mathrm{DBH}^{2} H)^{0.8165}$
Various kinds of birch in southwest China	$M = 0.08907\,\mathrm{DBH}^{1.89807} H^{0.52019}$
Hard broadleaved	$M = 0.3507\,(\mathrm{DBH} - 1.1948)^{2} + (0.03017\,\mathrm{DBH}^{2.3643} + 0.051) + (0.01813\,\mathrm{DBH}^{2} - 0.2477)$
Soft broadleaved	$M = 0.02739\,(\mathrm{DBH}^{2} H)^{0.898869} + 0.01497\,(\mathrm{DBH}^{2} H)^{0.875639} + 0.01059\,(\mathrm{DBH}^{2} H)^{0.813953} + 0.0121\,(\mathrm{DBH}^{2} H)^{0.854295}$

Note: DBH represents diameter at breast height, H represents tree height, and M represents forest aboveground biomass (AGB).
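As a worked illustration of how the equations in Table 1 are applied, the sketch below evaluates the Pinus kesiya var. langbianensis and Eucalyptus robusta models for individual trees and aggregates tree-level values to a plot-level density. The units (DBH in cm, H in m, M in kg per tree), the plot area, and the tree measurements are assumptions made for the example; they are not stated in the table.

```python
from math import log10


def agb_pinus_kesiya(dbh_cm, height_m):
    """Table 1 model: M = 0.0582 * DBH^2.1203 * H^0.4668 (units assumed: cm, m, kg)."""
    return 0.0582 * dbh_cm ** 2.1203 * height_m ** 0.4668


def agb_eucalyptus(dbh_cm, height_m):
    """Table 1 model: lg M = 0.814 * lg(DBH^2 * H) - 0.9816, solved for M."""
    return 10 ** (0.814 * log10(dbh_cm ** 2 * height_m) - 0.9816)


def plot_agb_mg_ha(tree_agb_kg, plot_area_ha):
    """Sum tree-level AGB (kg) and convert to Mg per hectare."""
    return sum(tree_agb_kg) / 1000.0 / plot_area_ha


# Hypothetical (DBH, H) measurements for three pines in a 0.06 ha plot.
trees = [(18.5, 14.2), (22.0, 16.8), (30.4, 21.5)]
tree_agb = [agb_pinus_kesiya(d, h) for d, h in trees]
plot_agb = plot_agb_mg_ha(tree_agb, plot_area_ha=0.06)
print(f"plot AGB = {plot_agb:.1f} Mg/ha")
```

In a mixed-species plot, each tree would be routed to the equation for its species or species group before the values are summed.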

Table 2. Main GEDI L2B variables used in this study.

Variable Name	Description
RH100	Relative height at the 100th percentile, used to characterize maximum canopy height
cover	Canopy cover
pai	Plant area index
fhd_norm	Foliage height diversity
sensitivity	Waveform sensitivity, used for quality screening

Note: The variables ultimately used for modeling were determined based on the data retained after quality control.

Table 3. Multi-source wall-to-wall covariates used in this study.

Data Source	Variable Group	Representative Variables/Statistics	Main Purpose
Sentinel-1	Radar backscatter	VH and VV backscatter (mainly p10, p50, p90)	Characterize canopy scattering intensity and structural variation
Sentinel-1	Polarimetric derivatives	VH/VV ratio and VH–VV difference	Capture polarization contrast related to canopy structure and moisture conditions
Sentinel-1	Geometry and texture	Incidence angle and GLCM texture metrics derived from VH	Describe observation geometry and spatial heterogeneity of radar response
Sentinel-2	Surface reflectance bands	Visible, red-edge, near-infrared, and shortwave infrared bands (mainly p50)	Represent canopy spectral properties and vegetation condition
Sentinel-2	Vegetation and red-edge indices	NDVI, EVI, and IRECI (p10, p50, p90)	Characterize vegetation vigor, canopy status, and chlorophyll-related information
Landsat 8/9	Thermal and tasseled cap variables	LST, TCW, TCB, and TCG	Represent thermal conditions, moisture gradients, and background environmental variation
Landsat 8/9	Vegetation-related index	NBR (mainly p50)	Supplement spectral and disturbance-related background information
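The percentile statistics in Table 3 (p10, p50, p90) summarize a time series of per-date values for each pixel. The sketch below illustrates this for IRECI at a single pixel, using the commonly cited Sentinel-2 formulation (B7 − B4) / (B5 / B6). Whether the study used exactly this band combination and a linear-interpolation percentile is an assumption, and the reflectance values are invented.

```python
def ireci(b4_red, b5_re1, b6_re2, b7_re3):
    """Inverted Red-Edge Chlorophyll Index, commonly given as (B7 - B4) / (B5 / B6)."""
    return (b7_re3 - b4_red) / (b5_re1 / b6_re2)


def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


# Made-up growing-season reflectances for one pixel: (B4, B5, B6, B7) per date.
series = [
    (0.030, 0.080, 0.250, 0.350),
    (0.028, 0.075, 0.270, 0.380),
    (0.035, 0.090, 0.230, 0.320),
    (0.026, 0.070, 0.290, 0.400),
    (0.032, 0.085, 0.240, 0.340),
]
ireci_series = [ireci(*obs) for obs in series]
composite = {f"p{q}": percentile(ireci_series, q) for q in (10, 50, 90)}
print(composite)
```

A low percentile such as p10 emphasizes the lower end of the seasonal range, whereas p90 approaches the seasonal peak, so the two can carry different information about the same pixel.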

Table 4. Point cloud feature variables used in this study.

Feature	Description
aad_z	Average absolute deviation of height
canopy_relief_ratio	Canopy relief ratio
AIH_25th, 50th, 75th, 90th, 95th	Cumulative height contributed by point clouds at the 25%, 50%, 75%, 90%, and 95% accumulated height levels
sqrt_mean_sq	Root mean square of point cloud height
curt_mean_cube	Cube root of the mean cubed point cloud height
kurtosis	Kurtosis of point cloud height
skewness	Skewness of point cloud height
mean	Mean height of point clouds within the plot/grid
max	Maximum height of point clouds within the plot/grid
min	Minimum height of point clouds within the plot/grid
percentile_25th, 50th, 75th, 90th, 95th	Height percentiles of the point cloud vertical distribution at 25%, 50%, 75%, 90%, and 95%
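Several of the features in Table 4 can be computed directly from the normalized point heights within a plot or grid cell. The sketch below uses common textbook definitions, for example canopy relief ratio as (mean − min) / (max − min). These definitions are assumptions and may differ in detail from the software used in the study; the AIH, skewness, and kurtosis features are omitted for brevity.

```python
from math import sqrt


def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def height_metrics(z):
    """Height metrics for non-negative, ground-normalized point heights z (metres)."""
    n = len(z)
    mean_z = sum(z) / n
    z_min, z_max = min(z), max(z)
    metrics = {
        "mean": mean_z,
        "max": z_max,
        "min": z_min,
        "aad_z": sum(abs(v - mean_z) for v in z) / n,
        "canopy_relief_ratio": (mean_z - z_min) / (z_max - z_min),
        "sqrt_mean_sq": sqrt(sum(v ** 2 for v in z) / n),
        "curt_mean_cube": (sum(v ** 3 for v in z) / n) ** (1.0 / 3.0),
    }
    for q in (25, 50, 75, 90, 95):
        metrics[f"percentile_{q}th"] = percentile(z, q)
    return metrics


# Tiny illustrative sample of point heights.
m = height_metrics([2.0, 4.0, 6.0, 8.0, 10.0])
print(m["percentile_95th"], m["canopy_relief_ratio"])
```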

Table 5. Leave-one-out cross-validation (LOOCV) results of the linear regression model for overlapping plots.

Model Type	Predictor	Sample Size	LOOCV R²	LOOCV RMSE (Mg ha⁻¹)	LOOCV MAE (Mg ha⁻¹)	LOOCV Bias (Mg ha⁻¹)
Univariate linear regression	percentile_95th	35	0.632	24.02	19.68	−0.63
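The LOOCV statistics in Table 5 can be reproduced in principle with the procedure sketched below: each of the 35 plots is left out once, a univariate linear model is fitted to the remaining plots, and the metrics are computed from the pooled held-out predictions. The data here are synthetic, and the definitions of bias (predicted minus observed) and R² (1 − SSE/SST) are assumptions.

```python
import random
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def loocv(x, y):
    """Predict each sample from a model fitted to all other samples."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


# Synthetic stand-in for 35 overlapping plots (95th height percentile vs. AGB).
random.seed(1)
p95 = [random.uniform(8, 28) for _ in range(35)]
agb = [7.0 * h + random.gauss(0, 24) for h in p95]

pred = loocv(p95, agb)
err = [p - o for p, o in zip(pred, agb)]
n = len(err)
rmse = sqrt(sum(e * e for e in err) / n)
mae = sum(abs(e) for e in err) / n
bias = sum(err) / n
mean_obs = sum(agb) / n
r2 = 1 - sum(e * e for e in err) / sum((o - mean_obs) ** 2 for o in agb)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.2f}, MAE = {mae:.2f}, bias = {bias:.2f}")
```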

Table 6. Statistics of the bridging sample screening procedure.

Step	Number of Samples	Number Removed Relative to Previous Stage	Retention Rate Relative to UAV-Overlapping Footprints (%)
Within valid UAV coverage	331	-	100.00
overlap ≥ 0.5	299	32	90.33
valid_ratio ≥ 0.5 and n_pixels ≥ 25	252	47	76.13
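The sequential screening in Table 6 amounts to two filters applied in order. The sketch below applies the thresholds from the table to a few invented footprint records and then recomputes the retention rates from the reported counts; the record layout and the `id` field are hypothetical.

```python
def screen(footprints, min_overlap=0.5, min_valid_ratio=0.5, min_pixels=25):
    """Apply the overlap filter, then the valid-ratio and pixel-count filter."""
    after_overlap = [f for f in footprints if f["overlap"] >= min_overlap]
    kept = [
        f for f in after_overlap
        if f["valid_ratio"] >= min_valid_ratio and f["n_pixels"] >= min_pixels
    ]
    return after_overlap, kept


def retention(counts):
    """Retention rate (%) of each stage relative to the first stage."""
    base = counts[0]
    return [round(100.0 * c / base, 2) for c in counts]


# Invented footprint records for illustration.
demo = [
    {"id": 1, "overlap": 0.92, "valid_ratio": 0.88, "n_pixels": 41},
    {"id": 2, "overlap": 0.35, "valid_ratio": 0.95, "n_pixels": 44},
    {"id": 3, "overlap": 0.71, "valid_ratio": 0.42, "n_pixels": 30},
    {"id": 4, "overlap": 0.66, "valid_ratio": 0.75, "n_pixels": 18},
]
after_overlap, kept = screen(demo)

# Counts reported in Table 6: 331 -> 299 -> 252 footprints.
table6_rates = retention([331, 299, 252])
print(table6_rates)
```

The recomputed rates agree with the 90.33% and 76.13% listed in the table.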

Table 7. 10-fold CV model performance (mean ± std).

Model	R²	RMSE (Mg ha⁻¹)	MAE (Mg ha⁻¹)	Bias (Mg ha⁻¹)
MLR	0.521 ± 0.198	30.755 ± 6.800	23.726 ± 3.437	3.812 ± 5.661
SVR	0.558 ± 0.145	29.637 ± 2.425	23.959 ± 1.143	−5.200 ± 4.386
XGBoost	0.627 ± 0.158	27.245 ± 6.238	21.179 ± 3.743	1.504 ± 6.047
RF	0.681 ± 0.115	25.298 ± 4.857	20.445 ± 4.159	−2.128 ± 5.306
Stacking	0.702 ± 0.217	23.025 ± 5.127	17.152 ± 3.289	0.798 ± 2.140
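Table 7 reports the mean and standard deviation of per-fold metrics, whereas the pooled out-of-fold metrics quoted in the Conclusions are computed once over all held-out predictions, so the two sets of numbers need not coincide. The sketch below illustrates both calculations for a stacking ensemble built with scikit-learn. The base learners, the Ridge meta-learner, the use of GradientBoostingRegressor as a stand-in for XGBoost, the hyperparameters, and the synthetic data are all assumptions; the study's actual stacking configuration is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for 252 bridging samples with five predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(252, 5))
y = 120 + 30 * X[:, 0] + 15 * np.tanh(X[:, 1]) + rng.normal(scale=20, size=252)

# Hypothetical stack: four base learners combined by a Ridge meta-learner.
stack = StackingRegressor(
    estimators=[
        ("mlr", LinearRegression()),
        ("svr", make_pipeline(StandardScaler(), SVR(C=100.0))),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gbr", GradientBoostingRegressor(random_state=0)),  # stand-in for XGBoost
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=3,  # inner folds used to build the meta-learner's training features
)

oof = np.empty_like(y)  # pooled out-of-fold predictions
fold_r2, fold_rmse = [], []
for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    stack.fit(X[train], y[train])
    oof[test] = stack.predict(X[test])
    fold_r2.append(r2_score(y[test], oof[test]))
    fold_rmse.append(mean_squared_error(y[test], oof[test]) ** 0.5)

print(f"fold-mean R2 = {np.mean(fold_r2):.3f} +/- {np.std(fold_r2):.3f}")
print(f"pooled OOF R2 = {r2_score(y, oof):.3f}")
print(f"pooled OOF RMSE = {mean_squared_error(y, oof) ** 0.5:.2f}")
```

With roughly 25 samples per fold, per-fold R² is sensitive to the AGB range that happens to fall in each fold, which is one reason a fold-mean R² can carry a large standard deviation while the pooled value is more stable.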

Table 8. Independent field-plot validation results for different covariate schemes.

Scheme	Explanatory Rasters	RMSE (Mg ha⁻¹)	MAE (Mg ha⁻¹)	Bias (Mg ha⁻¹)	R²
A	Landsat89_TCW_p90	32.35	24.66	8.50	0.417
B	Landsat89_TCW_p90 + S2_IRECI_p10	28.92	23.28	−7.20	0.534
C	Landsat89_TCW_p90 + S2_IRECI_p10 + S1_VH_VV_Ratio_p10	26.05	20.29	−3.50	0.622
D	Landsat89_TCW_p90 + S2_IRECI_p10 + S1_VH_VV_Ratio_p10 + Landsat89_TCW_p50	27.42	21.14	4.10	0.581
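The four statistics in Table 8 can be computed from paired map predictions and field-plot observations as sketched below. The sign convention for bias (predicted minus observed) and the R² definition (1 − SSE/SST rather than a squared correlation) are assumptions, and the three value pairs are invented.

```python
from math import sqrt


def validation_metrics(observed, predicted):
    """RMSE, MAE, bias (predicted minus observed), and R2 = 1 - SSE/SST."""
    n = len(observed)
    err = [p - o for p, o in zip(predicted, observed)]
    mean_obs = sum(observed) / n
    sse = sum(e * e for e in err)
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "rmse": sqrt(sse / n),
        "mae": sum(abs(e) for e in err) / n,
        "bias": sum(err) / n,
        "r2": 1.0 - sse / sst,
    }


# Invented plot-level AGB values (Mg/ha) and map predictions at the same plots.
obs = [100.0, 150.0, 200.0]
pred = [110.0, 140.0, 210.0]
metrics = validation_metrics(obs, pred)
print(metrics)
```

Under this convention a positive bias, as for Schemes A and D, indicates overestimation on average, and a negative bias, as for Schemes B and C, indicates underestimation.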

Table 9. Internal cross-validation statistics of EBKRP for the final scheme.

Metric	Value
Count	89,550
Mean CRPS	8.1766
Within 90% interval (%)	91.3
Within 95% interval (%)	95.8
Mean Error	0.006
RMSE	14.7
Mean Standardized Error	−0.000
RMSSE	0.964
ASE	15.3
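The diagnostics in Table 9 relate prediction errors to the prediction standard errors returned by the kriging model. The sketch below shows one common set of definitions: RMSSE is the root mean square of errors divided by their standard errors, ASE is taken here as the root mean of the squared standard errors, and interval coverage assumes Gaussian prediction intervals. These definitions are assumptions for illustration and may differ from those of the software used in the study; CRPS is omitted.

```python
from math import sqrt


def kriging_diagnostics(observed, predicted, std_error, z90=1.645):
    """Cross-validation diagnostics from predictions and their standard errors."""
    n = len(observed)
    err = [p - o for p, o in zip(predicted, observed)]
    standardized = [e / s for e, s in zip(err, std_error)]
    return {
        "mean_error": sum(err) / n,
        "rmse": sqrt(sum(e * e for e in err) / n),
        "mean_standardized_error": sum(standardized) / n,
        "rmsse": sqrt(sum(v * v for v in standardized) / n),
        "ase": sqrt(sum(s * s for s in std_error) / n),
        # Share of observations inside a Gaussian 90% prediction interval.
        "within_90": 100.0 * sum(abs(v) <= z90 for v in standardized) / n,
    }


# Invented observations, predictions, and prediction standard errors.
d = kriging_diagnostics(
    observed=[10.0, 20.0, 30.0, 40.0],
    predicted=[12.0, 18.0, 33.0, 39.0],
    std_error=[2.0, 2.0, 3.0, 1.0],
)
print(d)
```

An RMSSE close to 1, together with an ASE close to the RMSE, suggests that the predicted standard errors are of about the right size. Such agreement concerns the internal consistency of the kriging model and does not by itself establish accuracy against independent field plots.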

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yang, H.; Dong, W.; Zhang, W.; Hu, J.; Ji, Y. Geolocation-Corrected UAV–GEDI Bridging Samples and Stacking Ensemble Models for Regional AGB Mapping in Subtropical Mountainous Forests of Simao District, Yunnan. Remote Sens. 2026, 18, 1796. https://doi.org/10.3390/rs18111796
