Article

Advancing Forest Inventory and Fuel Monitoring with Multi-Sensor Hybrid Models: A Comparative Framework for Basal Area Estimation

by Nasrin Salehnia 1, Peter Wolter 1,*, Brian R. Sturtevant 2 and Dalia Abbas Iossifov 3
1 Department of Natural Resource Ecology and Management, Iowa State University, Ames, IA 50011, USA
2 Northern Research Station, USDA Forest Service, Rhinelander, WI 54501, USA
3 USDA Forest Service, Forest Products Laboratory, Washington, DC 20250, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(6), 852; https://doi.org/10.3390/rs18060852
Submission received: 8 January 2026 / Revised: 18 February 2026 / Accepted: 27 February 2026 / Published: 10 March 2026

Highlights

This study evaluates how feature selection strategy affects the prediction of total and species-level forest basal area (BA) when fusing multi-season multispectral imagery (Sentinel-2 and Landsat-9) with LiDAR structural metrics. Using an identical standardized library of 175 predictors and 141 field plots from the Kawishiwi Ranger District in the Superior National Forest (Minnesota, USA), we benchmark four subset-selection pipelines (xPLS, GA-xPLS, RF-xPLS, and SVR-xPLS) and quantify model uncertainty with bootstrap confidence intervals.
What are the main findings?
  • RF-xPLS achieved the strongest pooled model performance for total and species-level basal area (BA), with well-behaved residuals and tight bootstrap confidence intervals.
  • A parsimonious 27-predictor subset preserved most of the skill, dominated by SWIR/NDII (canopy water/dry-matter), red-edge/NIR greenness features, and a single LiDAR structure metric (HQUAD).
What are the implications of the main findings?
  • Hybrid selection can reduce a high-dimensional, collinear multi-sensor feature space to a compact, interpretable set without sacrificing accuracy—supporting operational, wall-to-wall BA mapping.
  • The selected predictors indicate which inputs are most informative and where additional field plots (especially in very high-BA stands) would most effectively improve generalization and uncertainty.

Abstract

Fire suppression in the upper U.S. Midwest has led to the expansion of flammable coniferous ladder fuels, necessitating precise tracking of conifer species basal area (BA) for fire risk management. This study benchmarks four subset-selection pipelines (xPLS, GA-xPLS, RF-xPLS, and SVR-xPLS) to optimize the fusion of high-dimensional, collinear data from Sentinel-2, Landsat-9, and LiDAR sensors. Using 141 field plots in Minnesota’s Kawishiwi Ranger District of the Superior National Forest, we evaluated 175 predictors against eight BA response variables. Results show that RF-xPLS provided the best accuracy–parsimony trade-off, achieving the highest pooled R2 (≈0.86) and lowest error with a compact 27-predictor block. GA-xPLS ranked second, excelling for specific species such as Pinus resinosa. The most effective predictors combined SWIR-based moisture indices, red-edge/NIR structure, and a single LiDAR-derived measure of vertical structure (quadratic mean height). Our findings demonstrate that integrating machine learning selection engines with multi-sensor fusion substantially enhances the scalability and precision of forest inventory and fuels monitoring. This comparative framework offers practical insights for sustainable management and fire risk mitigation in northern temperate–boreal forests.

1. Introduction

Fire has long acted as an agent of change in the Superior National Forest (SNF) in northeastern Minnesota. Historic fire patterns created a dynamic mosaic of stand structures and species compositions through frequent, low-intensity surface fires and periodic larger burns [1,2]. In these hemiboreal landscapes, the absence of regular fire over the past century has facilitated the gradual increase of shade-tolerant, fire-susceptible species such as balsam fir (Abies balsamea (L.) Mill.) [3]. Historically, this species remained at low abundance due to recurrent fires, but it now persists and regenerates under closed canopies as a result of post-settlement fire suppression and its strong shade tolerance [4,5]. As balsam fir becomes more prevalent, it contributes to ladder and fine fuels that are highly flammable and can promote more intense crown fires, complicating both natural wildfire behavior and remedial management [6,7,8]. These altered fuel conditions increase the risk of high-severity wildfires that threaten wilderness values, ecological resilience, and nearby communities [9]. Hence, accurate, spatially explicit fuel maps derived from remote sensing data are essential for monitoring fuel characteristics, informing fire-behavior modeling, and guiding fuel reduction treatments (including prescribed fire and mechanical thinning) that restore natural fire regimes and reduce hazardous fuel accumulations [10,11,12].
Investments in remedial fuel reduction treatments to remove small-diameter coniferous ladder fuels (≤10 cm bole diameter) are often economically unfeasible due to the low economic value of the resulting biomass [13]. However, emerging markets seek to use such low-value biomass as source material to generate biochar [14] for remedial soil treatment and stabilization of coal combustion residues [15,16]. While biochar and carbon credits represent a burgeoning economic market and an option for small-diameter fuel treatments [17], current values are too low to incentivize widespread investment in fuel reduction treatments [13]. Given the increasingly limited resources available to forest managers, accurate and timely mapping of all coniferous fuels (overstory and ladder) is critical for targeted forest management planning to reduce wildland fire risk [12].
Forest basal area (BA) is a fuel characteristic that is routinely quantified using satellite-based remote sensing data [18,19], but the need for improvements in accuracy continues to drive research. Forest BA is the sum of the cross-sectional area of tree boles at breast height (1.37 m above ground) per unit ground area (e.g., m2⋅ha−1). This metric is widely used to characterize stocking levels [20,21,22] as well as stand structure and species composition [18,23]. Moreover, BA often correlates with forest attributes such as productivity, biomass, and carbon stocks [24,25]. Accurate estimation of total and species-level forest BA is essential for forest inventory, fire risk assessment, monitoring, and sustainable management, particularly in the context of climate change and carbon accounting [26,27]. Traditional field-based measurements, while precise, are time-consuming, labor-intensive, and spatially limited. Consequently, remote sensing (RS) technologies are increasingly critical for obtaining comprehensive, accurate, and scalable BA estimates, with recent advances demonstrating the efficacy of multi-platform and multi-sensor approaches [19,28,29].
The RS technologies, especially airborne Light Detection and Ranging (LiDAR) and multispectral satellite imagery, have emerged as powerful tools to scale up BA estimation across heterogeneous landscapes [30,31,32]. Airborne LiDAR provides detailed three-dimensional forest structural information, enabling the derivation of metrics such as canopy height, vertical heterogeneity, and statistical moments of height distributions [33,34,35]. Multispectral satellite sensors such as Sentinel-2 and Landsat-9 afford the use of complementary spectral information, including vegetation indices (e.g., normalized difference vegetation index (NDVI), moisture stress index (MSI)) that capture seasonal phenological variation [36,37,38] and structural information that bolsters fuel modeling and mapping efforts in this region [12]. Combined, these RS datasets provide a rich set of predictive information for modeling and mapping forest BA within complex, heterogeneous forest landscapes [39].
However, the abundance of potential predictors often leads to redundancy, collinearity, and overfitting, highlighting the need for effective variable selection [40]. The problem of high-dimensional feature spaces is particularly acute in forest RS, where platforms like LiDAR and multispectral sensors can generate dozens to hundreds of correlated metrics [19,41]. Several statistical and machine learning approaches have been employed to address this challenge. Partial Least Squares (PLS) regression is commonly applied in forest ecosystem studies to reduce dimensionality and handle correlated predictors [18,42]. While effective, a key limitation of standard PLS regression is that it constructs latent components using all original variables, often retaining redundant information and complicating model interpretation [43,44]. Stepwise selection, another conventional method, reduces the number of predictors but is prone to overfitting and can be unstable with highly correlated predictors [45].
To overcome these limitations, more sophisticated machine learning and metaheuristic optimization techniques have gained prominence. For instance, a key study comparing variable selection methods for forest inventory with LiDAR sensor data [46] demonstrated that while machine learning techniques like Random Forest (RF) show promise, traditional methods like stepwise regression often failed to produce parsimonious and transferable models. This finding underscores the critical need for robust variable selection strategies tailored to high-dimensional RS data. Methods such as RF [47] and Support Vector Machines (SVM) [48] provide built-in measures of variable importance, but the resulting rankings and selected subsets can depend on model-tuning settings (hyperparameters; e.g., RF settings controlling tree growth and feature sampling, and SVM kernel and regularization/ε choices) and, when used alone, do not guarantee a globally optimal parsimonious subset. A notable study [49] combined satellite sensor data (Sentinel-1 and Sentinel-2) with field variables to estimate species-specific forest BA and found that RF outperformed a Multi-Layer Perceptron (MLP). Similarly, the authors of [50] implemented deep-learning architectures to forecast BA increment across multiple Himalayan forest species. These studies demonstrate the increasing role of machine learning in forest BA modeling; however, they rely on conventional variable importance rankings and do not incorporate hybrid metaheuristic optimization for predictor selection. More recently, genetic algorithms (GA), population-based evolutionary optimizers that iteratively refine candidate predictor subsets using selection, crossover, and mutation, have been recognized for efficiently searching large feature spaces and identifying near-optimal subsets in high-dimensional modeling tasks [51,52,53].
Hybrid modeling frameworks that integrate optimization algorithms with statistical or machine learning methods have demonstrated strong potential for variable selection in high-dimensional datasets. Originally developed in chemometrics for spectral analysis [54], GA-PLS combines the evolutionary search capability of Genetic Algorithms with the dimensionality-reduction strength of PLS, using model performance (e.g., Root Mean Square Error of Cross-Validation) as the fitness criterion [55,56]. Over the past decade, similar hybrid and metaheuristic methods have been increasingly applied in other scientific domains, including chemical, biomedical, and environmental remote sensing studies (e.g., [57,58,59]). However, their application to forest structural attribute estimation—particularly for BA modeling using LiDAR- and multispectral-derived predictors—remains largely unexplored.
In this study, we extend this line of research by evaluating several hybrid and machine-learning-based variable selection frameworks, including iterative exclusion partial least squares regression (xPLS) [60], genetic algorithm-optimized xPLS (GA-xPLS), random forest-guided xPLS (RF-xPLS), and support vector regression-guided xPLS (SVR-xPLS), to identify the most effective approach for modeling total and species-specific BA (m2·ha−1). We integrated a library of 175 predictors, comprising LiDAR-derived structural metrics and multispectral variables, for analyses in the Kawishiwi Ranger District of the Superior National Forest (Minnesota, USA). Ultimately, the outputs of our analyses will support wildfire risk assessments by mapping the spatial distribution of conifer fuels capable of sustaining crown fire within these mixed forest systems, as well as “ladder fuels” composed of shade-tolerant conifer species (e.g., Abies balsamea) whose low branching architecture can transfer fire from the ground surface into the canopy [61,62]. Therefore, we aimed to:
  • Evaluate the predictive performance of the four approaches (xPLS, GA-xPLS, RF-xPLS, and SVR-xPLS) in terms of RMSE and R2.
  • Identify the most influential predictors of total and species-specific conifer BA.
  • Assess the complementarity of structural (LiDAR) and spectral (multispectral) predictors in explaining BA variation.
Understanding the relative strengths and weaknesses of these methods is essential for advancing forest inventory and fuel monitoring practices, especially within heterogeneous forest landscapes. Our results provide a comparative framework for optimizing RS-based predictors of forest BA, offering practical insights for large-scale forest inventory, carbon monitoring, ecosystem functioning, and wildfire risk management in northern temperate and boreal forests.

2. Materials and Methods

2.1. Study Area and Data Sources

Centered at 47.899° latitude and −91.659° longitude, the Kawishiwi Ranger District (KRD, Figure 1) has an area of ~2908 km2 and is one of five administrative field units within the Superior National Forest in northern Minnesota, USA. Forest cover within the KRD is diverse (five conifer genera and seven hardwood tree genera) and is considered part of the transitional hemiboreal region between temperate forests to the south and boreal forests to the north [1,63,64]. In general, these forests are extensively managed for wood fiber, which has resulted in a dominance of aspen (Populus tremuloides Michx. and P. grandidentata Michx.), paper birch (Betula papyrifera Marshall), spruce (Picea glauca (Moench) Voss, P. mariana (Mill.) B.S.P.), and balsam fir forest associations [65,66,67]. The northern portion of the region is largely protected (Boundary Waters Canoe Area Wilderness) and has an extensive fire history that supports vast stands of pioneer forest dominated by jack pine (Pinus banksiana Lamb.) as well as containing remnants of old-growth white and red pine (Pinus strobus L. and P. resinosa Ait.) forests [1,3].
To facilitate forest BA modeling and mapping across this region, we leveraged both airborne LiDAR sensor data and optical satellite sensor data. We obtained high-density LiDAR sensor data (2018–2020) from the U.S. Geological Survey (USGS) 3D Elevation Program. Satellite sensor data include multi-spectral, multi-seasonal Sentinel-2 reflectance data (10 m and 20 m spatial resolution) and Landsat-9 (30 m spatial resolution) surface reflectance data. We performed all data analyses in MATLAB R2024b using the Parallel Computing Toolbox on a 12-core Dell workstation. We conducted all data processing and visualization via ArcGIS Pro 3.6 and Python 3.12.

2.1.1. Predictor Variables

We derived 175 predictors from a combination of LiDAR-based structural metrics (see [10]) and both Sentinel-2 and Landsat-9 reflectance bands and vegetation indices (Table 1, Table 2, Table 3 and Table 4). We assembled Sentinel-2 predictors as multi-temporal observations to capture intra-seasonal foliar phenology and senescence dynamics (see [38,42]). In contrast, we included 30 m Landsat-9 sensor data as a complementary single-date image to (i) provide greater spatial integration of complex forest signatures and (ii) maintain continuity with regional forest-structure studies that leverage late-season imagery. We selected the cloud-free Landsat-9 image to capture optimal late-season forest senescence in this region, when maximum foliar contrast in visible and SWIR wavelengths typically occurs (see [38,42]).
Sentinel-2 predictors include multi-seasonal reflectance bands (March, May, June, August, September, and October) and vegetation indices such as ARVI, EVI7, NDII11, SAVI, and NDVI (Table 2), computed at 10 m or 20 m spatial resolution according to the wavelengths used. We selected spectral vegetation indices (VIs) to represent distinct biophysical sensitivities relevant to forest structure, including greenness/chlorophyll, canopy moisture, and senescence/structural contrast, while limiting redundancy among highly correlated indices. Thus, we prioritized indices that are (i) widely used and interpretable in forest remote-sensing studies and (ii) complementary to the LiDAR-derived structural metrics.
Similarly, 30 m Landsat-9 predictors included raw bands and indices from 20 September 2023 imagery (Table 3). LiDAR-derived predictors quantify canopy height, vertical heterogeneity, and stand structure (Table 4) (see [10]).
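As a minimal sketch of how two of the index families named above are formed, the snippet below evaluates NDVI and NDII as normalized differences. The reflectance values are made up, and the band pairing (NIR with red for NDVI, NIR with SWIR for NDII) is the generic textbook form, assumed here rather than taken from the band definitions in Table 2.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), the form shared by NDVI and NDII."""
    total = a + b
    if total == 0:
        raise ValueError("band sum is zero; index undefined")
    return (a - b) / total


def ndvi(nir, red):
    """Greenness index: contrast between NIR and red reflectance."""
    return normalized_difference(nir, red)


def ndii(nir, swir):
    """Canopy-moisture index: contrast between NIR and SWIR reflectance."""
    return normalized_difference(nir, swir)


# Hypothetical surface reflectance for a single conifer pixel
pixel = {"red": 0.04, "nir": 0.36, "swir": 0.12}
print(round(ndvi(pixel["nir"], pixel["red"]), 3))   # 0.8
print(round(ndii(pixel["nir"], pixel["swir"]), 3))  # 0.5
```

Computing the same function per month and per resolution is one way a multi-seasonal index library of this kind could be assembled.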

2.1.2. Field Data and Response Variables

We collected forest BA data at 141 field plots within the KRD between 2024 and 2025. Each plot consists of a cluster of five variable-radius subplots: one at the intersection and one at each of the four endpoints of two orthogonal transect lines (20 × 20 m) centered on the plot (Figure 2), placed within relatively homogeneous forest type associations (i.e., ≥7 × 7 pixels or 0.5 ha). Sufficient stand size and homogeneity minimized stand edge effects during analysis and ensured that GPS location errors, even if greater than 5 m, would be inconsequential. We estimated BA by species at each of the five subplots using angle count sampling with a metric BA factor 2 prism [68]. We recorded GPS coordinates at plot centers using a Garmin GPSMAP 62stc handheld receiver (Garmin Ltd., Olathe, KS, USA) with WAAS differential correction enabled. Expected horizontal accuracy for this class of handheld receiver is ~3–5 m under mixed forest canopy conditions [69].
We then entered field plot data into spreadsheets, checked for errors, and averaged across the five subplots per plot to determine total live BA, dead BA, deciduous BA, and BA for each of the eight conifer species. We modeled eight stand-level forest parameters as response variables (Y): total forest BA (TOTBA), black and white spruce (Picea mariana and Picea glauca; PICEA), jack pine (Pinus banksiana; PIBA), red pine (Pinus resinosa; PIRE), Eastern white pine (Pinus strobus; PIST), balsam fir (Abies balsamea; ABBA), tamarack (Larix laricina; LALA), and northern white cedar (Thuja occidentalis; THOC). All eight conifer species, to varying degrees, are relevant to wildfire risk within the KRD, including crown-fire potential and ladder fuels, especially ABBA. While LALA is a deciduous conifer (leaf-off mid-October to late May), we included it in our analyses as its crown-fire potential is relevant during leaf-on periods. We extracted values for all image-based predictor variables associated with our field sample plot locations (n = 141).
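The subplot-to-plot aggregation described above can be sketched as follows. In angle count sampling each tallied tree represents one basal area factor (here 2 m2 ha−1), so subplot BA is the tally times the factor, and plot BA is the mean over the five subplots. The tallies and the function name are hypothetical and serve only to illustrate the arithmetic.

```python
BAF = 2.0  # metric basal area factor (m^2/ha per tallied tree), as stated in the text


def plot_basal_area(tallies, baf=BAF):
    """Average angle-count BA (m^2/ha) by species over the subplots of one plot.

    tallies: list of dicts, one per subplot, mapping species code -> tallied tree count.
    """
    species = sorted({sp for sub in tallies for sp in sub})
    n_sub = len(tallies)
    return {sp: baf * sum(sub.get(sp, 0) for sub in tallies) / n_sub for sp in species}


# Hypothetical tallies for the five subplots of one plot (counts are invented)
tallies = [
    {"ABBA": 3, "PIRE": 5},
    {"ABBA": 2, "PIRE": 6},
    {"ABBA": 4},
    {"PIRE": 7, "PIBA": 1},
    {"ABBA": 1, "PIRE": 2},
]
ba = plot_basal_area(tallies)
print(ba)                # per-species BA in m^2/ha
print(sum(ba.values()))  # BA summed over the tallied species
```

With a factor of 2 and five subplots, plot-level values fall on multiples of 0.4 m2 ha−1.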

2.2. Modeling Framework

To evaluate how feature selection strategy influences prediction of total and species-specific BA, we compared four pipelines that operate on the same predictor space and the same standardized preprocessing: xPLS, GA-xPLS, RF-xPLS, and SVR-xPLS. These pipelines differ along two orthogonal dimensions: the subset search strategy and the regression engine used to score each candidate subset. Here, the ‘-xPLS’ suffix for GA-xPLS, RF-xPLS, and SVR-xPLS indicates that these methods share the same xPLS iterative exclusion framework (subset path and pooled error criterion); only the search strategy or scoring engine differs, and the final models are refit using PLS (xPLS, GA-xPLS) or RF/SVR (RF-xPLS, SVR-xPLS). The xPLS, RF-xPLS, and SVR-xPLS pipelines use a greedy backward-elimination path, removing at each step the predictor whose exclusion yields the lowest pooled out-of-sample error; GA-xPLS performs a global search over binary inclusion masks. Scoring is engine-appropriate: we evaluate PLS and SVR candidates by leave-one-out cross-validation, whereas Random Forest candidates are evaluated by out-of-bag error. For the greedy pipelines, the final model is selected at the iteration with the minimum pooled error along the path; for GA-xPLS, the selected model is the mask that minimizes the cross-validated objective. By holding the feature space and scaling fixed while varying only the search mechanism and modeling engine, this design isolates methodological effects and enables a fair comparison across the four approaches. For each response × method combination, we computed the Pearson correlation (r) between observed and predicted values. Two-sided p-values for r were adjusted for multiple comparisons with the Benjamini–Hochberg False Discovery Rate (BH–FDR) procedure across 32 tests; significance is shown as * (q < 0.10) and ** (q < 0.05). In addition, we report R2 = 1 − SSE/SST.
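The Benjamini–Hochberg adjustment mentioned above can be illustrated with a short sketch. The p-values are hypothetical and only four tests are shown instead of 32; the star thresholds follow the convention stated in the text.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def stars(q):
    """Markers used in the text: ** for q < 0.05, * for q < 0.10."""
    return "**" if q < 0.05 else "*" if q < 0.10 else ""


# Hypothetical two-sided p-values for four response x method tests
p = [0.001, 0.020, 0.030, 0.200]
q = bh_adjust(p)
print([round(v, 3) for v in q])  # [0.004, 0.04, 0.04, 0.2]
print([stars(v) for v in q])     # ['**', '**', '**', '']
```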

2.2.1. Iterative Exclusion PLS (xPLS)

Partial Least Squares (PLS) regression is commonly applied when predictor variables are numerous and collinear [70,71], as it extracts latent components that maximize the covariance between the predictor matrix X and the response matrix Y. In PLS, the matrices are decomposed as:
$$X_S = T P^{\top} + E, \qquad Y = U Q^{\top} + F$$
where $X_S$ is the predictor matrix restricted to the active predictor subset S, T and U are latent scores, P and Q are loadings (with $P^{\top}$ and $Q^{\top}$ denoting their transposes), and E and F are residuals. The weight vectors (W) are chosen to maximize the covariance between T and U, and the latent scores are then used for regression of Y on X. However, standard PLS can retain weak predictors, inflating model complexity without gains in accuracy. For xPLS, we used leave-one-out cross-validation (LOOCV) to select the latent component count at each exclusion step and to score candidate removals via pooled RMSE across responses. After selecting the best subset (minimum LOOCV pooled RMSE along the path), we refit the final PLS model and report in-sample (re-substitution) performance; comparative uncertainty is summarized with nonparametric bootstrap confidence intervals (CIs) computed on the full dataset. The result is a compact, interpretable linear model that emphasizes informative predictors and reduces redundancy [60].
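The greedy exclusion path can be sketched generically. The code below is an illustration, not the authors' MATLAB implementation: the scoring function is passed in by the caller, and the toy scorer stands in for the pooled LOOCV RMSE of a PLS fit. Predictor names such as "NDII_sep" and "B2_mar" are hypothetical labels.

```python
def backward_eliminate(features, score):
    """Greedy backward elimination.

    features: iterable of predictor names.
    score: callable mapping a frozenset of predictors to an error (lower is better),
           e.g. a pooled cross-validated RMSE supplied by the caller.
    Returns (best_subset, best_error, path), where path lists (subset, error) per step.
    """
    active = frozenset(features)
    path = [(active, score(active))]
    while len(active) > 1:
        # Try removing each predictor and keep the removal with the lowest error
        candidates = [(score(active - {f}), f) for f in sorted(active)]
        err, drop = min(candidates)
        active = active - {drop}
        path.append((active, err))
    # The selected model is the step with the minimum error along the path
    best_subset, best_error = min(path, key=lambda step: step[1])
    return best_subset, best_error, path


# Toy stand-in for the cross-validated error: two informative predictors,
# plus a small cost for every uninformative predictor retained.
INFORMATIVE = {"NDII_sep", "HQUAD"}


def toy_error(subset):
    missing = len(INFORMATIVE - subset)
    return 5.0 + 4.0 * missing + 0.1 * len(subset - INFORMATIVE)


best, err, path = backward_eliminate(["NDII_sep", "HQUAD", "B2_mar", "B3_may"], toy_error)
print(sorted(best), round(err, 2))  # ['HQUAD', 'NDII_sep'] 5.0
```

Because the path is followed to a single predictor and the minimum is picked afterwards, the selected subset need not be the last one visited.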

2.2.2. GA-xPLS

To complement greedy elimination, GA-xPLS performs a global subset search with a GA [72,73,74] while keeping the PLS engine and component-selection rule identical to xPLS. Let $X \in \mathbb{R}^{n \times p}$ (p = 175 predictors) and $Y \in \mathbb{R}^{n \times M}$ (M = 8 BA responses). Predictors are z-scored columnwise; Y remains in raw units. A chromosome $z \in \{0,1\}^{p}$ defines the active subset $S(z) = \{j : z_j = 1\}$ and the corresponding design matrix $X_S$. For a candidate latent dimension k, PLS yields coefficients $B(S,k)$ and predictions:
$$\hat{Y}(S,k) = X_S \, B(S,k)$$
Model quality is scored by the pooled leave-one-out RMSE across responses,
$$\mathrm{RMSE}^{\mathrm{LOO}}_{\mathrm{pooled}}(S,k) = \sqrt{\frac{1}{nM}\sum_{i=1}^{n}\sum_{m=1}^{M}\left(Y_{im} - \hat{Y}^{(-i)}_{im}(S,k)\right)^{2}}$$
where $\hat{Y}^{(-i)}$ are the LOO predictions, obtained with observation i held out. The GA minimizes an optionally sparsity-regularized objective:
$$I(z) = \mathrm{RMSE}^{\mathrm{LOO}}_{\mathrm{pooled}}\big(S(z),\, k^{*}(S(z))\big) + \lambda\,\frac{\lVert z \rVert_{0}}{p}$$
with λ ≥ 0 (set small or zero in our main runs), where $k^{*}(S)$ is the number of components chosen for subset S by the xPLS rule and $\lVert z \rVert_{0}$ is the number of selected predictors. The optimal subset is:
$$z^{*} = \arg\min_{z \in \{0,1\}^{p}} I(z)$$
Finally, we refit PLS on all rows using $S(z^{*})$ and $k^{*}$. GA-xPLS thus isolates the effect of global subset search while preserving the xPLS component policy and LOOCV scoring, enabling a fair comparison to the greedy pipelines. In practice, GA fitness was the pooled cross-validated RMSE computed with deterministic repeated 10-fold CV (3 repeats), with mild size penalties to discourage overly small/large sets. The number of PLS components was chosen via the same xPLS rule. After selecting the final subset, a PLS model was refit (LOOCV used for component confirmation), and coefficients were exported in raw-X units.
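To make the mask search concrete, here is a minimal GA sketch over binary inclusion masks with the penalized objective described above. It is an illustration under stated assumptions, not the authors' implementation: the population size, mutation rate, one-point crossover, elitist selection, and the toy error function are all hypothetical choices, and the error callable stands in for the cross-validated pooled RMSE.

```python
import random


def ga_select(p, error, lam=0.0, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Minimal GA over binary inclusion masks z in {0,1}^p.

    error: callable taking a tuple mask and returning an error (lower is better).
    Objective: error(z) + lam * ||z||_0 / p.
    """
    rng = random.Random(seed)

    def objective(z):
        if not any(z):  # an empty subset is not a valid model
            return float("inf")
        return error(z) + lam * sum(z) / p

    # Seed the population with the full mask plus random masks
    pop = [tuple([1] * p)] + [
        tuple(rng.randint(0, 1) for _ in range(p)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=objective)
        elite = pop[: pop_size // 2]  # selection: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(1, p - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with probability p_mut
            child = tuple(1 - g if rng.random() < p_mut else g for g in child)
            children.append(child)
        pop = elite + children
    best = min(pop, key=objective)
    return best, objective(best)


# Toy error: predictors 0 and 3 are informative, every other retained predictor costs 0.1
def toy_error(z):
    return 5.0 + 4.0 * ((1 - z[0]) + (1 - z[3])) + 0.1 * (sum(z) - z[0] - z[3])


best, value = ga_select(p=8, error=toy_error, lam=0.05)
print(best, round(value, 4))
```

Because the better half of each generation is carried over unchanged, the best objective value never gets worse from one generation to the next.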

2.2.3. RF-xPLS

RF-xPLS follows the same greedy backward-elimination path as xPLS but replaces the scoring engine with Random Forests [47]. At iteration k with active set $S_k \subseteq \{1, \dots, p\}$, every candidate removal $S_k \setminus \{j\}$ (here, “∖” denotes set difference, i.e., removing j from $S_k$) is evaluated by fitting eight regression forests (one for each BA response m = 1, …, 8) on bootstrap samples with out-of-bag (OOB) prediction enabled. For observation i, the OOB prediction for response m is the average over the trees that did not include i in their bootstrap sample:
$$\hat{y}^{\mathrm{OOB}}_{i,m}(S) = \frac{1}{|\mathcal{T}(i)|}\sum_{t \in \mathcal{T}(i)} f^{(m)}_{t}(x_{i,S})$$
where $\mathcal{T}(i)$ is the set of trees for which i was out-of-bag, $f^{(m)}_{t}$ is the prediction of tree t for response m, and $x_{i,S}$ is row i of the predictor matrix restricted to S. Candidate subsets are scored by the pooled OOB RMSE across all responses:
$$\mathrm{RMSE}^{\mathrm{OOB}}_{\mathrm{pooled}}(S) = \sqrt{\frac{1}{nM}\sum_{i=1}^{n}\sum_{m=1}^{M}\left(y_{im} - \hat{y}^{\mathrm{OOB}}_{i,m}(S)\right)^{2}}$$
The variable removed at step k is:
$$j_k^{*} = \arg\min_{j \in S_k} \mathrm{RMSE}^{\mathrm{OOB}}_{\mathrm{pooled}}(S_k \setminus \{j\}), \qquad S_{k+1} = S_k \setminus \{j_k^{*}\}$$
This path proceeds until one predictor remains; the selected RF-xPLS model is the subset $S^{*}$ along the path that attains the minimum $\mathrm{RMSE}^{\mathrm{OOB}}_{\mathrm{pooled}}$. Forest complexity is controlled by the number of trees T, the minimum leaf size L, and the per-split feature-subsampling rate. Final models are refit on all data with 500 trees (MinLeaf = 5), and we report both in-sample and OOB performance. For more information, refer to [75,76].
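The OOB averaging and pooled RMSE above can be illustrated without fitting any forest. In this sketch the per-tree predictions and in-bag index sets are supplied directly as hypothetical inputs; only the aggregation step is shown.

```python
import math


def oob_predictions(tree_preds, in_bag):
    """Average each observation's predictions over trees where it was out-of-bag.

    tree_preds[t][i]: prediction of tree t for observation i.
    in_bag[t]: set of observation indices drawn into tree t's bootstrap sample.
    Returns a list with None where an observation was never out-of-bag.
    """
    n = len(tree_preds[0])
    out = []
    for i in range(n):
        votes = [tree_preds[t][i] for t in range(len(tree_preds)) if i not in in_bag[t]]
        out.append(sum(votes) / len(votes) if votes else None)
    return out


def pooled_rmse(y_true, y_pred):
    """Pooled RMSE over all (observation, response) pairs; inputs are lists of rows."""
    sq = [(yt - yp) ** 2 for rt, rp in zip(y_true, y_pred) for yt, yp in zip(rt, rp)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical predictions from three trees for four plots (single response)
tree_preds = [
    [30.0, 20.0, 40.0, 10.0],
    [34.0, 22.0, 38.0, 14.0],
    [32.0, 18.0, 42.0, 12.0],
]
in_bag = [{0, 1}, {1, 2}, {0, 3}]
oob = oob_predictions(tree_preds, in_bag)
print(oob)  # [34.0, 18.0, 41.0, 12.0]

observed = [[33.0], [20.0], [40.0], [12.0]]
print(round(pooled_rmse(observed, [[v] for v in oob]), 3))  # 1.225
```

With several responses, each row of the two tables simply carries one value per response, and the same pooled function applies.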

2.2.4. SVR-xPLS

The SVR method provides a margin-based linear model that is robust to collinearity and outliers [48,77]. Let $X \in \mathbb{R}^{n \times p}$ be the predictor matrix (z-scored columnwise) and $Y \in \mathbb{R}^{n \times M}$ the response matrix of basal area (BA) targets. For a selected predictor subset $S \subseteq \{1, \dots, p\}$ and a given response $r \in \{1, \dots, M\}$, ε-SVR with a linear kernel solves:
$$\min_{w_r,\, b_r,\, \xi,\, \xi^{*}} \; \frac{1}{2}\lVert w_r \rVert_2^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right)$$
subject to:
$$y_{ir} - \left(w_r^{\top} x_{i,S} + b_r\right) \le \varepsilon + \xi_i, \qquad \left(w_r^{\top} x_{i,S} + b_r\right) - y_{ir} \le \varepsilon + \xi_i^{*}, \qquad \xi_i,\, \xi_i^{*} \ge 0$$
where $x_{i,S}$ is row i of the z-scored design restricted to S, C > 0 controls the penalty on deviations outside the ε-tube, and $w_r$, $b_r$ are the linear SVR parameters. Predictions are $\hat{y}_{ir} = w_r^{\top} x_{i,S} + b_r$.
SVR-xPLS uses the same iterative exclusion strategy as xPLS, but replaces the PLS engine used to score each candidate subset with SVR and evaluates performance by leave-one-out cross-validation (LOOCV) pooled across all M responses. At iteration k with active set $S_k$, each candidate removal $S_k \setminus \{j\}$ is scored by:
$$\mathrm{RMSE}_{\mathrm{pooled}}(S_k \setminus \{j\}) = \sqrt{\frac{1}{nM}\sum_{i=1}^{n}\sum_{r=1}^{M}\left(y_{ir} - \hat{y}^{(-i)}_{i,r}(S_k \setminus \{j\})\right)^{2}}$$
where $\hat{y}^{(-i)}_{i,r}$ denotes the LOOCV prediction for sample i from an SVR model trained without sample i. The predictor whose removal minimizes $\mathrm{RMSE}_{\mathrm{pooled}}$ is permanently discarded to form $S_{k+1}$. This greedy backward elimination continues until a preset cap on the number of predictors is reached, and the selected model is the iteration along this path with the lowest pooled LOOCV RMSE.
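As a small worked illustration of the ε-SVR objective, the sketch below evaluates it at a given (w, b) rather than solving the optimization. For fixed parameters, the smallest feasible slacks are the amounts by which residuals leave the ε-tube. The data, parameter values, and function name are hypothetical.

```python
def svr_primal_objective(w, b, X, y, C=1.0, eps=0.1):
    """Value of the linear epsilon-SVR primal objective at a given (w, b).

    For fixed (w, b) the minimizing slacks are
    xi_i  = max(0, y_i - f(x_i) - eps) and
    xi_i* = max(0, f(x_i) - y_i - eps), with f(x) = w.x + b.
    """
    slack_sum = 0.0
    for xi, yi in zip(X, y):
        f = sum(wj * xj for wj, xj in zip(w, xi)) + b
        slack_sum += max(0.0, yi - f - eps) + max(0.0, f - yi - eps)
    return 0.5 * sum(wj * wj for wj in w) + C * slack_sum


# Hypothetical z-scored predictors (two columns) and BA values for three plots
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
y = [32.0, 29.0, 27.0]
print(svr_primal_objective(w=[2.0, 1.0], b=30.0, X=X, y=y, C=1.0, eps=0.5))  # 4.0
```

Only the second plot falls outside the tube (its residual of 2.0 exceeds ε = 0.5 by 1.5), so the objective is 2.5 from the norm term plus 1.5 from the slack term.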

3. Results

3.1. Field Inventory and Predictor Overview

Plot-level total BA (TOTBA), defined as the sum over all species, ranged from 4.0 to 88.0 m2 ha−1 (median 32.0, mean 33.7, SD 13.3; IQR 14.4). Seven species-level BA variables were recorded (PICEA, PIBA, PIRE, PIST, ABBA, LALA, and THOC). Across all plots, these tracked species together accounted for ~71% of the cumulative BA (Figure 3), with the remaining ~29% attributable to other species not individually enumerated. The summed BA of the tracked species was strongly correlated with TOTBA (r ≈ 0.80), indicating that the seven species-level variables capture most of the stand-level BA signal while leaving a substantive untracked component.
We plotted the distribution of z-scores for Sentinel-2 and Landsat-9 spectral bands (March–October; 10 m and 20 m) (Figure A1, Appendix B). Medians cluster near zero with a slight negative shift (≈−0.1 to −0.3). Interquartile ranges are modest (≈0.5–1.0 z) and broadly consistent across months and resolutions, with somewhat wider dispersion for NIR/SWIR bands (e.g., B8/B11/B12) than for the visible bands. Outliers are occasional and mostly within ±3 z (axes truncated at ±4 z), consistent with natural scene heterogeneity rather than systematic anomalies. Overall, the bands show stable central tendency and moderate variance, supporting downstream feature selection without additional trimming or rescaling.
We plotted the z-score distributions for 72 vegetation-index (VI) predictors across months and resolutions (Figure A2, Appendix B). Medians cluster near zero (≈−0.2 to +0.2) with modest IQRs (~0.6–1.0 z) for most greenness indices (NDVI/EVI/ARVI/TVI). Moisture- and stress-sensitive indices (e.g., NDII/MSI and related SWIR-based variants) show slightly broader dispersion and mild negative skew, consistent with scene-level moisture variability. Seasonal shifts are small but visible—late-season panels (Aug.–Sep.) tend to have marginally higher medians for greenness VIs than early-season panels. Whiskers typically extend to ~±2–2.5 z, and outliers are sparse and mostly within ±3 z, indicating limited influence of extreme values. Differences between 10 m and 20 m versions are subtle relative to the between-index variation. Overall, VI distributions are well-behaved and centered, providing a stable basis for downstream feature selection without additional trimming or rescaling.
Across the LiDAR-derived structure metrics, medians are centered close to zero (slight negative bias of ~−0.1 to −0.3 z for several metrics), indicating stable centering after z-scaling (Figure 4). Dispersion is moderate overall (typical IQR ≈ 0.6–0.9 z), with wider spreads for canopy-height descriptors and variability indices. In particular, height percentiles (H10PCT, H50PCT, H70PCT, H75PCT) and CHM show broader whiskers than robust summaries such as MEDMOD/MEDMAD. Variance- and heterogeneity-sensitive metrics (HVAR, HSTD, HCV) also exhibit larger tails, as do skewness/l-moment measures (HSKEW, LSKEW, LMOM2–LMOM4), which display occasional positive outliers (>+2 z) consistent with locally tall or top-heavy canopies. STRAT5 presents a few low outliers (<−2.5 z), reflecting plots with weak vertical layering. Importantly, nearly all observations lie within ±3 z, and no metrics exhibit problematic distributional behavior, supporting inclusion of the full LiDAR set in subsequent feature selection and modeling steps.
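The standardization behind these summaries can be sketched as follows. The example z-scores each predictor column and reports the share of values within ±3 z. The input values are invented, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was applied.

```python
import statistics


def zscore_columns(rows):
    """Z-score each column of a list-of-rows table (sample standard deviation assumed)."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def fraction_within(zrows, limit=3.0):
    """Share of standardized values with |z| <= limit."""
    flat = [abs(v) for row in zrows for v in row]
    return sum(v <= limit for v in flat) / len(flat)


# Hypothetical plot-by-predictor values (a height metric and a reflectance band)
rows = [[10.0, 0.20], [14.0, 0.25], [18.0, 0.30], [22.0, 0.35], [26.0, 0.40]]
z = zscore_columns(rows)
print([round(r[0], 3) for r in z])  # [-1.265, -0.632, 0.0, 0.632, 1.265]
print(fraction_within(z))           # 1.0
```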

3.2. Model Selection and Cross-Validated Accuracy

To avoid conflating selection-stage predictive performance with refit diagnostics, we report two complementary performance summaries. (i) Validation-path RMSE quantifies predictive error during variable selection using cross-validation: For xPLS, pooled RMSE is computed via LOOCV for each candidate exclusion at each iteration, using the xPLS component-selection rule, and the retained subset is the one minimizing pooled LOOCV RMSE along the elimination path. For GA-xPLS, the GA fitness uses repeated K-fold CV (K = 10, repeats = 3) of pooled RMSE, and final selection-path performance is computed using the same CV-based pooled RMSE definition. (ii) Refit (in-sample) RMSE and R2 are computed after refitting the final selected model on all plots and are used for diagnostic visualization. These refit diagnostics typically show lower RMSE and higher R2 than cross-validated and OOB error because they are computed on the training data; we report them separately to avoid conflating selection-stage generalization with refit agreement.
In Figure 5, we compare head-to-head accuracy and model size across the four selection strategies. In-sample pooled RMSE (panel a) shows RF-xPLS as most accurate (3.52 m2 ha−1), with GA-xPLS (5.05) ≈ xPLS (5.108), and SVR-xPLS highest (5.71). Validation-path errors (panel b) again place RF-xPLS lowest (5.74), but the order of the remaining methods changes: SVR-xPLS is close behind (6.50), while xPLS (17.18) and GA-xPLS (16.99) are much larger. Pooled R2 (panel c) mirrors the in-sample ordering (RF-xPLS = 0.86; GA-xPLS = 0.72; xPLS = 0.71; SVR-xPLS = 0.64). The dumbbell plot (Figure 5d) highlights optimism gaps between the selection-stage validation error and the final fit: smallest for SVR-xPLS (≈×1.1), moderate for RF-xPLS (≈×1.7), and largest for xPLS and GA-xPLS (≈×3.4 each). Finally, RF-xPLS attains the most parsimonious solution (27 predictors) versus GA-xPLS (38), xPLS (44), and SVR-xPLS (112) (Figure 5e). Overall, RF-xPLS provides the best accuracy–parsimony trade-off with only a modest optimism gap.
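As a rough check on the optimism gaps, the ratio of validation-path RMSE to in-sample RMSE can be recomputed from the values quoted above. Treating the gap as a simple ratio is an assumption about how Figure 5d was built, and ratios derived from rounded values may differ slightly from those plotted.

```python
# Pooled RMSE values (m2 ha-1) as quoted in the text for Figure 5a,b.
in_sample = {"RF-xPLS": 3.52, "GA-xPLS": 5.05, "xPLS": 5.108, "SVR-xPLS": 5.71}
validation = {"RF-xPLS": 5.74, "GA-xPLS": 16.99, "xPLS": 17.18, "SVR-xPLS": 6.50}

# Optimism ratio: how many times larger the validation error is than the
# refit error (assumed definition; 1.0 would mean no optimism).
optimism = {m: validation[m] / in_sample[m] for m in in_sample}

# Methods ordered from smallest to largest optimism ratio.
ranked = sorted(optimism, key=optimism.get)

for method in ranked:
    print(f"{method}: x{optimism[method]:.2f}")
```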

3.3. Species-Level Performance

Across species, the Taylor diagrams (Figure 6a,b) show a consistent pattern: models are generally under-dispersed relative to observations (σ̂/σo ≈ 0.2–0.7), and RF-xPLS occupies the most favorable region, closest to the higher-correlation arcs (r ≈ 0.6–0.8, depending on species) and nearer to the σ̂/σo = 1 reference, followed by GA-xPLS and xPLS; SVR-xPLS typically sits leftward (lower r) and with stronger variance shrinkage. Species-wise, TOTBA and PIRE cluster in the best-skill zone (r ≳ 0.7–0.8), PIBA and PIST are moderate (r ≈ 0.6–0.7), while LALA and ABBA are harder to predict (r ≈ 0.45–0.6), and THOC shows the weakest agreement. These patterns are mirrored in the heatmaps: the R2 panel (Figure 6c) highlights RF-xPLS as the top performer for most responses (notably PIRE and TOTBA with the warmest hues), GA-xPLS is usually second, and SVR-xPLS is lowest in several species (e.g., LALA, THOC). All method–species correlations are statistically significant after BH–FDR control (asterisks in Figure 6c). Absolute errors (Figure 6d) corroborate this ranking: RF-xPLS achieves the smallest RMSE across nearly all species, GA-xPLS is competitive, xPLS sits mid-pack, and SVR-xPLS is the largest for several responses. Differences in absolute RMSE across rows (e.g., larger values for TOTBA, smaller for ABBA) reflect the inherent scale of each response rather than relative model performance. Overall, the two diagnostic views agree that RF-xPLS delivers the most accurate and best-calibrated predictions across species, with PLS variants competitive but more variance-shrunk, and SVR-xPLS comparatively weaker.
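The three quantities a Taylor diagram encodes (correlation r, the standard-deviation ratio σ̂/σo, and the centered RMS difference) can be computed as in the sketch below. The observed and predicted values are hypothetical, and population standard deviations are used as a simplifying assumption.

```python
import math

def taylor_stats(obs, pred):
    """Return (r, sd_ratio, centered_rms_difference) for one response."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    r = cov / (sd_o * sd_p)
    sd_ratio = sd_p / sd_o  # values below 1 indicate under-dispersion
    # Law-of-cosines identity that underlies the Taylor diagram geometry.
    crmsd = math.sqrt(sd_o ** 2 + sd_p ** 2 - 2 * sd_o * sd_p * r)
    return r, sd_ratio, crmsd

# Hypothetical basal-area values (m2 ha-1); predictions shrink toward the mean.
obs = [10.0, 20.0, 30.0, 40.0, 50.0]
pred = [22.0, 24.0, 31.0, 34.0, 39.0]
r, sd_ratio, crmsd = taylor_stats(obs, pred)
```

In this toy case the correlation is high while the ratio is well below 1, which is the same "good ordering, compressed range" behavior described for the species-level models.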

3.4. Model Skill Across Responses

We compared xPLS, GA-xPLS, RF-xPLS, and SVR-xPLS for all eight responses. R2 is computed as 1 − SSE/SST; error bars show bootstrap 95% confidence intervals (CIs; B = 500). RMSE is reported in the natural units of each response. Figure 7 compares RMSE (with 95% bootstrap CIs) across responses. RF-xPLS achieves the lowest absolute error for most targets (particularly TOTBA, PICEA, PIBA, PIST, and THOC), indicating substantive reductions relative to the xPLS baseline and aligning with reports that RF tends to outperform linear baselines for LiDAR + multispectral BA modeling [49]. GA-xPLS is consistently competitive, matching or slightly surpassing RF-xPLS for PIRE and generally ranking second elsewhere. SVR-xPLS is more variable; it attains the best RMSE for LALA and is competitive for PIST but is clearly less effective for THOC and TOTBA. For ABBA, all methods yield comparatively small errors with largely overlapping intervals, suggesting limited separability. Overall, the RMSE pattern identifies RF-xPLS as the most reliable low-error approach, GA-xPLS as a robust alternative, and SVR-xPLS as response-dependent; importantly, these absolute-error results complement the R2 analysis by showing that models with similar explained variance can still differ meaningfully in prediction error magnitude.
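A minimal sketch of the two skill metrics and a percentile bootstrap interval (B = 500) follows. It resamples observed–predicted pairs, which is an assumption; the authors' procedure may instead refit models within each resample. All data values are hypothetical.

```python
import math
import random

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2(obs, pred):
    """R2 = 1 - SSE/SST, as defined in the text."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def bootstrap_ci(obs, pred, stat, B=500, alpha=0.05, seed=0):
    """Percentile CI from B resamples of (obs, pred) pairs."""
    rng = random.Random(seed)
    n = len(obs)
    draws = []
    while len(draws) < B:
        idx = [rng.randrange(n) for _ in range(n)]
        o = [obs[i] for i in idx]
        if len(set(o)) < 2:
            continue  # skip degenerate resamples where SST would be zero
        p = [pred[i] for i in idx]
        draws.append(stat(o, p))
    draws.sort()
    lower = draws[int(alpha / 2 * B)]
    upper = draws[int((1 - alpha / 2) * B) - 1]
    return lower, upper

# Hypothetical plot-level basal area (m2 ha-1).
obs = [5, 12, 18, 22, 27, 31, 36, 40, 44, 52, 58, 63]
pred = [8, 10, 20, 25, 24, 33, 33, 43, 41, 49, 60, 57]
rmse_ci = bootstrap_ci(obs, pred, rmse)
r2_ci = bootstrap_ci(obs, pred, r2)
```

Because R2 is defined as 1 − SSE/SST rather than as a squared correlation, it can fall below zero in a resample, which is consistent with the negative lower bounds reported for some responses.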
In Figure 8, patterns largely echo the RMSE results but highlight where variance capture differs across methods. RF-xPLS attains the highest R2 for most targets—TOTBA, PICEA, PIBA, PIST, and LALA—typically with relatively tight intervals, indicating both strong fit and stability. GA-xPLS leads for PIRE and THOC, where its CIs sit above the alternatives, consistent with its low RMSE on those responses. SVR-xPLS is competitive for LALA (close to RF-xPLS) but underperforms for THOC and TOTBA. ABBA shows uniformly modest R2 with overlapping intervals, suggesting limited separability regardless of method. Notably, xPLS exhibits wide and partially negative lower CIs for LALA, signaling unstable fits relative to the other approaches. Overall, the R2 view confirms RF-xPLS as the most broadly effective model, with GA-xPLS excelling on PIRE and THOC; it also clarifies that methods with similar RMSE can differ in how much of the observed variance they explain.
The boxplots show that residuals for most responses are centered close to zero, indicating little systematic bias across methods, while the IQRs quantify error relative to each response’s natural variability (Figure 9). Consistent with the RMSE/R2 findings, RF-xPLS and GA-xPLS generally exhibit the tightest IQRs and fewer extreme points, especially for PICEA, PIBA, PIRE, and PIST, whereas xPLS and SVR-xPLS show wider spreads for several targets. The most challenging responses remain LALA and THOC: both display long tails and more outliers (|residual| ~≥2–3σo), with SVR-xPLS in particular showing occasional large positive departures for THOC, while GA-xPLS maintains comparatively compact, near-zero–median residuals on that species. Mild asymmetry (slightly longer negative tails) is visible for some totals/species (e.g., TOTBA, LALA), suggesting occasional under-prediction at the upper end of observed BA—an effect likely driven by a few high-BA stands rather than pervasive bias. Overall, the residual view corroborates the robustness ranking implied by RMSE and R2: RF-xPLS provides the most uniformly well-behaved errors, GA-xPLS is competitive and often best on THOC, and instability is most evident for xPLS and SVR-xPLS on the hardest responses.

3.5. Observation vs. Prediction

Given that RF-xPLS delivered the best performance across most responses in the cross-validated/bootstrapped summaries (Figure 7, Figure 8 and Figure 9), we focus the observation–prediction diagnostics on RF-xPLS (Figure 10). The refit shows strong in-sample agreement with points lying close to the 1:1 line; panel headers report r, R2 = 1 − SSE/SST, and RMSE. Correlations are high (r ≈ 0.90–0.97) and R2 spans 0.65–0.93: highest for PIRE (0.93), followed by TOTBA (0.86), THOC (0.85), LALA (0.81), PIBA (0.80), PIST (0.75), PICEA (0.68), and ABBA (0.65). RMSE reflects each response’s scale (e.g., TOTBA 4.95, THOC 5.40, PIRE 3.50, PIBA 3.06, PIST 2.84, LALA 2.60, ABBA 1.54, PICEA 4.00). Patterns suggest little systematic bias at low–moderate values, with mild under-prediction in the extreme upper tail for some species (notably THOC and LALA), consistent with limited high-BA samples and the residual distributions (Figure 9). Overall, these diagnostics visually corroborate the quantitative ranking from Figure A3, Figure A4 and Figure A5, Appendix C. Nonnegativity clipping (ŷ = max(0, ŷ)) affected only a small number of negative predictions and had negligible impact on RMSE and R2. Notably, RF-xPLS produced no negative predictions (0% in all responses), so clipping had no effect for that method.
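The clipping step and its effect on error can be illustrated as follows. The values are hypothetical and the helper names are not taken from the authors' code.

```python
import math

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def clip_nonnegative(pred):
    """Apply y_hat = max(0, y_hat) and report the share of negative predictions."""
    clipped = [max(0.0, p) for p in pred]
    share_negative = sum(p < 0 for p in pred) / len(pred)
    return clipped, share_negative

# Hypothetical species-level basal area (m2 ha-1) with one negative prediction.
obs = [0.0, 1.2, 0.0, 4.5, 9.8]
pred = [-0.4, 1.0, 0.3, 4.9, 9.1]

clipped, share_negative = clip_nonnegative(pred)
rmse_raw = rmse(obs, pred)
rmse_clipped = rmse(obs, clipped)
```

Since basal area cannot be negative, clipping moves each affected prediction closer to any valid observation, so RMSE can only stay the same or decrease.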

3.6. Selected Predictors by Recommended Model

Guided by the cross-validated and bootstrapped comparisons (Figure 7, Figure 8 and Figure 9), we adopt RF-xPLS as the recommended model for BA. The final subset contains 27 RS predictors spanning Sentinel-2 red-edge/NIR/SWIR bands, multi-season vegetation indices, one LiDAR-derived structural metric, and late-season Landsat-9 features. Below (Table 5), we list the retained signals only; detailed ecological interpretation is deferred to Section 4. The predictors selected by each of the four models are presented in Figure A6, Appendix D.

4. Discussion

Across four subset-selection pipelines operating on an identical predictor library, RF-xPLS delivered the most favorable accuracy–parsimony trade-off. RF-xPLS attained the lowest pooled errors with the fewest predictors (27), showed consistently higher or competitive R2 across responses (Figure 6, Figure 7 and Figure 8), and produced compact, near-zero-centered residuals (Figure 9). Observation–prediction panels (Figure 10) confirmed strong agreement with minimal bias over most of the response range, with only mild under-prediction at the extreme upper tail where field samples are sparse. High-basal-area conditions are less common within our field plot network: of 141 plots, 107 (75.9%) have TOTBA < 40 m2·ha−1, 27 (19.1%) are 40–60 m2·ha−1, and 7 (5.0%) exceed 60 m2·ha−1, a level that is common among THOC stands across the KRD. This sampling distribution likely contributes to the mild upper-tail under-prediction; expanding targeted, stratified sampling in high-stocking stands would further strengthen model support for extreme TOTBA values.
Methodologically, nonparametric bootstrap resampling (B = 500) provides sampling-robust uncertainty for RMSE and R2 [78], while BH–FDR control guards against multi-response multiple-testing inflation; together, these indicate stability rather than noise-driven differences. Overall, the results point to a compact, transferable subset that blends SWIR moisture/chemistry, red-edge/NIR structure, seasonal greenness/phenology, one LiDAR vertical-structure metric, and late-season Landsat-9 checks (Table 5). A supplementary sensor-ablation analysis (Appendix E: Table A2) shows that Sentinel-2 data explain most of the predictive skill (S2-only: RMSE = 3.70, R2 = 0.84), while the full multi-sensor model performs best overall (S2 + L9 + LiDAR: RMSE = 3.48, R2 = 0.88), indicating modest but consistent gains from combining Landsat-9 and LiDAR data with Sentinel-2 data.
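For readers unfamiliar with BH–FDR control, the Benjamini–Hochberg step-up rule can be sketched in a few lines. The p-values are hypothetical, and the level q = 0.05 is an assumption because the level used is not stated in this section.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under Benjamini-Hochberg false discovery rate control at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max (step-up rule).
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

# Hypothetical p-values for five method-species correlations.
pvals = [0.3, 0.029, 0.001, 0.038, 0.025]
reject = benjamini_hochberg(pvals)
```

Here the p-value 0.025 fails its own rank threshold (0.02) but is still rejected, because larger p-values further down the ordering pass theirs; this is what distinguishes the step-up rule from a per-test cutoff.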
Beyond the methodological comparison, it is important to consider when the ensemble of models meets the mapping objectives and where it struggles. For total basal area (TOTBA) and some crown-fire conifers (e.g., PIRE), cross-validated skill was highest, with relatively tight confidence intervals and well-behaved residuals (Figure 6, Figure 7, Figure 8 and Figure 9). These results suggest that our approach should be reliable for delineating stands with high overall stocking and dense, tall conifer canopies that are most relevant for sustained crown fire. PIBA and PIST showed intermediate performance, but still with correlations and R2 values sufficient to distinguish low-, moderate-, and high-BA conditions. In practical terms, the RF-xPLS model appears well suited for wall-to-wall mapping of TOTBA and dominant, tall conifers—core ingredients for identifying potential crown-fire “source” stands in the KRD, consistent with previous fuel-mapping work that leveraged remotely sensed stand structure and fuel layers to quantify crown-fire hazard [79].
By contrast, understory ladder-fuel species and structurally subtle species proved more challenging. LALA and ABBA consistently occupied the lower end of the skill spectrum in cross-validated diagnostics (Figure 6), with broader residual distributions (Figure 9). Part of this difficulty likely reflects the underlying data structure rather than model failure. ABBA spans a relatively small range in our plots and does not form ABBA-dominated stands; instead, it typically appears as a subcanopy component in various species mixtures in this region [12]. Thus, models are tasked with learning subtle fluctuations around low ABBA values rather than a strong gradient from absence to dominance, which naturally limits achievable R2. For LALA, high predicted values tended to coincide with stands where total BA was comparatively low, representing sparse lowland conifer conditions rather than dense upland forests. These low-stocking, often heterogeneous stands provide limited dynamic range in both TOTBA and LALA BA, constraining model contrast even when the species spectral signal is present. Comparable patterns, where proportional or species-level BA maps remain most accurate for dominant species and show more modest R2 for minor components, have also been reported in regional species mapping studies that combine structural and spectral remote-sensing covariates [80].
These patterns also highlight broader challenges in mapping ladder fuel species that reside partly or primarily in the understory. Species such as PICEA, ABBA, and THOC can occur as shade-tolerant subcanopy or mid-story conifers with low branching architecture that effectively act as ladder fuels. From a remote-sensing perspective, both the spectral- and LiDAR-based predictors used here are biased toward the upper canopy: Sentinel-2 and Landsat-9 sensor bands integrate reflectance from sunlit crowns, while our single LiDAR-derived metric (HQUAD) emphasizes the contribution of taller returns. In vertically complex stands, vigorous overstory pines or spruces can obscure the signal from subcanopy ABBA or THOC. The residual analyses (Figure 9) and mild under-prediction at the upper tail for several species (Figure 10) are consistent with this structural occlusion: when ladder fuel species accumulate appreciable BA beneath an already tall canopy, their incremental effect on top-of-canopy reflectance and quadratic mean height may be muted. As a result, species-specific BA maps for ABBA, PICEA, and THOC should be interpreted as relative intensity surfaces rather than precise inventories—particularly where the species occupies a subcanopy niche. Similar limitations, in which airborne LiDAR sensor data captures tall overstory structure accurately but underestimates density in shorter trees and understory layers, have been widely documented [81].
Despite these limitations, the ensemble still shows meaningful efficacy for the intended fuel application. First, the strong performance for TOTBA and tall crown-fire conifers (notably PIRE and, to a lesser degree, PIBA and PIST) supports using the RF-xPLS outputs to delineate stands with high crown-fire potential. Second, even where species-level BA skill is more modest (e.g., ABBA, LALA), the maps capture broad spatial patterns and relative hotspots that can be combined with structural metrics (e.g., HQUAD, canopy height thresholds) to define operational ladder fuel classes. For example, areas with moderate to high predicted ABBA overlaid on tall-canopy, high-TOTBA stands are plausible candidates for elevated ladder fuel hazard, even if the absolute BA estimates carry more uncertainty. Conversely, LALA-dominated areas that coincide with low TOTBA and lower canopy heights are likely to be sparse lowland conifer types with a different fire behavior profile—more relevant for surface and transition fire than for sustained crown fire. In this sense, the ensemble of models is “fit for purpose” as a decision-support tool: it provides robust gradients in key fuel variables while clearly signaling where additional field sampling or more specialized sensors (e.g., higher density LiDAR postings or understory-focused metrics) would be needed for finer-grained ladder fuel quantification. This emphasis on canopy fuels and ladder fuel structure parallels the variables highlighted in crown-fire behavior models—especially canopy bulk density and canopy base height—and in recent work deriving such canopy fuel attributes from Sentinel-2A sensor data and related remote-sensing products for crown-fire hazard assessment [82].
The retained features are mechanistically coherent for BA, which integrates stem density and size, and is often strongly correlated with canopy characteristics such as leaf area index and crown closure [83]. The SWIR bands and NDII-type indices (here, NDII11) are sensitive to canopy water and dry-matter chemistry (cellulose–lignin) as shown by modern leaf-to-canopy spectroscopy work and recent reviews [84,85], with NDII now widely treated alongside NDMI/NDWI as a canopy water-content proxy [86,87,88]. Seasonal phenology timing likely contributes to why March and August SWIR-based predictors were retained. With respect to March phenology, leaf-off conditions for deciduous forest components increase visibility of and sensitivity to green understory coniferous components, which have distinct NIR and SWIR reflectance characteristics compared to non-photosynthetic materials (standing or forest floor). In August, forest stands are near peak leaf area and may experience stronger moisture limitation. Thus, August SWIR-based predictors can add contrast among low-, moderate-, and high-stocking conditions, supporting discrimination toward the upper end of structural gradients. Red-edge/NIR features capture chlorophyll concentration and internal leaf/canopy structure—signals repeatedly shown to improve biomass/BA retrievals, particularly where 705–740 nm sensitivity is available [89,90]. Greenness indices (SAVI, EVI-family, TVI, NDVI) summarize photosynthetic capacity while mitigating soil/illumination effects [91,92]. Multi-season sampling (Mar./May/Jun./Sep./Oct.) embeds phenological amplitude/timing that often track stand vigor and composition.
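The index families named above have simple closed forms, sketched below with hypothetical surface reflectances. The band pairings (for example, a NIR band with Sentinel-2 B11 for NDII11) and the SAVI soil factor L = 0.5 are assumptions; the authors' exact definitions are given in their Section 2.1.1.

```python
def ndii(nir, swir1):
    """Normalized Difference Infrared Index: (NIR - SWIR1) / (NIR + SWIR1)."""
    return (nir - swir1) / (nir + swir1)

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    return (nir - red) / (nir + red)

def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index with soil brightness factor L."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)

# Hypothetical reflectances for a closed conifer canopy (unitless, 0-1).
nir, swir1, red = 0.30, 0.15, 0.04

indices = {
    "NDII": ndii(nir, swir1),
    "NDVI": ndvi(nir, red),
    "SAVI": savi(nir, red),
}
```

Lower SWIR reflectance relative to NIR raises NDII, which is why the index responds to canopy water content.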
A relevant comparison is [93], who estimated multiple foliar traits (including equivalent water thickness and leaf mass per area) in spruce–fir stands using Sentinel-2 satellite sensor data and site variables. Their best-performing models emphasized Sentinel-2 red-edge information and site controls (e.g., depth-to-water-table), and they report that SWIR reflectance was not consistently related to canopy water or dry-matter trait proxies in that context. In contrast, our response variables are structural (total and species-specific basal area), and the SWIR predictors (e.g., SWIR bands and SWIR-based indices) can contribute indirectly by capturing canopy closure, shadow/background exposure, and moisture-related stand conditions at the pixel scale—especially when interpreted jointly with LiDAR-derived structure. This difference in target variable (foliar chemistry/traits vs. stand structure) provides a plausible explanation for why SWIR predictors can appear more influential for basal-area mapping (see [12,19,42]) than for leaf-level trait estimation.
A single LiDAR height-distribution summary (HQUAD) complements spectra by injecting direct vertical structure information, echoing a broad body of work on LiDAR’s value for characterizing forest structural attributes and its complementarity with optical RS data [94,95]. Ecologically, HQUAD emphasizes taller canopy elements because the quadratic weighting gives disproportionate influence to higher returns, making it sensitive to overstory dominance and canopy stratification. As a result, in stands dominated by tall, relatively uniform overstory pines such as Pinus resinosa, HQUAD tends to track canopy stature and structural dominance closely, whereas in mixed stands where Abies balsamea is often present as a subcanopy or midstory component, HQUAD is driven primarily by the overstory and may capture fir-related variation more indirectly. This difference helps explain why a single height profile metric can be highly informative for stand-level BA patterns while still having varying strength across species depending on their typical canopy position. It is important to note that the prominence of the LiDAR quadratic mean height (HQUAD) is strongly supported by prior work. A recent synthesis identifies HQUAD among the most effective single LiDAR predictors for biomass/structure across forest types [96]. Foundational work likewise reported tight links between LiDAR-derived height summaries and stand structure/biomass, with small-footprint LiDAR sensor data capturing crown-scale variability that correlates with BA and aboveground biomass [97]. Some authors [98] explicitly highlight quadratic mean height as a strong explanatory variable, explaining how the squaring step accentuates the contribution of tall stems in high-BA stands. Empirical mapping studies that fuse LiDAR with multispectral time series further show height metrics—including HQUAD—consistently rank among top predictors and improve biomass/BA estimation and upscaling [99,100]. 
These lines of evidence explain why, from 175 candidates, our model retained a single LiDAR-derived feature—HQUAD—to inject robust vertical structure information, while the remaining spectral predictors provide complementary chemistry–moisture and phenology signals.
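The effect of the squaring step in a quadratic mean height can be seen in a small sketch. The return heights are hypothetical, and how HQUAD was actually derived (which returns, any height cutoff) is not specified here, so this only illustrates the general formula.

```python
import math

def mean_height(heights):
    return sum(heights) / len(heights)

def quadratic_mean_height(heights):
    """Quadratic mean: sqrt(mean(h^2)); tall returns get extra weight."""
    return math.sqrt(sum(h * h for h in heights) / len(heights))

# Hypothetical LiDAR return heights (m) for two stand types.
uniform_overstory = [18.0, 19.0, 20.0, 21.0, 22.0]  # tall, even canopy
layered_stand = [4.0, 5.0, 6.0, 24.0, 26.0]         # subcanopy plus tall overstory

gap_uniform = quadratic_mean_height(uniform_overstory) - mean_height(uniform_overstory)
gap_layered = quadratic_mean_height(layered_stand) - mean_height(layered_stand)
```

The quadratic mean is never below the arithmetic mean, and the gap widens as the height distribution becomes more spread out, so in the layered stand the metric is pulled toward the overstory returns.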
Late-season Landsat-9 sensor features (B1 coastal/blue, B4 red, and the September short-wave infrared-to-visible ratio, SVR) provide cross-sensor checks on VIS–SWIR contrast and aerosol/illumination sensitivity when sun–sensor geometry is challenging [101]. This multi-mechanism blend is consistent with the broader RS literature emphasizing the complementarity of LiDAR-based structure with multispectral seasonality for stand attributes (e.g., [28,32]). In our case, a single LiDAR-based predictor plus carefully chosen multispectral features were sufficient, suggesting that for BA, unlike fine-scale fuel strata, broad canopy stature and seasonal spectral dynamics carry much of the predictive signal, a finding that aligns with the principle of structural saturation in optical data being resolved by LiDAR [102].
Although our models were developed in Minnesota’s hemiboreal forests, the RF-xPLS workflow is designed to be portable because it selects a compact set of predictors that represent broadly relevant controls on stand structure (SWIR moisture/dry-matter sensitivity, red-edge/NIR canopy structure, and LiDAR-derived height distribution). While our models demonstrate high stability and performance across the ~2908 km2 KRD, their direct transferability to other conifer-dominated forests will require regional recalibration to account for local variations in species composition, disturbance history, and phenological timing. We expect comparable performance in other temperate–boreal regions where BA covaries with canopy closure, height distributions, and seasonal spectral dynamics; however, accuracy may shift with differences in species composition (deciduous–conifer mixing vs. conifer dominance), disturbance history, site moisture gradients, and phenology timing. For transfer to a new region, we recommend (i) harmonizing predictor definitions across available sensors (Sentinel-2 and/or Landsat-8/9), (ii) aligning seasonal image windows to local phenology rather than fixed calendar months, and (iii) performing regional recalibration and external validation using a modest but representative set of plot samples, with emphasis on underrepresented structural extremes. Where airborne LiDAR sensor data are unavailable, one may substitute analogous canopy height/structure layers (e.g., spaceborne LiDAR or regional canopy height products) with an expected trade-off in vertical-detail sensitivity.
Our comparative ranking mirrors prior findings that Random Forest frequently outperforms linear baselines for forest structural variables derived from LiDAR + multispectral inputs, including species-level BA (e.g., [49]), while PLS remains competitive and interpretable [18,70]. Metaheuristic selection (e.g., GA-xPLS) is increasingly reported to improve parsimony without sacrificing accuracy in high-dimensional, collinear settings [55], a pattern we observe with GA-xPLS close behind RF-xPLS for several species. The superiority of SWIR-based and red-edge-aware subsets is also consistent with studies highlighting water/chemistry sensitivity of SWIR and chlorophyll/structure sensitivity of red-edge for biomass/BA and canopy structure [31,36]. Our results extend this evidence by showing that a compact 27-variable set spanning these processes yields strong, stable performance across eight forest BA response variables in a hemiboreal mixed wood forest.

5. Conclusions

Across four selection frameworks applied to the same inputs, RF-xPLS emerged as the recommended approach for modeling total and species-level BA in a hemiboreal forest mosaic. RF-xPLS achieved the strongest overall skill with a parsimonious 27-variable subset (84.6% reduction from the initial 175-predictor set) and exhibited well-behaved residuals and tight bootstrap confidence intervals. The predictors that consistently mattered align with forest biophysics: SWIR bands/NDII11 for canopy water and dry-matter chemistry, red-edge/NIR features and greenness indices for chlorophyll and crown density, a single LiDAR metric (HQUAD) to inject vertical structure, and cross-sensor late-season Landsat-9 features that provide an independent end-of-growing-season snapshot of visible–SWIR reflectance, helping the model remain consistent when canopy greenness declines and moisture-related SWIR responses strengthen. Practically, this subset and model offer a clear path to wall-to-wall BA mapping wherever similar RS inputs can be accessed, while highlighting where additional field plots—especially in very high-BA stands—could further reduce tail bias and strengthen external validation. Future work may (i) generate spatial BA surfaces using the selected RF-xPLS model, (ii) test transferability across years and neighboring districts, and (iii) examine how species composition and disturbance history modulate the identified spectral–structural signals. In practical terms, these maps can help managers locate dense fuel stands (crown and ladder), decide where fuel reduction or prescribed burning would be most useful, and evaluate how stand structure changes following management or disturbance. They also provide a clear picture of tree species distributions and stocking relevant to forest productivity, diversity, and related ecosystem services. 
Hence, we expect such technologies to enhance fire risk planning and sustainable forest management within the Kawishiwi Ranger District of the Superior National Forest and similar forests elsewhere.

Author Contributions

Conceptualization, N.S. and P.W.; methodology, N.S. and P.W.; software, N.S.; validation, N.S. and P.W.; formal analysis, N.S.; investigation, N.S. and P.W.; resources, P.W. and D.A.I.; data curation, N.S. and P.W.; writing—original draft preparation, N.S.; writing—review and editing, P.W., B.R.S., and D.A.I.; visualization, N.S.; supervision, P.W.; project administration, P.W., B.R.S., and D.A.I.; funding acquisition, P.W., B.R.S., and D.A.I. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the U.S. Department of Agriculture, Forest Service, under Grant No. 24-JV-11111137-108, and Grant No. GR-029426-00001. The findings and conclusions in this publication are those of the authors and should not be construed to represent any official USDA or U.S. Government determination or policy.

Data Availability Statement

WorldView commercial satellite data for licensed users are available at https://earthexplorer.usgs.gov/ (accessed on 28 December 2025). Landsat data are freely available from the U.S. Geological Survey Earth Explorer at https://earthexplorer.usgs.gov/ (accessed on 28 December 2025). Sentinel-2 data are freely available from the Registry of Open Data on AWS at https://registry.opendata.aws/sentinel-2/ (accessed on 5 May 2025). Lidar data are freely available from the U.S. Geological Survey (USGS) 3D Elevation Program at https://apps.nationalmap.gov/downloader/ (accessed on 28 December 2025). Requests for the codes and scripts used in this study may be directed to the first author, Nasrin Salehnia (salehnia@iastate.edu), or the corresponding author, Peter Wolter (ptwolter@iastate.edu).

Acknowledgments

This work is supported by the U.S. Department of Agriculture, Forest Service. The findings and conclusions in this publication are those of the authors and should not be construed to represent any official USDA or U.S. Government determination or policy.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Table A1. Hyperparameter settings and implementation details for the three hybrid subset-selection pipelines used in this study.
(A) RF-xPLS: Iterative Exclusion with Random Forests, Selecting Removals by Minimum Pooled OOB RMSE

| Item | Setting used | Implementation detail |
| --- | --- | --- |
| RF algorithm | Random Forest regression | MATLAB TreeBagger (Statistics and ML Toolbox) |
| Number of trees | NumTrees = 500 | Fixed for all responses |
| Minimum leaf size | MinLeaf = 5 | Fixed for all responses |
| Predictors per split (mtry) | mtry = round(sqrt(p_init)), capped as p shrinks | NumPredictK0 = round(sqrt(size(Xmat,2))); mtry = min(size(Xok,2), NumPredictK0) |
| OOB prediction | On | 'OOBPrediction', 'on' |
| Selection criterion | Minimum pooled OOB RMSE across responses | At each step, drop variable yielding lowest pooled OOB RMSE |
| Permutation importance | OOB permuted importance (diagnostic) | oobPermutedPredictorImportance(tb) (try/catch for compatibility) |

(B) GA-xPLS: Subset Search Under a Hard Size Cap, Using Repeated K-Fold CV Fitness

| Item | Setting used | Implementation detail |
| --- | --- | --- |
| GA representation | Binary mask (0/1 predictors) | PopulationType = 'doubleVector' with IntCon = 1:nVars, rounded to 0/1 |
| Population size | 350 | 'PopulationSize', 350 |
| Max generations | 800 | 'MaxGenerations', 800 |
| Stall generation limit | Inf (disabled) | 'StallGenLimit', inf |
| Selection function | Tournament selection | 'SelectionFcn', {@selectiontournament, 6} |
| Tournament size | 6 | tournSize = 6 |
| Crossover function | Two-point crossover | 'CrossoverFcn', @crossovertwopoint |
| Crossover fraction | 0.7 | 'CrossoverFraction', 0.70 |
| Mutation function | Uniform mutation | 'MutationFcn', {@mutationuniform, 0.25} |
| Mutation rate | 0.25 | mutationRate = 0.25 |
| Size penalty weight | λ = 0.10 | penSize = λ × (s/p) |
| Band-violation penalty | 0.5 | bandPenalty = 0.5 |
| Fitness CV scheme | Repeated K-fold CV: K = 10, repeats = 3 | useKFoldFitness = true; cvK = 10; cvRepeats = 3 |

(C) SVR-xPLS: Iterative Exclusion with SVR Using LOOCV, Fixed C, and Response-Specific ε

| Item | Setting used | Implementation detail |
| --- | --- | --- |
| SVR algorithm | Support Vector Regression | MATLAB fitrsvm (Statistics and ML Toolbox) |
| Kernel | linear or RBF (user choice) | KernelFunction = 'linear' or 'rbf' |
| BoxConstraint (C) | C = 1.0 | globalC = 1.0 used for all responses |
| Epsilon (ε) | ε = 0.1 × std(y) (response-specific) | respEpsilon = std(y_current) × 0.1 per response model |
| CV scheme during exclusion | LOOCV | Inner loop over observations: for cvIdx = 1:numObs |
| Selection criterion | Minimum pooled RMSE across responses | pooled RMSE = sqrt(mean(rmseVec.^2)) |
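Several of the Table A1 settings are small formulas that can be written out directly. The sketch below is a Python paraphrase of the MATLAB expressions in the table, not the authors' code. In particular, adding the size and band penalties to the CV RMSE, and scaling the band penalty by a count of violations, are assumptions about how the GA fitness is assembled.

```python
import math
import statistics

def mtry(p_init, p_current):
    """Predictors per split: round(sqrt(p_init)), capped as p shrinks."""
    return min(p_current, round(math.sqrt(p_init)))

def pooled_rmse(rmse_vec):
    """Pooled RMSE across responses: sqrt(mean(rmse^2))."""
    return math.sqrt(sum(r * r for r in rmse_vec) / len(rmse_vec))

def ga_fitness(cv_rmse, n_selected, n_total, lam=0.10,
               band_violations=0, band_penalty=0.5):
    """Penalized fitness (assumed additive form): CV RMSE plus a size
    penalty lam * (s / p) plus a penalty per band violation."""
    pen_size = lam * (n_selected / n_total)
    return cv_rmse + pen_size + band_penalty * band_violations

def svr_epsilon(y):
    """Response-specific epsilon: 0.1 x sample SD of the response."""
    return 0.1 * statistics.stdev(y)

# With the 175-predictor library, the initial mtry is 13.
initial_mtry = mtry(175, 175)
# Hypothetical per-response RMSE values pooled into one score.
example_pooled = pooled_rmse([3.0, 4.0])
```

With equal CV error, a smaller subset receives a lower (better) fitness, which is how the size penalty nudges the search toward parsimony.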

Appendix B

The x-axis labels correspond to the band predictors included for each month (Sentinel-2: B2–B4, B8 at 10 m; B5–B7, B8A, B11–B12 at 20 m; Landsat-9: B1–B7 in Sep. where present). Because values are z-scored, these boxplots compare distribution shape/spread and outliers across bands and months (not absolute reflectance magnitudes). See Section 2.1.1 (Predictor variables) for full definitions and derivation of the spectral bands/indices shown here.
Figure A1. Distributions of standardized spectral reflectance band predictors by month (Mar., May, Jun., Aug., Sep., Oct.). Each panel shows boxplots of z-scored band reflectance across the 141 field plots for the spectral-band predictors used in the candidate library, including Sentinel-2 MSI 10 m bands (B2–B4, B8) and the 20 m suite (B5–B7, B8A, B11–B12, plus any 20 m resampled versions of B2–B4 where included), and Landsat-9 OLI bands where available (B1–B7; September acquisition). Z-scores were computed per predictor across plots (unitless; mean 0, SD 1), so the y-axis indicates relative high/low reflectance compared with the plot mean for that band and month. Boxes span the interquartile range with medians; whiskers extend to 1.5× IQR; points denote outliers.
Figure A1. Distributions of standardized spectral reflectance band predictors by month, including Mar., May, Jun., Aug., Sep., Oct. Each panel shows boxplots of z-scored band reflectance across the 141 field plots for the spectral-band predictors used in the candidate library, including Sentinel-2 MSI 10 m bands (B2–B4, B8) and the 20 m suite (B5–B7, B8A, B11–B12; plus, any 20 m resampled versions of B2–B4 where included), and Landsat-9 OLI bands where available (B1–B7; September acquisition). Z-scores were computed per predictor across plots (unitless; mean 0, SD 1), so the y-axis indicates relative high/low reflectance compared with the plot mean for that band and month. Boxes span the interquartile range with medians; whiskers extend to 1.5× IQR; points denote outliers.
Remotesensing 18 00852 g0a1
All predictors in this figure were standardized across plots (z-score; z = (x − μ)/σ), so values are unitless and directly comparable across indices. Indices shown include ARVI, EVI7, EVI8, MSR, SAVI, NDII11, TVI, NDVI (with “10 m or 20 m” indicating native spatial resolution), plus MSI (moisture stress index), SVR (short wave infrared-to-visible ratio), and AI (Normalized Difference Autumn Index) where available. Variables prefixed with “L9_” are computed from Landsat-9 (Sep. panel), while the remaining month-tagged predictors are from Sentinel-2. See Section 2.1.1 (Predictor variables) for full definitions and derivation of the spectral bands/indices shown here.
Figure A2. Boxplots of z-scored vegetation indices and related spectral metrics derived primarily from Sentinel-2 (10–20 m) and, where noted, Landsat-9. Panels show monthly distributions for Mar., May, Jun., Aug., Sep., Oct. across the 141 plots. Boxes indicate the interquartile range (IQR) with median; whiskers extend to 1.5× IQR; points denote outliers.
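The standardization and boxplot conventions used in Figures A1 and A2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the choice of the sample standard deviation is an assumption, and the quartile method (Python's default "exclusive" interpolation) may differ from that of the plotting software used for the figures.

```python
import statistics


def zscore(values):
    """z = (x - mu) / sigma, computed per predictor across plots."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample SD (assumption)
    return [(v - mu) / sigma for v in values]


def boxplot_outliers(values, k=1.5):
    """Return points lying beyond the whisker limits Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical predictor values for ten plots.
raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0]
z = zscore(raw)
print(round(statistics.mean(z), 12), round(statistics.stdev(z), 12))
print(boxplot_outliers(raw))
```

After z-scoring, every predictor has mean 0 and SD 1, which is why the panels compare distribution shape and spread rather than absolute reflectance.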

Appendix C

Figure A3. Observed vs. predicted BA for eight responses using the xPLS model. Points are samples; the dashed line is 1:1. Panel headers report r, R² = 1 − SSE/SST, and RMSE; predictions are clipped to nonnegative values.
Figure A4. Observed vs. predicted BA for eight responses using the GA-xPLS model. Points are samples; the dashed line is 1:1. Panel headers report r, R² = 1 − SSE/SST, and RMSE; predictions are clipped to nonnegative values.
Figure A5. Observed vs. predicted BA for eight responses using the SVR-xPLS model. Points are samples; the dashed line is 1:1. Panel headers report r, R² = 1 − SSE/SST, and RMSE; predictions are clipped to nonnegative values.
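The three statistics reported in the panel headers of Figures A3–A5 can be reproduced for any response with the following Python sketch. The function name and the example values are hypothetical; the only elements taken from the captions are the clipping of predictions to nonnegative values and the definition R² = 1 − SSE/SST.

```python
import math


def panel_metrics(observed, predicted):
    """Return Pearson r, R2 = 1 - SSE/SST, and RMSE after clipping predictions at zero."""
    pred = [max(0.0, p) for p in predicted]  # nonnegativity clipping
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(pred) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, pred))
    sst = sum((o - mean_o) ** 2 for o in observed)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, pred))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    denom = math.sqrt(sst * var_p)
    r = cov / denom if denom > 0 else float("nan")
    return {"r": r, "R2": 1.0 - sse / sst, "RMSE": math.sqrt(sse / n)}


# Hypothetical basal-area values (m^2/ha); the negative prediction is clipped to 0.
print(panel_metrics([0.0, 2.0, 4.0, 6.0], [-1.0, 2.5, 3.5, 6.0]))
```

Note that R² defined this way can be negative for a poorly fitting model and is not, in general, equal to the square of r.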

Appendix D

Predictor labels follow the naming convention in Section 2.1.1 (Predictor variables). “B#” denotes a multispectral band (Sentinel-2 unless prefixed by “L9”), month tags (Mar./May/Jun./Aug./Sep./Oct.) indicate seasonal image timing, and “10 m/20 m” indicates spatial resolution after preprocessing/resampling. Derived spectral indices (e.g., NDVI, EVI, SAVI, TVI, NDII11, MSI, AI, SVR) and LiDAR height distribution metrics (e.g., HQUAD and other canopy height summaries) are included as separate predictors.
Figure A6. Predictors selected by each method. Binary selection map of predictors retained by each subset selection pipeline (xPLS, GA-xPLS, RF-xPLS, and SVR-xPLS). Rows list candidate predictors and columns indicate methods; blue cells (value = 1) denote predictors included in the final model, and white cells (value = 0) denote predictors not selected.
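A binary selection map of the kind shown in Figure A6 is straightforward to assemble from the per-method predictor lists. The Python sketch below is illustrative only: the function name is hypothetical, and the predictor labels and selections are invented placeholders that follow the naming convention described above, not the subsets reported in the study.

```python
def selection_map(candidates, selected_by_method):
    """Return {predictor: [0/1 per method]} with methods in insertion order."""
    methods = list(selected_by_method)
    return {
        name: [1 if name in selected_by_method[m] else 0 for m in methods]
        for name in candidates
    }


# Placeholder predictors and selections (NOT the published results).
candidates = ["B8_Jun_10m", "NDII11_Sep_20m", "HQUAD"]
selected = {
    "xPLS": {"B8_Jun_10m", "NDII11_Sep_20m", "HQUAD"},
    "GA-xPLS": {"NDII11_Sep_20m"},
    "RF-xPLS": {"NDII11_Sep_20m", "HQUAD"},
    "SVR-xPLS": {"B8_Jun_10m"},
}
for name, row in selection_map(candidates, selected).items():
    print(name, row)
```

Summing each row gives the number of methods that retained a predictor, which is a quick way to identify predictors selected consistently across pipelines.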

Appendix E

Table A2 summarizes a sensor ablation analysis of the RF-xPLS workflow, in which we refit the Random Forest model (RF-xPLS) using only predictors from specific sensor sources to quantify each sensor’s contribution to BA estimation. Scenario indicates the predictor set used (Sentinel-2 only, Landsat-9 only, LiDAR only, or their combinations). nPred is the total number of predictors included in that scenario, while nS2, nL9, and nLiDAR are the counts of predictors contributed by Sentinel-2, Landsat-9, and LiDAR, respectively (so nPred = nS2 + nL9 + nLiDAR). Performance is reported as RMSE_pooled_train, the pooled root-mean-square error computed across all eight responses using refit (train–fit) predictions, and R2_pooled_train, the corresponding pooled coefficient of determination computed as R² = 1 − SSE/SST using pooled sums over all responses. Predictions were clipped to nonnegative values before computing metrics to enforce physically meaningful basal area estimates.
Table A2. Sensor ablation results for the RF-xPLS framework, reporting pooled train–fit RMSE and pooled R² (= 1 − SSE/SST) for models fit with Sentinel-2 only, Landsat-9 only, LiDAR only, and their combinations, using the 27-predictor library.
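Scenario | nPred | nS2 | nL9 | nLiDAR | RMSE_pooled_train | R2_pooled_train
S2_only | 23 | 23 | 0 | 0 | 3.70 | 0.84
L9_only | 3 | 0 | 3 | 0 | 6.13 | 0.59
LiDAR_only | 1 | 0 | 0 | 1 | 7.27 | 0.42
Optical_S2 + L9 | 26 | 23 | 3 | 0 | 3.55 | 0.86
S2 + LiDAR | 24 | 23 | 0 | 1 | 3.56 | 0.86
L9 + LiDAR | 4 | 0 | 3 | 1 | 5.59 | 0.66
All_S2 + L9 + LiDAR | 27 | 23 | 3 | 1 | 3.48 | 0.88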
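The bookkeeping behind Table A2 can be sketched in Python as follows. The source tags per predictor reproduce the counts in the table (23 Sentinel-2, 3 Landsat-9, 1 LiDAR); everything else is an assumption made for illustration, including the function names and the reading of "pooled sums" as SSE and SST accumulated over responses with each response centered on its own mean. The model refit itself is not shown.

```python
def scenario_counts(predictor_sources, keep):
    """Count predictors per sensor for a scenario that keeps the sources in `keep`."""
    counts = {}
    for src in ("S2", "L9", "LiDAR"):
        n = sum(1 for s in predictor_sources if s == src)
        counts["n" + src] = n if src in keep else 0
    counts["nPred"] = counts["nS2"] + counts["nL9"] + counts["nLiDAR"]
    return counts


def pooled_r2(observed_by_response, predicted_by_response):
    """R2 = 1 - SSE/SST with SSE and SST summed over all responses."""
    sse = 0.0
    sst = 0.0
    for obs, pred in zip(observed_by_response, predicted_by_response):
        clipped = [max(0.0, p) for p in pred]  # nonnegative predictions
        mean_o = sum(obs) / len(obs)
        sse += sum((o - p) ** 2 for o, p in zip(obs, clipped))
        sst += sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / sst


# Source tags for the 27-predictor library, matching the counts in Table A2.
sources = ["S2"] * 23 + ["L9"] * 3 + ["LiDAR"]
print(scenario_counts(sources, {"S2", "LiDAR"}))

# Hypothetical observed/predicted values for two responses.
print(pooled_r2([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]],
                [[1.0, 2.0, 3.0], [10.0, 20.0, 33.0]]))
```

Because the sums are pooled, responses with larger variance (such as total BA) dominate the pooled R², so the pooled value should be read alongside the per-response metrics.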

References

  1. Heinselman, M.L. Fire in the virgin forests of the Boundary Waters Canoe Area, Minnesota. Quat. Res. 1973, 3, 329–382.
  2. Ohmann, L.F.; Grigal, D.F. Early revegetation and nutrient dynamics following the 1971 Little Sioux Forest fire in northeastern Minnesota. For. Sci. 1979, 25, a0001–z0001.
  3. Frelich, L.E.; Reich, P.B. Spatial patterns and succession in a Minnesota southern-boreal forest. Ecol. Monogr. 1995, 65, 325–346.
  4. Ali, A.A.; Asselin, H.; Larouche, A.C.; Bergeron, Y.; Carcaillet, C.; Richard, P.J. Changes in fire regime explain the Holocene rise and fall of Abies balsamea in the coniferous forests of western Québec, Canada. Holocene 2008, 18, 693–703.
  5. Scheller, R.M.; Mladenoff, D.J.; Crow, T.R.; Sickley, T.A. Simulating the effects of fire reintroduction versus continued fire absence on forest composition and landscape structure in the Boundary Waters Canoe Area, northern Minnesota, USA. Ecosystems 2005, 8, 396–411.
  6. Brown, J.K.; Smith, J.K. Wildland Fire in Ecosystems: Effects of Fire on Flora; General Technical Report RMRS-GTR-42-vol. 2; US Department of Agriculture, Forest Service, Rocky Mountain Research Station: Ogden, UT, USA, 2000; 257p.
  7. de Lafontaine, G.; Payette, S. Long-term fire and forest history of subalpine balsam fir (Abies balsamea) and white spruce (Picea glauca) stands in eastern Canada inferred from soil charcoal analysis. Holocene 2011, 22, 191–201.
  8. Dickinson, M.B.; Johnson, E.A.; Artiaga, R. Fire spread probabilities for experimental beds composed of mixedwood boreal forest fuels. Can. J. For. Res. 2013, 43, 321–330.
  9. Kreider, M.R.; Higuera, P.E.; Parks, S.A.; Rice, W.L.; White, N.; Larson, A.J. Fire suppression makes wildfires more severe and accentuates impacts of climate change and fuel accumulation. Nat. Commun. 2024, 15, 2412.
  10. Engelstad, P.S.; Falkowski, M.; Wolter, P.; Poznanovic, A.; Johnson, P. Estimating Canopy Fuel Attributes from Low-Density LiDAR. Fire 2019, 2, 38.
  11. Szpakowski, D.M.; Jensen, J.L.R. A review of the applications of remote sensing in fire ecology. Remote Sens. 2019, 11, 2638.
  12. Wolter, P.T.; Olbrich, J.J.; Johnson, P.J. Modeling sub-boreal forest canopy bulk density in Minnesota, USA, using synthetic aperture radar and optical satellite sensor data. Fire Ecol. 2021, 17, 26.
  13. Wear, D.N.; Wibbenmeyer, M.; Joiner, E. Enhancing the economic feasibility of fuel treatments: Market and policy pathways for US Federal Lands. For. Policy Econ. 2024, 169, 103365.
  14. Pierson, D.; Andika, R.; Brewen, J.; Clark, N.; Hardy, M.C.; McCollum, D.; McCormick, F.H.; Morisette, J.; Nicosia, T.; Page-Dumroese, D.; et al. Beyond the Basics: A Perspective on Barriers and Opportunities for Scaling Up Biochar Production from Forest Slash. Biochar 2024, 6, 1.
  15. Puettmann, M.; Sahoo, K.; Wilson, K.; O’Neil, E. Life cycle assessment of biochar produced from forest residues using portable systems. J. Clean. Prod. 2020, 250, 119564.
  16. Zhao, T.; Li, G.; Zhang, S.; Yang, H.; Ni, W.; Shao, A. Enhancement effect of biochar on the immobilization of heavy metals and anions in MSWI fly ash/bottom ash–coal fly ash-based cementitious materials: Risk assessment model and mechanisms. J. Hazard. Mater. 2025, 497, 139615.
  17. Trapero, J.R.; Alcazar-Ruiz, A.; Dorado, F.; Sanchez-Silva, L. Biochar price forecasting: A novel methodology for enhancing market stability and economic viability. J. Environ. Manag. 2025, 377, 124681.
  18. Wolter, P.T.; Townsend, P.A.; Sturtevant, B.R. Estimation of forest structural parameters using 5 and 10 meter SPOT-5 satellite data. Remote Sens. Environ. 2009, 113, 2019–2036.
  19. Wolter, P.T.; Townsend, P.A. Multi-sensor data fusion for estimating forest species composition and abundance in northern Minnesota. Remote Sens. Environ. 2011, 115, 671–691.
  20. Hovind, H.J.; Rieck, C.E. Basal Area and Point-Sampling: Interpretation and Application (No. 23); Department of Natural Resources: Madison, WI, USA, 1961.
  21. O’Brien, R. Comprehensive Inventory of Utah’s Forest Resources, 1993; US Department of Agriculture, Forest Service, Rocky Mountain Research Station: Ogden, UT, USA, 1999.
  22. Vanclay, J.K.; Sands, P.J. Calibrating the self-thinning frontier. For. Ecol. Manag. 2009, 259, 81–85.
  23. Bhattarai, R.; Rahimzadeh-Bajgiran, P.; Weiskittel, A.; Meneghini, A.; MacLean, D.A. Spruce budworm tree host species distribution and abundance mapping using multi-temporal Sentinel-1 and Sentinel-2 satellite imagery. ISPRS J. Photogramm. Remote Sens. 2021, 172, 28–40.
  24. Balderas Torres, A.; Lovett, J.C. Using basal area to estimate aboveground carbon stocks in forests: La Primavera Biosphere’s Reserve, Mexico. Forestry 2013, 86, 267–281.
  25. Fischer, S.M.; Wang, X.; Huth, A. Distinguishing mature and immature trees allows estimating forest carbon uptake from stand structure. Biogeosciences 2024, 21, 3305–3319.
  26. Pretzsch, H. Forest Dynamics, Growth and Yield; Springer: Berlin/Heidelberg, Germany, 2009.
  27. Yanai, R.D.; Young, A.; Campbell, J.L.; Westfall, J.A.; Barnett, C.J.; Dillon, G.A.; Green, M.B.; Woodall, C.W. Measurement uncertainty in a national forest inventory: Results from the northern region of the USA. Can. J. For. Res. 2023, 53, 163–177.
  28. Duncanson, L.; Kellner, J.R.; Armston, J.; Dubayah, R.; Minor, D.M.; Hancock, S.; Healey, S.P.; Patterson, P.L.; Saarela, S.; Marselis, S.; et al. Aboveground biomass density models for NASA’s Global Ecosystem Dynamics Investigation (GEDI) lidar mission. Remote Sens. Environ. 2022, 270, 112845.
  29. Lahssini, K.; Teste, F.; Dayal, K.R.; Durrieu, S.; Ienco, D.; Monnet, J.M. Combining LiDAR Metrics and Sentinel-2 Imagery to Estimate Basal Area and Wood Volume in Complex Forest Environment via Neural Networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4337–4348.
  30. Wulder, M.A.; White, J.C.; Nelson, R.F.; Næsset, E.; Ørka, H.O.; Coops, N.C.; Hilker, T.; Bater, C.W.; Gobakken, T. Lidar sampling for large-area forest characterization: A review. Remote Sens. Environ. 2012, 121, 196–209.
  31. Féret, J.-B.; Asner, G.P. Mapping tropical forest canopy diversity using high-fidelity imaging spectroscopy. Ecol. Appl. 2014, 24, 1289–1296.
  32. White, J.C.; Coops, N.C.; Wulder, M.A.; Vastaranta, M.; Hilker, T.; Tompalski, P. Remote Sensing Technologies for Enhancing Forest Inventories: A Review. Can. J. Remote Sens. 2016, 42, 619–641.
  33. Allouis, T.; Durrieu, S.; Vega, C.; Couteron, P. Stem volume and above-ground biomass estimation of individual pine trees from lidar data: Contribution of full-waveform signals. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 924–934.
  34. Luo, S.; Wang, C.; Xi, X.; Pan, F.; Peng, D.; Zou, J.; Nie, S.; Qin, H. Fusion of airborne LiDAR data and hyperspectral imagery for aboveground and belowground forest biomass estimation. Ecol. Indic. 2017, 73, 378–387.
  35. Zhang, W.; Wan, P.; Wang, T.; Cai, S.; Chen, Y.; Jin, X.; Yan, G. A Novel Approach for the Detection of Standing Tree Stems from Plot-Level Terrestrial Laser Scanning Data. Remote Sens. 2019, 11, 211.
  36. Berra, E.F.; Gaulton, R. Remote sensing of temperate and boreal forest phenology: A review of progress, challenges and opportunities in the intercomparison of in-situ and satellite phenological metrics. For. Ecol. Manag. 2021, 480, 118663.
  37. Gong, Z.; Ge, W.; Guo, J.; Liu, J. Satellite remote sensing of vegetation phenology: Progress, challenges, and opportunities. ISPRS J. Photogramm. Remote Sens. 2024, 217, 149–164.
  38. Wolter, P.T.; Mladenoff, D.J.; Host, G.; Crow, T.R. Improved forest classification in the northern Lake States using multi-temporal Landsat imagery. Photogramm. Eng. Remote Sens. 1995, 61, 1129–1143.
  39. Brown, S.; Narine, L.L.; Gilbert, J. Using airborne lidar, multispectral imagery, and field inventory data to estimate basal area, volume, and aboveground biomass in heterogeneous mixed species forests: A case study in southern Alabama. Remote Sens. 2022, 14, 2708.
  40. Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: Berlin/Heidelberg, Germany, 2013.
  41. Maltamo, M.; Næsset, E.; Vauhkonen, J. Forestry Applications of Airborne Laser Scanning: Concepts and Case Studies; Springer Netherlands: Dordrecht, The Netherlands, 2014.
  42. Wolter, P.; Townsend, P.A.; Sturtevant, B.R.; Kingdon, C.C. Remote sensing of the distribution and abundance of host species for spruce budworm in Northern Minnesota and Ontario. Remote Sens. Environ. 2008, 112, 3971–3982.
  43. Chong, I.G.; Jun, C.H. Performance of some variable selection methods when multicollinearity is present. Chemom. Intell. Lab. Syst. 2005, 78, 103–112.
  44. Spiegelman, C.H.; McShane, M.J.; Goetz, M.J.; Motamedi, M.; Yue, Q.L.; Coté, G.L. Theoretical justification of wavelength selection in PLS calibration: Development of a new algorithm. Anal. Chem. 1998, 70, 35–44.
  45. Heinze, G.; Wallisch, C.; Dunkler, D. Variable selection – A review and recommendations for the practicing statistician. Biom. J. 2018, 60, 431–449.
  46. Moser, P.; Vibrans, A.C.; McRoberts, R.E.; Næsset, E.; Gobakken, T.; Chirici, G.; Mura, M.; Marchetti, M. Methods for variable selection in LiDAR-assisted forest inventories. Forestry 2017, 90, 112–124.
  47. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  48. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using Support Vector Machines. Mach. Learn. 2002, 46, 389–422.
  49. Bhattarai, R.; Rahimzadeh-Bajgiran, P.; Weiskittel, A.; Homayouni, S.; Gara, T.W.; Hanavan, R.P. Estimating species-specific leaf area index and basal area using optical and SAR remote sensing data in Acadian mixed spruce-fir forests, USA. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102727.
  50. Casas-Gómez, P.; Torres, J.F.; Linares, J.C.; Troncoso, A.; Martínez-Álvarez, F. Forecasting basal area increment in forest ecosystems using deep learning: A multi-species analysis in the Himalayas. Ecol. Inform. 2025, 85, 102951.
  51. McKearnan, S.B.; Vock, D.M.; Marai, G.E.; Canahuate, G.; Fuller, C.D.; Wolfson, J. Feature selection for support vector regression using a genetic algorithm. Biostatistics 2023, 24, 295–308.
  52. Yang, P.; Li, C.; Qiu, Y.; Huang, S.; Zhou, J. Metaheuristic Optimization of Random Forest for Predicting Punch Shear Strength of FRP-Reinforced Concrete Beams. Materials 2023, 16, 4034.
  53. Feng, G. Feature selection algorithm based on optimized genetic algorithm and the application in high-dimensional data processing. PLoS ONE 2024, 19, e0303088.
  54. Leardi, R.; González, A.L. Genetic algorithms applied to feature selection in PLS regression: How and when to use them. Chemom. Intell. Lab. Syst. 1998, 41, 195–207.
  55. Mehmood, T.; Liland, K.H.; Snipen, L.; Sæbø, S. A review of variable selection methods in partial least squares regression. Chemom. Intell. Lab. Syst. 2012, 118, 62–69.
  56. Yun, Y.H.; Bin, J.; Liu, D.L.; Xu, L.; Yan, T.-L.; Cao, D.-S.; Xu, Q.-S. A hybrid variable selection strategy based on continuous shrinkage of variable space in multivariate calibration. Anal. Chim. Acta 2019, 1058, 58–69.
  57. Yoo, D.G.; Kim, J.H. Meta-heuristic algorithms as tools for hydrological science. Geosci. Lett. 2014, 1, 4.
  58. Adnan, R.M.; Liang, Z.; Trajkovic, S.; Zounemat-Kermani, M.; Li, B.; Kisi, O. Daily streamflow prediction using optimally pruned extreme learning machine. J. Hydrol. 2021, 599, 126354.
  59. Salehnia, N.; Ahn, J. Modelling and reconstructing tree ring growth index with climate variables through artificial intelligence and statistical methods. Ecol. Indic. 2022, 134, 108496.
  60. Wolter, P.T.; Berkley, E.A.; Peckham, S.D.; Singh, A.; Townsend, P.A. Exploiting tree shadows on snow for estimating forest basal area using Landsat data. Remote Sens. Environ. 2012, 121, 69–79.
  61. Van Wagner, C.E. Conditions for the start and spread of crown fire. Can. J. For. Res. 1977, 7, 23–34.
  62. Scott, J.H.; Reinhardt, E.D. Assessing Crown Fire Potential by Linking Models of Surface and Crown Fire Behavior (No. 29); US Department of Agriculture, Forest Service, Rocky Mountain Research Station: Fort Collins, CO, USA, 2001.
  63. Brandt, J.P. The extent of the North American boreal zone. Environ. Rev. 2009, 17, 101–161.
  64. Baker, W.L. Landscape ecology and nature reserve design in the Boundary Waters Canoe Area, Minnesota. Ecology 1989, 70, 23–35.
  65. Wolter, P.T.; White, M.A. Recent forest cover type transitions and landscape structural changes in northeast Minnesota. Landsc. Ecol. 2002, 17, 133–155.
  66. Pastor, J.; Sharp, A.; Wolter, P. An application of Markov models to the dynamics of Minnesota’s forests. Can. J. For. Res. 2005, 35, 3011–3019.
  67. Friedman, S.K.; Reich, P.B. Regional legacies of logging: Departure from presettlement forest conditions in northern Minnesota. Ecol. Appl. 2005, 15, 726–744.
  68. Grosenbaugh, L.R. Plotless timber estimates – New, fast, easy. J. For. 1952, 50, 322–337.
  69. Joyce, M.; Moen, R. Accuracy of a Modular GPS/GLONASS Receiver; Report Number: NRRI/TR-2018/28, Release 1.0; University of Minnesota Duluth: Duluth, MN, USA, 2018; Available online: https://hdl.handle.net/11299/204331 (accessed on 20 December 2025).
  70. Wold, S.; Sjostrom, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
  71. Tran, T.N.; Afanador, N.L.; Buydens, L.M.; Blanchet, L. Interpretation of variable importance in Partial Least Squares with Significance Multivariate Correlation (sMC). Chemom. Intell. Lab. Syst. 2014, 138, 153–160.
  72. Hasegawa, K.; Miyashita, Y.; Funatsu, K. GA Strategy for Variable Selection in QSAR Studies: GA-Based PLS Analysis of Calcium Channel Antagonists. J. Chem. Inf. Comput. Sci. 1997, 37, 306–310.
  73. Leardi, R. Application of genetic algorithm–PLS for feature selection in spectral data sets. J. Chemom. 2000, 14, 643–655.
  74. Zareef, M.; Arslan, M.; Hassan Md, M.; Ahmad, W.; Chen, Q. Comparison of Si-GA-PLS and Si-CARS-PLS build algorithms for quantitation of total polyphenols in black tea using the spectral analytical system. J. Sci. Food Agric. 2023, 103, 7914–7920.
  75. Altmann, A.; Tolosi, L.; Sander, O.; Lengauer, T. Permutation importance: A corrected feature importance measure. Bioinformatics 2010, 26, 1340–1347.
  76. Genuer, R.; Poggi, J.M.; Tuleau-Malot, C. Variable selection using random forests. Pattern Recognit. Lett. 2010, 31, 2225–2236.
  77. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222.
  78. Bland, J.M.; Altman, D.G. Statistics Notes: Bootstrap resampling methods. BMJ 2015, 350, h2622.
  79. Keane, R.E.; Burgan, R.; van Wagtendonk, J. Mapping wildland fuels for fire management across multiple scales: Integrating remote sensing, GIS, and biophysical modeling. Int. J. Wildland Fire 2001, 10, 301–319.
  80. Bell, D.M.; Gregory, M.J.; Churchill, D.J.; Smith, A.C. Mapping with height and spectral remote sensing implies that environment and forest structure jointly constrain tree community composition in temperate coniferous forests of eastern Washington, United States. Front. For. Glob. Change 2022, 5, 962816.
  81. Richardson, J.J.; Moskal, L.M. Strengths and limitations of assessing forest density and spatial configuration with aerial LiDAR. Remote Sens. Environ. 2011, 115, 2640–2651.
  82. Arellano-Pérez, S.; Castedo-Dorado, F.; López-Sánchez, C.A.; González-Ferreiro, E.; Yang, Z.; Díaz-Varela, R.A.; Álvarez-González, J.G.; Vega, J.A.; Ruiz-González, A.D. Potential of Sentinel-2A data to model surface and canopy fuel characteristics in relation to crown fire hazard. Remote Sens. 2018, 10, 1645.
  83. Buckley, D.S.; Isebrands, J.G.; Sharik, T.L. Practical field methods of estimating canopy cover, PAR, and LAI in Michigan Oak and pine stands. North. J. Appl. For. 1999, 16, 25–32.
  84. Féret, J.B.; Gitelson, A.A.; Noble, S.D.; Jacquemoud, S. PROSPECT-D: Towards modeling leaf optical properties through a complete lifecycle. Remote Sens. Environ. 2017, 193, 204–215.
  85. Cavender-Bares, J.; Gamon, J.A.; Townsend, P.A. The Use of Remote Sensing to Enhance Biodiversity Monitoring and Detection: A Critical Challenge for the Twenty-First Century. In Remote Sensing of Plant Biodiversity; Cavender-Bares, J., Gamon, J.A., Townsend, P.A., Eds.; Springer: Cham, Switzerland, 2020.
  86. Zhou, H.; Zhou, G.; Song, X.; He, Q. Dynamic Characteristics of Canopy and Vegetation Water Content during an Entire Maize Growing Season in Relation to Spectral-Based Indices. Remote Sens. 2022, 14, 584.
  87. Eliades, F.; Sarris, D.; Bachofer, F.; Michaelides, S.; Hadjimitsis, D. Understanding Tree Mortality Patterns: A Comprehensive Review of Remote Sensing and Meteorological Ground-Based Studies. Forests 2024, 15, 1357.
  88. Ma, H.; Weiss, M.; Malik, D.; Berthelot, B.; Yebra, M.; Nolan, R.H.; Mialon, A.; Zeng, J.; Quan, X.; Tagesson, H.T.; et al. Satellite canopy water content from Sentinel-2, Landsat-8 and MODIS: Principle, algorithm and assessment. Remote Sens. Environ. 2025, 326, 114801.
  89. Belgiu, M.; Dragut, L. Random Forest in Remote Sensing: A Review of Applications and Future Directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
  90. Vidican, R.; Mălinaș, A.; Ranta, O.; Moldovan, C.; Marian, O.; Ghețe, A.; Ghișe, C.R.; Popovici, F.; Cătunescu, G.M. Using Remote Sensing Vegetation Indices for the Discrimination and Monitoring of Agricultural Crops: A Critical Review. Agronomy 2023, 13, 3040.
  91. Liu, J.; Pattey, E.; Jego, G. Assessment of vegetation indices for regional crop green LAI estimation from Landsat images over multiple growing seasons. Remote Sens. Environ. 2012, 123, 347–358.
  92. Vélez, S.; Martínez-Peña, R.; Castrillo, D. Beyond Vegetation: A Review Unveiling Additional Insights into Agriculture and Forestry through the Application of Vegetation Indices. J 2023, 6, 421–436.
  93. Bhattarai, R.; Rahimzadeh-Bajgiran, P.; Mech, A. Estimating nutritive, non-nutritive and defense foliar traits in spruce-fir stands using remote sensing and site data. For. Ecol. Manag. 2023, 549, 121461.
  94. Hyde, P.; Dubayah, R.; Walker, W.; Blair, J.B.; Hofton, M.; Hunsaker, C. Mapping forest structure for wildlife habitat analysis using multi-sensor (LiDAR, SAR/InSAR, ETM+, Quickbird) synergy. Remote Sens. Environ. 2006, 102, 63–73.
  95. Asner, G.P.; Mascaro, J.; Muller-Landau, H.C.; Vieilledent, G.; Vaudry, R.; Rasamoelina, M.; Hall, J.S.; van Breugel, M. A universal airborne LiDAR approach for tropical forest carbon mapping. Oecologia 2012, 168, 1147–1160.
  96. Borsah, A.A.; Nazeer, M.; Wong, M.S. LIDAR-Based forest biomass remote sensing: A review of metrics, methods, and assessment criteria for the selection of allometric equations. Forests 2023, 14, 2095.
  97. Lefsky, M.A.; Cohen, W.B.; Parker, G.G.; Harding, D.J. Lidar remote sensing for ecosystem studies. BioScience 2002, 52, 19–30.
  98. Chen, Q. LiDAR remote sensing of vegetation biomass. In Remote Sensing of Natural Resources, 1st ed.; Routledge: London, UK, 2013.
  99. Lu, D.; Chen, Q.; Wang, G.; Moran, E.; Batistella, M.; Zhang, M.; Vaglio Laurin, G.; Saah, D. Aboveground forest biomass estimation with Landsat and LiDAR data and uncertainty analysis of the estimates. Int. J. For. Res. 2012, 2012, 436537.
  100. Li, L.; Guo, Q.; Tao, S.; Kelly, M.; Xu, G. LiDAR with multi-temporal MODIS provides a means to upscale predictions of forest biomass. ISPRS J. Photogramm. Remote Sens. 2015, 102, 198–208.
  101. Choate, M.J.; Rengarajan, R.; Storey, J.C.; Lubke, M. Landsat 9 Geometric Commissioning Calibration Updates and System Performance Assessment. Remote Sens. 2023, 15, 3524.
  102. Lu, D.; Chen, Q.; Wang, G.; Liu, L.; Li, G.; Moran, E. A survey of remote sensing-based aboveground biomass estimation methods in forest ecosystems. Int. J. Digit. Earth 2014, 9, 63–105.
Figure 1. Flowchart illustrating the methodology used to model basal area (BA). The study utilized remote sensing data (Sentinel-2, Landsat-9, Vegetation Indices (VIs), and LiDAR) and field data to estimate BA for seven different species and total BA using four different modeling approaches.
Figure 2. Single ground plot composed of five variable radius subplots separated by 10 m along two perpendicular axes. The area surrounding each subplot center varies in radius in accordance with tree bole diameters [68].
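The variable-radius (point-sampling) design cited for Figure 2 [68] can be illustrated with a short Python sketch. The basal area factor (BAF) value, the function names, the example tallies, and the averaging of the five subplot estimates into a plot value are assumptions made for illustration; they are not taken from the study. The limiting-distance relation follows from requiring that a tree at the plot boundary represent exactly BAF m² ha⁻¹.

```python
import math


def limiting_distance_m(dbh_cm, baf_m2_ha):
    """Largest distance (m) from the subplot center at which a tree of the
    given diameter (cm) is tallied: R = DBH / (2 * sqrt(BAF))."""
    return dbh_cm / (2.0 * math.sqrt(baf_m2_ha))


def plot_basal_area(tally_counts, baf_m2_ha):
    """Plot BA (m^2/ha) as the mean of the subplot estimates, where each
    tallied tree represents BAF m^2/ha (averaging is an assumption)."""
    return baf_m2_ha * sum(tally_counts) / len(tally_counts)


# Hypothetical example: BAF = 2 m^2/ha and tallies for five subplots.
print(limiting_distance_m(30.0, 2.0))
print(plot_basal_area([8, 11, 9, 12, 10], 2.0))
```

Larger trees are therefore sampled over a wider radius, which is why the area surrounding each subplot center varies with bole diameter.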
Figure 3. Percentage of total basal area (TOTBA) contributed by each species.
Figure 4. Boxplots of z-scored LiDAR canopy-structure predictors. Boxes show the interquartile range with median; whiskers extend to 1.5× IQR; points denote outliers. See Table 4 for variable definitions.
Figure 5. Comparative performance of xPLS, GA-xPLS, RF-xPLS, and SVR-xPLS: (a) in-sample pooled RMSE (m² ha⁻¹); (b) validation-path RMSE; (c) in-sample pooled R² (R² = 1 − SSE/SST); (d) dumbbell plot showing reduction from validation to in-sample (×factors); (e) number of selected predictors. RF-xPLS achieves the lowest error with the fewest variables. Validation-path RMSE denotes the cross-validated pooled RMSE used during variable selection (LOOCV for xPLS; repeated K-fold CV fitness for GA-xPLS), whereas in-sample metrics are refit diagnostics on the full dataset.
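The distinction in the Figure 5 caption between validation-path RMSE and in-sample RMSE can be illustrated with a minimal sketch. A one-predictor ordinary least squares fit stands in for the xPLS pipelines here, which is a deliberate simplification and not the authors' model; the data values and the function names (`fit_line`, `loocv_rmse`, `in_sample_rmse`) are hypothetical. Reading the "×factors" of panel (d) as the ratio of the two RMSE values is also an assumption.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def in_sample_rmse(x, y):
    """Refit diagnostic: fit on all samples, score on the same samples."""
    a, b = fit_line(x, y)
    return rmse(y, [a + b * xi for xi in x])

def loocv_rmse(x, y):
    """Leave-one-out CV: each sample is predicted by a model that never saw it."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return rmse(y, preds)

# Hypothetical predictor values and basal areas (m2/ha), for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [8.1, 11.9, 17.2, 19.8, 26.5, 28.0]

cv_error = loocv_rmse(x, y)
fit_error = in_sample_rmse(x, y)
ratio = cv_error / fit_error  # assumed reading of a "x factor" reduction
```

For least squares, the leave-one-out error can never be smaller than the in-sample error, which is why the two quantities are reported separately.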
Figure 6. Cross-validated skill by species and method. (a,b) Taylor diagrams (Pearson r vs. normalized SD; reference arc σ̂/σo = 1) for different species; symbols: ☆ xPLS, □ GA-xPLS, △ RF-xPLS, + SVR-xPLS. (c) R² (= 1 − SSE/SST) heatmap with BH–FDR significance (** q < 0.05). (d) RMSE heatmap (m² ha⁻¹).
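The BH–FDR significance flag in Figure 6c refers to Benjamini–Hochberg adjusted q-values. The sketch below shows the standard step-up adjustment. The caption does not say which test produced the underlying p-values, so the p-values here are hypothetical and `bh_qvalues` is an illustrative name.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical p-values for four response/method cells
p = [0.01, 0.04, 0.03, 0.20]
q = bh_qvalues(p)
significant = [qi < 0.05 for qi in q]  # cells that would be marked with **
```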
Figure 7. RMSE by response and method: bootstrap mean ± 95% CI. Bars show the bootstrap mean RMSE for each response and method; error bars span the 2.5th to 97.5th percentiles from a nonparametric bootstrap of observation–prediction pairs (B = 500).
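The bootstrap procedure described in the Figure 7 caption can be sketched as follows: observation–prediction pairs are resampled with replacement B = 500 times, RMSE is recomputed for each resample, and the 2.5th and 97.5th percentiles bound the interval. The percentile interpolation rule, the seed, and the sample data are assumptions made for illustration.

```python
import math
import random

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in 0-100) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def bootstrap_rmse(obs, pred, n_boot=500, seed=42):
    """Return (bootstrap mean RMSE, 2.5th percentile, 97.5th percentile)."""
    rng = random.Random(seed)
    n = len(obs)
    stats = []
    for _ in range(n_boot):
        # Resample observation-prediction pairs with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([obs[i] for i in idx], [pred[i] for i in idx]))
    stats.sort()
    return sum(stats) / n_boot, percentile(stats, 2.5), percentile(stats, 97.5)

# Hypothetical observed and predicted basal areas (m2/ha)
obs = [12.0, 18.5, 25.1, 9.4, 30.2, 21.7, 15.3, 27.8]
pred = [13.1, 17.0, 23.9, 11.2, 28.5, 22.4, 14.1, 29.0]
mean_rmse, ci_low, ci_high = bootstrap_rmse(obs, pred)
```

Resampling pairs, not residuals, keeps each prediction attached to its own observation, which matches the wording of the caption.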
Figure 8. Coefficient of determination (R² = 1 − SSE/SST) by response and method: bootstrap mean ± 95% CI. Bars report the R² between observed and predicted values; error bars are 95% bootstrap intervals computed as in Figure 7 (B = 500).
Figure 9. Residual distributions normalized by σo. Side-by-side boxplots show residuals (Predicted−Observed) divided by the response’s observed standard deviation σo, grouped by response and colored by method. Boxes denote the interquartile range with median lines; whiskers extend to 1.5× IQR, and open circles mark more extreme points. Values clustered around zero indicate minimal bias; shorter boxes/whiskers indicate tighter error dispersion.
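The quantities plotted in Figure 9 can be reproduced with a short sketch: residuals (Predicted − Observed) are divided by the observed standard deviation σo, and the box statistics follow the 1.5× IQR whisker rule. Whether σo uses the n or n − 1 denominator and which quartile convention the plotting software applied are not stated in the caption, so both choices below are assumptions.

```python
import statistics

def normalized_residuals(obs, pred):
    """(Predicted - Observed) / sigma_o, using the sample SD (an assumption)."""
    sigma_o = statistics.stdev(obs)
    return [(p - o) / sigma_o for o, p in zip(obs, pred)]

def box_stats(values):
    """Median, quartiles, 1.5 x IQR whisker ends, and points beyond the fences."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < lo_fence or v > hi_fence],
    }

# Hypothetical observed and predicted basal areas (m2/ha)
res = normalized_residuals([10.0, 20.0, 30.0], [12.0, 20.0, 27.0])
summary = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```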
Figure 10. Observed vs. predicted BA for eight responses using the RF-xPLS model. Points are samples; the dashed line is 1:1. Panel headers report r, R² = 1 − SSE/SST, and RMSE. All plotted values are expressed in square meters per hectare (m²/ha).
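The three statistics in the Figure 10 panel headers measure different things, which a small sketch makes concrete. Pearson r captures linear association only, whereas R² = 1 − SSE/SST and RMSE both penalize departure from the 1:1 line. The function name `panel_stats` and the data are hypothetical.

```python
import math

def panel_stats(obs, pred):
    """Pearson r, R2 = 1 - SSE/SST, and RMSE for observed vs. predicted values."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))   # error sum of squares
    sst = sum((o - mean_o) ** 2 for o in obs)            # total sum of squares
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return {
        "r": cov / math.sqrt(sst * var_p),
        "R2": 1 - sse / sst,
        "RMSE": math.sqrt(sse / n),
    }

# Hypothetical case: predictions carry a constant +5 m2/ha bias
stats = panel_stats([10.0, 20.0, 30.0, 40.0], [15.0, 25.0, 35.0, 45.0])
```

In this constructed case r equals 1 while R² is 0.8 and RMSE is 5 m²/ha, so a high r alone does not imply agreement with the 1:1 line.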
Table 1. Summary of predictor variables used in the study. Sentinel-2 bands include seasonal reflectance values at 10 m and 20 m resolution. Vegetation indices: ARVI, EVI7, EVI8, MSR, SAVI, NDII11, TVI, NDVI, SVR, AI, MSI, GVAR. LiDAR predictors are derived from canopy metrics. Landsat-9 imagery was resampled to 10 m (nearest neighbor) to match the Sentinel-2 imagery.

| Category | Variable |
|---|---|
| Sentinel-2 bands (10 and 20 m, multi-seasonal) | B2, B3, B4, B8 (Mar., Jun., Aug., Oct. [10 m]) |
| | B2, B3, B4, B5, B6, B7, B8A, B11, B12 (Mar., Jun., Aug., Sep., Oct. [20 m]) |
| Sentinel-2 derived vegetation indices | ARVI, EVI7, EVI8, MSR, SAVI, NDII11, TVI (Mar., May, Jun., Aug., Sep., Oct. [20 m]) |
| | NDVI (Mar., May, Jun., Aug., Oct. [10 m and 20 m]) |
| Landsat-9 (30 m) | B1, B2, B3, B4, B5, B6, B7, NDVI, NDAI, MSI, SVR (Sep.) |
| LiDAR-derived predictors (10 m spatial resolution via canopy metrics) | CHM, STRAT5_M, MEDMODE, MEDMAD, LSKEW, LMOM2, LMOM3, LMOM4, HVAR, HSTD, HSKEW, HQUAD, HCV, H10PCT, H50PCT, H60PCT, H70PCT, H75PCT |

LiDAR predictor abbreviations and definitions are provided in Table 4.
Table 2. Spectral vegetation indices (VIs) used as predictor variables for modeling forest basal area (BA) using Sentinel-2 satellite sensor data.

| VI | Equation |
|---|---|
| ARVI (Atmospherically Resistant Vegetation Index) | (B8A − 2B4 + B2)/(B8A + 2B4 + B2) |
| EVI7 (Enhanced Vegetation Index 7) | 2.5 × (B7 − B4)/(1 + B7 + 6B4 − 7.5B2) |
| EVI8 (Enhanced Vegetation Index 8) | 2.5 × (B8A − B4)/(1 + B8A + 6B4 − 7.5B2) |
| MSR (Modified Simple Ratio) | ((B7/B4) − 1)/sqrt((B7/B4) + 1) |
| SAVI (Soil Adjusted Vegetation Index) | 1.5 × (B8A − B4)/(B8A + B4 + 0.5) |
| NDII11 (Normalized Difference Infrared Index 11) | (B8A − B11)/(B8A + B11) |
| TVI (Triangular Vegetation Index) | 0.5 × {(120 × (B6 − B3)) − (200 × (B4 − B3))} |
| NDVI (Normalized Difference Vegetation Index) | (B8A − B4)/(B8A + B4) |
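The Table 2 expressions translate directly into a per-pixel function. The sketch below transcribes each equation exactly as tabulated. The function name `sentinel2_indices`, the assumption that inputs are surface reflectance on a 0 to 1 scale, and the sample band values are all hypothetical.

```python
import math

def sentinel2_indices(b2, b3, b4, b6, b7, b8a, b11):
    """Table 2 indices for one pixel; inputs assumed to be reflectance in 0-1."""
    ratio = b7 / b4  # red-edge/NIR (B7) to red (B4) simple ratio used by MSR
    return {
        "ARVI": (b8a - 2 * b4 + b2) / (b8a + 2 * b4 + b2),  # as written in Table 2
        "EVI7": 2.5 * (b7 - b4) / (1 + b7 + 6 * b4 - 7.5 * b2),
        "EVI8": 2.5 * (b8a - b4) / (1 + b8a + 6 * b4 - 7.5 * b2),
        "MSR": (ratio - 1) / math.sqrt(ratio + 1),
        "SAVI": 1.5 * (b8a - b4) / (b8a + b4 + 0.5),
        "NDII11": (b8a - b11) / (b8a + b11),
        "TVI": 0.5 * (120 * (b6 - b3) - 200 * (b4 - b3)),
        "NDVI": (b8a - b4) / (b8a + b4),
    }

# Hypothetical reflectances for a vegetated pixel
vi = sentinel2_indices(b2=0.03, b3=0.05, b4=0.04, b6=0.25, b7=0.30,
                       b8a=0.35, b11=0.15)
```

The same function can be applied element-wise to band arrays for each acquisition month to build the seasonal index predictors listed in Table 1.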
Table 3. Spectral vegetation indices (VIs) used as predictor variables for modeling forest basal area (BA) using Landsat-9 satellite sensor data. Band designations: B2 blue, B3 green, B4 red, B5 NIR, B6 SWIR1, B7 SWIR2. Landsat-9 imagery was resampled using nearest-neighbor interpolation.

| VI | Equation |
|---|---|
| NDVI (Normalized Difference Vegetation Index) | (B5 − B4)/(B5 + B4) |
| MSI (Moisture Stress Index) | (B6 − B5)/(B6 + B5) |
| NDAI (Normalized Difference Autumn Index) | (B4 − B2)/(B4 + B2) |
| SVR (Short Wave Infrared to Visible Ratio) | (B6 + B7)/(B2 + B3 + B4) |

Prior to modeling, NDVI, MSI, and NDAI were linearly rescaled as (VI + 1) × 100, and SVR was multiplied by 1.5, to harmonize predictor ranges; these monotonic transformations do not change observation rankings or Pearson correlations. ‘SVR’ in predictor names denotes the Shortwave Infrared-to-Visible Ratio (spectral index), whereas ‘SVR-xPLS’ refers to Support Vector Regression in the modeling pipeline.
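The Landsat-9 indices and the rescaling step can be sketched as follows, together with a check that a positive linear rescaling leaves the Pearson correlation with a response unchanged. The function names, the band reflectances, and the basal area values are hypothetical.

```python
import math

def landsat9_indices(b2, b3, b4, b5, b6, b7):
    """Table 3 indices for one pixel, transcribed as tabulated."""
    return {
        "NDVI": (b5 - b4) / (b5 + b4),
        "MSI": (b6 - b5) / (b6 + b5),
        "NDAI": (b4 - b2) / (b4 + b2),
        "SVR": (b6 + b7) / (b2 + b3 + b4),
    }

def rescale(vi):
    """(VI + 1) x 100 for the normalized indices; SVR multiplied by 1.5."""
    out = {name: (value + 1) * 100 for name, value in vi.items() if name != "SVR"}
    out["SVR"] = vi["SVR"] * 1.5
    return out

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

scaled = rescale(landsat9_indices(0.03, 0.05, 0.04, 0.35, 0.15, 0.08))

# Hypothetical NDVI values and plot basal areas (m2/ha)
ndvi = [0.62, 0.71, 0.55, 0.80, 0.68]
ba = [14.0, 21.5, 9.8, 30.2, 18.7]
ndvi_scaled = [(v + 1) * 100 for v in ndvi]
r_raw, r_scaled = pearson(ndvi, ba), pearson(ndvi_scaled, ba)
```

Because both transformations have a positive slope, `r_raw` and `r_scaled` agree up to floating-point error and the ordering of observations is preserved.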
Table 4. LiDAR-derived predictors.

| Predictor | Metric description |
|---|---|
| CHM | Canopy Height Model |
| STRAT5_M | Mean height of vegetation > 5 m and ≤ 10 m |
| MEDMOD | Median absolute deviation from mode height |
| MEDMAD | Median absolute deviation from median height |
| LSKEW | L-moment skewness |
| LMOM4 | Fourth L-moment (kurtosis) |
| LMOM3 | Third L-moment (skewness) |
| LMOM2 | Second L-moment (scale/dispersion) |
| HVAR | Variance of heights |
| HSTD | Standard deviation of all return heights |
| HSKEW | Skewness of heights |
| HQUAD | Quadratic mean height |
| HCV | Coefficient of variation of heights |
| H75PCT | Average height 75th percentile |
| H70PCT | Average height 70th percentile |
| H60PCT | Average height 60th percentile |
| H50PCT | Average height 50th percentile (median) |
| H10PCT | Average height 10th percentile |
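Several of the Table 4 metrics are simple summaries of the return-height distribution within a cell. The sketch below computes a subset of them from a list of heights. Which returns enter each metric, any height threshold, and the use of population (n-denominator) variance are assumptions, and the software that produced the metrics may apply different conventions.

```python
import math
import statistics

def height_metrics(heights):
    """A subset of Table 4 metrics from return heights (m) in one grid cell."""
    mean_h = statistics.fmean(heights)
    median_h = statistics.median(heights)
    std_h = statistics.pstdev(heights)  # population SD; an assumption
    return {
        "HQUAD": math.sqrt(statistics.fmean(h * h for h in heights)),  # quadratic mean
        "HVAR": statistics.pvariance(heights),
        "HSTD": std_h,
        "HCV": std_h / mean_h,
        "MEDMAD": statistics.median(abs(h - median_h) for h in heights),
        "H50PCT": median_h,
    }

# Hypothetical return heights (m) for one 10 m cell
metrics = height_metrics([2.0, 4.0, 6.0, 8.0])
```

The quadratic mean weights tall returns more heavily than the arithmetic mean does, so HQUAD is always at least as large as the mean height.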
Table 5. RF-xPLS’s 27 selected predictors grouped into six interpretable classes.

| Category | Predictors | Physical signal |
|---|---|---|
| SWIR moisture and dry-matter chemistry (S2) | B11_Mar_20m, B12_Mar_20m, B11_Aug_20m, B12_Sep_20m, May_NDII11 | Liquid water content; cellulose/lignin; canopy/wood dryness (SWIR absorption; NDII11 water sensitivity). |
| Canopy density and red-edge/NIR structure (S2) | B6_Mar_20m, B8_May_10m, B5_May_20m, B8A_May_20m, B7_Oct_20m | Leaf/crown density; internal leaf structure; red-edge sensitivity to chlorophyll and canopy structure. |
| Greenness indices (soil/illumination-robust) (S2) | Mar_EVI7, May_SAVI, Sep_EVI8, OCT_TVI, AUG_TVI, NDVI_Jun_20m, NDVI_Oct_10m | Photosynthetic activity/pigment state; SAVI/TVI mitigate soil/illumination; NDVI seasonal amplitude. |
| SWIR-to-Visible Ratio (SVR; L9-only) | SVR_Mar_20m, SVR_Oct_20m, AUG_SVR_20m, Sep_SVR_20m | Broad VIS↔SWIR contrast: higher values track drier canopy/greater dry-matter (SWIR) relative to VIS brightness. |
| Phenology and VIS band proxies (incl. cross-sensor checks) | B3_Jun_20m, B3_AUG_10m, B2_Oct_10m (S2); B1_L9_Sep (coastal/aerosol), B4_L9_Sep (red) | Seasonal pigment dynamics (green-up/peak/senescence) and late-season VIS stability/contrast via cross-sensor bands. |
| Vertical structure (LiDAR) | HQUAD | Height-distribution curvature (vertical layering). |

Notes: L9 = Landsat-9; S2 = Sentinel-2; “SVR” in predictor names denotes Short Wave Infrared-to-Visible Ratio, not Support Vector Regression.

Share and Cite

MDPI and ACS Style

Salehnia, N.; Wolter, P.; Sturtevant, B.R.; Abbas Iossifov, D. Advancing Forest Inventory and Fuel Monitoring with Multi-Sensor Hybrid Models: A Comparative Framework for Basal Area Estimation. Remote Sens. 2026, 18, 852. https://doi.org/10.3390/rs18060852
