1. Introduction
Serving as both an ecological corridor and economic hub, the Yellow River Basin contributes significantly to China’s sustainable development goals [1]. It holds rich mineral resources and is widely recognized as China’s “Energy Basin” [2]. However, large-scale mining operations have caused surface subsidence, ground fissures, and declines in groundwater levels. These disturbances have led to widespread degradation of trees, shrubs, and grasses [3,4]. In the Henan section of the basin, these intensive extraction activities have further engendered severe environmental consequences, including the loss of vegetation ecological functions and exacerbated soil erosion [5,6], which hinder ecological protection and high-quality development in the Yellow River Basin [7].
Accurate tree species classification is a fundamental task for vegetation monitoring, biodiversity assessment, and evaluation of ecological restoration outcomes [8,9]. It helps assess ecosystem stability and restoration success, especially in human-disturbed environments like mining regions [10,11]. However, in post-mining landscapes, vegetation is often sparse, fragmented, and heavily influenced by variable ground conditions and human activities. These factors make species-level identification using remote sensing data exceptionally difficult [12,13].
Ground surveys provide accurate tree species information at the plot level and remain a standard method for vegetation inventory. However, this approach is difficult to apply in mining areas due to rough terrain, high safety risks, and the near impossibility of achieving complete spatial coverage [14]. Remote sensing offers a broader view and has been widely used for ecological monitoring in mining areas over the past 50 years [15]. Researchers have applied indices such as NDVI, FVC, and the Ecological Environment Quality Index to assess restoration status at regional scales [16,17,18]. However, most existing studies rely on medium-resolution satellite images, which suffer from mixed-pixel problems, and vegetation indices like NDVI tend to saturate in complex mining environments [19].
Recent advances in unmanned aerial vehicle (UAV) platforms have addressed many of these limitations by enabling rapid acquisition of high-spatial-resolution imagery and three-dimensional data over rugged and inaccessible terrain [20,21]. LiDAR sensors capture canopy height and three-dimensional structure, which helps separate individual trees from the underlying shrub and grass layers [22,23,24,25]. Hyperspectral sensors record continuous reflectance spectra and support biochemical discrimination among species [26,27,28]. High-resolution RGB imagery further provides fine-scale texture and morphological details [29,30]. The combination of these sensors produces complementary information layers. This multi-source data framework significantly improves the accuracy and reliability of tree species classification in complex, multi-layered vegetation environments [31,32,33]. Such UAV-based multi-source data fusion has been applied across various forest ecosystems, demonstrating great potential for fine-scale vegetation mapping [34,35,36].
However, accurate tree species classification in mining areas remains challenging due to heterogeneous vegetation structure, spectral similarity among species, and limited field accessibility. Recent studies have applied UAV-based remote sensing to address these issues, yet most efforts remain constrained to single-sensor or dual-sensor configurations. Luo et al. [37] used UAV RGB imagery alone and developed an improved Faster R-CNN model for individual tree detection in a coal mine afforestation area. Although detection accuracy was high, the method lacked spectral information necessary for species-level discrimination. Deng et al. [38] employed UAV hyperspectral imagery and proposed a 3DCNN model with attention mechanisms for tree species classification in a mining restoration site. While spectral resolution was sufficient, structural data such as canopy height and three-dimensional form were not captured.
To overcome the limitations of single-sensor approaches, several studies have explored dual-source fusion. Zhong et al. [39] integrated UAV LiDAR and RGB data using an improved YOLOv8 model and achieved higher tree species identification accuracy in complex mixed forests than either data source alone. He et al. [19] fused UAV RGB and LiDAR data and proposed a multi-scale hierarchical classification method for fine-scale vegetation mapping in an open-pit phosphate mining area. However, both studies lacked hyperspectral data, which provides continuous spectral signatures essential for biochemical discrimination among closely related species.
In non-mining environments, multi-sensor fusion has shown greater potential. By integrating UAV-based LiDAR, hyperspectral, and ultrahigh-resolution RGB data, Qin et al. [34] successfully classified tree species in subtropical broadleaf forests with high accuracy. Yet their method relied on a single random forest classifier and was developed in a non-mining environment; its transferability to heterogeneous, ecologically disturbed mining restoration sites remains uncertain. Meanwhile, Gominski et al. [40] proposed an automated method to mine species labels from public inventory data, but their work focused on scalable labeling rather than multi-sensor fusion in disturbed mining landscapes.
Despite these advances, the combined use of all three sensor types has not been systematically investigated in mining areas, which differ fundamentally from natural forests [41,42]. Because of these differences, methods developed for natural forests cannot be directly transferred to mining areas, where several unique challenges converge. (1) Canopy structure is often heterogeneous and irregular due to tree growth on constrained or reconstructed soils. (2) The species pool is deliberately limited yet ecologically strategic, resulting in complex mosaics of trees, shrubs, and grasses. (3) Mixed-age stands emerge from phased planting over time. (4) Spectral and structural confusion frequently occurs both among different life forms (such as young trees versus tall shrubs) and among species that exhibit similar stress responses to residual mining impacts. This widespread pattern of tree–shrub–grass intergrowth, as observed in the mining areas along the Henan section of the Yellow River Basin, adds substantial complexity to individual tree detection and feature extraction [43,44]. These conditions call for analytical approaches specifically designed for heterogeneous, human-disturbed mining landscapes [45].
To address this research gap, this study centers on the Yushan coal mining area in the Henan section of the Yellow River Basin, a representative ecological restoration site, and develops a UAV-based multi-source remote sensing framework tailored for individual tree species classification in complex, multi-layered mining environments. The specific objectives are to: (1) extract 278 features per individual tree from synchronously acquired LiDAR, hyperspectral, and RGB data; (2) systematically evaluate the classification performance of seven machine learning algorithms (Random Forest, Support Vector Machine, K-Nearest Neighbors, Decision Tree, Gradient Boosting, Logistic Regression, and XGBoost) using 1095 ground-truth tree samples; (3) identify the optimal classifier under the dual constraints of statistical significance and model stability via 5 × 5 repeated cross-validation combined with the Friedman test and Nemenyi post hoc analysis; and (4) analyze the key discriminative features, elucidate the complementary roles of hyperspectral and structural attributes, and apply the optimized model to generate a high-resolution species distribution map at the individual tree level across the entire mining area. Together, these steps aim to establish a practical, empirically evaluated technical pipeline for effective tree monitoring.
4. Results
4.1. Comparative Performance of Machine Learning Models
4.1.1. Overall Model Performance
The comprehensive evaluation of seven machine learning classifiers revealed distinct performance patterns (Table 5). XGBoost achieved the highest test accuracy (OA = 0.897) and Kappa coefficient (κ = 0.811), closely followed by Gradient Boosting (OA = 0.891, κ = 0.796). Ensemble methods dominated the performance ranking, with boosting algorithms occupying the top two positions. Random Forest secured third place (OA = 0.824), while Logistic Regression and K-Nearest Neighbors achieved identical test accuracies (OA = 0.812). Decision Tree (OA = 0.794) and Support Vector Machine (OA = 0.715) demonstrated progressively lower performance.
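As a reminder of how the two headline metrics relate, the following minimal sketch computes overall accuracy and Cohen's Kappa from a confusion matrix. The function name and the toy counts are hypothetical and are not the study's data; only the standard definitions of OA and κ are assumed.

```python
def overall_accuracy_and_kappa(cm):
    """Return (OA, Cohen's kappa) for a square confusion matrix.

    cm[i][j] = number of samples of reference class i predicted as class j.
    """
    n = sum(sum(row) for row in cm)
    k = len(cm)
    # Observed agreement: share of samples on the diagonal (this is OA).
    observed = sum(cm[i][i] for i in range(k)) / n
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Toy 3-class matrix (hypothetical counts, not the study's data).
toy_cm = [
    [50, 3, 2],
    [4, 30, 1],
    [1, 2, 7],
]
oa, kappa = overall_accuracy_and_kappa(toy_cm)
```

Because κ discounts the agreement expected from class proportions alone, it is always lower than OA when one class dominates, which is the situation in an imbalanced species inventory.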
4.1.2. Statistical Significance Testing Based on 5 × 5 Cross-Validation
1. Friedman Test for Overall Performance Differences
A 5 × 5 cross-validation scheme combined with the non-parametric Friedman test was employed to systematically assess whether the observed performance differences among the seven classifiers were statistically significant. This approach accounts for both model variability and data sampling effects, thereby enhancing the stability of the statistical inference.
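A 5 × 5 scheme repeats 5-fold cross-validation five times with different shuffles, giving 25 train/test splits. The sketch below is one way to generate such splits; stratification by class and the round-robin assignment are assumptions made for illustration, and the function name and toy labels are hypothetical.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs; n_splits * n_repeats in total."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold keeps the class mix.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for f in range(n_splits):
            test_idx = sorted(folds[f])
            train_idx = sorted(
                i for g in range(n_splits) if g != f for i in folds[g]
            )
            yield train_idx, test_idx


# Toy imbalanced label vector (hypothetical, not the study's samples).
labels = ["a"] * 50 + ["b"] * 20 + ["c"] * 10
splits = list(repeated_stratified_kfold(labels))
```

Each of the 25 splits then supplies one accuracy value per classifier, and these per-fold values are what the rank-based tests operate on.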
Figure 3 presents the average ranks of the seven classifiers across the 25 cross-validation folds.
The Friedman test yielded a chi-square statistic of 138.4187 with a p-value < 0.000001, providing strong evidence to reject the null hypothesis. This result confirms that the seven algorithms exhibit statistically significant differences in classification performance.
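For readers who want to see how such a statistic arises, the sketch below computes the Friedman chi-square from a table of per-fold accuracies. It is an illustrative implementation without tie correction, and the function names and input layout are assumptions rather than the authors' code.

```python
def average_ranks(scores):
    """Rank one fold's accuracies (1 = best); ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and scores[order[end + 1]] == scores[order[pos]]):
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos..end
        for t in range(pos, end + 1):
            ranks[order[t]] = shared
        pos = end + 1
    return ranks


def friedman_chi_square(score_table):
    """score_table[i][j] = accuracy of classifier j on fold i.

    Returns (chi-square statistic, mean rank per classifier).
    """
    n = len(score_table)
    k = len(score_table[0])
    rank_rows = [average_ranks(row) for row in score_table]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, mean_ranks
```

With k = 7 classifiers and N = 25 folds, the statistic cannot exceed N(k − 1) = 150, which is reached only when every fold ranks the classifiers identically. A value of 138.42 therefore indicates that the ranking was nearly the same across folds.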
2. Model Ranking and Cross-Validation Performance
The average ranks and performance metrics derived from the 5 × 5 cross-validation procedure are presented in Table 6, where a lower average rank indicates superior performance.
The cross-validation results exhibit strong consistency with the independent test set performance. XGBoost maintained the highest ranking across both evaluation approaches, with a cross-validation mean accuracy of 0.8877 closely matching its test set accuracy of 0.8970. This alignment underscores the stable generalization capability of the models.
3. Post hoc Nemenyi Test for Pairwise Comparisons
The critical difference (CD) was calculated as 1.644; pairwise rank differences exceeding this threshold were deemed statistically significant. The CD is shown in Figure 4, where classifiers connected by a horizontal line are not significantly different at α = 0.05. The results delineate a clear performance hierarchy:
Top tier: XGBoost and Gradient Boosting exhibited statistically equivalent performance (rank difference = 0.800 < CD).
Intermediate tier: Random Forest, Logistic Regression, and K-Nearest Neighbors formed a middle group, significantly outperformed by the top tier, though within-tier differences were largely non-significant.
Lower tier: Decision Tree and SVM ranked lowest, demonstrating statistically significant inferiority to all other classifiers.
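The critical difference used for the tiers above can be sketched with the usual Nemenyi formula, CD = q_α · sqrt(k(k + 1) / (6N)). The code below assumes that the 25 cross-validation folds play the role of the N datasets and takes the q_α constants for k = 7 from the table in Demsar (2006); both points are assumptions, since the original computation is not shown here.

```python
import math

# Critical values for k = 7 classifiers as tabulated by Demsar (2006);
# treated here as given constants (assumption).
Q_ALPHA_K7 = {0.05: 2.949, 0.10: 2.693}


def nemenyi_cd(k, n, q_alpha):
    """Critical difference in mean rank for k classifiers over n folds."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


cd_05 = nemenyi_cd(7, 25, Q_ALPHA_K7[0.05])
cd_10 = nemenyi_cd(7, 25, Q_ALPHA_K7[0.10])
```

Under these assumptions the formula gives roughly 1.80 at α = 0.05 and 1.65 at α = 0.10. The reported CD of 1.644 is closer to the second value, so the q value used in the original computation may be worth confirming. The rank difference of 0.800 between XGBoost and Gradient Boosting is below either threshold.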
4.1.3. Model Performance and Stability Analysis
1. Cross-Validation Performance Stability
The standard deviation of cross-validation accuracy reflects model stability (Figure 5):
High Performance, High Stability: XGBoost: 0.8877 ± 0.0162, Gradient Boosting: 0.8780 ± 0.0151. These algorithms demonstrate both high accuracy and low variability.
Moderate Performance, Moderate Stability: Random Forest: 0.8281 ± 0.0267, Logistic Regression: 0.8199 ± 0.0191, K-Nearest Neighbors: 0.8135 ± 0.0170.
Lower Performance, Higher Variability: Decision Tree: 0.7587 ± 0.0306, Support Vector Machine: 0.6944 ± 0.0237.
2. Performance-Computation Efficiency Trade-off
XGBoost achieves the best trade-off among the seven models, delivering top-tier accuracy with a training time of 2.36 s, a 61.7% reduction relative to Gradient Boosting, while maintaining statistically equivalent performance. KNN trains rapidly (0.002 s) but at a substantial cost to accuracy. Random Forest offers moderate efficiency (0.243 s) yet underperforms boosting-based models in classification accuracy (Figure 6).
3. Generalization Performance Assessment
Strong agreement between cross-validation and test set accuracy indicates stable generalization. XGBoost (CV: 0.8877, Test: 0.8970, Δ = +0.0093) and Gradient Boosting (CV: 0.8780, Test: 0.8909, Δ = +0.0129) score slightly higher on the test set than in cross-validation. Both differences fall within one cross-validation standard deviation (0.0162 and 0.0151, respectively) and are therefore consistent with sampling variability, with no sign of overfitting. Random Forest shows near-identical performance (CV: 0.8281, Test: 0.8242, Δ = −0.0039). Overall consistency across evaluation protocols underscores model stability (Figure 7).
4.1.4. Statistical Performance Tiers
Statistical analysis delineated three distinct performance tiers among the seven classifiers (Figure 8), with direct implications for ecological monitoring applications:
Tier 1 (Optimal)—XGBoost and Gradient Boosting: statistically equivalent top-tier accuracy; recommended for effective tree species discrimination.
Tier 2 (Competent)—Random Forest, Logistic Regression, KNN: viable alternatives under computational or interpretability constraints, though significantly outperformed by Tier 1.
Tier 3 (Limited)—Decision Tree and SVM: substantially lower accuracy; not recommended for detailed species mapping in this context.
4.1.5. Conclusion of Comparative Analysis
Comprehensive statistical evaluation using 5 × 5 cross-validation and significance testing supports four main conclusions:
(1) Significant performance differences exist among the seven classifiers (Friedman test: χ² = 138.42, p < 0.000001).
(2) XGBoost is the optimal classifier, achieving the best (lowest) average rank (1.10), with accuracy statistically equivalent to Gradient Boosting but superior computational efficiency.
(3) Three statistically distinct performance tiers were identified, with boosting ensembles (XGBoost, Gradient Boosting) consistently leading.
(4) Model stability is confirmed by strong agreement between cross-validation and test set performance, validating generalization capability.
These findings justify the selection of XGBoost for subsequent feature importance analysis and operational species mapping in the Yushan mining restoration area. The systematic 5 × 5 cross-validation framework ensures conclusions are stable against sampling variability, providing reliable, evidence-based guidance for ecological monitoring and restoration in the Yellow River Basin.
4.2. Feature Importance of the Optimal Model
To elucidate the decision-making mechanism of the optimal XGBoost model and validate the contribution of multi-source data, an analysis of feature importance was conducted. The results, measured by the mean decrease in impurity [58], reveal the relative contribution of individual features to the classification process. The top 20 most important features are listed in Figure 9, and the cumulative feature importance distribution is shown in Figure 10.
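A cumulative importance curve of this kind can be summarized by the number of top-ranked features needed to reach a given share of total importance. The following minimal sketch shows the computation; the function name, the 90% threshold, and the toy importances are hypothetical and are not taken from the study.

```python
def features_needed(importances, threshold=0.90):
    """Number of top-ranked features whose normalized importances
    sum to at least `threshold`."""
    total = sum(importances)
    running = 0.0
    ranked = sorted(importances, reverse=True)
    for count, value in enumerate(ranked, start=1):
        running += value / total
        # Small tolerance guards against floating-point round-off.
        if running >= threshold - 1e-12:
            return count
    return len(importances)


# Toy importance vector (hypothetical values).
toy_importances = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]
n_for_90 = features_needed(toy_importances, 0.90)
```

When many features are strongly correlated, importance is spread thinly across them, so this count tends to overstate how many independent features the model really needs.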
The XGBoost feature importance analysis reveals three key principles underlying its species discrimination capability:
(1) Blue-edge spectral dominance: the top-ranked feature (ρ468.62, importance = 0.0358) lies at the short-wavelength margin of the 470–500 nm region, which is sensitive to carotenoids and leaf surface traits. Four additional blue-region bands in the top ten underscore pigment-related absorption as critical for taxonomic differentiation. However, as demonstrated by the correlation filtering analysis (Section 4.3), the high importance of these blue-edge bands largely stems from strong collinearity with adjacent spectral bands. After removing highly correlated features (|r| > 0.9), none of the original blue-edge bands were retained, yet the XGBoost performance remained identical (OA = 0.897, Kappa = 0.811). Therefore, these importance values should be viewed as directional indications rather than precise ecological attributions. The discriminative information originally associated with blue-edge wavelengths can be equivalently captured by other, less correlated features (e.g., vegetation indices, tree height, texture).
(2) Structural–biochemical integration: tree height (TH) ranks third, confirming that three-dimensional structure is essential for accurate classification. Three vegetation indices (PSSR, VOG, RVSI) in the top eleven provide complementary physiological and biochemical information beyond raw spectra.
(3) Multi-spectral synergy: the model leverages features across green, red, red-edge, NIR, and NIR–SWIR transition regions, each linked to chlorophyll reflectance, absorption, leaf internal structure, or canopy water content. This broad spectral utilization enables robust species discrimination even when raw signatures are similar.
The feature importance results should be interpreted with caution. The importance metric used here (mean decrease in impurity) is known to be sensitive to correlated predictors. The correlation-based filtering analysis in Section 4.3 quantifies this effect: the 40 features retained after removing pairs with |r| > 0.9 contain none of the original blue-edge bands (see Supplementary Figure S2 for the list of retained features), yet XGBoost performance is identical to that of the full set (OA = 0.897, Kappa = 0.811). The high importance of these bands in the full set therefore reflects collinearity more than independent discriminative information, and the ecological interpretation of blue-edge importance should be considered directional rather than precise.
These results indicate that XGBoost draws on structural, pigment, physiological, and biophysical information, an approach consistent with ecological theory. In the full feature set, blue-region reflectance (~468 nm), key vegetation indices (PSSR, VOG), and structural metrics (TH) carry the highest importance. Practically, a compact feature set can preserve accuracy while improving efficiency in large-scale mapping. However, because the importance of specific blue-edge bands is largely driven by collinearity, dimensionality reduction should focus on feature families (e.g., spectral indices, texture, structure) rather than on individual narrow bands. Future work should test the consistency of this importance pattern across diverse forest types and sensor systems.
4.3. Feature Redundancy Analysis
To assess the impact of feature redundancy and multicollinearity on model performance and feature importance interpretation, we conducted a correlation-based filtering analysis. Pairwise Pearson correlation coefficients were computed among all 278 features using the training set. Feature pairs with an absolute correlation coefficient greater than 0.9 were identified, and one feature from each such pair was removed, resulting in a reduced set of 40 relatively uncorrelated features (only ~14.4% of the original 278). The optimal classifier (XGBoost) was retrained on this reduced set following the same protocol as in Section 3.2.
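The filtering step described above can be sketched as a greedy pass over the features. Which member of a correlated pair is dropped is not specified in the text, so the rule used here (keep the feature encountered first) is an assumption, as are the function names and the toy columns.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def correlation_filter(columns, threshold=0.9):
    """Keep a feature only if |r| <= threshold with every feature kept so far.

    columns maps feature name -> list of training-set values. Features are
    visited in insertion order, so the earlier member of a pair survives.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy training columns (hypothetical names and values).
toy_columns = {
    "band_468": [0.10, 0.12, 0.15, 0.11, 0.18],
    "band_471": [0.11, 0.13, 0.16, 0.12, 0.19],  # near-copy of band_468
    "tree_height": [4.0, 9.5, 3.2, 7.1, 5.0],
}
kept = correlation_filter(toy_columns)
```

Because adjacent hyperspectral bands are almost perfectly correlated, a filter of this kind keeps only one representative per spectral neighborhood, and the survivor depends on the visiting order. This is one reason why the absence of a specific band from the reduced set says little about that band's ecological relevance.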
The results are summarized in Supplementary Figure S1. XGBoost achieved an overall accuracy of 0.897 and a Kappa of 0.811 on the reduced feature set, identical to the performance obtained with the full 278-feature set. The training time decreased from 2.36 s to 0.53 s (a reduction of 77.5%).
Notably, although blue-edge bands ranked highly in the original feature importance analysis (Section 4.2), none of them were retained in the reduced set. This indicates that the high importance of these bands largely stemmed from collinearity with adjacent spectral bands, rather than from independent discriminative information. The fact that XGBoost performance remained unchanged after removing these bands demonstrates that the discriminative information originally attributed to specific blue-edge wavelengths can be equivalently captured by other features (e.g., vegetation indices, tree height, texture). This redundancy analysis supports the robustness of the model and suggests that, at least for this dataset, feature dimensionality can be reduced substantially without loss of accuracy while gaining computational efficiency. Detailed results, including the list of retained features and the correlation matrix, are provided in the Supplementary Materials.
4.4. Class-Wise Performance of the Optimal Model
The classification performance of the optimal model for each target species was evaluated using User’s Accuracy (UA), Producer’s Accuracy (PA), and F1-score (Figure 11). Per-species performance varied markedly and was closely tied to the number of training samples per class and to class distinctiveness, with additional difficulties arising from spectral and morphological similarities among certain species.
Sophora japonica (n = 688) achieved the highest F1-score (0.939), supported by high user’s accuracy (UA = 0.917) and exceptional producer’s accuracy (PA = 0.962), indicating robust discriminative learning and effective generalization. Shrubs (n = 156) also attained strong and balanced metrics (F1 = 0.917), suggesting that their distinctive structural or spectral signatures were well captured despite moderate sample size.
For Quercus variabilis (n = 116), the model delivered high recall (PA = 0.941) but moderate precision (UA = 0.842), implying that while most Quercus variabilis individuals were correctly identified, confusion occurred with other broadleaved species.
Greater difficulties emerged for Populus tomentosa (n = 108) and Ligustrum quihoui (n = 27). Populus tomentosa exhibited high UA (0.889) yet critically low PA (0.500), indicating that half of the Populus tomentosa individuals were misclassified, likely due to phenotypic similarity to coexisting tall species or insufficient coverage of intra-class variability. Ligustrum quihoui, with the fewest samples, performed poorly across all metrics (UA = PA = F1 = 0.500), reflecting a failure to learn class-specific features under extreme data scarcity. The normalized confusion matrix (Figure 12) further illustrates the specific misclassification patterns among species.
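The per-class F1-scores above are the harmonic mean of UA (precision) and PA (recall). The sketch below recomputes them from the reported UA and PA values and then forms two common averages. Weighting by the total sample counts n, instead of the per-class test-set support, is an assumption made for illustration.

```python
def f1_from_ua_pa(ua, pa):
    """Harmonic mean of user's accuracy (precision) and
    producer's accuracy (recall)."""
    return 2 * ua * pa / (ua + pa) if (ua + pa) > 0 else 0.0


# (sample count, F1) per class, from the values reported in this section.
# For Shrubs only the F1-score is reported, so it is used directly.
per_class = {
    "Sophora japonica": (688, f1_from_ua_pa(0.917, 0.962)),
    "Shrubs": (156, 0.917),
    "Quercus variabilis": (116, f1_from_ua_pa(0.842, 0.941)),
    "Populus tomentosa": (108, f1_from_ua_pa(0.889, 0.500)),
    "Ligustrum quihoui": (27, f1_from_ua_pa(0.500, 0.500)),
}

# Unweighted (macro) mean treats every class equally.
macro_f1 = sum(f1 for _, f1 in per_class.values()) / len(per_class)

# Sample-weighted mean lets abundant classes dominate.
total = sum(n for n, _ in per_class.values())
weighted_f1 = sum(n * f1 for n, f1 in per_class.values()) / total
```

With these inputs the unweighted mean is about 0.78 and the sample-weighted mean about 0.89. The gap shows how strongly a single summary F1 depends on the averaging convention when classes are imbalanced.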
These results demonstrate that classification performance is strongly conditioned by training set composition. Abundant or distinct classes achieve reliable discrimination, whereas minority classes or those with ambiguous spectral signatures are prone to omission or confusion. Addressing class imbalance—via data augmentation, strategic sampling, or incorporation of additional discriminative features—is therefore essential for improving model equity and robustness in future applications.
The variation in sample size among the five categories (Ligustrum quihoui, Populus tomentosa, Quercus variabilis, Sophora japonica, and Shrubs) reflects the ecological reality of the study area. Although this imbalance is associated with lower accuracy for minority classes (e.g., Ligustrum quihoui), the model maintained high overall robustness. Future improvements will involve exploring cost-sensitive learning or class-weighting strategies to further optimize the detection of underrepresented species without distorting their natural frequency in the landscape.
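One simple class-weighting strategy of the kind mentioned above assigns each class a weight inversely proportional to its frequency, w_c = N / (K · n_c). The sketch below applies this common "balanced" heuristic to the sample counts reported in this section. It illustrates the idea only and is not a method evaluated in the study.

```python
def balanced_class_weights(counts):
    """Weight each class by N / (K * n_c), the common 'balanced' heuristic."""
    total = sum(counts.values())
    k = len(counts)
    return {name: total / (k * n) for name, n in counts.items()}


# Sample counts reported in this section.
counts = {
    "Sophora japonica": 688,
    "Shrubs": 156,
    "Quercus variabilis": 116,
    "Populus tomentosa": 108,
    "Ligustrum quihoui": 27,
}
weights = balanced_class_weights(counts)
```

Under this heuristic a Ligustrum quihoui sample would count roughly 25 times as much as a Sophora japonica sample during training, while the field data themselves stay unchanged.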
4.5. Tree Species Classification Map of the Yushan Mining Area
The ultimate applied output of this study is the wall-to-wall tree species classification map for the Yushan mining area, presented in Figure 13. This map was generated by applying the optimized XGBoost model, validated in the preceding sections, to predict the species for every individual tree object detected across the entire study area.
The resulting classification map (Figure 13) visualizes the patterns quantified in Table 7. Shrubs constitute the most abundant class and exhibit a widespread, pervasive distribution across the study area, indicative of early successional stages or areas experiencing limited tree establishment. Sophora japonica, a recognized pioneer species, forms the second-largest component, often appearing in sizable, contiguous patches that likely correspond to areas of historical planting or vigorous natural colonization. Quercus variabilis is present in significant numbers but displays a more fragmented or clustered spatial pattern, potentially associated with specific microhabitats or later successional niches. The populations of Populus tomentosa and Ligustrum quihoui are minimal and highly localized, suggesting they are minor components in the current vegetation assemblage.
This spatially explicit inventory translates the model’s predictive capability into a tangible representation of the restoration landscape’s structure. The prevalence of shrubs and pioneer trees, as captured by the map, provides a critical baseline for assessing the current successional stage and informing future management interventions aimed at steering the ecosystem towards a more mature and diverse forest state.
5. Discussion
5.1. Interpretation of Model Performance and the Superiority of Data Fusion
Comparative analysis of the seven classifiers reveals two key insights that underpin the methodological contribution of this study.
First, XGBoost ranked first among all competing models, achieving the highest overall accuracy (0.897), Kappa coefficient (0.811), and macro F1-score (0.891). Its high Kappa value indicates substantial agreement beyond chance, supporting the reliability of the classification. Gradient Boosting delivered statistically equivalent accuracy but required roughly 2.6 times the training time (XGBoost’s 2.36 s represents a 61.7% reduction), highlighting XGBoost’s computational efficiency.
Second, the strong performance of tree-based ensemble methods, particularly boosting algorithms, can plausibly be attributed to the characteristics of the fused multi-source dataset. The integration of spectral, textural, and structural features produces a high-dimensional, heterogeneous feature space characterized by non-linear and interactive relationships with target classes. Tree-based models are inherently well suited to such complexity: they capture non-linear decision boundaries and hierarchical interactions via recursive partitioning, and their split selection acts as an implicit form of feature selection, which reduces sensitivity to noisy or redundant predictors.
Conversely, models relying on linear separability or distance-based metrics exhibited clear limitations. Logistic Regression achieved only moderate accuracy (OA = 0.812), suggesting an inability to fully model the non-linear decision surfaces embedded in the fused feature space. K-Nearest Neighbors (KNN), despite near-zero training time, suffered from degraded predictive performance (F1 = 0.787)—a manifestation of the curse of dimensionality, wherein distance metrics lose discriminative power in high-dimensional spaces. Support Vector Machine (SVM), despite its theoretical strength in high-dimensional settings, yielded the lowest overall accuracy. This is likely attributable to increased class overlap and distributional complexity in the fused feature space, which hinder the identification of a globally optimal separating hyperplane.
Collectively, the observed performance hierarchy (ensemble boosting > bagging > linear/instance-based models) is consistent with the benefits of the data fusion paradigm. The fused feature set provides rich, complementary information that non-linear models can effectively exploit, whereas simpler models do not capture its full potential. Thus, the synergy between comprehensive multi-source data fusion and the pattern recognition capacity of gradient boosting constitutes the foundation of the classification framework established in this study.
It should be noted that the visual appearance in Figure 13 is influenced by species spatial distribution patterns and cartographic overlay effects among vegetation of different heights; the exact quantitative comparison should be based on Table 7.
To contextualize our findings within the broader literature, Table 8 provides a qualitative comparison of key methodological aspects across relevant prior studies and the present work.
As summarized in Table 8, direct numerical comparison is hindered by substantial differences in data sources, environments, classifiers, and class definitions. Our study extends prior efforts by combining multi-source fusion with a systematic multi-classifier evaluation and formal statistical testing in a heterogeneous post-mining restoration landscape.
5.2. Ecological Insights from Feature Importance and Species Distribution Patterns
Beyond technical performance, the feature importance analysis of the optimized XGBoost model and the resultant wall-to-wall species map provide ecologically meaningful insights into forest recovery dynamics within the Yushan mining area.
5.2.1. Decoding Feature Importance: A Trait-Based Perspective
The prominence of blue-region reflectance, pigment- and red-edge-sensitive vegetation indices (PSSR, VOG, RVSI), and LiDAR-derived tree height among the top predictors (Section 4.2) suggests a trait-based foundation for species discrimination. High importance of photosynthesis-related spectral features indicates that foliar biochemistry, particularly pigment composition and nitrogen status, varies substantially among species, reflecting divergent strategies in light use, stress tolerance, or resource acquisition. For instance, the distinct spectral signature of Sophora japonica, a nitrogen-fixing pioneer, likely facilitates its reliable identification.
Maximum canopy height delineates a structural–successional gradient, serving as a proxy for competitive dominance, colonization timing, and life history strategy rather than a mere dimensional metric. Taller individuals, typically early colonizers or fast-growing species, dominate the overstory, whereas suppressed or understory cohorts indicate later successional status or microsite constraints. The model’s reliance on this metric underscores vertical stratification as a key axis of forest recovery.
5.2.2. Interpreting Spatial Distribution Patterns
The final classification map reveals non-random spatial aggregation of species, interpretable through the lens of restoration history and environmental heterogeneity. Contiguous patches of pioneer species likely reflect historical planting blocks, preserving the spatial legacy of initial rehabilitation interventions. In contrast, more dispersed or topographically constrained distributions of slower-growing, site-sensitive species suggest niche partitioning along moisture, radiation, or edaphic gradients.
Spatial mixing or abrupt boundaries between species patches further inform ecological processes such as competition, succession, and facilitation. Fine-scale interspersion may indicate natural regeneration or secondary succession, while sharp edges often demarcate anthropogenic planting limits or abrupt soil transitions. Thus, the map functions as a spatial hypothesis generator, identifying priority zones for targeted fieldwork on soil properties, microclimate, or planting history.
The XGBoost model effectively learns to discriminate species based on ecologically meaningful traits—biochemical and structural—that are directly linked to adaptive strategies and life-history variation. The resulting species distribution map transcends a mere classification product; it constitutes a spatially explicit snapshot of ongoing ecological processes—succession, competition, and niche differentiation—shaping the recovery trajectory of this anthropogenically disturbed ecosystem. This trait-mediated interpretation bridges remote sensing observations with foundational ecological theory, thereby enhancing the scientific and applied value of the methodological framework for restoration monitoring.
5.3. Implications for Restoration Monitoring and Management in the Yellow River Basin
The proposed framework, integrating UAV-based multi-source remote sensing with machine learning, offers transformative potential for evidence-based restoration in mining-disturbed regions of the ecologically fragile Yellow River Basin. Beyond proof-of-concept, it provides actionable solutions to key challenges in large-scale, long-term restoration monitoring and adaptive management.
5.3.1. Transition to Spatially Explicit, Quantitative Baselines
Conventional monitoring, based on plot surveys or moderate-resolution satellite imagery, fails to capture fine-scale heterogeneity in ecosystem recovery. This study provides high-resolution, wall-to-wall quantitative baselines that deliver precise metrics—such as pioneer species density, late-successional canopy cover, and woody–shrub configuration—across entire sites. These spatially explicit inventories establish objective reference states, enabling systematic quantitative evaluation of restoration interventions and successional trajectories.
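As a rough illustration of how such baseline metrics could be derived from a classified crown inventory, the sketch below computes per-species stem density and crown cover. The function name `baseline_metrics`, the record layout, and the numbers are hypothetical and are not taken from this study; crown cover is approximated by summing crown areas, which ignores overlap between crowns.

```python
from collections import defaultdict

def baseline_metrics(trees, site_area_m2):
    """Per-species stem density (stems/ha) and crown cover (fraction of site area).

    trees: iterable of (species_label, crown_area_m2) records (assumed layout).
    """
    site_area_ha = site_area_m2 / 10_000.0
    counts = defaultdict(int)
    crown_area = defaultdict(float)
    for species, crown_area_m2 in trees:
        counts[species] += 1
        crown_area[species] += crown_area_m2
    return {
        sp: {
            "stems_per_ha": counts[sp] / site_area_ha,
            # Sum of crown areas; overlapping crowns are not corrected for.
            "crown_cover": crown_area[sp] / site_area_m2,
        }
        for sp in counts
    }

# Hypothetical classified crowns: (species label, crown area in m^2)
trees = [
    ("Populus tomentosa", 12.0),
    ("Populus tomentosa", 8.0),
    ("Sophora japonica", 5.0),
]
metrics = baseline_metrics(trees, site_area_m2=20_000.0)
```

Recomputing the same table after each resurvey would give a simple, comparable record of how each species' density and cover change over time.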
5.3.2. Precision Restoration Informed by Spatial Ecology
The generated species distribution map encodes critical species–environment relationships. Coupling species patterns with LiDAR-derived topographic attributes (slope, aspect, topographic wetness index) allows identification of key species’ micro-habitat preferences. For instance, associating Populus tomentosa with moist depressions and Sophora japonica with dry south-facing slopes directly informs spatially targeted revegetation. Matching species to empirically delineated suitable habitats optimizes seed sourcing and planting, enhancing survival rates, reducing costs, and accelerating self-sustaining ecosystem development. Moreover, mapping restoration-stalled areas (e.g., persistent shrublands) enables prioritization of secondary interventions.
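A minimal sketch of the topographic step is given below, assuming a north-up elevation raster (row index increasing southward, column index increasing eastward) and a square cell size. It derives slope and aspect with finite differences and samples them at treetop positions. The function name `slope_aspect`, the toy elevation grid, and the tree positions are illustrative assumptions. The topographic wetness index is not computed here because it additionally requires a flow-accumulation step.

```python
import numpy as np

def slope_aspect(dem, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north, downslope direction).

    Assumes a north-up raster: rows increase southward, columns increase eastward.
    """
    dz_drow, dz_dcol = np.gradient(dem.astype(float), cell_size)
    slope = np.degrees(np.arctan(np.hypot(dz_drow, dz_dcol)))
    # Downslope vector: east component = -dz/dcol, north component = +dz/drow
    aspect = np.degrees(np.arctan2(-dz_dcol, dz_drow)) % 360.0
    return slope, aspect

# Toy surface: elevation falls 2 m per row (toward the south) with 2 m cells,
# i.e. a uniform 45-degree south-facing slope.
rows = np.arange(5, dtype=float)[:, None]
dem = 100.0 - 2.0 * rows + np.zeros((5, 5))
slope, aspect = slope_aspect(dem, cell_size=2.0)

# Hypothetical treetop positions as (row, col) indices into the raster
tree_rc = np.array([[1, 1], [3, 2]])
tree_slope = slope[tree_rc[:, 0], tree_rc[:, 1]]
tree_aspect = aspect[tree_rc[:, 0], tree_rc[:, 1]]
```

Grouping `tree_slope` and `tree_aspect` by predicted species label would then give a first, descriptive view of each species' micro-habitat range.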
5.3.3. A Cost-Effective Adaptive Management Framework
Rapid UAV surveys, coupled with a machine learning pipeline, establish a repeatable, scalable, and cost-effective monitoring protocol. Biennial or seasonal resurveys using consistent methods enable longitudinal assessment of forest dynamics, informing adaptive management through data-driven adjustments. Repeat species maps can reveal native species expansion, while spectral stress indices (e.g., the Photochemical Reflectance Index, PRI) can flag declining canopy health, offering early warning signals for timely intervention. This approach reconfigures restoration management from static, project-based efforts into a dynamic, feedback-driven process, which is critical for achieving long-term sustainability in the complex and variable Yellow River Basin.
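To make the PRI example concrete, the sketch below uses the commonly cited form PRI = (R531 − R570) / (R531 + R570) and compares crown-level values between two survey dates. The band centres, the nearest-band lookup, and the reflectance values are illustrative assumptions; whether a given sensor provides suitable bands near 531 nm and 570 nm must be checked against its specification.

```python
import numpy as np

def pri(reflectance, wavelengths_nm):
    """PRI = (R531 - R570) / (R531 + R570), using the bands nearest 531 and 570 nm.

    reflectance: array whose last axis indexes spectral bands.
    """
    wl = np.asarray(wavelengths_nm, dtype=float)
    r531 = reflectance[..., np.argmin(np.abs(wl - 531.0))]
    r570 = reflectance[..., np.argmin(np.abs(wl - 570.0))]
    return (r531 - r570) / (r531 + r570)

# Hypothetical band centres and mean crown spectra for one tree on two dates
wavelengths = [500.0, 530.0, 570.0, 600.0]
crown_t1 = np.array([[0.05, 0.06, 0.04, 0.05]])
crown_t2 = np.array([[0.05, 0.05, 0.05, 0.05]])

pri_t1 = pri(crown_t1, wavelengths)
pri_t2 = pri(crown_t2, wavelengths)
pri_change = pri_t2 - pri_t1  # a negative change may merit a field check
```

A change threshold for flagging crowns would need to be calibrated against field observations; none is proposed here.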
5.4. Limitations and Future Research
While this study presents an effective framework for tree species classification in a coal mining context, several limitations should be acknowledged to contextualize the findings and guide subsequent work.
5.4.1. Limitations of the Current Study
(1) Sample size and class imbalance: Despite a total of 1095 samples, severe class imbalance persisted. Although partially mitigated via class weighting, this likely biased performance toward dominant species and limited reliable detection of rare taxa. Thus, the demonstrated high accuracy primarily applies to species with adequate training samples; for severely underrepresented classes, the framework remains exploratory and requires further validation.
(2) Spatial and temporal specificity: Data were collected from a single mining area during a single phenological stage. Model transferability to other regions, forest types, or seasons remains unvalidated; captured spectral–structural signatures are temporally constrained and may not reflect full annual cycles.
(3) Minimal use of LiDAR structural information: This study extracted only tree height (TH) from the LiDAR data. Richer three-dimensional metrics—such as crown volume, canopy porosity, vertical complexity, and point density profiles—were not used. Consequently, the term “LiDAR fusion” in this paper refers to the integration of a single structural summary rather than a comprehensive set of LiDAR-derived features. The full potential of LiDAR point clouds for species discrimination in complex mining environments remains untapped here and awaits future investigation.
(4) High-dimensional feature space relative to sample size: The fused feature set (278 features) is high-dimensional relative to the sample size (1095 samples). Although tree-based ensemble methods such as XGBoost incorporate regularization (e.g., L1/L2 penalties) and column subsampling to mitigate overfitting, the risk remains non-negligible, especially for minority classes. In this study, we did not perform explicit feature selection before model training. Future work should systematically apply feature selection or dimensionality reduction techniques, such as recursive feature elimination (RFE), Boruta, or principal component analysis (PCA), to identify the most discriminative features or a compact set of derived components. Such approaches could further enhance model generalizability, reduce computational overhead, and, in the case of feature selection, improve interpretability, particularly when transferring the framework to other mining restoration sites with different species compositions.
(5) Potential multi-source registration errors: Although we carefully co-registered the LiDAR, hyperspectral, and RGB data to a common spatial resolution (0.14 m) and coordinate system, residual misalignment—especially in areas with steep terrain or complex crown boundaries—cannot be completely ruled out. Such registration errors may introduce noise in feature extraction (e.g., mismatches between spectral signatures and canopy height) and could disproportionately affect smaller crowns or species with irregular geometries. Future work should quantify registration uncertainty (e.g., using ground control points or mutual information metrics) and develop co-registration refinement strategies tailored to heterogeneous mining landscapes. Additionally, using a multi-scale feature extraction window or probabilistic feature assignment could help mitigate the impact of residual misalignments.
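One simple way to quantify the registration uncertainty mentioned in item (5) is the planimetric root-mean-square error over paired check points, expressed in pixels of the common grid. The sketch below assumes that matched reference and observed coordinates are already available. The function name `registration_rmse` and the coordinates are hypothetical; only the 0.14 m resolution comes from this study.

```python
import math

def registration_rmse(reference_xy, observed_xy):
    """Planimetric RMSE (same units as the coordinates) over paired check points."""
    if len(reference_xy) != len(observed_xy) or not reference_xy:
        raise ValueError("need equal, non-empty point lists")
    squared_errors = [
        (xo - xr) ** 2 + (yo - yr) ** 2
        for (xr, yr), (xo, yo) in zip(reference_xy, observed_xy)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical check points in metres: surveyed positions vs. positions
# read from one co-registered layer
reference = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
observed = [(0.3, 0.4), (10.0, 0.0), (0.0, 10.0)]

rmse_m = registration_rmse(reference, observed)
rmse_px = rmse_m / 0.14  # expressed in pixels of the 0.14 m grid
```

Reporting this value per sensor pair (for example LiDAR to hyperspectral) would show whether misalignment is small relative to typical crown diameters.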
5.4.2. Future Research Directions
(1) Expanding data diversity and volume: Future efforts should prioritize larger, balanced, and multi-temporal datasets. Incorporating seasonal acquisitions would enable phenology-aware classification and improve discrimination of deciduous vs. evergreen species. Extending the framework to multiple restoration sites would facilitate systematic assessment of model transferability.
(2) Advanced LiDAR feature engineering: Full exploitation of 3D point cloud data—including crown volume, vertical foliage profiles, and point density metrics—promises to enrich structural feature sets and narrow performance gaps between spectrally similar species.
(3) Incorporation of environmental context: Integrating auxiliary spatial data—topographic derivatives, soil maps, or planting records—would enable models to learn species–environment relationships, improving predictive accuracy in sparsely vegetated or topographically complex areas and enhancing ecological realism.
(4) Robust feature importance evaluation: Given the high dimensionality and collinearity among spectral features, future work should apply more robust importance measures (such as permutation importance, SHAP values, or grouped importance by feature family) to validate ecological interpretations derived from mean decrease in impurity. The correlation filtering analysis in Section 4.3 demonstrates that collinearity can substantially inflate the importance of individual narrow bands; therefore, caution is warranted when attributing ecological meaning to specific wavelengths without proper decorrelation.
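As one possible form of the grouped importance suggested in item (4), the sketch below shuffles all columns of a feature family together and records the mean drop in accuracy, so that collinear bands within a family cannot stand in for one another. It is model-agnostic and takes any prediction function. The function name, the family definitions, and the toy data are illustrative assumptions and do not reproduce the analysis in this study.

```python
import numpy as np

def grouped_permutation_importance(predict, X, y, groups, n_repeats=10, seed=0):
    """Mean drop in accuracy when all columns of a feature family are shuffled together."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importance = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            perm = rng.permutation(X.shape[0])
            # Shuffle rows jointly for every column in this family
            X_perm[:, cols] = X[perm][:, cols]
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importance[name] = float(np.mean(drops))
    return importance

# Toy data: the label depends only on the two "spectral" columns
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))

def predict(A):
    # Stand-in for a fitted classifier's predict method
    return (A[:, 0] + A[:, 1] > 0).astype(int)

y = predict(X)
groups = {"spectral": [0, 1], "texture": [2], "height": [3]}  # hypothetical families
imp = grouped_permutation_importance(predict, X, y, groups)
```

In practice the importance should be computed on held-out data, and `predict` would be the trained classifier's prediction method.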
6. Conclusions
This study developed and evaluated an integrated framework for the classification of individual tree species within a complex coal mining landscape of the Yellow River Basin. By integrating UAV-based hyperspectral data (full bands and indices), RGB textural features, and a single LiDAR-derived structural metric (tree height), and by conducting a statistically grounded evaluation of seven machine learning classifiers, the research provides clear insights for both methodological advancement and practical ecological management.
Our principal findings are threefold. First, the systematic comparative analysis, reinforced by statistical significance testing, identified XGBoost as the best-performing classifier. It achieved the highest test set performance (Overall Accuracy = 0.897, Kappa = 0.811) with stable results, indicating a strong capacity to model the complex, non-linear interactions within the high-dimensional, multi-source feature space. Second, the analysis demonstrated the importance of multi-sensor data fusion. The complementary information from LiDAR (tree height), hyperspectral imagery (biochemical properties), and RGB data (crown texture) contributed to high classification accuracy, and feature importance analysis indicated that spectral and structural features were the primary drivers of model performance. Third, the operational workflow, from effective treetop detection using an adapted WST-Ncut method in a heterogeneous landscape to the application of the optimal model, successfully generated a high-resolution, wall-to-wall tree species distribution map for the entire Yushan mining area.
The primary methodological contribution of this work is the establishment of a reproducible, robust, and statistically evaluated processing chain for individual tree species classification, encompassing multi-sensor data fusion, comprehensive model evaluation, and operational large-area mapping. Beyond methodology, this study delivers significant practical value. The resulting detailed species map and the underlying framework equip ecological restoration managers in the Yellow River Basin with a potentially powerful tool for establishing quantitative baselines, enabling precision restoration planning, and facilitating long-term monitoring through adaptive management. This work demonstrates that the integration of multi-source UAV remote sensing with advanced machine learning offers a precise, scalable, and efficient solution for monitoring ecosystem recovery, thereby contributing directly to the sustainable management of ecologically vulnerable regions. We explicitly acknowledge that LiDAR data were severely underutilized in this study. Future work must incorporate richer three-dimensional structural metrics to more fully exploit the complementary information offered by LiDAR point clouds. Given the study’s limitations (single site, single season, class imbalance, and minimal LiDAR feature exploitation), the framework should be viewed as a demonstration of potential rather than a fully operational solution. Future work across multiple sites and seasons is necessary to assess generalizability.