1. Introduction
Accurate spatial distribution and quantitative information on forest vegetation are crucial for sustainable forest management in natural environments, where a diverse range of tree species provides economic, biological, and ecosystem value. Over the past two decades, researchers have made significant efforts to obtain forest vegetation information from remote sensing imagery, especially as satellite data with high temporal, spatial, and spectral resolution have become more widely available [1,2,3,4]. However, two key challenges remain. (1) High-temporal-resolution satellite imagery, such as the Sentinel series, offers unique advantages for vegetation phenology monitoring because of its high observation frequency, but its relatively low spatial resolution cannot meet the demands of fine-scale mountain vegetation classification. This contradiction is particularly pronounced in regions with complex topography. First, the high fragmentation of vegetation patches in mountainous environments means that individual low-resolution pixels contain mixed vegetation types, causing severe mixed-pixel effects. Second, accurate extraction of the vertical structure of mountain vegetation requires sub-meter spatial detail, whereas existing high-temporal-resolution satellite imagery fails to capture critical transition-zone features, such as boundaries between mixed-species stands. (2) A robust analytical method is needed to integrate images from different sensors for the refined classification of mountainous vegetation.
Researchers have investigated the impact of temporal resolution and phenology on the extraction of vegetation information in natural environments using high-spatial-resolution imagery [5,6,7,8], and have demonstrated the advantages of multi-temporal imagery over single-date high-resolution imagery for vegetation classification [9,10,11]. Multi-temporal high-resolution imagery is crucial for capturing the key phenological characteristics of tree species during different transitional periods. In wetland environments, Kristie [12] explored the use of bi-seasonal and single-date WorldView-2 data to map four adjacent plant communities and three mangrove types; the results indicated that bi-seasonal data distinguished all areas of interest more effectively than single-date data. For forested areas, Li Zhe [13] found, based on GF-2 images from January, April, and November, that among image timing, feature selection, and classifier choice, timing had the greatest impact on tree species classification results. These previous studies were limited by the timing of image acquisition, with each covering only 2–3 phenological phases of the growing season. More systematic research that uses images captured in each phenological phase across all four seasons is therefore needed to fill this gap. Such studies would provide a comprehensive understanding of the importance of phenology at different taxonomic levels, facilitating the refined classification of mountainous vegetation.
With the rapid development of remote sensing technology, UAV aerial imagery has demonstrated unique advantages in vegetation classification owing to its centimeter-level spatial resolution and flexible acquisition cycles [14,15]. Compared with traditional satellite remote sensing, UAV platforms can overcome cloud interference and revisit-cycle limitations. Particularly in heterogeneous terrain such as mountainous areas and urban green spaces, their multi-angle and multi-temporal data acquisition capabilities effectively capture spatiotemporal variations in canopy structure and phenological features, providing a data foundation for refined classification [16]. For instance, Wen Yuting et al. [17] explored technical pathways for precise tree species classification in small regions using UAV visible-light imagery combined with object-oriented classification methods, verifying its application potential in forest resource management. She Jie et al. [18] achieved vegetation classification accuracy exceeding 91% by integrating UAV images with object-oriented segmentation and machine learning algorithms (Support Vector Machine and Random Forest). These studies indicate that the high spatial resolution of UAV imagery can significantly alleviate mixed-pixel issues in mountainous environments, while its flexible temporal combinations better capture spectral differences during critical phenological phases, thereby enhancing classification accuracy in complex scenarios.
Due to limitations in data availability and issues such as cloud contamination, acquiring imagery with both high spatial resolution and high temporal frequency can be challenging. Combining images from different sensors yields more comprehensive and accurate land cover information. For instance, Wang Erli et al. [19] utilized two GF-1 images and two Pleiades images, while Jin Jia [20] combined GF satellite imagery with resource satellite imagery. By analyzing vegetation phenology across three seasonal images and comparing multi-temporal with single-date classification results, they demonstrated that multi-source, multi-temporal remote sensing imagery can significantly enhance classification accuracy in forests with complex species distributions. With the continuous advancement of remote sensing technology, UAV aerial imagery has shown great potential for vegetation classification [21,22,23]. By complementing UAV imagery with high-resolution satellite imagery, data from multiple sources and seasons can capture greater variation in vegetation conditions, promising to enhance classification accuracy.
High-resolution remote sensing imagery provides rich spatial and spectral information, creating a multidimensional feature space for forest classification. However, it also presents challenges. Traditional pixel-based classification methods struggle to mitigate the “salt-and-pepper effect”, which arises from significant spectral variability among similar land covers, and the issues of “same object, different spectra” and “different objects, same spectra” exacerbate the misclassification of land cover types [24]. To address the characteristics of high-spatial-resolution imagery, researchers have explored object-oriented classification techniques. Multiple studies have shown that object-oriented methods can effectively reduce the impact of these issues and enhance overall classification precision. In object-oriented classification, the image is first segmented into independent objects, and vegetation classification is then performed on those objects. This approach effectively exploits texture, spectral, and other information to improve classification accuracy and has shown promising results for complex forest ecosystems [25,26,27,28]. Non-parametric classifiers, such as Random Forest (RF), are often the preferred choice for tree species mapping because of their generally higher predictive accuracy; in addition, RF classifiers do not require normally distributed input data. Notably, performance disparities among classifiers are particularly critical in mountain vegetation classification. Lin Yi et al. [29] conducted a comparative study using UAV imagery and found that when the study area was divided into 10 categories, an object-oriented Support Vector Machine achieved significantly higher overall accuracy under small-sample conditions (OA = 90%) than the Mobile-Unet deep learning model (OA = 84%). These results suggest that traditional classification methods may outperform deep learning in mountainous regions with complex vegetation types and difficult sample collection. Although the feature self-learning capability of deep learning is theoretically appealing, its reliance on large-scale annotated data fundamentally conflicts with the scarcity of vegetation samples in mountainous areas. Coupled with the limited generalization caused by insufficient model interpretability, object-oriented classification methods remain the more robust choice under current conditions.
In this study, our objectives are (1) to explore a technique for integrating multi-temporal imagery from different sensors for the refined classification of mountainous vegetation; (2) to identify the key input variables for accurately distinguishing 11 vegetation types; and (3) to determine which datasets and seasonal images yield the best accuracy for mountainous vegetation classification. Our research utilizes four high-resolution remote sensing images with different imaging times and data sources: two aerial images, one WorldView-2 image, and one UAV image. By combining hierarchical extraction based on vegetation phenology with a Random Forest model, we aim to map the 11 vegetation types in the study area.
3. Research Methods
This study focuses on 11 typical vegetation types in Yuntai Mountain, combining stratified extraction based on vegetation phenology with object-oriented classification to obtain vegetation information for the area. The study first analyzes the spectral characteristics of vegetation across the images, then extracts vegetation indices and textural features. Vegetation whose spectral indices differ across seasonal phases is extracted in a stratified manner; vegetation that cannot be separated by phenological differences is extracted with an object-oriented classification method. The classification results are then compared with the single-date classification results. The technical approach is illustrated in Figure 2.
3.1. Spectral Features
3.1.1. Band Spectral Features
Vegetation exhibits spectral reflectance characteristics that are distinctly different from those of other typical land features, such as soil, water bodies, and buildings. Commonly used spectral features include the mean, brightness, variance, and band gray ratios. Driven by phenology, each vegetation type shows significant seasonal changes in these spectral features due to variations in pigment content, leaf moisture, and other factors, so the spectral separability between vegetation types also varies by season. Spectral feature analysis of multi-temporal remote sensing data is therefore essential for extracting vegetation type information for Yuntai Mountain.
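The per-object statistics named above (mean, brightness, variance, band gray ratio) can be sketched in a few lines of NumPy. The toy 4-band image, the band order (B, G, R, NIR), and the two-object label map are illustrative assumptions, not data from the study:

```python
import numpy as np

# Hypothetical 4-band image (rows, cols, bands) with assumed band order
# B, G, R, NIR, plus a label map assigning each pixel to an image object.
rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(6, 6, 4))
objects = np.zeros((6, 6), dtype=int)
objects[:, 3:] = 1  # two toy objects: left half (0) and right half (1)

def band_spectral_features(image, objects, obj_id):
    """Per-object band means, brightness, std deviations, and a G/R ratio."""
    pixels = image[objects == obj_id]   # (n_pixels, bands) for this object
    mean = pixels.mean(axis=0)          # per-band mean
    brightness = mean.mean()            # average intensity over all bands
    std = pixels.std(axis=0)            # per-band variability
    ratio_g_r = mean[1] / mean[2]       # green/red band gray ratio
    return mean, brightness, std, ratio_g_r

mean, brightness, std, ratio = band_spectral_features(image, objects, 0)
```

Computing these statistics per object rather than per pixel is what makes them usable as classification features after segmentation.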
3.1.2. Multispectral Image Vegetation Indices
Vegetation indices are also widely used for extracting information on different land cover types. They can qualitatively and quantitatively evaluate vegetation health and coverage, and they show clear temporal relevance: the index values of a given vegetation type differ across seasonal phases. To enhance the reliability of vegetation information extraction for Yuntai Mountain, this study applies the Normalized Difference Vegetation Index (NDVI) [30] to the summer image.
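The standard NDVI formula, (NIR − R) / (NIR + R), can be computed directly on the corresponding bands; the small epsilon guarding division by zero and the sample values below are illustrative additions:

```python
import numpy as np

def ndvi(nir, red, eps=1e-10):
    """Normalized Difference Vegetation Index: (NIR - R) / (NIR + R)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    # eps avoids division by zero over water or shadow pixels
    return (nir - red) / (nir + red + eps)

nir = np.array([[0.6, 0.5], [0.4, 0.1]])
red = np.array([[0.1, 0.1], [0.2, 0.1]])
summer_ndvi = ndvi(nir, red)  # dense vegetation approaches 1, bare areas near 0
```

NDVI is bounded in [−1, 1], which makes thresholds on the summer image straightforward to interpret when separating vegetated from non-vegetated objects.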
3.1.3. Visible Light Vegetation Indices
Since the aerial image used in this study includes only the three visible-light bands red (R), green (G), and blue (B), and lacks a near-infrared band, common vegetation indices cannot be computed from it. However, within the visible range, the reflectance of different vegetation types varies across bands, so researchers have combined two or three bands to enhance the characteristics of a specific band and distinguish vegetation types. Commonly used visible-light vegetation indices include the Normalized Green–Blue Difference Index (NGBDI), Excess Red Index (EXR), Excess Green Index (EXG), Red–Green–Blue Vegetation Index (RGBV), and Excess Green–Excess Red Difference Index (EXGR); the calculation formulas are shown in Table 3 [31,32,33,34].
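Several of these indices can be sketched as follows. Table 3 is not reproduced here, so the formulas below use definitions common in the literature, which may differ in detail from those adopted in the paper:

```python
import numpy as np

# Visible-light index formulas as commonly defined in the literature
# (an assumption; see the paper's Table 3 for the definitions it uses).
def visible_light_indices(r, g, b, eps=1e-10):
    """Compute EXG, EXR, EXGR, and NGBDI from red/green/blue reflectance."""
    r, g, b = (np.asarray(x, dtype=float) for x in (r, g, b))
    exg = 2.0 * g - r - b                 # Excess Green Index
    exr = 1.4 * r - g                     # Excess Red Index
    exgr = exg - exr                      # Excess Green minus Excess Red
    ngbdi = (g - b) / (g + b + eps)       # Normalized Green-Blue Difference
    return {"EXG": exg, "EXR": exr, "EXGR": exgr, "NGBDI": ngbdi}

idx = visible_light_indices(r=np.array([0.2]), g=np.array([0.5]), b=np.array([0.1]))
```

Because each index amplifies the green band relative to red and blue in a different way, a set of them together discriminates vegetation types better than any single one.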
3.2. Object-Oriented Multi-Scale Segmentation
3.2.1. Multi-Scale Segmentation Based on Multi-Temporal Image Classification
Unlike traditional pixel-based classification, object-oriented classification processes image objects, obtained through multi-scale segmentation, as the fundamental units. Multi-scale segmentation can account for both the macroscopic and microscopic features of land cover, effectively exploiting the spectral and texture information of high-resolution remote sensing imagery to achieve better extraction results, and is therefore widely applied in land cover extraction experiments. Commonly used segmentation parameters include the segmentation scale, shape factor, and compactness.
Stratified extraction of vegetation information requires careful consideration of the shape, distribution, and coverage area of the classification targets. Vegetation is a key focus of the study and covers the research area extensively. At a larger scale, the vegetated extent can be segmented effectively while non-vegetated areas are separated fairly completely. Building on the vegetation segmentation, repeated tests confirmed that a segmentation scale of 150 effectively extracts the extent of evergreen forests in the winter image. Evergreen forest is relatively dispersed in the study area, and a smaller segmentation scale can isolate the sporadically distributed Pinus thunbergii. The deciduous forests identified through decision tree classification include Catalpa bungei and Castanea mollissima; both cover relatively small areas, and tests revealed that a scale of 50 effectively segments the small patches of Catalpa bungei. Specific segmentation parameters are shown in Table 4.
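The multiresolution segmentation used in eCognition is proprietary, so the sketch below substitutes a much simpler mechanism for forming image objects: 4-connected component labeling of a thresholded band. It illustrates only the idea that pixels are grouped into discrete objects before classification, not the algorithm actually used in the study:

```python
import numpy as np
from collections import deque

# Simplified stand-in for multiresolution segmentation: form "objects"
# by 4-connected labeling of a boolean mask (e.g., NDVI above a threshold).
def label_objects(mask):
    """Label 4-connected True regions of a boolean mask; 0 = background."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] and labels[i, j] == 0:
                current += 1          # start a new object
                queue = deque([(i, j)])
                labels[i, j] = current
                while queue:          # breadth-first flood fill
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels

mask = np.array([[1, 1, 0], [0, 0, 0], [0, 1, 1]], dtype=bool)
labels = label_objects(mask)  # two disconnected patches -> two objects
```

Real multiresolution segmentation additionally merges regions under scale, shape, and compactness constraints, which is what the scale parameters 50–150 above control.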
3.2.2. Multi-Scale Segmentation Based on Single-Temporal Image Classification
Before vegetation classification, the four images are first subjected to multi-scale segmentation. With a shape factor of 0.3 and compactness of 0.5, six scales (50, 60, 70, 80, 90, and 100) were tested. As shown in Table 5, for the spring image a segmentation scale of 60 effectively separates the main vegetation types, such as Cunninghamia lanceolata and Quercus acutissima. In the summer image, the lush growth during this phase lowers the separability between vegetation types; at a scale of 80, roads are segmented completely without fragmentation and the segmentation among vegetation types is satisfactory. In the autumn image, the Catalpa bungei are in bloom, appearing as white circular shapes, and are effectively extracted at a scale of 80. For the winter image, which captures only the evergreen component, comparison of the segmentation at different scales shows that a scale of 60 effectively extracts sporadic evergreen vegetation, such as Pinus thunbergii and Pinus taeda.
3.3. Feature Selection
Due to the diversity and complexity of mountain vegetation, selecting too few features makes it difficult to represent the differences among vegetation types adequately, yielding unsatisfactory classification results, while selecting too many causes data redundancy and imposes a heavy computational burden on the classifier. Applying defined selection rules to identify the optimal feature combination for each image and segmentation scale is therefore expected to relieve the classifier's computational load and reduce time costs while achieving better classification results.
This study selected 20 spectral features, including Mean, Mode, Quantile, Standard Deviation, Skewness, Hue, Saturation, and Intensity, and 14 textural features, including GLCM Homogeneity, GLCM Dissimilarity, GLDV Entropy, and GLDV Mean. In total, 34 features were used for the spring image, 79 for the summer image, 32 for the autumn image, and 34 for the winter image. The feature space optimization tool in eCognition 9.0 was used to select the feature combinations most suitable for classifying the vegetation of Yuntai Mountain. Feature Space Optimization finds the optimal combination of classification features for a nearest-neighbor classifier by comparing the features of the selected classes and identifying the combination that produces the largest average minimum distance (maximum separation) between samples of different classes.
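The idea behind Feature Space Optimization can be sketched as follows: score each candidate feature subset by the smallest distance between samples of different classes, and keep the subset with the largest score. The square-root dimensionality scaling (so subsets of different sizes are comparable) and the toy data are assumptions; the exact criterion implemented in eCognition is not reproduced here:

```python
import numpy as np
from itertools import combinations

def separation(X, y):
    """Smallest between-class Euclidean distance, scaled by sqrt(dimension)
    so that feature subsets of different sizes can be compared (assumed
    normalization, not necessarily eCognition's)."""
    d = X.shape[1]
    best = np.inf
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            if y[i] != y[j]:
                best = min(best, np.linalg.norm(X[i] - X[j]) / np.sqrt(d))
    return best

def optimize_feature_space(X, y, max_dim):
    """Exhaustively search all subsets of up to max_dim features."""
    best_subset, best_sep = None, -np.inf
    for k in range(1, max_dim + 1):
        for subset in combinations(range(X.shape[1]), k):
            sep = separation(X[:, subset], y)
            if sep > best_sep:
                best_subset, best_sep = subset, sep
    return best_subset, best_sep

# Toy samples: feature 0 separates the two classes, feature 1 is noise.
X = np.array([[0.0, 5.0], [0.1, 0.0], [1.0, 5.1], [0.9, 0.1]])
y = np.array([0, 0, 1, 1])
subset, sep = optimize_feature_space(X, y, max_dim=2)
```

Exhaustive search is exponential in the number of features, which is one reason the tool matters when 34–79 candidate features are available per image.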
3.4. Vegetation Classification Methods
3.4.1. Decision Tree Model
Decision trees are a commonly used supervised classification method that is easy to understand and implement. Operators do not need extensive background knowledge and can attain satisfactory classification results relatively quickly, so the method is widely applied in remote sensing, particularly for land cover classification based on multiple images. For regions with widespread distribution and diverse land cover types, the object-oriented CART decision tree method can efficiently achieve good classification results. Yao Bo [35] compared the maximum likelihood, object-oriented nearest-neighbor, and CART decision tree methods based on multi-scale segmentation; the results indicated that the decision tree method is well suited for extracting vegetation information in coastal wetlands. In response to the seasonal differences in vegetation spectra, Lei Guangbin [36] designed five classification strategies for vegetation information extraction using the object-oriented CART decision tree method on multi-source, multi-temporal remote sensing images. These studies indicate that hierarchical extraction via decision tree classification can quickly yield satisfactory results when vegetation spectra differ across seasons. Therefore, this paper constructs a CART decision tree on the eCognition 9.0 platform to implement hierarchical classification of the vegetation of Yuntai Mountain.
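The hierarchical logic of such a tree can be sketched as a cascade of seasonal rules: objects green in winter are evergreen, and the remainder are split by an autumn index. All thresholds and class labels below are hypothetical placeholders, not the rules derived in the study:

```python
# Illustrative stratified (CART-style) rule cascade; thresholds and class
# names are assumptions for demonstration, not the study's fitted tree.
def classify_object(winter_ndvi, autumn_exg):
    """Assign a coarse class from two seasonal per-object features."""
    if winter_ndvi > 0.4:        # green canopy in winter -> evergreen
        return "evergreen forest"
    if autumn_exg > 0.3:         # still green in autumn -> late senescence
        return "deciduous (late senescence)"
    return "deciduous (early senescence)"
```

Each split consumes the seasonal feature that best separates the remaining classes, which is exactly why imagery from multiple phenological phases makes the hierarchy possible.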
3.4.2. Random Forest Model
Random Forest is an ensemble classifier consisting of multiple decision trees: random vectors are drawn from the input data, multiple decision trees are built on them, and their outputs are combined. Random Forest can quickly process a large number of input variables while effectively balancing errors, making it widely used in vegetation classification. Berhane [37], working with high-resolution satellite imagery, compared the classification performance of several non-parametric classifiers, including Decision Tree (DT), Rule-Based (RB), and Random Forest (RF). Although all the classifiers proved suitable, the RF method outperformed both DT and RB, achieving an overall accuracy (OA) greater than 81%. Zhang [38] extracted coastal wetland information from GF-3 polarimetric SAR imagery using an object-oriented Random Forest algorithm; with the optimal feature combination, the classification accuracy exceeded 90%.
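A minimal Random Forest workflow looks as follows, here using scikit-learn as a stand-in implementation and synthetic per-object features (an NDVI-like value and a texture-like value); the study itself trains on features extracted from segmented image objects, not on random data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class "object" features: class 0 mimics forest objects
# (high NDVI, smooth texture), class 1 non-forest. Illustrative only.
rng = np.random.default_rng(42)
n = 200
ndvi = np.concatenate([rng.normal(0.7, 0.05, n), rng.normal(0.3, 0.05, n)])
texture = np.concatenate([rng.normal(0.2, 0.1, n), rng.normal(0.6, 0.1, n)])
X = np.column_stack([ndvi, texture])
y = np.array([0] * n + [1] * n)  # 0 = forest, 1 = non-forest (toy labels)

# 100 trees, each fit on a bootstrap sample with random feature subsets
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
prediction = clf.predict([[0.72, 0.15]])  # object inside the forest cluster
```

The fitted model also exposes `feature_importances_`, which is one practical way to see which spectral or textural inputs drive the classification.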