Geo-Object-Based Vegetation Mapping via Machine Learning Methods with an Intelligent Sample Collection Scheme: A Case Study of Taibai Mountain, China

Abstract: Precise vegetation maps of mountainous areas are of great significance for understanding the state of the ecological environment and forest resources. Although multi-source geospatial data can now generally be obtained quickly, samples remain difficult to collect in mountainous areas because of perilous terrain and inaccessible deep forest. To realize effective vegetation mapping under these conditions, we propose a novel and intelligent method of sample collection for machine-learning (ML)-based vegetation mapping. First, we employ geo-objects (i.e., polygons) from topographic partitioning and constrained segmentation as basic mapping units and formalize the problem as a supervised classification process using ML algorithms. Second, a previously available vegetation map with rough-scale label information is overlaid on the geo-object-level polygons, and candidate geo-object-based samples are identified when all the grids within a geo-object carry the same vegetation-type label. Third, various kinds of geo-object-level features are extracted from high-spatial-resolution remote sensing (HSR-RS) images and multi-source geospatial data, and unreliable geo-object-based samples are rejected from the candidate set by comparing their features with rules based on local expert knowledge. Finally, based on these automatically collected samples, we train the model using a random forest (RF)-based algorithm and classify all the geo-objects with labels of vegetation types. A case experiment on Taibai Mountain in China shows that the methodology can achieve good vegetation mapping results with a rapid and convenient sample collection scheme. The resulting map, with a finer geographic distribution pattern of vegetation, could clearly promote the investigation and monitoring of vegetation resources in the study area; thus, the methodological framework is worth popularizing in mapping areas, such as mountainous regions, where field survey sampling is difficult to implement.


Introduction
Mountain ecosystems, including rugged plateaus, hills, and mountains themselves, are important climate regulators among Earth's landforms [1]. Forest vegetation, as the most important component of mountain ecosystems, is sensitive to climate change and can thus reveal the dynamics of the local environment [2]. Analyzing the spatial distribution of mountain vegetation therefore helps to reveal the regularity and regional differences of a mountainous geographic environment. However, conventional land cover mapping provides only vague information on different vegetation: single types, such as forests, are too coarse in granularity for investigating and planning natural resources [3]. Specialized vegetation mapping has traditionally relied on ground-based surveys, which have a high cost, a long cycle, and a heavy workload, making it difficult to meet the needs of practical applications [4].
A high-precision vegetation-type map is one of the main inputs of scientific research on the earth's surface system. In the past, vegetation-type maps were compiled through a large number of ground surveys. Although the ground survey method is accurate, it is time-consuming and laborious. Moreover, due to the limitation of natural conditions, a ground survey can only cover a small area. Because of the advantage of full coverage, remote sensing (RS) data can make up for the limitations of quadrat surveys. In order to reduce the cost of investigation, early studies on vegetation mapping were mainly based on low- and medium-resolution remote sensing satellite images, such as moderate resolution imaging spectroradiometer (MODIS) and Landsat thematic mapper (TM) data [5]. Subsequently, research works based on relatively high-spatial-resolution remote sensing (HSR-RS) images emerged one after another [6]. In recent years, the spatial resolution of remote sensing images has been further improved, which makes it possible to monitor forest vegetation resources in more detail, with a fine classification of vegetation types and the identification of forest species (groups) [7,8]. With this continuous improvement, the level of vegetation classification that can be achieved from remote sensing images over a large area is increasing. For example, in 2015, medium-resolution IRS LISS-III satellite images were combined with a topographic map, a digital elevation model, a biogeographic map, and other auxiliary data to draw a 1:50,000 vegetation map of India covering about 100 vegetation types [9].
In current work, vegetation mapping is mostly conducted using intelligent classification technologies with multi-source remote sensing images and auxiliary information [10]. Knowledge mining, expert systems, multitemporal composite classification, object-based classification [11], and targeted classification strategies [12] have been successfully introduced in vegetation classification, forestland information extraction, and fine forest-type recognition [13,14]. The research progress in this area is analyzed from the following four aspects.
First, with the improvement in the spatial resolution of remote sensing images, the geometric structure and texture information of vegetation has become more obvious, and its spatial heterogeneity is enhanced. However, HSR images have fewer spectral bands, so the differences in the spectral characteristics of different vegetation types become smaller, which weakens their statistical separability in the image spectral domain. Under this condition, image segmentation and object-based vegetation classification are currently the main methods in this field, because they avoid the high fragmentation and the "salt and pepper phenomenon" of pixel-based methods. For example, the results of tree species classification were analyzed by using QuickBird optical images, lidar, and fusion methods, combining object-oriented segmentation and decision trees [15]. Gilbertson et al. used machine learning to classify eight crops based on Landsat 8 images of five periods [16]. In this work, compared with pixel-based classification, the accuracy of object-oriented classification was improved by about 15%. Based on HSR Gaofen-1 (GF-1) satellite and synthetic aperture radar (SAR) data, object-based and pixel-based wetland vegetation classification results were compared using a random forest algorithm, and better results were obtained [17]. Tigges et al. used RapidEye HSR satellite images and object-oriented classification to classify eight types of urban tree species in Berlin, Germany, with an overall accuracy of 85% [18]. Although object-oriented classification has achieved great success, obtaining refined and accurate object units of vegetation from HSR images has not been deeply studied, especially in mountainous areas with complex topography and geomorphology.
Second, considering the relationship between vegetation and environment, many scholars have found that adding auxiliary data such as altitude, aspect, soil type, and precipitation to vegetation RS classification can also improve the accuracy of vegetation classification [19]. For instance, RS images, combined with geological data, a digital elevation model, and other environmental data, were used to classify nine tree species in the mountainous area of Southern Turkey [20]. The authors showed that, by additionally using environmental data, compared with using only optical image classification, the overall classification accuracy of vegetation classification is improved. Pan et al. classified 47 land cover types in China based on climate, terrain, and spectral data, and the overall accuracy of classification was 86.3% [21]. In addition, because the growth and development of different vegetation types have different phenological characteristics, their spectral information will change with the seasons. The advantage of multitemporal remote sensing data in vegetation classification is more obvious than that of single temporal remote sensing data. For example, Immitzer et al. used Sentinel-2 images of 18 periods to classify 5 coniferous trees and 7 broad-leaved trees in Central European forests [22]. Compared with the best results of single temporal classification, by using their proposed methods, the average classification accuracy of broad-leaved tree species was improved from 72.9% to 85.7%, and that of conifer species from 83.8% to 95.3%.
Third, machine learning (ML) has been used to solve the vegetation classification problem of remote sensing images. For example, the authors in [23] used GF-1 and GF-2 images to automatically classify crops based on convolutional neural networks, with an overall accuracy of 95.9%. Based on time-series Landsat EVI data, the authors in [24] used deep learning to classify 14 crop types, with an overall accuracy of 85.54%. The authors in [25][26][27][28][29] adopted artificial neural network (ANN) models for vegetation prediction and mapping. Although these applications of ML significantly improved the accuracy of remote sensing vegetation classification, they require a larger number of samples, and a large amount of training data needs to be labeled for each type of vegetation. Moreover, because the various vegetation types are unevenly distributed, the collected samples tend to be imbalanced among types, which affects the accuracy of the final classification. In recent years, transfer learning has been proposed to solve the problem of insufficient and unevenly distributed training data, and some scholars have begun to explore its application to remote sensing vegetation classification. For example, Gong P. et al. developed the world's first set of global land cover products with a 10 m spatial resolution by transferring the 30 m spatial resolution land cover samples in 2015 using Sentinel-2 images [29].
Fourth, the acquisition of a sufficient number of representative samples is also a bottleneck affecting the development of classification in vegetation mapping. Using the visual interpretation results of higher-resolution remote sensing images is one of the most commonly used methods. For example, Pouliot D. et al. used hydrological, elevation, and highway network data, Google images, and Landsat TM images as a reference to collect samples for monitoring forests in Canada with MODIS data [30]. Klein I. et al. employed Landsat TM image interpretation results to collect samples for MODIS data-based vegetation mapping in Central Asia [31]. Most of these methods target large-area mapping at low spatial resolution and are not applicable to fine-scale mapping based on HSR-RS images. In addition, this kind of method requires substantial expert experience for prior and reliable interpretation. Another sample collection method is based on survey data. For example, training samples based on plot surveys and Google data were selected to extract land cover information of forest vegetation in Northern China [32]. However, this method also has problems: forest vegetation is mostly distributed in complex terrain, especially in mountain areas where the accessibility of survey sampling is limited [33], which makes it difficult to effectively collect training samples for ML-based classification. Therefore, most methods of sample selection for supervised vegetation classification rely on professional experience and visual interpretation sampling combined with manual survey data. This is time-consuming, and quickly achieving more automatic detection has become important [34]. However, there are few studies on fast, automatic sampling methods for mountain areas.
Remote Sens. 2021, 13, 249
Thus, although there are some studies on vegetation type classification via the fine recognition of HSR-RS images [7,8], difficulties such as low recognition accuracy, a low degree of detail, low reliability, and insufficient samples still exist in fine vegetation mapping, which makes it difficult to meet the application requirements of accurately grasping vegetation type information. There are also still difficulties and bottlenecks in the RS information extraction of vegetation types, especially in mountainous areas. Meanwhile, in the context of global change, the compilation of high-precision vegetation type maps is of great scientific and indicative significance for the in-depth study of the spatiotemporal variation of vegetation in the earth surface system, which is aimed at revealing natural geographical and ecological patterns and identifying the characteristics of natural geographical changes under the background of global change. Fortunately, with the recent development of the theory and method of RS information extraction, the condition of using multi-source big data to carry out high-precision vegetation mapping has begun to mature.
Therefore, in order to obtain a fine spatial map with accurate information on vegetation types in mountainous areas, we develop a geo-object-based vegetation mapping framework via ML methods with an intelligent sample collection scheme. The objective of our work is to make the interpretation of vegetation information more accurate through this intelligent design. To this end, this paper employs geo-objects from HSR-RS images as the basic units of vegetation mapping. Multi-feature extraction based on multi-source geospatial data is conducted by synthetically utilizing various types of geographic information from auxiliary data. An automatic sampling method for ML-based vegetation classification is also designed under the guidance of historical interpretation maps; it achieves the rapid extraction of a large number of vegetation samples by combining prior data. Its innovation lies in reducing human intervention in the sampling process, thus improving sampling efficiency over large areas and the objectivity of samples. The experimental results of a case study in a typical mountainous area show that the proposed procedure can provide reliable samples for the rapid extraction of vegetation cover information and is effective for quick mapping with a fine spatial distribution of vegetation.
The remainder of the paper is organized as follows: The proposed method is presented in Section 2. In Section 3, the experiments and their results are described, and the effectiveness of our method is evaluated. A discussion is presented in Section 4. The conclusions of the paper are provided in Section 5.

Technical Framework and Methods
Mountain vegetation mapping has its own characteristics in the field of spatial mapping, including the spatial aggregation of observed vegetation data, vertical differences with elevation change, and the difficulty of collecting samples. Meanwhile, fine and accurate mountain vegetation mapping meets important practical needs in ecology and other fields. These particularities and requirements, as well as the rapid development of RS technology and the enrichment of multi-source spatial data, call for a new design for mountain vegetation mapping. To effectively mine the spatial distribution pattern of vegetation in mountain areas, we propose an intelligent technical mapping framework of vegetation types based on the basic units of geo-objects from HSR-RS images. Figure 1 illustrates the entire procedure based on multi-source geospatial data, which contains five steps, namely, geo-object extraction, multi-feature extraction, geo-object-based training sample collection, supervised classification, and accuracy assessment. The following subsections describe the implementation of these steps.

Geo-Object Extraction for Vegetation Mapping
With the improvement of the spatial resolution of HSR-RS data, geo-objects with clear polygon edges have been universally recognized as basic units of various geographic information mapping [35][36][37][38]. The advantages of the geo-object-based image analysis (GEOBIA) framework lie in the possibility of using expert knowledge, multi-source object-related features (such as spectrum, shape, texture, and context features), and ancillary data [39][40][41]. Hence, the first step of our method is extracting internally homogeneous geo-objects for vegetation mapping, which are typically generated by the bottom-up segmentation of HSR-RS images. However, there are special requirements for geo-object extraction in mountain areas, as some spatial distribution regularities, such as the inherent boundaries of ridges and valleys, should be followed [42,43]. Therefore, we designed a geo-object extraction process that combines a top-down partitioning according to terrain data with the traditional bottom-up segmentation of HSR-RS images [43]. Topographic partitioning was first performed in order to partition a large mountain area into several smaller zones, and segmentation was then implemented in each individual zone. Thus, we could obtain initial geo-semantic objects with terrain, spectrum, and texture homogeneities. Utilizing these spatial distribution characteristics for a top-down partitioning, combined with the traditional bottom-up segmentation process, was expected to provide semantic objects with surrounding information for mountain vegetation mapping. Next, we explain the relevant details of the two main steps.

Topographic Partitioning
First, we partitioned a targeted mountainous area into several smaller zones as independent units for subsequent segmentation by analyzing the topographic features of mountains that are reflected in digital elevation model (DEM) data [44]. The slope aspect feature calculated from DEM data was selected as the key factor to generate topographic zones due to its significant impact on the temperature variation and the species distribution in mountain areas [45][46][47]. A top-down spatial partitioning was conducted based on the slope aspect, and some topographic zones represented by large-scale polygons were generated as boundary constraints of the following image segmentation step.
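As an illustration of how a slope aspect value can be derived from DEM cells, the following minimal sketch uses central differences on a north-up grid. It is not the authors' implementation: the function names, the unit cell size, and the two-class north/south rule in `aspect_zone` are assumptions made only for this example.

```python
import math

def aspect_deg(dem, r, c, cell=1.0):
    """Downslope compass bearing in degrees (0 = north, clockwise) at interior cell (r, c).

    Assumes a north-up grid: row 0 is the northern edge, columns increase eastward.
    Returns None for a flat cell.
    """
    dz_east = (dem[r][c + 1] - dem[r][c - 1]) / (2.0 * cell)
    dz_north = (dem[r - 1][c] - dem[r + 1][c]) / (2.0 * cell)
    if dz_east == 0 and dz_north == 0:
        return None
    # A slope faces the direction of steepest descent, i.e. minus the gradient.
    return math.degrees(math.atan2(-dz_east, -dz_north)) % 360.0

def aspect_zone(aspect):
    """Hypothetical rule: bearings in the northern half of the compass count as 'north'."""
    if aspect is None:
        return "flat"
    return "north" if (aspect < 90.0 or aspect >= 270.0) else "south"

# Toy DEM whose elevation rises toward the north, so the slope faces south.
dem = [[30.0, 30.0, 30.0],
       [20.0, 20.0, 20.0],
       [10.0, 10.0, 10.0]]
zone = aspect_zone(aspect_deg(dem, 1, 1))
```

Grouping adjacent cells with the same zone label into polygons would then give large-scale zones of the kind used as boundary constraints; that grouping step is omitted here.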

Constrained Segmentation
Second, to generate small-scale geo-objects within each topographic zone, we further carried out a mean-shift segmentation via bottom-up region merging based on spectral and texture homogeneity from HSR-RS images [48][49][50]. In this process, the pixels in each topographic zone were clustered into several meaningful image objects according to the given scale parameter. Finally, the segmentation boundaries of geo-objects were extracted as basic mapping polygons within each topographic zone, and a whole geo-object-based map could be structured in the mountain areas. We assumed the vegetation type of each geo-object was unique and needed to be identified according to the following stages.

Multi-Feature Extraction for Geo-Objects
The terrain in mountainous areas is always complex, and the phenomenon of "different types of objects with the same spectrum and the same type of object with different spectra" is more serious for vegetation on HSR-RS images. Recognition with a single data source can hardly meet the application requirements of a fine identification of vegetation types. It was therefore necessary to overlay multi-source data on the mapping units (i.e., geo-objects) to enrich their feature description. Hence, we made full use of various spatial data to design multi-source features of geo-objects as auxiliary information.
First, three types of image-based features of geo-objects, i.e., spectrum, shape, and texture features, were extracted from HSR-RS images. Typical features in this part included the mean and standard deviation of spectral signals, the normalized difference water index (NDWI), the normalized difference vegetation index (NDVI), the length-width ratio, the main direction, the shape index [51], and texture measures based on gray-level co-occurrence matrices (GLCMs) [52].
Second, several topographic and geomorphological features of geo-objects were extracted based on DEM data and a geomorphological map. The elevation, slope, slope direction (north or south), aspect, the degree of hill shade, and the geomorphological type were collected within each geo-object.
Third, meteorological factors, soil factors, land cover type, the net primary productivity (NPP) layer, and vegetation index sequences of annual, quarterly, and monthly averages were also added as covariate features, calculated from relevant spatial information products.
After integrating these multi-source spatial data into each geo-object, we could obtain a multi-feature description of each geo-object. Thus, a high-dimensional attribute table was constructed for the subsequent discrimination of vegetation types using ML algorithms.
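To make the idea of one attribute-table row per geo-object concrete, the sketch below computes a few of the spectral features named above from the pixels inside one object. It is an illustration under stated assumptions, not the authors' pipeline: the band order `(green, red, nir)`, the function names, and the choice of the green/NIR form of NDWI are assumptions, and the indices are averaged per pixel rather than computed from band means.

```python
from statistics import mean, pstdev

def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def ndwi(green, nir):
    """Normalized difference water index, green/NIR form (assumed here)."""
    total = green + nir
    return (green - nir) / total if total else 0.0

def object_features(pixels):
    """Build one feature row from (green, red, nir) reflectance tuples of a geo-object."""
    green, red, nir = zip(*pixels)
    return {
        "green_mean": mean(green), "green_std": pstdev(green),
        "red_mean": mean(red), "red_std": pstdev(red),
        "nir_mean": mean(nir), "nir_std": pstdev(nir),
        "ndvi_mean": mean(ndvi(n, r) for _, r, n in pixels),
        "ndwi_mean": mean(ndwi(g, n) for g, _, n in pixels),
    }

# Two made-up pixels standing in for the pixels covered by one polygon.
row = object_features([(0.1, 0.1, 0.5), (0.1, 0.1, 0.5)])
```

Terrain, climate, and soil covariates would be appended to the same row in the same way, giving the high-dimensional table described above.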

Geo-Object-Based Training Sample Collection
For ML-based supervised classification, adequate training samples are a prerequisite. However, as stated in the introduction, it is difficult to collect a large number of vegetation samples by field surveys in mountainous areas. Therefore, this step concerns the rapid collection of geo-object-based training samples from historical interpretation material. Historical maps of vegetation type are easy to obtain, but they are often outdated and were produced at coarse scales with a rough spatial resolution. Although these previous products do not match the current requirement of vegetation mapping based on HSR-RS images, they do contain effective information from previous expert interpretations [53,54]. Thus, rapidly transferring labels of different vegetation types (i.e., the values of the target variable in the ML framework) is feasible if the incorrect or outdated information can be effectively found and filtered out. The implementation of this part is shown in Figure 2 and consists of four steps.

[Figure 2: implementation of the four-step sample collection procedure. Recoverable labels: "HSR-RS image of mountainous area"; decision node "Are all the grids covered by the polygon of the geo-object labeled by the same label?"]
Step 1: A spatially uniform selection was carried out by randomly keeping one of every eight geo-objects, which ensured the uniformity of candidate geo-objects in spatial distribution.
Step 2: A conversion from vector data to raster data was first conducted on a vegetation type map (vector format with a rough scale) from historical interpretation. Then, the grids and their labels of vegetation types in the raster data were overlaid on the polygons of geo-objects. Candidate geo-objects for sampling were selected if all the grids in one polygon carried the same label. This step is referred to as the first round of checking. Geo-objects that did not satisfy this restriction were filtered out and could not be selected as training samples. There may have been a large number of such geo-objects due to the scale differences (i.e., the transfer from the rough scale of the historical map to the fine scale of geo-objects from HSR-RS images).
Step 3: A purification operation was further performed on the candidate geo-objects, as some incorrect labels may exist in the set. Geo-objects were regarded as reliable samples if their relevant features (e.g., elevation and slope aspect), spatial positions, distribution pattern, and vegetation labels were in accordance with the rules of vegetation distribution from local expert knowledge. Unreliable samples were rejected from the candidate set, and the purified geo-objects were collected as the final training samples with transferred labels of vegetation types. This step is referred to as the second round of checking. With this rapid sampling scheme, which combines prior label transfer with selection constraints under two rounds of checking, we could automatically collect a large number of geo-object-based training samples by using pre-interpreted vegetation maps as an important reference.
Step 4: An upsampling was further conducted to balance the numbers of training samples for different vegetation types. Thus, a structured geo-object-based table, including multiple features (i.e., environmental variables), several labeled geo-objects (i.e., training samples), and a larger number of unlabeled geo-objects (i.e., test samples), was prepared for the ML methods in the next step.
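The four steps above can be sketched as follows. This is a simplified illustration, not the authors' code: the dictionary layout of a geo-object, the helper names, the use of list order as a stand-in for spatial neighborhood in the thinning step, and the elevation-range rules (with placeholder labels "A" and "B" and made-up ranges) are all assumptions. The paper's expert rules also involve slope aspect, position, and distribution pattern, which are omitted here.

```python
import random
from collections import defaultdict

def thin(objects, group=8, seed=0):
    """Step 1: keep one randomly chosen geo-object out of every `group` consecutive ones."""
    rng = random.Random(seed)
    return [rng.choice(objects[i:i + group]) for i in range(0, len(objects), group)]

def transferred_label(grid_labels):
    """Step 2 (first check): return the shared label if all overlaid grids agree, else None."""
    if grid_labels and len(set(grid_labels)) == 1:
        return grid_labels[0]
    return None

def passes_rules(label, elevation, rules):
    """Step 3 (second check): hypothetical rule, label must fit its elevation range."""
    low, high = rules.get(label, (float("-inf"), float("inf")))
    return low <= elevation <= high

def collect_samples(objects, rules):
    """Apply both checks and return (object id, label) pairs."""
    samples = []
    for obj in objects:
        label = transferred_label(obj["grid_labels"])
        if label is not None and passes_rules(label, obj["elevation"], rules):
            samples.append((obj["id"], label))
    return samples

def oversample(samples, seed=0):
    """Step 4: duplicate randomly chosen minority samples until all classes are equal in size."""
    if not samples:
        return []
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

objects = [
    {"id": 1, "grid_labels": ["A", "A"], "elevation": 1000},
    {"id": 2, "grid_labels": ["A", "B"], "elevation": 1000},  # mixed labels: rejected
    {"id": 3, "grid_labels": ["B", "B"], "elevation": 3000},
    {"id": 4, "grid_labels": ["B"], "elevation": 500},        # violates rule: rejected
    {"id": 5, "grid_labels": ["A"], "elevation": 1200},
]
rules = {"A": (800, 1500), "B": (2500, 3500)}  # made-up elevation ranges in meters
samples = collect_samples(objects, rules)
balanced = oversample(samples)
```

Random duplication is only one way to balance classes; the paper does not state which upsampling variant was used.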

Supervised Classification of Vegetation Types
Next, we needed to train a classification model using ML methods to construct the relationship between the multiple features (i.e., the explanatory variables) and the vegetation type (i.e., the target variable). In this study, tree-based ML methods were employed to generate spatial classifiers for vegetation types; these methods learn discriminant rules from the training samples in the form of decision trees (DTs, i.e., prediction models). Among these tree-based ML methods, the random forest (RF) algorithm is relatively efficient at fitting DTs [55]. The RF method has a wide application prospect, as it has been reported to outperform most other classification methods, such as the ANN and support vector machine (SVM), in terms of accuracy while mitigating overfitting. Its essence is to let many tree-based classifiers vote and to integrate their results. That is, N trees produce N classification results for an input sample, and the RF method specifies the type with the most votes as the final output. In the field of remote sensing, its advantages are that it can achieve good extraction results without setting many parameters and can run effectively on large datasets, thus meeting the requirements of large-scale learning. It has been proven to be robust and to present high prediction accuracies, and was thus suitable for modeling the nonlinear relationships between a large number of variables and vegetation types in our framework. Hence, in this paper, the RF algorithm was selected for training the classification model due to its robustness of learning within a high-dimensional feature space. More detailed descriptions and implementations of this algorithm can be found in the literature [55].
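The voting step described above can be illustrated with a small sketch. It shows only the majority vote over N tree outputs, not how an RF is trained (bootstrap sampling and random feature selection are omitted). The three hand-written "trees", their thresholds, and the placeholder labels `type_A` and `type_B` are hypothetical.

```python
from collections import Counter

def forest_predict(trees, features):
    """Collect one vote per tree and return the most frequent label with all votes."""
    votes = [tree(features) for tree in trees]
    label, _ = Counter(votes).most_common(1)[0]
    return label, votes

# Hand-written stand-ins for fitted decision trees (thresholds are made up).
trees = [
    lambda f: "type_A" if f["ndvi"] > 0.5 else "type_B",
    lambda f: "type_A" if f["elevation"] < 3000 else "type_B",
    lambda f: "type_B" if f["slope"] < 5 else "type_A",
]

# One geo-object's feature row; two of the three trees vote for type_A.
label, votes = forest_predict(trees, {"ndvi": 0.7, "elevation": 3200, "slope": 20})
```

In practice a library implementation, for example scikit-learn's `RandomForestClassifier`, would be fitted on the attribute table; the paper does not state which implementation was used.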
After the RF-based training, supervised geo-object-based classification was conducted to predict the labels (i.e., vegetation type) for all the geo-objects without vegetation type information. Their predicted vegetation type was revised with a label check using the rules of vegetation distribution from local expert knowledge. Finally, a geo-object-based map of vegetation type was produced with their variation of species.

Accuracy Assessment
Accuracy assessment was then conducted to evaluate the geo-object-based mapping results. Points with determined vegetation types were collected in advance from field survey observations, and the geo-objects that spatially matched them were used as verification samples. By comparing the classification results with these verification samples, the overall accuracy (OA) and kappa coefficient were derived from the confusion matrix [56] to evaluate the accuracy of vegetation classification. The OA was computed by dividing the number of correctly classified geo-objects by the total number of verification samples, while the kappa coefficient was calculated to measure how much better the results were than those expected from a random assignment. As the kappa coefficient has been criticized as an unsuitable metric for remote sensing classification tasks [57], OA was used as the main measure of accuracy in this paper; a larger value denotes a better result.
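As a worked illustration of the two measures, the following sketch computes OA and the kappa coefficient from a confusion matrix. The 2 x 2 matrix is an invented example, not a result of this study; rows are assumed to hold reference labels and columns predicted labels.

```python
def overall_accuracy(cm):
    """OA = correctly classified samples / all samples."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def kappa_coefficient(cm):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (the OA) and p_e is the agreement
    expected by chance from the row and column totals.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_o = overall_accuracy(cm)
    p_e = sum(row_totals[k] * col_totals[k] for k in range(n)) / total ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Invented confusion matrix with 100 verification samples.
example_cm = [
    [40, 10],
    [5, 45],
]
oa = overall_accuracy(example_cm)      # 85 correct out of 100
kappa = kappa_coefficient(example_cm)  # p_e = 0.5 for this matrix
```

For this matrix the OA is 0.85 and the chance agreement is 0.5, which gives a kappa of 0.7.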

The Study Area and Its Vegetation Classification System
The experimental study area was selected in Taibai Mountain, the main peak of the Qinling Mountains (see Figure 3). It is one of the most famous mountains in Central China with an altitude of 3767.2 m, and its relative height difference is more than 3000 m (see Figure 4).
The Qinling Mountains are the natural dividing boundary line between the north and the south of China. Horizontally, they have a transitional nature from one physical geographical condition to another and transitional properties of evolution from one geological tectonic unit to another. Vertically, they have unique vertical landscapes with clear vertical zones of climate distribution, soil distribution, and vegetation distribution. This kind of transition and complexity is most obvious in Taibai Mountain.
These characteristics result from its special geographical location on the southern margin of the warm temperate zone and the northern boundary of the subtropical zone. It is controlled by the Mongolian cold air mass in winter and affected by the Pacific subtropical high zone in summer, which causes the north-south transition and alternation of climatic conditions. Under such circumstances, a wide range of species, rich biological resources, and special natural landscapes exist in the well-preserved ecosystem of Taibai Mountain. In particular, the phenomenon of vertical zoning of mountain vegetation is significant, which makes it a classic study area of vegetation spectrum and mapping in China.
Thus, the vegetation types in this mountainous area are abundant. According to the 1:1,000,000 vegetation survey of China, Table 1 presents a vegetation classification system of Taibai Mountain with seven vegetation type groups and fourteen vegetation types (including the type of no vegetation). The goal of our research was to realize such vegetation mapping at a fine spatial scale. The codes were extracted from the legend of the 1:1,000,000 vegetation map of China.

Experimental Data Set
The accurate identification of vegetation types in the above classification system relies on multi-source data, as geospatial data from different sources diversify the modeling factors. The following data were collected for our designed geo-object extraction, multifeature calculation, and quick sampling.

HSR-RS Images
First of all, HSR-RS images covering wide areas provide high-resolution sources of visual features and can be acquired quickly and comprehensively [58], which makes it possible to map vegetation on a fine scale. Therefore, three phases of Chinese GF-2 satellite images with a 0.8 m spatial resolution were collected and preprocessed in our study. They were acquired in July 2016 (see Figure 3), November 2016, and January 2017, respectively. In addition, four Sentinel-2A images acquired in four different seasons were downloaded as auxiliary data. Based on these data, three types of image features, i.e., the spectrum, shape, and texture features, were extracted according to the spectral reflectance, geometric shape, and texture representation of the geo-objects. The relevant calculation steps are described in Section 2.2.
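Texture features of this kind are commonly computed from a gray-level co-occurrence matrix (GLCM); the angular second moment and contrast measures are the two GLCM features reported later for the GF-2 data. The sketch below shows one plain way to compute them for a small quantized image. The pixel offset, the number of gray levels, and the two toy images are assumptions for illustration, and the matrix is left non-symmetric for simplicity.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for the pixel offset (dx, dy).

    `image` is a list of rows of integer gray levels in [0, levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no pixel pairs exist for this offset")
    return [[v / total for v in row] for row in counts]


def angular_second_moment(p):
    """Sum of squared GLCM entries; high for homogeneous texture."""
    return sum(v * v for row in p for v in row)


def contrast(p):
    """Sum of (i - j)^2 * p(i, j); high for strong local variation."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


uniform = [[0, 0], [0, 0]]        # perfectly homogeneous patch
checkerboard = [[0, 1], [1, 0]]   # maximal horizontal variation

asm_uniform = angular_second_moment(glcm(uniform, levels=2))
contrast_uniform = contrast(glcm(uniform, levels=2))
asm_checker = angular_second_moment(glcm(checkerboard, levels=2))
contrast_checker = contrast(glcm(checkerboard, levels=2))
```

The homogeneous patch yields the maximum angular second moment and zero contrast, whereas the checkerboard yields a lower angular second moment and a higher contrast.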

Topographic and Geomorphic Data
Topography and elevation affect the vegetation distribution; the general trend is that species richness decreases with increasing altitude [59]. Here, five commonly used modeling factors, namely elevation, slope, aspect, degree of hill shade, and geomorphological type, were derived: the first four from a conterminous 30 m ASTER GDEM dataset (see Figure 4, http://www.gdem.aster.ersdac.or.jp/) and the last from a public geomorphic dataset with a 1 km spatial resolution provided by the Data Center for Resources and Environmental Sciences (DCRES) of the Chinese Academy of Sciences (http://www.resdc.cn). The mean values in the DEM-derived data were used as topographic features.
In addition, the mountain area was artificially divided into south and north slopes according to the ridgeline (see the yellow line in Figure 5). Thus, whether the slope direction was the south or the north was an important factor of geo-objects in the study area. Besides the north-south slope division, topographic partitioning of Section 2.1.1 was further carried out along the ridges and valleys (i.e., the green lines in Figure 5) for extracting constraint boundaries of topographic zones. Moreover, image segmentation based on a GF-2 HSR-RS image was further conducted in each topographic zone (see Step of Section 2.1.2 and the red lines in Figure 6). Segmentation boundaries of geo-objects were extracted in the vector format of polygons (see the white lines in Figure 6). The geo-objects with clear boundaries were generated as basic polygons of mapping in this study. The relevant calculation steps are described in Section 2.1.
(a) Image details and slope dividing lines of subarea 1; (b) Image details and slope dividing lines of subarea 2.

Sample Points for Verification
For accuracy assessment, it was necessary to collect some sample points to evaluate the geo-object-based mapping results. Thus, in May 2018, we selected several points along the sampling lines located in the forests around the roads and where we could enter. As shown in Figure 4, our sampling points were mainly distributed on a cross-section in the north-south direction, the slopes on the top of Taibai Mountain, and the periphery of the main road into Taibai Mountain. More than one sample plot needed to be set for each 150 m of elevation along the sample line. We recorded the number, survey date, altitude, longitude, latitude, slope direction, and slope of each sample site. Their vegetation types were labeled by local vegetation experts on the spot. In October 2018, some points were supplemented along the sampling lines. After this, all collected points were organized and linked to their corresponding geo-objects, and the vegetation types of 204 geo-objects were thus labeled in the study area. They accounted for about 1% of all the geo-objects and were used for validation of the mapping results. The relevant calculation step of accuracy assessment is described in Section 2.5.
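Linking a surveyed point to its geo-object amounts to a point-in-polygon test. The sketch below uses the standard ray-casting test; the square polygons, object identifiers, coordinates, and labels are hypothetical, and real geo-objects would normally be handled with a GIS library rather than plain lists.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def link_points_to_objects(points, geo_objects):
    """Map each geo-object id to the label of a field point lying inside it.

    If several points fall in one geo-object, the last one wins
    (a simplification made for this sketch).
    """
    labels = {}
    for p in points:
        for object_id, polygon in geo_objects.items():
            if point_in_polygon(p["x"], p["y"], polygon):
                labels[object_id] = p["label"]
                break
    return labels


# Hypothetical geo-objects (two adjacent squares) and field points.
geo_objects = {
    101: [(0, 0), (10, 0), (10, 10), (0, 10)],
    102: [(10, 0), (20, 0), (20, 10), (10, 10)],
}
field_points = [
    {"x": 5, "y": 5, "label": "birch forest"},
    {"x": 15, "y": 5, "label": "coniferous forest"},
    {"x": 30, "y": 30, "label": "alpine shrub meadow"},  # outside all objects
]
verification = link_points_to_objects(field_points, geo_objects)
```

Points that fall outside every geo-object are simply left unlinked in this sketch.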

Land Cover Data
To distinguish the influence of different land cover types on vegetation mapping, we also collected a land cover product to mask some geo-objects for training sample collection. Here, FROM-GLC10 (http://data.ess.tsinghua.edu.cn), a product with a 10 m spatial resolution [29,60], was integrated into the geo-objects as prior land cover information. The geo-objects with the four land cover types of forest, shrubland, grassland, and cropland were treated as focus targets for the training sample collection of Section 2.3, and those with the land cover types of wetland, water, tundra, impervious surface, bare land, and snow/ice were removed as areas of no vegetation. The transferred land cover information was also employed as a prior feature of the geo-objects for our supervised classification.
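The masking step can be sketched as a simple filter over geo-objects. The land cover names follow the types listed above, whereas the record layout (a dictionary with "id" and "land_cover" keys) is an assumption made for illustration.

```python
VEGETATED = {"forest", "shrubland", "grassland", "cropland"}
NON_VEGETATED = {"wetland", "water", "tundra", "impervious surface", "bare land", "snow/ice"}


def split_by_land_cover(geo_objects):
    """Separate sampling candidates from geo-objects masked as no vegetation."""
    candidates, masked = [], []
    for obj in geo_objects:
        land_cover = obj["land_cover"]
        if land_cover in VEGETATED:
            candidates.append(obj)
        elif land_cover in NON_VEGETATED:
            masked.append(obj)
        else:
            raise ValueError(f"unknown land cover type: {land_cover!r}")
    return candidates, masked


# Hypothetical geo-object records.
objects = [
    {"id": 1, "land_cover": "forest"},
    {"id": 2, "land_cover": "water"},
    {"id": 3, "land_cover": "grassland"},
    {"id": 4, "land_cover": "bare land"},
]
candidates, masked = split_by_land_cover(objects)
```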

Meteorological and Climate Data
The distribution of vegetation is closely related to meteorological and climatic conditions, especially on Taibai Mountain, which has a large elevation difference. Here, the annual average rainfall, annual average sunshine hours, annual accumulated temperature, and mean temperature were obtained from the datasets of DCRES (http://www.resdc.cn), as were the dryness and wetness index products with a 1 km spatial resolution. They were used to calculate the corresponding features of the geo-objects.

Soil-Related and NPP Data
Soil products with a 250 m spatial resolution were also included in the feature stack. The grids of the SoilGrids 250 m products [61], which cover soil types, soil physical properties, and soil chemical properties, were resampled to a 0.8 m resolution using bicubic resampling and aggregated within each geo-object using their mode or mean values.
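The aggregation of grid values within geo-objects can be sketched as follows. This is a simplified illustration that assumes each grid cell has already been assigned a geo-object identifier; the use of the mean for continuous variables and the mode for categorical ones is our reading of the text, and the cell values are invented.

```python
from collections import Counter, defaultdict
from statistics import mean


def zonal_statistic(object_ids, values, how):
    """Aggregate per-cell values into one value per geo-object.

    `object_ids[i]` is the geo-object containing cell i and `values[i]`
    is the cell value; `how` is "mean" or "mode".
    """
    groups = defaultdict(list)
    for object_id, value in zip(object_ids, values):
        groups[object_id].append(value)
    if how == "mean":
        return {oid: mean(vals) for oid, vals in groups.items()}
    if how == "mode":
        return {oid: Counter(vals).most_common(1)[0][0] for oid, vals in groups.items()}
    raise ValueError("how must be 'mean' or 'mode'")


# Hypothetical cells: three in geo-object 1 and two in geo-object 2.
cell_object_ids = [1, 1, 1, 2, 2]
cell_soil_ph = [5.0, 6.0, 7.0, 6.5, 7.5]
cell_soil_type = ["brown soil", "brown soil", "cinnamon soil", "cinnamon soil", "cinnamon soil"]

ph_by_object = zonal_statistic(cell_object_ids, cell_soil_ph, "mean")
type_by_object = zonal_statistic(cell_object_ids, cell_soil_type, "mode")
```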
In addition, net primary productivity (NPP) is a key parameter in characterizing terrestrial ecological processes and is closely connected with vegetation. Thus, an NPP dataset produced by DCRES was further employed as an indicator of the surface carbon cycle.

Vegetation Index Sequence Data
The normalized difference vegetation index (NDVI) reflects the vegetation cover on the surface well and is of great significance for monitoring regional vegetation resources and their change. Therefore, three public vegetation index sequence datasets derived from SPOT-4 satellite imagery with a 1 km spatial resolution were collected to characterize the vegetation parameters and their changes. They have been produced from the average NDVI values in China since 1998 at temporal resolutions of year, quarter, and month, respectively. Seventeen datasets from 2016 (1 annual, 4 quarterly, and 12 monthly) were used to extract the geo-objects' covariate features of the vegetation index sequence.

Vegetation Type Map from Historical Interpretation and Rule Set from Local Expert Knowledge
As a typical research area of the north-south transition zone of China, the spatial distribution of vegetation in Taibai Mountain was previously studied at a rough scale, such as in the 1:1,000,000 vegetation map [62]. Figure 7 presents a collected 1:50,000 vegetation type map of Taibai Mountain from historical interpretation. This was the vegetation mapping result at the highest resolution available for the region. However, although most of its labels are correct, many vegetation type labels are wrong according to our verification sample points. Furthermore, its spatial scale is not fine enough for a high-resolution survey. The evident mapping errors arise because the mapped distribution of vegetation does not follow the knowledge of local experts well, and this knowledge must be referred to in the classification process [63].
The elevation ranges of the different vegetation groups on the north and south slopes are listed in Table 2. It can be concluded that the vertical variation of vegetation differs only slightly between the north slope and the south slope of Taibai Mountain. For example, the basal zone of the north slope differs from that of the south slope, as their elevation intervals of cultivated vegetation are different. This is due to the different geographical locations and climatic conditions of the two slopes. Factors such as wind, light, precipitation, soil properties, temperature, longitude and latitude, and slope direction all have an impact on the vertical distribution of vegetation.
This prior knowledge of vertical distribution is an important law summarized by earlier researchers. It provides effective rules that should be followed in the mapping process. In addition, because the terrain is complex, a comprehensive and detailed ground investigation is difficult, and the number of field samples that can be measured is insufficient. In this study, the ranges of elevation values for different vegetation groups in Table 2 were set as basic rules of vegetation mapping. They were employed in the second round of checking of geo-object-based training sample collection (see the relevant calculation step in Section 2.3). That is, if the vegetation distribution in Figure 7 violates these rules, the corresponding labels are considered wrong, and the relevant geo-objects are removed from the candidate geo-objects for training samples. In this way, we can identify many false labels in Figure 7 and transfer the correct vegetation type information to the geo-objects so as to quickly collect a large number of reliable training samples.
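The second round of checking can be sketched as a range test on the mean elevation of each candidate geo-object. The elevation ranges below are placeholder values chosen only for illustration and are not the values of Table 2; the record layout and the decision to keep candidates for which no rule exists are likewise assumptions.

```python
# Hypothetical elevation ranges in metres, keyed by (slope, vegetation group).
# These numbers are placeholders for illustration; they are NOT Table 2.
EXAMPLE_RULES = {
    ("north", "cultivated vegetation"): (0, 800),
    ("south", "cultivated vegetation"): (0, 1000),
    ("north", "coniferous forest"): (2600, 3400),
    ("south", "coniferous forest"): (2700, 3500),
}


def violates_rule(sample, rules):
    """True if the transferred label lies outside its allowed elevation range."""
    key = (sample["slope"], sample["label"])
    if key not in rules:
        return False  # no rule available, so the label cannot be refuted
    low, high = rules[key]
    return not (low <= sample["mean_elevation"] <= high)


def purify(candidates, rules):
    """Drop candidate samples whose labels contradict the expert rules."""
    return [s for s in candidates if not violates_rule(s, rules)]


candidates = [
    {"id": 1, "slope": "north", "label": "coniferous forest", "mean_elevation": 2900.0},
    {"id": 2, "slope": "north", "label": "coniferous forest", "mean_elevation": 1200.0},
    {"id": 3, "slope": "south", "label": "cultivated vegetation", "mean_elevation": 950.0},
    {"id": 4, "slope": "north", "label": "cultivated vegetation", "mean_elevation": 950.0},
]
kept = purify(candidates, EXAMPLE_RULES)
kept_ids = [s["id"] for s in kept]
```

With these placeholder ranges, candidates 2 and 4 are rejected because their mean elevations fall outside the ranges assumed for their labels and slopes.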

Result Analysis
In the analysis of the experimental results, we first show the effectiveness of the sampling scheme and then the advantages of the results obtained with the proposed method by comparing them with traditional, historically interpreted vegetation maps.

The Collected Geo-Object-Based Samples and Their Mapping Results
The processed data were input to the mapping procedure of Figure 1. A number of training samples were collected rapidly based on the geo-object units of Figure 6. The distribution of the selected training samples for the target mapping is shown in Figure 8, based on reliable geo-objects selected according to the scheme of Figure 2. After visual cross-examination, we can confirm that these uniformly distributed samples were basically correct, as the geo-object-based sampling method overcomes the limitation of salt-and-pepper noise in the pixel-based method. Furthermore, upsampling was applied because the numbers of training samples of different types were unbalanced in the initial set. Thus, the unlabeled geo-objects in Figure 8 could be classified according to the model trained on these collected samples.
To test the training sample collection, we compared the accuracies achieved using different sampling schemes. Here, the mapping results of four schemes are compared in Table 3: unpurified and unbalanced sampling (i.e., the implementation without Step 3 and Step 4 in Section 2.3), unpurified and balanced sampling (i.e., the implementation without Step 3 in Section 2.3), purified and unbalanced sampling (i.e., the implementation without Step 4 in Section 2.3), and purified and balanced sampling (i.e., the implementation with Step 3 and Step 4 in Section 2.3). The comparison of accuracies shows that the proposed purification operation (i.e., the second round of checking in Figure 2) can eliminate erroneously transferred labels to a certain extent and thereby improve the accuracy of the classifier. In addition, although the RF algorithm can tolerate some imbalance among the sample types, this problem still affects the performance of the classifiers to a certain extent, and the proposed automatic sampling may aggravate the imbalance. Therefore, the upsampling step added to the collection process balances the number of samples and improves the performance of the classifiers. Taken together, the combination of purified and balanced sampling leads to more reliable training samples for vegetation classification in the study area. Pre-interpreted vegetation maps are important historical bases for updating and contain a substantial amount of expert knowledge. Prior label transferring and selection constraints using two rounds of checking are effective ways of collecting samples quickly in mountain areas. This approach is also worth popularizing in other mapping fields.
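The balancing step can be illustrated with random oversampling, in which minority-class samples are duplicated at random until every class matches the largest one. The text does not specify the exact upsampling procedure, so this particular strategy, the fixed random seed, and the toy class counts are assumptions.

```python
import random
from collections import Counter, defaultdict


def oversample(samples, label_key="label", seed=0):
    """Duplicate minority-class samples at random until all classes are equal in size."""
    if not samples:
        return []
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[label_key]].append(sample)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)  # keep every original sample
        balanced.extend(rng.choices(group, k=target - len(group)))  # draw with replacement
    return balanced


# Hypothetical unbalanced training set: 5, 2, and 1 samples per type.
training = (
    [{"id": i, "label": "deciduous oak forest"} for i in range(5)]
    + [{"id": 5 + i, "label": "birch forest"} for i in range(2)]
    + [{"id": 7, "label": "alpine shrub meadow"}]
)
balanced = oversample(training)
class_counts = Counter(s["label"] for s in balanced)
```

Because the duplicated samples carry no new information, oversampling of this kind should be applied to the training set only and never to the verification samples.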

Comparison with Historical Interpreted Vegetation Maps
Fine-scale vegetation mapping is relatively rare among previous studies. For this study area, there are two kinds of historically interpreted vegetation maps, namely the 1:1,000,000 vegetation map of China and our referenced 1:50,000 vegetation map, neither of which meets the requirements of a fine vegetation survey. Table 4 compares the different mapping results, where our interpreted vegetation map was obtained using the optimal experimental setting (see Figure 9). Note that, for Taibai Mountain, the accuracy is relatively low for the 1:1,000,000 vegetation map, which was produced several years ago at a rough, national scale. In addition, the quantitative accuracy verification shows that our geo-object-based mapping results (i.e., Figure 9) are considerably more accurate than the 1:50,000 vegetation map we referred to in the process of sample collection. The mapping results using our framework are thus better than these public datasets, and the improvement in accuracy of our fine-scale results over the rough-scale results is remarkable, which is partly due to the advantage of our geo-object-based method with HSR-RS images.
Furthermore, Figures 10 and 11 show comparisons in the subareas A and B of Figure 9, respectively. The results from the 1:50,000 vegetation map are visually coarse and cannot convey information at the level of geographical entities. It is conceivable that such rough-scale maps were manually interpreted by visually combining medium- and low-resolution remote sensing images. In contrast, our geo-object-based mapping provides fine vegetation type information through homogeneous polygons with irregular boundaries (see Figures 10c and 11c), whereas the rough-scale mapping generally merged large areas of vegetation into a single polygon with non-smooth and coarse boundaries (see Figures 10b and 11b).
Therefore, compared with conventional mapping, our geo-object-based mapping can provide visually superior maps by showing more abundant vegetation information and spatial details in pattern differences. This is conducive to discovering knowledge of geographic patterns based on the laws of spatial variation of vegetation.

Analysis of Vertical Distribution of Vegetation on the North and South Slopes of Taibai Mountain
The variation of vegetation diversity with altitude has always been an interesting issue for ecologists. According to our vegetation mapping results in Figure 9, we can further analyze the vertical distribution of the vegetation of Taibai Mountain. An intuitive comparison of the mapping results of Figure 9 and the DEM data of Figure 4 shows that there is a high correlation between elevation and vegetation distribution in this area. The effect of vertical distribution is also very obvious for different types of vegetation. Taibai Mountain is the highest peak in the eastern part of the mainland of China. The vertical height difference from the foot to the top of the mountain is larger than 3000 m. With increasing altitude, distinct climate zones are formed. Similar to the latitude gradient, the altitude gradient, as another geographic gradient, has become an important aspect of the gradient pattern of biodiversity because it encompasses various environmental factors such as temperature, humidity, and light [64]. Thus, under the combined action of various environmental factors (geology, landform, climate, soil, etc.), the vegetation in Taibai Mountain also changes with elevation and forms vegetation belts. According to the existing literature, the vegetation in the experimental area can be divided into four vegetation belts, namely, an alpine shrub meadow belt, a coniferous forest belt, a birch forest belt, and a deciduous oak forest belt. Each vegetation belt can be further divided into several vegetation subzones according to the main species. This is basically consistent with the results of Figure 9. Our designed sample collection scheme, which uses the rule set from local expert knowledge in Table 2, plays an important role in these results.
Additionally, the vegetation community types on the north and south slopes of Taibai Mountain are similar. Their difference in vegetation distribution is small and is mainly reflected in the upward shift of vegetation caused by the change in natural conditions in terms of climate and meteorology, such as heat and precipitation. That is, the upper limit of the distribution of the main vegetation communities on the south slope is generally higher than that on the north slope. Moreover, compared with the south slope, the north slope is steeper, and its relative height difference is larger, which makes its vertical distribution of vegetation relatively complete and clear. As the boundary of the reserve is generally high on the south slope, the area of broadleaf forest there is not as large as that on the north slope. These phenomena can also be clearly seen in Figure 9.

Relative Importance Analysis of Environmental Variables
As mentioned above, the difference in environmental factors affected the vegetation distribution in the experimental area. Therefore, we further analyzed the role of different environmental factors in the employed vegetation classification model. Based on the multisource data we used, 73 features (i.e., environmental variables) were input into the RF classifier for learning. We used this algorithm to extract the relative importance of different features in the modeling based on the measure of information entropy. The estimated importance of the top 20 significant environmental variables is shown in Figure 12, from which the following conclusions can be drawn.
First, the climate-related factors are the most important for the distribution of vegetation, as a large proportion of the top-ranked variables are of this kind, such as the annual temperature (ranking 1st), wetness index (2nd), annual rainfall (8th), annual accumulated temperature (13th), and dryness index (18th). This indicates that the distinct climate zones are critical in shaping the vegetation distribution.
Second, the terrain-related factors, including the variables of geomorphological type (ranking 3rd), the mean value of elevation (4th), the south or north slope (11th), and the standard deviation value of elevation (12th), are also important features influencing the vegetation classification. As stated above, elevation and slope direction play a key role in the distribution of vegetation in Taibai Mountain.
Third, many indicators indicating vegetation growth and ecological status are highly correlated with vegetation distribution. For example, the features of the NDVI from GF-2 data (ranking 5th), the average value of the NDVI in the first quarter (6th), July (15th), and April (20th) from SPOT-4 data, and the net primary productivity (9th) have great significance for prediction. These indicators show the growth degree of different types of vegetation. Meanwhile, the variables of land cover type (ranking 10th) and soil erosion value (10th) also determine the external surface of land cover and internal soil conditions of vegetation growth.
Fourth, some image-extracted features are also correlated with vegetation type, namely two measures of the gray-level co-occurrence matrix (GLCM) from GF-2 data (angular second moment (ranking 7th) and contrast (14th)), the border length of the geo-object polygon (16th), and the mean spectral value of the first GF-2 band (19th). This can be explained by the fact that some vegetation types have distinctive texture and spectral signatures in HSR-RS images [65].
Finally, the importance of the remaining variables is relatively small, although they still marginally improve the accuracy of vegetation classification. Overall, the variable importance results are broadly consistent with prior knowledge of the vegetation characteristics of Taibai Mountain.
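To make the idea of ranking variables by importance concrete, the following is a minimal sketch of permutation importance: the drop in accuracy when one feature column is shuffled. It is not necessarily the importance measure used in this study, and the classifier is a simple nearest-centroid stand-in rather than the RF model. The feature names, class labels, and all values are invented for illustration.

```python
import random

def fit_centroids(X, y):
    """Stand-in classifier (not the paper's random forest): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

def accuracy(centroids, X, y):
    """Fraction of rows whose predicted class matches the true label."""
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)

def permutation_importance(centroids, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Synthetic geo-objects (invented values): the class depends only on temperature.
feature_names = ["annual_temperature", "uninformative"]
X, y = [], []
for i in range(40):
    warm = i < 20
    temperature = (10.0 if warm else 2.0) + (i % 5)
    X.append([temperature, ((i * 7) % 10) / 10])
    y.append("type_warm" if warm else "type_cold")

model = fit_centroids(X, y)
importances = permutation_importance(model, X, y)
ranking = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setting the temperature feature ranks first because shuffling it destroys the class signal, while shuffling the uninformative feature leaves the accuracy unchanged. With real geo-object features, correlated variables (for example, elevation and temperature) can share importance, so such rankings should be interpreted with care.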

The Achievements and Novelty of This Study
Although the proposed methodology may seem tailored to the particular case studied here, the pipeline can be generalized to many mountainous areas, for the following reasons. First, with the continuous improvement of the spatial and spectral resolution of remote sensing images, fine and accurate vegetation mapping has become possible. Meanwhile, multi-source auxiliary data, such as historically interpreted vegetation survey maps, continue to accumulate and are easy to obtain. How these data can be used in the intelligent mapping of vegetation has attracted wide attention among scholars, and the proposed method was designed around the latest available data. Second, due to the complexity of mountainous terrain, it is difficult to obtain basic mapping units and vegetation samples from field surveys, which is generally recognized as a major challenge. To overcome these bottlenecks, we developed new methods of geo-object extraction and training sample collection that are oriented to actual needs and situations. Third, a widely used ML inference method was introduced into the process of vegetation type classification. This approach runs automatically, is reliable, and has good generalization ability.
Based on the above, the novelty of this research can be summarized as follows. First, the basic mapping units in this study were geo-objects with fine polygon boundaries. They were extracted from HSR-RS images and DEM data through topographic partitioning and constrained segmentation, so they are more consistent with our understanding of the distribution units of vegetation in mountainous areas and reflect the continuity and spatial gradient characteristics of the vegetation distribution. Multi-source data were integrated with these geo-object-based units, and mapping was conducted on a micro-spatial scale to meet the requirements of fine vegetation surveys. Second, multi-source auxiliary data were applied synergistically in the mapping process by associating them with the geo-object units. Potential correlation factors were identified by designing high-dimensional environmental variables in order to improve the vegetation classification accuracy. Meanwhile, nonlinear modeling with the RF-based ML algorithm was employed for vegetation mapping and was shown to be robust for multi-variable relational analysis in accurate spatial prediction. Third, a geo-object-level sample collection method was designed that transfers prior labels from a historically interpreted map and combines them with a rule set derived from local expert knowledge. This provides an effective procedure for rapid vegetation sample selection in mountainous areas. The resulting mapping strategy is cost-effective for ML-based mapping where sample collection is difficult. In addition, the mapping results can be updated quickly by repeating this rapid production process.
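The sample collection idea described above can be sketched as follows. A geo-object becomes a candidate sample only if all grid cells of the historical map inside it carry the same label, and a candidate is kept only if it passes an expert rule. This is a simplified illustration, not the authors' implementation: the dictionary layout, the type names, and the elevation ranges used as the rule set are all hypothetical.

```python
def collect_samples(geo_objects, elevation_rules):
    """Return (id, label) pairs for geo-objects that pass both checks."""
    samples = []
    for obj in geo_objects:
        labels = set(obj["grid_labels"])
        # Label transfer: accept only geo-objects whose grid cells all agree.
        if len(labels) != 1:
            continue
        label = next(iter(labels))
        # Expert rule (hypothetical): mean elevation must fall in the range
        # expected for this vegetation type; types without a rule always pass.
        low, high = elevation_rules.get(label, (float("-inf"), float("inf")))
        if low <= obj["mean_elevation"] <= high:
            samples.append((obj["id"], label))
    return samples

# Hypothetical rule set: plausible elevation range (m) per vegetation type.
elevation_rules = {"type_A": (800, 2000), "type_B": (2500, 3800)}

# Hypothetical geo-objects with historical-map labels of the grid cells inside them.
geo_objects = [
    {"id": 1, "grid_labels": ["type_A", "type_A", "type_A"], "mean_elevation": 1500},
    {"id": 2, "grid_labels": ["type_A", "type_B"], "mean_elevation": 2300},  # mixed labels
    {"id": 3, "grid_labels": ["type_B", "type_B"], "mean_elevation": 1200},  # fails the rule
    {"id": 4, "grid_labels": ["type_B"], "mean_elevation": 3100},
]

samples = collect_samples(geo_objects, elevation_rules)
print(samples)  # [(1, 'type_A'), (4, 'type_B')]
```

Geo-object 2 is rejected because its labels are mixed, and geo-object 3 is rejected because its elevation contradicts the rule for its label. In practice the rule set could involve several features rather than elevation alone.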

Conclusions
Fine and accurate vegetation maps are in practical demand in current vegetation surveys. To realize effective vegetation mapping in mountainous areas, intelligent technical solutions need to be introduced with the support of multi-source auxiliary data. Against the background of fast and accurate extraction of vegetation cover information in mountain areas via supervised classification, this paper combined HSR-RS images and multi-source spatial data to build a geo-object-based technical framework for vegetation mapping. The procedure was designed around three aspects: geo-object extraction considering terrain zoning and constraints; the use of multi-source geospatial data influencing the spatial distribution of vegetation, together with a nonlinear tree-based ML model; and quick sample collection using local expert experience and vegetation information from historical interpretations. The experimental results for Taibai Mountain, China, demonstrated the effectiveness of the framework for rapid mapping where samples from field surveys are not easy to collect. Compared with traditional vegetation mapping technologies, the advantages of the proposed method lie mainly in two aspects. First, although other methods also map at the geo-object level, this paper focuses on a mountainous area and considers the topographic partition and spatial constraints when acquiring refined and accurate geo-object units, which differs from traditional land cover mapping technology. Second, for the acquisition of samples for ML, we designed an intelligent and convenient scheme to automatically collect samples from interpretations of historical maps, which is especially beneficial when it is difficult to collect samples in actual mountain areas. Combining these two innovations, we can produce relatively fine and reliable mountain vegetation maps. Moreover, this kind of fine mapping result can serve as an important information reference for the investigation of vegetation in mountain areas.
Based on the above analysis, the proposed methodology can be further studied in the following directions. First, semi-supervised learning, which makes use of unlabeled or unreliably labeled samples [66], active learning, which addresses the selection of informative samples [67,68], and reinforcement learning, which focuses on the iterative improvement of samples and classification accuracy [69], are worth applying within our framework to make learning more robust. Second, deep learning (DL) approaches are becoming pervasive in the domain of RS [24,70], and integrating DL methods into our technical framework is a potential development direction. One possible way is to design DL architectures to fuse multi-source data or to replace the RF classifier. Another way is to adopt DL techniques to calibrate our model outputs, as uncertainty in the model parameters often results in errors in the model outputs. We believe that combining our proposed technical framework with DL may improve both the model accuracy and the mapping of vegetation types. Third, more effective features, such as indices specific to particular vegetation types, and more data sources, such as hyperspectral and LiDAR data, could be introduced [71]. The rules for filtering geo-object-based samples shown in Figure 2 could be further optimized by integrating more domain knowledge, such as spectral libraries and biological laws. Fourth, the proposed technical framework assumes that each geo-object contains a single vegetation type. This assumption is largely supported by the proposed geo-object extraction process, because the types and characteristics of the land cover are relatively consistent within each geo-object. However, although we have not made a comprehensive assessment, we acknowledge that this assumption is somewhat idealized. In practice, a geo-object may contain multiple vegetation types, which can reduce the mapping accuracy of our method. We believe that this effect is limited and tolerable, but it could be addressed in future work, for example by re-decomposing geo-objects and carrying out sub-geo-object mapping, analogous to sub-pixel mapping. These are challenging problems in our proposed framework that are worthy of further investigation, and their solutions are expected to yield better results.
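One simple way to begin assessing the single-type assumption is to measure, for each geo-object, the share of its area covered by the dominant vegetation type according to some finer reference labeling. The sketch below assumes such reference labels are available as a list of cell labels per geo-object; this is a hypothetical input, since the coarse historical map alone would likely understate mixing. The 0.8 threshold and all labels are illustrative.

```python
from collections import Counter

def dominant_share(cell_labels):
    """Return the most frequent label and the fraction of cells it covers."""
    counts = Counter(cell_labels)
    label, n = counts.most_common(1)[0]
    return label, n / len(cell_labels)

def flag_mixed(geo_objects, threshold=0.8):
    """List geo-object ids whose dominant type covers less than `threshold`."""
    flagged = []
    for obj_id, cell_labels in geo_objects.items():
        if not cell_labels:
            continue  # no reference cells: nothing can be said about this object
        _, share = dominant_share(cell_labels)
        if share < threshold:
            flagged.append(obj_id)
    return flagged

# Hypothetical reference labels of the cells inside three geo-objects.
geo_objects = {
    "g1": ["type_A"] * 10,                   # pure
    "g2": ["type_A"] * 6 + ["type_B"] * 4,   # clearly mixed
    "g3": ["type_B"] * 9 + ["type_A"],       # nearly pure
}

mixed = flag_mixed(geo_objects)
print(mixed)  # ['g2']
```

Geo-objects flagged in this way would be natural candidates for the re-decomposition or sub-geo-object mapping discussed above.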