1. Introduction
Wetlands are precious natural resources with ecological values such as carbon sequestration [
1], water purification [
2], and soil conservation [
3]. They are therefore known as “the kidneys of the earth” [
4]. Karst wetlands, which are widely distributed in karst areas, have become a wetland type of significant research value because of their unique hydrological and geological conditions. The Huixian Karst International Important Wetland (hereafter the Huixian Karst Wetland) is located in the core of the East Asian karst region, the third largest karst region in the world, and is currently the largest karst wetland in China [
5]. The Huixian Karst Wetland covers a complete vegetation sequence of terrestrial vegetation–wet vegetation–water-holding vegetation–submerged vegetation and floating vegetation and is the largest and most representative karst wetland in the world’s subtropical peaks and plains. Clarifying vegetation distribution patterns within wetlands is one of the core elements of wetland plant diversity research, and vegetation regeneration succession and spatial pattern distribution play an important role in wetland ecosystem stability and biodiversity [
6]. An accurate and detailed classification of wetland vegetation communities is therefore important. However, due to the high spatial and temporal heterogeneity of wetlands [
7] and the wide variation in plant species types and vegetation structure, the fine classification of wetland vegetation is a challenging task. Remote sensing (RS) systems offer frequent Earth observation datasets with diverse characteristics and extensive coverage [
8]. This makes them highly appealing for monitoring changes in wetland dynamics at both local and global scales [
9]. Moreover, RS has been proven to enhance the accuracy of wetland classification and change detection [
10]. However, traditional satellite and airborne remote sensing platforms are characterized by high cost, vulnerability to weather, and low temporal and spatial resolution, which still limits their application. To address these limitations, low-altitude unmanned aerial vehicle (UAV) remote sensing has become a cost-effective alternative for the flexible acquisition of images with ultra-high spatial and temporal resolution and has been widely used in wetland vegetation classification and mapping [
11,
12], tree species identification [
13], biomass estimation [
14], and vegetation cover estimation [
15]. In addition, it has developed into an important tool for the fine classification and monitoring of wetland vegetation communities [
16].
Different vegetation has different physiological information and phenological characteristics [
17], and timely and accurate information about the canopy layer in different growing periods is essential for the fine classification of vegetation communities. However, due to the high spectral similarity and poor separability of wetland plant canopies, single-temporal remote sensing images alone often cannot achieve satisfactory classification results. Integrating multi-temporal remote sensing imagery can adequately capture physiological information and physical differences among features and has been demonstrated to achieve higher classification accuracy than single-temporal imagery [
18,
19]. Van et al. [
20] used a partial least squares random forest (PLS-RF) algorithm based on satellite multispectral imagery to assess the separability of six wetland vegetation communities in the subtropical coastal zone over four seasons and showed that the combined images of the three temporal phases of autumn, winter, and spring produced the highest mean overall classification accuracy (OA) (86 ± 3.1%) compared to mono-temporal classification. Kollert et al. [
21] used Sentinel-2 multispectral images of the spring, summer, and autumn seasons for tree species classification based on the random forest (RF) algorithm and found that the combined images of the three temporal phases achieved a higher classification accuracy than any single-temporal image, with an OA of 84.4%. The above studies illustrate that integrating multi-temporal data provides complementary information and thus improves vegetation classification accuracy. However, the critical periods of vegetation growth include budding, flowering, and maturation; not every image in a time series provides useful information, and using the full time series may not achieve the best classification performance [
22]. Piaser et al. [
23] used Sentinel-2 images to classify aquatic vegetation in temperate wetlands and assessed the classification accuracy of eight machine learning algorithms; the results showed that the classification scenario covering the widest temporal range (April–November) achieved better classification results than the scenarios covering narrower temporal ranges (May–October, June–September, and July–August). Macintyre [
24] used multi-temporal (spring, summer, autumn, and winter) Sentinel-2 imagery for wetland vegetation classification and showed that spring imagery achieved the highest classification accuracy among the mono-temporal classification scenarios, while the multi-temporal feature dataset combining autumn and spring imagery produced the best results among all classification scenarios. The above studies demonstrate that multi-temporal remote sensing imagery has great potential for wetland vegetation classification and that the choice of growth periods to combine can affect the final classification results. Therefore, in this study, we used UAV images from different growth periods as the data source to evaluate how the effectiveness of vegetation community classification differs among growth periods. We aimed to identify suitable growth period image features for distinguishing the various vegetation communities.
Previous research has confirmed that compared to the traditional pixel-based classification methods, combining the Object-Based Image Analysis (OBIA) approach with high-spatial-resolution remote sensing images can better capture spatial relationships between land features and distinguish land cover types with similar spectral characteristics [
25], resulting in more accurate wetland vegetation classification results [
26]. OBIA first uses a segmentation algorithm to split the image into a number of uniform pixel aggregates and then fully exploits the semantic features of the image [
27], which are set as inputs in the machine learning algorithm for object classification. Machine learning algorithms have the advantages of strong model generalization [
28] and can automate the processing of large amounts of data [
29], and joint OBIA and machine learning algorithms have become one of the important methods for fine classification of wetland vegetation in recent years [
30,
31,
32]. Fu et al. [
12] combined the object-based RF-DT algorithm and UAV RGB images to classify wetland vegetation communities with an overall classification accuracy of more than 85%. Zhou et al. [
33] used RGB imagery for wetland vegetation classification and evaluated ten OBIA scenarios using five machine learning algorithms (Bayesian, K-nearest neighbor (KNN), SVM, decision tree (DT), and RF), and the results showed that the UAV-based RGB imagery combined with OBIA achieved high accuracy of wetland vegetation classification (OA of 90.73%). The above studies indicated that object-based machine learning algorithms have prospective applications and accurate classification performance in the field of wetland vegetation classification. However, utilizing only a single classifier often suffers from a degree of instability and overfitting [
34] and is sensitive to the quality of the dataset and field sample data. Stacking Ensemble Learning (SEL), as a combination strategy, can integrate the strengths of multiple classifiers so that they complement one another, thus achieving better classification results. The Stacking mechanism involves training base models in a hierarchical manner and then training a combiner (also known as a meta-classifier) that generates the final predictions based on the predictions of the base models. It has been proven to have good classification performance and generalization ability [
35]. Cai et al. [
36] used multi-temporal Sentinel-1 and Sentinel-2 images combined with the Stacking algorithm to extract vegetation information in the Dongting Lake wetlands, and the study demonstrated that the overall accuracy and Kappa coefficient (92.46% and 0.92) of object-based Stacking were higher than those of single classifiers (SVM, RF, and KNN). This effectively demonstrated the superiority of the object-based Stacking algorithm over individual classifiers for vegetation classification in highly heterogeneous areas. The above study shows that Stacking algorithms that integrate multiple classifiers perform better in wetland classification. However, there is still a lack of research demonstrating the suitability of multi-temporal UAV RGB images combined with an object-based Stacking algorithm for karst wetland vegetation classification. Therefore, this paper combined OBIA with a SEL model built by stacking three machine learning models and compared the classification performance of the four classifiers to demonstrate the stability and generalization ability of the Stacking model.
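The Stacking mechanism described above can be illustrated with a minimal, standard-library sketch. It is not the model used in this study: the two base learners (a nearest-centroid rule and a 1-nearest-neighbour rule) are simple stand-ins for RF, XGBoost, and CatBoost, the meta-classifier is a hypothetical lookup table over base predictions, and the two-class toy samples ("A" and "B") are invented for illustration. What it does show is the essential step of training the meta-classifier on out-of-fold base predictions.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fit_nearest_centroid(X, y):
    """Stand-in base learner: assign the class whose mean vector is closest."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                 for label, rows in groups.items()}
    return lambda x: min(centroids, key=lambda label: sq_dist(x, centroids[label]))

def fit_one_nn(X, y):
    """Stand-in base learner: copy the label of the closest training sample."""
    return lambda x: y[min(range(len(X)), key=lambda i: sq_dist(x, X[i]))]

def fit_lookup_meta(meta_X, y):
    """Hypothetical meta-classifier: most frequent true label per combination
    of base predictions seen during training."""
    votes = {}
    for preds, label in zip(meta_X, y):
        votes.setdefault(tuple(preds), Counter())[label] += 1

    def predict(preds):
        seen = votes.get(tuple(preds))
        # Unseen combination: fall back to a majority vote of the base models.
        return (seen or Counter(preds)).most_common(1)[0][0]
    return predict

def fit_stacking(X, y, base_fitters, meta_fitter, k=3):
    """Train the meta-classifier on out-of-fold base predictions, then refit
    the base learners on all samples for use at prediction time."""
    n = len(X)
    meta_X = [None] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        models = [fit([X[i] for i in train], [y[i] for i in train])
                  for fit in base_fitters]
        for i in held_out:
            meta_X[i] = [m(X[i]) for m in models]
    meta = meta_fitter(meta_X, y)
    bases = [fit(X, y) for fit in base_fitters]
    return lambda x: meta([b(x) for b in bases])

# Invented toy objects with two features each; not data from the study.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.1, 0.1), (0.2, 0.3),
     (1.0, 0.9), (0.8, 1.1), (1.1, 1.0), (0.9, 0.8), (1.2, 1.1), (1.0, 1.2)]
y = ["A"] * 6 + ["B"] * 6
stacked = fit_stacking(X, y, [fit_nearest_centroid, fit_one_nn], fit_lookup_meta)
print(stacked((0.15, 0.2)), stacked((0.95, 1.0)))
```

Using out-of-fold predictions keeps the meta-classifier from seeing base predictions made on samples the base models were trained on, which is the usual guard against overfitting in stacking.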
Machine learning models achieve high accuracy in predicting target attributes from a large number of input features, but this makes their operation and influencing factors difficult to explain; such models are therefore known as “black-box” models [
37]. In order to explain the influence of input features on target variables, it is necessary to systematically analyze feature attribution. Several studies have used importance evaluation methods such as linear regression [
38] and random forest [
39] to explain shallow machine learning model results; such methods can quickly filter out features that have a large impact on model performance and provide a visual understanding of the contribution of each feature in the model. However, these methods have difficulty explaining the negative contributions of input feature variables to the model, and they struggle to provide consistent explanations for complex models. SHapley Additive exPlanations (SHAP) uses the concept of Shapley values from game theory to attribute the predictions of a machine learning algorithm to its input features; it can be applied to different types of models and rationalizes their predictions [
40]. SHAP provides detailed explanations of input variables in terms of the model calculation process and output results. It offers both local and global explanations of the influencing process and output results, effectively addressing the limitations of traditional feature importance analysis methods. However, SHAP-based explanation of feature variables and models has not yet been applied to wetland vegetation classification. Therefore, this paper uses the SHAP method to analyze the classification model and multi-temporal image data, to reveal in detail how different input features affect the outputs at local and global scales, and to explore the best image feature variables for the fine classification of each wetland vegetation community.
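To make the Shapley value concept concrete, the following sketch computes exact Shapley values for a toy scoring function by enumerating all feature subsets. It is an illustration of the game-theoretic definition only, not the SHAP library or the procedure used in this study; the function toy_model, the three input values, and the all-zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline.
    Features outside a coalition are replaced by their baseline values."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi

def toy_model(f):
    """Hypothetical score with one positive, one negative, and one interaction term."""
    return 2.0 * f[0] - 1.0 * f[1] + f[0] * f[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The second feature receives a negative value, which is the kind of signed contribution that impurity-based importance rankings cannot express. The values also sum to the difference between the prediction and the baseline prediction; averaging absolute values over many samples gives the global view, while the per-sample values give the local one.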
To fill the above research gaps, this study focuses on vegetation community mapping in the Huixian Karst International Important Wetland in Guilin City, southern China, by combining Object-Based Image Analysis (OBIA) with the SEL algorithm using multi-growth-period UAV images. The study aims to clarify the impact of combining UAV RGB images from different growth periods on the classification of vegetation communities. The main contributions of this study are as follows:
We quantitatively evaluated the differences in the classification accuracy of single-growth-period UAV RGB images for vegetation community mapping and explored the appropriate growth period images and their sensitive feature bands for distinguishing each vegetation community in a karst wetland.
We built a stacking ensemble learning classification model from three machine learning algorithms (RF, XGBoost, and CatBoost) and examined its stability and generalization ability for vegetation community classification using different growth period image combinations.
We explored the classification performance of different growth period UAV image combinations and quantitatively evaluated the effect of different growth period scenarios on vegetation community mapping.
We used the SHapley Additive exPlanations (SHAP) method to interpret the local and global contributions of feature variables to the black-box stacking model outputs for karst wetland vegetation community classification and extracted the sensitive image features for distinguishing each vegetation community in different growth periods.
4. Discussion
Previous studies have demonstrated that integrating images from different growth periods can improve the classification accuracy of wetland vegetation [
36,
56]. In this study, we used ensemble machine learning algorithms and collected images of the spring, summer, and autumn growth periods of karst wetland vegetation. Subsequently, we combined six different growth period classification scenarios to assess accuracy. The research results indicated that in the single-growth-period classification scenario, the vegetation images from the summer growth period achieved the highest classification accuracy. The overall classification accuracy was 0.62% higher than that of the spring growth period scenario and 1.47% higher than that of the autumn growth period scenario. The classification effect of autumn growth period images was inferior to that of spring growth period images, which was consistent with the findings of [
57]. In the combined multi-growth-period classification scenarios, the use of vegetation images from the spring, summer, and autumn growth periods achieved the highest classification accuracy (OA of 93.37%). The OA was 1.01% higher than that of the combined spring and summer scenario and 0.33% higher than that of the combined summer and autumn scenario, indicating that integrating vegetation images from all three growth periods could improve vegetation classification accuracy and achieve superior classification results. However, due to the significant spectral differences among karst wetland vegetation and the spatial homogeneity of vegetation communities in different growth periods, a single-growth-period image alone was usually unable to achieve the highest classification accuracy for all vegetation communities. In this paper, we found a certain degree of confusion between CP and LC in the summer growth period images, and combining the three growth period images substantially alleviated this confusion. The F1 score of KR in the autumn growth period was 0.816, and the classification scenario that combined the three growth period images improved it significantly (the F1 score increased by 0.18). The combined multi-growth-period scenario had the most notable effect on BO and LC, resulting in a 0.15 increase in the F1 score. LT achieved its highest classification accuracy (F1 score of 0.9) in the scenario that combined summer and autumn images, indicating that this combination was the most suitable for LT classification. BG and KR achieved their highest classification accuracy (F1 score of 1) when the three growth period images were combined.
These findings further support the view that multi-temporal imagery leverages the physiological information and phenological characteristics of vegetation and enhances classification accuracy through the complementary information provided by different images. These conclusions provide theoretical support for vegetation classification using multi-temporal images and offer important guidance for wetland classification.
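For reference, the accuracy measures quoted in this discussion (OA, the Kappa coefficient, and per-class F1 score) can all be derived from a confusion matrix. The sketch below uses their standard definitions; the 3-class matrix is invented for illustration and does not reproduce any result of this study.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal (rows: reference, columns: predicted)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def kappa(cm):
    """Cohen's Kappa: agreement beyond what the row and column totals predict by chance."""
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    expected = sum(sum(cm[i]) * sum(row[i] for row in cm)
                   for i in range(len(cm))) / total ** 2
    return (observed - expected) / (1 - expected)

def f1_scores(cm):
    """Per-class F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    scores = []
    for i in range(len(cm)):
        true_pos = cm[i][i]
        reference_total = sum(cm[i])                 # TP + FN
        predicted_total = sum(row[i] for row in cm)  # TP + FP
        denom = reference_total + predicted_total
        scores.append(2 * true_pos / denom if denom else 0.0)
    return scores

# Invented confusion matrix for three hypothetical classes.
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
print(overall_accuracy(cm), kappa(cm), f1_scores(cm))
```

Because F1 is computed per class, it can expose confusion between two communities even when OA changes only slightly between scenarios.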
Due to the significant spatial variability of karst wetland vegetation in different phases, a single machine learning algorithm was often unable to achieve satisfactory classification results in all temporal phases. In this paper, we utilized OBIA with ultra-high-resolution UAV images and stacked RF, XGBoost, and CatBoost models to form a SEL model. The classification performance and adaptability of the model were evaluated across the six growth period scenarios. The qualitative results demonstrated that the Stacking algorithm could more accurately delineate the boundary between BO and LC and reduce the confusion between AG and KR to some extent. It significantly reduced misclassification and omission between classes, leading to superior classification outcomes. Quantitative results indicated that the four algorithms ranked as follows in classification performance for karst wetland vegetation communities: Stacking > CatBoost > RF > XGBoost. The Stacking algorithm achieved the highest OA in five classification scenarios, including all the multi-growth-period combinations. This showed that the SEL algorithm was suitable for image combinations of different temporal phases and achieved the highest classification accuracy in most growth period scenarios. Comparing the classification accuracy of the four algorithms for different vegetation communities, it was evident that the Stacking algorithm could improve the classification accuracy of HK, RI, KR, LC, CP, and HM, with the most significant improvement observed for KR (F1 score increased from 0.818 to 0.94). Additionally, Stacking significantly reduced the large errors made by the other classifiers for LT, KR, and CP, exhibiting a more stable classification effect.
This highlighted the Stacking algorithm’s capability to enhance the classification performance for various vegetation communities and achieve higher accuracy. However, due to the poor separability of wetland vegetation and the complexity of the classes, all four algorithms inevitably misclassified vegetation communities to some extent. This reflected the difficulty machine learning models have in identifying classes with similar characteristics during model training. Nevertheless, the Stacking algorithm consistently outperformed the other three algorithms. This was attributed to the Stacking algorithm enriching the training data by stacking, which allows the meta model to improve the prediction accuracy using heterogeneous structures [
58]. In this study, we used the base model with the highest average accuracy as the meta model, which may have certain limitations. In future research, we will explore an adaptive stacking approach to enhance the robustness of the model. Furthermore, the SEL approach holds potential for the retrieval of physiological parameters of wetland vegetation, such as canopy carbon content, chlorophyll content, and leaf area index (LAI), as well as for maintaining wetland ecosystem diversity. Although deep learning algorithms have gained popularity in wetland vegetation classification [
59], they are limited to pixel-based semantic segmentation [
60]. Moreover, training deep learning models requires large quantities of labeled data, resulting in increased costs and computational requirements [
61]. In addition, generating labeled data for training deep learning models can be challenging due to the complexity and scattered distribution of wetland vegetation types. Moreover, complex terrain can significantly affect the classification accuracy of deep learning algorithms [
62]. In future research, we will attempt to use deep learning algorithms for wetland vegetation classification in karst areas and explore the differences in classification performance compared to machine learning algorithms.
This study found that SHAP values not only enable the interpretability of machine learning models but also provide an importance analysis for complex models considered “black boxes”. Furthermore, it offered local and global explanations for the positive or negative contributions and magnitude of feature variables in wetland vegetation classification. These findings are consistent with previous studies such as [
63,
64]. This study assessed the contribution of different features to the classification of different vegetation communities at the local scale. The results showed that in the summer growth period, COM, NGBDI, and VDVI were the three features that contributed the most to the overall classification of the SEL model. In the combined three growth periods, July NGBDI and July IPCA were identified as the features that made the largest overall contribution to the classification. In addition, different land cover types exhibited varying sensitivities to different spectral bands during different growth periods. For instance, in the summer growth period, BG displayed the highest sensitivity to the vegetation index COM, while BG was most sensitive to summer NGBDI in the combined three growth periods. Similarly, BO exhibited high sensitivity to VDVI in the summer growth period, while it was relatively sensitive to summer IPCA in the combined three growth periods. In addition, the texture features Homog and Standard Deviation had high importance in LC identification. Integrating the above-mentioned sensitive spectral bands can provide important theoretical support for subsequent wetland vegetation community classification. On a global scale, this study evaluated the contributions of different feature bands to the classification results. The results indicated that texture features in the summer growth period were the most important features for the Stacking algorithm in identifying wetland vegetation communities in karst areas. The vegetation index COM was also a highly important feature. The combination of the three growth periods revealed that the Mean Red texture feature in the spring growth period made a substantial contribution to the classification of wetland vegetation communities.
Furthermore, the analysis of the importance rankings and proportions of different feature types revealed that as the number of features increased, the influence of vegetation indices and texture features on the classification gradually diminished, while spectral features, geometry features, and position features became increasingly important for wetland vegetation identification. The above experimental results indicated that COM and NGBDI contributed significantly to the overall wetland vegetation identification, while the importance of spectral features, position features, and geometry features was relatively lower. This study identified the most influential features for the classification of karst wetland vegetation, which can improve classification efficiency and support high-precision fine classification. By demonstrating the sensitivity of various species to different features, it can provide more accurate and efficient technical guidance for wetland vegetation conservation and planning.
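As a reminder of what two of the RGB vegetation indices discussed above measure, the sketch below computes NGBDI and VDVI from their commonly cited definitions, NGBDI = (G − B)/(G + B) and VDVI = (2G − R − B)/(2G + R + B). This excerpt does not include the feature definitions used in the study, so these formulas are an assumption here; the pixel values are invented, and in an object-based workflow the inputs would be the mean band values of each segmented object.

```python
def ngbdi(green, blue):
    """Normalized green-blue difference index; returns 0.0 for an all-dark input."""
    denom = green + blue
    return (green - blue) / denom if denom else 0.0

def vdvi(red, green, blue):
    """Visible-band difference vegetation index; returns 0.0 for an all-dark input."""
    denom = 2 * green + red + blue
    return (2 * green - red - blue) / denom if denom else 0.0

# Invented mean digital numbers for one image object (not data from the study).
red, green, blue = 60, 120, 40
print(ngbdi(green, blue), vdvi(red, green, blue))
```

Both indices are ratios bounded between -1 and 1, which makes them less sensitive to overall brightness differences between flights in different growth periods than the raw band values.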