1. Introduction
Wetlands are transition zones between terrestrial and aquatic ecosystems and are regarded as one of the three major ecological systems alongside forests and oceans [1]. They play a critical role in water conservation, water purification, flood storage, drought resistance, and the protection of biodiversity. Over the past half-century, excessive human activity has had a significant adverse impact on wetland ecosystems [2], with large numbers of wetlands converted into cropland, fishponds, and construction sites, resulting in a substantial reduction in wetland area [3,4]. In addition, the proliferation of cropland and fishponds contributes significantly to the pollution of wetland ecosystems through rivers and groundwater, threatening biodiversity and destroying the natural habitats of wetland species [5]. To date, up to 57% of the world’s wetlands have been converted or eliminated, with Asia experiencing the greatest decline [6]. It is therefore imperative to accurately characterize the spatial distribution and change characteristics of wetland vegetation species and ground objects, in order to evaluate and utilize wetland resources and to provide data support for wetland vegetation restoration and for research on regional biodiversity and its formation mechanisms [7,8].
As a primary technical means of regional ecological environment monitoring, satellite remote sensing is widely used for wetland information extraction, dynamic change monitoring, resource surveys, etc. [9]. Using satellite data from MODIS [10], WorldView-2 [11], and ALOS PALSAR [12], scholars have worked extensively on wetland classification. Owing to limitations in spectral and spatial resolution, however, these studies have mostly concentrated on vegetation communities or major ground object types. Substantial constraints remain in the classification of wetland vegetation species, making it difficult to manage and assess wetland areas.
In recent years, the rapid advancement and popularity of unmanned aerial vehicles (UAVs) have provided technical support for the detailed management and assessment of these ecological environments [13,14]. UAVs are widely utilized for monitoring ecological environments due to their low cost, simple operation, and minimal dependence on landing and takeoff sites and weather conditions [15]. In addition, UAVs can acquire multi-angle, high spatial resolution remote sensing data according to specific user requirements, which compensates for the application limitations of satellite images [16,17,18]. However, the spatial resolution of images can have a significant impact on several UAV-related applications, including fractional vegetation cover evaluation [19], vegetation species identification [20], and disease detection [21]. Changes in spatial resolution alter the information content expressed in an image, which produces spatial scale effects in the resulting products. Acquiring images with different spatial resolutions through resampling is the main method used in current research on the spatial scale effect of remote sensing. To explore the impact of spatial resolution on classification results, many researchers have mimicked image acquisition by UAV platforms at various flight heights using resampling techniques such as pixel aggregation and cubic interpolation. The findings suggested that the effect of spatial resolution on classification accuracy was related to the mixed-pixel effect as well as to the per-pixel nature of classification [22]: more pixels became mixed as spatial resolution decreased. However, a higher spatial resolution is not necessarily better for UAV images [23]. Increasing the spatial resolution of a UAV image will not necessarily increase classification accuracy, yet it costs more and presents more difficulties. When the spatial resolution is excessively high and the image information is very rich, certain vegetation features (such as shadows and gaps) may also be captured, resulting in a more complex image and a reduction in classification accuracy [24]. Additionally, in ultra-high spatial resolution images, the differences in spectral and texture features within the same vegetation species or ground object become larger, while those between different vegetation species or ground objects become smaller, making it more challenging for the classification model to capture distinctive spectral or texture features [25]. Recent research has therefore focused on how to balance spatial resolution against image feature information while effectively identifying vegetation species and ground objects. However, images obtained only by resampling introduce uncertainty into the spatial scale effect of remote sensing, thus affecting the assessment of the classification accuracy of wetland vegetation species and ground objects.
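As an illustration of this resampling step, the following is a minimal sketch, assuming a georeferenced UAV RGB GeoTIFF and the rasterio library, in which pixel aggregation is approximated by block-average resampling and compared with cubic interpolation; the file names and target resolution are hypothetical, not taken from this study.

```python
# Minimal sketch: simulate coarser spatial resolutions from a single UAV image
# by resampling (block averaging as a stand-in for pixel aggregation vs. cubic
# interpolation). File names and the target resolution are illustrative.
import rasterio
from rasterio.enums import Resampling

def resample_to(src_path, dst_path, target_gsd_m, method):
    with rasterio.open(src_path) as src:
        scale = src.res[0] / target_gsd_m            # ratio of native to target pixel size
        out_h, out_w = int(src.height * scale), int(src.width * scale)
        data = src.read(out_shape=(src.count, out_h, out_w), resampling=method)
        transform = src.transform * src.transform.scale(src.width / out_w, src.height / out_h)
        profile = src.profile
        profile.update(height=out_h, width=out_w, transform=transform)
        with rasterio.open(dst_path, "w", **profile) as dst:
            dst.write(data)

# Pixel aggregation (block averaging) and cubic interpolation for a 2.9 cm target GSD
resample_to("uav_rgb_1p2cm.tif", "an_2p9cm_aggregate.tif", 0.029, Resampling.average)
resample_to("uav_rgb_1p2cm.tif", "an_2p9cm_cubic.tif", 0.029, Resampling.cubic)
```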
The classification accuracy of vegetation species and ground objects is directly influenced by the image data source as well as by the classification method. At present, the classification methods for vegetation species and ground objects in remote sensing images are primarily pixel-based and object-based. In pixel-based image analysis, land cover features are extracted from individual pixels or from adjacent pixels and classified accordingly. Because pixel-based analysis does not take the spatial or texture information around pixels into consideration, the classification of ultra-high spatial resolution images suffers from the “salt-and-pepper” phenomenon [26,27]. Geographic object-based image analysis (GEOBIA) merges raster units with the same semantic information into objects that carry texture, spectral, position, and geometric features, and information extraction is carried out by building classification rules on this feature information [28]. Previous research has shown that the classification accuracy of object-based methods is significantly higher than that of pixel-based methods [29,30]. Given the abundance of wetland vegetation species and the high fragmentation of ground objects, object-based machine learning algorithms are currently among the most effective tools for the classification of wetland vegetation species and ground objects. Nevertheless, current research on this classification focuses primarily on comparing classification algorithms, and insufficient attention has been paid to how classification algorithms respond to the spatial resolution of images.
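To make the object-based workflow concrete, the sketch below outlines a GEOBIA-style pipeline under open-source assumptions: scikit-image's felzenszwalb segmentation stands in for a multi-resolution segmentation, per-object spectral statistics serve as features, and a random forest performs the classification. The input arrays are synthetic stand-ins, not the study's data.

```python
# Illustrative object-based (GEOBIA-style) classification pipeline with stand-in data.
import numpy as np
from skimage.segmentation import felzenszwalb
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
image = rng.random((200, 200, 3)).astype(np.float32)   # stand-in for a UAV RGB tile
reference = rng.integers(0, 5, size=(200, 200))        # stand-in reference raster (0 = unlabeled)

def object_features(img, segments):
    """Per-object mean and standard deviation of each band."""
    feats = []
    for seg_id in np.unique(segments):
        pixels = img[segments == seg_id]                # (n_pixels, n_bands)
        feats.append(np.hstack([pixels.mean(axis=0), pixels.std(axis=0)]))
    return np.vstack(feats)

def majority_label(labels, segments):
    """Assign each object the most frequent reference label inside it."""
    out = []
    for seg_id in np.unique(segments):
        vals = labels[segments == seg_id]
        vals = vals[vals > 0]
        out.append(np.bincount(vals).argmax() if vals.size else 0)
    return np.array(out)

segments = felzenszwalb(image, scale=200, sigma=0.5, min_size=50)  # proxy for FNEA segmentation
X = object_features(image, segments)
y = majority_label(reference, segments)
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X[y > 0], y[y > 0])                             # train only on labeled objects
predicted = rf.predict(X)                              # per-object class assignment
```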
To address the aforementioned problems, this study obtained aerial images (Am) and resampled images (An) with different spatial resolutions from a UAV platform and classified wetland vegetation species and ground objects based on GEOBIA combined with four machine learning classifiers (random forest (RF), support vector machine (SVM), K-nearest neighbor (KNN), and Bayes) applied to the Am and the An. The main objectives of this study were: (1) to determine the optimal segmentation scale parameters for the Am and the An at different spatial scales; (2) to examine how feature variables vary between different images; (3) to reveal the scale effects of the Am and the An on the classification of wetland vegetation species and ground objects; and (4) to determine the optimal spatial resolution required to identify different wetland vegetation species and ground objects.
4. Discussion
As an effective tool for the classification of remote sensing images, GEOBIA has been widely used in the classification of wetland vegetation species and ground objects [58]. In GEOBIA, image segmentation is the first key step. Previous studies have shown that the FNEA multi-resolution segmentation algorithm is one of the most popular segmentation algorithms for the identification of wetland vegetation species and ground objects. Appropriate segmentation parameters directly determine the patch size of the generated objects as well as the extraction accuracy of actual vegetation species and ground objects, so it is crucial to determine the optimal segmentation parameter values [32]. FNEA multi-resolution segmentation uses three main parameters: shape, compactness, and the SP, and changes in the SP have a greater impact on segmentation quality than changes in shape and compactness [59]. In this study, after setting the shape and compactness parameters to 0.2 and 0.5, respectively, through trial and error, the SP was selected using the ESP2 tool in order to overcome the subjective influence of human perception on the results. However, current research on the optimal SP addresses only images of a specific spatial resolution, and the response of aerial and resampled images with different spatial resolutions remains insufficiently studied. The results of this study demonstrated that a higher spatial resolution corresponded to a longer segmentation time as well as a greater optimal SP value (Figure 8). This was because higher spatial resolutions resulted in larger data volumes and longer computer processing times. Thus, the efficiency of image processing should be taken into account in future studies instead of excessively pursuing spatial resolution when gathering UAV aerial images. Even at the same spatial resolution, the optimal SP value for the An was somewhat larger than that for the Am: because resampling lowered the internal heterogeneity of the An, a larger SP value produced segmentation results comparable to those of the Am. With the optimal combination of segmentation parameters, all types of vegetation species and ground objects were separated, and each object within an image lay relatively close to the natural boundaries of vegetation species and ground objects, which was acceptable for further processing.
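The idea behind ESP-style scale selection can be sketched as follows; this is an assumption-laden illustration rather than the ESP2 tool itself. Local variance (LV) is taken here as the mean per-object standard deviation, peaks in its rate of change (ROC) mark candidate optimal scales, felzenszwalb again substitutes for FNEA, and the input image is a synthetic stand-in. ESP2 reports all local ROC peaks as candidates; the sketch simply takes the strongest one.

```python
# Sketch of rate-of-change-of-local-variance (ROC-LV) scale selection.
import numpy as np
from skimage.segmentation import felzenszwalb

rng = np.random.default_rng(1)
image = rng.random((150, 150, 3)).astype(np.float32)   # stand-in RGB tile
gray = image.mean(axis=-1)                             # brightness band used for LV

def local_variance(band, segments):
    """Mean per-object standard deviation (local variance, LV)."""
    return np.mean([band[segments == s].std() for s in np.unique(segments)])

def esp_curve(band, img, scales):
    lv = [local_variance(band, felzenszwalb(img, scale=s, min_size=20)) for s in scales]
    roc = [100.0 * (lv[i] - lv[i - 1]) / lv[i - 1] for i in range(1, len(lv))]
    return np.array(lv), np.array(roc)

scales = list(range(50, 501, 50))
lv, roc = esp_curve(gray, image, scales)
best = scales[1:][int(np.argmax(roc))]                 # strongest ROC peak as candidate scale
print("candidate optimal scale:", best)
```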
In GEOBIA, feature selection is the second key step after image segmentation. Because of the limited spectral resolution of UAV-RGB images and the serious spectral confusion among wetland vegetation species, this study employed vegetation indices, texture features, position features, and geometric features to compensate for the lack of spectral information. However, too many feature variables can cause data redundancy and overfitting, which reduces classification accuracy. Therefore, this study used the feature space optimization tool together with the MDA method for feature optimization, in order to improve the processing efficiency of high-dimensional data and to quantify the importance of each feature variable. As demonstrated in previous research, the importance of the vegetation indices was highest, while the importance of the geometric features was lowest [41,60]. However, a sizeable share of the feature importance in this study was attributed to position features, specifically X center and Y max. This was possibly due to the geographically constrained nature of the study area, which magnified the significance of X center and Y max; in other words, adding X center and Y max is more conducive to improving classification accuracy when classifying wetlands in small areas. The importance of each geometric feature was generally low, so geometric features should not be added arbitrarily in future studies. When comparing the importance of each vegetation index and texture feature between the Am and the An, red, blue, EXG, and GLCM mean (0) had relatively high importance, suggesting that these feature variables should be considered first when classifying wetland vegetation species and ground objects. There were also significant differences in the importance of some feature variables between the Am and the An: among spectral features, max_diff was more important in the An than in the Am, and among texture features, GLDV contrast (135), GLCM dissimilarity (90), and GLCM correlation (90) were more important in the An than in the Am, which was possibly one of the reasons why the An achieved higher classification accuracy than the Am in the final results.
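The MDA ranking can be illustrated with scikit-learn's permutation importance, which likewise measures the drop in accuracy when a feature is randomly shuffled; the feature names and data below are hypothetical stand-ins for the object features used in this study, not its actual variables.

```python
# Hedged sketch of mean-decrease-in-accuracy (MDA) feature ranking via permutation importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
feature_names = ["red", "blue", "EXG", "GLCM_mean_0", "X_center", "Y_max", "shape_index"]
X = rng.random((600, len(feature_names)))            # stand-in object-feature matrix
y = rng.integers(0, 8, size=600)                     # stand-in class labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

# MDA: how much overall accuracy drops when each feature is permuted
result = permutation_importance(rf, X_te, y_te, n_repeats=30, random_state=0, scoring="accuracy")
for i in result.importances_mean.argsort()[::-1]:
    print(f"{feature_names[i]:<12s} {result.importances_mean[i]:.4f}")
```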
RF, SVM, KNN, and Bayes are machine learning classifiers commonly used for image classification, and this study evaluated their performance in the classification of wetland vegetation species and ground objects using OA and kappa coefficients. Consistent with previous research, the classifiers performed differently, with the RF classifier generally performing best [55]. Accordingly, future research on the classification of wetland vegetation species and ground objects should prioritize RF classifiers. This study focused on the responses of these four classifiers to the spatial resolution of images, and the results indicated that the trends of the OA and kappa coefficients in the Am and the An were broadly consistent across classifiers. When the spatial resolution was coarser than 2.9 cm, the OA and kappa coefficients decreased significantly (Table 7 and Table 8), owing to the increasing number of mixed pixels caused by the decreasing spatial resolution. A pixel may contain information from multiple classes, and its classification depends on the proportions of the classes within the mixed pixel; small and fragmented classes were easily replaced by larger, more uniform classes, and patch edges were more prone to commission or omission errors [61]. However, a higher spatial resolution did not necessarily mean a better image [24]. For example, the classification accuracy of the 1.2, 1.8, and 2.4 cm images was lower than that of the 2.9 cm image, because wetland vegetation species and ground objects have specific physical sizes, and a spatial resolution finer than a certain threshold was not conducive to their identification [62,63]. Although ultra-high spatial resolution images provide detailed information about vegetation species and ground objects, they clearly aggravate the phenomenon in which the same vegetation species or ground object exhibits different spectra, which in turn increases the difficulty of identification. In addition, ultra-high spatial resolution introduces highly redundant information, which greatly reduces image processing efficiency [62,64]. In future research, it is unnecessary to relentlessly pursue a spatial resolution finer than this threshold; raising the UAV flight altitude can instead improve operational efficiency by covering a larger area while still ensuring the maximum classification accuracy. It was also shown that the overall classification accuracy of the An was higher than that of the Am, probably because, compared with the corresponding aerial images, the images obtained by pixel aggregation resampling exhibited smaller differences in spectral and textural features among homogeneous vegetation species or ground objects and larger differences among heterogeneous vegetation species or ground objects [20,65]. Resampled images were therefore more useful than aerial images for identifying wetland vegetation species and ground objects within the spatial resolution range of 1.2~5.9 cm.
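A minimal sketch of this accuracy comparison is given below, assuming scikit-learn implementations of the four classifiers (GaussianNB standing in for the Bayes classifier) and synthetic object features in place of the study's samples.

```python
# Sketch: train RF, SVM, KNN, and a naive Bayes classifier on the same features
# and report overall accuracy (OA) and the kappa coefficient. Data are random stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.random((800, 20))                             # stand-in object-feature matrix
y = rng.integers(0, 8, size=800)                      # stand-in labels for 8 classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

classifiers = {
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "SVM": SVC(kernel="rbf", C=10, gamma="scale"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Bayes": GaussianNB(),
}
for name, clf in classifiers.items():
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    print(f"{name:<5s} OA={accuracy_score(y_te, pred):.3f} "
          f"kappa={cohen_kappa_score(y_te, pred):.3f}")
```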
Based on the RF classifier, the spatial scale effects on the classification of each vegetation species and ground object were explored, which provides a useful reference for selecting the optimal-resolution image for identifying wetland vegetation species and ground objects. In this study, the UA, PA, and AA of the RF classifier were calculated for each vegetation species and ground object in the Am and the An (Figure 12 and Figure 14). The results indicated that water exhibited accurate and stable identification in both the Am and the An, which may be due to the lower heterogeneity among water objects formed after FNEA multi-resolution segmentation; moreover, water objects differed significantly from other objects, making water easier to extract in wetland ecosystems.
Figure 12 showed that higher spatial resolution and more informative images, such as the 1.2~2.9 cm images, were needed to improve the PA of hyacinth, mixed forest, and construction in the Am. The UA of duckweed was significantly lower than that of the other classes in the Am because duckweed was primarily distributed in the northern part of the study area, where the vegetation species were highly fragmented; moreover, duckweed was heavily mixed with hyacinth, and the duckweed parcels had irregular shapes and sizes in the multi-scale images, resulting in the lowest identification accuracy for duckweed. In the An, mixed grass exhibited the lowest PA, whereas mixed forest exhibited a higher PA; because mixed grass tended to form mixed pixels along the edges of mixed forest, it was easily misclassified as mixed forest, reducing its identification accuracy. In addition, the optimal spatial resolution required for extracting certain vegetation species or ground objects differed between the Am and the An. In the Am, some vegetation species and ground objects (e.g., lotus, hyacinth, and bare) exhibited the highest AA in the 1.8 cm image, whereas others (such as mixed forest, mixed grass, and duckweed) exhibited the highest AA in the 2.4 and 2.9 cm images. The AA of most vegetation species and ground objects changed regularly with spatial resolution in the An, possibly because the An was resampled from a single image by pixel aggregation and the imaging mechanism at each scale was therefore very similar. In contrast, the Am was acquired by UAV flights at various altitudes in the field and was easily affected by wind speed, light, and other disturbing factors during flight, resulting in some variability in image information across scales. In the An, the 1.8~2.4 cm images with detailed characteristics also introduced noise into the identification of wetland vegetation species and ground objects, while the mixed-pixel phenomenon was common in the 3.6~5.9 cm images, so wetland vegetation species and ground objects could not be distinguished accurately. The 2.9 cm image in the An was therefore less noisy and sufficient for distinguishing vegetation species and ground objects in the diverse wetland environment. In future studies, selecting images with the optimal spatial resolution is crucial to obtaining ideal classification results for vegetation species and ground objects.
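For reference, these per-class metrics can be computed from the confusion matrix as sketched below. PA and UA are taken as per-class recall and precision, and AA is approximated here as their mean, which may differ from the paper's exact definition; the class list follows the text, but the labels and counts are synthetic.

```python
# Sketch: producer's accuracy (PA), user's accuracy (UA), and a simple per-class
# average accuracy (AA) from a confusion matrix built on stand-in labels.
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ["water", "lotus", "hyacinth", "duckweed", "mixed forest",
           "mixed grass", "bare", "construction"]
rng = np.random.default_rng(4)
y_true = rng.integers(0, len(classes), size=1000)            # stand-in reference labels
y_pred = np.where(rng.random(1000) < 0.8, y_true,            # stand-in predictions (~80% correct)
                  rng.integers(0, len(classes), size=1000))

cm = confusion_matrix(y_true, y_pred)
pa = np.diag(cm) / cm.sum(axis=1)                            # producer's accuracy (recall per class)
ua = np.diag(cm) / cm.sum(axis=0)                            # user's accuracy (precision per class)
aa = (pa + ua) / 2                                           # per-class mean of PA and UA
for name, p, u, a in zip(classes, pa, ua, aa):
    print(f"{name:<13s} PA={p:.3f} UA={u:.3f} AA={a:.3f}")
```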