1. Introduction
With the rapid advancement of urbanization, environmental challenges such as air pollution, the loss of green spaces, and the urban heat-island effect have become increasingly prominent [1]. As an essential component of the urban ecosystem, urban forests play a crucial role in regulating temperature, improving air quality, and maintaining urban ecosystems [2,3]. However, these functions depend on the spatial distribution and species composition of urban trees. Therefore, precise tree species classification is essential for the sustainable development of cities [4,5].
Traditional methods for tree species classification are primarily manual and are often labor-intensive, inefficient, and influenced by subjective factors [6,7]. Satellite remote sensing imagery, such as Landsat and SPOT, has become a valuable data source for large-scale tree surveys owing to its comprehensive coverage and short revisit periods [8,9,10]. However, it also has limitations for tree species classification, such as low spatial resolution, difficulty in capturing fine canopy features, and a lack of crucial three-dimensional information [11,12]. These limitations are not unique to urban areas, but they are particularly pronounced in urban settings with complex spatial patterns.
As a valuable alternative to low- and medium-resolution satellite remote sensing, unmanned aerial vehicle (UAV)-based remote sensing has been applied in various fields. UAVs can acquire imagery with much higher spatial resolution, capturing detailed information about complex urban areas [13,14]. In addition, the rapid development of UAV platforms and sensors has made it possible to collect multiple types of remote sensing data over the same area in a short time. Integrating data from different sources allows urban trees to be characterized comprehensively across multiple dimensions [11,15,16]. Qin et al. [17] used UAV-based LiDAR, hyperspectral, and ultra-high-resolution RGB data to classify 18 tree species in subtropical broadleaf forests; the fusion of multi-source data increased classification accuracy, reaching an overall accuracy of 91.8%. Similarly, Wu et al. [18] combined airborne hyperspectral and LiDAR data to classify seven tree species in a forest plot in Guangxi Province, achieving a highest overall accuracy of 94.68%.
UAV-based RGB data offer distinct advantages for fine tree species classification owing to their low acquisition cost, ease of operation, and decimeter- to centimeter-level spatial resolution. Compared with other datasets, these attributes make them highly suitable for extensive surveys of urban tree species [19,20,21]. However, it is important to recognize that RGB imagery contains limited spectral information. To make full use of RGB imagery for urban tree species classification, a comprehensive exploration of its potential features is therefore essential. Wang et al. [22] extracted spectral data, vegetation morphology parameters, texture features, and vegetation indices from UAV-based RGB imagery to classify four tree species in forest parks in Xi’an City, achieving a classification accuracy of 91.3%. Feng et al. [23] computed six second-order texture features from UAV visible-light imagery; these features, derived with distinct window sizes and designed to minimize correlation, substantially improved tree species classification accuracy. In summary, UAV-based RGB imagery shows significant potential for precise and efficient urban tree species classification.
The intricate and heterogeneous nature of urban environments makes it difficult for any single data source to fully capture this complexity. RGB imagery has inherent limitations in capturing detailed vertical structural information, so relying on RGB imagery alone may not accurately represent subtle variations in canopy levels and heights [24]. LiDAR technology addresses these limitations by providing detailed three-dimensional (3D) surface data; it captures intricate vertical details and enriches the characterization of urban environments [25,26,27,28]. The fusion of two-dimensional (2D) and 3D information obtained by integrating RGB imagery and LiDAR has proved invaluable for the fine classification of tree species [17,18,29,30]. Deng et al. [31] achieved a highest classification accuracy of 90.8% by combining airborne laser scanning data with RGB imagery. Ke et al. [32] employed an object-oriented approach that integrated LiDAR data with QuickBird multispectral images, obtaining a classification accuracy of 91.6%, which surpassed the accuracy of either individual dataset by 20%. Moreover, You et al. [33] investigated single-tree parameter retrieval from UAV-acquired LiDAR data and high-resolution RGB images and found that structural parameters extracted from the combination of the two data types were the most accurate. However, previous tree species classification studies have mainly relied on multispectral data from satellites or airborne platforms [34,35], and only a few have integrated UAV-based LiDAR and RGB imagery for tree species classification [36,37,38].
Conventional tree species classification methods are mainly pixel-based or object-based [39]. Pixel-based methods rely on the spectral information of individual pixels and can discern subtle variations within the imagery [40,41]. Object-based methods merge neighboring pixels into larger objects before classification, which improves their ability to capture spatial structures and contextual information [6,18,42]. In urban environments, however, both approaches face specific challenges. Pixel-based methods often produce noticeable salt-and-pepper effects when applied to urban imagery [22,43], while object-based methods struggle to accurately delineate individual trees as objects in densely vegetated urban areas. As a result, most existing studies concentrate on tree species classification in plantation forests, and few address tree species in complex urban areas. To address these problems, this paper proposes the Plurality Filling method, which combines the advantages of pixel-based and object-based classification by assigning class labels to over-segmented objects, thereby improving the accuracy of the classification results.
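A minimal sketch of the plurality-filling idea described above is given below, assuming a per-pixel classification map and an over-segmentation are already available; the array names and the NumPy-based implementation are illustrative and do not reproduce the authors' exact procedure.

```python
import numpy as np

def plurality_fill(pixel_labels: np.ndarray, segment_ids: np.ndarray) -> np.ndarray:
    """Assign to every over-segmented object the class that occurs most often
    (the plurality class) among its pixel-based labels.

    pixel_labels : 2-D array of per-pixel class codes (e.g., from Random Forest).
    segment_ids  : 2-D array of the same shape giving the object/segment id of
                   each pixel (e.g., from an over-segmentation of the image).
    """
    filled = pixel_labels.copy()
    for seg in np.unique(segment_ids):
        mask = segment_ids == seg
        # Plurality vote: most frequent class label inside this segment.
        classes, counts = np.unique(pixel_labels[mask], return_counts=True)
        filled[mask] = classes[np.argmax(counts)]
    return filled
```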
In this study, seven distinct features of urban tree canopies were extracted from UAV-acquired RGB imagery and LiDAR data. A series of classification experiments was conducted with the Random Forest classifier, followed by post-processing with the Plurality Filling method. The specific objectives of this research were as follows: (1) to assess the integrated application of UAV-based RGB imagery and LiDAR data for urban tree species classification; (2) to identify the optimal feature combination for classifying urban tree species and to assess the impact of different features on classification accuracy; and (3) to validate the effectiveness of the Plurality Filling method in enhancing classification accuracy.
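A simplified sketch of such a feature-combination experiment with a Random Forest classifier is shown below; the feature files, train/test split, and scikit-learn settings are assumptions for illustration rather than the authors' exact workflow.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

# Illustrative feature stack: each row is one sample (pixel or crown) and each
# column one feature, e.g. RGB bands, HSV components, texture measures,
# vegetation indices, and LiDAR-derived height and intensity.
X = np.load("features.npy")   # shape (n_samples, n_features); hypothetical file
y = np.load("labels.npy")     # integer species codes; hypothetical file

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

rf = RandomForestClassifier(n_estimators=500, random_state=42)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)

print("OA   :", accuracy_score(y_test, pred))
print("Kappa:", cohen_kappa_score(y_test, pred))
```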
4. Discussion
Based on the RGB imagery and LiDAR data acquired by UAV in July, this study compared the impact of different feature combinations on urban tree species classification. The comparison revealed that the HSV color components and their texture features had a markedly positive effect on classification performance, increasing the overall accuracy and Kappa coefficient by 11.33% and 0.13, respectively. The optimal feature combination included almost all features, indicating that adding diverse features generally improved accuracy [22]. However, adding excessive texture features and vegetation indices introduced information redundancy and limited further gains. Meanwhile, using RGB data alone, the overall classification accuracy was relatively low, at only 55.25%, suggesting that RGB data by themselves are insufficient for urban tree species classification. Fusing RGB and LiDAR features increased the overall accuracy by 18.49%, which is consistent with the results of many studies [17,59]. For example, Li et al. [38] used an improved algorithm for individual tree species identification based on UAV RGB imagery and LiDAR data, and their results confirmed that the combination of RGB and LiDAR data is far better suited to tree species classification.
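To illustrate the kind of HSV color and second-order texture features discussed above, a minimal sketch follows; the input file, window-free whole-tile GLCM, gray-level quantization, and choice of statistics are assumptions and do not necessarily match the features used in this study.

```python
import numpy as np
from skimage import io, color
from skimage.feature import graycomatrix, graycoprops

rgb = io.imread("uav_rgb_tile.tif")[:, :, :3]   # hypothetical UAV RGB tile
hsv = color.rgb2hsv(rgb)                        # H, S, V components in [0, 1]

# Second-order (GLCM) texture on the V (value) component, quantized to 32 levels.
v32 = np.uint8(hsv[:, :, 2] * 31)
glcm = graycomatrix(v32, distances=[1], angles=[0, np.pi / 2],
                    levels=32, symmetric=True, normed=True)

texture = {prop: graycoprops(glcm, prop).mean()
           for prop in ("contrast", "homogeneity", "energy", "correlation")}
print(texture)
```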
The proposed Plurality Filling post-processing method significantly improved the classification accuracy of urban tree species: after post-processing, the overall accuracy of all experiments increased by approximately 10% to 20%, confirming its effectiveness. To demonstrate this more clearly, conventional sliding-window post-processing was also applied, which improved the overall accuracy by an average of 6% across the different experiments. The Plurality Filling method thus performed distinctly better. The sliding-window method moves a fixed-size window across the image to suppress the salt-and-pepper effect and typically yields favorable results; however, the high spatial heterogeneity of urban environments limits its effectiveness. In comparison, by assigning class labels to over-segmented objects, the Plurality Filling method can handle irregular objects in complex environments and is therefore better suited to tree species classification in urban settings. Consequently, the Plurality Filling post-processing method used in this study shows clear advantages over the sliding-window method, particularly in urban landscapes with high spatial heterogeneity.
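For comparison, the conventional sliding-window (majority) filter mentioned above could be sketched as follows; the 3 × 3 window size and the SciPy-based implementation are assumptions for illustration, not the exact configuration used in the experiments.

```python
import numpy as np
from scipy.ndimage import generic_filter

def window_majority(labels: np.ndarray, size: int = 3) -> np.ndarray:
    """Slide a fixed-size window over a classified map and replace each pixel
    with the most frequent class in its neighborhood. This reduces salt-and-
    pepper noise but cannot adapt to irregular object boundaries."""
    def _mode(values):
        vals, counts = np.unique(values.astype(int), return_counts=True)
        return vals[np.argmax(counts)]
    return generic_filter(labels, _mode, size=size, mode="nearest")
```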
The fused RGB-LiDAR data and the Plurality Filling post-processing method demonstrated excellent performance in improving the overall accuracy of urban tree species classification, especially in areas with high-canopy-density urban trees. To further validate their performance, comparative experiments were also conducted with data acquired in October. The results revealed that the trend of overall accuracy variation was broadly similar in July and October. Using RGB data alone yielded the lowest classification accuracy (OA = 47.70%, Kappa = 0.40), whereas incorporating LiDAR features increased the overall accuracy by 19.64% and the Kappa coefficient by 0.23. Consistent with the July results, this confirms the effectiveness of integrating UAV-based RGB and LiDAR data for urban tree species classification. The primary distinction between the two months was the considerably larger accuracy improvement obtained in October when additional vegetation indices were included. A possible reason is the larger color variation exhibited by trees in the fall, which makes vegetation indices more effective [60]. In contrast, trees are in their most vigorous state in July, which may reduce the effectiveness of vegetation indices, so that adding further vegetation indices in July mainly introduced redundant information. To verify this, violin plots of the three RGB bands and the seven vegetation indices were drawn for the different tree species in both months (Figure 12 and Figure 13). They show that the vegetation indices have a broader distribution range in October. Furthermore, the median difference between tree species was greater in October than in July, further emphasizing the larger variation in growth state among tree species during the fall.
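The seven RGB-based vegetation indices are not listed in this section, so the sketch below computes a few commonly used visible-band indices (ExG, GLI, VARI) purely as an assumed illustration of how such indices are derived from the three RGB bands.

```python
import numpy as np

def rgb_vegetation_indices(rgb: np.ndarray, eps: float = 1e-6) -> dict:
    """Compute common visible-band vegetation indices from an RGB image scaled
    to [0, 1]. These are examples only; the indices used in the study may differ."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    total = r + g + b + eps
    rn, gn, bn = r / total, g / total, b / total            # chromatic coordinates
    return {
        "ExG":  2 * gn - rn - bn,                           # Excess Green
        "GLI":  (2 * g - r - b) / (2 * g + r + b + eps),    # Green Leaf Index
        "VARI": (g - r) / (g + r - b + eps),                # Visible Atmospherically Resistant Index
    }
```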
This study has demonstrated the strong performance of fused UAV-based RGB and LiDAR data for urban tree species classification, yet some limitations remain. Various features were extracted from the RGB data, but the features extracted from LiDAR were limited: only one height feature and one intensity feature were used, which fails to fully exploit the three-dimensional advantages of LiDAR data. Instead, LiDAR was treated mainly as a supplementary data source to the RGB imagery. Guo et al. [25] extracted six diversity-related features from LiDAR data to describe forest biodiversity patterns, and Listopad et al. [61] used LiDAR elevation data to create new indices characterizing the complexity of forest stand structures. We had in fact attempted to extract height and intensity percentiles from the LiDAR point clouds; however, due to data quality issues, a large number of invalid values were encountered, and the Random Forest classifier cannot handle samples with missing feature values. In the future, LiDAR features should be explored in greater depth using higher-quality LiDAR data to fully exploit their three-dimensional advantages. Furthermore, this study employed only the Random Forest classifier; future work will explore the potential of deep learning methods for classifying urban tree species from RGB and LiDAR data.
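A minimal sketch of the per-crown height and intensity percentile features mentioned above is given below; the percentile levels, the per-crown grouping, and the handling of empty crowns are illustrative assumptions, and the missing-value filtering reflects the constraint that Random Forest cannot be trained on samples with missing feature values.

```python
import numpy as np

def crown_percentile_features(z: np.ndarray, intensity: np.ndarray,
                              percentiles=(25, 50, 75, 95)) -> np.ndarray:
    """Height and intensity percentiles for the LiDAR returns of one crown.
    Returns NaNs when the crown has no valid returns; such samples must be
    filtered out (or imputed) before Random Forest training."""
    if z.size == 0:
        return np.full(2 * len(percentiles), np.nan)
    return np.concatenate([np.percentile(z, percentiles),
                           np.percentile(intensity, percentiles)])

# Example: drop crowns whose feature vectors contain invalid values.
crowns = [(np.random.rand(50) * 20, np.random.rand(50) * 255),  # valid crown
          (np.array([]), np.array([]))]                          # empty crown
features = np.array([crown_percentile_features(z, i) for z, i in crowns])
valid = ~np.isnan(features).any(axis=1)
print(features[valid].shape)  # only crowns with complete features remain
```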