1. Introduction
The increase of greenhouse gas emissions seriously threatens the global ecosystem and has a devastating impact on biodiversity [1,2]. Forests are the largest terrestrial ecosystem, providing a wide range of economic and social benefits to humans through various forest products. Forests are also an important carbon sink, reducing the concentration of greenhouse gases in the atmosphere [3,4]. Despite these benefits, deforestation persists and the global forest area continues to decline [5,6,7]. Afforestation has therefore been adopted as an effective strategy for mitigating global warming, especially in mountainous and hilly areas of little economic value, and assessing afforestation in these regions is important. Manually checking surviving trees is laborious and costly. With the wide application of UAV remote sensing in agriculture and forestry, various airborne sensors can be used to obtain data at ultrahigh spatial resolution [8,9,10,11,12], improving the efficiency of vegetation monitoring and environmental parameter acquisition and reducing the costs of agricultural and forestry production [13,14]. Image analysis technology can thus be used to assess afforestation efficiently, and recognition of individual tree crowns (ITCs) has become the key method of afforestation assessment in complex environments [15,16,17,18].
Most UAV-based studies have focused on plant detection and counting. The common approach is to use different algorithms to generate a data structure suited to the characteristics of the airborne sensor data, and then analyze it to determine the number and locations of plants. Because the spectral, spatial, and contextual features of aerial images taken at different locations differ, the methods used to extract image features for identifying ITCs differ as well. Color is the most important feature for ITC recognition, and different color indexes are usually used to enhance the recognition effect. The most commonly used color indexes are defined in the red, green, and blue (RGB) color space and vary in their efficiency at identifying plants under different soil, residue, and light conditions. In a study by Loris, measurements from ten different grassland management methods were compared, and the results showed that the G-R index of aerial imagery was highly correlated with the ground truth [19,20]. In an evaluation of high-resolution UAV digital photos for vegetation remote sensing, the Normalized Green–Red Difference Index (NGRDI) was used to reduce the variation caused by differences in irradiance and exposure, and the prediction of vegetation reflectance was explored with a scattering model of arbitrarily inclined leaves [21]. A study of optical plant detection discussed the color distributions of plants and soil in digital images in detail, calculated various color indexes for separating vegetation from background, and successfully distinguished weeds from non-plant surfaces using several color coordinate indexes [22]. The correlation coefficient between spectral reflectance and sugar beet diseases was highest in the visible band [23]. Long-term practice has shown that using UAVs to rapidly evaluate forest regeneration can reduce survey costs. Through experiments with a sampling-based consumer camera method, a decision rule for seedling detection was established from simple RGB and red edge (RE) image processing, showing that drones with RGB cameras can detect conifer seedlings [24]. Methods such as convolutional neural networks (CNNs), hyperspectral UAV images with a watershed algorithm [25], and object clustering extraction have mainly been applied in structured environments and are difficult to apply in complex ones. Under high weed coverage, the multi-scale retinex (MSR) method can enhance the distinction between foreground and background, and a fruit tree SVM segmentation model achieved an average accuracy of 85.27% ± 9.43% [16]. In addition, the Normalized Excessive Green Index has been used to indicate green vegetation in automatic UAV-based urban monitoring [26], and a variety of indices have been used to segment vegetation, estimate biomass, and guide precision agricultural production [27,28,29,30]. All these studies show that the spectral characteristics of UAV images can be used for an initial classification of vegetation.
At the same time, the International Commission on Illumination (CIE) L*a*b* color space is also often used to distinguish vegetation from soil [31]. In L*a*b*, the extraction accuracy of vegetation coverage has been further improved by applying a Gaussian mixture model [32,33,34]. K-means clustering in the L*a*b* color space has been used to detect individual trees in oblique UAV imagery of cities [35]. The XGBoost classification algorithm has been used to solve the identification problem of interlacing orchard canopies [15]. Dynamic segmentation, the Hough transform, and generalized linear models have been widely used in missing plant detection [36,37]. The compute unified device architecture (CUDA) speeds up the search for the largest canopy blob area [38], and light detection and ranging (LiDAR) point cloud analysis also plays an important role [39]. Mathematical morphology, combined with elevation information, has enabled automatic counting and geographic positioning of olive crowns [40]. Multispectral images can accurately measure the structure of litchi trees [41]. In addition, airborne multispectral laser scanning improves the recognition accuracy of individual and dominant trees [42,43,44]. Deep learning further improves segmentation accuracy [45,46] and is widely used in tree crown contour recognition; using the Mask R-CNN convolutional neural network for crown contour identification and rendering, an accuracy of 0.91 was reached [47]. The detection of economically important tree species in high-resolution remote sensing images has always been of interest: although the palm crowns in the study area often overlapped, with the support of large-scale multi-class samples, the detection accuracy of oil palm trees based on a deep convolutional neural network (DCNN) reached 92–97% [48]. Tree crowns in simple scenes can also be automatically extracted from UAV images using neural networks to estimate biomass [49] and classify tree species [50].
This paper proposes a simple and effective method to detect and count tree crowns: multi-radius extraction and fusion based on a chromatic mapping method (MREFCM). Using the statistical results of multiple color index transformations as a reference, the crown morphology extraction method was tested to obtain optimized extraction parameters, and a multi-scale extraction and fusion method was defined, comprising multi-radius closing filters and the fusion of multiple extraction results. The aims of the study were: (1) to determine whether the random forest (RF) classifier performs well in crown recognition; (2) to test whether including two texture features improves the classification accuracy of tree crowns; (3) to compare the proposed method with other crown extraction methods in a complex environment to verify its performance.
This study was also intended to extract tree crowns accurately, further improve ITC recognition, and increase the efficiency of afforestation monitoring. To better adapt to the varied crown forms of broad-leaved seedlings in new afforestation, this study combined the fusion of extractions from filters of multiple radii with an RF classifier to recognize ITCs. The proposed method can serve as a supplement to research on ITC recognition using color index techniques.
2. Materials and Methods
2.1. Study Area
The study area is located in the west of Xichuan County, Nanyang City, Henan Province, China, 110°58′ E–111°53′ E, 32°55′ N–33°23′ N (Figure 1). Mountains are widely distributed in the county, extending from northwest to southeast with gradually decreasing altitude. The area belongs to the monsoon climate zone transitional from the northern subtropical to the warm temperate zone and includes Dashiqiao, Maotang, Xihua, and Taohe Townships, with a total area of 2820 km². The annual average temperature is 15.8 °C, and the annual rainfall is between 391.3 mm and 1423.7 mm. Two adjacent woodlands on both sides of a ridge in Dashiqiao Township were selected as the test sites for tree crown extraction and recognition. The sites are pure stands of Koelreuteria paniculata Laxm., with crown diameters between 20 and 70 cm. Since this species is commonly used for afforestation in China, the chosen study area helps in understanding the effect of mountain afforestation. It should be noted that weed coverage between the trees varies greatly and the interspersed bushes are very lush, some larger than the adjacent crowns. The average altitude of the study area is 390 m, and the two sites cover 5900 m² and 5840 m², respectively. An overall image of the study sites is shown in Figure 1.
2.2. Data Sources
The images were collected in June 2018 using a DJI Phantom 4 UAV (Shenzhen Dajiang Innovation Technology Co., Ltd., Shenzhen, China). The image acquisition component under the quadrotor supports a GPS/GLONASS dual-mode positioning system, and 3-axis gimbal stabilization provides a steady platform for the attached CMOS digital camera (Sony Exmor R CMOS, 1/2.3-inch).
The UAV flew at heights of 20 m and 35 m, with ground sampling distances of 0.7 cm and 1.3 cm, respectively, which made it possible to verify the applicability of the proposed algorithm to images of different resolutions. The digital images are stored in the red–green–blue (RGB) color space in JPEG format. Storing natural scenes in this color space is conducive to transforming each RGB channel to obtain more specific image features [51]. The specifications of the drone and the airborne camera are shown in Table 1.
A total of 180 images were randomly selected for training and 168 for testing; the test dataset comprised 2418 trees. Since the crown size of the broad-leaved seedlings varies greatly (20–70 cm), the crown training areas in the training set were manually marked as rectangles from 40 × 40 to 100 × 100 pixels, with a size interval of 20 pixels, to improve the recognition effect. Because soil has no clear individual boundary or size, only 60 × 60 pixel training samples were collected for soil. A total of 1400 regions were randomly selected to form the training set, including 600 positive samples of trees, 500 negative samples of weeds, 200 negative samples of shrubs, and 100 negative samples of soil (some samples include shadows). The color and texture features of these samples were used to build the feature data pool from which the RF model was constructed.
Figure 2 shows some image samples in the training set under different conditions.
The images were classified according to fractional vegetation coverage (FVC), flight height, and brightness. FVC is the percentage of pixels in the image whose hue in the HSI color space lies in [π/2, π]. It reflects the growth of vegetation in the image and is used here to measure weed coverage under the forest, indicating the degree of interference with tree crown extraction. FVC was classified as follows: LFVC (low FVC), h < 45%; MFVC (middle FVC), 45% ≤ h < 60%; HFVC (high FVC), h ≥ 60%, where h is the ratio of vegetation to image area. Two flight altitudes were selected: 35 m and 20 m. Two brightness intensity (BI) levels were also distinguished: insufficient brightness (IB), I < 0.3 in the HSI color space, and sufficient brightness (SB), I ≥ 0.3. The images of the test set were computed and grouped according to the above criteria, as shown in Table 2; FVC and BI in the table are the average values of the corresponding groups. Details can be found in the literature [16].
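As an illustration, the grouping criteria above can be computed directly from the pixels of each image. The following is a minimal sketch in Python, assuming scikit-image's HSV transform as a stand-in for HSI and a simple channel-mean intensity; the function and variable names are illustrative, not from the paper.

```python
import numpy as np
from skimage import io, color

def classify_image(path):
    """Estimate FVC and BI for one UAV image (sketch)."""
    rgb = io.imread(path)
    hsv = color.rgb2hsv(rgb)               # H, S, V scaled to [0, 1]
    hue = hsv[..., 0] * 2 * np.pi          # hue in radians
    intensity = rgb.mean(axis=-1) / 255.0  # simple proxy for the HSI I component

    fvc = np.mean((hue >= np.pi / 2) & (hue <= np.pi))  # vegetation fraction
    bi = intensity.mean()

    fvc_class = "LFVC" if fvc < 0.45 else ("MFVC" if fvc < 0.60 else "HFVC")
    bi_class = "SB" if bi >= 0.3 else "IB"
    return fvc, fvc_class, bi, bi_class
```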
2.3. Framework of Research
At present, many studies have introduced ITC recognition; for a detailed introduction to color indexes and morphological filtering, please refer to the literature [16,17,22,24]. Figure 3 summarizes the methodological flowchart and shows the three main stages of the proposed method: in the vegetation index selection stage, the most suitable color mapping method was selected by comparing the statistical results of the vegetation indexes. Then, the multi-radius extraction and fusion method was used to obtain the potential tree crown regions, namely the regions of interest (RoIs). Finally, RF was used to identify the extracted regions, which were then selectively fused according to their distance.
2.4. Spectral Index Selection
As the first step of image processing, the images collected by the UAV were preprocessed with an appropriate spectral index to obtain better extraction results, a practice confirmed in previous studies [27,52]. The effect of a spectral index depends on the spectral characteristics of the dataset; four sample images of different scenes, shown in Figure A1a–d (Appendix A), were selected from the training set. These images were selected by intuitive visual judgment of vegetation coverage, so the time cost was negligible. Figure A1a–d shows the original images of the different scenes, and Figure A1(a1–a7,b1–b7,c1–c7,d1–d7) shows the corresponding transformation results. It can be seen intuitively that the transformation effects of the seven spectral indexes differ greatly. We manually delineated the foreground (tree crown) and background of the four sample images and used the spectral indexes listed in Table 3 for color mapping. Using statistical methods, it was determined that the most suitable spectral index for this study was GRDI.
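For reference, a color mapping of this kind is a per-pixel transform of the RGB channels. Below is a minimal sketch assuming GRDI takes the normalized green–red difference form (G − R)/(G + R); the exact definition used in the study is the one listed in Table 3.

```python
import numpy as np

def grdi(rgb):
    """Green-red difference mapping of an RGB image (sketch)."""
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    return (g - r) / (g + r + 1e-9)  # small epsilon avoids division by zero
```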
2.5. Potential Tree Crown Extraction
The image was first transformed with GRDI to enhance the separation of foreground and background, so that crowns among the weeds were enhanced and a preliminary range of tree crowns could be obtained. As shown in Figure A1(a2–d2), some foreground and background features remained unseparated after the GRDI transformation, so morphological methods were needed for further processing. Considering that the images of the afforestation area differed in shooting height, background weed density, and illumination intensity, closing filter radii of 3, 5, 7, and 9 were selected after experiments on the sample images.
Firstly, the Otsu method was used to segment the gray image after the GRDI transform, and then four closing filters with different radii were applied to eliminate small gaps in the tree crown area, which may be due to the differing reflectance of leaves within a crown. Then, hole filling was used to eliminate the interference caused by large holes in the image, which may be caused by crown branches expanding in different directions. Next, an opening filter with a radius of 10 was used to remove the interference caused by isolated spots and small branches. In addition, some small fragments still remained in the image, usually because some weeds have spectral characteristics similar to those of the tree crown, so a region exclusion method with a threshold of 200 pixels was used to eliminate them; the minimum crown area exceeded 400 pixels. Applying this processing chain yielded four extraction results for the same image, which collectively represent the multi-scale structure of the target forest. To reduce the loss of crown edge information caused by morphological processing, a convex hull algorithm was used to fill the edges of the extracted regions and improve the accuracy of crown recognition. Finally, the four extraction results were fused to obtain more accurate crown positions.
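The steps above map directly onto standard morphological operations. The following is a minimal sketch with scikit-image, assuming disk-shaped structuring elements; the parameter values follow the text, while the function names are illustrative.

```python
import numpy as np
from scipy import ndimage
from skimage import filters, measure, morphology

def extract_candidates(grdi_map, close_radius, open_radius=10, min_area=200):
    """One extraction pass: Otsu -> closing -> hole filling -> opening
    -> small-region exclusion -> convex hull (sketch)."""
    binary = grdi_map > filters.threshold_otsu(grdi_map)
    binary = morphology.binary_closing(binary, morphology.disk(close_radius))
    binary = ndimage.binary_fill_holes(binary)
    binary = morphology.binary_opening(binary, morphology.disk(open_radius))
    binary = morphology.remove_small_objects(binary, min_size=min_area)

    # Convex hull of each region to restore crown edges eroded by filtering
    hull = np.zeros_like(binary)
    for region in measure.regionprops(measure.label(binary)):
        minr, minc, maxr, maxc = region.bbox
        hull[minr:maxr, minc:maxc] |= region.convex_image
    return hull

# Four passes with the closing radii from the text, fused afterwards:
# maps = [extract_candidates(g, r) for r in (9, 7, 5, 3)]
```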
Given a large-radius extraction map and a small-radius extraction map, containing the results of coarse and fine segmentation respectively, one coarse segment may cover one or more fine segments, but the reverse does not hold. In this study, each coarse segment and its corresponding fine segments were fused according to the following steps: (1) all fine segments with more than half of their area covered by the corresponding coarse segment were found; (2) if the number of corresponding fine segments was larger than the number of coarse segments, the coarse segments were replaced with all the selected fine segments; otherwise, the coarse segments were kept; (3) all coarse segments that did not contain fine segments were kept. The four extraction maps were fused in order of closing filter radius, from large to small. The entire extraction process is shown in Table 4.
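A minimal sketch of this fusion rule follows, applying steps (1)–(3) per coarse segment, so that a coarse segment is replaced when it covers more than one fine segment; this per-segment reading of step (2) is an assumption.

```python
import numpy as np
from skimage import measure

def fuse_pair(coarse, fine):
    """Fuse a large-radius (coarse) map with a small-radius (fine) map (sketch)."""
    fused = np.zeros_like(coarse, dtype=bool)
    fine_labels = measure.label(fine)
    fine_regions = measure.regionprops(fine_labels)
    for region in measure.regionprops(measure.label(coarse)):
        mask = np.zeros_like(coarse, dtype=bool)
        mask[tuple(region.coords.T)] = True
        # (1) fine segments with more than half their area inside this coarse segment
        ids = [f.label for f in fine_regions
               if mask[tuple(f.coords.T)].sum() > 0.5 * f.area]
        if len(ids) > 1:
            fused |= np.isin(fine_labels, ids)   # (2) replace with fine segments
        else:
            fused |= mask                        # (2)/(3) keep the coarse segment
    return fused

# Fusion proceeds from the largest closing radius to the smallest:
# result = fuse_pair(fuse_pair(fuse_pair(map_r9, map_r7), map_r5), map_r3)
```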
2.6. Features for RoI Recognition
To improve the accuracy of extracted-region classification and tree counting, the second part of this study used effective color and texture features to discriminate trees, bare soil, weeds, and bushes in the extracted regions.
The first category is color features, obtained by calculating the mean and variance of the color components in the RGB and L*a*b* color spaces. Since the L* component of the L*a*b* color space reflects the illumination intensity of the image, it was excluded as a feature source to eliminate interference from illumination. The color feature of a sample is therefore a 10-dimensional vector composed of the means and variances of the R, G, B, a*, and b* components.
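A minimal sketch of this 10-dimensional color feature, using scikit-image's rgb2lab conversion (the function name is illustrative):

```python
import numpy as np
from skimage import color

def color_features(rgb_patch):
    """Means and variances of R, G, B, a*, b*; L* is dropped (sketch)."""
    lab = color.rgb2lab(rgb_patch)
    channels = [rgb_patch[..., i] for i in range(3)] + [lab[..., 1], lab[..., 2]]
    return np.array([s for ch in channels for s in (ch.mean(), ch.var())])
```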
The second category is the gray-level co-occurrence matrix (GLCM), a matrix describing the gray-level relationship between pixels in an image and their adjacent pixels, or pixels within a certain distance, which can be used to identify tree species at high resolution [53]. It reflects the gray-level distribution of a local area or of the entire image. The comprehensive information on direction, adjacent interval, and amplitude of change forms the basis for analyzing the gray-level variation and texture characteristics of an image, and texture features are usually described quantitatively by different GLCM statistics [54]. Thus, the GLCMs of the samples in the feature pool were calculated, together with five texture statistics, where $i$ and $j$ denote the row and column subscripts of the GLCM elements and $P(i,j)$ is the corresponding normalized element.
Energy: the sum of squares of the GLCM elements, reflecting the uniformity of the gray-level distribution and the texture coarseness of the image. It can be calculated with Equation (1):

$\mathrm{Energy} = \sum_{i}\sum_{j} P(i,j)^2$ (1)
Contrast: Equation (2) reflects the clarity of the image and the depth of the texture grooves; the deeper the grooves, the greater the contrast and the clearer the visual effect:

$\mathrm{Contrast} = \sum_{i}\sum_{j} (i-j)^2\, P(i,j)$ (2)
Correlation: it reflects the degree of similarity of the GLCM elements in the row or column direction, as shown in Equation (3):

$\mathrm{Correlation} = \sum_{i}\sum_{j} \dfrac{(i-\mu_i)(j-\mu_j)\, P(i,j)}{\sigma_i \sigma_j}$ (3)

where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the row marginal distribution of $P(i,j)$, and $\mu_j$ and $\sigma_j$ are the mean and standard deviation of the column marginal distribution, respectively.
Entropy: it represents the degree of non-uniformity or complexity of the texture in the image, formulated as:

$\mathrm{Entropy} = -\sum_{i}\sum_{j} P(i,j)\, \log P(i,j)$ (4)

where the image gray scale is quantized to $k$ levels, usually 16.
Homogeneity: it reflects the homogeneity of image texture and measures the local change of image texture, which can be mathematically defined as follows:

$\mathrm{Homogeneity} = \sum_{i}\sum_{j} \dfrac{P(i,j)}{1+(i-j)^2}$ (5)
Therefore, the texture features of an extracted region are composed of the GLCM statistics in the 0°, 45°, 90°, and 135° directions, constituting a vector of 20 dimensions.
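A minimal sketch of this 20-dimensional texture vector with scikit-image, assuming 16 gray levels and a pixel distance of 1; entropy is computed by hand since graycoprops does not provide it:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_patch, levels=16):
    """Five GLCM statistics in four directions -> 20 values (sketch)."""
    q = (gray_patch.astype(np.float64) / max(gray_patch.max(), 1)
         * (levels - 1)).astype(np.uint8)
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]  # 0, 45, 90, 135 degrees
    glcm = graycomatrix(q, distances=[1], angles=angles,
                        levels=levels, symmetric=True, normed=True)
    feats = []
    for a in range(len(angles)):
        p = glcm[:, :, 0, a]
        feats += [graycoprops(glcm, "ASM")[0, a],          # energy, Eq. (1)
                  graycoprops(glcm, "contrast")[0, a],     # Eq. (2)
                  graycoprops(glcm, "correlation")[0, a],  # Eq. (3)
                  -np.sum(p[p > 0] * np.log(p[p > 0])),    # entropy, Eq. (4)
                  graycoprops(glcm, "homogeneity")[0, a]]  # Eq. (5)
    return np.array(feats)
```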
The third category is the local binary pattern (LBP), which can be calculated using Equation (6):

$\mathrm{LBP}(x_c, y_c) = \sum_{p=0}^{P-1} 2^p\, s(i_p - i_c)$ (6)

where $(x_c, y_c)$ is the center pixel, $i_c$ is the gray value of the center pixel, $i_p$ is the gray value of the $p$-th of its $P$ adjacent pixels, and $s$ is a sign function with the expression

$s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}$ (7)
LBP is an operator used to describe the local texture features of an image. In the uniform pattern, a 59-dimensional feature vector can be calculated according to reference [55], which has the advantages of rotation invariance and gray-scale invariance.
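A minimal sketch of the 59-bin uniform LBP histogram, assuming 8 neighbors at radius 1 (58 uniform patterns plus one bin for all non-uniform patterns):

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_features(gray_patch):
    """59-bin histogram of uniform LBP codes, P = 8, R = 1 (sketch)."""
    codes = local_binary_pattern(gray_patch, P=8, R=1, method="nri_uniform")
    hist, _ = np.histogram(codes, bins=59, range=(0, 59), density=True)
    return hist
```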
2.7. RF Model and Accuracy Evaluation
In order to distinguish whether the extracted regions are tree crowns, an RF recognition model based on the above 89-dimensional features was established. RF is a machine learning method widely used in classification, regression, and other tasks. In this study, five-fold cross-validation was used to evaluate the reliability of the model: the 1400 training samples were divided into five subsets, one subset was retained as the test set, and the other four were used for training; this process was repeated until every subset had served as the test set once.
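A minimal sketch of the model and its evaluation with scikit-learn; the number of trees in the forest and other hyperparameters are illustrative assumptions, not values from the paper:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def train_rf(X, y):
    """X: (1400, 89) features = color (10) + GLCM (20) + LBP (59); y: labels."""
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(rf, X, y, cv=5)  # five-fold cross-validation
    rf.fit(X, y)
    return rf, scores.mean()
```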
To evaluate the recognition effect, the number of trees in the study area was first determined manually and used as the ground truth. The performance of the RF model against this ground truth determines the accuracy of fusing regions of the same tree crown in the next step. The following indicators were used for quantitative evaluation.
Precision: the probability that a predicted positive sample is actually positive, calculated with Equation (8):

$Precision = \dfrac{TP}{TP + FP}$ (8)

where true positive ($TP$) is the number of correctly identified tree crowns and, conversely, false positive ($FP$) is the number of regions mistakenly identified as tree crowns.
Recall: the probability that an actual positive sample is predicted as positive, as shown in Equation (9):

$Recall = \dfrac{TP}{TP + FN}$ (9)

where false negative ($FN$) is the number of tree crowns not recognized by the model.
F1 score: the weighted harmonic mean of precision and recall, as shown in Equation (10):

$F1 = \dfrac{2 \times Precision \times Recall}{Precision + Recall}$ (10)
2.8. Repeat Count Region Fusion
The multi-scale region extraction method described above enhances the ability to distinguish potential crown regions, but it should be noted that a deciduous tree crown does not present a regular umbrella shape; in particular, the branches of a larger crown are often divided by shadows into separate areas, so some branches resemble individual trees. Moreover, differences in the reflectivity of branches may cause the same crown to be extracted multiple times. These factors all reduce the counting accuracy.
As shown in Figure 1, the crown diameters at the test sites were obviously smaller than the distances between trees, so the average distance could be used to judge whether different extracted regions belonged to the same crown and thus eliminate multiple counting.
Therefore, after the tree crowns were identified by the RF method, the distance between each identified crown and its nearest neighbor was calculated to obtain the average distance between crowns, and the threshold $L$ was set to half of this average, as calculated by Equation (11). Regions whose centroid spacing was less than $L$ were regarded as the same crown and fused:

$L = \dfrac{1}{2m} \sum_{n=1}^{m} d(n, s)$ (11)

where $n$ is the current tree crown number, $s$ is the nearest-neighbor tree crown of $n$, $d(n, s)$ is the centroid distance between them, and $m$ is the total number of identified tree crowns.
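A minimal sketch of this fusion step, assuming a greedy single-pass merge of centroids (the merging strategy is an assumption; Equation (11) defines only the threshold):

```python
import numpy as np
from scipy.spatial.distance import cdist

def fuse_repeat_counts(centroids):
    """Merge detections closer than the Eq. (11) threshold L (sketch)."""
    pts = np.asarray(centroids, dtype=float)
    d = cdist(pts, pts)
    np.fill_diagonal(d, np.inf)
    L = d.min(axis=1).mean() / 2.0  # half the mean nearest-neighbor distance

    merged, used = [], np.zeros(len(pts), dtype=bool)
    for n in range(len(pts)):
        if used[n]:
            continue
        group = [g for g in np.where(d[n] < L)[0] if not used[g]] + [n]
        used[group] = True
        merged.append(pts[group].mean(axis=0))  # fuse to a single centroid
    return np.array(merged), L
```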