Towards Amazon Forest Restoration: Automatic Detection of Species from UAV Imagery

Precise assessments of forest species' composition help analyze biodiversity patterns, estimate wood stocks, and improve carbon stock estimates. Therefore, the objective of this work was to evaluate the use of high-resolution images obtained from an Unmanned Aerial Vehicle (UAV) for the identification of forest species in areas of forest regeneration in the Amazon. For this purpose, convolutional neural networks (CNN) were trained using the Keras–Tensorflow package with the faster_rcnn_inception_v2_pets model. Samples of six forest species were used to train the CNN. Tests were then made with different values of the threshold, which is the cutoff value of the function: any output below this value is treated as 0, and any output above it is treated as 1; that is, values above the stipulated threshold are considered identified species. The results showed that the reduction in the threshold decreases the accuracy of identification, as well as the overlap of the polygons of species identification. However, in comparison with the data collected in the field, a high correlation was observed between the trees identified by the CNN and those observed in the plots. The statistical metrics used to validate the classification results showed that CNN are able to identify species with accuracy above 90%. Based on our results, which demonstrate good accuracy and precision in the identification of species, we conclude that convolutional neural networks are an effective tool in classifying objects from UAV images.


Introduction
Optical sensors coupled with Unmanned Aerial Vehicles (UAVs; also referred to as drones) are commonly used to acquire geometric characteristics of forests, such as canopy height and diameter. In addition to quantitative information, these sensors can generate qualitative attributes about forest environments, enhancing the possibility of characterizing forest species. However, for such a characterization, solely depending on traditional image processing tools might not be an effective strategy at all times [1,2]. Therefore, additional statistical tests, programming tools, and implementation workflows are deemed necessary for processing these images. In this context, Convolutional Neural Networks (CNN) have been widely used in detecting diverse objects [3][4][5][6][7], especially while characterizing forest environments [8][9][10][11][12][13][14].
CNN are artificial intelligence algorithms based on multilayer (feed-forward) neural networks, typically with up to 20 or 30 layers, and are distinguished from other neural networks due to their superior performance with object classification, detection, and

Image Acquisition and Pre-Processing
The images were acquired by a DJI Phantom 4 Pro UAV, using an RGB camera; the main characteristics are listed in Table 1. Images were obtained in November 2017, 2018, and 2019, always from 8 am to 3 pm, taking advantage of solar lighting and reduced shading in images. We processed the images in the Pix4Dmapper software to generate the orthomosaics of the areas. No tonal or contrast corrections were applied to the raw images or final mosaics.
The species were chosen according to the number of individuals available for sample acquisition, and samples were selected only from plots where forest inventories were conducted in the field, in order to select the species from the UAV images correctly.
The major tree species considered are as follows: Cecropia juranyiana C.Mart (Urticaceae) is a pioneer species widely found in anthropized forest areas and in secondary forests. It has an average height of 15 m, and foliolate leaves with up to 14 leaflets [39]. Hymenaea courbaril L. (Fabaceae) is a semi-deciduous tree, up to 35 m in height and 80 cm in diameter, with compound leaves of 2 leaflets and a broad crown [40,41]. Bauhinia acreana Harms (Fabaceae) is a species with a broad crown, bilobed leaves, and dehiscent pod fruit [42]. Anacardium occidentale L. (Anacardiaceae) is a tree up to 10 m tall and 40 cm in diameter, with a wide crown and simple petiolate leaves, which can change their coloration according to the time of year, with pink younger leaves and green mature leaves [43,44]. Handroanthus serratifolius (Vahl) S.O.Grose (Bignoniaceae) is a tree up to 30 m tall and 80 cm in diameter, with palmate leaves with 5-7 lobes and a serrated margin. In the flowering months, the species bears only flowers [44,45]. Anadenanthera sp. Speg (Fabaceae) reaches about 25 m in height, with compound leaves [44,46].
Remote Sens. 2021, 13, 2627

Convolutional Neural Networks Training
A total of 683 trees (samples) were loaded into the LabelImg software [47], which made it possible to select regions of interest using rectangular and square polygons (subsamples). We obtained 2437 subsamples (Table 2). Subsamples had to be selected within the same tree; hence, in a single tree, the largest possible number of subsamples was selected, depending on canopy size (Figure 2). The samples can vary in size, but the CNN applies the same adjustment to all input images, even if they are of different sizes. The neural network makes the identification using all the parameters of the selected samples (texture, contrast, color intensity, and brightness), while differences in the patterns, numbers, and arrangements of brightness and darkness are minimized in the internal layers of the neural network; that is, the samples are standardized internally to mitigate these factors. Regardless of the sample size, the transformations that the CNN makes internally are presented in Figure 3. From the characteristics presented (shape, texture, brightness, contrast, etc.), it is possible to distinguish between the species under study in the network.
At the end of the sample selection process, the software provides a file in .csv format containing filename, height, width, class, xmin, xmax, ymin, and ymax [47]. From the LabelImg .csv file, two sets of samples were created randomly: Training (70%; 1706 samples) and Test (30%; 731 samples); the Training and Test samples were fed into the CNN as input. For validation, 487 images were separated from the training samples, to guarantee that the model did not overfit (Table 2).
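The random 70/30 split of the annotation rows can be illustrated with a minimal Python sketch. This is not the authors' script: the function name `split_samples`, the fixed seed, and the placeholder rows are assumptions made only for illustration; the sample counts (2437, 1706, 731) are those reported in the text.

```python
import random

def split_samples(rows, train_fraction=0.7, seed=42):
    """Randomly split annotation rows into training and test sets."""
    shuffled = list(rows)                    # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # seeded shuffle for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder rows standing in for the 2437 annotated subsamples (hypothetical names).
rows = [{"filename": f"sample_{i}.jpg"} for i in range(2437)]
train_rows, test_rows = split_samples(rows)
print(len(train_rows), len(test_rows))  # 1706 731
```

Note that splitting by subsample, as sketched here, can place subsamples of the same tree in both sets; splitting by tree would avoid that, at the cost of less exact proportions.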
We used the Python programming language to implement the convolutional neural networks with the Keras-TensorFlow package. This open-source system from Google implements deep learning algorithms for neural networks. The faster_rcnn_inception_v2_pets model from this package was used and modified to train on samples of the target species.
The neural network learns patterns from the input data by reading the input data set and applying different computations to it. However, the neural network does not do this just once; it learns repeatedly, using the input data set and the results of previous passes. Each full pass over the entire training data set is called an epoch [48,49]. Initially, the CNN were trained with a large number of epochs or steps (iterations) to ensure that the smallest loss would fall within that range. After this first training, we determined the number of steps that gave the lowest loss, in order to optimize the subsequent analyses and repetitions; this test served fundamentally to establish how many epochs would be necessary for the final model fit. This step was essential because an excessive number of epochs leads the model to overfit; in that case, it presents very good statistical metrics but identifies species erroneously.
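As a rough illustration of choosing the training length from a first long run, the sketch below picks the step with the lowest validation loss. The function name `best_step` and the loss values are hypothetical; the actual loss curves are those shown in the paper's figures.

```python
def best_step(validation_losses):
    """Return (step, loss) for the lowest validation loss; steps count from 1."""
    if not validation_losses:
        raise ValueError("need at least one loss value")
    best_index = min(range(len(validation_losses)),
                     key=validation_losses.__getitem__)
    return best_index + 1, validation_losses[best_index]

# Hypothetical loss curve: falls, bottoms out, then rises as overfitting sets in.
losses = [1.20, 0.80, 0.55, 0.41, 0.38, 0.40, 0.47]
step, loss = best_step(losses)
print(step, loss)  # 5 0.38
```

Training past the returned step would, under this assumption, only raise the validation loss, which is the overfitting behavior described above.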
Therefore, tests were performed by reducing or increasing the threshold, a function that limits the results of the CNN output. Values equal to 1 suggest maximum identification in the model outputs, which can lead to the identification of species that are not recognized in the field, and values of 0 suggest no identification of any species. Manual adjustment of the threshold is therefore essential for fine-tuning the model output, as it helps prevent false-positive results. After initial training, we observed that modifying the threshold resulted in non-recognition or false recognition of some species. This led us to train models with varying numbers of species and thresholds. All results were compared with field data to obtain the model that best reflected real conditions in the studied area. Herein, thresholds modify the output according to input limits: the network receives inputs, applies a linear combination, and produces an output of 1 if that combination is greater than the specified limit value or 0 if it is less. Therefore, results with low probabilities are rejected. Equation 1 represents the threshold function, where sigma (∑) is the sum over input (x) and weight (w) pairs.
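The threshold function described above can be sketched in Python under two assumptions: that it is the usual step function applied to the weighted sum of inputs, and that a sum exactly equal to the limit counts as 1 (the text does not specify the equality case). The second function shows how such a cutoff is typically applied to detection scores; the function names and the scores are hypothetical and are not results from the study.

```python
def threshold_output(inputs, weights, limit):
    """Step function: 1 if the weighted sum of inputs reaches the limit, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= limit else 0   # '>=' is an assumption

def filter_detections(detections, score_threshold):
    """Keep only detections whose confidence score reaches the threshold."""
    return [d for d in detections if d["score"] >= score_threshold]

# Hypothetical detections: species names are from the study, scores are made up.
detections = [
    {"species": "C. juranyiana", "score": 0.93},
    {"species": "H. courbaril", "score": 0.74},
    {"species": "B. acreana", "score": 0.61},
]
for t in (0.9, 0.7, 0.6):
    kept = [d["species"] for d in filter_detections(detections, t)]
    print(t, kept)
```

In this toy example, lowering the cutoff from 0.9 to 0.6 lets more, but less certain, detections through.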

Classification Validation
Validation of the classifications was performed on the test dataset. For this, statistical metrics were used to evaluate the performance of the adjusted CNN. The metrics are specified in Table 3 and are calculated from the results of the classifications. The Kappa index is a measure of agreement used in nominal scales that indicates how far the observations deviate from those expected at random, thus indicating how legitimate the interpretations are. It measures the percentage of data values on the main diagonal of the table and then adjusts these values for the amount of agreement that might be expected by chance [50]. Accuracy is an index that reflects the rate at which individuals are correctly classified into the category containing their true score. Classification accuracy reflects the appropriateness and validity of decisions based on the obtained score. A large value for the index indicates a high hit rate of individuals in the correct categories, and a low value indicates a lower rate of correct classification of individuals [51].

Table 3. Statistical metrics (columns: Statistical Index, Equation, Description, Reference).
Adjusted F-score (AGF): uses all confusion matrix elements and gives more weight to samples that are correctly classified in the lowest class.
Similarity: ranges from 0 to 1, where 0 = no correct ratings and 1 = perfect classification [54].
Where ACCoverall: observed relative agreement between classifiers; RACCoverall: hypothetical probability of chance agreement, using the observed data to calculate the probabilities of each classifier identifying each category randomly; TP: true positive; TN: true negative; FP: false positive; FN: false negative; PPV: positive predictive value; TPR: true positive rate; NPV: negative predictive value; TNR: true negative rate.
The adjusted F-score is an index that uses all elements of the original confusion matrix and gives more weight to patterns that are correctly classified in the lowest class [52]. The Matthews Correlation Coefficient ranges from −1 to 1: a value of −1 indicates a completely wrong binary classifier, while 1 indicates a completely correct binary classifier. This index allows the performance of the classification model to be evaluated [53]. Similarity is a metric ranging from 0 (wrong classification) to 1 (perfect classification), calculated from the averages of the classifications of the classes of interest [54].
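To make the definitions of ACCoverall, RACCoverall, TP, TN, FP, and FN concrete, the following sketch computes overall accuracy, the Kappa index, and a per-class Matthews correlation from a confusion matrix using the standard textbook formulas. It assumes rows are actual classes and columns are predicted classes; the function names and the 2 × 2 toy matrix are illustrative and are not the values of Table 4.

```python
import math

def overall_accuracy(matrix):
    """ACCoverall: share of samples on the main diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

def cohen_kappa(matrix):
    """Kappa = (ACCoverall - RACCoverall) / (1 - RACCoverall)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    racc = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    acc = overall_accuracy(matrix)
    return (acc - racc) / (1 - racc)

def class_counts(matrix, k):
    """One-vs-rest TP, TN, FP, FN for class k (rows = actual, assumed)."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp
    fp = sum(row[k] for row in matrix) - tp
    tn = total - tp - fn - fp
    return tp, tn, fp, fn

def matthews_correlation(tp, tn, fp, fn):
    """MCC in [-1, 1]; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy two-class matrix (made-up counts).
toy = [[40, 10],
       [5, 45]]
print(round(overall_accuracy(toy), 2), round(cohen_kappa(toy), 2))  # 0.85 0.7
print(matthews_correlation(*class_counts(toy, 0)))
```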

Results
From initial tests using the CNN, with training on six species and a varying threshold, we observed that an increase or reduction in the threshold parameter interfered with the accuracy of species characterization. With this parameter equal to 0.9, the accuracy of species recognition was higher; however, in this case, only two species were identified. When the value was reduced to 0.8, the network identified three species; the same happened for 0.7 and 0.6 (Figure 4). Decreasing the threshold reduced identification accuracy, as well as the overlap of species' identification polygons. However, when comparing the results obtained by the neural network with field data, we observed a relationship between the trees identified by the CNN and those observed in the field. Moreover, we observed more classification uncertainty in areas with more shadows or overlapping canopies compared to species with higher canopies and without shade.
From this first training, we observed that recognition of only one species, two species, three species, four species, five species, and six species required 494, 541, 892, 976, 993, and 1042 epochs, respectively (Figure 5). Additionally, on changing the threshold and including more species (characteristics), the network training time increased, making this process more time-consuming.
Figure 6 shows loss variation in the validation of the adjusted model using all species. This model eventually converged to its lowest loss value with 439 epochs, demonstrating that with this amount, it is possible to identify the six species used for training in the UAV images.
Figure 7 shows the tree location in the plots and the model's functionality in locating trees. We observed that trees with larger canopies are more easily detected than smaller ones. False recognition is associated with overlapping crowns or similarities between leaf characteristics. In genus identification by the model, the results were more accurate for trees with a smaller canopy circumference. The model failed to identify, or erroneously classified, species whose canopies had less dense foliage. Canopies with leaves of higher contrast (brightness, color) than the others were identified with greater accuracy. However, we observed some trees with discrepancies in foliage or canopy within the same genus, which had a higher error rate in the classification.
We observed that the training with a threshold of 0.7 provided satisfactory results concerning processing time and species recognition. Therefore, the same value was used to calculate the confusion matrix with all six species. The overall accuracy achieved was 91.80%. Nevertheless, false recognition of characteristics occurred in all species; the biggest classification error (false positive) was found among C. juranyiana and H. serratifolius with B. acreana, with false characterization on 26 occasions (Table 4).
Table 5 shows all the statistical indexes used to validate classifications. The Kappa index obtained was 0.9006, which indicates an adequate classification. We verified results close to this one in Accuracy, adjusted F-score, and Similarity. However, with respect to Matthews' correlation, the species B. acreana and H. serratifolius obtained a strong classification, while for the other species this index was very strong; likewise, in Sensitivity the same two species obtained results below 0.9.

Discussion
The recognition of forest species in the field or using remote sensing techniques is a major challenge for forestry researchers. In our study, particularities related to species' canopy and leaf structure (dendrometric characteristics) proved decisive for achieving satisfactory accuracy of predictive models. In this regard, the results of this study demonstrate good accuracy in identifying species using UAV imagery. All species studied obtained promising results in their identification.
The species Cecropia and Bauhinia have the smallest canopy perimeters of the species under study. However, statistical indexes showed results above 0.9. This is because their canopy has unevenness concerning branches and leaves, and, as reported in [55], the canopy characteristics directly influence the classifier. H. courbaril and Anadenanthera sp. have smaller leaves and larger canopies, standing out among the others. A. occidentale and H. serratifolius have more leaves and canopies that stand out in the forest, and, for this reason, the classifier was efficiently able to distinguish them from the others.
Compared to this study (91.80% global accuracy), experiments conducted in environments with fewer tree species, such as in [56], obtained higher classification metrics (97.80% global accuracy), as there was less chance of confusion in species' characteristics. Only a limited number of studies have reported their experience with threshold selection in object detection. Freudenberg et al. (2019) [57] used this parameter with a value of 0.5 to detect palm trees, resulting in an F-score ranging from 0.875 to 0.957; these values are close to those observed in our study. In this context, Xiong et al. (2020) [58] compared sensor height with threshold variation in predicting dendrometric variables. The authors concluded that, as the distance from the sensor to the target object increased, it was necessary to retrain the CNN with a higher threshold, thus obtaining better results in accuracy and error. Wang et al. [59] reported that there is no predetermined value for this threshold and argued that the most appropriate value depends directly on the database and computational method used.
When analyzing the classifications with different thresholds, our results showed an overlap of CNN detection polygons similar to that obtained by Sarabia et al. [60] and Xiong et al. (2020) [61]. In these studies, the authors reported issues with the interpretation of the edges between neighboring trees; a reduction in threshold value was found crucial to reveal this uncertainty. However, this factor did not influence correct species identification; it only changed the visualization of the identification polygon and reduced the probability value associated with species characterization. Regarding the number of epochs, previous studies suggest that this value is proportional to the number of characteristics inserted in the model [3,25,31,32].
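One common way to quantify the overlap between two detection polygons is the intersection over union (IoU) of their bounding boxes. The sketch below is an illustration only: the study does not state that IoU was used, and the function name and the (xmin, ymin, xmax, ymax) box layout are assumptions.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Two hypothetical crown detections that share a quarter of each box.
print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

An IoU of 0 means the two detections do not touch, and 1 means they coincide; values in between indicate the kind of partial overlap discussed above for neighboring crowns.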
The statistical metrics used to determine model accuracy showed very strong correlations for species detections. However, the results of B. acreana demonstrate that the existence of few samples of a given species results in lower values of Accuracy, Adjusted F-score, Matthews correlation, Sensitivity, and Similarity (Table 5). On the other hand, H. courbaril had fewer samples than C. juranyiana and obtained better results, mainly related to reduced size of the canopies and leaves of C. juranyiana compared to other species. Because it had long branches and leaves only at the tips, the canopies of neighboring trees intersected in these branches, causing detection inaccuracy and false species characterization. This characteristic was also reported by Wagner et al. (2019) [62], who reported accuracy results similar to our study when comparing a consortium of species in the Atlantic Forest biome, which have similar physiognomic characteristics to the Amazon biome.
Similarity and Matthews' correlation are classification metrics considered to be less biased, as they use multiple input variables; they account for both class imbalance and the amount of data in each class [34]. In this regard, the inclusion of species caused lower values of those metrics in comparison to Accuracy and the F-score, since the latter are biased by the number of characteristics and, specifically, the F-score does not depend on the true negatives (TN). The high similarity observed here demonstrates that the classifier can identify true negatives, just as the high sensitivity indicates that the classifier was able to identify true positives [35].
UAVs generally operate at low altitudes and can acquire images in better spatial resolutions than satellite images, providing more detailed information not only about the forest but also at the species level. However, the methodology presented here requires analysis, processing, and model adjustment to classify the species. Regarding costs, operations with UAVs are more feasible compared to traditional airborne imaging or high-resolution acquisition of orbital sensors. Thus, it is possible to acquire images more frequently, favoring periodic monitoring of the forest in the process of restoration and the study of its dynamics [63].

Conclusions
Convolutional neural network training using the faster_rcnn_inception_v2_pets model provided satisfactory results in identifying species characteristics. It proved to be an effective tool for classifying objects in UAV images without the need to modify image properties. In this sense, CNN training demonstrated good accuracy and precision in identifying the species under study; however, adding species made the training time-consuming. Moreover, we noticed a higher classification error when species' canopies overlap each other and when images contain shading. The quality of UAV images was observed to be an essential component for obtaining species' characteristic patterns and for network training. It also affects the frequency at which images must be obtained, which expands the number of samples to be acquired. In our case, the number of samples used in CNN training proved sufficient to recognize forest species' characteristics and intrinsic patterns.
The identification of individual trees provides a basis for new information on forest restoration ecosystems, and we hope our study encourages future researchers to develop methodologies that capture differences in forest characterization related to the seasons, vegetation phenology over the months of the year, atmospheric conditions, and other sensors, adding more information to the models.