Detection of Fir Trees ( Abies sibirica ) Damaged by the Bark Beetle in Unmanned Aerial Vehicle Images with Deep Learning

Invasion of the Polygraphus proximus Blandford bark beetle causes catastrophic damage to forests with firs (Abies sibirica Ledeb) in Russia, especially in Central Siberia. Determining tree damage stage based on the shape, texture and colour of tree crown in unmanned aerial vehicle (UAV) images could help to assess forest health in a faster and cheaper way. However, this task is challenging since (i) fir trees at different damage stages coexist and overlap in the canopy, (ii) the distribution of fir trees in nature is irregular and hence distinguishing between different crowns is hard, even for the human eye. Motivated by the latest advances in computer vision and machine learning, this work proposes a two-stage solution: In a first stage, we built a detection strategy that finds the regions of the input UAV image that are more likely to contain a crown, in the second stage, we developed a new convolutional neural network (CNN) architecture that predicts the fir tree damage stage in each candidate region. Our experiments show that the proposed approach shows satisfactory results on UAV Red, Green, Blue (RGB) images of forest areas in the state nature reserve “Stolby” (Krasnoyarsk, Russia).


Introduction
Taiga and Boreal forests play an important role in the global climate through the carbon, water and energy balance (Bonan [1]).In Russia, fir forests provide multiple provisioning, regulating and cultural ecosystem services and are considered a national asset of the country.Despite forests are a renewable resource, in vast areas of the world, forest degradation is too high and not compensated by regeneration (Hansen et al. [2]).One of the main problems of Russian forest degradation is the spatial spread acceleration of the four-eyed fir bark beetle (Polygraphus proximus Blandford) invasion that causes fast death of fir trees (Abies sibirica Ledeb) in various forest ecosystems.The geographic origin of P. proximus propagation comes from its natural habitat in Japan, the Korean Peninsula, Eastern China and the Russian Far East (Khabarovsk and Primorskii krai, Sakhalin and Kuril Islands) (Kuznetsov et al. [3].In Siberia, the first beetle occurrence was registered in 2008 (Kerchev [4]) and within the last 10 years, massive outbreaks of P. proximus occurred on large forested areas in the south-eastern part of the West Siberian Plain: Tomsk, Kemerovo and Novosibirsk oblasts, Altai region, in the Altai Republic as well as in Krasnoyarsk region (Pashenova et al. and Baranchikov et al. [5,6]), where the problem became catastrophic and got out of control.The invasion of this type of beetle can lead not only to degradation of fir forests but also to create a threat to fir existence as a forest type species, with the subsequent broad implications for the regional and global climate (Helbig et al., Ma [7,8]).
Death of fir trees due to P. proximus outbreaks occurs after several stages.The beetle usually attacks trunks of weakened trees, fallen deadwood and newly harvested wood.In case of massive outbreaks, P. proximus also attacks healthy trees, which can resist against the beetle attack during 2-3 years.Penetration of P. proximus under tree bark results in pervasion and reproduction of ophiostomatoid fungi (different species) and their phytopathogenic activity leads to the gradual weakening of fir trees.As the beetle further colonizes the tree, it begins to dry out.Increase of beetles quantity in a local forest stand leads to a massive death of fir trees.Usually, fir trees die within 2-4 years from the moment of first beetle attack, so the task of the beetle invasion monitoring is really important and poses a challenge to develop a remote survey and early-warning system for precise estimation of forest damage states.
Advances in Earth remote sensing techniques, particularly very high resolution satellite and airborne imagery, open the possibility to develop tools for regional mapping of consequences of forest pest activities.Nowadays, application of unmanned aerial vehicles (UAV) tends to be more popular in local scale forestry research because of better spatial resolution (Lehmann et al. [9]).Also, UAV data provides the basis for development of new methods in data analysis which can be applied later at larger scales (satellite data).Recently, application of deep learning, in particular convolutional neural networks (CNN), in processing of colour (RGB) images of the Earth surface from various data sources have provided high accuracy results in recognition of different plant species [10][11][12][13][14].However, there are still no published studies on the application of CNN methods to very high resolution imagery for detection of forest health decline caused by pest invasions.The goal of this study was to test the possibilities of neural networks as a new approach to detect bark beetle outbreaks in fir forests.In particular, our aim was to develop and test a CNN method to automatically detect individual fir trees disturbed by P. proximus in very high resolution imagery of mixed forests.
The main contributions of this paper can be listed as follows: • As far as we know, this is the first work in addressing the problem of forest damage detection caused by the P. proximus beetle in very high resolution images from UAVs with deep learning.

•
We built a new labelled dataset of orthoimages with four categories of damage stages in fir trees.

•
We designed a new CNN architecture to accurately classify trees in UAV images into different damage categories of trees and compared it to the most powerful CNN models in the stat.

•
We developed the detection model as follows.First, a new detection method selects the candidate regions that contain trees in UAV images.Then, these candidate regions are processed by the multiclass model to finally predict the category of tree damage.

•
We provide a complete description of the used methodology and the source code so that it can be replicated by other researchers to detect and classify tree damage by bark beetle in other images.
The rest of the paper is organized as follows.An introduction to Deep learning and CNNs is presented in Section 2, which includes a description of the CNN models and of the auxiliary techniques of transfer learning and data augmentation.The related works on tree classification are provided in Section 3. The study area and data acquisition are presented in Section 4. The proposed methodology is presented in Section 5, which includes a definition of fir trees damage categories, pre-processing dataset and data augmentation techniques of sample patches for training CNN models, description of the development of the classification model algorithm and creation of a data subset for independent model verification.Section 6 presents the obtained results and data analysis.Discussion and conclusions are provided in Section 7.

Introduction to Deep learning and CNNs
In this Section, we make a brief introduction to deep learning and particularly to CNN models (Section 2.1), including the two main approaches used to improve the learning of CNN: that is, transfer learning and data augmentation (Section 2.2).
Deep Neural Networks (DNNs) are a subset of machine learning algorithms that learn from a first set of data to make predictions on new data.Unlike most machine learning algorithms, DNNs are capable of extracting the existent patterns from data automatically, without the need of external hand-crafted features.Their computation is organized into layers; each layer is composed of a number of artificial neurons.To introduce non-linearity in the network, non-linear functions are introduced at the neuron level, such as the Rectified Linear Unit (ReLU) function.Deep networks are built by stacking a very large number of layers.They provide a powerful framework for supervised learning when trained on a large number of labelled samples (Goodfellow et al. [15]).A DNN includes an input layer, an output layer and hidden layers between them where the calculations take place.
One of the most powerful types of DNNs are Convolutional Neural Networks (CNNs), which have shown impressive results in image classification (given an input image, the CNN produces a label that indicates the visual content of that image) and object detection in images (given an input image the CNN produces a label together with a bounding box that indicates the region where the object-class is located in the image).
There are three main types of hidden layers in a CNN: • A convolution layer, the main building block of a CNN.It is based on a fundamental operation in image processing called convolution, which consists of filtering a 2D input image with a small 2D-matrix called filter or kernel.

•
A pooling layer, which reduces the input matrix in general by half.Conceptually, this operation is applied to increase the abstraction of the extracted features and it usually follows the convolution layer.

•
A fully connected layer, which is used as a classifier of the previously calculated high level features to derive scores for each target class.
Deep CNNs are built by alternating several convolutional layers with pooling layers.At the end of the sequence, the obtained convolutions and pooling layers are connected to one or more fully connected layers.Softmax function is used to convert the scores of the output of the final fully connected layer into a set of probabilities between 0 and 1, which represent the final output layer.
Before starting the training process, the weights of the CNN are initialized with random values.The CNN is then trained iteratively until convergence during a number of epochs.The loss function is used to measure how good the training process is by quantifying the distance between the prediction and the expected value.To find the best weights that correctly assign the input to the correct answer, the training process is reformulated into an optimization problem that aims to minimize the loss function.This minimization can be achieved using several optimization algorithms depending on the problem.One example is the ADAptive Moment estimation (ADAM) optimizer that computes the best weights.

CNNs Models
In Appendix B, we provide a brief description of the CNNs that are compared against the CNN developed in this work, that is, VGG, ResNet, Inception-V3, InceptionResNet-V2, Xception and DensNet.

Transfer Learning and Data Augmentation
CNNs need a large volume of data to achieve good results, however, this is not always possible since building new labelled dataset is costly and time consuming.To overcome this limitation in practice, CNNs are seldom trained from scratch.Instead, their weights are initialized by the pre-trained weights on a large dataset such as, the massive ImageNet.This approach is called transfer learning.In addition, to address a new problem, not all the weights are re-trained but only the weights of the last layers, for example, the fully connected layers.This process is called fine-tuning.
Data-augmentation technique consists of creating new training examples by applying geometrical transformations such as rotation and translation to the original samples.Its objective is artificially increasing the volume of the dataset.In fact, it was demonstrated in several works (Tabik et al. [16]).that increasing the volume of the dataset improves the learning of CNNs and reduces overfitting.In this paper we used the transformations explained in Table 1.
Table 1.The stages followed for pre-processing the input UAV imagery and data augmentation methods used for increasing the amount of sample patches.

Steps
The Pre-Processing Stages of Input UAV Imagery Step 1 Selection of image patches-a set of pictures manually cropped from orthophotomosaics using QGIS 7.2.2.
Step 2 Resizing of patches to 150 × 200 RGB pixels using cubic interpolation.
Step 3 Manual assignment of each patch to the appropriate tree damage category.

Methods for Increasing the Amount of Data
Step 1 Change the saturation of RGB channels.
Step 2 Remove Gaussian blur filter with a blur value of 5% and a width and height of a kernel 0.5.
Step 5 Cropping the central rectangle of each image to half distance from each edge (discarding the framing borders) and resizing back the central rectangle to 150 × 200 RGB pixels using cubic interpolation.

Classification of Trees in High Resolution Imagery and Related Works
In this section, we consider related works devoted to the problems of classification and detection of objects (from land-covers to organisms) on Earth remote sensing (ERS) data.Scientists from around the world are trying to solve this problem by using various ERS information with different methods and algorithms.In general, few works have addressed the detection of damaged trees using satellite or UAV images, in general they have only considered few health classes (e.g., healthy and dead trees) and did not apply CNNs but classical image classification methods.
For example, Deli et al. [17] developed a new CNN-based method for classifying four land-cover classes (crops, houses, soil and forests) in land surface images.They trained the algorithm with 100 images per class.Längkvist et al. [18] proposed the classification and segmentation of satellite orthoimagery (including five classes: vegetation, ground, road, parking, railroad, building and water) using CNNs in a small city for a full, fast and accurate per-pixel classification.They selected the parameters and analysed their influence on the neural network model architecture.They also found a better performance of the CNN model compared to object-oriented methods of classification, reaching a maximum CNN classification accuracy of 94.49%.
Few papers have used neural network classifiers to recognize plant species and plant growth using images from a digital camera and a cell phone camera.Dyrmann et al. [19] created a CNN capable of recognizing 22 plant species on colour images with an accuracy of 86.2%.For this, the authors used different data sets, depending on the lighting, image resolution and soil type.Razavi and Yalcin [20] proposed a CNN architecture to classify types of plants growing in agro stations.To evaluate the effectiveness of their approach, the results of the created CNN model were compared to Support Vector Machine (SVM; RBF kernel and polynomial kernel) classifiers.The accuracy of the CNN classification was greater than the SVM, reaching 97.47%.
Few studies have also used CNN on satellite imagery.Li et al. used deep learning models [10] to detect oil palm trees in QiuckBird images (0.6 m/pix).Guirado et al. in Reference [11] detected Ziziphus lotus shrubs on RGB satellite images from Google Earth using TensorFlow and got better accuracies than using object-based image analysis (OBIA).Baeta et al. [12] detected coffee crops in high-resolution SPOT images (2.5 m/pix).They used a binary classification model and presented a comparison of the accuracy of various CNN and OBIA methods.In addition, the authors used a large number of methods of preparation and preliminary processing of the image, which in the final result leads to a high accuracy of more than 95%.
Very few woks have used CNNs to detect plants in UAV images.Noteworthy is the work by Ferreira et al. [21], where weed was detected in soybean cultures using CNN based on the CaffeNet architecture.The authors used a set of data acquired with an UAV in manual mode at a height of 4 m above the Earth level using an RGB camera.The accuracy of the classification was 99.5%.Though, according to the authors, the accuracy may be lower in operational circumstances due to the heterogeneity in types of soils and weeds, since testing was performed on an experimental field.In 2018, Onishi and Ise [22] classified 7 types of trees (deciduous broad-leaved tree, deciduous coniferous tree, evergreen broad-leaved tree, Chamaecyparis obtuse, pinus, others) on UAV images using supervised deep learning (GoogLeNet) with an accuracy of 89%.
A similar problem to ours, which is the detection of damaged trees, is revised in the next series of works, though they mainly used classical image classification methods.In 2009, Abdullah et al. [23] identified healthy and infested trees in satellite imagery (Sentinel-2, WorldView-2 and 3 (up to 0.5 m/pix)) using partial least squares regression (PLSR).Heurich et al. [24] presented semi-automatic detection of dead trees following a spruce bark beetle outbreak in CIR aerial photographs using OBIA with accuracy up to 91%.In this work, authors used several classes, as deadwood areas, vital vegetation, non-woodland and assistant classes (shadows among dead trees and shadows among vital trees).In 2013, Ortiz et al. [25] detected a bark beetle attack with TerraSAR-X and RapidEye Data (up to 1.18 m/pix) using generalized linear models (GLM), maximum entropy (ME) and random forest (RF), with accuracies up to 74%.Meddens et al. [26] used multi-temporal disturbance detection methods to detect bark beetle-caused tree mortality and to map the subsequent forest disturbance on Landsat images with different classes (green trees, red damage stage, grey damage stage, herbaceous vegetation, bare soil, shadows, water).Kussul et al. [13] classified tree species and different levels of ash tree (Fraxinus sp.) mortality in WorldView-2 images with resolution of 1.85 m/pix using classical methods, in particular, the linear discriminant analysis (LDA), principal component analysis (PCA), stepwise regression and OBIA methods.Species diversity and the magnitude of ash loss were assessed.The authors needed to use a wide set of remote sensing indices to obtain good accuracies.The overall accuracy varied between 83% for the seven tree species and 77% for the four different levels of ash damage.In 2015, Näsi et al. [27] also identified damaged trees (with three classes: healthy, infected and dead) using UAV images and OBIA methods.In 2017, Dash et al. [28] used a non-parametric approach based on classification trees on UAV images to monitor forest health for disease outbreak in mature Pinus radiata trees.In 2018, Näsi et al. [29] published a paper that most closely relates to ours.They identified of bark beetle infestation at the individual tree level (healthy, infested and dead) in urban forests on UAV images using support vector machine (SVM), with accuracies up to 93%.
As far as we know, our work is the first in addressing the detection of damaged fir trees caused by the bark beetle using CNN methods on images acquired by UAVs (resolution below 0.1 m/pix).We considered four tree-health categories, which is higher than in previous works and allows to better assess the stage of the infestation.We also showed that it is possible to achieve good results using a relatively small training set of data.

Study Area and Data Acquisition
The study area is located in the territory of the "Stolby" State Nature Reserve, near Krasnoyarsk city in Central Siberia of the Russian Federation.Most of the territory (80%) constitutes the mid-mountain belt (500-800 m a.s.l.), mainly covered by mixed forests composed by seven tree species in different proportions: conifers such as pines (Pinus sylvestris, Pinus sibirica), larch (Larix sibirica), fir (Abies sibirica), spruces (Picea abies, Picea obovata) and parvifoliate trees such as birch (Betula pendula, Betula pubescens) and aspen (Populus tremula) (Ryabovol [30]).Pine trees dominate among other species and occupy 41% of the total forested area, mainly in low-mountains (200-500 m a.s.l.).Distribution of Siberian fir expands on 25% of the Stolby forest.
First appearance of P. proximus in the Stolby nature reserve was registered in 2011 and nowadays the roughly estimated forest damage by the beetle is about 25-30% of the fir area.
Four plots with different rates of the P. proximus invasion were chosen for our study at Stolby (Figure 1).

Remote Sens. 2019, 1, x FOR PEER REVIEW 6 of 21
First appearance of P. proximus in the Stolby nature reserve was registered in 2011 and nowadays the roughly estimated forest damage by the beetle is about 25-30% of the fir area.
Four plots with different rates of the P. proximus invasion were chosen for our study at Stolby (Figure 1).were created for each plot from a set of overlapping geolocated images (300-400 partially overlapping images per plot) using the Agisoft Photoscan Professional v1.4.0 (64-bit) software (www.agisoft.com, Agisoft LLC, St. Petersburg, Russia).This software provides photogrammetric and 3D-model reconstruction tools [31].The data processing workflow in Photoscan consisted of the following steps: alignment of each image location, generation of dense point cloud (3D-model), creation of digital elevation model (DEM) and eventually production of a georeferenced orthophotomosaic (orthophotos or orthoimages from now).The workflow was run several times with varying settings in order to achieve better stich of images over the complex mountain landscape of the study sites.As a result, we got four orthophotos for each one of the four sites (A, B, C and D for 2016 and 2018).

Methods
As we have mentioned, our aim is to evaluate the possibilities of convolutional neural networks as a new approach to detect bark beetle outbreaks in fir trees in very high resolution imagery in fir and mixed forests.Thus, this section includes a definition of fir-tree damage categories (Section 5.1), describes the data pre-processing techniques we used for training the CNN models (Section 5.2), presents the classification model (Section 5.3) and finally presents the proposed detection technique especially designed for detecting bark beetle outbreaks in fir forests (Section 5.4).

Definition of Fir Trees Damage Categories
To define the categories of health status of fir trees, we followed the entomological approach by Krivets et al. [32], which differentiates six categories according to the level P. proximus invasion into the trunk and its influence on the canopy: I-healthy trees; II-weakened trees; III-heavily weakened trees; IV-dying trees; V-recently died trees; VI-old deadwood.Since the first, second and third categories can only be recognized in-situ by trunk signs that do not visibly translate into the crown, we reclassified them into four categories (skipping the second and the third one).Hence, the final categories in our classification were: a-completely healthy tree or recently attacked by beetles, b-tree colonized by beetles, c-recently died tree and d-deadwood (Figure 2).elevation model (DEM) and eventually production of a georeferenced orthophotomosaic (orthophotos or orthoimages from now).The workflow was run several times with varying settings in order to achieve better stich of images over the complex mountain landscape of the study sites.As a result, we got four orthophotos for each one of the four sites (A, B, C and D for 2016 and 2018).

Methods
As we have mentioned, our aim is to evaluate the possibilities of convolutional neural networks as a new approach to detect bark beetle outbreaks in fir trees in very high resolution imagery in fir and mixed forests.Thus, this section includes a definition of fir-tree damage categories (Section 5.1), describes the data pre-processing techniques we used for training the CNN models (Section 5.2), presents the classification model (Section 5.3) and finally presents the proposed detection technique especially designed for detecting bark beetle outbreaks in fir forests (Section 5.4).

Definition of Fir Trees Damage Categories
To define the categories of health status of fir trees, we followed the entomological approach by Krivets et al. [32], which differentiates six categories according to the level P. proximus invasion into the trunk and its influence on the canopy: I-healthy trees; II-weakened trees; III-heavily weakened trees; IV-dying trees; V-recently died trees; VI-old deadwood.Since the first, second and third categories can only be recognized in-situ by trunk signs that do not visibly translate into the crown, we reclassified them into four categories (skipping the second and the third one).Hence, the final categories in our classification were: a-completely healthy tree or recently attacked by beetles, b-tree colonized by beetles, c-recently died tree and d-deadwood (Figure 2).

Dataset Preprocessing and Data Augmentation Techniques of Sample patches for Training the CNN Models
The design of the training dataset is the key to the performance of a good CNN classification model.For labelling the training dataset, we identified 50 image patches containing 4 categories of trees.In particular, we built two different datasets:

•
First, the dataset for training, validation and testing consisted in 50 manually sampled image-patches of single trees per each tree damage category, resulting in 200 image-patches from the plot A and B. To train the CNN model we used 80% of this selection.The remaining 20% of images were used for model internal validation.

•
Second, the dataset for external testing or external validation consisted in 88 patch-images generated by our proposed detection technique from the areas C and D.
A summary of these two datasets is provided in Table 2.The pre-processing method used for preparing the input UAV images is described in Table 1.
To improve the robustness and accuracy of the CNN classification models, we increased the volume of samples using data augmentation techniques.In particular, we increased the amount of sample patches for training the CNN models using the methods presented in Table 1.
The images of the training dataset were used for training the parameters of the neural network.The internal validation set is a set of examples used to tune the parameters of the neural network and determine a stopping point for the back-propagation algorithm.The testing dataset is a set of independent external examples used to assess or validate the performance of the final classification model.

Classification Model
A deep neural network was developed and trained on the prepared image samples to adapt its weights to the task of tree-damage recognition.To create a CNN and improve the quality of its training, we manually tuned the hyperparameters in the network.Several series of experiments were performed.For each experiment, the hyperparameters were altered and the consequent network's operation quality change was estimated.The loss function, which should be minimized in the CNN (see Section 1), was established by a categorical cross-entropy loss between the input and the output classification of the images.The ADAM's optimization was chosen among existing optimization algorithms since it was the most efficient one due to the possibility of initial calibration of the CNN.When exploring the effect of changing the learning rate parameter with the ADAM's optimizer, the highest accuracy for CNN on the test data was achieved at a value of 0.0001.The total number of the network layers was determined by creating cascades of convolutional layers and a pooling layer.To assess the impact of the number of training epochs on the CNN accuracy, training was conducted and compared in the range from 1 to 50 epochs (see Section 6.2).
The overall architecture of the CNN (Table 3, Figure 3) consisted of six convolutional blocks (each includes one convolutional layer).Additionally, the first and third convolutional blocks included maximum pooling layers.At the top of the CNN, there are two fully connected layers and one output layer.The ReLU activation function was used in the last four convolutional blocks and the Softmax function for the output layer.In order to keep the overtraining of the network under control, we used the Dropout regularization method which reduced the complexity of the model, saving the number of its parameters low.It is very important to choose an appropriate regularization coefficient, so it was set to 0.25 after the second, fourth and fifth layers and it was set to 0.5 before the output layer.Our network with various activation functions is presented in Table 3.The overall architecture of the CNN (Table 3, Figure 3) consisted of six convolutional blocks (each includes one convolutional layer).Additionally, the first and third convolutional blocks included maximum pooling layers.At the top of the CNN, there are two fully connected layers and one output layer.The ReLU activation function was used in the last four convolutional blocks and the Softmax function for the output layer.In order to keep the overtraining of the network under control, we used the Dropout regularization method which reduced the complexity of the model, saving the number of its parameters low.It is very important to choose an appropriate regularization coefficient, so it was set to 0.25 after the second, fourth and fifth layers and it was set to 0.5 before the output layer.Our network with various activation functions is presented in Table 3.Consequently, our CNN model has the following form, presented in Figure 3.

Proposed Detection Process
We developed a candidate selection technique to find the potential regions in the test image that are more likely to be a tree.Then we analysed each one of these candidate regions using our CNNclassification model.The proposed candidate selection technique is a data processing algorithm that includes a sequence of the steps presented in the Table 4. Consequently, our CNN model has the following form, presented in Figure 3.

Proposed Detection Process
We developed a candidate selection technique to find the potential regions in the test image that are more likely to be a tree.Then we analysed each one of these candidate regions using our CNN-classification model.The proposed candidate selection technique is a data processing algorithm that includes a sequence of the steps presented in the Table 4.

Steps Data Processing Algorithm
Step 1 Conversion of RGB images to a grayscale colour palette Step 2 Blurring of grayscale images using the Gaussian high-pass filter in order to reduce Gaussian noise with the following function parameters: kernel size of 11 × 11 pixels and the standard deviation equal to 0 Step 3 Creation of binary images from the grayscale blurred images by application of a threshold function with the optimal brightness threshold value of input image pixels equal to 100 Step 4 Structuring of picture elements outlines by application of two successive functions (erosion and dilation) with several iterations for the binary pictures in order to distinguish individual tree crown contours and to minimize the effect of their confluence (fusion) in one object Step 5 Detection of image patches was implemented using a contour area calculation function based on the Green formula (Dawkins [33]), object size for the function was set in the range between 50 × 50 and 200 × 200 pixels The output of this process are the candidate regions indicated by red bounding boxes in Figure 4. Finally, all the bounding boxes were analysed by the classification model.

Steps Data Processing Algorithm Step 1 Conversion of RGB images to a grayscale colour palette
Step 2 Blurring of grayscale images using the Gaussian high-pass filter in order to reduce Gaussian noise with the following function parameters: kernel size of 11 × 11 pixels and the standard deviation equal to 0 Step 3 Creation of binary images from the grayscale blurred images by application of a threshold function with the optimal brightness threshold value of input image pixels equal to 100 Step 4 Structuring of picture elements outlines by application of two successive functions (erosion and dilation) with several iterations for the binary pictures in order to distinguish individual tree crown contours and to minimize the effect of their confluence (fusion) in one object Step 5 Detection of image patches was implemented using a contour area calculation function based on the Green formula (Dawkins [33]), object size for the function was set in the range between 50 × 50 and 200 × 200 pixels The output of this process are the candidate regions indicated by red bounding boxes in Figure 4. Finally, all the bounding boxes were analysed by the classification model.

Results and Analysis
In this section, we will first describe the metrics used for the evaluation of the results of our proposed model, we will then show the performance of the training and internal test process (sites A and B) and finally we will show the results of the detection on two external test datasets of UAV images (sites C and D).

Evaluation Metrics
To evaluate and compare the results of the proposed CNN model with other CNN models, we used the performance metrics based on the confusion matrix.The confusion matrix is four by four (four categories) and contains the results of multi-class classifier (one class against the rest) in terms

Results and Analysis
In this section, we will first describe the metrics used for the evaluation of the results of our proposed model, we will then show the performance of the training and internal test process (sites A and B) and finally we will show the results of the detection on two external test datasets of UAV images (sites C and D).

Evaluation Metrics
To evaluate and compare the results of the proposed CNN model with other CNN models, we used the performance metrics based on the confusion matrix.The confusion matrix is four by four (four categories) and contains the results of multi-class classifier (one class against the rest) in terms of number of true positive (TP) predictions, true negative (TN) predictions, false positive (FP) predictions and false negative (FN) predictions [34].
The main metrics we can extract from the matrix of confusion are: accuracy (1), precision (2), recall (3) and F-score (4).Accuracy is calculated as the total number of the correct predictions (TP + TN) divided by the total number of examples in the test set.Precision is calculated as the number of correct positive predictions (TP) divided by the total number of positive predictions (TP + FP).Recall is calculated as the number of correct positive predictions (TP) divided by the number of true positives and false negatives.F-score indicates the balance between precision and recall.The highest and best value of all these metrics is 1.0 and the worst is 0.0.

Evaluation of the Training Process and Comparison
To evaluate the training process of our new CNN, we used Python, Keras and TensorFlow framework with different pre-processing data augmentation techniques.Keras is a high-level API to ease the process of building and training deep learning models [35].TensorFlow is open source software library for high performance numerical computation, which runs on various heterogeneous systems, including distributed graphics processing unit (GPU) clusters (Murray [36]).The used hardware environment in this study was Intel Xeon E5-2630v4 CPU accelerated with NVIDIA Titan Xp GPU as a platform for learning and testing.
The application of data augmentation techniques to the sample patches for training the CNN models resulted in the creation of 4400 image-patches from the first data subset, which had 200 patches (Section 5.2).
The maximum performance of our CNN model with augmented dataset was achieved at the 23th training epoch, providing an internal test accuracy of 99.7% and a minimum training loss lower than 0.01.After the 23th epoch, the validation loss was stabilized and the difference between the training and validation loss increased (Figure 5).
The main metrics we can extract from the matrix of confusion are: accuracy (1), precision (2), recall (3) and F-score (4).Accuracy is calculated as the total number of the correct predictions (TP + TN) divided by the total number of examples in the test set.Precision is calculated as the number of correct positive predictions (TP) divided by the total number of positive predictions (TP + FP).Recall is calculated as the number of correct positive predictions (TP) divided by the number of true positives and false negatives.F-score indicates the balance between precision and recall.The highest and best value of all these metrics is 1.0 and the worst is 0.0. ( (2)

Evaluation of the Training Process and Comparison
To evaluate the training process of our new CNN, we used Python, Keras and TensorFlow framework with different pre-processing data augmentation techniques.Keras is a high-level API to ease the process of building and training deep learning models [35].TensorFlow is open source software library for high performance numerical computation, which runs on various heterogeneous systems, including distributed graphics processing unit (GPU) clusters (Murray [36]).The used hardware environment in this study was Intel Xeon E5-2630v4 CPU accelerated with NVIDIA Titan Xp GPU as a platform for learning and testing.
The application of data augmentation techniques to the sample patches for training the CNN models resulted in the creation of 4400 image-patches from the first data subset, which had 200 patches (Section 5.2).
The maximum performance of our CNN model with augmented dataset was achieved at the 23th training epoch, providing an internal test accuracy of 99.7% and a minimum training loss lower than 0.01.After the 23th epoch, the validation loss was stabilized and the difference between the training and validation loss increased (Figure 5).For comparison with our CNN model, we have also tested on the same input data the most powerful CNN models such as Xception, VGG-16, VGG-19, ResNet-50, Inception-V3, InceptionResNet-V2, DenseNet-121, DenseNet-169 and DenseNet-201 (Jordan [37]).A brief explanation of each model is provided in Appendix B. We adapted the last layer of all these models to the 4 classes of our problem.The obtained results of all these models after few training epochs are shown in Appendix A (Table A1).As it can be seen from Table A1, all models provide high training accuracies.

External Test Detection Results
The confusion matrix of our CNN model in the four categories with data-augmentation are shown is Figure 6.InceptionResNet-V2, DenseNet-121, DenseNet-169 and DenseNet-201 (Jordan [37]).A brief explanation of each model is provided in Appendix B. We adapted the last layer of all these models to the 4 classes of our problem.The obtained results of all these models after few training epochs are shown in Appendix A (Table A1).As it can be seen from Table A1, all models provide high training accuracies.

External Test Detection Results
The confusion matrix of our CNN model in the four categories with data-augmentation are shown is Figure 6.The results based on the confusion matrix of our CNN model on each category with and without data augmentation are shown in the Table 5.
Table 5.The performance of our CNN model with and without data augmentation on the test set for each categories of trees.The performance is expressed in terms of true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), precision, recall and F-score.As it can be seen from Table 5, the trained CNN with data-augmentation provides the best test accuracy, recall and F1_score.The average accuracy, recall and F-score improved by 14.5%, 20.4% and 34.3% respectively.A comparison between the results obtained by our model and by more complex models with augmentation on the 88 candidate areas from images C and D is presented in Appendix A (Table A2).As it can be seen from this Table, although Xception, VGG16, VGG19, The results based on the confusion matrix of our CNN model on each category with and without data augmentation are shown in the Table 5.

Categories
Table 5.The performance of our CNN model with and without data augmentation on the test set for each categories of trees.The performance is expressed in terms of true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), precision, recall and F-score.As it can be seen from Table 5, the trained CNN with data-augmentation provides the best test accuracy, recall and F1_score.The average accuracy, recall and F-score improved by 14.5%, 20.4% and 34.3% respectively.A comparison between the results obtained by our model and by more complex models with augmentation on the 88 candidate areas from images C and D is presented in Appendix A (Table A2).As it can be seen from this Table, although Xception, VGG16, VGG19, ResNet50, Inception V3, InceptionResNetV2, DenseNet121, DenseNet169 and DenseNet201 are powerful and computationally intensive models, our less computationally intensive architecture provides much better results in the damaged tree categories problem.

Categories of
On the other hand, the proposed model with data-augmentation achieves high F1_score on damage class 1, 2, 3 and 4 with 92.75%, 89.86%, 89.66% and 88.89 respectively.This can be explained by the fact that the model correctly distinguishes the colour, shape and texture of each one of the four damage classes.
The results of the detection on the input test areas C and D are shown in Figures 7 and 8   As it can be seen from Figures 7 and 8, our model correctly recognizes the tree damage categories in most candidate regions (red boxes).In the few cases where the boxes include more than one tree crown, the model correctly predicts the existence of infected trees.Since the objective of this work is to detect infected trees, if a box has more than one tree and at least one of them is unhealthy and the model considers the entire box as unhealthy tree class, this answer is considered as valid answer for the purpose of finding unhealthy trees.
On the other hand, detecting individual trees in dense forests using only RGB bands information is a very complex task.In fact, previous studies (such as Mohan et al. [38]) that aimed to detect individual tree crowns in dense forests had to include several sources of data (high resolution images together with 3D LIDAR point cloud data), to achieve similar accuracies to our results.Actually, our results demonstrate that the proposed CNN based approach has a high potential and can be further improved in the future using more information, such as multispectral bands, NDVI or other spectral indices and 3D LIDAR data.As it can be seen from Figures 7 and 8, our model correctly recognizes the tree damage categories in most candidate regions (red boxes).In the few cases where the boxes include more than one tree crown, the model correctly predicts the existence of infected trees.Since the objective of this work is to detect infected trees, if a box has more than one tree and at least one of them is unhealthy and the model considers the entire box as unhealthy tree class, this answer is considered as valid answer for the purpose of finding unhealthy trees.
On the other hand, detecting individual trees in dense forests using only RGB bands information is a very complex task.In fact, previous studies (such as Mohan et al. [38]) that aimed to detect individual tree crowns in dense forests had to include several sources of data (high resolution images together with 3D LIDAR point cloud data), to achieve similar accuracies to our results.Actually, our results demonstrate that the proposed CNN based approach has a high potential and can be further improved in the future using more information, such as multispectral bands, NDVI or other spectral indices and 3D LIDAR data.

Conclusions
Damage to fir trees caused by the attacks of bark beetles is a major problem in the Taiga and Boreal Forests, particularly in Central Siberia of Russia.An efficient method for recognizing

Conclusions
Damage to fir trees caused by the attacks of bark beetles is a major problem in the Taiga and Boreal Forests, particularly in Central Siberia of Russia.An efficient method for recognizing categories of tree damage using remote sensing imagery could substantially help as an early warning system that allows to eliminate fir trees colonized by the P. proximus beetle, to reduce its spread to new territories.The presented results are of great interest, both for scientific and practical purposes, since such work in the task of classifying categories of tree damage caused by beetle attacks to fir trees using CNN methods on UAV images has not yet been encountered.Recognizing the categories of damage to fir trees is a difficult task due to the fact that the forest canopy is very dense and diverse.
In this article, we developed a model based on CNNs to detect in RGB UAV images the damage to fir trees caused by bark beetles.Our network consists of six convolutional blocks.The network can recognize four damage categories in Siberian firs, from healthy to dead, with high prediction accuracy on images from different survey data.We have shown that our model provides better performance than the most powerful state-of-the-art CNN models, such as Xception, VGG, ResNet, Inception, InceptionResNet, DenseNet.We showed that data augmentation substantially increased the performance of our CNN model.Our model, trained with data augmentation, showed up to 98.77% accuracy for damage categories one, three and four.The second category may have the lowest recognition accuracy due to the fact that we had to merge three categories traditionally assessed in the field (weakened trees, heavily weakened trees and dying trees) into one category-trees colonized by the beetle, since it was not possible to visually observe any differences between these three categories in the UAV imagery.
Regarding alternative models, the VGG-16 model, on average, showed higher results, among other models considered in the paper.VGG-16 model recognized the first, second and fourth categories with an accuracy of 85.9%, 79.76 and 94.37%, respectively.Recognition of the third category with an accuracy higher than 88.89% is achieved by ResNet-50 model.The lowest accuracies were reached by InceptionResNet-V2.
In future works, we will focus on improving the segmentation of tree categories, introducing time series of data and trying to implement the three missing classes: weakened trees, heavily weakened trees and dying trees.It would also be interesting to explore if we can transfer our model to a new area with the same species and to other species, by only changing the last output layer in the network.

Figure 1 .
Figure 1.Location of the four plots in the nature reserve "Stolby," Krasnoyarsk city (Russia), where A, B plots are fragments from the orthophotos used to build the training dataset; and C, D plots are fragments from the orthophotos used to build the testing dataset used for external validation or independent testing.A set of RGB images with ultra-high spatial resolution (≈5-10 cm/pix) were obtained for the research sites during multiple flights of the DJI Phantom 3 Pro quadcopter (with standard built-in camera) in July 2016 (plot A and C) and of the Yuneec Typhoon H hexacopter (with CGO3+ camera) in May 2016 (plot B) and August 2018 (plot D).Imagery for research plots A and C were recorded in cloudy weather conditions at 670 m (A) and 700 m (C) altitudes (120-150 m elevation above ground), plot B and D was surveyed in sunny weather at 120 m height.Default camera settings (auto white balance, ISO 100) were applied in all aerial shots.Image composites (orthophotomosaics) were created for each plot from a set of overlapping geolocated images (300-400 partially overlapping images per plot) using the Agisoft Photoscan Professional v1.4.0 (64-bit) software (www.agisoft.com,Agisoft LLC, St. Petersburg, Russia).This software provides photogrammetric and 3D-model reconstruction tools [31].The data processing workflow in Photoscan consisted of the following steps: alignment of each image location, generation of dense point cloud (3D-model), creation of digital

Figure 1 .
Figure 1.Location of the four plots in the nature reserve "Stolby," Krasnoyarsk city (Russia), where A, B plots are fragments from the orthophotos used to build the training dataset; and C, D plots are fragments from the orthophotos used to build the testing dataset used for external validation or independent testing.A set of RGB images with ultra-high spatial resolution (≈5-10 cm/pix) were obtained for the research sites during multiple flights of the DJI Phantom 3 Pro quadcopter (with standard built-in camera) in July 2016 (plot A and C) and of the Yuneec Typhoon H hexacopter (with CGO3+ camera) in May 2016 (plot B) and August 2018 (plot D).Imagery for research plots A and C were recorded in cloudy weather conditions at 670 m (A) and 700 m (C) altitudes (120-150 m elevation above ground), plot B and D was surveyed in sunny weather at 120 m height.Default camera settings (auto white balance, ISO 100) were applied in all aerial shots.Image composites (orthophotomosaics)

Figure 2 .
Figure 2. Damage categories of Siberian fir trees used in this study (adapted from Krivets et al. [32]): (a) completely healthy tree or recently attacked by beetles, (b) tree colonized by beetles, (c) recently died tree and (d) deadwood.Top figures illustrate the vertical orthoimages corresponding to the bottom horizontal pictures.

Figure 2 .
Figure 2. Damage categories of Siberian fir trees used in this study (adapted from Krivets et al. [32]): (a) completely healthy tree or recently attacked by beetles, (b) tree colonized by beetles, (c) recently died tree and (d) deadwood.Top figures illustrate the vertical orthoimages corresponding to the bottom horizontal pictures.

Figure 3 .
Figure 3.The architecture of our CNN model of deep machine learning.

Figure 3 .
Figure 3.The architecture of our CNN model of deep machine learning.

Figure 4 .
Figure 4. Pre-processing consisted of the following steps: first, converting the three band image into one grey-scale band image (PAN); second, converting grey-scale band image into blurred image; third, converting the blurred image into a binary image based on a 100 over 256 digital value threshold; and fourth, detecting categories of trees on RGB images.The 48 candidate patches identified in test area C and 40 candidate patches identified in test area D are labelled with red contour in the right panel.

Figure 4 .
Figure 4. Pre-processing consisted of the following steps: first, converting the three band image into one grey-scale band image (PAN); second, converting grey-scale band image into blurred image; third, converting the blurred image into a binary image based on a 100 over 256 digital value threshold; and fourth, detecting categories of trees on RGB images.The 48 candidate patches identified in test area C and 40 candidate patches identified in test area D are labelled with red contour in the right panel.

Figure 5 .
Figure 5. Loss and accuracy for each epoch of the CNN model training with data augmentation.

Figure 5 .
Figure 5. Loss and accuracy for each epoch of the CNN model training with data augmentation.

Figure 6 .
Figure 6.Confusion matrix of the proposed CNN model with data augmentation on the candidate regions obtained from test areas C and D.

Figure 6 .
Figure 6.Confusion matrix of the proposed CNN model with data augmentation on the candidate regions obtained from test areas C and D.
. The red boxes present the selected areas produced by our candidate selection technique (see Section 5.4).Each red box has two numbers.The first number is the tree category estimated by the CNN model; C1, C2, C3 and C4 refer to the tree damage categories 1, 2, 3 and 4 respectively.The second number indicates the probability calculated by our classification CNN model.Actually, this probability indicates the level of confidence of the model.Remote Sens. 2019, 1, x FOR PEER REVIEW 13 of 21 ResNet50, Inception V3, InceptionResNetV2, DenseNet121, DenseNet169 and DenseNet201 are powerful and computationally intensive models, our less computationally intensive architecture provides much better results in the damaged tree categories problem.On the other hand, the proposed model with data-augmentation achieves high F1_score on damage class 1, 2, 3 and 4 with 92.75%, 89.86%, 89.66% and 88.89 respectively.This can be explained by the fact that the model correctly distinguishes the colour, shape and texture of each one of the four damage classes.The results of the detection on the input test areas C and D are shown in Figures 7 and 8.The red boxes present the selected areas produced by our candidate selection technique (see Section 5.4).Each red box has two numbers.The first number is the tree category estimated by the CNN model; C1, C2, C3 and C4 refer to the tree damage categories 1, 2, 3 and 4 respectively.The second number indicates the probability calculated by our classification CNN model.Actually, this probability indicates the level of confidence of the model.

Figure 7 .
Figure 7. Results of the detection of damaged tree categories on test area C. C1, C2, C3 and C4 indicate the estimated class by our CNN classification model together with the corresponding probability.The symbols "+" and "-" indicate respectively correct and incorrect class estimation by our model.

Figure 7 .
Figure 7. Results of the detection of damaged tree categories on test area C. C1, C2, C3 and C4 indicate the estimated class by our CNN classification model together with the corresponding probability.The symbols "+" and "−" indicate respectively correct and incorrect class estimation by our model.

Figure 8 .
Figure 8. Results of the detection of damaged tree categories on test area D. C1, C2, C3 and C4 indicate the estimated class by our CNN classification model together with the corresponding probability.The symbols "+" and "-" indicate respectively correct and incorrect class estimation by our model.

Figure 8 .
Figure 8. Results of the detection of damaged tree categories on test area D. C1, C2, C3 and C4 indicate the estimated class by our CNN classification model together with the corresponding probability.The symbols "+" and "−" indicate respectively correct and incorrect class estimation by our model.

Author
Contributions: A.S. and S.T. conceived and conducted the experiments, developed the CNN model and wrote the manuscript.A.R. made a survey of the research plots and a field exploration of the territory.A.S. prepared the training and testing datasets.D.A.-S.and A.R. made changes, revised and edited the first submitted version of the manuscript.A.S., S.T., D.A.-S., A.R., Y.M. and F.H. made changes, revised and edited the final version of the manuscript.Funding: A.S. supported by the grant of the Russian Science Foundation No. 16-11-00007.S.T. was supported by the Ramón y Cajal Programme (No. RYC-2015-18136).S.T. and F.H. received funding from the Spanish Ministry of Science and Technology under the project TIN2017-89517-P.D.A.-S. received support from project ECOPOTENTIAL, which received funding from the European Union Horizon 2020 Research and Innovation Programme under grant agreement No. 641762, from the European LIFE Project ADAPTAMED LIFE14 CCA/ES/000612 and from project 80NSSC18K0446 of the NASA's Group on Earth Observations Work Programme 2016.A.R. was supported by the grant of the Russian Science Foundation No. 18-74-10048.Y. M. was supported by the grant of Russian Foundation for Basic Research No. 18-47-242002, Government of Krasnoyarsk Territory and Krasnoyarsk Regional Fund of Science.

Table 2 .
Four fir tree categories training and testing datasets.Each image has 150 × 200-pixels.

Table 3 .
The CNN network developed in this study, with different activation functions, where layers, output size and networks parameters are represented.

Table 3 .
The CNN network developed in this study, with different activation functions, where layers, output size and networks parameters are represented.

Table 4 .
The proposed candidate selection technique in the test areas cropped from test areas C and D.

Table 4 .
The proposed candidate selection technique in the test areas cropped from test areas C and D.