1. Introduction
Human activities have led to the global redistribution of species, causing a decline in biodiversity and the appearance of non-native species in natural environments [
1]. Introducing these species affects the native populations and jeopardizes the function of the ecosystem [
2]. This phenomenon is being enhanced by climate change, allowing for the arrival of invasive pests and new pathogens to crops and forests [
3,
4]. The emergence of these diseases poses a severe threat to food security and the resilience of natural landscapes [
5], especially in island ecosystems, which are particularly vulnerable to climate change [
6].
The Canarian archipelago, and its unique vegetation landscape, shaped by Canarian palm tree (
Phoenix canariensis) groves, is being affected by this worldwide phenomenon, and the groves are consequently in decline. Numerous pests and diseases are present, the most important of which are
Serenomyces phoenicis and
Phoenicococcus marlatti.
S. phoenicis is a fungus that mainly affects mature and older leaves, invading the vascular tissues and drying localized leaf areas.
P. marlatti is an insect that feeds on palm sap and can cause severe damage by reducing growth and production. In addition, although the threat of the palm weevil (
Rhynchophorus ferrugineus) has been eradicated [
7], a similar species remains: the spotted coconut weevil (
Diocalandra frumenti), a coleopteran that attacks palm trees, drying the lower leaves and boring small galleries in the rachis that can affect the vascular bundles and cause severe damage to the palm tree.
The European Union Natura 2000 protection areas designated
P. canariensis groves as a priority habitat, as the palm is an essential endemic plant species of the Canary Islands that contributes to the identity and economy of the archipelago [
8]. In this context, new tools to monitor and treat the pathologies that affect and jeopardize the populations of
P. canariensis are required. Traditionally, visual inspection through fieldwork has been the predominant method. However, this methodology requires highly qualified personnel, and its success depends on the expertise of the technicians. In addition, diagnoses can be affected by the variability of different pathologies over time, causing inconsistencies in the assessments and reducing their repeatability and reproducibility [
9,
10]. Therefore, phytosanitary monitoring needs reliable and efficient alternatives to improve disease detection. In this sense, remote sensing is a suitable alternative for disease monitoring and surveillance [
11,
12,
13,
14].
Although images obtained from satellites and aircraft allow for thorough monitoring and cover larger areas than UAV images, their spatial resolution may be too coarse to draw significant conclusions [
14]. In this sense, high-resolution UAV imagery presents clear advantages [
15,
16]. Additionally, UAVs can be used alongside satellites to combine the detailed information of high-resolution images with the large-scale coverage of satellite data [
17,
18,
19]. The potential of UAVs for conducting detailed surveys in vegetation has been demonstrated in diverse applications, such as species identification [
20], plant stress detection [
21], forest health assessments [
22], early detection of insect infestations [
23], weed management [
24], and senescence prediction [
25].
Among the sensors mounted on UAVs, multispectral cameras are predominantly used for disease identification [
14]. The spectral information in the images obtained by these cameras allows for the calculation of different vegetation indices (VIs), which are representative of the biophysical and biochemical properties of plants. These properties change in response to different stresses, for example through loss of pigmentation or variations in the structure of the leaf cells [
26]. These indices help characterize these changes [
10] and represent one of the most critical factors for identifying crop diseases [
14].
The applicability of multispectral data is not limited to calculating the spectral indices. Recent innovations in data analytics and image processing contribute to developing and applying new techniques and algorithms for studying vegetation pests and diseases, further deepening our understanding of the capabilities of multispectral data and improving the accuracy and processing times [
12]. Among them, machine learning (ML) algorithms for image classification and segmentation [
27] stand out, and they are particularly useful for phytosanitary monitoring. These techniques have been used to study the prevalence of different crop diseases [
28,
29,
30], the location of diseased or stressed specimens in different scenarios [
31,
32,
33], or the identification of damaged leaves with a high level of detail [
34,
35,
36].
The support vector machine (SVM) is an ML algorithm that successfully deals with limited training samples [
13] and has been found to outperform other algorithms for detecting vegetation diseases [
37,
38]. For example, in [
39], the authors detected Bakanae disease in rice seedlings with an accuracy of 0.88 using SVM classifiers, and in [
40], wheat leaf rust was detected with accuracies approaching 0.93. Another ML algorithm successfully used to identify crop diseases is random forest (RF), reaching accuracies close to 0.79 in [
41], while [
17] used RF classifiers to detect infected banana trees with an accuracy of 0.97. In other research [
42], UAV RGB and thermal images were used to estimate sap flow and leaf stomatal conductance in a range of forest tree species, with RF being the model that achieved the best accuracy (better than 0.9). In [
33], the authors developed a model to predict tree mortality using RF algorithms and spectral indices derived from multispectral UAV imagery.
Although SVM and RF are usually preferred [
43], other algorithms, such as artificial neural networks (ANNs), have proven adequate for plant disease and pest surveillance in complex scenarios [
44,
45]. In [
46], ML with visible and infrared (IR) reflectance data was used to classify damaged soybean seeds, with ANN classifiers obtaining the highest accuracy among the several models tested, and in [
47], diseased leaves in cotton plants were identified using ANN and RGB images.
Regarding
P. canariensis, the only examples we found in the literature used remote sensing (RS) and ML techniques to identify the red palm weevil (
Rhynchophorus ferrugineus). In [
48], the authors used thermal infrared and RGB images with a 0.5 m and 38 mm spatial resolution, respectively, taken on a platform 3 m above the canopy in order to identify the infected palm trees, achieving an accuracy of 0.75. Another example can be found in [
49], where the authors used time series of thermal images acquired by a balloon platform to detect the effects of vascular damage in the tree canopy. These studies focused on urban areas; attempts to study diseases that threaten the distribution of
P. canariensis in natural habitats, as well as studies focusing on pathologies present in the Canarian archipelago, are lacking.
In this context and considering the ecological and socio-cultural importance of palm groves in the Canary Islands, the Guarapo project
http://guarapo.lagomera.es/ (accessed on 17 July 2023) was proposed to assess the conservation status of
Phoenix canariensis. Within the framework of this project, our objective was to develop a tool for monitoring diseases through probabilistic classification modelling, in order to identify infected specimens using high-resolution multispectral UAV images and ML techniques.
2. Materials and Methods
A probabilistic classification model was developed to identify infected palm specimens based on the prevalence of affected leaves within each palm tree. First, the ability of different vegetation indices to discriminate between affected and healthy leaves was studied using a Jeffries-Matusita spectral separability analysis. Then, three steps were followed: (i) image segmentation, to detect and identify individual palm trees; (ii) pixel-level classification within each previously segmented palm tree, using ML and considering the reflectance of bands 1 to 5 of a Micasense Altum camera together with the spectral indices that showed the highest spectral separability in the previous analysis; and (iii) calculation of the relative prevalence of pixels classified as affected leaves in each individual, which was later used as the predictor variable in the probabilistic classification model (
Figure 1).
2.1. Study Area
Two study areas were chosen, namely the Vegaipala area on the island of La Gomera and Barranco El Cercado on Tenerife, both located in the Canary Islands archipelago (
Figure 2). This archipelago, consisting of eight islands, is located in the Northeast Atlantic, and it is characterized by a subtropical climate with a low seasonal temperature variability [
50]. The prevailing north-easterly trade winds shape the precipitation patterns, with the areas exposed to the north and north-east being the most humid [
51].
These study areas, located on the southern slopes of the islands, were chosen because they are representative of the health status of the palm groves in the archipelago. The main harmful agents found are S. phoenicis and P. marlatti, which, along with the increasing drought problems characteristic of these slopes, are driving a general decline of the palm groves in the archipelago.
2.1.1. Vegaipala
The selected study area covers approximately 0.84 ha. Its geographical coordinates are at 28.093°N and 17.201°W, with an approximate altitude of 800 m above sea level. This palm grove is near the hamlet of Vegaipala, in the San Sebastián de La Gomera municipality, and it is located on a hillside with terraces of abandoned crops.
According to the Köppen classification, the climate in Vegaipala is Csa (temperate with dry and warm summers). The mean monthly temperature ranges from 12.4 °C in January to 21.5 °C in August. The accumulated annual rainfall is 394.7 mm
https://atlasclimatico.sitcan.es/ (accessed on 17 July 2023).
Other species of vegetation that are present are Micromeria gomerensis, Cistus monspeliensis, Agave americana, and Opuntia maxima Mill. M. gomerensis is a protected endemic plant of the island of La Gomera, belonging to the Lamiaceae family, which grows in mountain areas and ravines. C. monspeliensis is a perennial plant that grows in arid and rocky areas of the island, and it is also common in the Mediterranean, with ecological and cultural value in the archipelago. A. americana is a perennial succulent plant native to America, highly resistant to arid conditions, and Opuntia maxima Mill. is a species of cactus belonging to the Cactaceae family, native to America, which may pose a threat to other native species due to its colonizing potential.
2.1.2. Barranco El Cercado
The selected study area covers approximately 6.12 ha, with geographical coordinates centered at 28.530°N and 16.207°W and an approximate altitude of 200 m above sea level. This ravine is located within the boundaries of the Anaga Natural Park, in the Santa Cruz de Tenerife municipality. It has become a tourist attraction with anthropic influence due to the proximity of urban areas.
Its climate is also classified as Csa (temperate with dry and warm summers), with a mean monthly temperature ranging from 17.2 °C in January to 24.4 °C in August. The accumulated annual rainfall is 385.4 mm
https://atlasclimatico.sitcan.es/ (accessed on 17 July 2023).
The area is also home to Periploco laevigatae, an evergreen shrub with a high resistance to drought and exposure, and Juniperus canariensis, another type of drought-resistant shrub capable of growing in poor soils. Both species are endemic to the Canary Islands, with the latter holding a protected status.
2.2. Data Collection
Various flight missions were conducted at 60 m above ground level for each of the study areas using a Micasense Altum multispectral camera (MicaSense, Inc., Washington, DC, USA) mounted on a DJI Matrice 200 v2 (Da-Jiang Innovations (DJI), Shenzhen, Guangdong, China). The flight speed was set to 2 m/s with a front and lateral overlap of 85%. At Vegaipala, the wind conditions on the day of the flight (14 September 2022) were optimal, with wind speeds close to 2.5 m/s. For the Barranco El Cercado, the prevailing wind during the planned month for measurements (June 2022) was northbound moderate, with gusts exceeding 50 km/h. These gusts constrained image capture: up to four trips to the study area were needed between 1 and 12 June 2022 before the flights could be carried out safely on 12 June.
The Altum camera captures five radiance bands in the visible and near-infrared regions (i.e., blue, green, red, red edge, and near infrared), centered at wavelengths of 475.0 nm, 560.0 nm, 668.0 nm, 717.0 nm, and 842.0 nm, respectively (
Figure 3). For radiometric calibration, reference images were taken before and after each flight by pointing the camera to a calibrated reflectance panel (CRP). An additional correction step was applied with the Downwelling Light Sensor (DLS 2) (MicaSense Inc., Washington, DC, USA), an advanced light sensor that adjusts for lighting changes and provides GPS data to the camera.
Image processing resulted in developing two orthomosaics for both study areas, using Pix4dFields
® software (
Figure 2). Image processing included (i) georeferencing, (ii) rig relative correction, and (iii) radiometric correction. The orthomosaics had a spatial resolution of 3.94 cm/pixel.
The assessment of the health status of the P. canariensis specimens in both study areas was conducted by technicians from the Plant Health Service of the Dirección General de Agricultura of the Consejería de Agricultura, Ganadería y Pesca from the Canary Government. Information was collected from 95 palms for El Cercado and 68 for Vegaipala, and they were labelled as either healthy or infected.
To later evaluate different vegetation indices and their spectral separability, healthy leaves as well as leaves with different levels of damage were selected and cut. Images of these leaves, placed on a black background of near-zero reflectance (ρ < 0.02), were taken with the Altum camera at a height of 5 m. The images, with a spatial resolution of 2.4 mm, were calibrated using the CRP.
2.3. Spectral Indices Separability Analysis
From the many available spectral indices suited to characterizing the physiological and biochemical variations in vegetation, a subset was selected by applying a series of filters. First, indices incorporating an atmospheric correction dimension, single-band indices, and weighted indices were discarded. Then, a set of indices was defined, seeking an equitable representation of the available bands in the Micasense Altum camera and similar value ranges across the candidate indices, in order to enhance the comparability of the results of the later analysis. Following this criterion, eight spectral indices were selected for application and analysis (
Table 1): BNDVI (blue normalized difference vegetation index) [
52], GNDVI (green normalized difference vegetation index) [
53], NDRE (normalized difference red edge index) [
54], NDVI (normalized difference vegetation index) [
55], SIPI 2 (structure insensitive pigment index 2) [
56], OSAVI (optimized soil-adjusted vegetation index) [
57], NDYI (normalized difference yellowness index) [
58], and SIPI (structure insensitive pigment index) [
59].
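As an illustration of how such indices are derived from the five Altum bands, the following minimal sketch computes a subset of them for a single pixel. The formulas are the commonly published forms and are an assumption here; the definitions in Table 1 remain authoritative. SIPI 2 is omitted because its definition varies between sources, and the function names and reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else None


def vegetation_indices(blue, green, red, red_edge, nir):
    """Compute a subset of the selected indices from reflectances (0-1)."""
    return {
        "NDVI": normalized_difference(nir, red),
        "GNDVI": normalized_difference(nir, green),
        "BNDVI": normalized_difference(nir, blue),
        "NDRE": normalized_difference(nir, red_edge),
        "NDYI": normalized_difference(green, blue),
        # OSAVI adds a fixed soil-adjustment term (0.16) to the denominator.
        "OSAVI": (nir - red) / (nir + red + 0.16),
        # SIPI relates the NIR-blue difference to the NIR-red difference.
        "SIPI": (nir - blue) / (nir - red) if nir != red else None,
    }


# Hypothetical reflectances for a healthy-looking leaf pixel.
example = vegetation_indices(blue=0.04, green=0.08, red=0.05,
                             red_edge=0.25, nir=0.45)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Applied to a whole orthomosaic, the same expressions would simply be evaluated band-wise on the reflectance rasters.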
The indices were calculated for different regions defined by groups of leaflets representing three types of leaves: (i) healthy, with green shades; (ii) affected, with yellow-green shades; and (iii) dry, with whitish shades. These regions were identified and defined based on visual interpretation. For each index, the mean reflectance and variance were calculated. From these values, a spectral separability analysis was performed between the different selected regions using the Jeffries-Matusita (
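J − M) distance, defined by Equation (1) [60]:

$$J{-}M = 2\left(1 - e^{-B_{1,2}}\right) \qquad (1)$$

which represents the distance between two probability density functions or statistical distributions (in our case, two types of leaves). $B_{1,2}$ is the Bhattacharyya distance (Equation (2)):

$$B_{1,2} = \frac{1}{8}\left(\mu_1 - \mu_2\right)^{T}\left(\frac{\Sigma_1 + \Sigma_2}{2}\right)^{-1}\left(\mu_1 - \mu_2\right) + \frac{1}{2}\ln\left(\frac{\left|\frac{\Sigma_1 + \Sigma_2}{2}\right|}{\sqrt{\left|\Sigma_1\right|\left|\Sigma_2\right|}}\right) \qquad (2)$$

where $\mu_1$ and $\mu_2$ represent the mean vectors of two distributions, and $\Sigma_1$ and $\Sigma_2$ are the covariance matrices.
J − M exhibits asymptotic behavior at 2.0, implying maximum spectral separability when this value is reached between two classes [60].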
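Because a single mean and variance are computed per index and leaf type, the separability analysis can be sketched in its univariate form, where the mean vectors reduce to scalars and the covariance matrices to variances. The sketch below assumes the standard Gaussian form of the Bhattacharyya distance; the function names and the example statistics are hypothetical and are not values from this study.

```python
import math


def bhattacharyya_1d(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two univariate normal distributions."""
    pooled = (var1 + var2) / 2.0
    mean_term = (mean1 - mean2) ** 2 / (8.0 * pooled)
    var_term = 0.5 * math.log(pooled / math.sqrt(var1 * var2))
    return mean_term + var_term


def jeffries_matusita(mean1, var1, mean2, var2):
    """J-M distance: 0 for identical classes, approaching 2 when separable."""
    b = bhattacharyya_1d(mean1, var1, mean2, var2)
    return 2.0 * (1.0 - math.exp(-b))


# Hypothetical NDVI statistics (mean, variance) for two leaf types.
healthy = (0.82, 0.0009)
dry = (0.35, 0.0025)
print(round(jeffries_matusita(*healthy, *dry), 3))
```

Values close to 2.0 for a given index indicate that the two leaf types barely overlap in that index, which is the basis for ranking the candidate indices.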
2.4. Image Processing
2.4.1. Palm Tree Segmentation
All individual palm trees were segmented using a region-growing method [
61]. The segmentation consisted of the selection of seed points based on the published
P. canariensis distribution map
https://www.idecanarias.es/listado_servicios/mapa-palmeras-canarias (accessed on 17 July 2023), a similarity threshold of 10–15% difference in spectral values, and an eight-connection scheme for the neighboring pixels. The stopping criterion was set to a maximum region size of 10% of the total image or a change in the similarity between neighboring pixels of less than 1%. Post-processing with a 3 × 3 median filter was carried out. Among all of the segmented palm trees, only those specimens that had previously been labelled by the technicians (both healthy and infected) were selected for the subsequent analyses.
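The core of a seeded region-growing step can be sketched as follows. This is a simplified illustration, not the implementation used in the study: it works on a single-band grid, measures similarity relative to the seed value (an assumption), and covers only the similarity threshold, the eight-connection scheme, and the maximum region size. The second stopping criterion and the median filtering are omitted, and all names are hypothetical.

```python
from collections import deque


def region_grow(image, seed, threshold=0.10, max_fraction=0.10):
    """Grow a region from `seed` (row, col) over a 2D grid of values.

    A neighbour joins when its relative difference from the seed value is
    within `threshold`; growth stops once the region reaches `max_fraction`
    of the image. Neighbours are visited with eight-connectivity.
    """
    rows, cols = len(image), len(image[0])
    max_size = max(1, int(max_fraction * rows * cols))
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    while queue and len(region) < max_size:
        r, c = queue.popleft()
        for dr, dc in offsets:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if (nr, nc) in region:
                continue
            if abs(image[nr][nc] - seed_value) <= threshold * abs(seed_value):
                region.add((nr, nc))
                queue.append((nr, nc))
                if len(region) >= max_size:
                    break
    return region


# Hypothetical 5 x 5 scene: a bright 2 x 2 crown on a darker background.
scene = [[0.2] * 5 for _ in range(5)]
for r, c in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    scene[r][c] = 0.8
crown = region_grow(scene, seed=(0, 0), threshold=0.10, max_fraction=0.5)
print(sorted(crown))
```

In this toy scene the region stops at the crown boundary because the background differs from the seed value by far more than 10%.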
2.4.2. Pixel Classification
A pixel-based classification was performed using SVM (support vector machine), ANN (artificial neural network), and RF (random forest). These algorithms were chosen because they are mature, well-established classification methods [
62] and their wide application in vegetation monitoring [
12,
27].
Pixels belonging to the palm trees identified in the previous segmentation process were classified, and four thematic classes were defined: (i) affected leaves, (ii) healthy leaves, (iii) shadow, and (iv) dates. To select representative pixels for the affected leaves class, both pixels showing yellowish pigmentation and those with whitish colorations (dry leaves) were considered. The number of pixels selected for each thematic class was chosen to be proportional to the approximate class prevalence in the images [
63].
The classification process involved two steps: (i) application of the three ML classifiers so as to choose the best algorithm and (ii) testing.
First, all of the selected pixels were randomly divided, reserving 80% of them to implement and compare the different classifiers (this data group will later be referred to as the training/validation set). The remaining 20% was used to test the classifier selected in the previous step (later called the testing set). For the Barranco El Cercado study area, 12,565 pixels were obtained. The training/validation set comprised 10,052 selected pixels, while the testing set comprised 2513. For Vegaipala, 8280 selected pixels were used, with 6624 for the training/validation set and 1656 for the testing set (
Table 2).
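The set sizes reported above follow directly from the 80/20 split. A minimal sketch of a random split that reproduces them is given below; the function name and the fixed random seed are assumptions for illustration, and the actual pixels assigned to each set in the study are not known.

```python
import random


def train_test_split_indices(n_samples, seed=0):
    """Shuffle sample indices and reserve one fifth (20%) for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = n_samples * 4 // 5  # integer arithmetic avoids float rounding
    return indices[:n_train], indices[n_train:]


sizes = {}
for area, n_pixels in {"Barranco El Cercado": 12565, "Vegaipala": 8280}.items():
    train_val, test = train_test_split_indices(n_pixels)
    sizes[area] = (len(train_val), len(test))
    print(area, len(train_val), len(test))
```

This yields 10,052 and 2513 pixels for Barranco El Cercado and 6624 and 1656 pixels for Vegaipala, matching the figures in the text.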
A five-fold cross-validation technique with 10 replicates was used to implement the classifiers, with the training/validation set randomly divided into 80% training and 20% validation in each fold. The robustness of the classifiers was assessed and compared by analyzing the confusion matrices using the following metrics: (i) omission error, (ii) commission error, (iii) accuracy [
64], (iv) precision, (v) recall, and (vi) F1-score [
65,
66] (
Table 3). The mean values for these metrics for all classes were used as criteria to identify the best algorithm.
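With five folds, each fold holds out one fifth of the data, which is what gives the 80% training and 20% validation division; repeating the procedure 10 times produces 50 train/validation pairs per classifier. The sketch below generates such index splits with the standard library only. It is not stratified by class, which is an assumption, since the text does not state whether stratification was used; the function name is hypothetical.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield (train, validation) index lists for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [indices[i::k] for i in range(k)]
        for i in range(k):
            validation = folds[i]
            train = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
            yield train, validation


# Training/validation set size for Barranco El Cercado, as given in the text.
splits = list(repeated_kfold(10052))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each classifier would be fitted on every training subset and scored on the matching validation subset, and the metrics averaged over the 50 runs.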
Finally, the testing set was used to validate the selected classifier, considering the same metrics (
Table 3).
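The metrics listed above can all be derived from a confusion matrix. The following sketch assumes the usual definitions, with omission error as the complement of recall and commission error as the complement of precision; the formulas in Table 3 remain authoritative. The function name and the example matrix are hypothetical.

```python
def class_metrics(confusion, labels):
    """Per-class metrics from a square confusion matrix.

    confusion[i][j] counts samples of reference class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    report = {"accuracy": correct / total}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        reference_total = sum(confusion[i])
        predicted_total = sum(row[i] for row in confusion)
        recall = tp / reference_total if reference_total else 0.0
        precision = tp / predicted_total if predicted_total else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "omission_error": 1.0 - recall,
            "commission_error": 1.0 - precision,
        }
    return report


# Hypothetical two-class matrix, for illustration only.
report = class_metrics([[90, 10], [5, 95]], ["affected", "healthy"])
print(report["accuracy"], report["affected"])
```

Averaging the per-class values gives the mean metrics used to compare the classifiers.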
2.5. Palm Disease Probabilistic Classification Modelling
Once the representative pixels of the healthy and affected leaves had been identified for each specimen, the prevalence of the affected leaves was calculated by dividing the number of pixels classified as affected by the total number of representative leaf pixels (excluding dates and shadows).
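The prevalence calculation can be sketched as follows, assuming the pixel-level classification of one palm tree is available as a list of class labels. The label strings, the function name, and the example counts are hypothetical.

```python
def affected_prevalence(pixel_labels):
    """Fraction of leaf pixels classified as affected within one palm tree."""
    affected = sum(1 for label in pixel_labels if label == "affected")
    healthy = sum(1 for label in pixel_labels if label == "healthy")
    leaf_total = affected + healthy  # shadow and date pixels are ignored
    return affected / leaf_total if leaf_total else None


# Hypothetical palm: 30 affected, 70 healthy, 50 shadow, and 10 date pixels.
palm = ["affected"] * 30 + ["healthy"] * 70 + ["shadow"] * 50 + ["dates"] * 10
print(affected_prevalence(palm))
```

Returning `None` for a crown with no leaf pixels is a design choice of this sketch, so that such cases are flagged rather than silently treated as healthy.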
Then, a probabilistic classification model was built, with the palm health status as the target variable (with values of 1 for infected specimens and 0 for healthy ones) and the prevalence of affected leaves as the predictor variable. The same ML algorithms used for the previous pixel-level classification were tested (SVM, ANN, and RF), and the output of the model was a probability map of any given palm tree being infected (that is, belonging to the infected class).
The 95 labelled palm trees from El Cercado were used to train and validate the model using a five-fold cross-validation technique and ten replicates, with 80% of the palm trees randomly selected for training (76) and 20% for validation (19). The robustness of the algorithms was compared using the metrics in
Table 3. The algorithm showing the best validation metrics was used to build the model. The result was tested with the 68 labelled palms in Vegaipala. Building the model with data from Barranco El Cercado and testing it with data from Vegaipala allowed us to use fully independent test data. In addition, the larger number of palms whose health status had previously been assessed in El Cercado provided a more robust database for training and validation.
To train, validate, and test the model, we used a probability threshold of 0.6: any palm tree surpassing that threshold was assigned to the class “infected”, and those presenting values below 0.6 were classified as “healthy”.
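Applying the threshold to the model output amounts to a simple decision rule, sketched below. The text does not specify how a probability of exactly 0.6 is handled, so treating it as healthy is an assumption; the function name and the example probabilities are hypothetical.

```python
INFECTION_THRESHOLD = 0.6  # probability cut-off stated in the text


def classify_palm(probability_infected, threshold=INFECTION_THRESHOLD):
    """Map a predicted infection probability to a class label."""
    # Assumption: a probability exactly equal to the threshold is "healthy".
    return "infected" if probability_infected > threshold else "healthy"


# Hypothetical probabilities for four palms, for illustration only.
probabilities = [0.12, 0.58, 0.61, 0.93]
labels = [classify_palm(p) for p in probabilities]
print(labels)
```

Choosing a cut-off above 0.5 makes the rule more conservative about declaring a palm infected, which tends to lower commission errors for the infected class at the cost of missing some borderline cases.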
4. Discussion
Using high-resolution (3.94 cm) UAV multispectral imagery and ML algorithms, a probabilistic classification model was built to detect potential infections in
Phoenix canariensis. This spatial resolution was achieved with drone flights at an altitude of 60 m, an optimal flight altitude that has been shown to enhance the extraction of biophysical parameters of vegetation [
67]. This approach represents the first attempt to identify and detect possible infections on a palm tree scale in the Canary Islands based on spectral response.
There is no clear consensus on which VI is the most appropriate for disease detection, as it may depend on the species studied, their conditions, and the intrinsic characteristics of the data [
31]. We proposed eight spectral indices commonly used in the literature [
10,
12]. Based on a
J −
M analysis, NDVI, SIPI, and SIPI2 were selected for their ability to differentiate between healthy and dry leaves. Other studies also found a good synergy between NDVI and SIPI to identify scab infections in wheat using hyperspectral data and SVM algorithms [
68]. In [
69], the authors demonstrated the effectiveness of combining these two VIs to study how aphid infestation affects the phenological stages of mustard plants. However, to the best of our knowledge, there are no examples of the specific combination of NDVI, SIPI, and SIPI2 being used to feed machine learning algorithms for identifying pests in palm tree groves.
Because of the nature of our in situ data, namely labelling infected and healthy individuals, it was our goal to characterize the prevalence of unpigmented and dry leaves at an individual scale, to later use this information as the predictor variable in our probabilistic classification model. To assess this prevalence at the individual scale, a prior palm tree segmentation step was needed.
Procedures for palm tree detection [
66] and individual specimen segmentation [
70] based on deep learning techniques are gaining visibility in literature, although these are data demanding [
71,
72]. Because of the limited number of available specimens captured in our dataset, we considered a region-growing algorithm for this step. This methodology allowed for individual palm tree segmentation, although the outcome needed simple manual corrections by visual inspection in some areas. The algorithm struggled to discriminate between different but overlapping palm trees and between palm trees and other kinds of vegetation. This was probably due to the inherent heterogeneity in the spatial distribution of palm trees in their natural habitat and the presence of other types of vegetation with a similar spectral response. This type of image processing is usually performed on crops [
13,
73,
74,
75], whose homogeneous and planned spatial distribution greatly helps segmentation algorithms. However, the complexity increases in natural habitats such as
P. canariensis groves in the Canary Islands. Other studies in similar scenarios have had the same problems, to the point of manually segmenting the tree canopies [
32,
33].
The combination of NDVI, SIPI, and SIPI2 with the SVM classifier outperformed the other algorithms in both study areas, with accuracies of 0.97 and 0.98 for Vegaipala and El Cercado, respectively. These results agree with the findings of [
76], where a classification with two classes (trees and background) was performed using NDVI and SIPI among other VIs, and SVM was the best classifier, with an overall accuracy of 0.95. While the authors of other studies have relied on the exclusive use of VIs or even performed dimension-reducing principal component analysis (PCA) [
17], we decided to add the selected VIs to the five bands of the Altum Micasense camera. When the VIs and the spectral bands were used together, the accuracy and precision of the classifiers increased. Similar results were obtained in [
77], where the authors found that combining spectral bands and VIs improved the classification performance. However, this improvement was only found when an appropriate subset of these indices was selected, to the extent that the overall accuracy decreased when more than four VIs were used, probably because of redundancies and collinearity in the data. The strongest classifier in that study was also SVM, with an overall accuracy of 0.96. This potential collinearity in the data may explain why SVM performed better than the other tested algorithms, notably RF, which is known to outperform other algorithms in several studies but also to be particularly sensitive to non-independence in the training data [
78]. Selecting five-fold cross-validation allowed for a trade-off between computational cost and accuracy. While small values of k may hinder the robustness of the classification [
79], optimal values of this parameter may be found for different scenarios [
80]. In this particular study, k = 5 was the threshold above which the computational cost of the analyses rose dramatically without significantly improving accuracy. A data split of 80:20 was selected because it is common practice in the literature [
81]. In addition, the relatively large number of available pixels to train and validate the classifications allowed us to maintain a substantial number of validating pixels, even though a relatively low percentage (20%) was defined for the validation subset.
Because of the relatively low spectral separability between affected (with different levels of depigmentation) and dry leaves, we decided to combine these two classes to carry out the image classification. Other studies also found that combining different levels of damage under one class resulted in a more robust classification. For example, in [
32], an initial classification was carried out with four classes, namely (i) asymptomatic, (ii) less than 50% defoliated, (iii) more than 50% defoliated, and (iv) dead trees; the authors ultimately found that the aggregation of classes depicting affected leaves improved the accuracy of the classification from 0.67 to 0.91.
The three ML algorithms tested to build the probabilistic classification model have been previously proposed for similar purposes. SVM is an algorithm that successfully deals with limited training data, outperforming other algorithms in disease detection [
13] and it is widely used for this type of study [
43]. On the other hand, ANN stands out for its usefulness in complex scenarios [
44], such as ours. However, in our study, RF was the algorithm of choice for constructing the probabilistic classification model, with a mean accuracy of 0.87 and a precision of 0.85. These findings agree with other studies that employed similar methodologies in natural habitats [
33]. In that study, the authors identified RF as the algorithm with the best accuracies, with values close to 0.84 for predicting tree mortality.
When identifying infected individuals, the errors of commission were significantly lower than for healthy individuals, and the precision and F1-score were higher. The selection of a threshold value of 0.6 likely influenced this result. However, this threshold allowed us to find a trade-off between identifying healthy and diseased specimens. In this way, we prioritized obtaining greater accuracy for identifying infected palm trees, while seeking to reduce errors of commission, considering it more important to state with certainty that the palms identified as infected were indeed infected.
5. Conclusions
The probabilistic classification model developed, based on the machine learning RF algorithm, is an efficient tool for identifying infected palm tree specimens from multispectral information derived from UAV onboard sensors. This tool showed performance similar, and in some cases superior, to that of more complex and data-demanding techniques.
The model resulted in the identification of 26 healthy palms for the study area in Vegaipala, while 42 were found to be infected, representing a prevalence of approximately 62%.
Collecting new images and in situ data will make it possible both to further validate the proposed model and to construct more complex models based on deep learning (DL) architectures, such as Mask R-CNN. In addition, studying the palm groves at different times of the year will broaden our knowledge of the seasonal variations of P. canariensis.
The algorithms developed for palm tree segmentation, pixel classification within segmented palm trees, and probabilistic classification could be applied to new scenarios through a transfer learning procedure. This could enhance future phytosanitary treatment without the need for further in situ labelled data or user-defined selection of pixels for classification, potentially reducing costs significantly.