Development of Spectral Disease Indices for ‘Flavescence Dorée’ Grapevine Disease Identification

Spectral measurements are used in many precision agriculture applications because they can monitor the health state of vegetation. Spectral vegetation indices are one of the main techniques currently used in remote sensing, since they are related to biophysical and biochemical crop variables. Several studies have also found them potentially useful for detecting or differentiating crop diseases. Flavescence Dorée (FD) is an infectious, incurable grapevine disease that can cause severe yield losses and, hence, compromise the stability of vineyards. The aim of this study was to develop specific spectral disease indices (SDIs) for detecting FD in grapevines. Spectral signatures of healthy and diseased grapevine leaves were measured with a non-imaging spectroradiometer at two infection severity levels. The most discriminating wavelengths were selected with a genetic algorithm (GA) feature selection tool, and the SDIs were designed by exhaustively testing all possible combinations of the chosen wavelengths. The best weighted combination of a single wavelength and a normalized difference was chosen to create each index. The SDIs were tested for their ability to differentiate healthy from diseased vine leaves and compared with a set of common Spectral Vegetation Indices (SVIs). The results showed that using vegetation indices was, in general, better than using the complete spectral data, and that SDIs specifically designed for FD performed better than traditional SVIs in most cases. The classification precision was higher than 90%. This study indicates that SDIs have the potential to improve disease detection, identification and monitoring in precision agriculture applications.


Introduction
Plant pathogens pose a major threat to crops and reduce yields worldwide [1]. Symptoms caused by two different infections can be quite similar, while lesions resulting from the same infection may look different depending on the crop variety and the surrounding conditions. Identifying crop diseases from symptoms alone is therefore a complicated and subjective task that is often insufficient, and laboratory tests are usually required to confirm the visual diagnosis.
Since 2013, about half of the French vineyard area (400,000 ha) has been placed in compulsory control zones against FD and its insect vector [2]. FD has also spread to other southern European countries (Italy, Portugal, Serbia and Switzerland), where it has caused serious yield losses [2,3], and it has been declared a quarantine organism by the European Union. FD can degrade the quality of European grapes. When FD is identified, large amounts of pesticides are applied to prevent the infection from spreading, which raises further problems such as chemical pollution and soil contamination. To apply pesticides efficiently, it is crucial to detect and map the early symptoms of the disease.

FD Grapevine Disease and Identification Tests
Distinguishing between grapevine diseases is a complicated task [21]. Various diseases can cause almost identical symptoms, and the same pathogen may produce different symptoms depending on the grapevine variety. Furthermore, the observed marks can result from several infections affecting the plant at the same time. Factors such as bad weather, nutrient deficiencies, pollution and pesticides can produce symptoms indistinguishable from those of diseases, and the time of infection and the overall environment can also affect how symptoms appear on leaves. FD, present in France since the mid-twentieth century, is transmitted to the vine by the leafhopper Scaphoideus titanus and is spreading steadily across the country [22].
To conclude that FD is present, three symptoms must appear simultaneously on the same branch (some are shown in Figure 1): a change in leaf coloration, the absence of lignification of the new shoots, and the death of the inflorescences and berries. The biology and ecology of FD are reviewed in detail in [24]. In this study, we assess whether FD can be detected from foliar symptoms alone using spectral measurements. FD is difficult to detect because its characteristic symptoms usually appear at least one year after inoculation, and not necessarily every year or on all branches. In addition, grapevine varieties differ in their sensitivity to FD, so symptoms are not expressed in the same way. In particular, leaf discoloration varies with the variety: yellow for white-berried grapevines and red for red-berried grapevines. A further complication is that FD symptoms resemble those of other grapevine yellows such as 'Bois Noir' (BN); however, recent molecular methods based on the Polymerase Chain Reaction (PCR) can detect FD and differentiate it from BN.
When FD is diagnosed in the field, the approaches currently used to control the overall risk are regularly uprooting infected vines and applying pesticides to limit the leafhopper population.
Figure 1. Some symptoms of FD on leaves: (a) red discoloration on a red grapevine variety; (b) yellow discoloration on a white grapevine variety. Leaf rolling can also be observed.

Sampling Set-Up
In 2016, spectral signatures were acquired in the Provence-Alpes-Côte d'Azur (PACA) region of France (Figure 2). Two acquisition campaigns were conducted in the region, the first in August and the second in September. Four grapevine varieties were tested: two red-berried (Marselan, Grenache) and two white-berried (Vermentino, Chardonnay). Red-berried fields were measured first, in the morning (10:00 to 12:00), and white-berried fields next, in the afternoon (14:00 to 16:00).
Measurements were performed on 2-4 leaves per grapevine, with 2-4 measurements per leaf. Four diseased and four healthy grapevines were considered for each variety. In total, there were 213 diseased and 201 healthy samples (63 diseased and 64 healthy Grenache; 63 diseased and 64 healthy Marselan; 47 diseased and 40 healthy Vermentino; 42 diseased and 34 healthy Chardonnay). Healthy leaves of different ages were selected, whereas infected leaves were chosen to cover a complete and representative range of FD symptoms. To allow follow-up between campaigns, the grapevines were located by GPS and the leaves were labeled. We tried to measure the same leaves in both campaigns, but some leaves measured in August were no longer present in September, having either fallen naturally or been cut by the winegrower. When a leaf could not be found, another leaf on the same branch was measured instead.
An inspector from the Regional Federation of Defense against Pests of PACA was present to confirm the presence of the disease and its severity stage. In addition, laboratory tests (PCR analysis) were carried out after the acquisition campaigns to confirm the inspector's diagnosis.

Reflectance Measurements
Spectral reflectance is the ratio of reflected to incident radiant flux measured from a surface over a defined range of wavelengths. In this study, leaf reflectance was measured with a portable spectroradiometer (FieldSpec 3, Analytical Spectral Devices, Boulder, CO, USA). Each leaf was measured with a plant probe that has a low-power light source designed for sensitive vegetation surfaces and leaves no observable damage. The probe also reduces the effect of ambient light scattering, which improves measurement accuracy. Each spectrum was recorded every 1 nm from 350 nm to 2500 nm; this sampling results from an interpolation performed by the software, since the true spectral resolution of the instrument is about 3 nm at 700 nm and about 10 nm at 1400 nm and longer wavelengths. Before the acquisitions, the spectroradiometer was warmed up for at least 20 min and then calibrated to absolute reflectance using a Teflon calibration disk. The number of samples averaged was set to 30 for each spectrum and to 100 for the dark current and the white reference. Completing the measurements in the field took approximately four hours.
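The calibration to absolute reflectance can be illustrated with the usual dark-corrected ratio R = (S - D) / (W - D) x rho_ref, where S, D and W are the averaged target, dark-current and white-reference signals and rho_ref is the reflectance of the reference disk. This is a minimal sketch under that assumption: the instrument software performs the correction internally, and the function names and the `panel_reflectance` parameter are hypothetical. Only the averaging counts (30 target scans, 100 dark and white scans) come from the text.

```python
import numpy as np

def average_scans(scans):
    """Average repeated scans (one per row) into a single spectrum."""
    return np.asarray(scans, dtype=float).mean(axis=0)

def to_reflectance(target_scans, dark_scans, white_scans, panel_reflectance=1.0):
    """Dark-corrected reflectance relative to a white reference.

    R = (S - D) / (W - D) * panel_reflectance, where S, D and W are the
    averaged target, dark-current and white-reference signals.
    panel_reflectance (hypothetical) is the reference disk's own reflectance;
    1.0 assumes an ideal white reference.
    """
    s = average_scans(target_scans)
    d = average_scans(dark_scans)
    w = average_scans(white_scans)
    return (s - d) / (w - d) * panel_reflectance

# Synthetic example: 4 bands, 30 noisy target scans, 100 dark and white scans.
rng = np.random.default_rng(0)
dark = np.full((100, 4), 100.0)
white = np.full((100, 4), 1100.0)
target = np.tile([600.0, 200.0, 900.0, 400.0], (30, 1)) + rng.normal(0.0, 1.0, (30, 4))
print(np.round(to_reflectance(target, dark, white), 2))
```

Averaging many dark and white scans mainly reduces noise in the denominator, which would otherwise propagate into every reflectance value.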
Spectral measurements were taken from the same locations on the leaves in both acquisition campaigns (Figure 3), with one measurement per location. The locations were chosen because the disease tends to develop first between the veins. Between 2 and 4 measurements were taken per leaf, depending on the leaf surface relative to the probe diameter: on a small leaf, only 2 reflectance spectra were acquired from 2 locations, whereas on a sufficiently wide leaf, 4 spectra were acquired from all 4 locations. The same procedure was applied to healthy and infected leaves.

Spectral Data Analysis for Disease Detection
Each spectral measurement acquired in this study is the reflectance in a large number of contiguous narrow bands (350-2500 nm). Analyzing such high-dimensional data is complex and time-consuming, so reducing the dimensionality of the data by selecting optimal wavebands is crucial. Techniques such as SDIs, which use only a few spectral bands, are therefore useful for hyperspectral data analysis.

Development of Spectral Disease Indices (SDIs) Based on GA Feature Selection
This section details the procedure, from the acquisition of the spectral signatures to the creation of disease-specific (or disease-dependent) indices. Figure 4 shows the approach adopted to compute and evaluate the SDIs. The field acquisitions produced a set of healthy and diseased observations ranging from 350 to 2500 nm, i.e., 2151 features (wavelengths). Because the spectra were noisy at the extremities, only values between 400 nm and 2100 nm were retained, giving a total of 1901 wavelengths. The spectral resolution was then reduced by a factor of 3, because adjacent wavelengths are highly correlated. As a result, 633 wavelengths were used for the rest of the analysis (Figure 5).
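The cropping and resolution reduction can be sketched as below. The text does not say whether the factor-3 reduction averaged or decimated adjacent bands; this sketch assumes averaging of non-overlapping groups of 3 bands, dropping any incomplete trailing group, so that 1901 bands yield 1901 // 3 = 633 bands. The function name `crop_and_bin` is hypothetical.

```python
import numpy as np

def crop_and_bin(wavelengths, spectra, lo, hi, factor):
    """Keep bands with lo <= wavelength <= hi, then average non-overlapping
    groups of `factor` adjacent bands. Trailing bands that do not fill a
    complete group are dropped (assumption)."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))  # (n_samples, n_bands)
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    wl, sp = wavelengths[keep], spectra[:, keep]
    n_groups = wl.size // factor
    usable = n_groups * factor
    wl_binned = wl[:usable].reshape(n_groups, factor).mean(axis=1)
    sp_binned = sp[:, :usable].reshape(sp.shape[0], n_groups, factor).mean(axis=2)
    return wl_binned, sp_binned

wl = np.arange(350, 2501)                             # 2151 bands at 1 nm
X = np.random.default_rng(0).random((5, wl.size))     # 5 synthetic spectra
wl_r, X_r = crop_and_bin(wl, X, lo=400, hi=2100, factor=3)
```

Averaging, rather than keeping every third band, also smooths some of the band-to-band noise; decimation would be an equally valid reading of the text.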
The best wavelengths were then selected from this reduced set of observations with a GA feature selection tool. Genetic Algorithms (GA) are a well-established tool for solving optimization and search problems; they imitate the process of natural evolution [25]. A GA transforms one population into a new one by applying genetic operators. The five main steps of a GA [26] are: (1) chromosome encoding, (2) fitness evaluation, (3) selection mechanisms, (4) genetic operators and (5) stopping criteria (Figure 6).

GA borrows its vocabulary from genetics: chromosomes are the bit strings (the individuals that form the population), and each gene corresponds to a feature [27]. In this study, a binary encoding is used: a gene value of "1" indicates that the feature at that index is selected, and a value of "0" that it is not. The initial population is a matrix of random binary digits of dimension population size (300 individuals) × number of wavelengths (633 spectral features). A fitness function evaluates the discriminative capacity of each chromosome, i.e., of the feature subset it selects; in this work, the fitness is the loss of a cross-validated SVM (Support Vector Machine) classification model. Individuals are ranked by fitness value, and the elite children, those with the best fitness values, survive and are transferred to the next generation. The selection operator chooses the individuals used for crossover and mutation, so that the population improves from one generation to the next. Tournament selection was used here for its simplicity, speed and efficiency. Crossover combines two parent individuals to form children of the new generation; since the parent chromosomes are binary, an XOR operation is used [28].
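The encoding, selection and crossover steps above can be sketched with NumPy. This is a minimal illustration, not the study's implementation: the function names and the tournament size `k` are hypothetical, and the fitness values are random stand-ins for the cross-validated SVM loss. Lower fitness is treated as better because the fitness is a loss.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded only to make the sketch reproducible

def initial_population(pop_size, n_features):
    """Random binary chromosomes; gene j == 1 means wavelength j is selected."""
    return rng.integers(0, 2, size=(pop_size, n_features), dtype=np.uint8)

def tournament_select(fitness, n_parents, k=2):
    """Return n_parents indices, each the lowest-loss of k random contestants.

    The tournament size k is a hypothetical default.
    """
    fitness = np.asarray(fitness, dtype=float)
    contestants = rng.integers(0, fitness.size, size=(n_parents, k))
    winners = np.argmin(fitness[contestants], axis=1)
    return contestants[np.arange(n_parents), winners]

def xor_crossover(parent_a, parent_b):
    """Child gene = parent_a XOR parent_b, as described for binary parents."""
    return np.bitwise_xor(parent_a, parent_b)

pop = initial_population(300, 633)  # 300 individuals x 633 wavelengths
# Stand-in fitness: the study uses the cross-validated SVM loss of each
# chromosome's feature subset; random numbers are used here instead.
fitness = rng.random(pop.shape[0])
i, j = tournament_select(fitness, n_parents=2)
child = xor_crossover(pop[i], pop[j])
selected_columns = np.flatnonzero(child)  # indices of the selected wavelengths
```

Given a data matrix X of shape (n_samples, 633), the subset evaluated by the fitness function would be X[:, child.astype(bool)].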
The number of children produced by crossover is determined by the crossover fraction. Mutation is another genetic operator that perturbs chromosomes by flipping bits according to the mutation probability. Mutation maintains genetic diversity and helps prevent premature convergence. A uniform mutation is applied in this study. The number of children produced by mutation is the population size minus the number of elite children and the number of crossover children. Each new generation formed by the GA therefore contains elite children, crossover children and mutation children [29]. The new population is evaluated again, and the GA continues to evolve until a stopping condition is met.
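The composition of each generation and the uniform bit-flip mutation can be sketched as follows. The way the crossover fraction is applied is an assumption: it follows the convention documented for MATLAB's ga, where the fraction applies to the non-elite part of the population and mutation children fill the remainder. The elite count and crossover fraction below are hypothetical; the study's values are in Table 1.

```python
import numpy as np

def generation_counts(pop_size, elite_count, crossover_fraction):
    """Split one generation into elite, crossover and mutation children.

    Assumed convention: the crossover fraction applies to the non-elite part
    of the population, and mutation children fill whatever remains, so the
    three counts always sum to pop_size.
    """
    n_cross = int(round(crossover_fraction * (pop_size - elite_count)))
    n_mut = pop_size - elite_count - n_cross
    return elite_count, n_cross, n_mut

def uniform_mutation(chromosome, rate, rng):
    """Flip each gene independently with probability `rate`."""
    flips = rng.random(chromosome.shape) < rate
    return np.where(flips, 1 - chromosome, chromosome).astype(chromosome.dtype)

# Hypothetical settings, for illustration only.
print(generation_counts(pop_size=300, elite_count=15, crossover_fraction=0.8))  # (15, 228, 57)

rng = np.random.default_rng(0)
parent = rng.integers(0, 2, size=633, dtype=np.uint8)
mutant = uniform_mutation(parent, rate=0.01, rng=rng)
```

A low mutation rate keeps most of a good chromosome intact while still occasionally testing wavelengths that the rest of the population has abandoned.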
Two stopping conditions were applied in this study: a maximum number of generations and a stall generation limit. The GA terminates if the average change in the fitness values of the chromosomes over Stall Generation Limit generations is less than or equal to the function tolerance; the goal is to ensure genetic homogeneity. All GA parameters used in this study are given in Table 1. When the GA terminates, a single individual is retained at convergence. This individual contains the optimal features: it is a binary vector in which "1" means that the feature at that index is selected. Since the initial population is created randomly, the number of wavelengths selected by the GA cannot be predicted and depends on the data; in fact, the GA keeps evolving until convergence, and the number of selected features may be large. To reduce the computational cost, the wavelengths selected by the GA were averaged to obtain only 8 representative wavelengths (Figure 7). This averaging step is optional, however, and all wavelengths selected by the GA can be used in the feature combination step.
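The two stopping conditions and the extraction of the selected wavelengths from the final chromosome can be sketched as below. This is an assumption-laden illustration: the function names are hypothetical, and the sketch tracks one fitness value per generation (taken here to be the best loss of that generation), which is one possible reading of "the average change in the fitness values".

```python
import numpy as np

def should_stop(history, generation, max_generations, stall_limit, tol):
    """Check the two stopping conditions.

    history: one tracked fitness value per generation so far (assumption:
    the best loss of each generation). Stops when max_generations is
    reached, or when the mean absolute change of the tracked value over the
    last stall_limit generations is <= tol.
    """
    if generation >= max_generations:
        return True
    if len(history) <= stall_limit:
        return False  # not enough generations yet to fill the stall window
    window = np.asarray(history[-(stall_limit + 1):], dtype=float)
    return float(np.mean(np.abs(np.diff(window)))) <= tol

def selected_wavelengths(best_chromosome, wavelengths):
    """Wavelengths whose gene is 1 in the chromosome retained at convergence."""
    mask = np.asarray(best_chromosome).astype(bool)
    return np.asarray(wavelengths)[mask]

history = [0.30, 0.20, 0.15, 0.15, 0.15, 0.15]
print(should_stop(history, generation=6, max_generations=100, stall_limit=3, tol=1e-6))  # True
print(selected_wavelengths([1, 0, 1, 0], [400, 403, 406, 409]))  # [400 406]
```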
The indices to be developed aim to identify a specific plant disease; a combination of a single wavelength and a normalized wavelength difference was therefore considered suitable. A weighting factor was applied to the single wavelength, with possible values of −1, −0.5, 0.5 and 1. The best SDI was found by an exhaustive search over all combinations of an individual wavelength and a normalized wavelength difference: each combination of 3 wavelengths and a weighting factor forms an index (Equation (1)). With feature averaging, 8 × 7 × 6 wavelength triplets × 4 weighting factors = 1344 candidate SDIs were tested. Ideally, all wavelengths selected by the GA would be used directly, without averaging, and all their possible combinations evaluated. The classification ability of each index was assessed with a 10-fold cross-validated SVM model, and the configuration with the highest classification precision was retained as the best SDI.
where a, c and d are three distinct wavelengths (pairwise different) chosen from the pool of the 8 best averaged wavelengths, and b is the weighting factor.
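The exhaustive SDI search can be sketched as follows. The exact form of Equation (1) is an assumption here: the sketch reads "a weighted single wavelength combined with a normalized difference" as SDI = b·R_a + (R_c − R_d)/(R_c + R_d). The `score` callable is a placeholder for the 10-fold cross-validated SVM precision, and all function names are hypothetical.

```python
import itertools
import numpy as np

WEIGHTS = (-1.0, -0.5, 0.5, 1.0)  # weighting factors b listed in the text

def sdi(R, a, c, d, b):
    """Assumed form of Equation (1): b * R_a + (R_c - R_d) / (R_c + R_d).

    R is an (n_samples, n_bands) reflectance matrix restricted to the pool of
    candidate wavelengths; a, c, d are column indices; b is the weight.
    """
    return b * R[:, a] + (R[:, c] - R[:, d]) / (R[:, c] + R[:, d])

def candidate_sdis(n_bands):
    """Yield every (a, c, d, b) with pairwise-distinct band indices."""
    for a, c, d in itertools.permutations(range(n_bands), 3):
        for b in WEIGHTS:
            yield a, c, d, b

def best_sdi(R, y, score):
    """Exhaustive search keeping the configuration with the highest score.

    score(feature_column, y) -> float is a placeholder; in the study it is
    the 10-fold cross-validated SVM classification precision.
    """
    best, best_score = None, -np.inf
    for a, c, d, b in candidate_sdis(R.shape[1]):
        s = score(sdi(R, a, c, d, b).reshape(-1, 1), y)
        if s > best_score:
            best, best_score = (a, c, d, b), s
    return best, best_score

print(sum(1 for _ in candidate_sdis(8)))  # 1344 = 8 * 7 * 6 * 4
```

Because each candidate is a single feature, the cost of the search is dominated by the 1344 cross-validation runs, which is why averaging the GA wavelengths down to 8 keeps it tractable.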

Computation of Common Spectral Vegetation Indices (SVIs)
In the context of vegetation status monitoring, a specific disease or stress can be identified from spectral reflectance measurements. Healthy and infected plants are discriminated on the basis of some optimal wavelengths or a combination of wavelengths. The main aim of SVIs is to highlight a particular property of the vegetation; they combine the reflectance at two or more wavelengths. Many vegetation indices have been proposed in the scientific literature, and most relate the physiological status of the crop to hyperspectral data through their correlation with biochemical constituents (chlorophyll, carotenoids, water, cellulose, lignin, dry matter, etc.). Pigment-specific vegetation indices are currently an effective tool for disease discrimination. In this section, we tested the ability of vegetation indices from the literature to identify FD: the classification accuracies of the NDVI, PRI, ARI, SIPI, mCAI, PSSRa, PSSRb, PSSRc, GM1, GM2, ZTM and TCARI/OSAVI were compared with those obtained with the SDIs (Table 2).
The PRI is a function of the reflectance at 531 nm, which is related to xanthophyll. When xanthophyll activity is high, light-use efficiency is low, which may indicate stress.
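Several of the SVIs listed above can be computed from a reflectance spectrum as sketched below. The band definitions are the commonly cited ones from the literature and are an assumption: Table 2 of this study may use different bands (hyperspectral NDVI variants in particular differ), and mCAI is omitted because its definition is not given here. The nearest recorded band is used for each target wavelength.

```python
import numpy as np

def vegetation_indices(R, wl):
    """Common SVIs from reflectance R (n_samples, n_bands) sampled at
    wavelengths wl (nm); the nearest band to each target is used."""
    wl = np.asarray(wl, dtype=float)

    def r(nm):
        return R[:, int(np.argmin(np.abs(wl - nm)))]

    tcari = 3 * ((r(700) - r(670)) - 0.2 * (r(700) - r(550)) * (r(700) / r(670)))
    osavi = 1.16 * (r(800) - r(670)) / (r(800) + r(670) + 0.16)
    return {
        "NDVI": (r(800) - r(670)) / (r(800) + r(670)),
        "PRI": (r(531) - r(570)) / (r(531) + r(570)),
        "ARI": 1 / r(550) - 1 / r(700),
        "SIPI": (r(800) - r(445)) / (r(800) - r(680)),
        "PSSRa": r(800) / r(680),
        "PSSRb": r(800) / r(635),
        "PSSRc": r(800) / r(470),
        "GM1": r(750) / r(550),
        "GM2": r(750) / r(700),
        "ZTM": r(750) / r(710),
        "TCARI/OSAVI": tcari / osavi,
    }

wl = np.arange(400, 901)
R = np.full((1, wl.size), 0.4)   # synthetic flat spectrum
R[0, 800 - 400] = 0.8            # brighter NIR band at 800 nm
R[0, 670 - 400] = 0.2            # darker red band at 670 nm
svi = vegetation_indices(R, wl)
```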

Classification
There are hundreds of classifiers in the literature, and it is often difficult for researchers to choose an appropriate one for a given application. The simplest way to address this issue is to try several classifiers and select the one with the highest accuracy. In this work, we selected a single classifier, the Support Vector Machine (SVM), since it is one of the most widely used classifiers in the field and has performed well in several applications [50,51].

Support Vector Machines (SVM)
SVM is a supervised machine learning algorithm, mostly used to solve classification problems. It consists of defining a boundary (line or hyperplane) that best separates two classes [52]. The points closest to the boundary are called support vectors, and the margin is the perpendicular distance from the boundary to the support vectors. A maximal-margin classifier defines the hyperplane that separates the two classes with the largest margin. A soft-margin classifier, by contrast, allows points to lie between the margins or on the wrong side of the hyperplane; it is used when the classes are not fully separable.
In practice, SVM are implemented using kernels. When applying non-linear Kernels (polynomial or radial), non-linear boundaries are created and the accuracy improves. Due to its flexibility, the Radial Basis Function (RBF) kernel is however the most used, so we employed it also in our study. One of the most known methods for fitting SVM is the Sequential Minimal Optimization (SMO) method. The concept and the applications of SVM are discussed in detail in [53].

Data Configuration
In our study, we employed a binary classification involving only 2 classes: we considered the healthy group vs. the diseased group in total (medium infested measurements from the August acquisition campaign + high infested measurements from the September acquisition campaign). Since there are four grapevine varieties tested in this study (Marselan, Grenache, Vermentino and Chardonnay), it is possible to analyze the measurements of each variety alone, or measurements can be combined. Based on the grapevine color, we can analyze red types and white types; it is also feasible to combine all leaf measurements together (Figure 8).
There are hundreds of classifiers in the literature, and it is often difficult for researchers to choose an appropriate one for a given application. The simplest way to address this issue is to try several classifiers and select the one with the highest accuracy. In this work, we used a single classifier, the Support Vector Machine (SVM), because it is one of the most widely used classifiers in the field and has performed well in several applications [50,51].

Support Vector Machines (SVM)
SVM is a supervised machine learning algorithm, mostly used to solve classification problems. It consists of defining a boundary (a line or, more generally, a hyperplane) that best separates two classes [52]. The points closest to the boundary are called support vectors, and the margin is the perpendicular distance from the boundary to the support vectors. A maximal-margin classifier defines the separating hyperplane with the largest margin. A soft-margin classifier, in contrast, allows some points to lie inside the margin or on the wrong side of the hyperplane; it is used when the classes are not fully separable.
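The standard soft-margin formulation makes this trade-off explicit. With labels $y_i \in \{-1, +1\}$, the hyperplane is $\mathbf{w}^{\top}\mathbf{x} + b = 0$, the slack variables $\xi_i$ measure margin violations, and $C > 0$ controls how strongly they are penalized; the distance from the hyperplane to the support vectors is $1/\lVert\mathbf{w}\rVert$, and setting all $\xi_i = 0$ recovers the maximal-margin classifier. With a kernel $K$, the decision function is written in terms of the training points. This is the textbook formulation, given as a reference point rather than as the exact settings used in this study:

$$
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1,\dots,n
$$

$$
f(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{i=1}^{n} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\right),
\qquad
K_{\mathrm{RBF}}(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right)
$$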
In practice, SVMs are implemented using kernels. Non-linear kernels (polynomial or radial) produce non-linear boundaries, which can improve accuracy when the classes are not linearly separable. Because of its flexibility, the Radial Basis Function (RBF) kernel is the most widely used, and we employed it in this study. One of the best-known methods for fitting SVMs is Sequential Minimal Optimization (SMO). The concept and applications of SVM are discussed in detail in [53].
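The sketch below illustrates a soft-margin SVM with an RBF kernel in plain Python. It is not the SMO solver mentioned above: for brevity it swaps in kernelized Pegasos, a stochastic sub-gradient method for the equivalent regularized hinge-loss objective, and it omits the bias term. The class name, the parameter values (lam, gamma, epochs) and the toy two-cluster data are hypothetical; in practice, a tested library implementation would normally be used.

```python
import math
import random
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class KernelPegasosSVM:
    """Soft-margin RBF-kernel SVM trained with kernelized Pegasos
    (stochastic sub-gradient descent), used here in place of SMO.
    No bias term; labels must be -1 (healthy) or +1 (diseased)."""

    def __init__(self, lam: float = 0.01, gamma: float = 1.0,
                 epochs: int = 20, seed: int = 0) -> None:
        self.lam = lam          # regularization strength (larger = stronger)
        self.gamma = gamma      # RBF kernel width
        self.epochs = epochs
        self.seed = seed

    def _score(self, x: Sequence[float]) -> float:
        # Unscaled decision value: sum_j alpha_j * y_j * K(x_j, x).
        return sum(a * yj * rbf_kernel(xj, x, self.gamma)
                   for a, yj, xj in zip(self.alpha, self.y, self.X) if a)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "KernelPegasosSVM":
        self.X: List[List[float]] = [list(x) for x in X]
        self.y: List[int] = list(y)
        self.alpha: List[int] = [0] * len(self.X)  # margin-violation counts
        rng = random.Random(self.seed)
        t = 0
        for _ in range(self.epochs):
            # Visit every sample once per epoch in shuffled order
            # (a common variant of Pegasos' uniform sampling).
            order = list(range(len(self.X)))
            rng.shuffle(order)
            for i in order:
                t += 1
                # Hinge-loss violation at step t: increase this sample's weight.
                if self.y[i] * self._score(self.X[i]) / (self.lam * t) < 1:
                    self.alpha[i] += 1
        return self

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self._score(x) >= 0 else -1


# Toy 2-D data (hypothetical feature vectors, e.g. two index values per leaf).
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [3.0, 3.0], [3.2, 2.9], [2.9, 3.1]]
y = [-1, -1, -1, 1, 1, 1]
clf = KernelPegasosSVM(lam=0.01, gamma=1.0, epochs=20).fit(X, y)
print([clf.predict(x) for x in X])
```

Training only touches kernel evaluations, so replacing rbf_kernel with a polynomial kernel would give the polynomial variant mentioned above without other changes.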

Data Configuration
In our study, we performed a binary classification with two classes: the healthy group vs. the whole diseased group (medium-infested measurements from the August acquisition campaign plus highly infested measurements from the September acquisition campaign). Since four grapevine varieties were tested (Marselan, Grenache, Vermentino and Chardonnay), the measurements of each variety can be analyzed alone or combined. Based on berry color, the red varieties and the white varieties can be analyzed as groups, and all leaf measurements can also be combined together (Figure 8).
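The data configurations above can be expressed as a small grouping routine. In this sketch, the record layout (a variety name plus a status string) and the status values are hypothetical; only the variety names, the red/white grouping and the healthy-versus-diseased labeling come from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (variety, status), where status is "healthy",
# "aug_medium" (August, medium infestation) or "sep_high" (September, high).
Record = Tuple[str, str]

RED_VARIETIES = {"Marselan", "Grenache"}
WHITE_VARIETIES = {"Vermentino", "Chardonnay"}


def binary_label(status: str) -> int:
    """Healthy -> 0; any diseased status (August medium or September high) -> 1."""
    return 0 if status == "healthy" else 1


def data_configurations(records: List[Record]) -> Dict[str, List[Record]]:
    """Subsets analysed: each variety alone, red group, white group, all leaves."""
    configs = {v: [r for r in records if r[0] == v]
               for v in sorted(RED_VARIETIES | WHITE_VARIETIES)}
    configs["Red"] = [r for r in records if r[0] in RED_VARIETIES]
    configs["White"] = [r for r in records if r[0] in WHITE_VARIETIES]
    configs["All"] = list(records)
    return configs


records = [("Marselan", "healthy"), ("Grenache", "sep_high"),
           ("Chardonnay", "aug_medium"), ("Vermentino", "healthy")]
for name, subset in data_configurations(records).items():
    print(name, [binary_label(status) for _, status in subset])
```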

Results
Spectroscopic and imaging techniques have demonstrated good potential in detecting disease and stress in crops. Currently, researchers tend to apply spectral vegetation indices (SVIs) to identify different plant diseases.

Reflectance Spectra of Diseased Grapevine Leaves
When comparing the spectral signatures of healthy and infected leaves of red- and white-berried varieties (Figure 9), clear differences are visible, suggesting that the spectral response was affected by the infection. For the Marselan variety (red-berried), healthy spectra were higher than infected ones in the visible (VIS) region (mainly between 500-700 nm), but the opposite occurred in the NIR region (800-1300 nm) and in the IR region (>1300 nm). Thus, when the infection appears, reflectance decreases in the VIS region and increases in the NIR-IR region; the same trend was observed for Grenache (data not shown). Conversely, for the Chardonnay variety (white-berried), healthy spectra were lower than infected ones in the VIS region (mainly between 500-700 nm), and the opposite occurred in the NIR region (800-1300 nm) and in the IR region (>1300 nm). Here, infection increases reflectance in the VIS region and decreases it in the NIR-IR region; the same trend was observed for Vermentino (data not shown). These changes indicate that the spectral signature depends on the pathogen-host interaction: different grapevine varieties do not show the same spectral pattern when the same infection occurs.
Classification results are reported as the mean over a 10-fold cross-validation; in this way, every sample is used for testing exactly once, and the variability between repeated experiments under similar conditions is taken into account.
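The averaging described above can be sketched as follows, assuming a generic fit_predict callable that wraps whatever classifier is used (for example, an SVM). The helper names and the majority-class baseline are hypothetical, and the folds are plain shuffled folds rather than stratified ones; whether the study stratified its folds is not stated here.

```python
import random
from typing import Callable, List, Sequence, Tuple

# fit_predict(train_X, train_y, test_X) -> predicted labels for test_X
FitPredict = Callable[[List, List, List], List]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle range(n) and split it into k disjoint test folds (requires n >= k).
    Every sample appears in exactly one test fold. Not stratified."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    splits = []
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((train, folds[f]))
    return splits


def cv_mean_accuracy(fit_predict: FitPredict, X: Sequence, y: Sequence,
                     k: int = 10, seed: int = 0) -> float:
    """Mean test accuracy over the k folds."""
    accuracies = []
    for train, test in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


def majority_baseline(train_X: List, train_y: List, test_X: List) -> List:
    # Hypothetical stand-in classifier: always predicts the majority training label.
    majority = max(set(train_y), key=train_y.count)
    return [majority] * len(test_X)


print(cv_mean_accuracy(majority_baseline, list(range(30)), [0] * 10 + [1] * 20))
```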
Model accuracy is defined as the percentage of test-set samples that are correctly classified. Taking diseased leaves as the positive class, the False Negative Rate (FNR) is the percentage of actually diseased samples that are classified as healthy, and the False Positive Rate (FPR) is the percentage of actually healthy samples that are classified as diseased; the True Positive Rate (TPR) is the percentage of diseased samples correctly detected (TPR = 1 - FNR). Plotting TPR (ordinate) against FPR (abscissa) over a range of decision thresholds gives the ROC (Receiver Operating Characteristic) curve, and the AUC (Area Under the Curve) is the area under it. The advantage of GA-based dimension reduction is demonstrated in the following sections.
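These definitions translate directly into code. The minimal sketch below takes diseased = 1 as the positive class, computes accuracy, TPR, FNR and FPR from predicted labels, and computes the AUC by the trapezoidal rule from classifier scores (for an SVM, for example, its decision-function values). The function names are hypothetical, and both classes are assumed to be present.

```python
from typing import Dict, Sequence


def confusion_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Rates with 1 = diseased (positive) and 0 = healthy (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "TPR": tp / (tp + fn),  # diseased leaves correctly detected
        "FNR": fn / (tp + fn),  # diseased leaves classified as healthy
        "FPR": fp / (fp + tn),  # healthy leaves classified as diseased
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve (trapezoidal rule), sweeping the decision
    threshold from the highest score down; tied scores are processed together."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auc = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # trapezoid slice
        prev_tpr, prev_fpr = tpr, fpr
    return auc


print(confusion_rates([1, 1, 0, 0], [1, 0, 0, 1]))
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
```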

No Dimension Reduction, Use of Complete Spectral Data
In this section, all spectral data (400-2100 nm) are considered in the analysis; no dimension reduction method is applied. Table 3 presents the results of using the complete spectra measured in August (slightly infected leaves). The best classification accuracy was obtained for Vermentino (93.75%) and the worst for Marselan (70.97%). Grenache and Chardonnay gave the same accuracy (90.63%). The most critical metric in disease diagnosis is arguably the FNR, since a false negative means that a diseased leaf is classified as healthy; in general, the lower the FNR, the better the classifier. Here, the best FNR was also obtained for Vermentino (6.67%). When measurements were combined according to berry color, the white-berried group performed better than the red-berried group (92.19% > 87.3%). Table 4 presents the results of using the complete spectra measured in September (highly infected leaves). In general, accuracy in September was better than in August for all configurations. This is expected: by the end of the season, symptoms are well established, so the reflectance of diseased leaves is more strongly affected and is more easily discriminated from that of healthy leaves. White-berried grapevines performed better than red-berried ones, whether each variety was considered alone or the varieties were combined (Marselan-Grenache 94.79-95.06% vs. Vermentino-Chardonnay 98.18-97.73%; Red 96.61% < White 98.99%). Furthermore, the FPR was zero for the white-berried leaves. The hardest classification scenario was combining all measurements, because observations from grapevine varieties with different characteristics were pooled. Nevertheless, the SVM achieved a satisfactory accuracy (96.01%) and a good AUC (0.99).
Table 3. Results of using complete spectra in classifying different groups of spectral data acquired in the August acquisition campaign (Severity of infestation = 1).
Table 5 presents the results of using the complete spectra measured in August together with those measured in September (slightly + highly infected leaves). Accuracy was generally better than with the slightly infected leaves alone (August) but lower than with the highly infected leaves alone (September). Classification accuracy was above 92% in all cases; the best was obtained for Chardonnay (97.37%), with an FPR of zero. When measurements were combined, similar results were found for the Red, White and All configurations (around 95-96% accuracy).
Table 5. Results of using complete spectra in classifying different groups of spectral data acquired in the August and September acquisition campaigns (Severity of infestation = 1 & 2).

Dimension Reduction Using Vegetation Indices (SVIs)
In this section, the classification results obtained with the common SVIs are presented. Only the best SVIs are reported here; full results are given in Tables A1-A3. Table 6 presents the results of the best traditional SVI for the August measurements (slightly infected leaves). Classification accuracies were satisfactory (>90%) and better than those obtained with the complete spectra. The FNR was zero for Grenache leaves. The best SVIs were ARI, ZTM and TCARI/OSAVI. ARI performed best for the red varieties, whether considered individually or combined. ZTM performed well for Vermentino and Chardonnay individually, but when these were combined, TCARI/OSAVI performed better. TCARI/OSAVI was also robust when all observations were considered together (92.13%).
Table 6. Results of using the best SVIs in classifying different groups of spectral data acquired in the August acquisition campaign (Severity of infestation = 1).
Table 7 presents the results of the best traditional SVI for the September measurements (highly infected leaves). In this case, all accuracies improved with respect to August (>94%). Compared with the complete spectra, the SVIs used fewer wavebands and gave better accuracies, except when all measurements were mixed together (96.01% > 94.02%). The best results were obtained for the white-berried signatures, with no false negatives (97-98%). ARI, ZTM, GM1 and mCAI achieved the best accuracies. As in the first acquisition campaign, ARI performed best for the red-berried signatures, and ZTM was again selected for Vermentino and Chardonnay individually; when these two were combined, however, GM1 was selected. mCAI was the most robust index when all measurements were mixed together (94.02%).
Table 7. Results of using the best SVIs in classifying different groups of spectral data acquired in the September acquisition campaign (Severity of infestation = 2).
Table 8 presents the results of the best traditional SVI for the August measurements together with the September measurements (slightly + highly infected leaves). Performance was lower than when using only the September spectra, in which disease symptoms were well established. Compared with the complete spectra, the SVI results for Marselan and for the white-berried data were better or very similar, whereas for Grenache and for the mixed data, using all wavelengths was more accurate. The best SVIs were ARI, GM1, ZTM and mCAI. ARI was again chosen as the best SVI for classifying white grape leaves, and it was also selected when the red grape leaf reflectance was tested (93.23%). GM1 worked best for Vermentino, whereas ZTM was better suited to Chardonnay. As in the previous case, mCAI was the most robust index when all measurements were mixed together (88.41%).

Dimension Reduction Using Spectral Disease Indices (SDIs)
The discriminatory capacity of the best single wavelengths and wavelength differences chosen by the GA was tested; this data reduction procedure was the foundation for the development of the spectral disease indices. In this section, the classification results obtained with the SDIs are presented. Table 9 presents the results of the SDIs computed from the August measurements (slightly infected leaves). Each individual variety was classified with 100% accuracy and no false negatives, except Chardonnay. When observations were combined, the results were also satisfactory (accuracy > 94.44%). In general, the SDIs achieved higher accuracy than both the complete spectra and the conventional SVIs.
Table 9. Results of using SDIs in classifying different groups of spectral data acquired in the August acquisition campaign (Severity of infestation = 1).
Table 10 presents the results of the SDIs computed from the September measurements (highly infected leaves). An accuracy of 100% with no false negatives was obtained for each variety individually and for the white-berried spectra grouped together. In general, these accuracies are better than those of the first acquisition campaign. However, when the red varieties were grouped together, ARI performed better than the SDI (96.6% < 98.31%). When all observations were combined, the complete spectra gave a slightly better result than the SDI (96.01% > 94.20%).
Table 10. Results of using SDIs in classifying different groups of spectral data acquired in the September acquisition campaign (Severity of infestation = 2).
Table 11 presents the results of the SDIs computed from the August measurements together with the September measurements (slightly + highly infected leaves). The FPR was zero for the white-berried data. In general, these accuracies are better than those of the first acquisition campaign. However, as in the previous case, ARI performed better than the SDI when the red varieties were grouped together (92.03% < 93.23%). For the white-berried measurements and for all mixed measurements, the complete spectra appear slightly more advantageous than the SDI, but at a higher computational cost.
Table 11. Results of using SDIs in classifying different groups of spectral data acquired in the August and the September acquisition campaigns (Severity of infestation = 1 & 2).
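The SDI construction described in this article (GA-selected wavelengths, followed by an exhaustive test of their combinations) can be sketched as follows. This is a minimal illustration under stated assumptions: the index form SDI = ND(a, b) + w * R_c (a normalized difference plus a weighted single wavelength), the weight grid, and the scoring rule (best single-threshold accuracy, standing in for the SVM accuracy used in the study) are hypothetical choices, as are the function names and the toy spectra.

```python
import itertools
from typing import Dict, Sequence

Spectrum = Dict[int, float]  # wavelength (nm) -> reflectance


def sdi_value(spec: Spectrum, a: int, b: int, c: int, w: float) -> float:
    """Hypothetical SDI form: normalized difference of a and b plus w * R_c."""
    return (spec[a] - spec[b]) / (spec[a] + spec[b]) + w * spec[c]


def threshold_accuracy(values: Sequence[float], labels: Sequence[int]) -> float:
    """Best accuracy of a single-threshold rule, in either direction."""
    n = len(values)
    best = 0.0
    for thr in values:
        acc = sum((v >= thr) == (lab == 1) for v, lab in zip(values, labels)) / n
        best = max(best, acc, 1 - acc)  # 1 - acc: reversed rule (v < thr -> diseased)
    return best


def search_sdi(spectra: Sequence[Spectrum], labels: Sequence[int],
               bands: Sequence[int], weights: Sequence[float]):
    """Exhaustively score every (a, b, c, w) combination and return the best."""
    best_acc, best_params = -1.0, None
    # Swapping a and b only negates the normalized difference; with a weight
    # grid symmetric around zero and the two-direction threshold rule, that
    # case is already covered, so combinations suffice for (a, b).
    for a, b in itertools.combinations(bands, 2):
        for c in bands:
            for w in weights:
                values = [sdi_value(s, a, b, c, w) for s in spectra]
                acc = threshold_accuracy(values, labels)
                if acc > best_acc:
                    best_acc, best_params = acc, (a, b, c, w)
    return best_acc, best_params


# Toy spectra restricted to three hypothetical GA-selected bands.
healthy = [{550: 0.10, 680: 0.05, 800: 0.50}, {550: 0.11, 680: 0.06, 800: 0.48}]
diseased = [{550: 0.12, 680: 0.12, 800: 0.40}, {550: 0.13, 680: 0.14, 800: 0.42}]
acc, params = search_sdi(healthy + diseased, [0, 0, 1, 1],
                         bands=[550, 680, 800], weights=[-1.0, -0.5, 0.0, 0.5, 1.0])
print(acc, params)
```

The search is exhaustive, so its cost grows with the cube of the number of selected bands times the size of the weight grid; this is why a prior GA reduction of the candidate wavelengths matters.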

Discussion
SDIs were developed in this article to improve and simplify FD detection in grapevines based on hyperspectral data. As a first step, the most significant wavebands in the VIS, red-edge, NIR and SWIR (Short-Wave Infrared) regions had to be selected.
Feature selection is often used in data pre-processing to identify the features that are relevant to the classification task. The results of this study confirmed the effectiveness of the GA in improving the robustness of the feature selection procedure: a GA can avoid becoming trapped in local optima that may be caused by noise or interdependencies in the data set. Similar conclusions have been reported in other fields. In [54], a GA selected the best feature subset for a breast cancer diagnosis system, and in [55], a GA feature selection algorithm was applied to handwriting recognition, reducing the complexity of the feature set while achieving recognition rates similar to those obtained without feature selection.
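As an illustration of GA-based wavelength selection, the sketch below evolves binary masks over candidate wavelengths. It is a generic textbook GA, not necessarily the configuration of the GA tool used in this study: the operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism), the parameter values and the toy fitness function are assumptions. In a real application, the fitness would typically be the cross-validated classification accuracy obtained with the kept wavelengths, possibly penalized by the number of wavelengths.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = wavelength kept, 0 = dropped


def tournament(pop: List[Mask], fitness: Callable[[Mask], float],
               rng: random.Random) -> Mask:
    """Binary tournament: the fitter of two random individuals wins."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def ga_select(n_features: int, fitness: Callable[[Mask], float], pop_size: int = 20,
              generations: int = 40, p_mut: float = 0.05, seed: int = 0) -> Mask:
    """Minimal generational GA over binary masks (requires n_features >= 2):
    tournament selection, one-point crossover, bit-flip mutation, and
    elitism (the best mask is always copied to the next generation)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: best fitness never decreases
        while len(children) < pop_size:
            p1, p2 = tournament(pop, fitness, rng), tournament(pop, fitness, rng)
            cut = rng.randint(1, n_features - 1)
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Hypothetical toy fitness: reward keeping bands 2, 5 and 7, penalize the rest.
INFORMATIVE = {2, 5, 7}


def toy_fitness(mask: Mask) -> float:
    kept_useful = sum(mask[i] for i in INFORMATIVE)
    kept_other = sum(g for i, g in enumerate(mask) if i not in INFORMATIVE)
    return kept_useful - 0.1 * kept_other


best = ga_select(10, toy_fitness)
print(best, toy_fitness(best))
```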
After the GA selected the wavebands, the SDIs were normalized in order to reduce the impact of changes in lighting, land, crop variety or sensor-specific effects, which helped produce more robust and more general indices. SDIs were more advantageous than the complete spectra and the SVIs at the beginning of the season (August measurements), which is promising for early disease detection. Using the complete spectra was better for the combined measurements (August + September) for Grenache, the red varieties and all data. However, the proposed indices achieved high accuracy overall while reducing data dimensionality, which speeds up disease detection. In general, the SDIs gave higher accuracy than the SVIs, but ARI performed slightly better than the corresponding SDIs for the red varieties in the September measurements and in the combined (August + September) data. ARI has been reported as an effective feature in many studies. The study [56] concluded that ARI had a persistent response to yellow rust at 4 out of 5 growth stages and noted that ARI had been selected for yellow rust diagnosis in other studies, such as [57]. Among the indices investigated in [58], only ARI could differentiate healthy from rust-infected leaves; however, it could not distinguish stem rust from leaf rust pustules. In addition to ARI, ZTM, GM1, mCAI and TCARI/OSAVI were found to be the best SVIs in this study. The TCARI/OSAVI ratio is a good chlorophyll estimator that is largely independent of Leaf Area Index (LAI) and illumination conditions, and it has given good results not only in continuous closed crop canopies [49] but also in open tree-canopy orchards [59]. The authors of [58] found that TCARI was the only index, among those tested, capable of discriminating stem rust from leaf rust.
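The benefit of normalization can be made explicit with a short derivation, under the simplifying assumption that a change in illumination multiplies the reflectance at both wavelengths by the same factor $k > 0$: the factor cancels in the normalized difference. Additive offsets or wavelength-dependent effects are not removed this way, and the single-wavelength term of an SDI remains scale-dependent.

$$
\mathrm{ND}(\lambda_1,\lambda_2)
= \frac{kR_{\lambda_1} - kR_{\lambda_2}}{kR_{\lambda_1} + kR_{\lambda_2}}
= \frac{k\,(R_{\lambda_1} - R_{\lambda_2})}{k\,(R_{\lambda_1} + R_{\lambda_2})}
= \frac{R_{\lambda_1} - R_{\lambda_2}}{R_{\lambda_1} + R_{\lambda_2}}
$$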
Chlorophyll content is a potential indicator of vegetation stress because of its direct role in photosynthesis, through light harvesting and the initiation of electron transport [46]; this is consistent with our study, in which the ZTM vegetation index was selected for the white-berried data. Loss of chlorophyll in response to infestation by sap-feeding insects such as aphids [60] and leafhoppers [61] has been reported previously. GM1 was also selected in this study; indeed, differences in reflectance between healthy and stressed vegetation due to changes in chlorophyll a and b levels have previously been detected at the green peak and along the red-edge region (690-750 nm) [62]. The CAI index indicates exposed surfaces containing dried plant material [63], since absorptions in the 2000-2200 nm range are sensitive to cellulose. According to [64], the CAI index is useful for monitoring vegetation cover for biomass estimation.
Many studies have addressed pest management in commercially important crops by designing new, adapted vegetation indices. The study in [65] monitored greenbug damage in wheat using a hyperspectral spectrometer and a digital camera; two indices were designed based on possible band combinations and their correlation with damage severity, and the optimal bands were 509, 537, 572, 719, 747, 873 and 901 nm. The study in [66], on the other hand, used two or three narrow bands to design hyperspectral indices for assessing the severity of leafhopper infestation in cotton. Two of these indices performed better than traditional SVIs from the literature and were consistent across the tested fields; the most informative bands were 550, 691, 715, 761 and 1124 nm. In [67], two indices were proposed and found capable of estimating leaf rust disease; the main difficulty was detecting early symptoms, because the spectral signatures of lightly infected areas resembled those of healthy ones. In our study, for the August data, most of the selected bands were in the NIR region, whereas for the September data, VIS bands were mostly selected, since symptoms had become more visually pronounced. In this case, bands from the blue (450-520 nm), green (530-570 nm) and red (580-700 nm) parts of the VIS were selected. This agrees with [19], where the two largest differences in the VIS region appeared at the green peak (550 nm) and in the red region (680 nm), indicating lower chlorophyll absorption in the infected leaves. Furthermore, changes in chlorophyll a+b (Cab) content are expressed as changes in the red-edge region, which may explain why many optimal bands were selected in the 690-750 nm range. Reflectance near 700 nm was identified by [68] as an essential feature of green vegetation, resulting from a balance between biochemical and biophysical plant characteristics.
Since plant diseases affect the chlorophyll content of crop plants, increased reflectance around 700 nm can be a first, though non-specific, indicator of diseased crops. Many of the selected bands also coincided with water absorption bands; the study in [69] showed that sensitivity to water content was greatest in spectral bands centered at 1450 and 1940 nm, where water has its major absorption features.
As the tables show, the SDIs depended on the infestation level and on the grapevine variety considered: the best wavelengths selected differed from one case to another. As a consequence, although the SDIs tested gave good results, there was no single best index for FD in all situations; no single index with fixed spectral bands was found to quantify FD in every case studied. Indeed, the sensitivity of an index varies with soil, vegetation and weather conditions. SDIs were found to be promising for precision agriculture applications, but additional work is needed before they can be applied in practice. The proposed indices need to be tested on other grapevine varieties before they can be effectively applied in precision farming. Our study improves the ability to detect and map FD once foliar symptoms become visible; further tests therefore need to be carried out on hosts that do not show clear symptoms of infection, to check whether the computed SDIs can predict FD occurrence early. In addition, the proposed SDIs need to be tested under changing environments, scales and field conditions, and other diseases must also be considered, to investigate whether SDIs can distinguish FD from other infections in general and from BN in particular. Finally, this study showed that developing indices based on disease-induced spectral variations is feasible.

Conclusions and Perspectives
Plants express infection in a number of ways. Remote sensing (RS) is an effective way to detect crop diseases, because a pest or pathogen modifies photosynthesis and the physical structure of the plant, altering the absorption of light by the plant's surface. The difference between the spectral signatures of healthy and diseased plants can be the key to identifying efficient wavelengths correlated with a specific disease. Transforming reflectance into vegetation indices is a widely used technique for estimating leaf contents (pigments, water, etc.), and these indices, although based on only a few wavelengths, have shown potential for disease detection. However, common vegetation indices are not yet capable of identifying a particular disease. In this article, a data analysis technique was presented for designing specific grapevine disease indices.
Our data set contains a large number of features. To reduce cost and running time while achieving an acceptably high recognition rate, we selected the most useful features with a GA feature selection tool. GA is one of the most advanced techniques used in predictive analysis: it is computationally expensive, but it often performs better than common selection techniques and can handle large data sets without requiring specific knowledge of the problem under study. Once selected, the wavelengths are combined to design the SDI. Depending on the disease severity and on the grapevine variety, a different weighted combination of a single wavelength and a normalized difference is required to correctly identify FD.
Our study extracted several wavelength bands sensitive to FD occurrence at the leaf scale. Based on these findings, it might be possible, for example, to design a multispectral camera and mount it on a mobile platform to locate infection foci in a field. However, moving from individual leaves to a complete branch, or to the whole grapevine, will require some corrections. Geometric and radiometric corrections capable of addressing shadowing, branch structure and interfering reflectance from surrounding objects are necessary. A multispectral sensor would provide not only spectral information but also spatial data, enabling advanced image processing algorithms that make FD detection more robust. Two other FD symptoms, the absence of lignification and berry mortality, cannot be detected spectrally; to better confirm the presence of the disease, additional pattern recognition algorithms could be integrated directly into the sensor to detect them.
In agreement with other studies, we found that SDIs performed better than traditional SVIs in most cases. The advantages of SDIs include dimensionality reduction and efficient computation and processing. The SDI development method proposed in this article could be transferred to hyperspectral data from other kinds of sensors and applied to other crop varieties and to other diseases or biotic and abiotic crop stresses.

Acknowledgments:
We would like to thank Alice Dubois and Sylvain Bernard from the Regional Federation of Defense against Pests of Provence Alpes Côtes Azur, Corinne Trarieux and Jocelyn Dureuil from the Interprofessional Office of Burgundy Wine for their expertise. We also would like to thank the funders of the DAMAV project (Automatic Detection of Grapevine Diseases) and of course the winegrowers for their cooperation.
Author Contributions: Hania AL-Saddik acquired the data with the help of Jean-Claude Simon; Hania AL-Saddik analyzed the data with the help of Frederic Cointault; Hania AL-Saddik wrote the paper, then Jean-Claude Simon and Frederic Cointault corrected it.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: