Analysis of New RGB Vegetation Indices for PHYVV and TMV Identification in Jalapeño Pepper (Capsicum annuum) Leaves Using CNNs-Based Model

Recently, deep-learning techniques have become the foundation of many breakthroughs in the automated identification of plant diseases, and many recent computer-vision approaches in the agricultural sector rely on deep-learning models. In this work, a novel predictive analytics methodology to identify Tobacco Mosaic Virus (TMV) and Pepper Huasteco Yellow Vein Virus (PHYVV) visual symptoms on Jalapeño pepper (Capsicum annuum L.) leaves using image-processing and deep-learning classification models is presented. The proposed image-processing approach is based on the Normalized Red-Blue Vegetation Index (NRBVI) and the Normalized Green-Blue Vegetation Index (NGBVI) as new RGB-based vegetation indices, and on their Jet-palette-colored versions (NRBVI-Jet and NGBVI-Jet), as pre-processing algorithms. Furthermore, four standard pre-trained deep-learning architectures, Visual Geometry Group-16 (VGG-16), Xception, Inception v3, and MobileNet v2, were implemented for classification purposes. The objective of this methodology was to find the most accurate combination of vegetation-index pre-processing algorithm and pre-trained deep-learning classification model. Transfer learning was applied to fine-tune the pre-trained deep-learning models, and data augmentation was applied to prevent the models from overfitting. The performance of the models was evaluated on test data using Top-1 accuracy, precision, recall, and F1-score. The results showed that the best model was an Xception-based model using the NGBVI dataset, which reached an average Top-1 test accuracy of 98.3%. A complete analysis of the different vegetation index representations using models based on deep-learning architectures is presented, along with a study of the learning curves of these deep-learning models during the training phase.


Introduction
Agriculture is facing harder yield scenarios every year due to factors such as abiotic and biotic stress conditions. On the one hand, abiotic stress includes physical problems such as non-favorable environmental conditions and chemical stress due to toxic conditions that may be present in soil, air, or irrigation water [1]. On the other hand, biotic stress conditions are divided into visible pests and microorganisms that lead to plant diseases [2]. Visible pests are mainly insects, but also rodents and any other kind of animal that can harm the crop [3]. However, microorganisms are the main concern regarding agricultural losses, which account for around 20-40% of yearly crop yields worldwide [2].

A previous study proposed a deep-learning model to identify 10 common rice diseases. The authors used a dataset of 500 natural images of diseased and healthy rice leaves and stems captured from an experimental rice field. The proposed model achieved an accuracy of 95.48%. In 2018, Barbedo [19] studied the effects of using relatively small datasets on the effectiveness of deep-learning tools for plant disease classification. The experiments were carried out using an image database containing 12 plant species, each presenting very different characteristics in terms of number of samples, number of diseases, and variety of conditions. In 2019, Thenmozhi and Srinivasulu [20] proposed an efficient deep CNN model to classify insect species on three insect datasets: the National Bureau of Agricultural Insect Resources (NBAIR) dataset, which consists of 40 classes of field crop insect images, and the Xie1 and Xie2 datasets, which contain 24 and 40 classes of insects, respectively. The proposed model was evaluated and compared with pre-trained deep-learning models based on AlexNet, ResNet, GoogLeNet, and VGGNet for insect classification. Transfer learning was applied to fine-tune the pre-trained models, and data augmentation techniques such as reflection, scaling, rotation, and translation were applied to prevent the models from overfitting. The highest classification accuracies of 96.75%, 97.47%, and 95.97% were achieved by the proposed CNN model for the NBAIR, Xie1, and Xie2 insect datasets, respectively. Picon et al. [21] proposed a crop-conditional CNN architecture that seamlessly incorporates contextual meta-data consisting of the plant species identification. To validate this approach, the authors generated a challenging dataset that consists of more than 100 images taken by cell phone in real field conditions. This dataset contains almost equally distributed disease stages of 17 diseases and 5 crops (wheat, barley, corn, rice, and rapeseed). The proposed crop-conditional CNN model obtained an average balanced accuracy of 0.98 and removed 71% of the classifier errors. In 2020, Barman et al. [22] presented a comparison of CNNs for smartphone-based citrus leaf disease identification. A MobileNet CNN and a Self-Structured CNN architecture were trained and tested on the same citrus dataset; the best training and validation accuracies, obtained by the Self-Structured CNN, were 98% and 99% at epoch 12. Kang and Chen [23] developed a fast implementation framework for deep-learning-based fruit detection, which includes a label generation module and a fruit detector called LedNet. The LedNet model with a lightweight backbone achieved 0.821 recall and 0.853 accuracy on apple detection. Alves et al. [24] proposed a deep-residual-learning model based on ResNet34 to address cotton pest recognition using a field-based image database of 1600 images balanced over 15 pest classes and 1 class with no insects. The classification model achieved high accuracy, with an F1-score of 0.98.
The novelty of this project is the development of a new predictive analytics methodology to identify Tobacco Mosaic Virus (TMV) and Pepper Huasteco Yellow Vein Virus (PHYVV) visual symptoms on Jalapeño pepper (Capsicum annuum) leaves by means of new RGB vegetation indices as pre-processing algorithms and pre-trained deep-learning models based on the VGG-16, Xception, Inception v3, and MobileNet v2 architectures. The objective of this methodology is to find the most accurate combination of an RGB vegetation index and a deep-learning classification model to identify PHYVV and TMV in Jalapeño pepper plants using a small training dataset. Furthermore, it presents measurements of how identification accuracy can be improved by pre-processing images with the new RGB vegetation indices, and of their usefulness for future research.

Results
All the experiments were carried out using Keras version 2.2.4, with TensorFlow 1.13.1 as the backend, and Python version 3.7.8. Experiments were conducted on a PC with the following configuration: an Intel Xeon W-2133 processor, 32 GB of RAM, and one NVIDIA GTX 1080 GPU.

Image Processing Results
The results obtained from the image processing are shown in Figure 1, where image representations are organized as columns and dataset classes as rows. This image results matrix makes it easy to compare the different results and to observe that the NRBVI and NGBVI indices are effective in highlighting chlorotic pixels in all classes in their grayscale versions. It is also noteworthy that the Jet-colored versions of NRBVI and NGBVI are even more useful for identifying chlorotic areas inside the analyzed leaf. Furthermore, comparing between rows, each class shows different patterns that need to be classified by an AI-based algorithm to finally achieve an efficient identification of infected plants.

Data Description and CNN Architectures Parameters
In this investigation, the experiments were conducted on five different datasets. One dataset consisted of the RGB images, whereas the other four were built applying each of the pre-processing algorithms previously described, that is, NRBVI, NRBVI-Jet, NGBVI, and NGBVI-Jet. Within each dataset there are three classes. These classes are labeled as healthy (non-infected leaves), PHYVV (PHYVV-infected leaves), and TMV (TMV-infected leaves). Each dataset has 100 images for each class, for a total of 300 images. Each dataset was divided into 80% (240 images) for training data and 20% (60 images, 20 images per class) for testing data. During the training phase, 20% of the validation data was randomly selected from the 240 images. These validation data are used to manually adjust hyper-parameters, which are essentially the settings that cannot be automatically learned during training. These include the learning rate and the batch size, among others.
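For concreteness, the split described above can be sketched as follows. This is an illustrative example, not the authors' code; the directory layout and variable names are assumptions.

```python
from sklearn.model_selection import train_test_split

# Hypothetical inputs: 300 image paths and their class labels (100 per class).
file_paths = [f"dataset/{c}/{i}.jpg" for c in ("healthy", "PHYVV", "TMV")
              for i in range(100)]
labels = [c for c in ("healthy", "PHYVV", "TMV") for _ in range(100)]

train_paths, test_paths, train_labels, test_labels = train_test_split(
    file_paths, labels,
    test_size=0.20,      # 60 test images, 20 per class
    stratify=labels,     # keep the three classes balanced in both splits
    random_state=42,
)
# During training, 20% of the 240 training images is then randomly held out
# as validation data, e.g., via validation_split=0.2 in Keras.
```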
The models were trained using Stochastic Gradient Descent (SGD) as the optimizer with 0.9 momentum, and the learning rate was varied between 1e-3 and 1e-4. The learning rate defines the learning progress of the proposed model and how the weight parameters are updated to reduce the loss function of the network. The maximum number of epochs was set to
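A minimal sketch of this optimizer configuration, using the Keras API the authors report (Keras 2.2.4 with the TensorFlow backend); the tiny placeholder network exists only to make the example self-contained and is an assumption, not the paper's model.

```python
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import SGD

# Placeholder network standing in for any of the four CNNs used in the paper.
model = Sequential([Flatten(input_shape=(224, 224, 3)),
                    Dense(3, activation="softmax")])

model.compile(
    optimizer=SGD(lr=1e-3, momentum=0.9),  # lr was varied between 1e-3 and 1e-4
    loss="categorical_crossentropy",       # 3 classes: healthy, PHYVV, TMV
    metrics=["accuracy"],
)
```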

Experimental Environment
Two strategies were followed for training the models used in these experimental procedures. In the first, the models were trained from scratch; that is, all trainable parameters in the model start with randomly assigned values. In the second, the values are obtained through transfer learning; that is, all trainable parameters start with the values assigned while training the model on a different dataset. In this case, transfer learning from models pre-trained on the ImageNet dataset was used.
As opposed to traditional machine-learning techniques, CNN-based models can learn the features needed to discriminate between the classes directly from the original images, instead of requiring specific features to be extracted manually. It is well known that training deep CNN-based models requires many images; in this approach, the dataset is limited to about 300 images for each vegetation index representation (in each individual dataset). To address this issue, data augmentation was used to obtain a variety of conditions in the dataset. The input images were randomly rotated between 0 and 360 degrees, randomly translated along the X or Y direction (or both), and randomly flipped vertically or horizontally during training.
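The augmentation policy just described can be expressed with Keras' ImageDataGenerator; the shift fractions below are illustrative assumptions, since the paper does not report them.

```python
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize 8-bit pixels to [0, 1]
    rotation_range=360,       # random rotation between 0 and 360 degrees
    width_shift_range=0.2,    # random translation along X (assumed fraction)
    height_shift_range=0.2,   # random translation along Y (assumed fraction)
    horizontal_flip=True,     # random horizontal flips
    vertical_flip=True,       # random vertical flips
    validation_split=0.2,     # 20% of training images held out for validation
)
```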
The training procedure was as follows. For each dataset and CNN architecture, a total of 10 trials were carried out for each model trained from scratch and 10 trials for each model trained via transfer learning, with a random selection of 20% of the training data for validation.
To evaluate the performance of the different models, the following metrics were computed:

1. The average, maximum, and minimum Top-1 accuracies for each model. Top-1 accuracy measures the number of times the answer with the highest probability given by the model matches the expected answer; it is presented as the ratio of the number of correct answers to the total number of answers.

2. The precision of each class, that is, the ratio of the number of correctly predicted positive instances to the total number of predicted positive instances, see Equation (1). An associated term, macro-precision, measures the average precision per class.

Precision = TP / (TP + FP)    (1)

3. The recall of each class, that is, the ratio of the number of correctly predicted positive instances to the total number of instances in a class, see Equation (2). Macro-recall measures the average recall per class.

Recall = TP / (TP + FN)    (2)

4. The F1-score, the weighted average of precision and recall, see Equation (3). Macro-F1-score measures the average F1-score per class.

F1-score = 2 × (Precision × Recall) / (Precision + Recall)    (3)

In Equations (1)-(3), TP (true positive), FP (false positive), TN (true negative), and FN (false negative) are the technical terms for binary classifiers. Specifically, TP is the number of positive samples correctly classified, FP is the number of negative samples misclassified as positive, TN is the number of negative samples correctly classified, and FN is the number of positive samples misclassified as negative.
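These metrics can be computed, for example, with scikit-learn; a hedged sketch follows, in which the integer-encoded labels y_true and y_pred are hypothetical stand-ins for the 60-image test set.

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             precision_recall_fscore_support)

# Hypothetical predictions on the test set (0=HEALTHY, 1=PHYVV, 2=TMV).
y_true = [0] * 20 + [1] * 20 + [2] * 20
y_pred = [0] * 20 + [1] * 19 + [2] + [2] * 18 + [1] * 2

top1 = accuracy_score(y_true, y_pred)                  # Top-1 accuracy
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")                   # macro-averaged metrics
print(f"Top-1: {top1:.3f}  macro-P: {prec:.2f}  "
      f"macro-R: {rec:.2f}  macro-F1: {f1:.2f}")
print(classification_report(y_true, y_pred,
                            target_names=["HEALTHY", "PHYVV", "TMV"]))
```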

Models from Scratch
In this first experimental configuration, a total of 10 models were built from scratch for each dataset and CNN architecture. The results on the test data indicate a slightly better performance from the models that used the NRBVI-Jet dataset compared to those that used the other datasets. For the VGG-16-based model, the average Top-1 accuracy was 73.3%; for the Xception-based model it was 83.3%; for the Inception-based model it was 82.3%; and for the MobileNet-based model it was 77.5%. Table 2 summarizes the performance of the CNN-based models. Based on these results, it was noted that all models based on VGG-16 and MobileNet reached similar average Top-1 accuracies. The models based on Xception and Inception that used the NRBVI-Jet dataset performed slightly better than those that used different datasets; the difference in model performance ranged from 0.4% to 8.3%.

Table 3 reports the precision, recall, and F1-score values of the models that used the NRBVI-Jet dataset. As can be seen in Table 3, the Xception-based model reached better metric values than the other models, obtaining 0.83 on macro-precision, macro-recall, and macro-F1-score. For the HEALTHY class, the VGG-16-based model obtained 0.74, 0.85, and 0.79 on precision, recall, and F1-score, respectively. The Xception-based model achieved 0.87, 1.0, and 0.93; the Inception-based model obtained 0.8, 1.0, and 0.89; and the MobileNet-based model achieved 0.7, 0.95, and 0.81 on the same metrics.

Analysis of the accuracy during the training phase provides additional insight into how a particular model performs. Figure 2 shows the learning curves of the models that used the NRBVI-Jet dataset. For all models, an overfitting problem is clearly seen at early epochs; the gap between the learning curves of the training and validation data was large, ranging from 12% to 20%. Therefore, the models overfitted.

Pre-Trained Models and Data Augmentation
In this experimental configuration, transfer learning from models pre-trained on the ImageNet dataset was employed. Based on the experimental and empirical evidence, it was found that a complete fine-tuning technique, rather than a feature-extractor technique, was the best option: the pre-trained model is used as a starting point and is then completely re-trained on the target dataset. Data augmentation was also used to prevent the models from overfitting. Table 4 shows the test accuracy of the fine-tuned models on the different datasets. The best performance was obtained by the models that used the NGBVI dataset: an average Top-1 accuracy of 96.6% was achieved by the VGG-16-based model, 98.3% by the Xception-based model, and 95% and 92.3% by the Inception-based and MobileNet-based models, respectively. The difference in model performance between the maximum and minimum values of the average Top-1 test accuracy was approximately 6.7%. Table 5 details the precision, recall, and F1-score values for each class from the models that used the NGBVI dataset. For the HEALTHY class, all models obtained 1.0 on precision, recall, and F1-score. For the PHYVV class, the VGG-16-based model obtained 0.91, 1.0, and 0.95 on precision, recall, and F1-score, respectively. The Xception-based model achieved 0.95, 1.0, and 0.98; the Inception-based model obtained 0.87, 1.0, and 0.93; and the MobileNet-based model achieved 0.9, 0.95, and 0.93 on the same metrics.
For the TMV class, the VGG-16-based model obtained 1.0, 0.9, and 0.95 on precision, recall, and F1-score, respectively. The Xception-based model achieved 1.0, 0.95, and 0.97; the Inception-based model obtained 1.0, 0.85, and 0.92; and the MobileNet-based model achieved 0.95, 0.9, and 0.92 on the same metrics. Figure 3 shows the accuracy and loss curves obtained from the models that used the NGBVI dataset on the training and testing data during training. It is clearly seen that the models based on Xception and Inception show a great performance improvement, as they have lower losses and higher accuracies.
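A minimal sketch of the complete fine-tuning strategy described above, assuming Keras' applications module; the 3-class softmax head and the input size are assumptions for illustration, not details taken from the paper.

```python
from keras.applications import Xception
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import SGD

# Start from ImageNet weights and replace the classifier head.
base = Xception(weights="imagenet", include_top=False,
                input_shape=(299, 299, 3))
x = GlobalAveragePooling2D()(base.output)
outputs = Dense(3, activation="softmax")(x)   # healthy, PHYVV, TMV
model = Model(inputs=base.input, outputs=outputs)

# Complete fine-tuning: every layer is re-trained on the target dataset,
# as opposed to freezing the backbone for feature extraction.
for layer in model.layers:
    layer.trainable = True

model.compile(optimizer=SGD(lr=1e-4, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])
```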

Discussion
Plant viruses are a major threat to sustainable and productive agriculture, causing significant economic losses worldwide. Several studies on automated plant virus detection have been conducted using machine-learning techniques. One of the major machine-learning techniques is the convolutional neural network, a type of deep-learning neural network that has become a very successful automated classification technique for image-based plant virus classification.
This study demonstrated that the proposed methodology is a powerful method for high-accuracy, automated PHYVV and TMV classification in Jalapeño pepper leaf images. By employing CNN-based models, this method avoids the complex and labor-intensive step of handcrafted feature extraction from images. Furthermore, using different RGB image vegetation indices enabled the investigators to obtain more accurate CNN-based classification models.

Data representation plays a crucial role in the performance of CNN-based models. In the first experimental configuration, the results on the test set showed that the most accurate model was the Xception-based model that used the NRBVI-Jet dataset, reaching an average Top-1 accuracy of 83.3%. The worst models were those that used the NRBVI and NGBVI datasets. In the second experimental configuration, transfer learning from pre-trained models and data augmentation were used. The best average Top-1 accuracy was obtained by an Xception-based model that used the NGBVI dataset, with an average Top-1 accuracy of 98.3%. In this experimental configuration, all models significantly increased their average Top-1 test accuracies, by 12-26%, with respect to the first experimental configuration, in which the models were built from scratch. Therefore, it was concluded that transfer learning from pre-trained models and data augmentation helped the CNN-based models to reduce overfitting and to reach high test accuracies; as shown in Table 4 and Figure 3, the difference between the accuracy curves of the training and the validation data is small in each trained model. This study therefore offers a promising avenue for virus classification using different RGB image vegetation indices along with CNN-based models on relatively small image datasets. The source code of this article is publicly released and can be downloaded from https://github.com/jrmillan1983/PHYVV_TMV_CNN (Supplementary Materials).

Field Site
The experiment was carried out in two different small-sized greenhouses located in Queretaro, Mexico, at 20°42′20.39″ N 100°15′33.81″ W for site A and 20°42′23.4″ N 100°15′42.2″ W for site B; their exact location can be observed as a red circle in Figure 4. The experiment at site A was carried out from February 2020 to June 2020, and site B was used as the experimental site from July 2020 to October 2020. Both greenhouses are in a semi-arid region at an elevation of 1800 m, with an average temperature of 25 °C and below-freezing winter temperatures. The plant substrate was a mixture of 55% clay, 25% sand, and 20% silt, with 1.81% organic matter and an apparent density (DAP) of 1.4 g/cm³. Furthermore, the irrigation system was a standard drip irrigation system, with average agronomic practices for Jalapeño plants.

Biological Materials
The plant material used was Capsicum annuum L. type Jalapeño cv. Don Pancho from the National Institute of Forestry, Agricultural and Livestock Research (INIFAP, in Spanish), with a purity of 98%. The plants were grown in two small greenhouses with an average temperature of 25 °C, which can be observed in Figure 5. The PHYVV sample was provided by Cinvestav, Irapuato, Mexico, and the TMV samples were provided by UNAM-Iztacala, Mexico. In order to infect the plants, a bio-ballistic-based procedure was carried out to inoculate PHYVV into the plants [25]. With respect to TMV, plants at the 6-to-8-leaf stage were abraded with wet carborundum (400 grit) and inoculated with 30 µg of TMV (25 µg/mL) by gently rubbing the adaxial leaf surfaces. Furthermore, RT-PCR confirmation tests were carried out to ensure the presence of the virus inside the plants, according to the procedure described by Guevara-Olvera et al. [26].

Image Dataset
The image dataset for this project was generated by taking photographs of Jalapeño pepper leaves using an RGB camera on a Samsung Galaxy S7 edge smartphone at 4032 × 3024 (12 MP) resolution with a 4:3 aspect ratio, under ambient light conditions, with no flash, and against a thick white paper background. It is noteworthy that, to avoid being invasive, the photographs were taken in situ without cutting the leaves. Furthermore, a dataset of plant leaves was generated comprising 3 classes (healthy, PHYVV, and TMV) with 100 images per class, for a total of 300 images for training purposes. Sample images of these 3 classes can be seen in Figure 6.

Vegetation Index Algorithms
Vegetation indices are algorithms focused on estimating the fractional green vegetation cover by indirect estimation of the vegetation surface through linear algebra operations between image masks, such as the red, green, and blue masks, or specific hyperspectral bands [27]. Usually, vegetation indices focus on quantifying vegetation-covered areas for remote sensing applications. However, the most common plant disease symptoms are chlorosis and necrosis. Because of this, two new vegetation indices are proposed to highlight chlorosis and necrosis areas across the vegetation, with the objective of obtaining a better identification of chlorotic and necrotic symptoms.

Normalized Red-Blue Vegetation Index
NRBVI is a newly proposed vegetation index focused on highlighting chlorotic areas and discarding non-vegetation pixels from the greenhouse environment, such as walls and the ground, to name but a few. NRBVI can be estimated from an RGB image by first subtracting the blue mask from the red mask and then normalizing the result by dividing each pixel by the maximum pixel value, as can be observed in Equation (4):

NRBVI = (R − B) / max(R − B)    (4)

Normalized Green-Blue Vegetation Index
Alternatively, NGBVI is another newly proposed vegetation index, also focused on highlighting chlorosis symptoms, but by means of the green and blue masks. As can be observed in Equation (5), NGBVI is calculated by subtracting the blue mask from the green mask in the numerator and then dividing by the highest pixel value of the numerator result:

NGBVI = (G − B) / max(G − B)    (5)

Jet Color Scale
Color scales are a useful tool for highlighting small differences in grayscale images. Most commercial equipment uses color scales to improve visualization for humans in applications such as vegetation indices, thermal cameras, and different kinds of remote sensing applications. Color scales consist of a discretization of the 8-bit grayscale into sixteen solid color bands, where colors are assigned depending on the chosen color scale. For this project, the Jet color palette was chosen due to its popularity among vegetation index applications and for its clear visualization, where blue corresponds to low-intensity pixels and red to the highest values. For clarity, grayscale and Jet scale differences are shown in Figure 7.
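Under the reconstructed Equations (4) and (5), the two indices and their Jet-colored versions can be sketched with NumPy and OpenCV as follows; the helper names, the input file, and the epsilon guard are assumptions for illustration.

```python
import cv2
import numpy as np

def nrbvi(rgb):
    """Normalized Red-Blue Vegetation Index, per reconstructed Equation (4)."""
    r = rgb[..., 0].astype(np.float32)
    b = rgb[..., 2].astype(np.float32)
    diff = r - b
    # Divide by the numerator's maximum pixel value; the small epsilon
    # (an assumption) guards against an all-zero difference image.
    return diff / (diff.max() + 1e-8)

def ngbvi(rgb):
    """Normalized Green-Blue Vegetation Index, per reconstructed Equation (5)."""
    g = rgb[..., 1].astype(np.float32)
    b = rgb[..., 2].astype(np.float32)
    diff = g - b
    return diff / (diff.max() + 1e-8)

def jet_colored(index):
    """Map a normalized index to the Jet color palette for visualization."""
    gray = cv2.normalize(index, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.applyColorMap(gray, cv2.COLORMAP_JET)

# "leaf.jpg" is a hypothetical input image; OpenCV loads BGR, so convert to RGB.
rgb = cv2.cvtColor(cv2.imread("leaf.jpg"), cv2.COLOR_BGR2RGB)
nrbvi_jet = jet_colored(nrbvi(rgb))
ngbvi_jet = jet_colored(ngbvi(rgb))
```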

Deep-Learning Models
CNNs have become the most popular learning algorithms in vision-related applications in recent years. Some of the remarkable application areas of CNNs include image segmentation, classification, plant species identification, plant disease identification, and crop pest classification, among others. There are many different state-of-the-art CNN architectures; therefore, testing all the possibilities was not a viable option. In this article, four common deep-learning network architectures were used (VGG-16, Xception, Inception v3, and MobileNet v2).
One of the most popular CNN architectures is VGG. This algorithm represents one of the most successful state-of-the-art CNN architectures due to its powerful and accurate classification ability. The VGG network was introduced by Simonyan and Zisserman in 2014 [28]. In this study, the VGG-16 architecture was used, which comprises 16 weight layers: 13 sequentially stacked convolutional layers, each followed by rectified linear units (ReLU), and 3 fully connected layers followed by a softmax layer. MaxPooling is computed after every two convolutional layers in the first two blocks, and then after every three convolutional layers in the remaining blocks. The block diagram of a VGG-16 network is illustrated in Figure 8.
The Xception architecture is a linear stack of depthwise separable convolution layers with residual connections [29]. The Xception network consists of 36 convolutional layers that perform the feature extraction. These 36 convolutional layers are organized into 14 modules, all of which have linear residual connections around them except for the first and last modules, and are followed by a softmax layer. A complete description of the specifications of the network is featured in Figure 9.

Figure 9. The Xception architecture [29].
Inception v3 is a convolutional neural network architecture from the Inception family. The Inception v3 network is composed of 11 Inception modules of five kinds in total and supports the following concepts.

1. Factorizing convolutions to reduce the number of parameters (connections) without decreasing the network efficiency.

2. Factorizing into smaller convolutions to reduce the number of parameters. For example, a 5 × 5 convolution filter has 25 parameters; replacing it with two 3 × 3 convolution filters (3 × 3 + 3 × 3 = 18 parameters) reduces the number of parameters by 28% (see the parameter-count sketch after this list).

3. Factorizing into asymmetric convolutions, that is, factorizing a standard two-dimensional convolution kernel into two one-dimensional convolution kernels. For example, a 3 × 3 convolution filter (9 parameters) can be replaced by a 1 × 3 convolution filter followed by a 3 × 1 convolution filter (1 × 3 + 3 × 1 = 6 parameters), a reduction of 33% in the number of parameters.

4. Auxiliary classifiers: a small CNN is inserted between layers during training, and the loss it incurs is added to the main network loss.

5. Efficient grid size reduction.
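A quick, purely illustrative arithmetic check of the parameter reductions claimed in concepts 2 and 3, counting the weights of single-channel kernels:

```python
# Concept 2: one 5x5 kernel vs. two stacked 3x3 kernels.
five_by_five = 5 * 5                    # 25 parameters
two_three_by_three = 3 * 3 + 3 * 3      # 18 parameters
print(1 - two_three_by_three / five_by_five)   # 0.28 -> 28% reduction

# Concept 3: one 3x3 kernel vs. a 1x3 kernel followed by a 3x1 kernel.
three_by_three = 3 * 3                  # 9 parameters
asymmetric = 1 * 3 + 3 * 1              # 6 parameters
print(1 - asymmetric / three_by_three)  # 0.333... -> 33% reduction
```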
A complete description of the specifications of the network is featured in Figure 10. For further description of Inception v3, see Szegedy et al. [30].

Figure 10. The Inception v3 architecture [30,31].
MobileNet is a lightweight CNN architecture. The first version of MobileNet was proposed by Google in 2017 [32] and is oriented to mobile and embedded devices. It is characterized by the use of depthwise and pointwise convolutions. MobileNet v2 [33] adopts inverted residual blocks, which can incorporate low-level features with high-level features. The architecture of MobileNet v2 contains an initial full convolution layer with 32 filters, followed by 19 residual bottleneck layers, see Table 6. For further description of MobileNet v2, see Sandler et al. [33].

Table 6. MobileNet v2 architecture description [33].

Input         | Operator    | Expansion Factor | Output Channels | Repeated Layers | Stride
224² × 3      | conv2d      | -                | 32              | 1               | 2
112² × 32     | bottleneck  | 1                | 16              | 1               | 1
112² × 16     | bottleneck  | 6                | 24              | 2               | 2
56² × 24      | bottleneck  | 6                | 32              | 3               | 2
28² × 32      | bottleneck  | 6                | 64              | 4               | 2
14² × 64      | bottleneck  | 6                | 96              | 3               | 1
14² × 96      | bottleneck  | 6                | 160             | 3               | 2
7² × 160      | bottleneck  | 6                | 320             | 1               | 1
7² × 320      | conv2d 1×1  | -                | 1280            | 1               | 1
7² × 1280     | avgpool 7×7 | -                | -               | 1               | -
1 × 1 × 1280  | conv2d 1×1  | -                | k               | -               | -
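For reference, a hedged sketch of instantiating the four backbones via keras.applications; the input sizes are the standard defaults for each network, an assumption rather than a detail reported by the authors.

```python
from keras.applications import VGG16, Xception, InceptionV3, MobileNetV2

# weights="imagenet" loads the pre-trained weights used for transfer learning;
# pass weights=None instead to train each network from scratch.
backbones = {
    "VGG-16":       VGG16(weights="imagenet", include_top=False,
                          input_shape=(224, 224, 3)),
    "Xception":     Xception(weights="imagenet", include_top=False,
                             input_shape=(299, 299, 3)),
    "Inception v3": InceptionV3(weights="imagenet", include_top=False,
                                input_shape=(299, 299, 3)),
    "MobileNet v2": MobileNetV2(weights="imagenet", include_top=False,
                                input_shape=(224, 224, 3)),
}
```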

Image Processing Experiment
The image-processing experiment setup consisted of applying the RGB image as a common input to four different algorithms (NRBVI and NGBVI in grayscale, as well as their Jet-colored versions) and keeping the original RGB image, in order to compare the performance of each algorithm in identifying chlorosis lesions on Jalapeño leaf images [34]. The image-processing methodology is described as a block diagram in Figure 11, where the RGB image can be observed as the common input and five versions of the same image as the outputs of the whole process.
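Tying the pipeline together, a usage sketch producing the five output versions from one RGB input; nrbvi, ngbvi, and jet_colored are the hypothetical helpers from the vegetation-index sketch above.

```python
# One RGB input, five dataset versions as outputs (see Figure 11).
versions = {
    "RGB":       rgb,
    "NRBVI":     nrbvi(rgb),
    "NGBVI":     ngbvi(rgb),
    "NRBVI-Jet": jet_colored(nrbvi(rgb)),
    "NGBVI-Jet": jet_colored(ngbvi(rgb)),
}
```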

Conclusions
CNNs are used extensively as a powerful class of classification models of digital images in a variety of applications in the agricultural field, such as plant species identification, plant disease identification and crop pest classification, among others. This article proposes an innovative methodology for the identification of Tobacco Mosaic Virus (TMV) and Pepper Huasteco Yellow Vein Virus (PHYVV) visual symptoms on Jalapeño pepper (Capsicum annuum) leaves.
The proposed methodology combines new RGB vegetation indices and deep-learning classification models. The results show that the proposed methodology can correctly and effectively recognize TMV and PHYVV through image recognition of different image representations such as RGB, NRBVI, NRBVI-Jet, NGBVI, and NGBVI-Jet. The best CNN-based models were those that used NGBVI vegetation index representations. The pre-trained Xception-based model reached an average Top-1 accuracy of 98.3%. Future investigations will focus on increasing the dataset size and analyzing other state-of-the-art CNN architectures.
Supplementary Materials: The source code of this article is publicly released and can be downloaded from https://github.com/jrmillan1983/PHYVV_TMV_CNN.
