GamaNNet: A Novel Plant Pathologist-Level CNN Architecture for Intelligent Diagnosis

Abstract: Plant pathologies significantly jeopardise global food security, necessitating the development of prompt and precise diagnostic methods. This study employs advanced deep learning techniques to evaluate the performance of nine convolutional neural networks (CNNs) in identifying a spectrum of phytosanitary issues affecting the foliage of Solanum lycopersicum (tomato). Ten thousand RGB images of leaf tissue were split into training (64%), validation (16%), and test (20%) sets to rank the most suitable CNNs for expediting the diagnosis of plant disease. The study assessed the performance of eight well-known networks under identical hyperparameter conditions. Additionally, it introduced the GamaNNet architecture, a custom-designed model optimised for superior performance on this specific type of dataset. The results were most promising for the innovative GamaNNet and ResNet-152, which both exhibited a 91% accuracy rate, as evidenced by their confusion matrices, ROC curves, and AUC metrics. In comparison, LeNet-5 and ResNet-50 demonstrated lower assertiveness, attaining accuracies of 74% and 69%, respectively. GoogLeNet and Inception-v3 emerged as the frontrunners, achieving an average F1-score of 97%. Identifying such pathologies as Early Blight, Late Blight, Corynespora Leaf Spot, and Septoria Leaf Spot posed the most significant challenge for this class of problem.


Introduction
On a global scale, it is estimated that around 40% of agricultural production is lost due to phytosanitary problems [1]. According to the Food and Agriculture Organization of the United Nations (FAO), plant diseases and invasive insects are among the main causes of the loss of over 40 percent of food crops worldwide, with annual losses exceeding USD 220 billion [2]. The loss of staple cereals (rice, wheat, and maize) and tubers (potatoes) directly impacts food and nutritional security, while the loss of commodities (tomatoes, coffee, and bananas) impacts household livelihoods and, consequently, national economies [3]. Recognising that the worst scenarios are in developing countries, long-term approaches involving climate parameters (Pathogen-Environment-Host) project that even regions of suboptimal climate will be protected against the occurrence of new outbreaks [4].
The authors of ref. [5] explain that, to minimise production losses and maintain crop sustainability, the practice most recommended by phytopathologists is constant crop monitoring together with the rapid and accurate diagnosis of disease. The great challenge, however, is correctly identifying the symptoms of the principal diseases of each crop [6]. The conventional practice of periodic inspections (scouting) by field staff is necessary in all productive areas to detect and assess the severity of the disease. However, when investigating yellow sigatoka in banana fields, the authors of ref. [7] pointed out that human subjectivity does not allow inspection workers to be consistently thorough throughout the day; factors such as the weather, physical wear and tear, and social distractions conspire to make this task time-consuming, counterproductive, and prone to failure. It is in this scenario that automated, reliable, and economically competitive solutions for monitoring plant health can facilitate control at the onset of disease and improve crop productivity.
Plant disease is any abnormality caused by biotic or abiotic factors that, by continually altering the plant metabolism, results in a loss of production or fruit quality. Diseases originating in living organisms such as fungi, bacteria, viruses, and insects are known as biotic, and it is the occurrence of certain pathogenic species that causes epidemic diseases in plants. Among the most sensitive crops, the tomato (Solanum lycopersicum) occupies a prominent place in the global agricultural economy, with more than 180 million tons produced annually in more than 170 countries [8]. Historically, the largest producer of table tomatoes is China, with 34.78% of the total production. In tomato crops, some pests are considered primary, i.e., they are cosmopolitan and highly devastating (100% losses) [9]. Secondary (occasional) pests are those that cause only sporadic damage, are found in localised areas, and vary with the region, growing season, and production system.
When it comes to intelligent image interpretation, the most promising results have been obtained with convolutional neural networks (CNNs) [10]. These multilayer architectures are based on artificial neural networks, which link together several non-linear processing units [11]. Deep network learning allows semantic differences to be extracted from small representative fragments, even in large volumes of heterogeneous data [12]; these networks do so through representations at various levels of abstraction [13]. Well-designed CNNs have already managed to surpass other canonical methods for specific solutions in precision agriculture, achieving superhuman levels in certain domains [14].
The authors of ref. [15] identified key research lacunae in agricultural deep learning (DL), categorising applications into crop yield prediction, plant stress, weed and pest detection, disease identification, and precision agriculture. Their review highlighted the prevalent use of CNN architectures like AlexNet and ResNet to bolster agricultural economics. Nevertheless, they advocate for novel DL methodologies to enhance model efficacy and expedite inference, thereby optimising practical utility.
Taking into account the aforementioned context, this article sets forth two primary objectives: first, to conduct a comparative analysis among the extant pre-eminent CNNs with the aim of ascertaining the most apt architecture for the detection of nine phytosanitary issues in tomato leaves; and second, to introduce an innovative CNN model specifically designed for the visual classification of symptomatic tomato foliage.

Phytosanitary Database
PlantVillage is a project of international cooperation that combines artificial intelligence with precision agriculture, mobilising educational institutions (Cambridge University, Cambridge, UK) and remote sensing (Planet Network, Newton, NJ, USA) with the world programmes for combatting hunger of the United Nations Food and Agriculture Organisation (FAO, Rome, Italy). This programme integrates an open repository that joins 54,323 images of 14 crops and 38 phytosanitary problems with multispectral images at a spatial resolution of 256 × 256 pixels and a spectral resolution of three wavelengths (Red-Green-Blue channels). In our study, we used images from research by [16], which enriched the original PlantVillage collection (plantvillage.psu.edu) with modifications in the axes (mirroring), rotation (flipping), and brightness of the images.
Tomato (Solanum lycopersicum) is widely cultivated around the world. However, tomato crops are frequently afflicted by a spectrum of pests and diseases, significantly compromising both yield and quality [17]. In this respect, ten thousand digital images were acquired under non-ideal conditions, i.e., with variations in the shape of the target and in the positioning angle relative to the sensor and the light source. They portray nine phytosanitary problems commonly found in agricultural areas with serious economic impact on crop exploitation (see S1, Supplementary Materials). The described symptoms show similarities between some disorders due to the profusion of severity levels in each class. Similar to the image database employed by [18] in the diagnosis of soybean diseases, a standard grey card was consistently employed as a background during image capture. This practice served to standardise the colour palette, thereby reducing the influence of variable lighting positions on an accurate classification.

Learning Strategies
Training convolutional architectures is computationally intensive. For this reason, all the stages of the study were carried out by remote processing using the Kaggle® platform (Kaggle, San Francisco, CA, USA), which provides free of charge the Nvidia P100 GPU (Nvidia, Santa Clara, CA, USA), 1.32 GHz, 9.3 TFlops, and 12 GB of RAM. Using open-source frameworks created by Google®, such as the Keras® and TensorFlow® libraries, deep learning strategies were employed for the supervised classification of RGB images with mutually exclusive labels (diagnosis).
Starting with a random selection, the database was stratified (Figure 2) into subsets of 64% for the training phase and 16% for validation. The remaining 20% was separated for the testing phase as unseen data, i.e., data that had never been exposed to the algorithms under analysis. It is worth noting that this hold-out was applied after the random reordering of all the individuals. This allowed all the classes to be present in the three subsets, thereby ensuring that each CNN could be explored without a priori bias.
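The hold-out described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the function name `stratified_split`, the fixed seed, and the balanced toy label list (10 classes with 1000 images each) are hypothetical; only the 64/16/20 proportions and the total of ten thousand images come from the text.

```python
import random
from collections import defaultdict

def stratified_split(labels, train=0.64, val=0.16, seed=42):
    """Return (train_idx, val_idx, test_idx) so that every class
    appears in each subset in roughly the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx, test_idx = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)                        # random reordering first
        n_train = round(len(indices) * train)
        n_val = round(len(indices) * val)
        train_idx += indices[:n_train]
        val_idx += indices[n_train:n_train + n_val]
        test_idx += indices[n_train + n_val:]       # remainder is held out for testing
    return train_idx, val_idx, test_idx

# Hypothetical balanced label list: 10 classes x 1000 images (an assumption).
labels = [c for c in range(10) for _ in range(1000)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 6400 1600 2000
```

Splitting class by class after shuffling is one simple way to guarantee that all classes are present in the three subsets; a plain shuffle of the whole dataset would achieve this only in expectation.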

Convolutional Architectures
CNN configurations generally consist of a series of specifically ordered elements, where the variation in these elements is the origin of the different performances of each network. Figure 3 shows the general architecture of the CNNs and their principal elements, among which are convolutional layers, max pooling (subsampling), and the flattening process. The logical reasoning of the AI begins with this interpretive phase, in which information is processed in networks whose neuronal structures list the probabilities of an association between the current piece of data and each of the learned classes. In the output layer, the label deduced by the architecture was the one with the highest probability (Rank 1), and the possible errors of judgement were attributed to the equivalences between these probability distributions.
The CNN architectures consisted of sequential processing models in which images of variable dimension were transformed into progressively less-complex features, which were, however, discernible among the included classes. Eight state-of-the-art architectures were implemented using the Python programming language, chosen for the excellent results they have obtained with a wide variety of targets, namely: LeNet-5 [19], AlexNet [20], GoogLeNet [21], Inception-v3 [22], as well as the derivations of the Visual Geometry Group (VGG) [23] with 16 and 19 layers, and ResNet [24] with 50 and 152 layers. Based on the approaches and results of the above authors, this study also proposed a new architecture, known as GamaNNet, whose parameters were repeatedly adjusted and validated until arriving at the best configuration. GamaNNet was the only network not to require the input images to be resized.

Training the Networks
With a view to accelerating the assimilation of key features of each symptom, the transfer learning technique was implemented in all the CNNs using the ImageNet public dataset, with 1.2 million images distributed over 1000 categories. This made it possible to reuse the cognitive logic (weights) from other domains and apply it to a specific task. This step was crucial because it initialised the weights in the model before the networks made first contact with this database, speeding up comprehension.
In all, 100 epochs were standardised for each training cycle, as it was thereby possible to observe the evolution of the network throughout the learning process. Due to the diversity of CNNs, a variable number of parameters was generated in each architecture, with the weights between the artificial synapses stored in a file for later replications. To allow a fair comparison, an attempt was also made to standardise the hyperparameters between the experiments, employing those described in Table S2 (Supplementary Materials).
The 'Adam' optimiser and the 'Categorical Cross-Entropy' loss function were adopted because they result in good training for a model with multiple mutually exclusive classes [25]. The activation function refers to the resource whereby activated neurons can be retained and mapped by a non-linear function. In the words of [26], the activation function makes a CNN "achieve the significance of Artificial Intelligence" by elevating its capacity and synaptic agility, and allowing the resolution of non-linear problems. As the author points out, it is in this sense that the Rectified Linear Unit (ReLU) is the best of the many existing activation functions.
All the convolutional structures (2D) converged to ten output neurons, one for each class considered in this study (nine phytosanitary problems plus healthy leaves). As the number of neurons in the output layer of the CNNs is equal to the number of classes, each output neuron presented a distinct probability for the test image. For this purpose, the activation function used at the end of each architecture was the 'Softmax' function, which maps the output of each neuron to a value between 0 and 1 such that the outputs of all the neurons sum to 1. This function is recommended by [27] for multiclass problems, where the probability of the test image belonging to a single class has to be obtained. The authors demonstrated how an increase in the probability of class i occurs at the expense of reducing the probability of all the other classes k (k ≠ i).
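The behaviour of the output layer can be illustrated with a minimal sketch in plain Python. It assumes the standard definitions of softmax and categorical cross-entropy; the logit values and the helper names `softmax` and `categorical_cross_entropy` are hypothetical and are not taken from the study's code.

```python
import math

def softmax(logits):
    """Map raw output scores to probabilities that sum to 1."""
    shift = max(logits)                         # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(probs, true_class):
    """Loss for one sample whose one-hot label is `true_class`."""
    return -math.log(probs[true_class])

logits = [0.5] * 10          # hypothetical scores for the ten output neurons
before = softmax(logits)     # uniform: every class gets 0.1
logits[3] += 2.0             # the network becomes more confident in class 3
after = softmax(logits)

# Rank-1 label: the class with the highest probability.
predicted = max(range(10), key=lambda k: after[k])
print(predicted, round(sum(after), 6))
```

Raising the score of class 3 increases its probability and lowers that of every other class, because all outputs share the same normalising sum; the cross-entropy for a sample of class 3 falls accordingly.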

Evaluation
During the validation phase, in order to evaluate the performance of each CNN, the suggested classes were compared with the original classes of the samples. Each case of compatibility was considered a hit, and the metrics were updated. At the end of the epochs, a representation of the performance in terms of loss and overall accuracy was obtained for each of the architectures. The number of model parameters and the training time for each network were noted.
The next step consisted of subjecting the networks to unseen test data and generating a confusion matrix. Important information was extracted from it, namely the number of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) labels [28]. More elaborate analyses were considered based on the secondary metrics listed in Table S3 (Supplementary Materials).
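As an illustration of how these counts follow from a multiclass confusion matrix, the sketch below derives TP, FP, FN, and TN for each class and then computes precision, recall, and F1. The 3-class matrix is a hypothetical toy example, and the choice of these three derived metrics is an assumption (the exact contents of Table S3 are not reproduced here).

```python
def per_class_counts(matrix):
    """matrix[i][j] = number of samples of true class i predicted as class j."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    counts = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                        # rest of row k
        fp = sum(matrix[i][k] for i in range(n)) - tp   # rest of column k
        tn = total - tp - fn - fp
        counts.append({"TP": tp, "FP": fp, "FN": fn, "TN": tn})
    return counts

def precision_recall_f1(c):
    """Derived metrics for one class; returns 0.0 where a ratio is undefined."""
    precision = c["TP"] / (c["TP"] + c["FP"]) if c["TP"] + c["FP"] else 0.0
    recall = c["TP"] / (c["TP"] + c["FN"]) if c["TP"] + c["FN"] else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical 3-class matrix (rows: true class, columns: predicted class).
toy = [[8, 1, 1],
       [2, 7, 1],
       [0, 2, 8]]
counts = per_class_counts(toy)
for k, c in enumerate(counts):
    print(k, c, precision_recall_f1(c))
```

In a multiclass setting each class is treated one-versus-rest: the diagonal cell gives TP, the rest of the row gives FN, the rest of the column gives FP, and everything else is TN.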
How well the CNNs learned the particularities of the patterns of each disease was also analysed with the ROC (Receiver Operating Characteristic) curve, which evaluated the viability of employing a network as opposed to a mean probability value. The area under the curve (AUC) was another measure of performance that, in this study, allowed the weaknesses of the CNNs in solving for multiple classes to be visualised. At the end of the analysis, comparative tables were presented to list the performance of each of the convolutional networks.
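A per-class AUC can be computed without plotting the curve, using the equivalent rank interpretation: the AUC is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below is a minimal one-versus-rest illustration; the function name `auc_one_vs_rest` and the six scores are hypothetical and not taken from the study.

```python
def auc_one_vs_rest(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5          # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical softmax outputs for one class over six test images.
scores      = [0.90, 0.80, 0.35, 0.60, 0.20, 0.10]
is_positive = [True, True, True, False, False, False]
auc = auc_one_vs_rest(scores, is_positive)
print(round(auc, 3))  # 0.889
```

An AUC of 1.0 means the class is perfectly separated from the rest, while a value close to 0.5 means the scores for that class are no more informative than chance.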

Results
First, it should be noted that one of the core theorems in machine learning (the No Free Lunch theorem) states that no single algorithm is best for every case, whether generic or specific. This is due to the close relationship between predictive models, databases, and optimisation. Thus, if any classifier performs better than another for one class of problem, it will be supplanted for other classes [29], acquiring, in the space of all possible problems, a performance equalling that of any other classifier, including random search [30]. That said, the results presented here were considered based on efficiency during the diagnosis of these 10 non-simultaneous phytosanitary problems, visually detectable in RGB images.
All nine of the implemented state-of-the-art models were able to distinguish between asymptomatic leaves and those that presented one of the nine anomalies under evaluation, with the performance varying according to the network. This reinforces the idea that the image bank used is, in fact, complex and demands a certain sophistication in the learning methods. The model predictions are listed in Table 1. GoogLeNet and Inception-v3 stood out the most due to their excellent overall accuracy compared to the other architectures. Among the networks with a hit rate greater than 90% are VGG-16 (93.0%), GamaNNet (91.0%), and ResNet-152 (91.0%). The other CNNs showed reasonable ability given the complexity to which they were exposed, but even so, AlexNet and VGG-19 reaffirmed their superiority for discrimination compared to the LeNet-5 and ResNet-50 networks, which had the worst rates. Based on the training time of each network, the best trade-off was seen with AlexNet (7.03 min) and GamaNNet (14.02 min), which reached high accuracies over short intervals of time, suggesting better efficiency compared to the other architectures. In fact, LeNet-5 was trained in the shortest recorded time (2.17 min); however, it should be noted that the marked resizing produced the lowest spatial resolution among all the architectures under evaluation, a fact that considerably sped up the convolutional processes of the model.
The effect of this time-efficiency trade-off on training can be seen in lower computational costs and in the speed of improvement that such networks can achieve, in addition to the depth of the network, a reasonable number of trained parameters, and the non-linearity (ReLU) applied after each convolutional layer. According to [31], this activation function alone can reduce the training time when compared to other functions such as 'Tanh' and 'Sigmoid'.

ResNet-50
The contrast found between the highest accuracy during validation and the overall accuracy during the test phase is a result of the non-uniform performance of the algorithms when solving multiclass problems, and, in that sense, ResNet-50 showed the greatest discrepancy. A detailed analysis of the accuracy and loss curves (Figure 4) shows that the best performances were intermittent, i.e., learning was not uniform. In contrast to other deep learning models, this architecture was not successful in progressively assimilating the variations in each disease. It rarely achieved validation accuracies greater than 90%, and, as such, the fluctuating rates of diagnostic accuracy compromised its overall performance.
The amplitude also suggests that the strategy of transfer learning with the ImageNet database did not result in effective gains for this classification. In the referenced study by [31], the authors also applied transfer learning, in their case to the classification of five principal diseases in the foliage of chilli plants, and their proposed model yielded remarkable diagnostic accuracy (98.63%) with no data augmentation.
As defined by [32], computer vision techniques are exceptionally well suited for the task of identifying plant disease, given that plant pathologies are predominantly identified through their visual manifestations, which encompass the shape, appearance, and texture of the affected areas. However, with this dataset, the plants are diagnosed by regional disturbances in the leaves, and, in this case, the position and shape of the spots are not always consistent. The possible randomness shown by each class was definitely a limiting factor for this CNN.
The Supplementary Materials show the high ambiguity of the architectures during label assignment. In the confusion matrix, the columns represent the predictions of the classifier and the rows the correct diagnoses. The diagonal cells summarise the assertiveness of the architecture in classifying correctly, while the off-diagonal entries show the mistakes made by the algorithm.
The classes are identified as (A) Bacterial Leaf Spot, (B) Alternariosis, (C) Healthy, (D) Late Blight, (E) Leaf Mould, (F) Septoria Leaf Spot, (G) Spider Mite, (H) Corynespora Leaf Blight, (I) TYLCV, and (J) ToMV. The ROC curve is a probability curve, while the AUC represents the degree of separability, i.e., it indicates how well the model is able to distinguish between classes. The higher the AUC value in Figure 5b, the better the model is at correctly diagnosing the pathology (Figure 5a). The AUC values of this architecture were the lowest in this study for Spider Mite (0.53) and Septoria Leaf Spot (0.67), which were correctly classified only slightly better than under a random criterion (mean probability: 10%). In the test, the infestations of Spider Mite (class G) were confused with Late Blight (D), Leaf Mould (E), and Corynespora Leaf Spot (H). With the exception of the phytoviruses, samples from all the classes were mistakenly diagnosed as Late Blight (D). The poor ability of the architecture was also demonstrated by its attributing the condition of Healthy (C) to 96 plants with phytosanitary problems.
Estimating the nutritional value of rice leaves using images, ref. [33] concluded that their ResNet-50 model obtained superior performance to AlexNet, VGG architectures, and even GoogLeNet. The superiority of ResNet-50 for certain classes of problems was also confirmed by [34] when detecting nitrogen deficiency in maize leaves, comparing the established methods of texture vectorisation with deep learning. The authors concluded that ResNet-50 was superior to the Fourier, Gabor, LBP, and Fractal methods and to some of their variations, as well as supplanting GoogLeNet and VGG-19.
The international literature generally highlights the capability of ResNet-50 in problems of homogeneous textures with binary or multiclass solutions, as is the case with leaves that are more or less chlorotic. Studying rust against healthy leaves, ref. [35] highlighted that ResNet-50 models have the capability to focus on small areas which other models do not handle well. However, in the case of irregular patterns such as plant pathologies, its performance was severely compromised. ResNet-50 was not efficient in learning similar symptoms, showing a tendency to include leaves with Septoria (F) in the Alternariosis class (B) and part of the latter in the Late Blight class (D). A similar result was reported by [36] when evaluating the in situ recognition of ten pests (insects) on the leaf tissue of different crops. The authors discarded this network because it achieved the lowest percentage of correct answers for that level of complexity.

GoogLeNet
Also known as Inception-v1, GoogLeNet was initially published by [21], and has four convolutional layers, four max pooling layers, three average pooling layers, and five fully connected layers, applying dropout and the ReLU activation function following the convolutions. Despite its modest depth (22 layers), GoogLeNet stands out for having a significantly smaller number of network parameters (Table 1; Supplementary Materials), the second smallest among the networks compared.
The convolutional neural networks that were most adept at discerning the symptoms discussed here were GoogLeNet and Inception-v3, with AUCs of 0.984 and 0.986, respectively. This is a reasonable result since both networks share the same convolutional structure. Identifying changes in leaf tissue due to phytosanitary problems, ref. [37] concluded that among the architectures under test, GoogLeNet was notably superior to the other networks (including Inception-v3) for such tasks.
In the confusion matrix of the GoogLeNet network (Figure 6a) it can be seen that for the largest error event, 12 plants with Alternaria (B) were wrongly diagnosed between Bacterial Leaf Spot (A), Late Blight (D), Septoria Leaf Spot (F), and Corynespora Leaf Blight (H), the latter being confused with Spider Mite in 11 samples. In view of this, it should be noted that GoogLeNet also has structural limitations, such as those highlighted by [36] when reporting on the combination of the small and large filters (1 × 1 + 3 × 3 + 5 × 5) that this network uses. Small filters (3 × 3) in the convolutions are ideal for capturing detailed information, which is assumed to be the reason some particularities are not very well learned and only later recognised by this CNN.

Inception-v3
The Inception-v3 network is composed of several symmetrical and asymmetrical blocks, where each block has several convolutional branches and sequential layers as seen in GoogLeNet. Among the differences between these versions is the size of the filters used in the convolutions. Inception-v3 dropped the larger (5 × 5) filters and replaced each of them with two smaller (3 × 3) filters. Comparing the two networks, [38] explain that in GoogLeNet, average pooling is used at the end of the network, averaging each feature map from (7 × 7) to (1 × 1). In Inception-v3, which is 20 layers deeper, an efficient reduction in grid size was proposed as an alternative to the max pooling layers. As such, the feature maps are obtained individually by convolution, and at the end are concatenated and forwarded to the next module.
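The effect of replacing one 5 × 5 filter with two stacked 3 × 3 filters, and of averaging a 7 × 7 feature map down to 1 × 1, can be sketched with simple arithmetic. The channel width of 64 and the helper names are hypothetical values chosen for illustration; they are not taken from either network.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Weights in one square convolution, ignoring biases."""
    return kernel * kernel * in_channels * out_channels


def receptive_field(kernels):
    """Receptive field of stacked stride-1 convolutions."""
    field = 1
    for k in kernels:
        field += k - 1
    return field


def global_average_pool(feature_map):
    """Collapse one 2-D feature map (list of rows) to a single value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


channels = 64  # hypothetical width, identical for input and output
single_5x5 = conv_weights(5, channels, channels)
stacked_3x3 = 2 * conv_weights(3, channels, channels)
saving = 1 - stacked_3x3 / single_5x5

print(single_5x5, stacked_3x3, round(saving, 2))
print(receptive_field([5]), receptive_field([3, 3]))

# A 7 x 7 feature map is reduced to one number, i.e. a 1 x 1 map.
feature_map = [[2.0] * 7 for _ in range(7)]
print(global_average_pool(feature_map))
```

Two stacked 3 × 3 convolutions cover the same 5 × 5 neighbourhood as a single 5 × 5 convolution while using 18 instead of 25 weights per input-output channel pair, which is the saving shown by the sketch.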
However, these improvement strategies proved to be irrelevant to the conditions of this research, given the overall equivalence between the accuracies of the two networks. In a synoptic approach, [38] noted that they achieved superior performance metrics for GoogLeNet when comparing it to Inception-v3 in a classic interpretation of handwritten digits.
Analysing the results obtained by the third generation of Inception (Figure 7), the loss of sensitivity of the algorithm in separating samples with visually correlated symptoms was obvious, especially in the case of Corynespora Leaf Blight (H) and Late Blight (D). In the classification by Inception-v3, 13 plants with class (D) symptoms and 17 with class (H) symptoms were labelled as Alternariosis (B). Additionally, ten samples from the remainder of the class (H) were diagnosed as 'positive' for Septoria Leaf Spot (F).

GamaNNet
Various guidelines were defined with the idea of modelling a new architecture that was computationally economical and had good discriminatory ability. Essentially, the network should rationalise the number of trainable parameters, and categorical accuracy during the validation phase should be greater than 90%. The optimal architecture was, therefore, identified as that where the highest accuracy was associated with the lowest computational cost, showing the best compromise.
The input dimension was the original dimension of the database since we sought to preserve the sharpness and quality of the images. As such, RGB images (three colour channels) with dimensions of 256 × 256 were fed to the network. A sequential model allowed layer-by-layer implementation. Although a kernel dimension of (3 × 3) is the most common, iterations showed that a smaller kernel (2 × 2) would be more advisable, as more characteristic patterns are brought to the fore. As the symptoms of the diseases are irregular and, most often, localised, the smaller kernel size was adopted, even though it triggers larger volumes of information within the network.
Six convolutional layers were sequenced with an initial number of 32 filters, progressively doubling in value until the final convolutional layer (1024), followed by four layers of max pooling and two fully connected layers. Batch normalisation took place after the final convolution. The strategy of increasing filters gradually allowed the information in each piece of data to be split in half after each convolution. The Rectified Linear Unit (ReLU) activation function was used after each convolution, reducing training time and allowing only values greater than zero to be transmitted between the layers. The max pooling layers that reduce the dimensionality of the data were also standardised at (2 × 2).
The flattened layer at the end of the six convolutional layers condensed all the parameters loaded so far, and two densely connected layers were implemented with 1024 and 10 neurons, respectively. Dropout regularisation was applied between the two dense layers, and the Softmax function was used because the values expected from the final neurons are probabilities.
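To make the filter progression described above more concrete, the following sketch counts the trainable parameters of six 2 × 2 convolutions whose filters double from 32 to 1024, plus the final 10-neuron dense layer. It is only an illustration under stated assumptions: the text does not give the padding, the strides, or the exact position of the four pooling layers, so the size of the flattened vector (and hence the first dense layer and the network total) is deliberately not computed here. The helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Trainable parameters of a square convolution: weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * out_channels


def dense_params(in_features, out_features):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return (in_features + 1) * out_features


filters = [32, 64, 128, 256, 512, 1024]  # doubling progression from the text
kernel = 2                               # 2 x 2 kernels, as described
in_channels = 3                          # RGB input

per_layer = []
for out_channels in filters:
    per_layer.append(conv2d_params(kernel, in_channels, out_channels))
    in_channels = out_channels

conv_total = sum(per_layer)
output_layer = dense_params(1024, 10)    # 1024-neuron dense layer into 10 classes

print(per_layer)
print(conv_total, output_layer)
```

Because each layer doubles the number of filters, the parameter count roughly quadruples from one convolution to the next, so the last convolution dominates the convolutional part of the budget.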
In total, the network has 12.245 million parameters, of which only 512 are non-trainable. Training took place over one hundred epochs, during which lower losses and higher categorical accuracies were found. The classifier showed no loss of sensitivity. During the validation of the optimal architecture, accuracy oscillated close to 90%, indicating the high performance of this network when working with symptomatic tomato leaves (Figure 8).
The confusion matrix and ROC curve prepared for this network also displayed susceptibility when labelling plants affected with Alternariosis (Figure 9), which were diagnosed as Late Blight (D) and Corynespora Leaf Blight (H). In fact, the latter received far more incorrect samples than any other class. As for improvement, GamaNNet was satisfactorily able to detect Bacterial Leaf Spot (A) and healthy leaves (C), as well as the viruses TYLCV (I) and ToMV (J). In this experiment, the performance matched that of the extremely deep ResNet-152 network, with 58 million parameters, and outperformed classic networks like ResNet-50, LeNet-5, VGG-19, and AlexNet.
From the computational point of view, the features extracted at the end of each convolutional layer include important information that must be passed on to the next convolutional block. As a scientific curiosity, the most relevant features that describe the symptoms of Bacterial Leaf Spot (A) were sequenced to allow the by-products transmitted between the hidden layers to be visualised. Figure 10 shows different features extracted from the GamaNNet convolutions when interpreting one class during training. This process, however, is intrinsic to any convolutional architecture, with similar (and even more sophisticated) abstractions being developed at greater processing depths.
Right from the first layer, it can be seen that the network is trying to learn any patterns that may be available in the images; for this reason, all the features of an RGB image can be recognised in a single extraction. However, as we delve deeper into the convolutional layers, the extractions begin to focus on specific disease patterns only, whether on the marginal blight of the leaves or scattered points on the leaf blade, around the edges or even on the most obvious veins. The results of the max pooling layers played a fundamental role in these abstractions, as they were able to sweep the image to reduce the dimension of the data while preserving the most relevant information in the previous dimension.
Each feature emphasised only one characteristic of the image at a time, and therefore, each extraction was able to list various features in sequence as fractions of one original piece of data. This means that the deeper the convolutional network, the more abstract the features it highlights. For shallower layers, the information may still be intelligible, but by the end of the convolutions, the features are humanly indecipherable.
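The way ReLU and 2 × 2 max pooling shrink a feature map while keeping its strongest responses can be sketched on a small example. The activation values are hypothetical, and a stride of 2 (non-overlapping windows) is assumed because the text specifies only the 2 × 2 window size.

```python
def relu(feature_map):
    """Keep positive activations and set the rest to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Non-overlapping 2 x 2 max pooling (assumed stride 2) on an even-sized map."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        pooled_row = []
        for c in range(0, len(feature_map[0]), 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4 x 4 activations produced by one convolutional filter.
activations = [
    [-1.0, 2.0, 0.5, -3.0],
    [0.0, 1.0, 4.0, 0.2],
    [-2.0, -0.5, 0.1, 0.3],
    [3.0, -1.0, -0.4, 0.0],
]
pooled = max_pool_2x2(relu(activations))
print(pooled)
```

Each pooling step halves the height and width of the map, so only the strongest response in every 2 × 2 neighbourhood is passed on to the next layer.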

Discussion
As defined by [39], plant pathologies are predominantly identified through their visual manifestations, which encompass the shape, appearance, and texture of the affected areas. Yet, it should be emphasised that low diversity could affect the training ability of the models and reduce the ability to classify a small number of plant diseases correctly. Moreover, as noted by [40], the generalisability of certain models is called into question, given their evaluation of a restricted range of flora and their constrained capability in detecting diverse diseases within a single specimen.
The international literature is rich in simulations with public datasets where almost all the CNNs have an overall accuracy greater than 99.0%. However, 'failures' in the interpretation of new images are also welcome in a classification model because they suggest congruent and generic reasoning by the CNN, even if the reasoning is not always accurate. Although the best classifiers are desirable, it is necessary to prioritise any substantial ability of the algorithm to generalise, i.e., ensuring good results even in the face of new, equivalent, but diverse data. Overfitting any model masks the reliability of replicating the methodology using unseen data.
More detailed inspections allowed each architecture to be characterised in terms of its limitations. Sensitivity corresponds to the assertiveness of the CNN computing the proportion of the plants that were correctly diagnosed among the total of plants diagnosed for any given class. Specificity refers to the proportion of the plants correctly diagnosed in any given class among all the plants in the dataset that actually have the disease, i.e., that are in both this and the other classes. In short, sensitivity and specificity assess the positive and negative effectiveness of the algorithm for each class, while the F1-score groups these two perspectives into a single index.
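The three indices can be read directly from a confusion matrix. The sketch below follows the definitions given in this section, where sensitivity is the share of correct diagnoses among all plants assigned to a class and specificity is the share of correct diagnoses among all plants that truly belong to it; in much of the machine learning literature these two ratios are called precision and recall. The 3 × 3 matrix and the function name are hypothetical and are not results from this study.

```python
def per_class_metrics(confusion, index):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    correct = confusion[index][index]
    predicted_as_class = sum(row[index] for row in confusion)  # column total
    actually_in_class = sum(confusion[index])                  # row total
    sensitivity = correct / predicted_as_class if predicted_as_class else 0.0
    specificity = correct / actually_in_class if actually_in_class else 0.0
    total = sensitivity + specificity
    f1 = 2 * sensitivity * specificity / total if total else 0.0
    return sensitivity, specificity, f1


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
confusion = [
    [90, 5, 5],
    [10, 80, 10],
    [0, 20, 80],
]
sens_b, spec_b, f1_b = per_class_metrics(confusion, 1)
print(round(sens_b, 3), round(spec_b, 3), round(f1_b, 3))
```

In this toy case the middle class receives 25 samples from other classes, which lowers its sensitivity, and loses 20 of its own samples to other classes, which lowers its specificity; the F1-score is the harmonic mean of the two.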
According to Figure 11, it can be seen that Alternariosis and Corynespora Leaf Spot are a dead end for both networks. Inception-v3 is more likely to misclassify plants of other classes as having Alternariosis or Septoria Leaf Spot. The same occurs with GoogLeNet, which tends to classify plants as having Bacterial Leaf Spot or Corynespora Leaf Blight. These two scenarios reduce the sensitivity of the classifier. As for the specificity (recall) of Inception-v3, it can be seen that the diseases that most deviated from their proper label were Late Blight and Corynespora Leaf Blight, which migrated to other classes. The criticality of these metrics highlights the uncertainty of submitting the classifier to plants with these symptoms. In GoogLeNet, only Alternariosis showed this more acute behaviour.
Achieving its greatest accuracy at 36 epochs, GamaNNet shows robust performance with a validation accuracy of 93.31% and an overall test accuracy of 91.0%, supported by an AUC of 0.949, indicating its effectiveness in plant disease diagnosis. By analysing the duration and the number of parameters (Table 1), we can infer that GamaNNet offers a competitive prediction speed, as it requires fewer computational resources and provides a faster overall processing time. Specifically, the average number of parameters per minute of training (0.873 million) is significantly higher than that of other state-of-the-art architectures, reflecting its optimised design for faster and more efficient predictions. This efficiency not only reduces training time but also translates to faster predictions in practical applications. In summary, GamaNNet provides a satisfactory balance between accuracy and computational efficiency.
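The parameters-per-minute figure can be checked with a short calculation. The sketch below uses the parameter counts (in millions) and training durations (in minutes) reported in this study for GamaNNet and ResNet-152; the dictionary layout and function name are illustrative only.

```python
def params_per_minute(million_params, minutes):
    """Millions of parameters divided by training duration in minutes."""
    return million_params / minutes


# (parameters in millions, training duration in minutes), as reported in the text
networks = {
    "GamaNNet": (12.245, 14.02),
    "ResNet-152": (58.352, 135.68),
}

ratios = {
    name: round(params_per_minute(params, minutes), 3)
    for name, (params, minutes) in networks.items()
}
print(ratios)
```

For GamaNNet the ratio is 12.245 / 14.02 ≈ 0.873, matching the value quoted above, while the same calculation for ResNet-152 gives roughly 0.430.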
In a complex task, one less problematic class can be decisive, meaning less risk of errors by the classifier and, consequently, greater probabilities of a correct diagnosis. Given the performance of both architectures and the limitations discussed above, the authors recommend applying the GoogLeNet network for similar activities of detection and foliar diagnosis due to the levels of sensitivity and specificity being maintained above 90%.
By inspecting our database, it is possible to understand the difficulties in discerning the classes of the BDFH group of plant pathologies (Figure 12). Any mistake is understandable since there is greater similarity between these classes and greater dissimilarity between the others. Added to this is the greater or lesser degree of severity that the leaf tissue may present.
Recognising that the focus of the present study on 10 non-simultaneous plant quarantine problems is insufficient to capture the full spectrum of plant symptoms, it is acknowledged that there are crucial limitations regarding the generalisability and broader agricultural applicability of the model. Future studies should encompass a wider array of plant diseases and collect images under varied field conditions, capturing natural variability in disease presentation, such as differences in lighting, background, and severity stages. These enhancements aim to ensure the efficiency, robustness, and reliability of the model in diverse real-world scenarios.

Conclusions
In light of the objectives established in this investigation, it can be posited that each of the models employed in this study exhibits distinct advantages as well as limitations. Nonetheless, under the conditions of the study, the GoogLeNet model demonstrated the highest accuracy in classifying diseases in tomato leaves.
With a hit rate surpassing 90% and offering the second most-favourable trade-off among the evaluated CNN models, these findings indicate that the performance of the GamaNNet architecture is competitively efficient when compared with the accepted architectures such as VGG-16 or ResNet-152. Given the complexity to which it was subjected, the GamaNNet model was underscored by its balance between accuracy and computational resource utilisation, making it an effective tool for practical applications in precision agriculture.
Future endeavours of this research consist of implementing disease classification for a broader array of crops, leveraging in situ data collection through smartphone technology. By extending this research to recognise phytosanitary problems in other crops, the authors hope to join the emerging AI movement in the field and have a valuable impact on ensuring food security and optimising crop production.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriengineering6030153/s1, Table S1. Symptomatic table of the classes included in this study. Table S2. Summary table of the hyperparameters included in training each CNN. Table S3. Evaluation metrics for classification by the networks during the test phase.
In order to reduce any bias in the image bank during classification, the samples were evenly distributed (n = 1000) between the classes. The contrasting background highlighted leaf tissue under varying levels of severity in infections by (i) Bacterial Leaf Spot; (ii) Alternariosis; (iii) Late Blight; (iv) Leaf Mould; (v) Septoria Leaf Spot; (vi) Corynespora Leaf Blight; (vii) Tomato Yellow Leaf Curl (TYLCV); (viii) Tomato Mosaic (ToMV); and infestation by (ix) Spider Mite. The individual samples of the regimented dataset are shown in Figure 1 and their classes are described in Table S1 (Supplementary Materials).

Figure 2. Workflow for classifying plant diseases in tomato leaves.

Figure 3. Representation of the general structure of convolutional neural networks (CNN).

Figure 4. (a) Accuracy during the training and validation of ResNet-50; (b) loss during the training and validation of ResNet-50.

Figure 8. (a) Accuracy during GamaNNet training and validation; (b) loss during GamaNNet training and validation.

Figure 10. Feature extraction after each convolutional layer of the GamaNNet architecture.

Figure 11. (a) Sensitivity, specificity, and F1-score for GoogLeNet; (b) Sensitivity, specificity, and F1-score for Inception-v3.

The GamaNNet network maintains a high input dimension (256 × 256), preserving detailed information crucial for classification. With 12 layers, it balances complexity and efficiency, achieving high accuracy without the long training times of ResNet-152 (135.68 min) and VGG-19 (85.42 min). GamaNNet trains in 14.02 min and has 12.245 million parameters, making it more efficient than ResNet-152 (58.352 million parameters). Achieving its greatest accuracy at 36 epochs, GamaNNet shows robust performance, with a validation accuracy of 93.31% and an overall test accuracy of 91.0%, supported by an AUC of 0.949, indicating its effectiveness in plant disease diagnosis. By analysing the training duration and the number of parameters (Table 1), we can infer that GamaNNet offers a competitive prediction speed, as it requires fewer computational resources and provides a faster overall processing time. Specifically, its ratio of parameters to training time (0.873 million parameters per minute, i.e., 12.245 million parameters over 14.02 min) is significantly higher than that of other state-of-the-art architectures, reflecting its optimised design for faster and more efficient predictions. This efficiency not only reduces training time but also translates to faster predictions in practical applications. In summary, GamaNNet provides a satisfactory balance between accuracy and computational efficiency.

In a complex task, one less problematic class can be decisive, meaning less risk of errors by the classifier and, consequently, a greater probability of a correct diagnosis. Given the performance of both architectures and the limitations discussed above, the authors recommend applying the GoogLeNet network for similar detection and foliar diagnosis activities, since its sensitivity and specificity remain above 90%.

By inspecting our database, it is possible to understand the difficulties in discerning the classes of the BDFH group of plant pathologies (Figure 12). Any mistake is understandable, since there is greater similarity between these classes and greater dissimilarity between the others. Added to this is the greater or lesser degree of severity that the leaf tissue may present.
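The parameters-per-minute figure quoted for GamaNNet can be checked against the values reported in the text. The following is a minimal sketch, not the authors' code: the dictionary `reported` and the helper `params_per_minute` are hypothetical names introduced here, and only the two architectures whose parameter counts and training times both appear in the text are included.

```python
# Values quoted in the text: (parameters in millions, training time in minutes).
# Only architectures with both figures stated in the prose are listed.
reported = {
    "GamaNNet": (12.245, 14.02),
    "ResNet-152": (58.352, 135.68),
}


def params_per_minute(params_millions: float, minutes: float) -> float:
    """Return millions of parameters per minute of training (hypothetical helper)."""
    return params_millions / minutes


# Compute the ratio for each architecture and print it to three decimals.
ratios = {name: params_per_minute(p, t) for name, (p, t) in reported.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f} million parameters per training minute")
```

Under these assumptions, the GamaNNet ratio rounds to the 0.873 stated in the text and exceeds the ResNet-152 ratio. Note that the ratio is derived from training time, so inference speed would still need to be measured separately on the target hardware.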


Table 1. Summary of parameters and performance for each architecture.
* The best-performing models for this class of problems.