Algorithms and Models for Automatic Detection and Classification of Diseases and Pests in Agricultural Crops: A Systematic Review

Abstract: Plant diseases and pests significantly influence food production and the productivity and economic profitability of agricultural crops. This has led to great interest in developing technological solutions that enable timely and accurate detection. This systematic review aimed to find studies on the automation of processes to detect, identify, and classify diseases and pests in agricultural crops. The goal is to characterize the algorithms and models used and to understand the efficiency of the various approaches and their applicability. The literature search was conducted in two citation databases. The initial search returned 278 studies and, after removing duplicates and applying the inclusion and exclusion criteria, 48 articles were included in the review. As a result, seven research questions were answered, allowing a characterization of the most studied crops, diseases, and pests, the datasets used, the algorithms, their inputs, and the levels of accuracy achieved in the automatic identification and classification of diseases and pests. The most noticeable trends are also highlighted.


Introduction
Plant diseases and pests are among the main factors influencing food production and are responsible for significantly reducing the physical or economic productivity of crops. To control production losses and maintain crop sustainability, measures such as constant monitoring of the crop, combined with rapid and accurate diagnosis of the associated diseases, pests, or anomalies, must be carried out properly. These practices are usually recommended by specialists in plant pathology [1]. Farmers are aware of these challenges and of the role technology can play in addressing these threats to increase agricultural productivity and operating profits. Technological progress has enabled the use of techniques and methods capable of optimizing agricultural returns [2], preserving natural resources [3], reducing unnecessary use of fertilizers [4], and identifying diseases in crops from remote sensing images [5]. Automatic detection, identification, and classification of diseases and pests in crops have attracted considerable attention from researchers, and numerous studies now propose distinct methods to approach this problem. This growing interest is visible in the number of results returned by databases of scientific articles. As of March 2023, the query "automatic plant diseases detection" retrieves 605 articles from Scopus and 341 articles from the Web of Science database. The number of results has also increased exponentially in recent years. Additionally, the results of many of these studies show the potential of this kind of solution in automatically detecting and classifying diseases and pests in crops and their potential applicability in solutions to

Related Work
In recent years, several studies have reviewed works related to the automatic detection, identification, and classification of diseases and pests in agricultural crops. To better understand the work that has been carried out in this area, a search was conducted in the Scopus database. After some initial experiments and a preliminary analysis of the literature, five groups of terms were identified and used in the search. First, terms related to the type of study, in this case, review. Second, the term "automatic". Third, terms related to pests, plagues, or diseases. Fourth, terms related to agriculture, crops, leaves, or plants, as they represent the study area or the plants or plant parts where diseases or pests are typically visible. Finally, classification, identification, or detection terms were also included.
The complete string for the search is as follows:
review AND automatic AND (pest* OR plague* OR disease*) AND (crop* OR leaf* OR plant* OR agricul*) AND (classification OR identification OR detection)
The database search was conducted in February 2023 and considered the article title, abstract, and keywords fields. Only studies published in 2012 or later were considered. The search returned 98 studies. These studies were evaluated in terms of title and abstract, resulting in the exclusion of 78 studies. Most were excluded because they were not reviews; of the others, five had a focus that differed from the intended one and three had no full text available. The remaining 20 studies underwent qualitative and quantitative analysis. Table 1 summarizes some of their characteristics, namely the year in which the article was published, the main objective of the review, the crop(s) analyzed in the review, the number of studies included in its analysis (when the analysis covers several areas, only the studies that are focused on identification or classification of diseases or pests are considered), and the time span of studies considered in the review. The interest that automatic detection, identification, and classification of diseases and pests in agricultural crops has attracted, and the growing importance it has gained in recent years, is evident. After applying the inclusion and exclusion criteria, all remaining studies had been published between 2020 and 2022, most of them in 2022. Furthermore, more than two-thirds of the studies only analyzed articles published in the last ten years (published in 2013 or later). This highlights the interest that this study area has attracted in recent years. It also indicates that it is still a new area of research that has only recently become the focus of significant interest.
ML approaches are very popular for automatically detecting and monitoring diseases and pests. Eighty percent of the reviews specifically address the use of ML algorithms and, more specifically, deep learning algorithms.
Regarding the type of crops studied, 40% of the articles describe reviews of studies that specifically addressed a single crop (e.g., tomato, rice, cotton, potato). The remaining 12 (60%) reviews analyze studies related to multiple crops.
The number of studies analyzed in each review is quite diverse (see Figure 1). Twenty percent of reviews analyze 10 studies or fewer, and 45% analyze 20 studies or fewer. Considering the number of studies that have appeared in recent years, the number of studies included in these reviews is small. On the other hand, 40% (eight reviews) of the reviews analyze more than 40 studies. Of these, three reviews analyze specific crops (tomato, maize, and grains). The other five reviews analyze multiple crops. One of them ([25]) presents a review of sensors for the automatic detection and monitoring of insect pests, which is different from the focus of the review presented in this article.
Considering the dimension of the review, i.e., the number of studies included in each review and the scope, the remaining four reviews ([10,12,19,21]) have some similarities with the review presented in this article. However, some of them focus on approaches that use specific algorithms: [10] focuses on CNNs in the detection of plant leaf diseases, and [12,19] focus on deep learning-based plant disease detection. In these cases, a fundamental difference from the approach presented here is that they only considered approaches that used ML algorithms, which, from the outset, only allows conclusions to be drawn within this specific area. Other differences concern the parameters considered in the analysis and the research questions investigated. In this sense, this work represents a step forward concerning other related works, thus representing a significant contribution to this study area.

Methodology
This section reviews studies that addressed automatic detection and identification systems for diseases and pests in agricultural crops. The review was carried out following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [1]. It includes the following steps, which correspond to the section or subsection of this article listed after each step:

• Intended goals of the review (Section 1).
• How the search was conducted (Section 3.1).
• Screening for inclusion (Section 3.2).
• Analysis and Discussion (Section 5).
• Writing the review.
The following subsections describe the procedure used to identify the related work, the data sources and keywords used in the search, and the processes of selection, data extraction, and analysis used in this work.
For the analysis of the related work, articles that address the use of algorithms for automatically identifying diseases and pests in crops were studied. After the first results were obtained, a selection process eliminated articles that were irrelevant or out of context. The relevant articles were then analyzed according to predefined parameters.
Appl. Sci. 2023, 13, 4720

Search Strategy
The Scopus [26] and Web of Science [27] databases were used as data sources. These databases are among the most complete in several areas and provide an advanced search that allows users to restrict search words to different fields, such as the title, the keywords, or the full text, among others. They also allow search terms to be combined with the logical operators AND, OR, and NOT. In this way, access was gained to a significant part of scientific work in informatics.
In collecting the first sample of articles, the keyword "automatic" was combined with terms related to diseases or pests ("pest", "plague", "disease"), terms related to crops ("crop", "leaf", "plant", "agriculture"), and terms related to identification ("classification", "identification", "detection"). The wildcard symbol "*" was appended to some keywords so that the search also matched terms derived from them. Thus, the string for the search is as follows:
automatic AND (pest* OR plague* OR disease*) AND (crop* OR leaf* OR plant* OR agricul*) AND (classification OR identification OR detection)
The search was carried out on 5 December 2022 in the title or keywords field of the document, and 278 results (253 from Scopus and 25 from Web of Science) were obtained.
This research included studies published in the last 10 years (since 2013), studies published in a scientific peer-reviewed publication, and studies written in English.
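The search string reported in this section can be assembled from its term groups, which may help when adapting it to another database. The following Python sketch is illustrative only: the function name `build_query` and the grouping structure are assumptions made for this example and are not part of the authors' tooling.

```python
# Illustrative sketch: names and structure are assumptions, not the authors' tooling.

def build_query(required, groups):
    """Join required keywords and OR-groups of terms with AND."""
    parts = list(required)
    for terms in groups:
        parts.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(parts)


# Term groups taken from the search string reported in this section.
GROUPS = [
    ["pest*", "plague*", "disease*"],                    # diseases and pests
    ["crop*", "leaf*", "plant*", "agricul*"],            # crops and plant parts
    ["classification", "identification", "detection"],   # task
]

query = build_query(["automatic"], GROUPS)
print(query)
```

Passing `["review", "automatic"]` as the required keywords yields the string used for the related-work search. Field restrictions (title, keywords) and the date and language filters are database-specific and are not modeled here.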

Screening for Inclusion
In this screening, the reviewers considered that studies should only be included in the review if they met the following criteria: (1) Studies that presented a solution for automatic detection and identification systems for diseases and pests in crops; (2) Studies with full text available.
Studies that did not meet both criteria were excluded. After removing duplicated articles (23), 255 studies remained. After applying criterion (1), 143 more records were excluded. At this stage, the reviewers did not judge the quality or evaluate the information found in each study. After applying criterion (2), 29 more records were excluded. This resulted in 83 studies.

Screening for Exclusion
After a more in-depth reading of the 83 articles, 35 were eliminated for lacking sufficient information or being outside the intended focus: reviews (6) and incomplete information (29). This left a total of 48 articles to analyze.
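The counts reported across the screening stages can be checked with simple arithmetic. The sketch below uses only the figures stated in the text; the function name `apply_exclusions` and the stage labels are informal names chosen for this example.

```python
# Counts are those reported in the text; labels and function name are illustrative.

def apply_exclusions(initial, exclusions):
    """Return (label, records remaining) after each exclusion step."""
    remaining = initial
    trail = []
    for label, removed in exclusions:
        remaining -= removed
        if remaining < 0:
            raise ValueError(f"step '{label}' removes more records than remain")
        trail.append((label, remaining))
    return trail


EXCLUSIONS = [
    ("duplicates removed", 23),
    ("criterion (1) not met", 143),
    ("criterion (2) not met (no full text)", 29),
    ("excluded after full-text reading", 35),
]

trail = apply_exclusions(278, EXCLUSIONS)
for label, remaining in trail:
    print(f"{label}: {remaining} remaining")
```

The sequence 278, 255, 112, 83, 48 is consistent with the totals given for the identification, screening, and inclusion stages.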

Results Summary
As shown in Figure 2, the literature search yielded 255 papers after removing 23 duplicates; this is the 'identification' stage in the diagram. Applying the inclusion criteria described in Section 3.2 'Screening for Inclusion' (the 'screening' stage of the diagram) excluded 172 papers, leaving 83 papers.
A full-text evaluation of the papers was performed, thus excluding papers that did not match the intended focus, some that were just a review, and papers that did not have complete information; this step is represented in the figure as "eligibility." The 48 papers remaining at the end were featured in the synthesis and were the "included" studies in the flowchart.
This study focused on finding papers directly linked to identifying, detecting, and eventually automatically classifying anomalies in agricultural crops. In this context, the following papers show several examples of this process where several techniques were presented as a solution. Some datasets were also mentioned, where one can see the origin of the information in each experience presented. In addition, it takes into account that one of the main focuses of the authors in their research was to highlight the accuracy of the algorithms or models used to classify diseases or pests in crops.

Data Extraction and Analysis
After selecting the articles that met the inclusion and exclusion criteria, the data extraction stage followed. At this stage, all those articles were fully read and analyzed according to criteria to extract information that allows answering the previously identified research questions. Thus, each of the 48 selected articles was analyzed and summarized considering the following criteria:
• Year of publication.
• What type of approach is described (algorithm or end user application)?
• What types of crops is it intended for?
• What are the inputs for the proposed algorithms?
• What algorithms are used?
• What information is used for the training and validation of the algorithms?
• What results were obtained in terms of accuracy and diseases or pests identified?
Next, the analysis of the 48 articles is summarized, considering these perspectives. The characteristics of the included articles are summarized in Tables 2 and 3. Table 3 identifies the dataset, the proposed algorithm, and the accuracy achieved by each of the algorithms described in each article.
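The extraction criteria listed in this subsection amount to a fixed record per article. A minimal sketch of such a record in Python is given below; the class name `ExtractionRecord`, its field names, and the example values are hypothetical and do not correspond to any article in Tables 2 and 3.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ExtractionRecord:
    """One row of extracted data per reviewed article (hypothetical schema)."""
    year: int
    approach: str                                         # "algorithm" or "end user application"
    crops: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)       # e.g. leaf or insect images
    algorithms: List[str] = field(default_factory=list)
    dataset: Optional[str] = None                         # None when not reported
    accuracy: Optional[float] = None                      # None when not reported
    targets: List[str] = field(default_factory=list)      # diseases or pests identified

    def missing_fields(self) -> List[str]:
        """Names of criteria the article left unreported (None or empty list)."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, [])]


# Placeholder values for illustration only; not taken from any reviewed study.
record = ExtractionRecord(
    year=2020,
    approach="algorithm",
    crops=["tomato"],
    inputs=["leaf images"],
    algorithms=["CNN"],
)
print(record.missing_fields())
```

Tracking unreported fields explicitly makes it straightforward to see why some research questions are answered over fewer than 48 studies.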

Discussion
In this section, some details and results of the review are discussed. This discussion will follow the answers to the research questions initially proposed in this review.
Although the search strategy considered works published in the last 10 years, after applying the inclusion and exclusion criteria, only studies published since 2017 (last six years) remained to be included in the review. This reveals the interest that this research area has attracted in recent years, but it also indicates that it is still a new area of research.
Most of the studies included in the review (43/48, 89.6%) describe algorithms and models with a focus on analyzing the performance of these algorithms. Only 10.4% (5/48) of the analyzed studies present applications (web, mobile, or robotic systems) that end users can use. This seems to indicate that a significant part of the research effort has been focused on studying new algorithms in several crops and on trying to achieve high levels of accuracy. The development and presentation of applications that end users can use in real environments has been less significant. However, these approaches may represent solutions that are closer to the reality in which they can be used, and they need to be further investigated and validated in real-world scenarios.
There is a significant predominance of ML-based approaches regarding the analyzed algorithms and models.
Answers to each of the research questions targeted by this study (described in Section 1) are presented below, based on the results of the systematic review presented earlier.
The review shows that the crops most often addressed in studies of automatic detection of diseases and pests (RQ1) are tomato and citrus. Tomato crops are studied in 22.5% (9/40) of the analyzed studies. Citrus was analyzed in 15.0% (6/40) of the studies. Rice, grapes, beans, and corn are in third place, with 7.5% each (3/40). Then come apple, peach, pepper, cotton, and paddy, with 5.0% each (2/40), followed by cucumber, banana, cassava, peanut, sunflower, herbs, raspberry, strawberry, brinjal, tea, mustard, coffee, soybean, sugar, guava, and lemon, with 2.5% each (1/40). For this question, only 40 studies were considered because eight did not provide consistent information.
The automatic detection of diseases has attracted more attention than the detection of pests (RQ2). The review shows that 83.3% (40/48) of the studies focus on diseases, and 18.8% (9/48) of the studies refer to plagues or pests (one of the studies refers to both diseases and pests). The most studied diseases were those of the tomato crop (the most studied crop): ToLCNDV and ToLCGV (begomovirus infections), early blight, and late blight. The most commonly studied pests or plagues were wheat mites, wheat aphids, wheat sawflies, and rice plant hoppers.
All analyzed studies use images of leaves, fruits, plants, or insects as input to their algorithms for detecting diseases and pests. Leaf images were the most common input for detection, identification, and classification (RQ3): 85.4% (41/48) of the studies refer to this input. Next, 8.3% (4/48) of the studies refer to insect images, 4.2% (2/48) to fruit images, and 4.2% (2/48) to plant images. Additionally, practically all algorithms that use images of leaves use images in which the leaf is the main element and usually occupies practically the entire image area. Only a minority of studies use leaf images captured from vehicles such as UAVs. The datasets used for pest detection include leaf images and insect images from sticky traps.
Generally, the analyzed approaches use image datasets to train and validate the proposed algorithms. In these cases, image datasets that are publicly available can be used, or new datasets can be constructed and used. The analysis of the studies revealed that PlantVillage is the most used dataset (RQ4). It was used in 24.4% (11/45) of the studies. In sequence, 8.9% (4/45) of the studies used the Kaggle dataset. Other datasets were also used, each one in one study: PlantDoc Middlebury dataset, Heilongjiang Academy of Land Reclamation Sciences-China, Plant health, NBAIR dataset, Xie1 and Xie2 dataset, Coffee Leaf dataset, PlantPathology, CIFAR-10 dataset and a dataset from paddy farmlands situated at UAS, India. A significant number of studies, 46.7% (21/45), used self-collected datasets. Three studies did not provide information about the datasets used.
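The percentages reported for the research questions are simple shares, but the denominator changes between questions (40 studies for crops, 48 for diseases and inputs, 45 for datasets). The following sketch recomputes a few of the reported shares from the counts given in the text; the helper name `share` and the dictionary labels are assumptions made for this example.

```python
# Counts are taken from the text; the helper and labels are illustrative.

def share(count, total):
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 1)


REPORTED = {
    "tomato (RQ1)": (9, 40),
    "citrus (RQ1)": (6, 40),
    "leaf images (RQ3)": (41, 48),
    "PlantVillage (RQ4)": (11, 45),
    "self-collected datasets (RQ4)": (21, 45),
}

for label, (count, total) in REPORTED.items():
    print(f"{label}: {count}/{total} = {share(count, total)}%")
```

Because categories can overlap (a study may cover more than one crop or input type), shares within a question need not sum to 100%.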
Regarding the algorithms or models most used (RQ5), the review shows that CNN models are the most commonly used in this area. About 54.2% (26/48) of the studies refer to CNN models, namely Faster R-CNN, EfficientNet, VGG, GoogLeNet, MobileNet, ResNet, AlexNet, LeNet, and DenseNet. Next, 16.7% (8/48) [...]
The level of accuracy achieved in detecting, identifying, and classifying diseases or pests in agricultural crops (RQ6) depends on several factors, namely the crop and the diseases and pests considered. The highest levels of accuracy were reported for three crops. For tomato, the highest accuracies range from 90.3% to 99.89%. For citrus, accuracies range from 88.96% to 98%. Finally, for potato, accuracies range from 89% to 97%. These values also depend on the disease or pest being studied.
This review allowed us to identify some trends, observed either because an approach is used in a significant number of studies or because interest in it has grown over the years (RQ7). CNN-based algorithms tend to be the most commonly used to achieve the objectives in this area (see Figure 3, left). ML-based approaches are becoming popular in developing solutions for plant disease and pest detection: approximately 93.8% (45/48) of the studies in this review proposed ML-related approaches. In addition, the number of articles on ML-based approaches tends to increase yearly, indicating that their popularity may grow further. A deeper analysis shows that, within the last decade, the most-used algorithms began to appear in studies from 2017 onwards, after which the use of CNNs increased relative to the other algorithms (see Figure 3, right).
Accuracy depends heavily on the crops, diseases, and pests considered, the algorithms used, and the datasets used. However, the results show that very high levels of accuracy can be achieved for some diseases and pests in several crops. This suggests that these approaches could be used in applications deployed in natural environments. Most of the datasets consist of images of leaves taken with cameras, where the leaf is the central object of the photo, although the backgrounds can vary. This means that the solutions are trained and validated on images obtained in scenarios in which a user has to approach the plant, bring the camera close to the leaf, and take the picture, which is not always consistent with realistic use. There are still very few cases in which the images are obtained using cameras on land or air vehicles, which would represent a step forward toward solutions better suited to the reality of agricultural crops. However, this type of approach could be important in the future, since solutions that take this type of image as input would operate closer to real conditions in crop fields and could eventually be used in real time.
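The discussion above reports accuracy figures and occasionally uses the word "precision" for them. Under the conventional definitions these are different quantities, which matters when comparing results across studies. The sketch below assumes those conventional definitions (the reviewed articles may compute their metrics differently), and the labels are made-up placeholders rather than data from any study.

```python
# Conventional metric definitions; toy labels are placeholders, not study data.

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, positive):
    """Fraction of samples predicted as `positive` that truly are `positive`."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted:
        return 0.0
    return sum(t == positive for t in predicted) / len(predicted)


y_true = ["early_blight", "late_blight", "healthy", "healthy"]
y_pred = ["early_blight", "healthy", "healthy", "healthy"]

print(accuracy(y_true, y_pred))              # 3 of 4 predictions are correct
print(precision(y_true, y_pred, "healthy"))  # 2 of 3 "healthy" predictions are correct
```

In this toy case accuracy is 0.75 while precision for the "healthy" class is about 0.67, which illustrates why a single reported percentage is only comparable across studies when the metric is stated.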

Strengths and Limitations of this Review
This review followed the PRISMA methodology. It provides a systematic review of the existing works that address automatic detection, identification, and classification of diseases and pests in agricultural crops. This review is valuable because it presents an overview of the most-studied crops and a characterization of the algorithms and models used, their inputs, the datasets most commonly used to train and validate them, and the accuracy achieved. It also presents some trends that have been observed. It provides a basis for academics and researchers to understand this study area and develop new algorithms and applications.
However, it also has some limitations. The literature search was carried out using two databases (Scopus and Web of Science). These databases cover several domains and span many individual databases. However, other databases, such as IEEE Xplore, ACM Digital Library, PubMed, ScienceDirect, or BMC, could have led to more articles being included in the review. The search strategy may have influenced the number of articles considered in the study. For example, the search string used, the option to search only for articles written in English or only for articles published in the last ten years, may also have influenced the number of relevant articles considered. Although these limitations may have affected the number of articles obtained and considered in the review, we believe these constraints did not significantly affect the discussion and conclusions.

Conclusions
This systematic review aimed to find studies on automating processes for detecting, identifying, and classifying diseases and pests in agricultural crops. It followed the PRISMA methodology. The literature search was conducted in two abstract and citation databases (Scopus and Web of Science). The initial search returned 278 studies, and after removing duplicates and applying the inclusion and exclusion criteria, 48 articles were included in the review. All analyzed studies address the automatic detection, identification, and classification of diseases or pests in agricultural crops. This review identified the most studied crops and characterized the proposed algorithms, the results achieved, and the most-used datasets. It is intended to support researchers who plan to develop work in this area by characterizing the area of study and identifying some of the most noted trends. Considering the number of studies included in each review and the scope, few works are similar to this one. Of those that are most similar, some focus on approaches that use specific algorithms (mostly ML-based algorithms). This review did not have this prerequisite and, therefore, allows a broader discussion. Furthermore, it addresses different parameters and research questions. In this sense, it represents a step forward in relation to other related works, thus representing a significant contribution to this study area.
The results indicated that most of the studies focused on algorithms or systems based on various deep learning and ML techniques and that 95% of the studies focus on demonstrating the ability of specific algorithms and models to solve problems related to the automatic detection of diseases or pests. In all cases, it was necessary to use a dataset. Analysis showed that the PlantVillage dataset was the most commonly used. Models and classifiers such as CNN, SVM, k-NN, ANN, Random Forest, and others were trained on these datasets to classify the diseases and pests, with the aim of achieving better accuracy for each algorithm. The accuracy achieved depends on the diseases and pests in the agricultural crops. This review also made it possible to identify some gaps in the information reported by the studies, which caused difficulties in the research. More specifically, in some cases, the studies did not provide enough information about the dataset they used. In some cases, researchers were