A Systematic Review on Automatic Insect Detection Using Deep Learning

Abstract: Globally, insect pests are the primary reason for reduced crop yield and quality. Although pesticides are commonly used to control and eliminate these pests, they can have adverse effects on the environment, human health, and natural resources. As an alternative, integrated pest management has been devised to enhance insect pest control, decrease the excessive use of pesticides, and enhance the output and quality of crops. With the improvements in artificial intelligence technologies, several applications have emerged in the agricultural context, including automatic detection, monitoring, and identification of insects. The purpose of this article is to outline the leading techniques for the automated detection of insects, highlighting the most successful approaches and methodologies while also drawing attention to the remaining challenges and gaps in this area. The aim is to furnish the reader with an overview of the major developments in this field. This study analysed 92 studies published between 2016 and 2022 on the automatic detection of insects in traps using deep learning techniques. The search was conducted on six electronic databases, and 36 articles met the inclusion criteria. The inclusion criteria were studies that applied deep learning techniques for insect classification, counting, and detection, written in English. The selection process involved analysing the title, keywords, and abstract of each study, resulting in the exclusion of 33 articles. The remaining 36 articles included 12 for the classification task and 24 for the detection task. Two main approaches for insect detection, standard and adaptable, were identified, with various architectures and detectors. The accuracy of the classification was found to be most influenced by dataset size, while detection was significantly affected by the number of classes and dataset size. The study also highlights two challenges and recommendations, namely, dataset characteristics (such as unbalanced classes and incomplete annotation) and methodologies (such as the limitations of algorithms for small objects and the lack of information about small insects). To overcome these challenges, further research is recommended to improve insect pest management practices. This research should focus on addressing the limitations and challenges identified in this article to ensure more effective insect pest management.


Introduction
Insect pests cause between 20% and 40% of the world's agricultural production losses every year [1], making agricultural practices dependent on pesticides. With the rise of intensive agriculture, applying these chemical compounds has become the most profitable solution for crop protection [2]. Owing to the chemical properties of pesticides and their continued use over decades, there has been an increase in resistant pests, the poisoning of organisms, air and water pollution, and other health problems [3].
Insect monitoring is necessary for the early detection of pests to avoid the excessive use of pesticides [4]. In recent decades, the research community has developed integrated pest management (IPM) systems that reduce the overuse of pesticides by monitoring pests and applying precise amounts only when needed [5,6]. The main objective of insect monitoring is to provide farmers with a decision-making tool, contributing to the optimisation of their crops, increasing environmental sustainability, and improving the quality and yield of production [7]. One form of monitoring is detecting and counting insects captured in traps distributed across agricultural fields. In the typical approach, specialists recognise and manually count the insects caught in traps [4,8]. However, this task is very time-consuming, susceptible to errors, and sometimes subjective, since each trap may contain dozens of insects of different species [9].
Smart pest monitoring (SPM) has emerged with rapid advances in fields such as artificial intelligence (AI) and the Internet of Things (IoT), allowing automatic data acquisition, remote transmission, data processing, and decision-making [5,10]. AI algorithms improve data processing and propose hypotheses for increasingly accurate decision-making. AI is a general field that encompasses machine learning (ML) and deep learning (DL) [11]. ML is a type of AI that uses algorithms and statistical models to allow a system to improve its performance on a specific task over time. In other words, ML allows a system to learn from data without being explicitly programmed [11]. DL is a specific type of ML that involves the use of neural networks, which are algorithms inspired by the brain's structure. These networks are made up of many layers of interconnected nodes and can learn complex patterns in data. DL has been particularly successful in computer vision tasks such as image classification, segmentation, detection, and other tasks related to image recognition [12]. Several AI techniques for automatic insect detection and counting have been developed and published using data-driven methods such as DL. However, automatic detection and counting is still an open problem, and several challenges remain [4].
This study aimed to perform a literature review of DL methods for insect classification and detection. The review includes papers submitted up to 5 February 2022. For this review, 36 studies were chosen according to predefined criteria. These studies were carefully examined, and their methodologies, results, and database sources were thoroughly analysed. Through this analysis, the most successful methods were identified, and the study also highlighted open challenges and potential solutions. The focus of this research is to address the challenges identified and propose solutions to improve insect pest management practices, with the ultimate goal of achieving better and more effective results.
Regarding the novelty of this article, the following can be listed:
• The integration of deep learning techniques for automatic insect detection in traps;
• A systematic review and analysis of recent research on deep learning methods for insect detection;
• An investigation of the effectiveness of deep learning in addressing the challenges of traditional insect detection methods;
• A comparison of deep learning methods for insect classification and detection;
• The identification of key research gaps and opportunities for future work in this area.
These contributions respond to the following needs, which this work can help address:
• Insect infestations can cause significant crop losses and economic damage in agricultural production;
• Traditional methods of insect detection and control can be time-consuming, labour-intensive, and potentially harmful to the environment and human health;
• Deep learning techniques have the potential to improve the efficiency and effectiveness of insect detection, leading to more sustainable and profitable farming practices;
• A systematic review of recent research on deep learning methods for insect detection can provide valuable insights and guidance for future research and development in this field;
• The results of this study can help inform and improve the use of deep learning techniques for insect detection in practical applications.
This paper is structured as follows: Section 2 provides background on automatic image acquisition and on the evolution of insect detection and classification. Section 3 describes the research questions, the inclusion criteria, the research strategy, and the study characteristics. Section 4 presents the main findings in terms of methodologies developed for this specific purpose: DL-based applications for insect classification and detection are summarised, and the main challenges and gaps detected are provided. In Section 5, we discuss and summarise the results found. Finally, Section 6 presents the conclusion and recommendations for the future.

Theoretical Background
Pest control seeks to follow a diversified pest reduction strategy combined with other forms of control and the use of chemical components. A possible way to deal with some crop pests is by installing traps to attract insects [13]. Insect traps are essential elements of SPM. These can be sex pheromone traps, yellow sticky traps, and light traps [13]. The type of trap is chosen according to the kind of plantation or the pest to be monitored [14]. Qualified personnel inspect the traps frequently to determine the number of insects caught in each one. This requires regular travel to each location, which makes the work expensive [8]. On the other hand, traps can control large areas without interfering with crop quality as chemical compounds do. The main advantages of traps are their practical and reliable response for pest monitoring, the identification of the right time to intervene with pesticides, the identification and quantification of pests, and the reduction of costs and harmful effects on human beings, the environment, and natural resources [15]. Traps therefore yield information about the timing of the appearance and activity of certain pests and auxiliaries, allowing treatments to be carried out at the right time [16]. With the emergence of more sophisticated technologies, insects can be monitored through remote sensing, which is an asset for agricultural activity and enables real-time monitoring [17]. Image acquisition devices are installed in fields to monitor traps, and insect detection and classification techniques are applied to the images.
Several authors have proposed different SPM systems. The possibility of implementing these mechanisms and acquiring high-resolution images allows the remote control of pests, reduces the need for human resources, and allows decision-making at a distance. The resolution of the acquired images has a great influence on the methods applied in intelligent image processing [18].
Preti et al. [7] reviewed the evolution of insect pest detection in terms of methodology and equipment used. They observed that the first devices used to collect images in traps were optical sensors installed directly in the traps in 1985. With the integration of IoT, big data, AI, and other modern information technologies, it has been possible to develop and adapt various devices for pest monitoring. As shown in Figure 1, several IoT devices are installed at strategic points on the agricultural plot to collect images from the traps; the images are captured and stored on a server and later processed through digital image processing techniques and DL [4,19].
Ramalingam et al. [10] proposed a real-time remote monitoring system for insect traps based on IoT and DL. Saranya et al. [20] developed a methodology using image processing and a passive infrared sensor to detect the presence of insects by the heat radiated by their bodies; image processing is then used to capture images of the pest and confirm its presence in the field. Rustia and Lin [21] developed an image monitoring system connected via Wi-Fi, where each trap was equipped with a sensor and a camera placed 80 mm away. Every 10 min, an image was collected and sent to a remote server, where several insect detection and recognition algorithms were applied.
With pest monitoring through sensing, methods began to be developed for detecting and identifying pests based on image processing and ML techniques (summarised in Table 1). There are many different approaches to insect detection using ML, and different algorithms may be better suited to different tasks [25,26]. Qiao et al. [27] proposed a simple image processing system to automatically estimate the number of whiteflies on sticky traps. Initially, the noise was eliminated with a low-pass filter; then, the images were converted to grayscale and transformed into binary images. The authors tested ten different threshold levels to determine the optimal one. Pixels with a value greater than the defined threshold were set to white and the remaining pixels to black; thus, it was possible to detect the whiteflies. The method proved to be very effective for adult whiteflies. However, it only worked for whiteflies on sticky traps. Xia et al. [28] developed an automatic method for whitefly, aphid, and thrips identification in greenhouses. The method starts by using the watershed algorithm to segment insects from the background. With the Mahalanobis distance, the insects' colour characteristics were extracted to identify the species of different insects. Comparing the proposed identification with the manual identification performed by experts, correlations of 93.4%, 92.5%, and 94.5% were obtained for whiteflies, aphids, and thrips, respectively.
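The thresholding pipeline described for Qiao et al. [27] can be illustrated with a minimal sketch. It assumes a grayscale image stored as nested lists and bright insects on a darker background. The 3 × 3 mean filter, the 4-connectivity rule, and all function names are illustrative assumptions, not details taken from [27].

```python
from collections import deque


def box_blur(gray):
    """Apply a 3x3 mean filter as a simple low-pass filter (assumed filter type)."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Average over the neighbours that fall inside the image.
            vals = [gray[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def binarise(gray, threshold):
    """Pixels above the threshold become white (1); all others become black (0)."""
    return [[1 if v > threshold else 0 for v in row] for row in gray]


def count_blobs(binary, min_area=1):
    """Count 4-connected white regions with at least min_area pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            area = 0
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                count += 1
    return count


def count_insects(gray, threshold, min_area=1):
    """Low-pass filter, binarise, then count the remaining white regions."""
    return count_blobs(binarise(box_blur(gray), threshold), min_area)
```

In practice, the threshold would be chosen by comparing counts at several candidate levels against a manual count, in the spirit of the ten levels tested by the authors.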
Rustia and Lin [21] proposed an IoT-based remote monitoring system for pests on yellow sticky traps and developed image processing and ML algorithms. The images were divided into four regions and equalised using a histogram based on the brightness adjustment obtained from reference images. k-means clustering was then applied to each image after conversion to a colour space, yielding an image in which the insects and the background appear as black or white. Finally, the insects could be classified and counted. The method effectively produced accurate and automatic pest counts, obtaining an average accuracy of 98%. Classifying pests in corn, soybean, wheat, and canola is difficult due to the similarity between insect species; Xie et al. [29] proposed an insect recognition system using multiple-task sparse representation and multiple-kernel learning techniques. Their method was shown to perform well in classifying insect species, outperforming other methods. Ebrahimi et al. [31] and More and Nighot [30] implemented an approach based on the support vector machine for classifying and identifying pests. Most of these techniques showed good performance; however, they are only recommended for particular situations and are not adaptable to other scenarios because they cannot make intelligent decisions.
DL can learn and make decisions using algorithms inspired by the human brain, making it possible to adapt to more complex environments [19,32]. In recent years, DL has also started to be applied in agriculture. For example, DL algorithms can be used to analyse images of crops to identify pests or diseases or to monitor the growth and health of plants. This information can be used to optimise irrigation or fertilisation or to take other actions to improve crop yields. DL can also be used in other areas of agriculture, such as analysing data from field sensors. Several DL applications have thus emerged to solve challenges in the agricultural context, and the automatic recognition of pest images has become one of the leading research topics in DL [24].
The object detection task can be associated with two important concepts: (1) object classification; and (2) detection, as shown in Figure 2. Classification is the assignment of a class to the principal object in the image. Object detection consists of the localisation and classification of multiple objects in an image [33]. This technique uses rectangular bounding boxes to locate the objects and classify their categories [34]. Object detection is an important area of computer vision. It is crucial in many applications, such as video analysis, medical imaging, and vehicle, pedestrian, and face detection.
There are two significant groups of detectors: one-stage detectors and two-stage detectors. One-stage detectors, such as You Only Look Once (YOLO) [35] and the Single Shot Multi-Box Detector (SSD) [36], solve the detection task by directly predicting object categories and regressing object locations [33]. This approach does not require the region proposal process, so the detection is faster; however, the precision is generally lower than that of the two-stage architecture. Two-stage detectors initially extract the regions of interest from the input image and then classify the object and refine its location starting from the proposed regions; examples are Region-based Convolutional Neural Networks (R-CNN) [37], Fast R-CNN [38], Faster R-CNN [39], Mask R-CNN [40], and Cascade R-CNN [41]. Their most significant advantage is the high precision, and the disadvantage is the high detection time [34].
Figure 2. Examples of insect classification and detection tasks. The classification examples show small brown plant hoppers and aphids on plant images. The first detection example shows grape moths on a pheromone trap (image provided by [22]); the second shows army worms on plant images. The images of small brown plant hoppers, aphids, and army worms were adapted from the public dataset IP102 [42].

Research Questions
In this study, three essential research questions were considered:
• (RQ1) Which methods obtain the best mean average precision (mAP) for the task of insect detection?
• (RQ2) Which dataset variables have the most significant influence on detection?
• (RQ3) What are the main challenges of and recommendations for automatically detecting insects?

Inclusion Criteria
The study of methods of automatic detection of insects in traps was carried out considering the following criteria: (1) studies that apply DL techniques for insect classification; (2) studies that apply DL methods for automatic insect counting; (3) studies that apply DL methods for insect detection; (4) studies published between 2016 and 2022; and (5) studies written in English.

Search Strategy
This systematic review covered studies that met the inclusion criteria and were retrieved from the following electronic databases: IEEE Xplore, Scopus, MDPI, ScienceDirect, SpringerLink, and PubMed. The search terms used were "automatic detection of insects", "insect traps", "classification", and "DL". The studies were analysed to identify the various DL methods of automatic insect detection. The search was conducted on 5 February 2022.

Selection of the Papers and Extraction of Study Characteristics
Ninety-two studies were identified in these databases. After analysing all the studies, the selection for inclusion in the research was made as shown in Figure 3. Of the ninety-two articles initially identified, two were duplicates. After screening by title, keywords, and abstract, thirty-three articles were discarded because they did not cover insect detection and classification. A full-text assessment was then carried out against the inclusion criteria; consequently, twenty-one articles were excluded. Thus, the remaining thirty-six articles were analysed and included in this survey. Of the selected studies, twelve addressed the classification task and twenty-four addressed the detection task.
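The selection counts reported above can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the variable names are illustrative.

```python
# Counts as reported in the selection process.
identified = 92
duplicates = 2
excluded_on_screening = 33   # title, keywords, and abstract
excluded_on_full_text = 21   # full assessment against the inclusion criteria

screened = identified - duplicates                      # records left after de-duplication
assessed_in_full = screened - excluded_on_screening     # records read in full
included = assessed_in_full - excluded_on_full_text     # records in the review

classification_studies = 12
detection_studies = 24

# The included studies should split exactly into the two task groups.
assert included == classification_studies + detection_studies
print(screened, assessed_in_full, included)
```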

Results
The articles selected were divided into three topics: (1) the classification of insects with DL; (2) the detection of insects with DL; and (3) the challenges and recommendations found. For the first topic, the studies that described pest classification were briefly analysed, allowing the identification of the methodologies and architectures, the size of the dataset, and the results obtained. For the second topic, we analysed the papers that solved the detection task. A detailed analysis was then performed of eight studies considered interesting and promising. Finally, for the third topic, challenges and recommendations were presented.
For better organisation, the studies were separated into three tables. Table 2 summarises the studies focusing on the classification task, and Tables 3 and 4 focus on detection tasks. The tables show the data collected from each selected article: image scenario, the number of classes, dataset size, methods, architectures, and results. Through the detection tables, it is also possible to analyse the average inference times per image obtained in the test dataset of the studies that provided this information. To assess classification and detection performance, the results relied on accuracy and mAP as the respective metrics. These metrics were chosen based on their widespread use in evaluating classification and detection and were consistently employed across all the reviewed studies. This approach facilitated a more meaningful comparison of results across the studies.
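For readers less familiar with the two metrics, the following sketch shows one common way of computing them. It is an assumption-laden illustration, not the evaluation protocol of any reviewed study: detections are matched to ground truth at an intersection-over-union (IoU) threshold of 0.5, average precision is the area under the monotonically decreasing precision-recall envelope, and all names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for one class.

    detections: list of (image_id, score, box)
    ground_truth: dict mapping image_id to a list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp_flags = []
    # Visit detections from most to least confident.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # A true positive needs enough overlap with a not-yet-matched box.
        if best_j >= 0 and best_iou >= iou_threshold and not matched[img][best_j]:
            matched[img][best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # Replace each precision by the maximum precision at any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name to (detections, ground_truth)."""
    aps = [average_precision(d, g) for d, g in per_class.values()]
    return sum(aps) / len(aps)
```

Because published studies differ in IoU threshold and interpolation scheme, mAP values computed under different conventions are not strictly comparable.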

Classification of Insects with DL
Convolutional neural networks (CNNs) are feedforward neural networks in which the layers are connected along a path from the input to the output of the network. CNNs are inspired by biological processes, more specifically by the organisation of an animal's visual cortex [73]. This type of neural network is often applied in image recognition and video processing and has become the state of the art in object classification and detection problems. The disadvantage of CNNs is the need for large amounts of labelled data for feature extraction [74]. Several widely used CNN architectures are available.
Classifying insects is essential in many contexts and is an important premise of IPM in agriculture [4]. More than 1.02 million insect species have been described [75], making insect identification difficult and complex. Some of the applications include the classification of pests, diseases, and invasive species [76]. Table 2 summarises the data collected from each article on the classification of insect pests using DL. The selected studies were published between 2017 and 2021. All of these studies address classification in field images of plants.
To identify the most harmful cotton pests under field conditions, Alves et al. [50] presented a real dataset containing cotton field images, with 15 classes and 100 images. All images were resized to 224 × 224 pixels; as the dataset was small, they applied data augmentation and used a CNN with ResNet34 to classify major pests automatically. The method was trained on an NVIDIA GTX 1060 GPU and obtained a final accuracy of 97.8%. Cheng et al. [43] proposed the use of a CNN with ResNet101 to identify pests against the complex background of agricultural land. The dataset contained different angles and pest poses. All images were mirrored before being fed into the network, doubling the total amount of data. For 10 classes in 550 images of agricultural pests in complex backgrounds, an overall accuracy of 98.7% was reached.
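The mirroring step mentioned for [43] can be sketched in a few lines. The example assumes images stored as lists of pixel rows and a dataset of (image, label) pairs; the function names are illustrative and not taken from the cited study.

```python
def mirror(image):
    """Horizontally flip an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def augment_with_mirrors(dataset):
    """Return the original (image, label) pairs followed by mirrored copies.

    The label is kept unchanged, since a mirrored pest belongs to the same class.
    """
    return dataset + [(mirror(image), label) for image, label in dataset]


# Toy example: two 2x3 "images" become four samples.
toy = [([[1, 2, 3], [4, 5, 6]], "aphid"), ([[7, 8, 9], [0, 0, 0]], "moth")]
augmented = augment_with_mirrors(toy)
print(len(toy), len(augmented))
```

Mirrored copies are only a partial substitute for new images, since they add no new backgrounds or specimens.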
Kasinathan et al. [26] used a public dataset with 1387 images (rescaled to 227 × 227 pixels) and 24 classes in a highly complex background. First, image data augmentation techniques were applied, such as rotation, flipping, and cropping; second, they applied a CNN with an architecture proposed by them. The proposed CNN model contains five convolutional layers, three max-pooling layers, a flatten layer, a fully connected layer, and a softmax output layer. The authors' methodology reached 90.0% accuracy. To classify insect species in three publicly available insect datasets, Thenmozhi and Srinivasulu [44] proposed an efficient deep CNN model. The CNN architecture consisted of six convolutional layers, five max-pooling layers, one fully connected layer, and an output layer with softmax. All images were resized to 227 × 227 pixels, and data augmentation techniques were applied to avoid network overfitting. The models were implemented in MATLAB R2018a on an NVIDIA Quadro K2200 GPU. The proposed CNN model achieved its highest classification accuracies of 96.8%, 97.5%, and 95.9% for insect dataset 1 (40 classes), insect dataset 2 (24 classes), and insect dataset 3 (40 classes), respectively. Wang et al. [49] proposed a new model called CPANet that includes four convolution layers, six max-pooling layers, three inception modules, one average pooling layer, one fully convolution layer, and an output layer with softmax. The dataset used contains 20 classes in 4909 images. Before training, the data were augmented using image processing methods such as inversion, rotation, scaling, and Gaussian noise addition. The authors compared the proposed model with standard architectures such as VGG, InceptionV3, and ResNet50. All experiments were trained, validated, and tested on an NVIDIA GTX 1080Ti GPU. Their approach achieved the best accuracy, reaching 92.6%.
The success of DL depends in part on the amount of data. Sometimes the available data are scarce and private, or the costs associated with their acquisition or annotation are very high. In these situations, it is common to use transfer learning [77]. Transfer learning consists of using the knowledge learned for a task in one domain to improve the learning of another task in another domain [46]; i.e., a network is pre-trained on a large dataset, such as ImageNet [78] or MS COCO [79], and then applied to the target dataset [77]. If the source dataset is large and complete, the learned features can be useful for the problem to be solved [11].
There are two ways to use a pre-trained network: (1) fixed feature extraction and (2) fine-tuning [11]. Fixed feature extraction consists of removing the fully connected layers; that is, the convolutional layers of the pre-trained network are frozen and a new classifier is added. The classifier is then trained from scratch on the extracted features [80]. Fine-tuning consists of replacing and training the classifier that was added to the pre-trained network, and tuning part of the pre-trained network kernels through backpropagation [46]. Normally, the initial layers do not change, as they contain more generic features, while the later layers become more specific to the target dataset, so they are adjusted by backpropagation [77].
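The difference between the two strategies comes down to which layers receive gradient updates. The following framework-free sketch makes that explicit; the four-block backbone, the `Layer` class, and the function names are hypothetical stand-ins for a real pre-trained network, not the API of any DL library.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    trainable: bool = True


def pretrained_backbone():
    """Hypothetical backbone with four convolutional blocks (earliest first)."""
    return [Layer(f"conv_block_{i}") for i in range(1, 5)]


def fixed_feature_extraction(backbone):
    """Freeze every pre-trained block and train only a newly added classifier."""
    frozen = [Layer(layer.name, trainable=False) for layer in backbone]
    return frozen + [Layer("new_classifier", trainable=True)]


def fine_tuning(backbone, n_unfrozen):
    """Train the new classifier and also the last n_unfrozen pre-trained blocks.

    Earlier blocks stay frozen because they hold the more generic features.
    """
    cut = len(backbone) - n_unfrozen
    layers = [Layer(layer.name, trainable=(i >= cut)) for i, layer in enumerate(backbone)]
    return layers + [Layer("new_classifier", trainable=True)]


def trainable_names(model):
    """Names of the layers that would be updated by backpropagation."""
    return [layer.name for layer in model if layer.trainable]


print(trainable_names(fixed_feature_extraction(pretrained_backbone())))
print(trainable_names(fine_tuning(pretrained_backbone(), n_unfrozen=2)))
```

In a real framework, the same choice is usually expressed by disabling gradient computation for the frozen parameters; how many blocks to unfreeze is a design decision that depends on the size of the target dataset.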
To recognise ten types of pests present in rice plantations, Malathi and Gopinath [52] used fine-tuning and fixed feature extraction with several standard architectures. The dataset consists of 3549 images (resized to 227 × 227 pixels) of 10 pests that affect rice plantations. The fine-tuned ResNet50 model reached a better accuracy (95.0%) than the other models. Also for the classification of diseases and pests in rice plants, Rahman et al. [48] reached an accuracy of 97.1% using these techniques on a dataset of 1426 images covering eight different species of pests. All the images were resized to the default input size of each architecture before being used with it. Tetila et al. [46] analysed the performance of InceptionV3, ResNet50, VGG16, VGG19, and Xception for different fine-tuning and fixed feature extraction strategies on a dataset composed of 5000 images and 2 classes, captured under field conditions. They ran all experiments on an NVIDIA GTX 1070 GPU and showed that architectures trained with fine-tuning achieve higher accuracy, reaching 93.8% for ResNet50 with fine-tuning. Li et al. [45] presented a method to classify 10 common pest species; a fine-tuned GoogLeNet model was proposed to deal with the complex backgrounds. The approach was run on four Titan X 12 GB GPUs and reached an accuracy of 94.6%.
Pattnaik et al. [47] applied transfer learning with different pre-trained models for pest classification in tomato plants. The dataset was composed of 859 images categorised into 10 classes. The best performance was obtained using the DenseNet169 model (88.8% accuracy).
Chen et al. [53] and Karar et al. [51] used YOLOv3 and Faster R-CNN, respectively, but only for classification. To classify T. papillosa in the orchard, Chen et al. [53] applied YOLOv3 only as a classifier on a dataset composed of 700 images of T. papillosa. The input image resolution was 416 × 416 pixels. Data augmentation was applied and the parameters were adjusted to improve the model's learning. Their methodology was trained on an NVIDIA RTX 2070 GPU and reached an accuracy of 95.3%. Karar et al. [51] tested several detectors, such as Faster R-CNN and SSD, on a dataset with 500 images (of size 224 × 224 pixels) for classifying aphids, cicadellidae, flax budworms, flea beetles, and red spiders. All detectors were trained, validated, and tested on an NVIDIA GTX 1080 GPU. Faster R-CNN with the InceptionV2 architecture presented an overall accuracy of 99.0% for all pests tested.
Regarding the results obtained, the approach presented by Pattnaik et al. [47] achieved the lowest accuracy (88.8%), and the one presented by Karar et al. [51] achieved the highest (99.0%). However, as the authors used different datasets, direct comparisons may be unfair. Therefore, we analysed the impact of the number of classes and the size of the dataset on the results obtained. Comparing the best and worst results, the Karar et al. [51] method was applied to a dataset with 5 classes and 500 images, whereas the Pattnaik et al. [47] method was applied to a dataset with 10 classes and 859 images. In other words, the dataset used by Pattnaik et al. [47] has a greater variability of insect species, with the highest number of classes, which makes classification more difficult. These findings suggest that the number of classes is significant, but it can be difficult to determine the correct number of classes, especially when the classes are not well separated and are unbalanced.

Detection of Insects with DL
The detection of insect pests is an essential task in SPM and can provide farmers with a helpful decision-making tool [7]. Effective detection of insect pests improves the accuracy of applied amounts of pesticide, which can have a significant economic and environmental impact [5].
Twenty-four studies that address the detection of insect pests with DL were selected. The selected studies were published between 2016 and 2022. About 66.7% of the research covers the detection of insect pests in traps, and 33.3% covers detection directly on plants. The proposed methodologies can be divided into two groups: (1) standard detectors; and (2) combined/adapted methodologies. Standard detectors refer to architectures previously proposed by other authors, such as YOLO, Faster R-CNN, SSD, and others. The combined/adapted methodologies include modified architectures, adapted architectures, and combinations of several different methods.

Standard Detectors
Table 3 summarises the data collected in studies that used standard detectors. Chen et al. [21], Wang, Q. et al. [24], Yun et al. [61], and Zhong et al. [54] showed in their experiments that the YOLO, YOLOv3, and YOLOv5 architectures had the best performance. Butera et al. [62], He et al. [25], Hong et al. [59], Nieuwenhuizen et al. [55], and Ramalingam et al. [10] applied the Faster R-CNN architecture to different datasets and found that it performed best. He et al. [32] and Wang et al. [60] proposed approaches with SSD and Cascade R-CNN, respectively. Considering the challenges of detecting small insects, Shi et al. [57] and Sun et al. [56] used methodologies often applied to small-object detection, namely R-FCN and RetinaNet, respectively. Since the focus of this study is the use of DL to detect insects, five studies using standard detectors were selected. These studies are analysed in detail below.
He et al. [25] proposed a method for detecting the brown rice planthopper. The algorithm consists of two layers based on Faster R-CNN. The first layer seeks to identify the target region of the image; that is, it aims to identify the plant. The second layer aims to detect the brown planthopper; it was tested using Faster R-CNN with the VGG16 and ZF networks, and VGG16 showed the best results. The dataset contained 4600 images captured in rice plantations in a natural environment. Training, validation, and testing were performed on an Nvidia GeForce GTX 1060 GPU. The proposed model obtained an average precision (AP) of 94.6% with an inference time per image of 0.36 s. Detection with the two layers showed better results than detection using only one Faster R-CNN network. YOLOv3 was also tested and compared with their proposal; the overall performance of their model was better than that of YOLOv3.
Wang, Q. et al. [24] provide a standardised dataset of trap images for multiple agricultural pest targets. This dataset, called Pest24, consists of 25,378 high-resolution images with 24 major pest classes specified by the Chinese Ministry of Agriculture. They applied several state-of-the-art object detection methods: Faster R-CNN, SSD, YOLOv3, and Cascade R-CNN. For each technique, they initially used the default hyperparameter settings, and all experiments were run on a Linux server with an Nvidia Titan X (Pascal) GPU and 128 GB of memory. Then, they tried different hyperparameter values for the YOLOv3 method, which showed the best results. The k-means clustering algorithm was used to optimise the parameter's scaling range. The backbone of this method was Darknet-53. YOLOv3 obtained a mAP of 58.8%, proving to be the model that worked best in detecting the twenty-four insect species. Given the size of the dataset and the high number of classes, the authors considered adherence to objects, pest similarity, pest density, relative scale, and colour discrepancy as essential factors in the detection task. Relative scale is the factor that exerts the most significant influence on the detection AP, and colour discrepancy has the least impact.
Nieuwenhuizen et al. [55] presented a methodology to detect and count whiteflies, Macrolophus bugs, and Nesidiocoris bugs on sticky traps. The dataset contained 1350 high-resolution images captured under controlled light conditions in two different greenhouses. Faster R-CNN with Inception-ResNet-v2 obtained a mAP of 87.4%. The model was trained on an Nvidia 1080Ti GPU. The automatic counts were compared with those obtained by traditional counting; the correlation was greater than 0.95. However, the authors state that the quality of the data and annotations present in the images influenced the classification results.
Hong et al. [59] developed algorithms that detect and count Matsucoccus thunbergianae in pheromone trap images. The authors collected 50 images in the laboratory. The resolution of the images is 6000 × 4000 pixels, and the insect's average size is only 60 × 60 pixels. To address both the small size of the dataset and the small scale of the insect relative to the image, the images were cropped using two different grids, 12 × 8 and 6 × 4. In a cropped image, the insect is larger relative to the image size than in the uncropped image, and cropping also increases the number of images in the dataset. To compare and verify which architecture had the best performance, they trained Faster R-CNN with ResNet101, EfficientDet D4, RetinaNet50, and SSD MobileNetV2 on the two cropped datasets. The dataset with the 12 × 8 crop had better AP because object size relative to image size increased. A Quadro RTX-6000 GPU was used for training, validation, and testing. The model that obtained the best results was Faster R-CNN, with an AP of 85.6% at an IoU of 0.5 and an inference time per image of 0.078 s. The model with the shortest inference time was SSD, but its detection results were not as good as those obtained by Faster R-CNN.
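The effect of cropping on relative object size can be illustrated with a short calculation. The sketch below is not the authors' code: it assumes that "12 × 8" and "6 × 4" mean columns × rows of equal, non-overlapping tiles, and the helper names `tile_boxes` and `relative_area` are hypothetical. Under that assumption, a 6000 × 4000 image yields tiles of 1000 × 1000 pixels (6 × 4) or 500 × 500 pixels (12 × 8).

```python
def tile_boxes(width, height, cols, rows):
    """Return (left, top, right, bottom) boxes for a cols x rows grid of tiles."""
    tile_w, tile_h = width // cols, height // rows
    return [
        (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
        for r in range(rows)
        for c in range(cols)
    ]


def relative_area(obj_side, tile_w, tile_h):
    """Fraction of the tile area covered by a square object of side obj_side."""
    return (obj_side * obj_side) / (tile_w * tile_h)


# Image and insect sizes quoted in the text for Hong et al. [59].
WIDTH, HEIGHT, INSECT = 6000, 4000, 60

for cols, rows in [(1, 1), (6, 4), (12, 8)]:
    tiles = tile_boxes(WIDTH, HEIGHT, cols, rows)
    left, top, right, bottom = tiles[0]
    tw, th = right - left, bottom - top
    print(f"{cols}x{rows}: {len(tiles)} tiles of {tw}x{th} px, "
          f"insect covers {relative_area(INSECT, tw, th):.4%} of a tile")
```

A simple grid like this can cut insects that lie on a tile border; overlapping tiles are one common way to reduce that effect, although the text does not say whether the authors used them.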
Shi et al. [57] proposed an architecture based on the R-FCN method to detect eight species of insects that may be present in stored grains. The dataset consists of two parts: dataset 1 was collected in a laboratory environment (in traps) and has 1716 images, and dataset 2 has 784 images and was created to simulate the actual situation (in grains). R-FCN is an architecture similar to Faster R-CNN; the only difference is that the fully connected layers after RoI pooling are replaced with a set of position-sensitive score maps that perform average voting. The backbone of this method was DenseNet. The authors used multi-scale training and applied the soft-NMS algorithm [81]. Faster R-CNN and YOLO were applied for comparison with their proposed approach. All experiments were done on two NVIDIA TITAN XP GPUs. The model with the best results was their proposed one based on R-FCN, which obtained a mAP of 83.4% and an inference time of 0.124 s.
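For readers unfamiliar with soft-NMS, the following is a minimal sketch of its Gaussian variant: instead of deleting boxes that overlap a higher-scoring detection, it lowers their scores according to the overlap. It is an illustration only, not the implementation used by Shi et al. [57]; the function names, the `sigma` and `score_threshold` values, and the example boxes are assumptions.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian soft-NMS: decay overlapping scores instead of deleting boxes.

    Returns a list of (box, score) pairs in the order they were selected.
    """
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Pick the highest-scoring remaining detection.
        best_index = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best_index)
        kept.append((best_box, best_score))
        # Decay the scores of the remaining boxes by their overlap with it.
        decayed = []
        for box, score in candidates:
            overlap = iou(best_box, box)
            score *= math.exp(-(overlap * overlap) / sigma)
            if score >= score_threshold:
                decayed.append((box, score))
        candidates = decayed
    return kept


# Hypothetical detections: two heavily overlapping boxes and one separate box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.90, 0.80, 0.70]
for box, score in soft_nms(boxes, scores):
    print(box, round(score, 3))
```

In this example the second box is kept, but with a much lower score, which is the behaviour that can help when insects touch or overlap in a trap.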

Combined/Adapted Methodologies
Table 4 summarises the data collected in each study that used combined or adapted methodologies. Li, W. et al. [71], Liu et al. [69], and Tang et al. [72] proposed modifications to the original architectures: the first two proposed modifications to Faster R-CNN, and the third proposed modifications to YOLOv4. Liu et al. [64] proposed a new approach called PestNet, inspired by Faster R-CNN. R. Li et al. [67] and Li, W. et al. [66] developed approaches with a CNN and a Region Proposal Network (RPN); the former used a multi-scale model, training on images with different resolutions. Wang et al. [70] applied an RPN with balanced sampling, with the objective of extracting more detailed characteristics of the small insects. Rustia et al. [6] used YOLOv3 to spot all insects present in the image and then applied successive CNN classifiers to filter the insects detected initially. Ding and Taylor [63] ran a sliding window over all the images and classified the insects found at each position. Martins et al. [65] and Tetila et al. [68] performed the segmentation of all insects in traps and then classified the insect in each segmented location with a CNN.
From the selected studies, we chose three that used modified architectures, adapted architectures, or a combination of different methods, and analysed each in more detail. Liu et al. [64] developed a new method called PestNet, which consists of three main parts. The first stage consists of a CNN with channel-spatial attention; the objective is to extract and enhance image features. The second stage comprises an RPN that provides region proposals based on the features extracted in the first stage. In the third stage, the fully connected layers are replaced by a position-sensitive score map for classification and bounding-box regression. The dataset consists of 88,670 trap images of 16 different species. The authors tested the proposed methodology with different CNN architectures, such as VGG, ResNet50, and ResNet101, and compared it with other state-of-the-art methods such as Faster R-CNN and SSD. Their experiments were run on a GeForce GTX TITAN X GPU, and the best results were obtained with the ResNet101 backbone, with a mAP of 75.5% and an inference time of 0.441 s, surpassing the state-of-the-art methods.
To detect two species of fruit fly, Martins et al. [65] proposed a method in which they first applied a two-step segmentation method to segment areas containing insects, whether of the species under study or others. They then generated bounding boxes for each segmented region, trained several CNNs to identify the one with the greatest precision, and used it to identify and classify each bounding box. The dataset contained 662 sticky trap images, which yielded 22,479 bounding boxes after the initial segmentation. The network that obtained the best results for the insect classification task was ResNet18, with a mAP of 92.4% and an inference time per image of 0.145 s using an Nvidia Tesla T4 GPU.
W. Li et al. [71] developed a method based on Faster R-CNN, called 'TPest-RCNN', to automatically detect whitefly and thrips on sticky traps in greenhouse conditions. The dataset contained 1400 images. The proposed algorithm has two significant differences from Faster R-CNN: an improved anchor size, and a RoIPooling design adjusted to focus on small objects and thus obtain more exact locations. The backbone network is VGG16. The anchor size preset by Faster R-CNN is larger than the insect dimensions, so the authors adapted the anchor dimensions to the insect dimensions. RoIPooling was replaced by RoIAlign, inspired by the Mask R-CNN architecture. RoIPooling can produce a deviation between the final and initial positions of the bounding box, which may lead to wrong detections. To solve this, RoIAlign divides the proposed region into 4 × 4 pooling sections. Four sampling areas are defined for each section, with the centre point of each sampling area representing the sampling location. The values at these points are calculated using bilinear interpolation. Finally, max pooling is applied to each section. The methodology was trained on an NVIDIA Tesla K80 and obtained a mAP of 95.2%, surpassing the Faster R-CNN architecture.
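The RoIAlign steps described above can be sketched in plain Python as follows. This is an illustrative single-channel version under simplifying assumptions (region coordinates already expressed in feature-map units, pixel centres at integer coordinates); the names `bilinear` and `roi_align` are hypothetical and it is not the TPest-RCNN implementation.

```python
def bilinear(feature, x, y):
    """Bilinearly interpolate a 2-D feature map (list of rows) at real-valued (x, y)."""
    height, width = len(feature), len(feature[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, bins=4, samples=2):
    """Pool a region (x1, y1, x2, y2) into a bins x bins grid without rounding.

    Each bin is sampled at samples x samples regularly spaced points
    (2 x 2 = 4 points by default) and reduced with max pooling.
    """
    x1, y1, x2, y2 = roi
    bin_w, bin_h = (x2 - x1) / bins, (y2 - y1) / bins
    output = []
    for row in range(bins):
        out_row = []
        for col in range(bins):
            values = []
            for sy in range(samples):
                for sx in range(samples):
                    # Centre of each sampling cell inside the bin.
                    px = x1 + (col + (sx + 0.5) / samples) * bin_w
                    py = y1 + (row + (sy + 0.5) / samples) * bin_h
                    values.append(bilinear(feature, px, py))
            out_row.append(max(values))
        output.append(out_row)
    return output


# Toy 8 x 8 feature map and a region with non-integer borders.
feature = [[float(8 * r + c) for c in range(8)] for r in range(8)]
pooled = roi_align(feature, roi=(1.3, 2.6, 5.3, 6.6))
print(len(pooled), len(pooled[0]))
```

The key point is that the region borders (1.3, 2.6, ...) are never rounded to integer cells, which is the source of the positional deviation attributed to RoIPooling.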
Regarding the results obtained, ten studies reported results above 90.0%, seven between 80.0% and 90.0%, six between 70.0% and 80.0%, and only one below 70.0%. Li, W. et al. [71] had the best result, with 95.2%, and Wang, Q. et al. [24] obtained the lowest, with 58.8%. Analysing the impact of the number of classes and the size of the dataset, it is possible to verify that the mAP obtained decreases as the number of classes increases. For example, the method of Li, W. et al. [71] reached 95.2% on a dataset with 2 classes, whereas the method of Wang, Q. et al. [24] reached 58.8% on a dataset with 24 classes. It is worth noting that the dataset used in [24] was unbalanced and had high similarity between species, posing challenges for DL algorithms to learn from such data.

Challenges and Recommendations in Insect Detection
Despite much research being developed to detect insect pests using DL, some challenges remain unsolved and affect the results obtained. We can divide the challenges into two main groups: (1) datasets; and (2) methods of insect detection.

Datasets
Insects are the most biodiverse group of animals [82]. They present challenges related to their physical characteristics, such as size, the similarity between species, the different positions they can take in images, and the different morphological characteristics of the same insect. Insects are living beings of small dimensions. An image can have a high resolution, being represented by a large set of pixels, or a lower resolution, being represented by a smaller set of pixels. Wang, Q. et al. [24] show that the relative scale, that is, the size of the insects in proportion to the image, is the factor that exerts the most significant influence on the detection task. As shown in Figure 4, the trap includes dozens of insects that are each represented by few pixels. Therefore, regular replacement of traps and an increase in image resolution are encouraged.
Given the great biodiversity of insects, some species are very similar to one another; moreover, at the time of image capture, insects of the same species may be in different positions and at different life stages, with different morphological characteristics. These characteristics pose significant challenges for insect detection. For example, Figure 5 shows three different species, Armyworm, Bollworm, and Yellow tiger, with nearly identical morphological characteristics. It also shows two examples of different positions of the same insect in the same image, as well as examples of the same grape moth under different lighting conditions, which could be confused with another insect. Some images collected in the field with SPM systems may present challenges related to the background, lighting, and the appearance of shadows [66,83]. These challenges can be mitigated by carefully choosing the hour when an image is captured, choosing strategic points for the placement of SPM devices, and avoiding areas with trees and shadows. The dataset can be based on plants or traps. Detection or classification can be more difficult in plants because, in traps, the background is uniform, whereas in plants the background contains different elements that can interfere with the performance of the model. Another characteristic of these systems is the acquisition of very similar images, since image collection is continuous.
Figure 5. Examples of similarity between species, different positions of the same insect, and the different colours of the same insect. Images from the dataset Pest24 adopted from [24] and provided by [22].
In order to achieve human-level results, DL methodologies require large datasets for training models [4]. However, there is a shortage of public databases that are diverse, labelled, and of sufficient size for insect classification. Furthermore, the class distribution of insects is often unbalanced [32]. To address this challenge, it is important to encourage the collection and publication of images, as well as the development of semi-supervised methodologies [84]. Semi-supervised learning is particularly useful when not all data can be labelled, as an effective semi-supervised model can outperform a supervised model [85].
The composition of datasets in certain ecosystems is often unbalanced due to an uneven distribution of insects, with some classes having a lack of data and others having a greater amount of data [86].This can negatively affect the learning of classification and detection models, as the samples with greater representation may lead to the model being biased towards the majority class within the ecosystem, resulting in poor generalisation [87].This issue of unbalanced datasets, which is commonly encountered in real-world applications, can have a significant impact on the performance of deep learning algorithms for classification and detection [87].The conventional methods typically used for learning these models are not well-suited to imbalanced datasets, and as a result, existing classifiers tend to exhibit bias towards the majority class due to the unequal class distribution [86].
The use of synthetic data in the classes with the smallest number of images or the implementation of focal loss in the models can be very useful in solving this challenge but the effectiveness of these methods has not yet been thoroughly studied [88,89].
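To make the focal loss idea concrete, the sketch below uses the commonly cited form FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. It is a simplified per-sample version with a single `alpha` weight (in practice the weight usually depends on the class), and the default values `gamma=2.0` and `alpha=0.25` as well as the example probabilities are assumptions for illustration.

```python
import math


def cross_entropy(prob_true_class):
    """Standard cross-entropy for one sample."""
    p = min(max(prob_true_class, 1e-12), 1.0)  # avoid log(0)
    return -math.log(p)


def focal_loss(prob_true_class, gamma=2.0, alpha=0.25):
    """Focal loss for one sample, given the predicted probability of its true class."""
    p = min(max(prob_true_class, 1e-12), 1.0)  # avoid log(0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


# An easy, well-classified sample (typical of a majority class) versus a hard one.
easy, hard = 0.95, 0.10
print("cross-entropy easy/hard ratio:", cross_entropy(easy) / cross_entropy(hard))
print("focal loss    easy/hard ratio:", focal_loss(easy) / focal_loss(hard))
```

The factor (1 - p_t)^gamma shrinks the contribution of easy samples far more than that of hard ones, so the many well-classified majority-class examples dominate the total loss less; with gamma = 0 and alpha = 1 the expression reduces to ordinary cross-entropy.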

Methods of insect detection
Insect detection is a challenging computer vision task. Small objects are those that occupy areas less than or equal to 32 × 32 pixels. While many detection methods give good results for medium and large objects, their performance is not as good when detecting small objects [90].
To reduce the computational cost, high-resolution images are often resized, so that objects represented by few pixels end up losing useful information. Most of the algorithms used in the object detection task are based on CNNs. After the convolutional layers, pooling layers are applied to downsample the feature maps, thus reducing their dimensions. Because of this characteristic of CNNs, and as small objects are represented by only a few pixels, the features extracted for them in the initial layers are lost in the deeper layers [90].
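A short calculation shows how quickly a small object shrinks on the feature map. The sketch assumes that each pooling stage halves the spatial resolution (stride 2) and uses five stages, as in VGG16; the helper name and the example object sizes other than 32 and 60 pixels are illustrative.

```python
def side_after_downsampling(side_px, num_stride2_layers):
    """Side length (in feature-map cells) of an object after repeated 2x downsampling."""
    return side_px / (2 ** num_stride2_layers)


# Object sides in pixels: the 32 px small-object limit, a 60 px insect, a larger object.
for side in (32, 60, 256):
    sizes = [side_after_downsampling(side, n) for n in range(6)]
    print(f"{side:>3} px object:", sizes)
```

After five halvings, a 32 × 32 pixel object covers a single feature-map cell and a 60 × 60 pixel insect covers fewer than two cells per side, which leaves very little spatial detail for the detector.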
Li et al. [71] proposed the replacement of RoIPooling with RoIAlign, since pooling can discard part of the pixels, causing incorrect detections or even the non-detection of insects. This method showed promising results, with a mAP of 95.2%, the best detection result among the studies in this review. The Faster R-CNN architecture proposed by Ren et al. [40] has predefined anchors that are too large for insect detection, which affects detection results. Thus, several authors proposed anchor optimisation; that is, they adjusted the dimensions of the anchors to the size of the insects to be detected. To solve the problem of insect size, Hong et al. [59] cropped the images using two different grids, 12 × 8 and 6 × 4, and this methodology achieved a mAP of 85.63% with Faster R-CNN, significantly better than Faster R-CNN without image cropping. In the cropped image, the insect is represented by a larger share of the pixels, which facilitates the detection task.
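One common way to adapt anchor dimensions to the objects in a dataset is to cluster the annotated box sizes with k-means using 1 - IoU as the distance. The sketch below illustrates that idea only; it is not necessarily the procedure followed by the reviewed authors, and the function names, the deterministic initialisation, and the box sizes are hypothetical.

```python
def wh_iou(box, anchor):
    """IoU of two (width, height) pairs aligned at the same corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    return inter / (box[0] * box[1] + anchor[0] * anchor[1] - inter)


def kmeans_anchors(box_sizes, k, iterations=50):
    """Cluster (width, height) pairs with distance 1 - IoU; return k anchor sizes."""
    # Deterministic initialisation: spread initial anchors over the sizes sorted by area.
    ordered = sorted(box_sizes, key=lambda wh: wh[0] * wh[1])
    step = len(ordered) / k
    anchors = [ordered[int(i * step)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in box_sizes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: wh_iou(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for anchor, members in zip(anchors, clusters):
            if members:
                new_anchors.append((
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                ))
            else:
                new_anchors.append(anchor)  # keep an empty cluster's anchor unchanged
        if new_anchors == anchors:
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda wh: wh[0] * wh[1])


# Hypothetical annotated insect sizes in pixels (width, height).
insect_boxes = [(18, 20), (22, 19), (20, 24), (55, 60), (62, 58), (58, 66)]
print(kmeans_anchors(insect_boxes, k=2))
```

The resulting anchors follow the two size groups in the annotations (roughly 20 and 60 pixels here), which is far smaller than the default anchors designed for general-purpose object detection.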
There is an increased need to deal with scale problems in the task of detecting small objects. One way to address this challenge is to rescale input images to many different scales and use a separate detector for each scale. Tong et al. [90] have identified seven methodologies that address this challenge: featurised image pyramids, a single feature map, a pyramidal feature hierarchy, integrated features, a feature pyramid network, feature fusion and feature pyramid generation, and a multi-scale fusion module. As is known, DL methodologies perform better on large datasets. The same applies to small objects, whose detection can also be improved by increasing the number of samples. Data augmentation is intended to produce additional data through transformations, including flipping, cropping, rotation, scaling, and other techniques. The context in which the object is detected can play an important role in the performance of the methodologies. CNNs learn hierarchical representations that carry contextual information; however, in the detection of smaller objects, there is still room to enhance the contextual information through learning [90]. There are three types of context-based methods: local context, global context, and context interaction; examples of architectures include CoupleNet [91], R-FCN++ [92], and Context-SVM [93], respectively. The contextual information learned by these methodologies can help or harm performance. The Generative Adversarial Network (GAN) [94] model consists of a generator network and a discriminator network. The generator learns the characteristics of the true data and generates a new sample. The discriminator compares the generated data with the real data. GANs can benefit the detection of small objects, as the generator improves the samples of small objects by increasing resolution, and the discriminator competes with the generator.
Some authors have already proposed methods that can contribute to the detection of small objects considering these crucial aspects. Shi et al. [57] proposed an architecture based on R-FCN, a multi-scale feature learning architecture. Tang et al. [72] used various data augmentation methods to increase the diversity of training samples; the feature extraction network obtains feature maps at different scales, and the feature fusion network performs feature fusion based on the multi-scale feature maps. Wang et al. [70] developed an adaptive approach to learning features from different levels of the feature pyramid.
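As an illustration of the data augmentation mentioned above, the sketch below applies a horizontal flip to a toy image and updates its bounding box accordingly, since in detection tasks the annotations must be transformed together with the pixels. The image representation (a list of pixel rows), the box convention (x1, y1, x2, y2 with exclusive right and bottom edges), and the function names are assumptions made for this example.

```python
def hflip_image(image):
    """Mirror an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in image]


def hflip_box(box, image_width):
    """Mirror an (x1, y1, x2, y2) box (exclusive right/bottom edges) horizontally."""
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)


def crop(image, box):
    """Return the pixels inside a box."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


# Toy 4 x 6 image with a 2 x 2 "insect" (value 1) and its bounding box.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
box = (1, 1, 3, 3)

flipped_image = hflip_image(image)
flipped_box = hflip_box(box, image_width=len(image[0]))
print(flipped_box, crop(flipped_image, flipped_box))
```

The same pattern (transform the image, then apply the matching geometric transform to every box) extends to cropping, rotation, and scaling.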

Discussion
Insects pose challenges in classification and detection. As we have seen, these tasks are essential in SPM, so there is great interest on the part of the scientific community in addressing this challenge. This review analysed several methods for detecting and classifying insects using DL techniques. In general, the researched methods can be divided into two main types of approaches: (1) standard and (2) adaptable. We consider as standard the approaches that implement methodologies proposed by other authors, such as the VGG, ResNet, AlexNet, Inception, and GoogLeNet architectures and the Faster R-CNN and YOLO detectors. Several studies have opted for adaptable approaches. Since 2019, there has been a growing trend towards the development of new methodologies adapted to small objects, with a focus on insects.
The methods that present the best performance in classification are the Faster R-CNN detector used as a classifier, with an accuracy of 99.0%, and the ResNet101 and ResNet34 architectures, with 98.7% and 97.8% accuracy, respectively. However, as the studies use different datasets, it is difficult to determine the method with the best performance, so we also calculated the average number of images per class. We analysed the impact of the number of classes, the size of the dataset, and the average number of images per class on the accuracy obtained by all the classification studies. Of the analysed variables, the one that showed the greatest influence on the results was the size of the dataset, although only 4.3% of the variability in the results was explained by the number of images in the dataset. The scatter plot indicates a negative relationship between the two variables, with a correlation of magnitude 0.207. Thus, in the analysed classification studies, there is no significant relationship between the results obtained and the number of classes, the size of the dataset, or the average number of images per class.
In detection, the methodologies are distributed as follows: 54% use standard detectors, and the remaining 46% apply combined/adapted methodologies. The methods that show the best detection performance are the modified Faster R-CNN, YOLOv5, and Faster R-CNN architectures, with mAP values of 95.20%, 94.70%, and 94.64%, respectively. The authors of the modified Faster R-CNN architecture proposed several modifications to the standard Faster R-CNN, such as replacing RoIPooling with RoIAlign and applying anchor optimisation. YOLOv5 is a recent architecture that has shown good results, which can be explained by its use of a path aggregation network with a feature pyramid network, which improves the propagation of low-level features in the model and increases the accuracy of object localisation, even for small objects. We analysed the impact of the number of classes, the size of the dataset, and the average number of images per class on the results obtained by all the detection studies. Analysing the variables individually, we can conclude that 63.8% of the variability of the results can be explained by the number of classes; the scatter plot indicates a strong negative relationship between the two variables, with a correlation of magnitude 0.799. Another variable that has a significant effect is the size of the dataset, which explains 21.1% of the variability in mAP. The scatter plot indicates a negative relationship between the two variables, with a correlation of magnitude 0.460. The remaining variable, the average number of images per class, showed no influence on the result; i.e., the result is independent of it. These relationships must be interpreted with care: the classes of the datasets are not balanced, and datasets with more classes tend to yield lower results because they contain very similar species, which generates confusion in the learning of the models.
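The percentages of explained variability and the correlation values quoted in this discussion are linked: for a simple linear regression with a single predictor, the share of variance explained equals the square of the correlation coefficient. The short check below uses only the values reported in the text; the variable and function names are illustrative.

```python
# (description, reported correlation magnitude, reported share of variance explained)
reported = [
    ("classification: accuracy vs dataset size", 0.207, 0.043),
    ("detection: mAP vs number of classes", 0.799, 0.638),
    ("detection: mAP vs dataset size", 0.460, 0.211),
]


def explained_variance(r):
    """Coefficient of determination of a one-predictor linear regression with correlation r."""
    return r * r


for name, r, r2 in reported:
    print(f"{name}: r^2 = {explained_variance(r):.3f} (reported {r2:.3f})")
```

The squared values agree with the reported percentages to within rounding. Squaring also removes the sign, so a negative relationship corresponds to a negative correlation coefficient of the stated magnitude.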
There are two main challenges and recommendations in insect detection using DL. First, dataset characteristics influence the performance of DL methods. Additionally, the attributes of DL methods can be adapted or improved to solve the task with better performance.
Challenge 1: dataset images
• Insects are frequently poorly visible in dataset images.
There are many different insects, so they must be clearly represented in the images in order to differentiate them accurately. A careful setup will improve the data acquisition and preprocessing stages, ensuring adequate image resolution to represent insect characteristics. Attention should be given to the number of insects in the image; e.g., traps with overlapping insects will make insect recognition more complex.

• Images captured in the field using SPM systems.
Creating datasets that comprise field-captured images using SPM systems can be a daunting undertaking due to various factors, such as the presence of shadows, background interference, and lighting inconsistencies. To overcome these obstacles, a range of strategies can be implemented, including careful selection of the best time of day for image capture, strategic placement of SPM devices, and avoiding shadowy and treed areas. Several pre-processing techniques can also be employed to tackle these challenges, including shadow- or glare-free image reconstruction and the creation of an illumination-invariant, shadow-free image for shadow edge detection. Alternatively, simpler image processing algorithms such as morphological reconstruction can be used. Additionally, datasets can be based on plants or traps, each of which poses unique challenges. In particular, detecting and classifying insects on plants can be more complex due to diverse background elements that can affect the model's effectiveness.

• Insect classes are unbalanced in dataset images.
DL algorithms have difficulty learning from unbalanced datasets, which significantly impacts the performance of insect classification and detection methods. Data augmentation, where synthetic data are used to make the dataset more balanced, and the use of focal loss in model training are suggested ways to address this challenge.
Annotation costs are frequently prohibitive, preventing the annotation of the entire dataset and limiting the success of supervised DL algorithms. Using semi-supervised and domain-knowledge algorithms is recommended. Semi-supervised detection can effectively leverage unlabelled data to improve model performance.
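As an illustration of the focal loss recommended above for unbalanced classes, the following minimal sketch computes the binary focal loss for a single prediction. The default values gamma = 2.0 and alpha = 0.25 are assumptions taken from common practice, not settings reported by the reviewed studies, and the function names are illustrative.

```python
import math

def cross_entropy(p, y):
    """Binary cross-entropy for one prediction (p = P(class 1), y in {0, 1})."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    The factor (1 - p_t)^gamma shrinks the loss of well-classified
    examples, so training focuses on hard (often minority) examples.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p = 0.9) versus a hard positive (p = 0.1).
print("easy:", focal_loss(0.9, 1), "hard:", focal_loss(0.1, 1))
```

With gamma = 0 and alpha = 1 the expression reduces to the ordinary cross-entropy for positive examples, which shows that the focusing term is the only change to the standard loss.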
Challenge 2-methodologies:
Small insects are represented by few pixels, so the images lack the appearance information needed to distinguish them from the background or from other classes. Furthermore, the DL algorithms developed for object detection are limited because most were designed to detect medium and large objects.
Recommendations:
• Multi-scale feature learning.
Resizing the input images to different scales enables learning at different scales. For this, the use of methodologies with featurised image pyramids, a single feature map, a pyramidal feature hierarchy, integrated features, a feature pyramid network, feature fusion and feature pyramid generation, and a multi-scale fusion module is recommended.

• Context-based detection
Object context can play an important role in the performance of methodologies. The context in which the object is detected can help improve object detection performance, especially for detecting small objects.

• GAN-based detection
The use of GANs can improve the performance of insect detection because, during training, the discriminator performs bounding-box regression and classification, and the resulting loss is backpropagated to the generator, thus improving detection precision.
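To make the multi-scale recommendation above more concrete, the sketch below builds a simple image pyramid and reports how large a small insect appears at each level. It uses the 6000 × 4000 pixel image and 60 × 60 pixel bounding box mentioned for the sticky trap example in Figure 4; the target sizes (640, 1280, 2560) and the function names are assumptions chosen for illustration.

```python
def image_pyramid(width, height, long_sides):
    """Return (new_width, new_height) for each target long-side length."""
    longest = max(width, height)
    levels = []
    for long_side in long_sides:
        new_w = round(width * long_side / longest)
        new_h = round(height * long_side / longest)
        levels.append((new_w, new_h))
    return levels

def scaled_box_side(box_side, original_long_side, target_long_side):
    """Side length of a square box after resizing the whole image."""
    return box_side * target_long_side / original_long_side

targets = [640, 1280, 2560]  # assumed network input sizes
for (w, h), t in zip(image_pyramid(6000, 4000, targets), targets):
    side = scaled_box_side(60, 6000, t)
    print(f"{w} x {h}: insect box is about {side:.1f} px wide")
```

At the smallest assumed input size, the insect occupies only a few pixels, which is why learning from several scales (or from tiles of the original image) is recommended for small insects.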
Over time, insect traps can become excessively crowded with insects, which can adversely affect the quality of the collected data. To maintain the integrity of the study and ensure that the data remain relevant and reliable, it is crucial to replace these traps. Although deep learning detection methods can be used to detect insects, they are limited when insects overlap. Therefore, replacement is particularly important when using traps to monitor insect populations or study insect behaviour. However, replacing traps can be time-consuming and labour-intensive, which presents a challenge. A possible solution would be to replace the trap automatically once it reaches its maximum capacity of insects. This would eliminate the need for manual labour and help keep the collected data at high quality and accuracy. This approach can be achieved by integrating sensors or image recognition technology into the trap, which can detect the number of insects in the trap and signal the need for replacement. The information collected by the sensor or image recognition system can be sent to a central control unit or a cloud-based system, which can then trigger the replacement of the trap by deploying a new one. This approach can be especially useful for large-scale monitoring programs and precision agriculture, where a large number of traps are deployed in multiple locations. By implementing an automated trap replacement system, the overall efficiency of the monitoring process can be improved and the accuracy of the data maintained, resulting in better decision-making and increased productivity. This will be one of the main challenges from the industry's point of view and can be overcome through collaboration between research groups and farmers.
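The replacement trigger described above can be sketched as a simple threshold rule on the insect count reported by each trap. This is a minimal sketch under stated assumptions: the `TrapStatus` structure, the 0.8 fill threshold, and the trap identifiers are hypothetical, and the insect count is assumed to come from a detector or sensor that is not shown here.

```python
from dataclasses import dataclass

@dataclass
class TrapStatus:
    """Hypothetical status message sent by a trap to the control unit."""
    trap_id: str
    insect_count: int  # e.g., number of detections in the latest image
    capacity: int      # assumed maximum count before overlap degrades data

    @property
    def fill_ratio(self) -> float:
        return self.insect_count / self.capacity

def needs_replacement(status: TrapStatus, threshold: float = 0.8) -> bool:
    """Flag a trap once its fill ratio reaches the (assumed) threshold."""
    return status.fill_ratio >= threshold

def traps_to_replace(statuses, threshold: float = 0.8):
    """Return the ids of all traps that should be replaced."""
    return [s.trap_id for s in statuses if needs_replacement(s, threshold)]

# Hypothetical readings from three traps.
readings = [
    TrapStatus("trap-A", insect_count=30, capacity=200),
    TrapStatus("trap-B", insect_count=170, capacity=200),
    TrapStatus("trap-C", insect_count=200, capacity=200),
]
print(traps_to_replace(readings))
```

Triggering slightly before full capacity is a design choice intended to leave a margin before overlapping insects start to degrade detection; the appropriate threshold would need to be calibrated per trap type.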

Conclusions
Several DL applications have emerged for insect classification and detection in recent years, and several methodologies have been developed to improve these tasks' results. This article carried out a systematic review on automatic insect detection using DL, where thirty-six articles were analysed considering the inclusion criteria to answer three research questions:

• (RQ1) What are the methods that obtain better mAP for the task of insect detection?
The method that obtained the best result for detection was the modified Faster R-CNN architecture, where the replacement of RoIPooling with RoIAlign was proposed. However, YOLOv5 also showed high performance, and its use is recommended.

• (RQ2) What dataset variables have the most significant influence on detection?
The number of classes in the dataset is the factor that most strongly influences insect detection methods. Datasets with many classes tend to yield lower results since there is usually considerable similarity between classes, and the number of images per class is unbalanced.

• (RQ3) What are the main challenges of and recommendations for automatically detecting insects?
Two key open challenges related to automatic insect detection using DL were identified: those associated with dataset images and those associated with methodologies. For the challenges associated with dataset images, we recommend improving data acquisition, data augmentation, focal loss, and semi-supervised and domain-knowledge algorithms, while for the methodologies, we recommend multi-scale feature learning, context-based detection, and GAN-based detection.
Incorporating advanced insect detection methods can substantially improve the efficiency, quality, and sustainability of agricultural production. By leveraging these technologies, farmers can benefit from early insect detection and swift intervention, significantly reducing crop losses and optimising output. In addition to boosting productivity, these advancements can also minimise the need for harmful chemicals, decreasing chemical contamination and promoting a healthier and more sustainable farming environment.

Figure 1. Devices installed in the field to collect images of traps and the respective images collected. (a) Pheromone trap in a vineyard to attract grape moths provided by [22]; (b) yellow sticky traps installed to detect diamondback moths adapted from [23]; (c) light trap to attract 24 major pest classes specified by the Chinese Ministry of Agriculture. Images adapted from the dataset Pest24 [24].

Figure 2. Examples of insect classification and detection tasks. The classification example shows small brown plant hoppers and aphids on plant images. The first example of a detection task is the detection of grape moths on a pheromone trap, image provided by [22]; the second example is the detection of army worms on plant images. Images with small brown plant hoppers, aphids, and army worms were adapted from the public dataset IP102 [42].

Figure 3. Flow diagram of the selection of the papers.

Figure 4. Example of the high number of M. thunbergianae in sticky traps. The original image has a resolution of 6000 × 4000 pixels, and the size of the ground truth bounding box of M. thunbergianae was 60 × 60 pixels on average. Images adapted from [59].

Figure 5. Examples of similarity between species, different positions of the same insect, and different colours of the same insect. Images from the dataset Pest24 adapted from [24] and provided by [22].

Table 1. Study analysis based on image processing and ML techniques.

Table 2. Study analysis of classification with image scenario on plants.

Table 3. Study analysis of detection with standard detectors.
n.a. = information not available.

Table 4. Study analysis of detection with combined/adapted methodologies.
n.a. = information not available.