Research on Identiﬁcation Technology of Field Pests with Protective Color Characteristics

Abstract: Accurate identification of field pests is crucial for decision-making in integrated pest control. Most current research focuses on identifying pests on sticky cards, or on cases where the target differs greatly from the background; there is little research on identifying field pests with protective color characteristics. To address the difficulty of identifying such pests in complex field environments, this paper proposes a field pest identification method based on near-infrared imaging technology and YOLOv5. First, the spectral curves of the pest (Pieris rapae) and its host plant (cabbage), which reflect their specific spectral characteristics, were compared to find the wavelength with the largest difference in spectral reflectance, and an appropriate infrared filter and ring light source were selected accordingly to build an image acquisition system. Field pest images were then collected to construct a data set, which was used to train and test YOLOv5. Experimental results demonstrate that the average time required to detect one pest image is 0.56 s, and the mAP reaches 99.7%.


Introduction
Accurate identification of field pests provides basic data for scientific pest control. It is an essential prerequisite for effective pest investigation, pest prediction, and accurate pest killing [1][2][3], as well as a critical foundation for appropriate pesticide application, which makes it significant for decision-making in integrated pest control [4][5][6].
In recent years, automatic pest identification based on digital image processing has become a research hotspot [7][8][9]. Traditional machine learning approaches mainly involve three steps: image preprocessing, feature extraction, and pest identification [10,11]. Ebrahimi M.A. et al. [12] proposed a method to identify thrips using an SVM (Support Vector Machine) classifier, with the region index and intensity as color indices; the average classification error was less than 2.25%. Yao et al. [13] developed a rice light-trap insect imaging system to automate rice pest identification, and the average identification accuracy for four species of Lepidoptera rice pests was 97.5%. Wen et al. [14] designed an insect classification method based on invariant local features to automatically classify certain common insects in orchards.
Although traditional machine learning has made great progress in pest identification, its performance depends on the quality of feature extraction and on the chosen classifier, which results in weak generalization ability and poor robustness of the identification model [15]. Field pests are visual targets with small sizes and diverse postures. Additionally, their identification environment is complex, and some pests (such as Pieris rapae) have protective coloration, so deep learning identification models, which have strong generalization ability, are more suitable for field pest identification. Such models stack convolutional, activation, normalization, and pooling layers to automatically extract pest features, and recognize pests through fully connected layers [16,17]. Lu et al. [18] proposed a classification algorithm based on feature optimization to identify rice planthoppers, reaching an identification accuracy of 96.19%. Zhang et al. [19] improved the Faster R-CNN (Region-based Convolutional Neural Network) model by replacing VGG16 (Visual Geometry Group) with a deep residual network (ResNet50) to identify aphids and leaf miners on sticky cards; the precision of the improved Faster R-CNN model reached 90.7%. Patel D.J. et al. [20] compared three widely used deep learning meta-architectures (Faster R-CNN, SSD (Single Shot MultiBox Detector) Inception, and SSD MobileNet) as object detectors for flying insects, and the Faster R-CNN meta-architecture performed best, with an accuracy of 95.33%. Thenmozhi K. and Reddy U.S. [21] proposed an efficient deep CNN model to classify insect species on three publicly available insect datasets. Rustia et al. [22] designed a multi-class insect identification method for images of sticky paper traps acquired by wireless cameras, using cascaded convolutional neural networks; the multi-class insect classifier achieved an accuracy of 86-92%.
Although the identification accuracy of the above methods is high, most of them target pests on sticky cards, or pests with large differences between target and background. At present, there are few studies on the identification of field pests with protective color characteristics.
In this paper, Pieris rapae and its host plant, cabbage, whose color is similar to that of Pieris rapae, were selected as the experimental objects (Figure 1). As an extension of computer vision technology, near-infrared imaging, especially conventional imaging in the first NIR (NIR-I) window of 700 to 900 nm [23], can distinguish target objects that resemble the background in appearance [24]. It is widely used in insect identification [25,26] and plant disease monitoring [27], but there is little research on its use for identifying pests with protective color. This paper therefore combines near-infrared imaging technology and YOLOv5 to identify pests with protective color. First, the average spectral characteristic curves of Pieris rapae and cabbage were obtained by a hyperspectral experiment. By analyzing and comparing these two curves, the wavelength with the largest difference in spectral reflectance was obtained. According to this wavelength, an appropriate infrared filter, light source, and other image acquisition equipment were selected to build an image acquisition platform. A large number of pest images were then collected, and a pest image data set was constructed by optimizing and expanding them. Finally, a suitable deep learning model (YOLOv5) was selected to identify field pests with protective color characteristics.

Hyperspectral Test
The hyperspectral test platform is illustrated in Figure 2. The platform uses a SOC710 portable hyperspectral spectrometer, produced in the United States, to collect spectral data of cabbage and of Pieris rapae in good living condition. The spectrometer uses built-in push-broom scanning and records 128 bands over a spectral range of 400-1000 nm, with a spectral resolution of 4.6875 nm. Images can be collected at selected acquisition wavelengths. The imaging speed is 30 lines per second, and the image resolution is 696 × 510 pixels. The light source is a controllable halogen lamp powered by a precision-regulated power supply, and the height of the objective table is adjustable.
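As a quick arithmetic check on these specifications (our own calculation, not taken from the instrument documentation), the quoted 4.6875 nm equals the 400-1000 nm range divided evenly among the 128 bands, which places the 823 nm wavelength discussed later roughly 90 band intervals above 400 nm:

$$\Delta\lambda = \frac{1000\ \text{nm} - 400\ \text{nm}}{128} = 4.6875\ \text{nm}, \qquad \frac{823\ \text{nm} - 400\ \text{nm}}{4.6875\ \text{nm}} \approx 90.2$$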
Pieris rapae larvae were collected from the experimental field in Yunyuan, Hunan Agricultural University. To reduce measurement errors, 20 fifth-instar Pieris rapae larvae were randomly divided into 4 groups, and their imaging spectral data were then measured. Before measurement, the larvae and the reflection reference plate were placed on the objective table, and the height of the objective table and the light intensity were adjusted to obtain the clearest image. The spectrogram was corrected with the reflection reference plate. During measurement, the field-of-view angle of the spectrometer was set to 15°, and the distance between the lens and the sample was set to 28 cm [28]. Operators wearing dark clothing without strongly reflective surfaces operated the instrument from a backlit position to collect spectral images. The spectral images were then imported into the SRAnal710e software for dark-field, spatial, spectral, and radiometric calibration, after which the data were converted to reflectance. Finally, the spectral reflectance data of Pieris rapae were extracted using ENVI 5.3. The hyperspectral data of the 7th to 9th abdominal segments were used to represent each larva [29], and in each group the mean reflectance of the 7th to 9th abdominal segments of the 5 larvae was taken as the spectral reflectance of that group [30].
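The random grouping and per-group averaging can be sketched in Python as below. This is a minimal illustration, not the authors' MATLAB/ENVI workflow: it assumes each larva's spectrum has already been reduced to one reflectance value per band (for example by averaging the 7th to 9th abdominal segments), and the function name `grouped_mean_spectra` and its `seed` parameter are hypothetical.

```python
import random
from statistics import fmean


def grouped_mean_spectra(spectra, n_groups=4, seed=None):
    """Randomly split larvae into equal groups and average their spectra.

    spectra: list of per-larva spectra; each spectrum is a list of
             reflectance values, one per band.
    Returns a list of n_groups mean spectra (one value per band).
    """
    if len(spectra) % n_groups:
        raise ValueError("larvae must split evenly into groups")
    order = list(range(len(spectra)))
    random.Random(seed).shuffle(order)          # random group assignment
    size = len(spectra) // n_groups             # e.g. 20 larvae / 4 groups = 5
    n_bands = len(spectra[0])
    means = []
    for g in range(n_groups):
        members = [spectra[i] for i in order[g * size:(g + 1) * size]]
        # band-by-band mean over the larvae in this group
        means.append([fmean(s[b] for s in members) for b in range(n_bands)])
    return means
```

With 20 spectra and `n_groups=4`, the function returns four group-mean spectra, matching the 4 × 5 design described above.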
Five-point weighted smoothing of the spectral data, performed in MATLAB, effectively reduces the influence of interference in the original spectral data [31].
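The text does not state the weights used for the five-point smoothing, so the sketch below assumes a symmetric 1-2-3-2-1 kernel and renormalizes the kernel where it is truncated at the ends of the spectrum. Both choices are assumptions for illustration, and `weighted_smooth` is a hypothetical name rather than the MATLAB routine used in the study.

```python
def weighted_smooth(values, weights=(1, 2, 3, 2, 1)):
    """Weighted moving-average smoothing of a 1-D spectrum.

    weights: odd-length kernel; the default 1-2-3-2-1 kernel is an assumed
             choice, since the source does not give the weights.
    Near the ends, the kernel is truncated and renormalised so the output
    has the same length as the input.
    """
    if len(weights) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(weights) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = wsum = 0.0
        for k, w in enumerate(weights):
            j = i + k - half                    # neighbour index for this weight
            if 0 <= j < n:
                acc += w * values[j]
                wsum += w
        out.append(acc / wsum)                  # weighted mean of available points
    return out
```

A symmetric kernel leaves a linear trend unchanged in the interior of the spectrum while damping isolated spikes, which is the behavior wanted for noisy reflectance curves.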
The leaf head, inner leaves (the 1st to 6th leaves outside the leaf head), and outer leaves (from the 7th leaf outward) are the main parts of cabbage eaten by Pieris rapae. Inner leaves account for 62.6-72.6% of the total area damaged by insects [32]. Therefore, the inner leaves of cabbage were selected as the main experimental object. The hyperspectral experimental scheme for cabbage was the same as that for Pieris rapae. In this study, 5 points were measured on each cabbage leaf, each point was measured three times, and the mean value was taken. This mean was then smoothed by 5-point weighted smoothing to obtain the hyperspectral characteristic curve of the leaf.

Pest Image Data Set
The spectral information of Pieris rapae and cabbage obtained by hyperspectral imaging yields a characteristic spectral curve for each, and comparing the two curves gives the wavelength with the largest reflectance difference [33,34]. According to this wavelength, the camera, light source, filter, and other key components were selected to build the image acquisition system. As shown in Figure 3, the system mainly consists of a DFK 41BU02 color camera (The Imaging Source), a Computar industrial lens with a focal length of 8.5 mm, an 850 nm infrared filter, and an 850 nm ring light source.
With this image acquisition system, pest images were collected in the cabbage plantation of Yunyuan, Hunan Agricultural University. To improve the robustness of the identification algorithm, the original data set covers Pieris rapae under different camera angles, postures, and other conditions, as well as images with occlusion and overlap: Pieris rapae in a curled state and in an extended state (Figure 4a,b), a single Pieris rapae and multiple Pieris rapae (Figure 4c,d), unobstructed and covered Pieris rapae (Figure 4e,f), and Pieris rapae on the left side and in the middle of the image (Figure 4g,h). After acquisition, blurred and distorted images were removed by manual screening, leaving 500 high-quality pest images.
The original data set was expanded through data augmentation to increase its diversity, avoid overfitting, and improve the generalization ability and robustness of the identification algorithm [35][36][37]. Common augmentation methods include rotation, flipping, cropping, adding noise, jitter, blurring, translation, and shear transformation [38][39][40]. Considering the influence of camera angle and light intensity on the identification algorithm (including lighting conditions simulating sunny or cloudy days and overexposure or insufficient light), the original data set was expanded from 500 to 1500 images through rotation, flipping, translation, and brightness changes (Figure 5).
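A minimal NumPy sketch of the four augmentation operations named above (rotation, flipping, translation, and brightness change) follows. The parameter ranges, the use of 90-degree rotation steps, and the policy of producing two random variants per original image (one way to go from 500 to 1500 images) are assumptions, not the authors' settings. Assuming, as the order of the text suggests, that labeling happens after augmentation, the sketch does not transform bounding boxes.

```python
import numpy as np


def rotate90(img, k=1):
    """Rotate by k * 90 degrees counter-clockwise (no interpolation needed)."""
    return np.rot90(img, k).copy()


def hflip(img):
    """Mirror the image left-right."""
    return img[:, ::-1].copy()


def translate(img, dx, dy, fill=0):
    """Shift the image dx pixels right and dy pixels down; vacated pixels get `fill`."""
    out = np.full_like(img, fill)
    h, w = img.shape[:2]
    xs, xd = max(0, -dx), max(0, dx)    # source / destination column offsets
    ys, yd = max(0, -dy), max(0, dy)    # source / destination row offsets
    cw, ch = w - abs(dx), h - abs(dy)   # size of the region that stays in frame
    if cw > 0 and ch > 0:
        out[yd:yd + ch, xd:xd + cw] = img[ys:ys + ch, xs:xs + cw]
    return out


def adjust_brightness(img, factor):
    """Scale intensities (factor > 1 brighter, < 1 darker) and clip to uint8."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)


def augment_twice(img, rng):
    """Hypothetical policy: two random variants per original image."""
    variants = []
    for _ in range(2):
        out = img
        if rng.random() < 0.5:
            out = rotate90(out, int(rng.integers(1, 4)))   # 90, 180 or 270 degrees
        if rng.random() < 0.5:
            out = hflip(out)
        out = translate(out, int(rng.integers(-20, 21)), int(rng.integers(-20, 21)))
        out = adjust_brightness(out, float(rng.uniform(0.6, 1.4)))  # dim to bright light
        variants.append(out)
    return variants
```

Usage: `augment_twice(image, np.random.default_rng(0))` returns two uint8 variants of `image`; keeping the original plus the two variants triples the data set.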
The pest images were labeled one by one using the LabelImg software: each Pieris rapae was marked with a rectangular box and given a class name. The annotations were saved in the Pascal VOC (Pattern Analysis, Statistical Modelling and Computational Learning; Visual Object Classes) format, which records the coordinates, label, and serial number of each box. The pest images, annotation files, and other files were organized following the directory structure of the Pascal VOC dataset. The images and annotation files were then divided into training, validation, and test sets in the ratio 6:2:2. The training set was used to fit the detection network; the validation set was used to tune the hyperparameters of the network and to give a preliminary evaluation of its performance; and the test set was used to evaluate the generalization ability of the final model.
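The 6:2:2 split can be sketched as follows. YOLOv5 training code expects labels as text files with normalized center-format boxes rather than Pascal VOC XML, so a VOC-to-YOLO box conversion is also shown; this conversion step is our assumption about the pipeline, and both function names are hypothetical.

```python
import random


def split_dataset(names, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle file stems and split them into train/val/test by `ratios`."""
    names = sorted(names)                   # deterministic starting order
    random.Random(seed).shuffle(names)
    n = len(names)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return (names[:n_train],
            names[n_train:n_train + n_val],
            names[n_train + n_val:])        # remainder goes to the test set


def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a VOC corner box in pixels to YOLO (x_center, y_center, w, h) in 0..1."""
    return ((xmin + xmax) / 2 / img_w,
            (ymin + ymax) / 2 / img_h,
            (xmax - xmin) / img_w,
            (ymax - ymin) / img_h)
```

For the 1500 augmented images, `split_dataset` yields 900 training, 300 validation, and 300 test images.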

Pest Identification Model
There are many kinds of deep-learning-based object detection algorithms, and YOLO (You Only Look Once) is one of the most advanced [41]. Unlike detectors based on region proposals, YOLO predicts object classes and locations directly from the features extracted by the network: the convolutional neural network processes the whole image at once and produces predictions of a fixed format. The image is first resized to a fixed size of 416 × 416 and then divided into 13 × 13 non-overlapping grid cells. For each cell, B candidate bounding boxes and their confidences are predicted, each described by 5 parameters: x, y, w, h, and confidence. Here, (x, y) are the center coordinates of the target, (w, h) are the width and height of its bounding rectangle, and the confidence is compared with a threshold to filter the predictions [42].
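The grid assignment described above can be sketched as follows for a 416 × 416 input and a 13 × 13 grid (a stride of 32 pixels). The function names, the choice B = 2, and the single class are illustrative assumptions; the per-image output count S × S × (B × 5 + C) follows the original YOLO formulation, whereas YOLOv5 itself predicts with anchors on several grids (strides 8, 16, and 32).

```python
def assign_to_grid(cx, cy, w, h, img_size=416, grid=13):
    """Locate the grid cell responsible for a box and encode it YOLO-style.

    (cx, cy, w, h): box centre and size in pixels of the resized image.
    Returns (row, col, x, y, w_n, h_n): (x, y) is the centre offset inside
    the cell (0..1) and (w_n, h_n) are width and height relative to the image.
    """
    stride = img_size / grid                    # 416 / 13 = 32 pixels per cell
    col = min(int(cx // stride), grid - 1)      # clamp centres on the far edge
    row = min(int(cy // stride), grid - 1)
    x = cx / stride - col
    y = cy / stride - row
    return row, col, x, y, w / img_size, h / img_size


def output_size(grid=13, boxes_per_cell=2, num_classes=1):
    """Values predicted per image: each cell has B x (x, y, w, h, conf) plus class scores."""
    return grid * grid * (boxes_per_cell * 5 + num_classes)
```

For example, a box centered at pixel (100, 200) falls in row 6, column 3, with offsets (0.125, 0.25) inside that cell.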
Compared with YOLOv1, YOLOv2 improves model performance by introducing anchor boxes [43]. YOLOv3 further improves performance with multi-scale anchors and residual networks [44]. In YOLOv4, the backbone network adopts CSPDarknet53 (Cross Stage Partial), while PANet (Path Aggregation Network) and SPP-Net (Spatial Pyramid Pooling Network) are introduced to handle inputs of different sizes [45]. In YOLOv5, Mosaic data augmentation, adaptive anchor calculation, and adaptive image scaling are applied at the input. The backbone network uses the Focus and CSPNet (Cross Stage Partial Network) structures to extract target features quickly. In the neck network, FPN (Feature Pyramid Network) and PANet are used for multi-scale fusion of the extracted features. At the output, GIoU_Loss (Generalized Intersection over Union) is used as the bounding box loss function, and NMS (non-maximum suppression) is applied to filter out overlapping candidate boxes and obtain the best predictions. These improvements give YOLOv5 good accuracy and speed on small targets. Additionally, YOLOv5 has a shallow structure, a small weight file, and relatively low hardware requirements [46]. The structure of YOLOv5 is illustrated in Figure 6. The CBL module, a basic convolution module, consists of a convolution layer, a BN (Batch Normalization) layer, and a Leaky ReLU activation function. The BottleneckCSP module performs feature extraction on the feature map and extracts rich information from the image [47].
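The GIoU box loss and the NMS step at the YOLOv5 output can be illustrated with the plain-Python sketch below. It uses the standard GIoU definition (IoU minus the fraction of the smallest enclosing box not covered by the union) and greedy NMS; the 0.45 IoU threshold is an assumed value, and this is not the YOLOv5 source code, which operates on batched tensors.

```python
def iou_and_giou(a, b):
    """IoU and GIoU of two positive-area boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # smallest axis-aligned box enclosing both boxes
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c_area = cw * ch
    giou = iou - (c_area - union) / c_area
    return iou, giou


def giou_loss(pred, target):
    """Box loss: 0 for a perfect match, up to 2 for far-apart boxes."""
    return 1.0 - iou_and_giou(pred, target)[1]


def nms(boxes, scores, iou_thr=0.45):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it above iou_thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou_and_giou(boxes[best], boxes[i])[0] <= iou_thr]
    return keep
```

Unlike plain IoU, GIoU still gives a useful gradient signal when the predicted and true boxes do not overlap, because the enclosing-box term keeps growing as they move apart.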
Appl. Sci. 2022, 12, x FOR PEER REVIEW 6 of
By adjusting the depth and width of the BottleneckCSP module, YOLOv5 forms models with different numbers of parameters: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. As the network becomes deeper and wider, its feature extraction and feature fusion ability improve at the cost of speed: YOLOv5s is the fastest, and YOLOv5x has the highest precision. Considering the complexity and variability of field pest identification and the needs of practical applications, the requirement for detection speed is relatively higher than that for identification accuracy. Therefore, YOLOv5l, which balances speed and identification accuracy, was selected to identify pests with protective color characteristics.
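The depth and width scaling can be sketched as follows. The multipliers are the `depth_multiple` and `width_multiple` values as we recall them from the ultralytics/yolov5 model configuration files, and the rounding rules mirror our understanding of that repository's model parser; treat both as assumptions to verify against the version in use. `scale_block` is a hypothetical helper.

```python
import math

# (depth_multiple, width_multiple) as recalled from yolov5s/m/l/x.yaml
SCALES = {
    "yolov5s": (0.33, 0.50),
    "yolov5m": (0.67, 0.75),
    "yolov5l": (1.00, 1.00),
    "yolov5x": (1.33, 1.25),
}


def make_divisible(x, divisor=8):
    """Round a channel count up to the nearest multiple of `divisor`."""
    return math.ceil(x / divisor) * divisor


def scale_block(n_repeats, channels, variant):
    """Scale a BottleneckCSP block's repeat count (depth) and channel count (width)."""
    gd, gw = SCALES[variant]
    # blocks that appear only once are not scaled in depth
    n = max(round(n_repeats * gd), 1) if n_repeats > 1 else n_repeats
    c = make_divisible(channels * gw, 8)
    return n, c
```

For a block defined with 3 repeats and 128 channels, this gives (1, 64) for YOLOv5s, (2, 96) for YOLOv5m, (3, 128) for YOLOv5l, and (4, 160) for YOLOv5x, which illustrates why YOLOv5l sits between the fastest and the most precise variants.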

Comparison and Analysis of Spectral Characteristics
The spectral characteristic curves of cabbage and Pieris rapae are compared in Figure 7. The spectral reflectance of cabbage was generally higher than that of Pieris rapae. In the visible range of 375-690 nm, there was little difference in spectral reflectance between the two, whereas at wavelengths greater than 690 nm the reflectance of cabbage differed significantly from that of Pieris rapae, and the difference was large throughout the near-infrared range of 780-1000 nm. As shown in Figure 8, the spectral reflectance difference between cabbage and Pieris rapae is largest at 823 nm.
Among the products of filter manufacturers on the market, the filters available in the near-infrared range (780-1000 nm) are mainly offered at 850 nm and 950 nm. Since 850 nm is closer to 823 nm, an 850 nm infrared filter and an 850 nm ring light source were selected to acquire pest images. Cabbage has high reflectance at 850 nm. Figures 4 and 5 present pest images collected with the 850 nm infrared filter and 850 nm ring light source, in which the cabbage area is bright and the pest area is dark. Therefore, an 850 nm infrared filter and 850 nm ring light source can clearly distinguish Pieris rapae from cabbage.
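The band-selection logic of this section reduces to two steps: find the wavelength where the reflectance difference peaks, then pick the nearest commercially available filter. The sketch below assumes both curves are sampled at the same wavelengths; the function names are hypothetical, and the reflectance numbers in the accompanying check are toy values, not measured data.

```python
def peak_difference_wavelength(wavelengths, refl_host, refl_pest):
    """Return (wavelength, difference) where |host - pest| reflectance is largest."""
    diffs = [abs(h - p) for h, p in zip(refl_host, refl_pest)]
    i = max(range(len(diffs)), key=diffs.__getitem__)
    return wavelengths[i], diffs[i]


def nearest_filter(target_nm, available_nm=(850, 950)):
    """Pick the available filter centre wavelength closest to target_nm."""
    return min(available_nm, key=lambda f: abs(f - target_nm))
```

With the 823 nm peak reported above, `nearest_filter(823)` selects the 850 nm filter, since it is 27 nm away versus 127 nm for the 950 nm option.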

Model Training and Performance Evaluation
The operating system is Windows 10, the CPU (Central Processing Unit) is 2 × Intel(R) Xeon(R) CPU E5-2623 v3, the GPU (Graphics Processing Unit) is an NVIDIA GeForce RTX 2080 with 32 GB of video memory, and the framework is PyTorch.
After training, loss curves for the bounding box, objectness, and classification can be obtained. Classification loss reflects how well the algorithm predicts the correct class of a given object [48]. Because this paper has only one identification target (Pieris rapae) and no multi-class classification, there is no classification loss curve. Figure 9 shows the bounding box loss curves: the graph on the left shows the bounding box loss on the training set, and the graph on the right shows it on the validation set. Box loss indicates how well the algorithm locates the center of the target and how well the predicted bounding box covers the target. The abscissa is the training epoch and the ordinate is the box loss; the smaller the box loss, the more accurate the predicted bounding box.
Figure 10 shows the objectness loss curves: the graph on the left shows the objectness loss on the training set, and the graph on the right shows it on the validation set. Objectness loss measures the error in the predicted probability that an object exists in a proposed region of interest; if the objectness is high, the bounding box is likely to contain an object. The abscissa is the training epoch and the ordinate is the objectness loss; the smaller the objectness loss, the more accurate the target detection.
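Objectness loss is typically computed as binary cross-entropy between the sigmoid of the predicted objectness logit and its target. The sketch below is a minimal plain-Python version under that assumption; as far as we know, YOLOv5 computes a tensor-based equivalent with per-scale balancing weights, and the function names here are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def objectness_bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted objectness and targets.

    logits:  raw objectness outputs, one per predicted box
    targets: 1.0 (or a soft value in 0..1) where an object is assigned,
             0.0 for background predictions
    """
    total = 0.0
    for z, t in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)    # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(logits)
```

Confident, correct predictions drive the loss toward zero, while a confident miss (for example, a large negative logit where an object exists) is penalized heavily, which is why the curves in Figure 10 fall as detection improves.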
The graph on the left shows the objectness loss of the training set. The graph on the right shows the objectness loss of the verification set. Objectness loss measures the probability that an object exists in a proposed region of interest. If the objectivity is high, the bounding box is likely to contain an object. The abscissa of the curve is the epochs of the algorithm, and the ordinate represents the value of objectness loss. The smaller the value of objectness loss, the more accurate the target detection is.
R PEER REVIEW 8 of 12 The operating system is Windows 10, the CPU (Center Processing Unit) is Intel (R) Xeon (R) CPU e5-2623 V3 × 2, the GPU (Graphic Process Unit) is NVIDIA geforce rtx2080 with 32 GB video memory, and the framework is pytoch.
After training, the loss curves for the bounding box, objectness, and classification terms can be obtained. Classification loss measures how well the algorithm predicts the correct class of a given object [48]. Because this paper has only one identification target (Pieris rapae) and involves no multi-class classification, there is no classification loss curve. Figure 9 shows the bounding box loss curves: the graph on the left shows the bounding box loss on the training set, and the graph on the right shows the bounding box loss on the validation set. Box loss indicates how accurately the algorithm locates the center of the target and how well the predicted bounding box covers the target. The abscissa of each curve is the training epoch and the ordinate is the box loss; the smaller the box loss, the more accurate the predicted bounding box. Figure 10 shows the objectness loss curves: the graph on the left shows the objectness loss on the training set, and the graph on the right shows the objectness loss on the validation set. Objectness loss measures how well the model predicts the probability that an object exists in a proposed region of interest; a high objectness score means that the bounding box is likely to contain an object. The abscissa of each curve is the training epoch and the ordinate is the objectness loss; the smaller the objectness loss, the more accurate the target detection.
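To make the two loss terms concrete, the sketch below computes a simplified box loss and an objectness loss for a single prediction. It assumes boxes are given as (x1, y1, x2, y2) pixel coordinates and uses 1 - IoU as the box loss. YOLOv5 itself uses a CIoU-based box loss and binary cross-entropy on objectness logits, so this is an illustration of the idea, not the training code used in this paper; the function names `iou`, `box_loss`, and `objectness_loss` are hypothetical.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_loss(pred_box, true_box):
    """Simplified box loss: 0 for a perfect fit, 1 for no overlap (YOLOv5 uses CIoU)."""
    return 1.0 - iou(pred_box, true_box)


def objectness_loss(logit, has_object):
    """Binary cross-entropy between sigmoid(logit) and the 0/1 object target."""
    target = 1.0 if has_object else 0.0
    # Numerically stable form of -[t*log(s) + (1-t)*log(1-s)] with s = sigmoid(logit).
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


print(round(box_loss((0, 0, 2, 2), (1, 1, 3, 3)), 3))    # 0.857 (IoU = 1/7)
print(round(objectness_loss(0.0, has_object=True), 3))  # 0.693 (= ln 2)
```

Both losses shrink toward zero as the prediction improves, which is why the downward trend of the curves in Figures 9 and 10 indicates better localization and detection.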
The accuracy evaluation of the identification model mainly consists of visual comparison and performance evaluation indices. Visual comparison reveals missed and false detections of pests [49]. The performance evaluation indices comprise identification accuracy and identification speed; the speed index is the average time required to identify one pest image.
The basic indicators of identification accuracy are precision (P) and recall (R). Precision is the proportion of samples predicted as positive that are actually positive, and recall is the proportion of actual positive samples that are correctly predicted as positive. The identification of Pieris rapae can be treated as a binary classification problem in which Pieris rapae is the positive class and all types of background are the negative class. Precision and recall are calculated as follows:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$
where TP is the number of positive samples correctly predicted as positive, TN is the number of negative samples correctly predicted as negative, FP is the number of negative samples predicted as positive, and FN is the number of positive samples predicted as negative [50,51]. The precision and recall curves are shown in Figure 11. The graph on the left shows the precision curve.
The graph on the right shows the recall curve. The precision and recall of the model improved rapidly before plateauing after about 20 epochs.
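The precision and recall formulas reduce to a few lines of code. The sketch below assumes the TP, FP, and FN counts have already been obtained by matching detections to the annotated Pieris rapae boxes; the counts in the usage line are invented for illustration and are not results from this paper, and `precision_recall` is a hypothetical helper. TN appears in neither formula, which suits object detection, where the number of correctly rejected background regions is not well defined.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision P = TP/(TP+FP) and recall R = TP/(TP+FN); 0.0 if a denominator is 0."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


# Hypothetical counts: 95 pests found, 5 false alarms on background, 10 pests missed.
p, r = precision_recall(tp=95, fp=5, fn=10)
print(f"P = {p:.3f}, R = {r:.3f}")  # P = 0.950, R = 0.905
```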
Figure 11. Curve of precision and recall.
The P-R curve shows the relationship between precision and recall, with recall on the abscissa and precision on the ordinate. The area enclosed by the P-R curve and the coordinate axes is the AP (Average Precision) of the model; the larger this area, the better the recognition performance. Figure 12 shows the P-R curve at an intersection-over-union (IoU) threshold of 0.5, generated during training. Since there is only one recognition target in this paper, the AP is equal to the mAP (mean Average Precision), which is 99.7%. In addition, the average time required to detect a pest image with a resolution of 480 × 460 is 0.56 s.
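The AP described above can be computed from a ranked list of detections. The following sketch assumes each detection has already been labeled as a true or false positive (for example by IoU ≥ 0.5 matching against the annotations) and sorted by descending confidence. It builds the P-R points, replaces precision by its monotone envelope, and sums the area under the resulting step curve. This is one common all-point formulation; the interpolation in YOLOv5's own evaluation code may differ in detail. The function names `pr_curve` and `average_precision` are hypothetical, and the ranked list in the example is invented.

```python
def pr_curve(is_tp_ranked: list[bool], num_ground_truth: int) -> tuple[list[float], list[float]]:
    """Cumulative recall and precision after each detection (highest confidence first)."""
    recall, precision = [], []
    tp = fp = 0
    for hit in is_tp_ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        recall.append(tp / num_ground_truth)
        precision.append(tp / (tp + fp))
    return recall, precision


def average_precision(recall: list[float], precision: list[float]) -> float:
    """Area under the P-R curve using the monotone precision envelope (all-point)."""
    r = [0.0] + recall + [1.0]  # sentinel points at both ends of the recall axis
    p = [1.0] + precision + [0.0]
    # Make precision non-increasing as recall grows (the precision envelope).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangles whose width is the change in recall.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# Hypothetical example: 4 annotated pests, 4 ranked detections (3 hits, 1 false alarm).
rec, prec = pr_curve([True, True, False, True], num_ground_truth=4)
ap = average_precision(rec, prec)
class_aps = [ap]  # one class (Pieris rapae) only
mean_ap = sum(class_aps) / len(class_aps)
print(ap, mean_ap)  # prints 0.6875 0.6875; with a single class, mAP equals AP
```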

Conclusions
In this paper, a field pest identification method based on YOLOv5 and hyperspectral technology was proposed. The results have demonstrated that this method can effectively identify pests with protective color characteristics in the complex field environment.
To enable the identification of pests with protective color characteristics, the spectral information of Pieris rapae and cabbage was first obtained by hyperspectral technology before image acquisition, so that the specific spectral characteristics of Pieris rapae and cabbage appear on their spectral curves. By comparing these two curves, the wavelength with the largest reflectance difference was found, and an appropriate infrared filter and ring light source were selected accordingly to build the image acquisition system. To improve the accuracy of pest identification, pest images were collected in different situations, the original pest data set was expanded by data augmentation, and an appropriate target identification algorithm (YOLOv5) was selected. The detection results on the test set showed that, compared with existing research [37,38,52,53], the combination of YOLOv5 and hyperspectral technology can effectively identify field pests with protective color characteristics. With Pieris rapae and its host plant (cabbage) as the experimental objects, the mAP was 99.7%, and the average time required to detect one pest image was 0.56 s.
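The wavelength-selection step amounts to finding the wavelength at which the two reflectance curves are furthest apart. The sketch below assumes both spectra are sampled at the same wavelengths and uses the absolute reflectance difference; the wavelengths and reflectance values in the example are made up for illustration and are not the measured Pieris rapae or cabbage spectra. The function name `max_difference_wavelength` is hypothetical.

```python
def max_difference_wavelength(wavelengths_nm, refl_pest, refl_host):
    """Return (wavelength, |difference|) where the two reflectance curves differ most."""
    if not wavelengths_nm or not (len(wavelengths_nm) == len(refl_pest) == len(refl_host)):
        raise ValueError("spectra must be non-empty and sampled at the same wavelengths")
    diffs = [abs(a - b) for a, b in zip(refl_pest, refl_host)]
    best = max(range(len(diffs)), key=lambda i: diffs[i])  # index of the largest gap
    return wavelengths_nm[best], diffs[best]


# Illustrative (invented) spectra on a coarse near-infrared grid.
wavelengths = [700, 750, 800, 850, 900]
pest = [0.30, 0.35, 0.40, 0.42, 0.43]
host = [0.32, 0.45, 0.60, 0.58, 0.55]
wl, gap = max_difference_wavelength(wavelengths, pest, host)
print(f"largest difference at {wl} nm (|dR| = {gap:.2f})")
```

In practice the search would likely be restricted to the band covered by the camera sensor and by available filters, with the filter whose center wavelength lies closest to the returned value being chosen.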
Considering future application scenarios, the current algorithm has some limitations in detection speed. To increase the detection speed, more efficient models can be designed, for example by removing redundant weights through network pruning or by transferring knowledge to a compact model through knowledge distillation. How to maintain detection accuracy while improving detection speed is another aspect to be considered in the future. In addition, although only one pest with protective color characteristics (Pieris rapae) is considered in this paper, the relevant literature has shown that near-infrared technology can distinguish target objects from backgrounds with similar appearance characteristics [24], so this method can also be used to identify other pests with protective color characteristics on their host plants. For a different pest, only the wavelength with the largest spectral reflectance difference between the pest and its host plant needs to be obtained through a hyperspectral test, so that an appropriate infrared filter can be selected to replace the original filter on the image acquisition platform. In other words, almost the same setup can be used in many different situations. In the future, other pests with protective color characteristics will be tested to further improve the generality of this method.
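As an illustration of the weight-redundancy idea behind network pruning (not something evaluated in this paper), the sketch below performs unstructured magnitude pruning on a flat list of weights: the fraction `sparsity` of weights with the smallest absolute values is set to zero. In a real YOLOv5 model this would be applied per layer, for example with PyTorch's `torch.nn.utils.prune.l1_unstructured`, and would typically be followed by fine-tuning to recover accuracy. The function name `magnitude_prune` and the weight values are hypothetical.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero the `sparsity` fraction of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    k = int(len(weights) * sparsity)  # number of weights to remove
    # Indices sorted from smallest to largest magnitude; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


layer = [0.5, -0.1, 0.05, -0.9]  # hypothetical weights of one small layer
print(magnitude_prune(layer, sparsity=0.5))  # prints [0.5, 0.0, 0.0, -0.9]
```

Zeroed weights only translate into faster inference when the runtime or hardware exploits sparsity, which is one reason structured (channel-level) pruning is often preferred for speed.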