Deep-Learning-Based Rice Disease and Insect Pest Detection on a Mobile Phone

Abstract: Detecting rice diseases and insect pests with mobile phones not only addresses the low efficiency and poor accuracy of manual detection and reporting, but also helps farmers detect and control them in the field in a timely fashion, thereby ensuring the quality of rice grains. This study examined two improved models for the detection of six high-frequency diseases and insect pests: Improved You Only Look Once (YOLO)v5s and Improved YOLOv7-tiny, built on the corresponding lightweight object detection networks. The Improved YOLOv5s introduced the Ghost module to reduce computation and optimize the model structure, and the Improved YOLOv7-tiny introduced the Convolutional Block Attention Module (CBAM) and SIoU to improve model learning ability and accuracy. First, we evaluated and analyzed the detection accuracy and operational efficiency of the models. Then we deployed the two proposed methods on a mobile phone and designed an application to further verify their practicality for detecting rice diseases and insect pests. The results showed that Improved YOLOv5s achieved the highest F1-score of 0.931, 0.961 in mean average precision (mAP) (0.5), and 0.648 in mAP (0.5:0.9). It also reduced the network parameters, model size, and floating-point operations (FLOPs) by 47.5%, 45.7%, and 48.7%, respectively, and increased the model inference speed by 38.6% compared with the original YOLOv5s model. Improved YOLOv7-tiny outperformed the original YOLOv7-tiny in detection accuracy, which was second only to Improved YOLOv5s. The probability heat maps of the detection results showed that Improved YOLOv5s performed better in detecting large target areas of rice diseases and insect pests, while Improved YOLOv7-tiny was more accurate in small target areas.
On the mobile phone platform, the precision and recall of Improved YOLOv5s under FP16 accuracy were 0.925 and 0.939, and the inference speed was 374 ms/frame, which was superior to Improved YOLOv7-tiny. Both of the proposed improved models realized accurate identiﬁcation of rice diseases and insect pests. Moreover, the constructed mobile phone application based on the improved detection models provided a reference for realizing fast and efﬁcient ﬁeld diagnoses.


Introduction
Effective detection and monitoring of rice diseases and insect pests is crucial for ensuring a high and stable rice yield [1]. The traditional manual identification method cannot meet the increasing demand for field disease and pest detection due to its low efficiency and inconsistent diagnostic results. Therefore, it is important to find an objective and efficient automatic detection method for high-quality rice production.
In the last two decades, many scholars have performed research on image-based automatic identification of diseases and insect pests in crops. Traditional machine-learning methods can classify crop diseases and pests through manually designed feature extraction and classification [2][3][4]. However, due to the difficulty of manually designing features and the poor generalizability of the resulting task-specific models, many achievements have not been put into actual production [5][6][7]. With advances in artificial intelligence, deep-learning methods, which have powerful feature learning and extraction capabilities, have been widely used in complex agricultural scenarios [8][9][10], particularly in disease and insect pest identification in crops such as tomatoes [11], apples [12], cassava [13] and rice. In research on rice diseases and insect pests, Rahman et al. (2020) [14] suggested a real-time classification approach based on a deep convolutional neural network (CNN) to classify 9 types of rice disease and pest images (e.g., false smut, brown plant hopper, and bacterial leaf blight). Joshi et al. (2022) [15] proposed a mobile phone application named RiceBioS to identify biotic stress in rice crops (rice blast and bacterial leaf blight). Deng et al. (2021) [16] developed a smartphone rice disease identification application that combined deep-learning classification methods. Using a deep CNN model deployed on an online server, it classified 6 types of rice diseases and insect pests such as rice leaf blast, false smut, and neck blast. The above research categorized crop diseases and insect pests by using deep-learning image classification, but this by itself cannot accurately and effectively locate areas of crop diseases and pests in images. Although online deployment of deep-learning models with mobile-client interaction can to some extent alleviate the high consumption of computing resources caused by the complicated structure of deep-learning models, it increases the response time of model inference and identification, which is not conducive to offline detection and identification of rice diseases and insect pests.
Deep-learning object detection is an end-to-end method that accurately identifies categories of diseases and insect pests and extracts occurrence information from images. The current dominant object detection algorithms are mainly of two types: two-stage and one-stage. The Faster R-CNN introduced by Ren et al. (2015) [17] is a representative two-stage algorithm. However, it has not been able to achieve rapid identification due to its two-stage process of feature extraction and region-proposal generation by a convolutional network. Redmon et al. (2016) [18] introduced the one-stage object detection algorithm You Only Look Once (YOLO), which achieves end-to-end detection of objects through a single CNN feature extraction network. Unlike Faster R-CNN, this algorithm combines target location and classification in a single network, thereby greatly increasing the model inference speed, which makes it useful for detection tasks in complex farmland environments.
The purpose of this study was to achieve offline detection of rice diseases and insect pests on mobile phone devices. By combining deep-learning object detection with a mobile phone terminal, we proposed two improved detection and identification models. First, the Improved YOLOv5s model was constructed by introducing Ghost modules as the basic feature extraction units to replace the standard convolution modules of the YOLOv5s model. The Improved YOLOv7-tiny model was constructed by introducing the Convolutional Block Attention Module (CBAM) into the YOLOv7-tiny model; in addition, SIoU was applied to replace the original model loss function. Then we compared and analyzed the detection accuracy and operational efficiency of the models. Finally, the improved models were transplanted to an Android mobile phone platform, and a rice disease and pest identification application was built to realize offline mobile phone detection of diseases and insect pests.

Image Acquisition
Due to the diversity of rice diseases and insect pests as well as the high randomness of disease occurrence, there are currently few open-source rice disease databases, so it is difficult to gather a large amount of data on diseases and insect pests. This study examined six diseases and insect pests, as shown in Figure 1: Cnaphalocrocis medinalis, rice smut, rice blast, streak disease, sheath blight, and Chilo suppressalis. First, we used a mobile phone to obtain visible light images of the diseases and pests in multiple experimental areas. Then we expanded the image dataset through data enhancement methods. Finally, we trained the disease and pest detection models by using the enhanced dataset. The rice disease and insect pest images were photographed and collected at three experimental bases in Guangdong, China in 2020 and 2021: Zengcheng District in Guangzhou (23.242985° N, 113.643906° E), Gaoyao District in Zhaoqing (23.134322° N, 112.184548° E) and Xinhui District in Jiangmen (22.441844° N, 112.828022° E). The data collection situation is shown in Table 1. Different image enhancement methods were applied to expand the rice diseases and insect pests image dataset to reduce the impact of quantitative differences between the dataset categories: image scaling, rotation, vertical and horizontal flipping, translation, and HSV adjustment. The data enhancement effect is shown in Figure 2.
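As an illustration of how such an expansion step can work, the following sketch applies flip and translation transforms to an image array with NumPy. It is a minimal stand-in for the augmentation pipeline described above (the actual scaling and HSV-adjustment code used in the study is not given), and the function name `augment` is our own.

```python
import numpy as np

def augment(img, mode, shift=5):
    """Apply one simple expansion transform to an H x W x C image array.
    'translate' shifts the content down/right by `shift` pixels and
    zero-pads the exposed border; scaling and HSV jitter are omitted."""
    if mode == "hflip":                       # horizontal flip
        return img[:, ::-1]
    if mode == "vflip":                       # vertical flip
        return img[::-1, :]
    if mode == "translate":                   # down/right translation
        out = np.zeros_like(img)
        out[shift:, shift:] = img[:-shift, :-shift]
        return out
    raise ValueError(f"unknown mode: {mode}")

# expand one original image into several augmented variants
img = np.random.randint(0, 256, (20, 20, 3), dtype=np.uint8)
variants = [augment(img, m) for m in ("hflip", "vflip", "translate")]
```

Each variant keeps the original image shape, so the bounding-box labels can be transformed with the same geometric rule.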
From Table 1, the total number of images in the expanded dataset was increased to 3300, and the image samples were divided into training and test sets. The training set contained 3000 images, with 500 images for each rice disease and insect pest category. The test set contained 300 images, with 50 images for each category, and consisted of original images of rice diseases and insect pests without data enhancement. In this study, the constructed image dataset was labeled by the object detection labeling method. The distribution of labeling boxes of each category in the training and test sets is shown in Figure 3.

Construction of Improved YOLOv5s Rice Diseases and Pests Detection Model

YOLOv5 is an object detection algorithm based on a one-stage anchor detection method released by Ultralytics LLC [19]. YOLOv5s is a lightweight model structure of YOLOv5. Its network structure includes three parts: the backbone, neck, and head structures. Among them, the backbone network includes the CBS, C3, and SPPF modules, the main function of which is to extract image features through down-sampling operations. The neck structure combines FPN [20] and PAN [21], mainly achieving the fusion of shallow graphic features from the backbone and deep semantic features from the detection layers, which enables the model to obtain richer feature information. The head structure outputs the detection results of target objects at different scales through three detection layers: category probability, confidence score, and position information of the target bounding box.
Compared to traditional object detection networks, YOLOv5 has advantages such as higher accuracy and faster inference speed. However, because many standard convolutions are used in the model structure, these modules are not conducive to operation on mobile phones or embedded devices with limited memory and computing resources. Although lightweight models such as the MobileNet [22][23][24] and ShuffleNet series [25,26] can achieve good performance with very few floating-point operations (FLOPs), the correlation and redundancy between their feature maps have not been well exploited. The Ghost module proposed by Han, K. et al. [27] is a model compression method that can generate more feature maps with fewer parameters, which preserves the redundancy of feature information, reduces network parameters and computation while maintaining accuracy, and thereby improves calculation speed and reduces model latency. The feature extraction process of the Ghost module, shown in Figure 4b, is as follows: first, it reduces the number of input feature map channels through a convolutional operation to obtain the intrinsic feature map. Then the Ghost feature maps are obtained by applying cheap linear transformations with different convolution kernel sizes (usually 3 × 3 or 5 × 5) to each channel of the intrinsic feature map. Finally, the intrinsic feature map and all the Ghost feature maps are concatenated and output.
Assume that the shapes of the input and output feature maps are h × w × c and h′ × w′ × n, and the convolutional filter size (represented by Conv in Figure 4) is k × k. In the linear transformation process of the Ghost module, the intrinsic feature map channel quantity is m, Φ_k represents the linear transformation operations, and the transformation quantity is s. Since the Ghost transformation process includes an identity transformation, the effective transformation quantity is s − 1. Therefore, the ratio of the parameter calculation amount between the standard convolution module and the Ghost module is as follows, where d × d is the kernel size of the linear transformations. Generally, s is much smaller than the quantity of input feature map channels c:

r_c = (n · h′ · w′ · c · k · k) / ((n/s) · h′ · w′ · c · k · k + (s − 1) · (n/s) · h′ · w′ · d · d) ≈ (s · c)/(s + c − 1) ≈ s    (1)

From Formula (1), the parameter calculation amount of the Ghost module is about 1/s of that of the standard convolution. In this study, referring to the parameter settings of GhostNet, the hyperparameter s of the Ghost module was set to 2. Therefore, compared to standard convolution, the Ghost module reduced the parameter calculation amount by about half; replacing the standard convolution with the Ghost module thus led to a significant reduction in model computational complexity.
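The savings implied by the parameter ratio can be checked with a few lines of arithmetic. The sketch below counts convolution weights for a standard convolution and for a Ghost module (bias terms ignored); the layer sizes are arbitrary examples, not values taken from the paper.

```python
def conv_params(c, n, k):
    """Weight count of a standard k x k convolution: n filters over c channels."""
    return n * c * k * k

def ghost_params(c, n, k, s, d):
    """Ghost module: n/s intrinsic maps from a standard conv, plus
    (s - 1) * n/s ghost maps from cheap d x d per-channel transforms."""
    m = n // s                                # intrinsic feature maps
    return m * c * k * k + (s - 1) * m * d * d

std = conv_params(64, 128, 3)                 # 128 * 64 * 9 = 73728 weights
gho = ghost_params(64, 128, 3, s=2, d=3)      # 36864 + 576 = 37440 weights
ratio = std / gho                             # about 1.97, i.e. close to s = 2
```

With s = 2 the Ghost module holds roughly half the weights of the standard convolution, matching the approximation r_c ≈ s.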
To build a rice disease and insect pest detection model suitable for mobile platform deployment, YOLOv5s was improved by combining the Ghost module and the Ghost Bottleneck structure proposed by Han, K. et al. [27]. The Ghost Bottleneck is similar to the basic residual block in ResNet [28]; it is mainly composed of two stacked Ghost modules that achieve the expansion and compression of the feature map channels. In this study, the Ghost module was used to replace the convolution modules in the backbone and head structures of the original model, and the Ghost Bottleneck structure was used to improve the C3 module of the model. The structures of the original YOLOv5s model and the Improved YOLOv5s model are shown in Figure 5a,b.

Construction of Improved YOLOv7-Tiny Rice Diseases and Pests Detection Model
The main contributions of YOLOv7, proposed by Wang, C.Y. et al. [29], lie in the introduction of structural reparameterization and dynamic label allocation strategies for model optimization, as well as model expansion and composite scaling methods, which not only improve the efficiency of model parameters and calculation but also significantly reduce model parameters, thereby improving inference speed and detection accuracy. YOLOv7-tiny is a lightweight model structure of YOLOv7. Its network structure includes two main sections: the backbone and head structures. The functions of each section are the same as those of YOLOv5s; the difference is that the neck and head structures are merged into the head structure.
To make the constructed rice disease and insect pest identification model suitable for mobile phone platform deployment, we used YOLOv7's lightweight model YOLOv7-tiny for experimental analysis. To improve the model accuracy, YOLOv7-tiny was combined with the CBAM [30] (as shown in Figure 6), which combines the Channel Attention Module (CAM) and the Spatial Attention Module (SAM). The feature map is first input into the CAM, which recalibrates the channel attention of the original feature through Global Average Pooling (GAP), Global Max Pooling (GMP) and shared Multilayer Perceptron (MLP) layers. The mathematical expression of the channel attention module is as follows:

M_c(F) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))    (2)

where F denotes the input feature map with the shape of H × W × C; F_avg^c and F_max^c denote the GAP and GMP results of F; σ denotes the sigmoid function; and W_0 and W_1 denote the weights of the MLP layers. The output of the CAM module is a 1D channel attention map of the shape 1 × 1 × C.
Agronomy 2023, 13, 2139
The channel attention feature map was then input into the SAM module and processed by the GAP, GMP and a convolution layer. This module can be expressed by the following mathematical expression:

M_s(F′) = σ(f^(7×7)([F_avg^s; F_max^s]))    (3)

where f^(7×7) denotes a convolution layer with a 7 × 7 filter size, and [F_avg^s; F_max^s] is the concatenation of the channel-wise average-pooled and max-pooled maps. The output of the SAM module is a 2D attention map of the shape H × W × 1. Therefore, the CBAM module can be expressed as follows:

F′ = M_c(F) ⊗ F,  F″ = M_s(F′) ⊗ F′    (4)

where ⊗ denotes the element-wise multiplication operation.
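A minimal NumPy sketch of the two attention steps may make the data flow concrete. The shared MLP here inserts a ReLU between W_0 and W_1, as in the original CBAM design; the weight shapes, seed, and naive 7 × 7 convolution are illustrative only, not the implementation used in the study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W0, W1):
    """CAM: shared MLP over GAP and GMP channel descriptors."""
    avg = F.mean(axis=(0, 1))                     # GAP -> (C,)
    mx = F.max(axis=(0, 1))                       # GMP -> (C,)
    mlp = lambda v: W1 @ np.maximum(W0 @ v, 0.0)  # shared MLP with ReLU
    return sigmoid(mlp(avg) + mlp(mx))            # (C,) channel weights

def spatial_attention(F, kernel):
    """SAM: 7x7 conv over stacked channel-wise mean/max maps."""
    H, W, _ = F.shape
    desc = np.stack([F.mean(axis=2), F.max(axis=2)], axis=2)  # (H, W, 2)
    pad = kernel.shape[0] // 2
    padded = np.pad(desc, ((pad, pad), (pad, pad), (0, 0)))
    out = np.empty((H, W))
    for i in range(H):                            # naive sliding-window conv
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + 7, j:j + 7] * kernel)
    return sigmoid(out)                           # (H, W) spatial weights

def cbam(F, W0, W1, kernel):
    """Channel attention first, then spatial attention, as in Eq. (4)."""
    Fp = F * channel_attention(F, W0, W1)[None, None, :]
    return Fp * spatial_attention(Fp, kernel)[:, :, None]

C, r = 8, 2                                       # channels, reduction ratio
rng = np.random.default_rng(0)
F = rng.standard_normal((16, 16, C))
W0, W1 = rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r))
kernel = rng.standard_normal((7, 7, 2)) * 0.1
out = cbam(F, W0, W1, kernel)                     # same shape as F
```

The output keeps the input shape H × W × C, so the module can be dropped into a detection head without changing the surrounding layers.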
After processing by the CBAM module, the new feature maps obtain attention weights in the channel and spatial dimensions, which strengthens the relationships between features across channels and spatial positions, making it easier for the model to extract the effective features of the target. In this study, the last three convolutional layers of the head structure of YOLOv7-tiny were improved: the first two layers retained the RepConv structure [31] of YOLOv7, and the last layer was replaced by the CBAM module. The structures of the original YOLOv7-tiny and the Improved YOLOv7-tiny models are shown in Figure 7a,b. In addition, we introduced SIoU [32] to improve the regression loss function of the original model. Based on the characteristics of the traditional regression box loss, SIoU introduces the angle cost to assist the distance calculation between the ground-truth box and the predicted box of the target, improving the convergence speed and feature learning ability of model training. Specifically, the SIoU loss function includes four parts: angle, distance, shape, and IoU costs.
The angle cost describes the convergence process of the minimum angle between the ground-truth box and the predicted one, which can be expressed as Formula (5):

Λ = 1 − 2 sin²(arcsin(c_h/σ) − π/4)    (5)

where Λ represents the angle cost; c_h denotes the difference in height between the center points of the two boxes; σ denotes the distance between the center points of the two boxes; and α = arcsin(c_h/σ) denotes the radian of the angle between the line connecting the two box center points and the Cartesian x-axis. When α equals 0 or π/2, the angle loss obtains the minimum value of 0; when α equals π/4, the angle loss obtains the maximum value of 1.
The distance cost describes the convergence process of the minimum distance between the ground-truth and predicted boxes, and the distance loss can be expressed mathematically by Formulas (6) and (7):

Δ = Σ_(t=x,y) (1 − e^(−γρ_t)),  γ = 2 − Λ    (6)

ρ_x = ((b_cx^gt − b_cx)/bb_w)²,  ρ_y = ((b_cy^gt − b_cy)/bb_h)²    (7)

where (b_cx^gt, b_cy^gt) and (b_cx, b_cy) denote the center coordinate points of the two boxes, respectively; and bb_w and bb_h denote the width and height of the smallest bounding box enclosing the two boxes, respectively. When combined with Formula (5), it can be seen that the distance cost correlates positively with the angle cost.
The shape cost describes the overall shape convergence of the minimum difference in border height and width between the ground-truth box and the predicted box, and it can be expressed by Formulas (8) and (9):

Ω = Σ_(t=w,h) (1 − e^(−ω_t))^θ    (8)

ω_w = |w_p − w_gt| / max(w_p, w_gt),  ω_h = |h_p − h_gt| / max(h_p, h_gt)    (9)

where (w_gt, h_gt) and (w_p, h_p) denote the width and height of the ground-truth and predicted boxes, respectively, and θ is an adjustable variable that controls the contribution of the shape cost to the overall loss. Therefore, the SIoU loss function can be expressed by Formula (10):

L_SIoU = 1 − IoU + (Δ + Ω)/2    (10)

where IoU denotes the overlap ratio of the two boxes; Δ denotes the distance cost; and Ω denotes the shape cost.
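The angle, distance, shape, and IoU costs can be combined into a single function. The sketch below is a plain-Python reading of the SIoU loss for boxes given as (cx, cy, w, h); θ = 4 is an illustrative default (the value used in the study is not stated), and `eps` guards the degenerate case of coincident centers.

```python
import math

def siou_loss(gt, pred, theta=4.0, eps=1e-9):
    """SIoU loss sketch; boxes are (cx, cy, w, h)."""
    gx, gy, gw, gh = gt
    px, py, pw, ph = pred
    # IoU of the two boxes
    iw = max(0.0, min(gx + gw / 2, px + pw / 2) - max(gx - gw / 2, px - pw / 2))
    ih = max(0.0, min(gy + gh / 2, py + ph / 2) - max(gy - gh / 2, py - ph / 2))
    inter = iw * ih
    iou = inter / (gw * gh + pw * ph - inter + eps)
    # angle cost
    sigma = math.hypot(gx - px, gy - py)              # center-point distance
    alpha = math.asin(min(1.0, abs(gy - py) / (sigma + eps)))
    lam = 1.0 - 2.0 * math.sin(alpha - math.pi / 4) ** 2
    # distance cost over the smallest enclosing box (bb_w, bb_h)
    bb_w = max(gx + gw / 2, px + pw / 2) - min(gx - gw / 2, px - pw / 2)
    bb_h = max(gy + gh / 2, py + ph / 2) - min(gy - gh / 2, py - ph / 2)
    gamma = 2.0 - lam
    rho_x = ((gx - px) / (bb_w + eps)) ** 2
    rho_y = ((gy - py) / (bb_h + eps)) ** 2
    delta = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))
    # shape cost
    om_w = abs(pw - gw) / max(pw, gw)
    om_h = abs(ph - gh) / max(ph, gh)
    omega = (1 - math.exp(-om_w)) ** theta + (1 - math.exp(-om_h)) ** theta
    # total loss
    return 1.0 - iou + (delta + omega) / 2.0
```

For a perfect prediction the angle, distance, and shape costs all vanish and IoU is 1, so the loss goes to zero; any misalignment of center, angle, or aspect ratio raises it.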

The Comparison Methods Used in This Study
The classic two-stage object detection network Faster-RCNN was developed from the R-CNN [33] and Fast-RCNN [34] networks. The algorithm achieves object detection in two main stages. In the first stage, the model uses the layer stacking method of the VGGNet [35] network to perform feature extraction on the input image. In the second stage, it generates a series of region proposals on the final convolutional feature maps through the region proposal network (RPN); the generated region proposals are then mapped onto the feature maps extracted by the convolutional layers through the region-of-interest pooling layer (RoI pooling); finally, the output proposal feature map is used for classification and position regression prediction by a softmax classifier and the bounding box regression algorithm.
The Single Shot MultiBox Detector (SSD) is a one-stage object detection algorithm proposed by Liu, W. et al. (2016) [36]. The model is mainly composed of three parts: the VGG-Base backbone, the Extra layers, and the Pred-layers. First, the features of the input image are extracted through the VGG-Base backbone. The extracted feature information is then sent to the Extra layers, where different down-sampling operations produce feature maps at multiple scales that form a pyramidal feature map set. The extracted multi-scale feature information is then sent to the Pred-layers of the SSD model for classification prediction and bounding box regression. Finally, the Non-Maximum Suppression (NMS) algorithm is used to filter the prediction results, keeping the best-scoring detections for the detection and location of the target object. In this study, we compared the Faster-RCNN and SSD models with the improved object detection networks and analyzed the detection effects of the different types of object detection models on the rice diseases and insect pests datasets.
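As a concrete reference for the NMS filtering step mentioned above, here is a greedy NMS sketch in NumPy. The corner-format boxes and the threshold value are illustrative, not the exact implementation used by SSD or by this study.

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.6):
    """Greedy non-maximum suppression.
    boxes: (N, 4) array as x1, y1, x2, y2; returns kept indices, best first."""
    order = np.argsort(scores)[::-1]          # indices sorted by score, descending
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))                   # keep the current best box
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]          # drop boxes overlapping too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)                     # the second box is suppressed
```

Here the first two boxes overlap with IoU 0.81 > 0.6, so only the higher-scoring one survives, while the disjoint third box is kept.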

Development Platform for Rice Diseases and Insect Pests Identification Application
Mobile phones have great advantages as image acquisition terminals and as devices for running intelligent identification algorithms to detect rice diseases and insect pests quickly in the field. To verify the practicability of the improved detection models, we developed a mobile phone application to detect diseases and pests, as shown in Figure 8. The specific development process and experimental platform of the system are as follows. The experiment was first conducted on a 64-bit server computer with an Ubuntu 16.04.6 system. The configuration is shown in Table 2. The server was configured with two P100 graphics cards with 16 GB memory, and the system was installed with the CUDA 10.2 and cuDNN 7.6.5 deep-learning environments. The deep-learning open-source library PyTorch was applied to train and test the object detection models. The NCNN framework was used to optimize the parameters and structure of the trained models, which enabled the models to be deployed and run on a mobile phone. NCNN is a high-performance neural network forward-computing framework suitable for mobile phone deployment, which enables deep-learning models to achieve efficient inference on a mobile phone platform. In this study, the improved object detection models were transplanted to an Android mobile phone platform, and a rice disease and insect pest identification application was developed to verify the practicability of the models. The configuration of the platform is shown in Table 2. The mobile phone ran Android 9.0 with 4 GB of memory, which met the requirements for running and inference of the deep-learning models. The application included two parts: the front-end interaction interface and the back-end program. The front-end interface was designed mainly with the tools provided by the Android UI framework, including Button, ImgView, and TextView. The back-end program achieved image selection and photograph identification by calling the object detection model, which realized real-time collection and identification of disease and insect pest image data in the field.


Evaluation Indicators
In this article, we evaluated the constructed models from two aspects: object detection accuracy and model operational efficiency. The accuracy of the models was evaluated mainly using Precision, Recall, F1-Score, mAP (0.5), and mAP (0.5:0.9), where mAP (0.5) is the AP value at an IoU threshold of 0.5, obtained by integrating the PR curve, and mAP (0.5:0.9) is the average of the mAP values at different IoU thresholds. The evaluation indicators are given in Formulas (11)-(14), where TP (True Positive) represents the number of originally positive samples that were correctly predicted as positive; FP (False Positive) represents the number of originally negative samples that were incorrectly predicted as positive; and FN (False Negative) represents the number of positive samples that were incorrectly predicted as negative.
The operational efficiency of the model was evaluated mainly by the parameter quantity, model size, FLOPs, and inference speed.
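As a compact sketch of these accuracy indicators, the snippet below computes Precision, Recall, and F1-Score from TP/FP/FN counts, and AP as the area under the PR curve (all-point interpolation). mAP is then the mean of AP over classes, and mAP (0.5:0.9) additionally averages over IoU thresholds. The counts used here are hypothetical, not values from the paper.

```python
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

def average_precision(recalls, precisions):
    """AP as the area under the PR curve (all-point interpolation)."""
    # add sentinel endpoints, then make precision monotonically non-increasing
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # integrate the stepwise PR curve
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)))

tp, fp, fn = 93, 7, 6                      # hypothetical counts for one class
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 3), round(r, 3), round(f1_score(p, r), 3))   # → 0.93 0.939 0.935
```

In practice these counts are accumulated per class by matching predicted boxes to labels at the chosen IoU threshold; the helper functions above only make the formulas concrete.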

Analysis of Training Processes of Different Models
The models mentioned in Sections 2.2 and 2.3 (Faster-RCNN, VGG16-SSD, YOLOv5s, and YOLOv7-tiny) were used for comparison with the Improved YOLOv5s and Improved YOLOv7-tiny models. For model training, we used the rice disease and insect pest image dataset on the server computer. To achieve better training results, the SGD gradient descent algorithm with momentum was applied to update the gradient during model training, with the momentum hyperparameter set to 0.937. The purpose of adding momentum to the gradient optimization was to suppress the oscillation generated by gradient descent and accelerate the convergence of the model. The LambdaLR learning-rate adjustment strategy was adopted, with the initial learning rate and weight decay set to 0.01 and 0.0005, respectively. For model evaluation, the NMS method was used to optimize the output of the model detection results, with the confidence and IoU thresholds of the prediction results set to 0.25 and 0.6, respectively.
For training the different models on the rice disease and insect pest dataset, the batch size was set to 32, and each model was trained for a total of 300 epochs. The models were validated every 10 epochs during training to record the loss value and mAP (0.5) on the validation set and to save the model training files. As Figure 9a,b shows, the loss values of each model decreased continuously during training and stabilized after 300 iterations, indicating that the constructed rice disease and pest detection models fit the data distribution of the training set samples well. The loss values of Improved YOLOv5s and Improved YOLOv7-tiny were slightly lower than those of the original models, indicating better convergence. The mAP (0.5) curves on the validation set (Figure 9c) show that the curves of the improved models were higher than those of the original models, indicating better detection accuracy.
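The momentum update itself is simple enough to sketch numerically. Below is a toy illustration with the hyperparameters above (learning rate 0.01, momentum 0.937); the LambdaLR schedule and weight decay are omitted for brevity, and the loss is an arbitrary 1-D quadratic, not the detection loss.

```python
def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.937):
    # the velocity buffer accumulates past gradients (PyTorch convention),
    # which damps oscillation and accelerates convergence
    velocity = momentum * velocity + grad
    w = w - lr * velocity
    return w, velocity

# minimize the toy quadratic loss L(w) = 0.5 * w**2, whose gradient is w
w, v = 1.0, 0.0
for _ in range(500):
    w, v = sgd_momentum_step(w, grad=w, velocity=v)
# w has converged close to the minimum at 0
```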

Comparison of Detection Accuracy of Different Models
In this study, we used the test set of the image dataset constructed in Section 2.1 to test the detection accuracy of each model. From Table 3, it can be seen that Improved YOLOv5s outperformed the original YOLOv5s in all evaluation indicators. Its precision and recall improved by 3.2 and 0.3%, respectively, while its F1-Score, mAP (0.5), and mAP (0.5:0.9) improved by 1.7, 0.9, and 3.7%, respectively, achieving the highest scores among the six experimental methods with values of 0.931, 0.961, and 0.648. Although Improved YOLOv7-tiny showed a 1.2% decrease in recall compared with its original model, it improved precision, F1-Score, mAP (0.5), and mAP (0.5:0.9) by 3.5, 1.1, 0.3, and 1.1%, respectively. In addition, Improved YOLOv7-tiny was second only to Improved YOLOv5s in F1-Score, mAP (0.5), and mAP (0.5:0.9), the three comprehensive indicators for evaluating the overall identification performance of a model. Therefore, the detection accuracy of Improved YOLOv5s and Improved YOLOv7-tiny was better than that of their original models.

Comparison of Operational Efficiency before and after Model Improvement
In this section, we evaluate the operational efficiency of the improved models using parameter quantity, model size, FLOPs, and model inference speed. As Table 4 shows, the parameters, model size, and FLOPs of Improved YOLOv5s decreased by 47.5, 45.7, and 48.7%, respectively, compared with the original model, whereas its inference speed increased by 38.6%. The application of the Ghost module therefore significantly streamlined the structure of YOLOv5s and greatly improved its operational efficiency. Compared with its original model, Improved YOLOv7-tiny increased parameter quantity, model size, and FLOPs by 0.5, 1.7, and 3.1%, respectively, while its inference speed decreased by 4.9%: although the CBAM module improved model accuracy, it increased model complexity to some extent. Comparing the two improved models, the parameter quantity, model size, and FLOPs of Improved YOLOv5s were lower by 61, 62.9, and 60%, respectively, and its inference speed was 18.6% faster, so it outperformed Improved YOLOv7-tiny in operational efficiency.

We also evaluated the detection accuracy of the improved models on the different categories of rice disease and insect pest images using average precision (AP) and PR curves. From Table 5, the AP of Improved YOLOv5s on Chilo suppressalis, rice smut, streak disease, and sheath blight was 1.9, 0.6, 0.5, and 3.6% greater, respectively, than that of the original model. The AP of Improved YOLOv7-tiny on Chilo suppressalis, rice blast, and sheath blight increased by 0.9, 0.1, and 2.5%, respectively, compared with YOLOv7-tiny. Comparing the two improved models, the mAP (0.5) of Improved YOLOv5s reached 0.961, 0.7% higher than the 0.954 of Improved YOLOv7-tiny, so its overall detection accuracy for rice diseases and insect pests was better. The PR curves in Figure 10 were also used to evaluate detection accuracy: the area enclosed by each curve equals the mAP (0.5), and the closer that area is to 1, the higher the detection accuracy.
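The computational saving attributed to the Ghost module can be illustrated with a back-of-the-envelope FLOPs comparison: it replaces a full k x k convolution with a "primary" convolution producing only 1/s of the output channels, then generates the remaining "ghost" feature maps with cheap d x d depthwise operations. All layer shapes below are hypothetical, not taken from the paper's network.

```python
def conv_flops(h, w, c_in, c_out, k):
    # multiply operations of a standard k x k convolution
    return h * w * c_out * c_in * k * k

def ghost_flops(h, w, c_in, c_out, k, s=2, d=3):
    intrinsic = c_out // s                        # channels from the primary conv
    primary = h * w * intrinsic * c_in * k * k
    cheap = h * w * intrinsic * (s - 1) * d * d   # depthwise "ghost" generation
    return primary + cheap

std = conv_flops(40, 40, 128, 256, 3)
ghost = ghost_flops(40, 40, 128, 256, 3)
print(f"reduction: {1 - ghost / std:.1%}")        # → reduction: 49.6%
```

With the default ratio s = 2, the saving approaches 1/s per layer, which is consistent in magnitude with the roughly 48.7% FLOPs reduction reported for the whole network.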

Comparison of Detection Results for Different Types of Rice Disease and Insect Pest Images before and after Model Improvement
For validation of the detection effect of the improved object detection models, we randomly selected one image of each type of rice disease and insect pest from the test set to demonstrate the detection effect of each model. In Figure 11, the input and label images of the different types of rice diseases and insect pests are shown in the first and second columns, while the detection results for these images and the corresponding predicted probability heat maps of Improved YOLOv5s and Improved YOLOv7-tiny are shown in the third to sixth columns. The figure shows that, for the rice smut sample, Improved YOLOv5s had better detection accuracy for each target in the image than Improved YOLOv7-tiny, which had a missed detection. For the sheath blight and Chilo suppressalis samples, the detection accuracy and identification effect of Improved YOLOv7-tiny were better, while the detection results of Improved YOLOv5s for these two types revealed false and missed detections. For the other three types of disease and insect pest, the two improved models displayed little difference in detection accuracy, and both effectively detected the target areas in the image. In addition, the model identification effects were analyzed through probability heat maps, in which the color change from red to blue indicates that the probability contribution of the region to the prediction results gradually decreases. From the heat map results, Improved YOLOv5s performed better in detecting larger target areas in images, as shown by the large red areas in the heat maps of the streak disease and sheath blight samples; Improved YOLOv7-tiny was more accurate in detecting smaller target areas, such as the small red areas in the heat maps of the rice smut and rice blast samples.

Identification Results of Rice Diseases and Insect Pests Detection Mobile Phone Application
For further validation of the practicality of the improved models proposed in this study, we transplanted Improved YOLOv5s and Improved YOLOv7-tiny to the Android mobile phone platform to build a rice disease and insect pest identification application with image-selection and photo-identification functions, on which the two models were tested and analyzed using the above test set. Regarding identification accuracy, Table 6 shows that under FP16 precision on the mobile phone, the precision and recall of Improved YOLOv5s reached 0.925 and 0.939, respectively, which were 1.1 and 2.3% higher than those of Improved YOLOv7-tiny, and its inference speed on the Android platform was 374 ms/frame, which was 6.7% faster. Regarding identification results, as shown in Figure 12, both models achieved accurate detection of rice diseases and insect pests in images. Improved YOLOv5s outperformed Improved YOLOv7-tiny in detecting rice smut, streak disease, and sheath blight, the three sample images with larger target sizes (Figure 12a), while Improved YOLOv7-tiny performed better on rice blast, Cnaphalocrocis medinalis, and Chilo suppressalis, the three sample images with smaller target sizes (Figure 12b).
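The FP16 setting used on the phone halves the storage of every weight relative to FP32 at the cost of a small rounding error, which is why mobile inference commonly runs at half precision. A quick numpy illustration (the tensor shape is arbitrary, not taken from either model):

```python
import numpy as np

weights = np.random.randn(1000, 1000).astype(np.float32)   # stand-in weight tensor
fp16 = weights.astype(np.float16)                          # FP16 deployment copy

# storage is exactly halved
print(weights.nbytes // 1024, "KB ->", fp16.nbytes // 1024, "KB")  # → 3906 KB -> 1953 KB

# the rounding error introduced by the cast is small
max_err = float(np.abs(weights - fp16.astype(np.float32)).max())
print(f"max rounding error: {max_err:.1e}")
```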

Runtime Performance of Rice Diseases and Insect Pests Detection Application on a Mobile Phone
To verify the applicability and compatibility of the improved models with mobile phone hardware, we evaluated the runtime performance of the application based on the improved detection models using indicators such as model size after conversion with NCNN, CPU usage, and RAM usage. From Table 7, the model size, CPU usage, and RAM usage of Improved YOLOv5s on the phone were 14.3 MB, 49%, and 262.9 MB, respectively, which were 38, 5.8, and 17.4% less than those of Improved YOLOv7-tiny. The performance of Improved YOLOv5s was better, but both models can be used for inference on the Android mobile phone platform because their model size, CPU usage, and RAM usage are within a reasonable range of computing resource allocation. Therefore, both proposed models can be applied to most Android mobile phone hardware platforms with configurations and performance similar to or better than the platform in this study.


Discussion
Previous studies have shown that CNN models can be applied to the image classification of rice diseases and insect pests and have achieved good results [37][38][39]. However, classification methods cannot accurately detect the specific occurrence areas of diseases and pests in an image. Besides identifying the correct category when diagnosing rice diseases and insect pests on actual farmland, detecting the occurrence areas of diseases and insect pests is also important. In this regard, the deep-learning-based object detection method used in our research proved to be a feasible approach for achieving accurate classification and location detection of diseases and pests.
In this study, based on the object detection models YOLOv5 and YOLOv7, two lightweight rice disease and insect pest detection models, Improved YOLOv5s and Improved YOLOv7-tiny, were constructed to identify the categories and detect the occurrence locations of rice diseases and insect pests in images. From an analysis of the experimental results, the two improved models outperformed their original models in F1-Score, mAP (0.5), and mAP (0.5:0.9), and Improved YOLOv5s achieved the highest scores in these indicators among all compared methods, which showed that the two improvements effectively increased detection accuracy. Regarding operational efficiency, Improved YOLOv5s significantly improved parameter quantity, model size, FLOPs, and inference speed compared with the original model, indicating that the Ghost module not only improved detection accuracy but also optimized the model structure and enhanced inference efficiency.
In addition, Improved YOLOv5s was better than Improved YOLOv7-tiny in all evaluation indicators of detection accuracy. Regarding the identification of rice diseases and insect pests, Improved YOLOv5s had higher AP values than Improved YOLOv7-tiny. The AP value for sheath blight was lower than that of the other five rice diseases and insect pests in the detection results of the different models, for two reasons. First, there were fewer original images of this category than of the other five, so the models had relatively fewer original features to learn from, making it difficult to cover the complete feature values of this image category during training and increasing model recognition error [40]. Second, although certain data augmentation methods reduced the impact of the imbalanced category distribution on model performance, the image features of the augmented data were similar to those of the real data, so repeated learning of the same features resulted in overfitting. Therefore, when applied to new data, the accuracy and robustness of the models for that category were not as good as for categories with rich data features [41]. Furthermore, the probability heat map analysis showed that Improved YOLOv5s performed better in detecting larger rice disease and insect pest target areas in images, while Improved YOLOv7-tiny was more accurate in detecting smaller target areas. On the Android mobile phone platform, the applications built on the improved models achieved accurate identification of rice diseases and insect pests, and the application based on Improved YOLOv5s performed better in detection accuracy, inference efficiency, and runtime performance. Moreover, both models are compatible with most Android mobile phones with configurations and performance similar to or better than the hardware platform in this study.
To sum up, both proposed improved models can be applied to the task of rice disease and insect pest detection and achieved good identification results. The constructed mobile phone application for rice disease and insect pest detection provides a fast and convenient offline identification method on intelligent mobile terminals that is worth promoting.
Regarding future work, the improved models proposed in this study are general and extensible methods for effectively identifying rice diseases and insect pests, not limited to the detection and identification of the six common rice diseases and insect pests mentioned above. If more rice diseases and insect pests need to be identified in subsequent research, it is only necessary to add the categories and quantities of their image data and retrain the constructed models. For disease and pest identification on the mobile phone platform, considering that detection results are affected by the shooting angle, distance, and ambient light, we can adapt the models to different scenarios and improve generalization performance by expanding the image dataset and using a variety of image augmentation strategies to further improve detection and identification accuracy.

1.
We proposed two rice disease and insect pest detection models suitable for mobile phone terminals based on deep-learning detection and realized offline detection on intelligent mobile phone terminals, thus providing an efficient and reliable intelligent detection method for farmers and plant protection personnel. By introducing the Ghost module, Improved YOLOv5s significantly improved detection accuracy and operational efficiency compared with YOLOv5s. It had the highest F1-Score, mAP (0.5), and mAP (0.5:0.9), with values of 0.931, 0.961, and 0.648, respectively. Moreover, the parameter quantity, model size, and FLOPs of Improved YOLOv5s were reduced by 47.5, 45.7, and 48.7%, respectively, while the inference speed improved by 38.6%. By introducing the CBAM attention module and the SIoU loss, Improved YOLOv7-tiny outperformed YOLOv7-tiny in detection accuracy, and its F1-Score, mAP (0.5), and mAP (0.5:0.9) were second only to those of Improved YOLOv5s.

2.
For the detection of different categories of rice diseases and insect pests, the AP value of Improved YOLOv5s was higher than that of Improved YOLOv7-tiny for all categories. The probability heat maps showed that Improved YOLOv5s detected rice disease and insect pest areas with larger image target sizes better, while Improved YOLOv7-tiny had better detection accuracy for smaller image target sizes.

3.
The two improved models were transplanted to an Android mobile phone. Under FP16, the precision and recall of Improved YOLOv5s were 0.925 and 0.939, respectively, and the inference speed was 374 ms/frame. Its model accuracy, operational efficiency, and runtime performance were better than those of Improved YOLOv7-tiny. The mobile phone application built on the improved models is compatible with most Android mobile phone hardware platforms for achieving accurate detection of rice diseases and insect pests.
The improved object detection models proposed in this study can realize the accurate detection of rice diseases and insect pests, and the mobile phone application for rice disease and insect pest identification can provide strong support for rapid diagnosis and intelligent identification in the field.

Figure 3 .
Figure 3. The distribution of labeling boxes of each category in the training set and test set.

Figure 4 .
Figure 4. Feature extraction process of the Ghost module and the standard convolution module. (a) The feature extraction process of the standard convolution module; (b) the feature extraction process of the Ghost module.

Figure 5 .
Figure 5. The structure of the YOLOv5s model and the Improved YOLOv5s model. (a) The structure of the YOLOv5s model; (b) the structure of the Improved YOLOv5s model.

It uses a combination of the Channel Attention Module (CAM) and the Spatial Attention Module (SAM). The feature map is first input into the CAM to complete channel attention recalibration of the original feature through the processing of Global Average Pooling (GAP), Global Max Pooling (GMP), and shared Multilayer Perceptron (MLP) layers. Therefore, the mathematical expression of the channel attention module is as follows:

M_c(F) = σ(MLP(GAP(F)) + MLP(GMP(F))) = σ(W_1(W_0(F_avg)) + W_1(W_0(F_max)))    (2)

where F denotes the input feature map with the shape of H × W × C; σ denotes the sigmoid function; and W_0 and W_1 denote the weights of the shared MLP layers. The output of the CAM module is a 1D channel attention map of the shape 1 × 1 × C.
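The channel attention computation in Formula (2) can be sketched with numpy as follows. The weights and feature map shapes here are random and illustrative only; the real module learns W_0 and W_1 and applies spatial attention (SAM) afterwards.

```python
import numpy as np

def channel_attention(F, W0, W1):
    """CAM of CBAM. F: feature map (H, W, C); W0: (C, C//r); W1: (C//r, C)."""
    gap = F.mean(axis=(0, 1))                     # Global Average Pooling -> (C,)
    gmp = F.max(axis=(0, 1))                      # Global Max Pooling     -> (C,)
    mlp = lambda v: np.maximum(v @ W0, 0) @ W1    # shared MLP with ReLU
    return 1 / (1 + np.exp(-(mlp(gap) + mlp(gmp))))   # sigmoid -> (C,)

rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4                          # r is the channel reduction ratio
F = rng.standard_normal((H, W, C))
att = channel_attention(F, rng.standard_normal((C, C // r)),
                        rng.standard_normal((C // r, C)))
recalibrated = F * att                            # broadcast of the 1 x 1 x C map
```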

Figure 7 .
Figure 7. The structure of the YOLOv7-tiny and the Improved YOLOv7-tiny models. (a) The structure of the YOLOv7-tiny model; (b) the structure of the Improved YOLOv7-tiny model.

Figure 8 .
Figure 8. The specific development process of the rice diseases and insect pests identification application.

Figure 9 .
Figure 9. Change curves of loss value and mAP (0.5) of different models. (a) Loss value curves of Faster-RCNN and VGG16-SSD; (b) loss value curves of the improved models; (c) mAP (0.5) curves of different models.

Figure 10 .
Figure 10. The PR curves before and after model improvement.

3.3. Analysis of Experimental Results of Rice Diseases and Insect Pests Detection Application on a Mobile Phone
3.3.1. Identification Results of Rice Diseases and Insect Pests Detection Mobile Phone Application

Figure 11 .
Figure 11. Detection results of different types of rice diseases and insect pests.

Figure 12 .
Figure 12. Identification results of the rice diseases and insect pests detection mobile phone application. (a) Identification results of Improved YOLOv5s; (b) identification results of Improved YOLOv7-tiny.

Table 1 .
Rice disease and insect pest images dataset.

Table 3 .
Comparison of detection accuracy indicators of different models.

Table 4 .
Comparison of operation performance indicators of different models.

Table 5 .
Comparison of detection accuracy for different types of rice diseases and insect pests before and after model improvement.

Table 6 .
Identification accuracy of two improved models on the mobile phone platform.

Table 7 .
Runtime results on the mobile phone platform of two improved models.
