Plant Disease Recognition Model Based on Improved YOLOv5

Abstract: To accurately recognize plant diseases under complex natural conditions, an improved plant disease recognition model based on the original YOLOv5 network model was established. First, a new InvolutionBottleneck module was used to reduce the numbers of parameters and calculations, and to capture long-distance information in space. Second, an SE module was added to improve the sensitivity of the model to channel features. Finally, the loss function was changed from Generalized Intersection over Union to Efficient Intersection over Union to address the former's degeneration into Intersection over Union. These proposed methods were used to improve the target recognition effect of the network model. In the experimental phase, to verify the effectiveness of the model, sample images were randomly selected from the constructed rubber tree disease database to form training and test sets. The test results showed that the mean average precision of the improved YOLOv5 network reached 70%, which is 5.4% higher than that of the original YOLOv5 network. The precision values of this model for powdery mildew and anthracnose detection were 86.5% and 86.8%, respectively. The overall detection performance of the improved YOLOv5 network was significantly better than those of the original YOLOv5 and YOLOX_nano network models. The improved model accurately identified plant diseases under natural conditions, and it provides a technical reference for the prevention and control of plant diseases.


Introduction
Agricultural production is an indispensable part of a nation's economic development. Crops are affected by climate, which may make them susceptible to pathogen infection during the growth period, resulting in reduced production. In severe cases, the leaves fall off early and the plants die. To reduce the economic losses caused by diseases, it is necessary to properly diagnose plant diseases. Currently, two methods are used: expert diagnoses and pathogen analyses. The former refers to plant protection experts, with years of field production and real-time investigatory experience, diagnosing the extent of plant lesions. This method relies heavily on expert experience and is prone to subjective differences and low accuracy [1]. The latter involves the cultivation and microscopic observation of pathogens. This method has a high diagnostic accuracy rate, but it is time consuming and the operational process is cumbersome, making it unsuitable for field detection [2,3].
In recent years, the rapid development of machine vision and artificial intelligence has accelerated the process of engineering intelligence in various fields, and machine vision technology has also been rapidly improved in industrial, agricultural and other complex scene applications [4][5][6][7][8][9]. In response to the plant disease detection problem, disease detection methods based on visible light and near-infrared spectroscopic digital images have been widely used. Near-infrared spectroscopic and hyperspectral images contain continuous spectral information and provide information on the spatial distributions of plant diseases. Consequently, they have become the preferred technologies of many researchers [10][11][12][13]. However, the equipment for acquiring spectral images is expensive and difficult to carry; therefore, this technology cannot be widely applied. The acquisition of visible light images is relatively simple and can be achieved using various ordinary electronic devices, such as digital cameras and smart phones, which greatly reduces the challenges of visible light image-recognition research [14,15].
Because of the need for real-time monitoring and sharing of crop growth information, visible light image recognition has been successfully applied to the field of plant disease detection in recent years [16][17][18][19][20]. A variety of traditional image-processing methods have been applied. First, the images are segmented, then the characteristics of plant diseases are extracted and, finally, the diseases are classified. Shrivastava et al. [21] proposed an image-based rice plant disease classification approach using color features only, and it successfully classified rice plant diseases using a support vector machine classifier. Alajas et al. [22] used a hybrid linear discriminant analysis and a decision tree to predict the percentage of damaged leaf surface on diseased grapevines, with an accuracy of 97.79%. Kianat et al. [23] proposed a hybrid framework based on feature fusion and selection techniques to classify cucumber diseases. They first used the probability distribution-based entropy approach to reduce the extracted features, and then, they used the Manhattan distance-controlled entropy technique to select strong features. Mary et al. [24] used the merits of both the Gabor filter and the 2D log Gabor filter to construct an enhanced Gabor filter to extract features from images of diseased plants, and then, they used the k-nearest neighbor classifier to classify banana leaf diseases. Sugiarti et al. [25] combined the grey-level co-occurrence matrix extraction function with naive Bayes classification to greatly improve the classification accuracy of apple diseases. Mukhopadhyay et al. [26] proposed a novel method based on image-processing technology, and they used the non-dominated sorting genetic algorithm to detect the disease area on tea leaves, with an average accuracy of 83%.
However, visible light image-recognition based on traditional image processing technologies requires the artificial preprocessing of images and the extraction of disease features. The feature information is limited to shallow learning, and the generalization ability of new data sets needs to be improved.
In contrast, deep learning methods are gradually being applied to agricultural research because they can automatically learn the deep feature information of images, and their speed and accuracy levels are greater than those of traditional algorithms [27][28][29][30]. Deep learning has also been applied to the detection of plant diseases from visible light images. Abbas et al. [31] proposed a deep learning-based method for tomato plant disease detection that utilizes a conditional generative adversarial network to generate synthetic images of tomato plant leaves. Xiang et al. [32] established a lightweight convolutional neural network-based model with a channel shuffle operation and multiple-size modules that achieved accuracy levels of 90.6% and 97.9% on plant disease severity and PlantVillage datasets, respectively. Tan et al. [33] compared the recognition effects of deep learning networks and machine learning algorithms on tomato leaf diseases and found that the metrics of the tested deep learning networks are all better than those of the measured machine learning algorithms, with the ResNet34 network obtaining the best results. Alita et al. [34] used the EfficientNet deep learning model to detect plant leaf disease and found that it was superior to other state-of-the-art deep learning models in terms of accuracy. Mishra et al. [35] developed a sine-cosine algorithm-based rider neural network and found that the detection performance of the classifier improved, achieving an accuracy of 95.6%. In summary, applying deep learning to plant disease detection has achieved good results.
As a result of climatic factors, rubber trees may suffer from a variety of pests and diseases, most typically powdery mildew and anthracnose, during the tender leaf stage. Rubber tree anthracnose is caused by Colletotrichum gloeosporioides and Colletotrichum acutatum infections, whereas rubber tree powdery mildew is caused by Oidium heveae [36,37]. The lesion features of the two diseases are highly similar, making them difficult to distinguish, which has a certain impact on the classification results of the network model. Compared with traditional image processing technology, deep convolutional neural networks have greater abilities to express abstract features and can obtain semantic information from complex images. Target detection algorithms based on deep learning can be divided into two categories: one-stage detection algorithms (such as the YOLO series) and two-stage detection algorithms (such as Faster R-CNN). The processing speeds of the former are faster than those of the latter, which makes them more suitable for the real-time detection of plant diseases in complex field environments.
In this paper, we report our attempts to address the above issues, as follows: First, we used convolutional neural networks to automatically detect rubber tree powdery mildew and anthracnose in visible light images, which has some practical benefits for the prevention and control of rubber tree diseases. Second, we focused on solving the existing difficulties in detecting rubber tree diseases using YOLOv5, and we further improved the detection accuracy of the model. Consequently, a rubber tree disease recognition model based on the improved YOLOv5 was established, with the aim of achieving the accurate classification and recognition of rubber tree powdery mildew and anthracnose under natural light conditions. The main contributions of our work are summarized below: (1) In the backbone network, the Bottleneck module in the C3 module was replaced with the InvolutionBottleneck module that reduced the number of calculations in the convolutional neural network; (2) The SE module was added to the last layer of the backbone network to fuse disease characteristics in a weighted manner; (3) The existing loss function Generalized Intersection over Union (GIOU) in YOLOv5 was replaced by the loss function Efficient Intersection over Union (EIOU), which takes into account differences in target frame width, height and confidence; (4) The proposed model can realize the accurate and automatic identification of rubber tree diseases in visible light images, which has some significance for the prevention and control of rubber tree diseases.
The remainder of this article is organized as follows: In Section 2, we give a brief review of the original YOLOv5 model, and the improved YOLOv5 model is proposed. In Section 3, we list the experimental materials and methods. Experiments and analyses of the results are covered in Section 4. Finally, the conclusions are summarized in Section 5.

YOLOv5 Network Module
YOLOv5 [38] is a one-stage target recognition algorithm proposed by Glenn Jocher in 2020. On the basis of differences in network depth and width, YOLOv5 can be divided into four network model versions: YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x. Among them, the YOLOv5s network has the fastest calculation speed but the lowest average precision, whereas the YOLOv5x network has the opposite characteristics. The model size of the YOLOv5 network is approximately one-tenth that of the YOLOv4 network. It has faster recognition and positioning speeds, and its accuracy is no less than that of YOLOv4. The YOLOv5 network is composed of three main components: Backbone, Neck and Head. After the image is inputted, Backbone aggregates and forms image features at different image granularities. Then, Neck stitches the image features and transmits them to the prediction layer, and Head predicts the image features to generate bounding boxes and predicted categories. The YOLOv5 network uses GIOU as the network loss function, as shown in Equation (1):

$$GIOU = IOU - \frac{|C \setminus (A \cup B)|}{|C|} \tag{1}$$

where $A, B \subseteq \mathbb{S} \subseteq \mathbb{R}^n$ represent two arbitrary boxes, $C \subseteq \mathbb{S} \subseteq \mathbb{R}^n$ represents the smallest convex box enclosing both $A$ and $B$, and $IOU = |A \cap B| / |A \cup B|$. When the input network predicts image features, the optimal target frame is filtered by combining the loss function and the non-maximum suppression algorithm.
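For illustration, the IOU and GIOU of Equation (1) can be computed for two axis-aligned boxes as follows. This is a minimal numerical sketch, not the network's internal implementation; the `(x1, y1, x2, y2)` box format is an assumption made for the example.

```python
def iou_and_giou(box_a, box_b):
    """Compute IOU and GIOU for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area |A ∩ B|
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union area |A ∪ B|
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    area_c = (cx2 - cx1) * (cy2 - cy1)
    # GIOU = IOU - |C \ (A ∪ B)| / |C|
    giou = iou - (area_c - union) / area_c
    return iou, giou
```

Note that when one box fully contains the other, the enclosing box $C$ equals the union, so the penalty term vanishes and GIOU degenerates into IOU, which is exactly the weakness the EIOU loss introduced later in this paper addresses.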

InvolutionBottleneck Module Design
In the Backbone, the Bottleneck module in the C3 module was replaced with the InvolutionBottleneck module. The two inherent principles of standard convolution kernels are spatial-agnostic and channel-specific, whereas those of involution [39] are the opposite. Convolutional neural networks usually increase the receptive field by stacking convolution kernels of different sizes, and using different kernel calculations for each channel causes a substantial increase in the number of calculations. Replacing the Bottleneck module with the InvolutionBottleneck module alleviates this kernel redundancy by sharing the involution kernel along the channel dimension, which is beneficial for capturing long-distance information over the spatial range and reduces the number of network parameters. The output feature map $Y$ of the involution operation is defined as shown in Equations (2) and (3):

$$Y_{i,j,k} = \sum_{(u,v) \in \Delta_K} \mathcal{H}_{i,j,u+\lfloor K/2 \rfloor,v+\lfloor K/2 \rfloor,\lceil kG/C \rceil} X_{i+u,j+v,k} \tag{2}$$

$$\mathcal{H}_{i,j} = \phi(X_{\Psi_{i,j}}) \tag{3}$$

where $k$ represents the calculation channel, $\phi$ represents the generation function of the convolution kernel and $\Psi_{i,j}$ represents the index to the pixel set. The convolution kernel $\mathcal{H}_{i,j} \in \mathbb{R}^{K \times K \times G}$ is specifically customized for the pixel $X_{i,j} \in \mathbb{R}^C$ located at the corresponding coordinates $(i, j)$, but it is shared over the channels; $G$ represents the number of groups sharing the same convolution kernel ($G \ll C$). The size $K$ of the convolution kernel depends on the size of the input feature map.
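Equations (2) and (3) can be sketched with plain loops as below. The `kernel_fn` argument stands in for the learned generation function $\phi$ and is an illustrative assumption, not the authors' trained network; zero padding at the borders is also an assumption of this sketch.

```python
import numpy as np

def involution(x, kernel_fn, K=3, G=1):
    """Involution over a feature map x of shape (H, W, C).

    kernel_fn maps the C-dim pixel vector at (i, j) to a (G, K, K) kernel,
    which is shared by the C/G channels of each group (Equation (3)).
    """
    H, W, C = x.shape
    r = K // 2
    pad = np.pad(x, ((r, r), (r, r), (0, 0)))  # zero-pad the spatial borders
    y = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            kern = kernel_fn(x[i, j])         # kernel conditioned on this pixel
            patch = pad[i:i + K, j:j + K, :]  # (K, K, C) spatial neighbourhood
            for k in range(C):
                g = k * G // C                # group index sharing one kernel
                # Equation (2): weighted sum over the K x K neighbourhood
                y[i, j, k] = np.sum(kern[g] * patch[:, :, k])
    return y
```

In contrast to standard convolution, the kernel here varies across spatial positions but is shared across the channels of each group, which is what reduces the parameter count.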

SE Module Design
The squeeze-and-excitation network [40] is a network model proposed by Hu et al. (2017) that focuses on the relationships between channels. It aims to learn each image feature according to the loss function, increase the weights of effective image features and reduce the weights of invalid or ineffective image features, thereby training the network model to produce the best results. The SE modules with different structures are shown in Figure 1. The SE module is a calculation block that can be built on the transformation between the input feature vector $\mathbf{X}$ and the output feature map $\mathbf{U}$, and the transformation relationship is shown in Equation (4):

$$u_c = \mathbf{v}_c * \mathbf{X} = \sum_{s=1}^{C'} \mathbf{v}_c^s * \mathbf{x}^s \tag{4}$$

where $*$ represents convolution, $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_C]$, $\mathbf{X} = [\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^{C'}]$ and $\mathbf{U} = [u_1, u_2, \ldots, u_C] \in \mathbb{R}^{H \times W \times C}$. $\mathbf{v}_c^s$ represents a 2D spatial kernel, a single channel of $\mathbf{v}_c$ that acts on the corresponding channel of $\mathbf{X}$.
In this paper, the SE module was added to the last layer of the Backbone, allowing it to merge the image features of powdery mildew and anthracnose in a weighted manner, thereby improving the network performance at a small cost.
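The squeeze-excitation-rescale steps of the SE module can be sketched as follows. The weight matrices `w1` and `w2` are illustrative stand-ins for the two learned fully connected layers; in the trained network they are optimized jointly with the rest of the model.

```python
import numpy as np

def se_block(u, w1, w2):
    """Minimal squeeze-and-excitation sketch on a (H, W, C) feature map u.

    w1: (C/r, C) reduction layer, w2: (C, C/r) expansion layer, for some
    reduction ratio r (both are stand-ins for the learned FC weights).
    """
    z = u.mean(axis=(0, 1))                # squeeze: global average pool -> (C,)
    s = np.maximum(w1 @ z, 0.0)            # excitation: FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))    # FC + sigmoid -> per-channel weights in (0, 1)
    return u * s                           # rescale: reweight each channel of u
```

The per-channel weights `s` are what allow the network to emphasize informative disease-feature channels and suppress uninformative ones at small computational cost.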

Loss Function Design
The loss function was changed from GIOU to EIOU [41]. The GIOU function was proposed on the basis of the IOU function, and it solves the problem of the IOU not being able to reflect how two boxes intersect. However, if the anchor and target boxes are in a containment relationship, then GIOU still degenerates into IOU. Therefore, we changed the loss function from GIOU to EIOU. EIOU was obtained on the basis of the complete-IOU loss (CIOU), and it takes into account not only the central point distance and the aspect ratio, but also the true discrepancies in the widths and heights of the target and anchor boxes. The EIOU loss function directly minimizes these discrepancies and accelerates model convergence. The EIOU loss function is shown in Equation (5):

$$L_{EIOU} = L_{IOU} + L_{dis} + L_{asp} = 1 - IOU + \frac{\rho^2(\mathbf{b}, \mathbf{b}^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{C_w^2} + \frac{\rho^2(h, h^{gt})}{C_h^2} \tag{5}$$

where $C_w$ and $C_h$ represent the width and height, respectively, of the smallest enclosing box covering the two boxes; $\mathbf{b}$ and $\mathbf{b}^{gt}$ represent the central points of the predicted and target boxes, respectively; $\rho$ represents the Euclidean distance; and $c$ represents the diagonal length of the smallest enclosing box covering the two boxes. The loss function is divided into three parts: the IOU loss $L_{IOU}$, the distance loss $L_{dis}$ and the aspect loss $L_{asp}$. Combined with the InvolutionBottleneck and SE modules, the whole improved YOLOv5 network model framework is constructed, as shown in Figure 2.
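As a numerical sketch of Equation (5) for boxes in `(x1, y1, x2, y2)` form (an assumed format for this illustration, not the network's internal representation):

```python
def eiou_loss(pred, target):
    """EIOU loss L_EIOU = L_IOU + L_dis + L_asp for two (x1, y1, x2, y2) boxes."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # IOU term: 1 - IOU
    ix1, iy1 = max(px1, tx1), max(py1, ty1)
    ix2, iy2 = min(px2, tx2), min(py2, ty2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    l_iou = 1.0 - inter / union
    # Width C_w, height C_h and squared diagonal c^2 of the smallest enclosing box
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    diag2 = cw ** 2 + ch ** 2
    # Distance term: squared centre distance over the squared diagonal
    l_dis = (((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2
             + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2) / diag2
    # Aspect term: true width/height discrepancies (EIOU's change over CIOU)
    l_asp = (((px2 - px1) - (tx2 - tx1)) ** 2 / cw ** 2
             + ((py2 - py1) - (ty2 - ty1)) ** 2 / ch ** 2)
    return l_iou + l_dis + l_asp
```

Unlike GIOU, the distance and aspect terms stay non-zero when one box contains the other, so the gradient does not vanish in the containment case.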

Experimental Materials
The images of rubber tree diseases were collected from a rubber plantation in Shengli State Farm, Maoming City, China. It is located at 22°6′ N, 110°80′ E, with an altitude of 34-69 m, an average annual precipitation of 1698.1 mm and an annual average temperature of 19.9-26.5 °C. The high humidity and warm climate are conducive to widespread epidemics of powdery mildew and anthracnose. To ensure the representativeness of the image set, the images were collected under natural light conditions. A Sony ILCE-7M3 digital camera was used to photograph powdery mildew and anthracnose on rubber leaves at different angles, with an image resolution of 6000 × 4000 pixels. There were 2375 images in the rubber tree disease database, including 1203 powdery mildew images and 1172 anthracnose images, which were used for the training and testing of disease recognition models. We identified these two diseases with the guidance of plant protection experts. Images of these rubber tree diseases are shown in Figure 3.

Data Preprocessing
Before the images were inputted into the improved YOLOv5 network model, the mosaic data enhancement method was used to expand the image set. The images were spliced using several methods, such as random scaling, random cropping and random arrangement, which not only expanded the image set, but also improved the detection of small targets. In addition, before training the model, adaptive scaling and filling operations were performed on the images of rubber tree diseases, and the input image size was normalized to 640 × 640 pixels. The preprocessing results are shown in Figure 4.
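The adaptive scaling step can be sketched as below. `letterbox_shape` is a hypothetical helper introduced for illustration; it computes only the resize-and-pad geometry (uniform scale so the longer side fits the target, padding for the rest), not the full mosaic augmentation pipeline.

```python
def letterbox_shape(w, h, target=640):
    """Return ((new_w, new_h), (pad_w, pad_h)) for scaling a w x h image
    into a target x target square while preserving the aspect ratio."""
    scale = min(target / w, target / h)      # uniform scale: longer side fits exactly
    new_w, new_h = round(w * scale), round(h * scale)
    pad_w, pad_h = target - new_w, target - new_h  # remaining area is filled by padding
    return (new_w, new_h), (pad_w, pad_h)
```

For the 6000 × 4000 pixel camera images used here, this scales to 640 × 427 and pads 213 rows to reach the 640 × 640 network input.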

Experimental Equipment
A desktop computer was used as the processing platform, the operating system was Ubuntu 18.04 and the PyTorch framework and the YOLOv5 environment were built in the Anaconda3 environment. The program was written in Python 3.8, and the CUDA version was 10.1. For hardware, the processor was an Intel Core i3-4150 with a main frequency of 3.5 GHz, the memory was 3 GB and the graphics card was a GeForce GTX 1060 with 6 GB of video memory. The specific configurations are provided in Table 1.

Experimental Process
First, the manual labeling method was used to mark each rubber tree disease image as powdery mildew or anthracnose to obtain training label images, and then the disease image set was divided at a 4:1:1 ratio into training, validation and test sets. The training set was inputted into improved YOLOv5 networks of different structures for training. The training process was divided into 80 batches, with each batch containing 96 images. The Stochastic Gradient Descent algorithm was used to optimize the network model during the training process, and the optimal network weight was obtained after the training was completed. Subsequently, the performance of the network model was determined using the test set and compared with the test results of the original YOLOv5 and YOLOX_nano networks. The network model with the best result was selected as the rubber tree disease recognition model. The test process is shown in Figure 5.
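The 4:1:1 split can be sketched as follows; the fixed seed and shuffle-then-slice policy are illustrative assumptions, as the paper does not state how the random division was implemented.

```python
import random

def split_dataset(items, seed=0):
    """Shuffle items and divide them at a 4:1:1 ratio into
    (train, validation, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    n = len(items)
    n_train = n * 4 // 6                # 4 parts of 6
    n_val = n // 6                      # 1 part of 6; the remainder becomes the test set
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

Applied to the 2375-image database, this yields roughly 1583 training, 395 validation and 397 test images.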

Convergence Results of the Network Model
The training and verification sets were inputted into the network for training. After 80 batches of training, the loss function value curves of the training and verification sets were determined (Figure 6), and they included the detection frame loss, the detection object loss and the classification loss. The loss of the detection frame indicates whether an algorithm can locate the center point of an object well and whether the detection target is covered by the predicted bounding box. The smaller the loss function value, the more accurate the prediction frame. The object loss function is essentially a measure of the probability that the detection target exists in the region of interest. The smaller the value of the loss function, the higher the accuracy. The classification loss represents the ability of the algorithm to correctly predict a given object category. The smaller the loss value, the more accurate the classification.
As shown in Figure 6, the loss function value had a downward trend during the training process as the Stochastic Gradient Descent algorithm optimized the network and the network weights and other parameters were constantly updated. Before the training batch reached 20, the loss function value dropped rapidly, and the precision, recall and average precision rapidly improved. The network continued to iterate. When the training batch reached approximately 20, the decrease in the loss function value gradually slowed. Similarly, the increases in metrics such as the average precision also slowed. When the training batch reached 80, the loss curves of the training and validation sets showed almost no downward trends, and the other index values had also stabilized. The network model had basically reached the convergence state, and the optimal network weight was obtained at the end of training.

Verification of the Network Model
To evaluate the detection performance of the improved YOLOv5 network, it was crucial to use appropriate evaluation metrics for each problem. The precision ($P$), recall ($R$), average precision ($AP$) and mean average precision ($mAP$) were used as the evaluation metrics, and they were respectively defined as follows:

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$AP = \int_0^1 P(R)\,dR$$

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$

where $TP$ represents the number of positive samples that are correctly detected, $FP$ represents the number of negative samples that are falsely detected and $FN$ represents the number of positive samples that are not detected.
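The metric definitions above can be sketched directly from the counts; `average_precision` here uses a simple rectangle-rule approximation of the area under the precision-recall curve, which is an illustrative simplification of the interpolated AP used by detection benchmarks.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and
    false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

def average_precision(recalls, precisions):
    """Approximate AP: area under the precision-recall curve, with
    recalls given in increasing order. mAP is the mean of per-class APs."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p  # rectangle of width (r - prev_r), height p
        prev_r = r
    return ap
```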
In total, 200 powdery mildew images and 200 anthracnose images were randomly selected as the test set and inputted into the improved YOLOv5 network for testing. The test results were compared with those of the original YOLOv5 and YOLOX_nano networks, as shown in Figure 7. The detection performance of the improved YOLOv5 network was better than those of the original YOLOv5 and YOLOX_nano networks for both of the tested rubber tree diseases. Compared with the original YOLOv5 network, the precision of powdery mildew detection increased by 8.7% and the average precision increased by 1%; however, recall decreased by 1.5%. The average precision of anthracnose detection increased by 9.2% and recall increased by 9.3%; however, precision decreased by 5.2%. Overall, the mean average precision increased by 5.4%. Compared with the YOLOX_nano network, the precision of powdery mildew detection increased by 3.7% and the average precision increased by 0.3%; however, recall decreased by 2%. The average precision of anthracnose detection increased by 4.4% and recall increased by 3.8%; however, precision decreased by 4.4%. Overall, the mean average precision increased by 1.4%. The improved YOLOv5 network achieved 86.5% and 86.8% precision levels for the detection of powdery mildew and anthracnose, respectively. In summary, the improved YOLOv5 network's performance was greatly enhanced compared with those of the original YOLOv5 and YOLOX_nano networks; consequently, it more accurately locates and identifies rubber tree diseases.

Comparison of Recognition Results
The original YOLOv5, YOLOX_nano and improved YOLOv5 networks were used to detect the two kinds of rubber tree diseases to verify the actual classification and recognition effects of the improved network. A comparison of the test results is shown in Figure 8. As shown in Figure 8, compared with the other networks, the improved network significantly improved the detection of powdery mildew, including on obscured diseased leaves. Additionally, the recognition effect of the YOLOX_nano network for powdery mildew was better than that of the original YOLOv5 network. For the detection of anthracnose, the recognition effects of the three networks were similar, with all three effectively detecting anthracnose. Therefore, the effectiveness of the improved network for diseased-leaf detection is generally better than those of the original YOLOv5 and YOLOX_nano networks.

Conclusions
The detection and location of plant diseases in the natural environment are of great significance to plant disease control. In this paper, a rubber tree disease recognition model based on the improved YOLOv5 network was established. We replaced the Bottleneck module with the InvolutionBottleneck module to achieve channel sharing within the group and reduce the number of network parameters. In addition, the SE module was added to the last layer of the Backbone for feature fusion, which improved network performance at a small cost. Finally, the loss function was changed from GIOU to EIOU to accelerate the convergence of the network model. According to the experimental results, the following conclusions can be drawn: (1) The model performance verification experiment showed that the rubber tree disease recognition model based on the improved YOLOv5 network achieved 86.5% precision for powdery mildew detection and 86.8% precision for anthracnose detection.
In general, the mean average precision reached 70%, an increase of 5.4% compared with the original YOLOv5 network. Therefore, the improved YOLOv5 network more accurately identified and classified rubber tree diseases, and it provides a technical reference for the prevention and control of rubber tree diseases. (2) A comparison of the detection results showed that the performance of the improved YOLOv5 network was generally better than those of the original YOLOv5 and YOLOX_nano networks, especially in the detection of powdery mildew, where the problem of missing obscured diseased leaves was alleviated.
Although the improved YOLOv5 network, as applied to rubber tree disease detection, achieved good results, the detection accuracy still needs to be improved. In future research, the network model structure will be further optimized to improve the network performance of the rubber tree disease recognition model.

Author Contributions:
Conceptualization, Z.C. and X.Z.; methodology, Z.C.; software, Z.C.; validation, Z.C.; formal analysis, Z.C. and X.Z.; investigation, Z.C., X.Z., R.W., Y.L., C.L. and S.C. (Siyu Chen); resources, Z.C., X.Z., R.W., Z.Y. and S.C. (Shiwei Chen); data curation, Z.C.; writing-original draft preparation, Z.C.; writing-review and editing, Z.C. and X.Z.; visualization, Z.C.; supervision, X.Z.; project administration, X.Z. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the privacy policy of the organization.

Acknowledgments:
The authors would like to thank the anonymous reviewers for their critical comments and suggestions for improving the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.