DBA_SSD: A Novel End-to-End Object Detection Algorithm Applied to Plant Disease Detection

Abstract: In response to the difficulty of plant leaf disease detection and classification, this study proposes a novel plant leaf disease detection method called deep block attention SSD (DBA_SSD) for disease identification and disease degree classification of plant leaves. We propose three plant leaf detection methods, namely, squeeze-and-excitation SSD (Se_SSD), deep block SSD (DB_SSD), and DBA_SSD. Se_SSD fuses the SSD feature extraction network with a channel attention mechanism, DB_SSD improves the VGG feature extraction network, and DBA_SSD fuses the improved VGG network with the channel attention mechanism. To reduce the training time and accelerate the training process, the convolutional layers of a VGG model trained on the ImageNet image dataset are migrated to this model, and the collected plant leaf disease image dataset is randomly divided into a training set, a validation set, and a test set in the ratio of 8:1:1. We chose the PlantVillage dataset after careful consideration because it contains images related to the domain of interest. This dataset consists of images of 14 plants, including images of apples, tomatoes, strawberries, peppers, and potatoes, as well as the leaves of other plants. In addition, data enhancement methods, such as histogram equalization and horizontal flip, were used to expand the image data. The performance of the three improved algorithms is compared and analyzed in the same environment against the classical target detection algorithms YOLOv4, YOLOv3, Faster RCNN, and YOLOv4 tiny. Experiments show that DBA_SSD outperforms the two other improved algorithms, and its performance in the comparative analysis is superior to that of the other target detection algorithms.


Introduction
Plants are susceptible to various diseases, which seriously affect their quality and yield. Formulating prevention and control plans as early as possible, before an outbreak, maximizes the effect of prevention and control and reduces economic losses. Therefore, the identification of plant diseases is an effective way to inhibit the rapid development of diseases and avoid their occurrence. Previously, people made subjective judgments about crop disease categories, and disease detection was often expert-based, making it a costly and error-prone process.
Agricultural detection based on artificial intelligence, such as crop yield prediction [1], weed identification [2], and plant disease detection [3,4], has come into wide use with the development of artificial intelligence technology. Machine learning-based disease detection requires preprocessing the dataset, extracting the features of the disease regions in the image with feature extraction algorithms, and sending the obtained feature information to a classifier to obtain the model parameters, the disease categories, and the degree of disease to be detected. However, models built on machine learning-based image recognition generalize weakly. When the number of categories is excessive, the features of each class cannot be distinguished effectively. Moreover, the [...]
The main contributions of this paper are as follows:
(1) We proposed a novel end-to-end detection algorithm for plant disease, DBA_SSD, by combining the attention mechanism and convolution kernel. The algorithm takes the attributes of plant leaf disease pictures into account and pays more attention to disease details when testing for plant disease.
(2) We graded the health of fruit and vegetable leaves. Based on the results of this paper, different measures can be taken according to the severity of the leaf diseases, which is of great significance for increasing plant yield.
(3) We implemented the classic SSD, YOLOv4, YOLOv3, Faster RCNN, and YOLOv4 tiny models and compared them with our proposed DBA_SSD. Our method performs better than the classic baseline methods on the vegetable and fruit leaf data set.
The main structure of this article is as follows. The first chapter introduces the related work on leaf disease detection and reviews leaf disease detection technology. The second chapter introduces the SSD model and the related improvement modules and proposes two improved methods for the SSD target detection algorithm. The third chapter introduces the experimental environment, data set structure, experimental procedure, and evaluation standards. The fourth chapter presents a comparative analysis of two sets of experiments: one consists of the ablation experiments on the proposed DBA_SSD, and the other compares the improved SSD algorithms with other target detection algorithms. Finally, we summarize the research in this article and discuss its prospects.

Related Work
At present, research on plant disease recognition mainly focuses on two aspects. One is disease recognition based on machine learning, whose general steps include diseased leaf image segmentation, feature extraction, and disease recognition. The other is target recognition technology based on deep learning, in which end-to-end target detection is favored by many researchers because of its fast recognition speed and efficient feature extraction. The end-to-end target detection algorithm is also called the one-stage target detection algorithm. One-stage means that no candidate frames are generated and the target frame localization problem is directly transformed into a regression problem.
In the research on the identification of plant diseases based on machine learning, the literature [20] proposed a DCNN-based apple tree leaf disease (ATLD) diagnosis method and established a data set of five common ATLDs and healthy leaves. The DCNN model combines the DenseNet and Xception [21] models and uses a support vector machine to classify apple leaf diseases. The experimental results show that the DCNN model is more accurate than Inception-v3, MobileNet [22], VGG-16, DenseNet-201, Xception, and VGG-INCEP. Shrivastava et al. [23] proposed a rice disease image classification method that uses only color features and explored feature extraction from 14 different color channels. They obtained 172 color channel features and compared the performance of 7 different classifiers; the results show that the classification accuracy of the support vector machine classifier is up to 94.65%. The literature [24] introduced a hybrid method for detecting plant leaf disease. The first stage applies image enhancement and image conversion to overcome the problems related to low illumination and noise. The second stage combines the feature extraction techniques of GLCM, the complex Gabor filter, Curvelet, and image moments. The third stage uses the extracted features to train a neuro-fuzzy logic classifier. The proposed combination of feature extraction and image preprocessing can improve classification accuracy. Abdulridha [25] used hyperspectral imaging and machine learning to develop a technique for detecting pumpkin powdery mildew in the asymptomatic, early, middle, and late stages. This method uses a radial basis function to distinguish diseased plants from healthy plants and to classify the severity of the diseased plants.
Abdu [26] proposed a method that identifies the surface of diseased plant leaves, extracts optimized features from the diseased area, and recognizes diseased leaves with a feature-based machine learning classifier. The disease features are concatenated into a pathological feature vector for disease recognition to improve detection accuracy.
In deep learning-based research on fruit and vegetable diseases, Salma Samiei [27] used red clover and alfalfa as research objects and proposed CNN-LSTM models combined with denoising algorithms to classify the different growth stages of the two plant species. Based on high-resolution remote sensing data, Alin-Ionuț Pleșoianu et al. [28] proposed an integrated deep learning model for individual tree crown detection and species classification. Mohamed Kerkech et al. [29] proposed a new method of grape disease detection based on the SegNet [30] architecture. It segments visible light and infrared images to identify shadows, ground, and healthy and symptomatic vines, and then merges the segmentations obtained from the visible light and infrared images to generate the whole disease map of the grapes. In the literature [31], a U-Net method for pixel-level purple rapeseed segmentation was proposed, with the model parameters calculated by adjusting the sample size. In the literature [32], a new thermal imaging method was proposed to address the color similarity between unripe citrus fruits and leaves. Water mist causes the fruit and leaf surfaces to change temperature at different rates, which produces a temperature difference between them, and a deep learning model was built on the thermal imaging system. Meanwhile, disease detection algorithms are becoming more lightweight, which makes them easy to deploy on embedded devices. Chongke Bi [33] proposed a lightweight method for apple leaf disease identification based on the MobileNet model. This method was also compared with ResNet152 and InceptionV3. It provides stable recognition results and is easily deployed on mobile devices. Utpal Barman [34] compared MobileNet CNN and Self-Structured CNN (SSCNN) on a citrus disease dataset of smartphone images. The experiments show that SSCNN is more accurate in classifying citrus leaf diseases from smartphone images and takes less computation time.
An increasing number of scholars now detect plant diseases with deep learning-based target detection methods, especially one-stage methods such as YOLO and SSD. These methods omit tedious machine learning steps, such as image preprocessing, segmentation, and feature extraction, and work as a one-step end-to-end method with high recognition accuracy. Therefore, this paper explores the effectiveness of target detection algorithms for vegetable and fruit leaf disease detection and grading, using SSD as the baseline method.
Information 2021, 12, 474

SSD Network
The SSD algorithm model is a one-stage real-time target detection model proposed in the same period as the YOLO series. SSD combines the one-stage regression prediction idea of the YOLO series with the anchor box mechanism of Faster RCNN. It uses VGG as the base feature extraction network and extracts six feature layers of different sizes, from the bottom layer to the top layer, as the regression prediction features. The advantage of SSD is that it greatly improves the operation speed of the algorithm while maintaining detection accuracy. Moreover, both small targets and large objects are taken into account in detection. Figure 1 shows the SSD backbone network structure.
Figure 1. SSD backbone network structure.
The loss function of SSD contains a log loss for classification and a smooth L1 loss for regression, and it controls the proportion of positive and negative samples, which can improve the speed of optimization and the stability of the training results. The total loss function is the weighted sum of the classification and regression errors. α adjusts the weight between the confidence loss and the location loss (default α = 1), and N denotes the total number of default boxes that are eventually matched with ground truth boxes. The confidence loss is a typical softmax loss, and the location loss is a typical smooth L1 loss. In the standard SSD formulation, c denotes the class confidences, l the predicted boxes, and g the ground truth boxes.
Total loss:
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
Classification loss:
$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_{i}^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_{i}^{0}\right)$$
Of which:
$$\hat{c}_{i}^{p} = \frac{\exp\left(c_{i}^{p}\right)}{\sum_{p} \exp\left(c_{i}^{p}\right)}$$
where $x_{ij}^{p} \in \{0, 1\}$ represents whether the i-th regression box matches the j-th ground truth box of type p.
Regression loss:
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{p} \, \mathrm{smooth}_{L1}\left(l_{i}^{m} - \hat{g}_{j}^{m}\right)$$
SSD adopts full convolution for direct regression prediction and no longer generates candidate frames, which greatly improves the detection speed of the SSD network. However, in some cases the detection accuracy falls short of expectations. When the surface features of leaves are similar or leaves occlude each other, SSD misses targets and makes false detections, which often occurs in actual leaf disease detection. For this reason, SSD needs to be improved to enhance feature recognition.
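The loss described above (a softmax confidence term plus a smooth L1 localization term, weighted by α and normalized by the number N of matched default boxes) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the input layout (`pos`, `neg`) are hypothetical, class index 0 is assumed to be the background class, and the hard negative mining that selects the negative boxes is assumed to have happened beforehand.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear otherwise."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ssd_loss(pos, neg, alpha=1.0):
    """Total loss (L_conf + alpha * L_loc) / N.

    pos: list of (class_logits, true_class, predicted_offsets, target_offsets)
         for default boxes matched to a ground truth box (hypothetical layout).
    neg: list of class_logits for the selected unmatched (background) boxes.
    """
    n = len(pos)
    if n == 0:
        return 0.0  # the loss is set to 0 when no default box is matched
    conf = 0.0
    loc = 0.0
    for logits, true_class, predicted, target in pos:
        conf -= math.log(softmax(logits)[true_class])
        loc += sum(smooth_l1(p - t) for p, t in zip(predicted, target))
    for logits in neg:
        conf -= math.log(softmax(logits)[0])  # index 0 assumed to be background
    return (conf + alpha * loc) / n
```

With a single matched box whose two class scores are equal and whose offsets are predicted exactly, the localization term vanishes and the loss reduces to log 2, the softmax loss of a uniform two-class prediction.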

Squeeze-and-Excitation SSD (Se_SSD) Network
Se_Block [35] mainly focuses on the relationship between channels and explicitly models the interdependencies between feature channels with the "Squeeze-and-Excitation (SE)" structural unit, which adaptively adjusts the feature response values of each channel and the internal dependencies between channels. The Se_Block module works as shown in Figure 2. First, feature compression is performed along the spatial dimension of the feature map, and each two-dimensional feature channel is turned into a real number that has a global receptive field to a certain extent. The output has the same number of dimensions as the number of input feature channels. Then, based on the correlation between the feature channels, a weight is generated for each feature channel to represent its importance. Finally, the original features are recalibrated in the channel dimension by multiplying the channel-by-channel weights onto the previous features.
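The three steps above (squeeze, excitation, recalibration) can be sketched without any framework as follows. This is a minimal illustration, assuming a feature map stored as nested lists `[C][H][W]` and two bias-free weight matrices `w1` and `w2`; the function name and the weight layout are hypothetical, and a 1 × 1 convolution applied to the pooled 1 × 1 map is written here as a plain matrix product.

```python
import math


def se_block(x, w1, w2):
    """Recalibrate the channels of x ([C][H][W]) with squeeze-and-excitation.

    w1: [C_reduced][C] weights of the channel-reducing layer (followed by ReLU).
    w2: [C][C_reduced] weights of the channel-restoring layer (followed by sigmoid).
    """
    # Squeeze: global average pooling turns each 2D channel into one real number.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: reduce + ReLU, then restore + sigmoid, giving one weight per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    k = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
         for row in w2]
    # Recalibration: multiply every value of a channel by that channel's weight.
    return [[[k_c * v for v in row] for row in ch] for k_c, ch in zip(k, x)]
```

With all-zero weights the sigmoid returns 0.5 for every channel, so the block simply halves the input; trained weights would instead emphasize the more informative channels.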
[Figure 2. Se_Block module; layers shown: AdaptiveAvgPool2d, Conv2D, ReLU, Conv2D, Sigmoid; labels: X, K, K*X.]
To increase the feature extraction capability of the SSD feature extraction model and focus more on the feature layers with higher importance, this paper adds the Se_Block attention mechanism module in front of the last six effective feature layers used for regression prediction in the SSD model, so that the feature layers are rescaled along the channel dimension. The structure of the Se_SSD network is shown in Figure 3.

DB_SSD and DBA_SSD Network
The residual network module, which has been widely applied in the last two years, is shown in Figure 4a. X is the input feature map, Wi is the weight of the i-th layer of the network, and F(X, Wi) + X is the feature output, which is also how the data are computed in the module. The residual network is superior to the traditional convolutional network: the residual module makes ultra-deep networks possible and avoids the bottleneck in which accuracy saturates as the network is continuously deepened. In addition, directly connecting the input to the output simplifies the learning objective and its difficulty. The 1 × 1 convolution is shown in Figure 4b. A 1 × 1 convolution is usually followed by a nonlinear ReLU layer so that more features can be learned. In addition, a 1 × 1 convolution can change the dimensionality of the image. Transforming the original image by 1 × 1 convolution improves the generalization ability and reduces overfitting, while raising and reducing the number of channels lowers the computational effort and achieves cross-channel information interaction and feature integration.
[Figure 4a. Residual module; labels: weight layer, ReLU, weight layer, ReLU, X, identity.]
As shown in Figure 5, two kinds of rich feature extraction modules are designed in this paper. Deep_Block, shown in Figure 5a, enhances the feature extraction capability of the network by using 1 × 1 convolution to reduce the number of channels after convolution and to fuse multi-channel information, while introducing a residual structure to prevent the loss of feature layer information. Deep_Block_Attention adds a channel attention mechanism at the end of the Deep_Block structure for fine-tuning at the channel level, as shown in Figure 5b. The feature extraction network of SSD is reconstructed with the rich feature extraction module as the basic feature extraction unit, as shown in Figure 6, to deepen the feature extraction of each layer and increase the richness of feature learning.
Figure 6. DBA_SSD network structure.
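The two ingredients named in this section, the 1 × 1 convolution and the residual connection F(X, Wi) + X, can be illustrated with a small framework-free sketch. It is not the Deep_Block of Figure 5, whose exact layers are not reproduced in the text; the function names, the bias-free weights, and the choice of F (a 1 × 1 channel reduction, ReLU, then a 1 × 1 channel expansion) are assumptions made only for illustration.

```python
def conv1x1(x, weights):
    """1 x 1 convolution: mix channels independently at every pixel.

    x: feature map as [C_in][H][W]; weights: [C_out][C_in] (no bias).
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(wt * x[c][i][j] for c, wt in enumerate(row_w)) for j in range(width)]
         for i in range(height)]
        for row_w in weights
    ]


def relu(x):
    """Element-wise ReLU on a [C][H][W] feature map."""
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]


def residual_block(x, reduce_w, expand_w):
    """Compute F(X, W) + X with F = 1x1 reduce -> ReLU -> 1x1 expand."""
    f = conv1x1(relu(conv1x1(x, reduce_w)), expand_w)
    # Identity shortcut: add the unchanged input back onto the transformed features.
    return [[[a + b for a, b in zip(f_row, x_row)] for f_row, x_row in zip(f_ch, x_ch)]
            for f_ch, x_ch in zip(f, x)]
```

Because the shortcut adds X back unchanged, the block can fall back to the identity mapping when F contributes nothing, which is the property the text credits with preventing the loss of feature layer information.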

Experimental Environment
The experiments use a deep learning model built under the PyTorch deep learning framework and a dataset of 3000 plant leaf images; the final output prediction frame identifies the leaf species and determines the severity of the leaf disease. The experiments were conducted on an Asus laptop (Shanghai, China) with an AMD Ryzen 7 4800H processor, an NVIDIA GeForce RTX 2060 graphics card, and 32 GB of RAM.

Dataset
We chose the PlantVillage dataset [36] after careful consideration because of the large number of leaf species and the abundance of disease species in this dataset. Because LabelImg is convenient and simple, this experiment uses the LabelImg software to label the dataset and obtain data in VOC format for training, with the label files stored as .xml files and the pictures as .jpg files. The dataset of the experiment has 3000 images, which are divided into 5 major categories: Apple, Tomato, Potato, Strawberry, and Chili. Each major category is divided into 3 subcategories according to the severity of the leaf disease: healthy, general, and severe. In total, 15 subcategories are annotated, and the image resolution is around 255 × 470 × 3 pixels. The training, validation, and test sets are divided in the ratio of 8:1:1. Figure 7 shows the composition of the data set.
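A random division into training, validation, and test sets in the ratio 8:1:1 can be sketched as follows. This is only an illustration using the standard library; the function name, the fixed seed, and the use of integer stand-ins for image file names are assumptions, not details given in the paper.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them into (train, val, test) by the given ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Example with 3000 stand-in image indices, the size of the labeled dataset.
train, val, test = split_dataset(range(3000))
```

For 3000 images this yields 2400 training, 300 validation, and 300 test images.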

Experimental Design
To keep the dataset balanced and to increase its richness and quality, data enhancement and image preprocessing were performed on the images before the experimental tests [37]. The enhancement methods are Histogram Equalization, Horizontal Flip + Hue Saturation Value, Vertical Flip + Channel Shuffle, and Horizontal Flip + Vertical Flip + Channel Shuffle. The enhanced images are shown in Figure 8. Each of the 15 classes was expanded to 1000 images, so the data set grew from 3000 to 15,000 images, which were randomly assigned to the training, validation, and test sets in the ratio of 8:1:1.
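Three of the operations named above (horizontal flip, vertical flip, and histogram equalization) can be sketched in plain Python. This is an illustration under simplifying assumptions: the image is a single-channel grayscale array stored as a list of rows of integers, whereas the paper works with color images, and the function names are hypothetical. The equalization follows the common cumulative-histogram formula.

```python
def horizontal_flip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vertical_flip(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def equalize_histogram(img, levels=256):
    """Histogram equalization for a grayscale image with values in [0, levels)."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    # Cumulative distribution of pixel intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return [row[:] for row in img]  # constant image: nothing to stretch
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[v] - cdf_min) * scale) for v in row] for row in img]
```

Equalization stretches a narrow band of intensities over the full range; for instance, an image holding only the values 50 and 100 in equal numbers is mapped to 0 and 255.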

To better test the performance of the improved algorithm, four experiments were designed. They compare Se_SSD, in which a channel attention mechanism is added at the end of the feature extraction network; DB_SSD (Deep Block SSD), which improves the VGG feature extraction network; DBA_SSD, which fuses the improved VGG network with the channel attention mechanism; and the original SSD network. In each case, the convolutional layers of a VGG model trained on the ImageNet image dataset are migrated to the model.
Experiment 1: The Se_SSD network, with the Se_Block channel attention mechanism added, is trained, and the average accuracy of this network for the detection of plant leaves is tested.
Experiment 2: The DB_SSD network, with the Deep_Block module added (the Deep_Block module does not contain the attention mechanism), is trained under the environment and hardware conditions of Experiment 1.
Experiment 3: The DBA_SSD network, with the Deep_Block_Attention module added (the Deep_Block_Attention module contains the attention mechanism), is trained and tested under the environment and hardware conditions of Experiment 1.
Experiment 4: The original SSD network is trained and tested under the environment and hardware conditions of Experiment 1.
All four experiments were trained on the 15,000-image plant leaf dataset and tested on 1500 randomly selected images. The experiments followed the experimental flow in Figure 9, an experiment-comparison-optimization-experiment pattern, to obtain the mean average precision (mAP) of each model and to compare the mAP values of the different models.
Figure 9. Experimental flow.

Performance Evaluation Metrics
Precision is a measure of the accuracy of a model's predictions; its value is the number of correctly predicted positive samples over the total number of samples predicted as positive. Recall is a measure of the model's ability to identify positive samples; its value is the number of correctly predicted positive samples over the total number of actual positive samples. The prediction outcomes TP, FP, FN, and TN are shown in Table 1.
$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}$$
The PR curve is drawn with Recall as the horizontal axis and Precision as the vertical axis. Precision and Recall are generally negatively correlated: the recall rate decreases as precision increases. AP (average precision), the indicator for a single category, is the integral of the PR curve.
$$AP = \int_{0}^{1} P(R) \, dR$$
The value of mAP (mean average precision), one of the important metrics for evaluating the whole model, is the average of the APs of all categories.
$$mAP = \frac{1}{N} \sum_{n=1}^{N} AP_{n}$$
where n is the category and N is the total number of categories.
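The metrics of this section can be sketched as follows, with precision taken as TP / (TP + FP) and recall as TP / (TP + FN). The sketch is illustrative only: the function names and the input layout are hypothetical, and AP is computed as the uninterpolated area under the PR curve, which is one common convention; the paper does not state which AP variant (for example, an interpolated VOC-style AP) was used.

```python
def precision(tp, fp):
    """Correct positive predictions over all positive predictions."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Correct positive predictions over all actual positives."""
    return tp / (tp + fn) if tp + fn else 0.0


def average_precision(scored, num_positives):
    """Area under the PR curve for one category (no interpolation).

    scored: list of (confidence, is_true_positive) pairs, one per detection.
    num_positives: number of ground truth objects of this category.
    """
    if num_positives == 0:
        return 0.0
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        r = tp / num_positives
        p = tp / (tp + fp)
        ap += (r - prev_recall) * p  # rectangle under the curve for this step
        prev_recall = r
    return ap


def mean_average_precision(aps):
    """Average of the per-category AP values."""
    return sum(aps) / len(aps)
```

In a full evaluation, one AP value would be computed for each of the 15 subcategories and then averaged with `mean_average_precision`.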


DBA_SSD Model Experimental Comparison Analysis
The first 50 epochs were trained with some of the network layer weights frozen, and each batch contained 8 images. For the last 50 epochs, the frozen layers were unfrozen and the full network was trained. The learning rate started at $5 \times 10^{-4}$ and was $10^{-4}$ after unfreezing, so that the model parameters were fine-tuned. The training process is shown in Figure 10.
The test results for SSD and its improved algorithms are shown in Table 2. DBA_SSD has the highest accuracy because Deep Block strengthens the network's feature extraction ability, while the channel attention mechanism accelerates network learning by focusing the network on the channels with high information content. The prediction accuracy of SSD and its improved algorithms for different species of fruit and vegetable diseases is shown in Figure 11. The prediction accuracy of DBA_SSD is relatively high in most of the categories, and the mAP value of DBA_SSD is 92.20%, while the mAP values of SSD, Se_SSD, and DB_SSD are 9.96%, 90.77%, and 89.93%, respectively.
Figure 11. AP diagram of SSD and its improved algorithms for the detection of different kinds of diseases.
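The two-stage training schedule described at the start of this section (50 epochs with frozen layers, then 50 epochs with the full network) can be sketched in plain Python as follows. The names `EpochConfig` and `schedule` are hypothetical, and the sketch assumes that the learning rate is held constant within each stage and that the batch size of 8 is kept after unfreezing; the text specifies neither point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpochConfig:
    epoch: int             # 0-based epoch index
    layers_frozen: bool    # True while part of the network is frozen
    learning_rate: float
    batch_size: int


def schedule(epoch: int,
             freeze_epochs: int = 50,
             total_epochs: int = 100,
             frozen_lr: float = 5e-4,
             unfrozen_lr: float = 1e-4,
             batch_size: int = 8) -> EpochConfig:
    """Return the training settings for a given epoch.

    Assumption of this sketch: the learning rate is constant within each
    stage and the batch size does not change after unfreezing.
    """
    if not 0 <= epoch < total_epochs:
        raise ValueError(f"epoch must be in [0, {total_epochs})")
    frozen = epoch < freeze_epochs
    lr = frozen_lr if frozen else unfrozen_lr
    return EpochConfig(epoch, frozen, lr, batch_size)
```

In a real training loop, the `layers_frozen` flag would decide whether the gradients of the pretrained layers are updated for that epoch.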
The data distribution of the experimental results is examined further in Figure 12. The horizontal axis shows the algorithm, the vertical axis shows the distribution of the predicted AP values for the 15 types, the triangle marks the mean, and the thin solid line in the middle of each rectangle marks the median. Figure 12 shows that, among the four algorithms SSD, Se_SSD, DB_SSD, and DBA_SSD, the prediction accuracy of DBA_SSD is the most concentrated, and its median and mean are the highest. DBA_SSD therefore performs better than the other improved algorithms.
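The quantities read off a box plot such as Figure 12 (mean, median, and the height of the rectangle, that is, the interquartile range) can be computed from a list of per-category AP values. The sketch below is a hedged illustration using Python's standard `statistics` module; the function name `box_summary` is hypothetical, and the "inclusive" quartile method is an assumption, since plotting libraries may compute quartiles slightly differently.

```python
import statistics
from typing import Dict, Sequence


def box_summary(ap_values: Sequence[float]) -> Dict[str, float]:
    """Summarize per-category AP values the way a box plot does."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(ap_values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(ap_values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,  # height of the rectangle; smaller means more concentrated
    }
```

A smaller `iqr` together with a higher `median` and `mean` corresponds to the "more concentrated" and "highest" observations made for DBA_SSD in the text.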

Comparative Analysis with Classical Target Detection Algorithms
This experiment compares and analyzes the test results of the classical target detection algorithms YOLOv4 [38], YOLOv4 tiny [39], Faster RCNN, and YOLOv3. The experiment is conducted with the same dataset in the same experimental environment, and the loss variation of each algorithm is shown in Figure 13.
The disease degree of each plant leaf in this article is divided into three categories: healthy, normal, and severe (Table 3). Figure 14 averages the detection accuracy of the same leaves on the basis of Table 3: the prediction accuracy for each kind of plant is the mean of the prediction accuracies of its three disease degrees. The horizontal axis of Figure 14 shows the target detection algorithms, and the vertical axis shows the average prediction accuracy for each kind of plant leaf and the overall mean average precision (mAP).
Compared with DBA_SSD, YOLOv4 has lower prediction accuracy for Strawberry and Chili, YOLOv4 tiny has weaker prediction ability for Tomato, and YOLOv3 has lower prediction accuracy for Strawberry. These differences arise because the feature extraction networks of the different algorithms focus on different information in the learned images; DBA_SSD addresses this deficiency by covering all levels of semantic information. The rightmost column indicates the average detection accuracy of the DBA_SSD algorithm in the different categories, with a highest classification accuracy of 100% and a lowest of 82.24%.
Figure 15 shows that YOLOv4 has the largest rectangular box, and its upper quartile edge is close to 100%, indicating that a certain number of its prediction accuracies are higher than 95%. However, its per-category accuracy is more dispersed. YOLOv3 has a smaller box, but the top of its box does not reach as high as that of DBA_SSD, indicating that it has fewer high-accuracy predictions than DBA_SSD. Although the upper quartile line of SSD touches the 100% line, its box is larger, indicating that its prediction accuracy varies widely and is unstable.
The box of DBA_SSD is the smallest of all the algorithms and is closer to the 100% line, indicating that its prediction accuracy is more concentrated, that a large part of its predictions are highly accurate, and that its prediction for each kind is more stable. The experiment shows that the DBA_SSD model has a high accuracy rate for the recognition of fruit and vegetable leaves, and that, as a one-stage target recognition algorithm, SSD has the advantage of fast recognition speed. The comprehensive performance of DBA_SSD is improved compared with the original SSD, and it is also higher than that of the other target detection algorithms. The detection effect is shown in Figure 16.
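The per-plant averaging used for Figure 14, in which the accuracy for a plant is the mean over its three disease degrees, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `average_by_plant` and the `(plant, degree)` key layout are hypothetical, and any values passed in are placeholders rather than results from Table 3.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def average_by_plant(ap_by_class: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Average AP over the disease degrees of each plant.

    Keys are (plant, degree) pairs, for example ("apple", "healthy");
    this layout is an assumption made for the sketch.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for (plant, _degree), ap in ap_by_class.items():
        grouped[plant].append(ap)
    return {plant: sum(values) / len(values) for plant, values in grouped.items()}
```

The overall mAP is then the mean over all (plant, degree) classes, while the per-plant values give the grouped bars described for Figure 14.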

Discussion
In the above experiments, we compare not only the performance of the different improved algorithms, but also the performance of DBA_SSD with that of other classical target detection algorithms. Table 4 shows the FPS, the number of parameters, and the computational complexity of the different algorithms for the same image input. DBA_SSD has fewer parameters than the other classical target detection algorithms except YOLOv4 tiny, but slightly more parameters than SSD, Se_SSD, and DB_SSD. It is worth mentioning that the FPS of DBA_SSD is not reduced by much.
The algorithm is suitable for academic and algorithmic research, but it is still far from practical agricultural application, and its real-time performance still needs to be improved. Another shortcoming is that the algorithm has high accuracy only for the species on which it is currently trained. If the plants to be predicted are not covered in this paper, the model needs to be retrained. On the other hand, the algorithm is more effective if it is applied to the disease identification of a single plant only. In addition, considering that individual differences occur in the same plant growing in different environments, we add pictures showing individual differences of the same plant in the data enhancement process, so that these differences do not affect the final detection results and the proposed algorithm generalizes better.
The algorithm proposed in this paper can detect plant diseases early in their development so that timely control measures can be taken, which helps to reduce production costs. At the commercial scale, it is clear that capital investment in the adopted method is initially required [40]. However, broad-scale commercial applications can provide high returns through significant process improvements and cost reductions. This is the significance of the algorithm presented in this paper.

Conclusions
In this paper, we discuss work related to plant disease detection and increase the size and variety of the dataset by applying spatial transformations and pixel processing to the original dataset. To address the low recognition rate and low accuracy of the SSD model, we propose the DBA_SSD network model for plant leaf detection, which incorporates 1 × 1 convolution, a residual network, and an attention mechanism into the SSD algorithm. In our experiments, we compare several classical target detection algorithms and verify the efficacy of the DBA_SSD algorithm in plant disease detection. The experiments show that the DBA_SSD algorithm improves the accuracy to 92.20% and has high robustness and speed. The significance of this algorithm is that it can detect plant disease at an early stage, so that the disease can be controlled and economic losses reduced in time. The shortcoming of the algorithm is that it is still far from being applied in real production, so future work will focus on optimizing the algorithm and deploying it on embedded devices so that it can be applied to the real-time monitoring of agricultural plant diseases.

Data Availability Statement:
The data used to support this study's findings are available from the corresponding author upon request.

Conflicts of Interest:
The authors declare no conflict of interest.