Rice Disease Identification Method Based on Attention Mechanism and Deep Dense Network

Abstract: Quickly, accurately, and effectively identifying rice diseases is of great practical significance for protecting rice yield. This paper proposes a rice disease identification method based on an improved DenseNet network. The method uses DenseNet as the baseline model and applies the squeeze-and-excitation (SE) channel attention mechanism to strengthen favorable features while suppressing unfavorable ones. Depthwise separable convolutions are then introduced to replace some standard convolutions in the dense network, improving parameter utilization and training speed, and the AdaBound algorithm, combined with adaptive optimization, reduces the time spent on parameter tuning. In experiments on a dataset of five rice diseases, the average classification accuracy of the proposed method is 99.4%, which is 13.8 percentage points higher than the original model. Compared with other existing recognition methods, such as ResNet, VGG, and Vision Transformer, the proposed method achieves higher recognition accuracy, realizes the effective classification of rice disease images, and provides a new method for the development of crop disease identification technology and smart agriculture.


Introduction
As the main food crop in the world, rice is planted over a wide area, and its diseases are difficult to detect in the early stages. In 2020, China's rice planting area resumed growth, reaching 30.076 million hectares, an increase of 382,000 hectares over 2019. With the continuous expansion of the rice planting scale, the problem of rice diseases will become more obvious [1,2]. Rice disease is an important factor affecting rice yield. At present, the traditional manual identification of rice diseases mainly depends on human observation in the field, which requires a great deal of time and manpower, is difficult to carry out at a large scale, and is limited by the observer's own experience, resulting in low accuracy. The timely detection of rice diseases and the accurate and rapid identification of disease types are of great significance for ensuring the safety of rice plants and controlling the spread of diseases [3]. Therefore, the accurate, rapid, and efficient identification of disease types is the key to taking effective measures and achieving accurate spraying, which is of great significance for ensuring China's grain yield and food security. The rapid development of deep learning [4] has led to its application in many fields, for example, image segmentation [5][6][7], medical image processing [8][9][10], face recognition [11][12][13][14], and autonomous driving [15,16].
Meanwhile, more and more researchers apply the combination of image recognition and deep learning to crop disease recognition. Ma et al. [17] used a deep convolutional network to identify the disease categories of cucumbers. First, the disease images were segmented to construct a disease image dataset, and then AlexNet and a DCNN were used for identification, with an accuracy rate of 93.4%. Almasoud et al. [18] proposed a rice disease fusion model based on efficient deep learning. The model mainly uses median filtering and k-means to locate the disease spot features, uses the gray-level co-occurrence matrix and Inception to derive the features, and finally uses an FSVM for classification, reaching a classification accuracy of 96.17%. Chen et al. [19] proposed an optimized back-propagation neural network algorithm to identify and classify three common rice diseases, with recognition accuracies of 98.5%, 96%, and 92.5%, respectively. Chen et al. [20] adopted a transfer learning method, using MobileNetV2 pre-trained on ImageNet as the backbone network and adding an attention mechanism to enhance the learning of disease spot features, with the average recognition accuracy of rice diseases reaching 98.48%. Qiu et al. [21] used a deep convolutional network to establish a rice disease recognition model, trained with the Keras deep learning framework, and set different convolution kernel sizes and pooling functions to study the classification and recognition of three rice diseases, with an accuracy of more than 90%. Krishnamoorthy et al. [22] proposed a transfer-learning-based InceptionResNetV2 model, which integrates features in the form of weights and fine-tunes hyperparameters to identify three rice diseases, achieving a recognition accuracy of 95.67%. Rahman et al. [23] fine-tuned VGG16 and Inception V3 to detect rice diseases, with an accuracy of 93.3%.
These studies show that deep learning combined with image processing can be used for crop disease detection and achieve good results.
Wang Chunshan et al. [24] introduced a multi-scale feature extraction module on the basis of ResNet18, established a multi-scale residual network, changed the connection method of the residual layer, and performed grouped convolution operations, achieving an accuracy of 93.5% on self-collected disease images from real environments. Wu et al. [25] used a Bayesian algorithm to reduce the difficulty of training and added a residual module to the basic neural network, which can effectively identify tomato diseases. Waheed et al. [26] proposed an optimized DenseNet network with reduced parameters to identify maize leaf diseases. Zhou et al. [27] combined the residual network and dense network into a hybrid network architecture for tomato disease identification by adjusting the hyperparameters. Cheng et al. [28] used a deep residual network to identify crop pest categories against complex backgrounds; in complex farmland settings, the classification accuracy on images of ten types of crop diseases and insect pests reached 98.67%.
The above literature shows that the convolutional neural network is suitable for crop disease identification, but the network models used at this stage often pursue higher identification accuracy while neglecting the overfitting problems caused by vanishing gradients. Of course, network models that alleviate the vanishing-gradient problem, such as DenseNet [29] and ResNet [30], have been widely used in the field of crop disease identification. Although the vanishing-gradient problem is alleviated to a certain extent [31], these models also bring problems such as low feature utilization and redundant features. Therefore, this paper takes the DenseNet network as the backbone and introduces the squeeze-and-excitation (SE) module, which has the advantage of adaptive feature weighting. Starting from the feature channels, features are merged to solve the problems of low feature utilization and feature redundancy. In addition, using the AdaBound [32] algorithm and depthwise separable convolution, a method for the identification and classification of rice leaf disease categories is constructed.
In summary, the main contributions of our work are as follows:
1. An improved DenseNet-based rice disease identification method is proposed for identifying multiple rice diseases.
2. Depending on the location at which the attention module is embedded in the DenseNet network, we propose three variants of the SE-DenseNet network.
3. We introduce the AdaBound optimization algorithm to construct the AB-SE-DenseNet network.
The rest of the paper is organized as follows: Section 2 introduces our proposed method and its variants, followed by the dataset used and the data pre-processing methods; Section 3 presents the results and a discussion of the ablation and comparative experiments; and Section 4 concludes the paper and outlines future research directions.

Materials and Methods

DenseNet
The DenseNet network is a dense convolutional neural network proposed in 2017. The main structure of the DenseNet network is the internal dense block (Dense Block). The inner dense module consists of batch normalization (BN), the ReLU activation function, and 3 × 3 and 1 × 1 convolutional layers (Conv), as shown in Figure 1. Each layer is connected not only to the layer before it but also to all subsequent layers; for a dense block containing L layers, there are L × (L + 1)/2 connections, and the output of layer l is

x_l = H_l([x_0, x_1, . . . , x_{l-1}]),     (1)

where H_l represents the convolution operation of layer l, [x_0, x_1, . . . , x_{l-1}] is the concatenation of the outputs of all preceding layers, and x_l represents the output of layer l.

Generally, to avoid complex calculations, a bottleneck module, that is, a 1 × 1 convolutional layer, is added to the dense block to reduce the number of features. Adjacent dense blocks are connected through transition layers to reduce the overall parameter count of the network and improve computational efficiency. Each transition layer is composed of a convolutional layer and a 2 × 2 average-pooling layer. The DenseNet network structure is shown in Figure 2.
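The concatenation-based connectivity described above (each layer concatenating all earlier outputs, with a 1 × 1 bottleneck before the 3 × 3 convolution) can be sketched in PyTorch. This is an illustrative sketch, not the paper's exact configuration; the growth rate and layer count below are assumptions.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """BN -> ReLU -> 1x1 bottleneck -> BN -> ReLU -> 3x3 conv, then concatenate."""
    def __init__(self, in_ch, growth_rate, bottleneck=4):
        super().__init__()
        inter = bottleneck * growth_rate
        self.net = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, inter, 1, bias=False),          # 1x1 bottleneck
            nn.BatchNorm2d(inter), nn.ReLU(inplace=True),
            nn.Conv2d(inter, growth_rate, 3, padding=1, bias=False),
        )

    def forward(self, x):
        # Dense connectivity: concatenate the new features onto all earlier ones
        return torch.cat([x, self.net(x)], dim=1)

class DenseBlock(nn.Module):
    def __init__(self, num_layers, in_ch, growth_rate):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(num_layers):
            layers.append(DenseLayer(ch, growth_rate))
            ch += growth_rate                                # channels grow per layer
        self.block = nn.Sequential(*layers)
        self.out_channels = ch

    def forward(self, x):
        return self.block(x)
```

Each layer adds `growth_rate` channels, so a block with 4 layers and growth rate 12 turns a 16-channel input into a 64-channel output.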


SE Module
Using the DenseNet network for rice disease identification alleviates the vanishing-gradient problem to a certain extent; however, as can be seen from Figure 1, the connection between any two layers is an equal-weight fusion of outputs, with no connection weights. For the input of each layer, the output of the immediately preceding layer on the main path should be treated as the most important part of the input, and the combined output of earlier layers should be given a correspondingly lower proportion. However, features extracted by some earlier layers may still be used directly by deeper layers, and the transition layers output a large number of redundant features, resulting in low utilization of earlier output features by subsequent dense blocks [33].
This paper proposes an unequal-weight dense convolutional neural network with adaptive weights, adding the SE [34] module to the DenseNet network structure. Using the interdependence of the feature channels, the weight of each channel is allocated adaptively, enabling the neural network to learn important feature information and reducing the impact of feature redundancy.
The SE module mainly comprises global average pooling, two activation functions, and fully connected layers, and is divided into squeeze and excitation operations. The squeeze operation uses a pooling layer to compress a feature map of size C × H × W (W: width of the feature map; H: height of the feature map; C: number of feature channels) into a C × 1 × 1 feature, thereby reducing the number of parameters without changing the channel dimension. If the size of the input feature map is C × H × W, with input set U = [u_1, u_2, . . . , u_C], the mapping relationship of the squeeze operation is

z_c = F_sq(u_c) = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j),     (2)

where c ∈ C, z_c represents the global information of the c-th feature map, and F_sq represents the squeeze operation. The excitation operation is composed of fully connected layers and a sigmoid activation function. The fully connected layers integrate all input feature information, and the sigmoid function maps the input to the interval [0, 1]. The mapping relationship of the excitation operation is

s = F_ex(z, W) = σ(W_2 δ(W_1 z)),     (3)

where σ is the sigmoid activation function, δ is the ReLU activation function, F_ex is the excitation operation, and W_1 and W_2 are the weight parameters of the fully connected layers.

Finally, the method performs the scale operation: the per-channel weights learned by the SE module are multiplied with the input channels, fusing the weights with the original features to obtain the fused features. This produces the fused input features of rice pests and diseases as the input of the network, reducing feature redundancy and improving network performance as a result. The structure is shown in Figure 3.
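The squeeze, excitation, and scale operations described above can be sketched as a compact PyTorch module (a minimal sketch; the reduction ratio of 16 is the common default, not necessarily the paper's setting):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention: reweight channels by global context."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),  # W1 + ReLU
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),  # W2 + sigmoid
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        z = x.mean(dim=(2, 3))            # squeeze: global average pool -> (B, C)
        s = self.fc(z).view(b, c, 1, 1)   # excitation: per-channel weights in (0, 1)
        return x * s                      # scale: fuse weights with original features
```

Because the sigmoid keeps each weight below 1, the scale step can only attenuate channels, suppressing less useful features while preserving the favorable ones.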

AB-SE-DenseNet
We use the DenseNet framework of Section 2.1 and the SE module of Section 2.2 to establish the SE-DenseNet network, whose structural parameters are shown in Table 1. The network uses the SE module to perform adaptive learning of the features, using global information to selectively enhance favorable features and suppress unfavorable ones, adjusting the adaptation of the feature channels, reducing the impact of the feature redundancy caused by DenseNet, and improving network performance. Depthwise separable convolution replaces part of the standard convolution in the dense network: the depthwise convolution convolves each input channel with its corresponding kernel, and a pointwise convolution then integrates all feature maps, reducing the number of network parameters. The SE module can be embedded at different locations in DenseNet, and this paper accordingly proposes three different network models, SE-DenseNet-1, SE-DenseNet-2, and SE-DenseNet-3, as shown in Figure 4. SE-DenseNet-1 embeds the SE module between adjacent transition layers and dense blocks of the DenseNet model, SE-DenseNet-2 embeds the SE module within the dense blocks, and SE-DenseNet-3 embeds the SE module in both the transition layers and the dense blocks simultaneously.
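The depthwise-then-pointwise factorization can be sketched in PyTorch, together with a parameter count comparing it against a standard 3 × 3 convolution (the channel sizes below are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A 3x3 depthwise conv (one kernel per channel) followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2,
                                   groups=in_ch, bias=False)   # per-channel convolution
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # integrate channels

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def n_params(m):
    return sum(p.numel() for p in m.parameters())
```

For 64 input and 128 output channels, the separable version needs 64 × 9 + 64 × 128 = 8768 weights versus 64 × 128 × 9 = 73,728 for a standard 3 × 3 convolution, roughly an 8× reduction.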
To accelerate the training process with better generalization ability and to improve the learning rate schedule and training results of the model, we use the AdaBound algorithm to dynamically bound the learning rate and create the AB-SE-DenseNet network model, whose structure is shown in Figure 5. AdaBound is used to speed up model fitting. The AdaBound optimizer combines the advantages of the Stochastic Gradient Descent (SGD) and Adam optimizers by imposing a dynamic bound on the learning rate: the bound varies with the descent gradient and becomes tighter and tighter over time, limiting the learning rate to a small final value, so the model becomes more and more stable during training. The transition from the adaptive optimizer to the SGD optimizer is thus realized during training, which not only improves the convergence speed of the model but also avoids the problem of the model easily falling into a local minimum.
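The dynamic bound can be illustrated with a small helper that clips an adaptive per-step learning rate into bounds that converge toward a final SGD-like rate. The schedule below follows the form proposed in the AdaBound paper; `final_lr` and `gamma` are illustrative hyperparameters.

```python
def adabound_clipped_lr(adam_lr, step, final_lr=0.1, gamma=1e-3):
    """Clip an adaptive per-step learning rate into AdaBound's dynamic bounds.

    The bounds start wide (Adam-like) and tighten toward final_lr (SGD-like):
        lower(t) = final_lr * (1 - 1 / (gamma * t + 1))
        upper(t) = final_lr * (1 + 1 / (gamma * t))
    """
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return min(max(adam_lr, lower), upper)
```

Early in training the bounds are so wide that Adam's step is used unchanged; as the step count grows, both bounds squeeze toward `final_lr`, so the effective behavior smoothly transitions to constant-rate SGD.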

Experimental Materials and Evaluation Indicators

Experimental Materials and Experimental Environment
To evaluate our method's performance, we used the experimental environment configuration shown in Table 2. The data used in the experiments come from public datasets and contain images of five diseases: rice blast, blight, brownspot, sheath blight, and tungro, as shown in Figure 6. The images were distributed as follows: 298 rice blast, 283 blight, 292 brownspot, 236 sheath blight, and 244 tungro. We expanded the dataset by changing image brightness to simulate sunny and cloudy days in the natural environment and by applying horizontal flips, viewing-angle changes, and color changes to simulate occlusions, obtaining a total of 4235 images. We divided the dataset in the ratio of 6:2:2 and preprocessed the experimental images to a size of 224 × 224 pixels. Figure 7 shows an example of the data augmentation. We used cross-entropy loss as the loss function, with an initial learning rate of 0.001, 50 iterations, and a batch size of 32.
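The 6:2:2 train/validation/test split described above can be sketched as a small helper (the fixed seed and the use of plain path lists are assumptions; the augmentation step itself is omitted here):

```python
import random

def split_dataset(items, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle a list of image paths and split it into train/val/test partitions."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    items = list(items)
    rng.shuffle(items)
    n = len(items)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]
```

Applied to the 4235 augmented images, this yields 2541 training, 847 validation, and 847 test images, with every image assigned to exactly one partition.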


Evaluation Criteria
A confusion matrix is an evaluation metric used to evaluate model performance in deep learning [35] that compares the accuracy rate (Acc), precision rate (Pre), recall rate (Rec), and F1 value (F1) of the model. The accuracy rate is the ratio of correctly classified samples to all samples. The precision rate is the ratio of the number of actually correct samples to the total number of samples classified as correct. The recall rate is the ratio of the number of samples classified as correct to the total number of samples that are actually correct. The F1 value is the weighted harmonic mean of precision and recall. The metrics are calculated as in formulas (4)-(7):

Acc = (TP + TN) / (TP + TN + FP + FN),     (4)
Pre = TP / (TP + FP),     (5)
Rec = TP / (TP + FN),     (6)
F1 = 2 × Pre × Rec / (Pre + Rec),     (7)

where TP is the number of positive samples predicted correctly; FP is the number of negative samples incorrectly predicted as positive; TN is the number of negative samples predicted correctly; and FN is the number of positive samples incorrectly predicted as negative.
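Formulas (4)-(7) can be computed per class directly from a confusion matrix in a one-vs-rest fashion; a minimal sketch:

```python
def per_class_metrics(cm, cls):
    """Compute Acc, Pre, Rec, F1 for one class from a confusion matrix.

    cm[i][j] = number of samples with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[cls][cls]                                # correct predictions of cls
    fp = sum(cm[i][cls] for i in range(n)) - tp      # other classes predicted as cls
    fn = sum(cm[cls]) - tp                           # cls predicted as something else
    tn = total - tp - fp - fn
    acc = (tp + tn) / total
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return acc, pre, rec, f1
```

For example, with the toy matrix [[8, 2], [1, 9]], class 0 has TP = 8, FP = 1, FN = 2, TN = 9, giving Acc = 0.85, Pre = 8/9, Rec = 0.8, and F1 = 16/19.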

Ablation Experiment
We also performed ablation tests, plotting the training accuracy and loss of the AB-SE-DenseNet and SE-DenseNet models and the original DenseNet model against the number of iterations, as shown in Figure 8. The comparison curves show that the training accuracy of the three models increased while the prediction error decreased. DenseNet converged the slowest, requiring about 45 epochs. SE-DenseNet tended to converge in 30 epochs, with a final training accuracy of 92.62%. The improved AB-SE-DenseNet model tended to converge in 20 epochs, with a final training accuracy reaching 99.93%.

Table 3 shows that the accuracy of the three SE-DenseNet models was higher than that of the DenseNet model, with an accuracy increase of between 6.54% and 10%. The SE-DenseNet models' other evaluation indicators were also better than those of the DenseNet model, indicating that the combination of the SE module and DenseNet was helpful to the model.
As far as the three models are concerned, SE-DenseNet-3 had the highest accuracy, reaching 95.59%, better than both SE-DenseNet-1 and SE-DenseNet-2. This shows that embedding the SE modules in both the transition layers and the dense blocks successfully recalibrated the feature channels, boosting the useful features and suppressing the less useful ones. The classification performance of the three AB-SE-DenseNet models was also better than that of the corresponding SE-DenseNet models, confirming that AdaBound dynamically bounded the learning rate successfully, improved the training results of the model, and improved the classification accuracy as a result. AB-SE-DenseNet-3 had the highest evaluation indices, with an accuracy rate of 99.4% and an F1 value of 0.9942. To understand model performance further, we compared the AB-SE-DenseNet and SE-DenseNet models through their confusion matrices. The classification and recognition confusion matrix of each model is shown in Figure 9, which shows that AB-SE-DenseNet-3 had the fewest false identifications and the highest accuracy in identifying the five rice diseases.


Comparison of Different Models
To further verify the performance of our proposed model, we selected the AB-SE-DenseNet-3, ResNet50, DenseNet, VGG, and Vision Transformer models for comparison experiments. The confusion matrix comparison of the different methods is shown in Figure 10. The confusion matrices show that the model proposed here made only three misidentifications, mostly misidentifying rice blast as blight or brownspot. The rice blast lesions are similar to the blight and brownspot lesions, making them difficult to distinguish.
Compared with the 14, 10, 17, and 10 misidentifications by the ResNet, DenseNet, VGG, and Vision Transformer models, respectively, the recognition error rate of our method was low, and our model correctly classified more of the diseases than the other models. Tables 4 and 5 compare the per-class classification accuracy of the different models. The classification accuracy of our proposed method was not the best at classifying blast: the Vision Transformer had a blast classification accuracy of 98.8%, which was 1.7 percentage points higher than that of our method. This occurred because the blast lesions are similar to those of other disease types, and our method was less suited to learning the features extracted from blast images. On the remaining four diseases, our method had the best classification and recognition rates, at 100%, 99.4%, 100%, and 100%. Overall, the average recognition rate of our method was more accurate and stable.

To compare the prediction times of the models, we used the trained model parameters to time the test of a single image. The results are shown in Table 6. The recognition time of our method was 3.71 s, faster than that of ResNet, VGG, and Vision Transformer and slightly slower than that of DenseNet. However, the recognition accuracy of our method was higher, which shows that our method is more suitable for rice disease recognition.

To better demonstrate the effectiveness of the attention mechanism we used, we compared it with the ECA and CBAM attention modules on the same dataset and in the same experimental setting; the results are shown in Table 7. From Table 7, we can see that our proposed model outperforms the DenseNet model with the ECA attention module in terms of both accuracy and recognition time.
Compared with the DenseNet model with the CBAM attention module, our accuracy is slightly lower, by 0.4%, but that model's recognition time is more than 20 s slower than our proposed method's, so our method has a better recognition speed. In addition, we compared our proposed model with those reported in the literature on rice diseases, as shown in Table 8. The accuracy rate of our model for rice disease identification reached 99.4%, higher than those reported in the literature, which further shows that the AB-SE-DenseNet model has better identification accuracy. Table 8. Comparison with models proposed in the related literature.

Authors                Models                         Accuracy
Liu [36]               Attention-enhanced DenseNet    96%
Yang [37]              Improved ResNet                97.4%
Wang [38]              ADSNN-BO                       94.65%
Krishnamoorthy [22]    InceptionResNetV2              95.67%
Ours                   AB-SE-DenseNet                 99.4%

Figure 11 shows a visualization of the class activation heatmaps of sample input images to more intuitively illustrate the classification prediction effect of our proposed model. Compared with the other models, our proposed model learned more comprehensive rice disease features and a greater variety of feature information, with better recognition and classification results.
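Class activation heatmaps like those in Figure 11 are commonly produced with Grad-CAM; the paper does not specify its visualization method, so the following is an assumed, minimal sketch of that technique over a chosen convolutional layer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradCAM:
    """Minimal Grad-CAM: weight a conv layer's activations by pooled gradients."""
    def __init__(self, model, target_layer):
        self.model = model
        self.acts = None
        self.grads = None
        target_layer.register_forward_hook(self._save_acts)
        target_layer.register_full_backward_hook(self._save_grads)

    def _save_acts(self, module, inputs, output):
        self.acts = output.detach()

    def _save_grads(self, module, grad_input, grad_output):
        self.grads = grad_output[0].detach()

    def __call__(self, x, class_idx=None):
        self.model.zero_grad()
        logits = self.model(x)
        if class_idx is None:
            class_idx = logits.argmax(dim=1)       # explain the predicted class
        logits[torch.arange(x.size(0)), class_idx].sum().backward()
        weights = self.grads.mean(dim=(2, 3), keepdim=True)   # channel importance
        cam = F.relu((weights * self.acts).sum(dim=1))        # (B, H, W) heatmap
        cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)  # normalize to [0, 1]
        return cam
```

The heatmap is normalized per image; upsampling it to the input resolution and overlaying it on the original image is left out of this sketch.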

Conclusions
In this study, we introduced a DenseNet-based channel attention mechanism for rice disease recognition and classification. The average recognition accuracy of the improved model for rice diseases is 99.4%. Compared with the DenseNet, ResNet, VGG, and Vision Transformer models, our proposed method has the highest average accuracy and a fast recognition speed of 3.71 s. In addition, the recognition accuracy for each disease reaches more than 97%, and for some diseases even 100%, indicating that our proposed method is effective for rice disease identification. In future work, we will focus on disease severity, using a deep learning approach to evaluate the severity of rice diseases and deploying it on mobile devices so that diseases and their severity can be identified more quickly.