RiceDRA-Net: Precise Identification of Rice Leaf Diseases with Complex Backgrounds Using a Res-Attention Mechanism

Abstract: In this study, computer vision applicable to traditional agriculture was used to achieve accurate identification of rice leaf diseases with complex backgrounds. The researchers developed the RiceDRA-Net deep residual network model and used it to identify four different rice leaf diseases. The rice leaf disease test set with a complex background was named the CBG-Dataset, and a new single-background rice leaf disease test set, the SBG-Dataset, was constructed based on the original dataset. The Res-Attention module used 3 × 3 convolutional kernels and denser connections compared with other attention mechanisms to reduce information loss. The experimental results showed that RiceDRA-Net achieved a recognition accuracy of 99.71% on the SBG-Dataset test set and of 97.86% on the CBG-Dataset test set. In comparison with the other classical models used in the experiments, the test accuracy of RiceDRA-Net on the CBG-Dataset decreased by only 1.85% compared with that on the SBG-Dataset. This fully illustrates that RiceDRA-Net is able to accurately recognize rice leaf diseases with complex backgrounds. RiceDRA-Net was very effective for some categories, even reaching 100% precision, indicating that the proposed model is accurate and efficient in identifying rice field diseases. The evaluation results also showed that RiceDRA-Net had good recall, F1 scores, and confusion matrices in both cases, demonstrating its strong robustness and stability.


Introduction
Rice is one of the major cereal crops cultivated in China and plays a significant role in its agricultural industry [1]. However, rice diseases can have a significant impact on grain production. Traditional methods of identifying these diseases require trained personnel to observe them, which is time-consuming and laborious. Often, by the time the disease is detected, it has progressed to a severe level, resulting in a loss of yield, time, and money [2].
To address this issue, we proposed a deep learning model with a deep residual architecture that can identify rice leaf diseases in the complex context of rice fields. Our proposed model, the Rice Dense Residual Attention Net (RiceDRA-Net), has a denser residual structure and integrates a new attention mechanism, the Residual Attention Mechanism (Res-Attention), to improve disease recognition. With this model, we aimed to achieve a high rate of rice disease recognition, with the Res-Attention mechanism accurately identifying the disease locations in rice leaves.
Studies have shown that traditional machine learning methods can achieve high accuracy in plant disease identification tasks. For example, Jiang F et al. [3] conducted a study in which the average correct recognition rate of a deep-learning- and SVM-based rice disease identification model was 96.8%, representing a higher accuracy than that of a traditional back-propagation neural network model. In another study, Govardhan M et al. [4] used random forest to diagnose tomato plant diseases and developed a disease identification system with an overall accuracy of 95%. Ramesh S et al. [5] trained diseased and healthy leaf datasets collectively under random forest to classify diseased and healthy images. Ahmed K et al. [6] used machine learning techniques to detect rice leaf diseases and achieved an accuracy of more than 97% on the test dataset. Sethy P K et al. [7] used a support vector machine approach for rice disease identification and obtained good results. Although these machine-learning-based models showed good results, they still face some challenges in terms of their practical applications.
Several previous studies have investigated the use of deep learning for identifying plant diseases. For instance, Mohanty S P et al. [8] proposed using deep learning for image-based plant disease detection. Edna Chebet Too et al. [9] conducted a comparative study on fine-tuning deep learning models for plant disease recognition, adapting and comparing different deep learning models for plant diseases. Sk Mahmudul Hassan et al. [10] proposed a novel deep learning model based on inception layers and residual connections, in which depth-separable convolution could be used to reduce the number of parameters. Ahila Priyadharshini R et al. [11] proposed a deep convolutional neural network (CNN) for identifying maize leaf diseases, which achieved an accuracy of 97.89%. Similarly, Hassan S M et al. [12] used CNN and transfer learning methods to identify plant leaf diseases and achieved better recognition results in their experiments. In Shin J et al.'s study [13], a deep learning method was used for strawberry leaf powdery mildew detection using RGB-based images, resulting in 98% classification accuracy. Zhong Y et al. [14] conducted a study on deep learning in apple leaf disease recognition and achieved a recognition rate of 96% in a database of collected apple leaf disease images. Junde Chen et al. [15] introduced transfer learning in plant disease recognition, which uses networks pre-trained on large labeled datasets, such as ImageNet, to initialize weights rather than randomly initializing weights and training from scratch. Hareem Kibriya et al. [16] explored the identification and classification of plant diseases in leaf images using deep learning (DL) and machine learning (ML) algorithms. Atila Ü et al. [17] studied plant leaf disease classification using the EfficientNet deep learning model. Mishra A M et al. [18] achieved good results using a deep convolutional neural network to estimate weed density in a soybean crop in smart agriculture. Kaur P et al. [19] used hybrid convolutional neural networks to recognize plant leaf diseases by applying feature reduction. However, most of these studies focused solely on identifying the presence of diseases without accurately locating the diseases in plants. Thus, the proposed Res-Attention mechanism can better solve this problem and improve the accuracy of disease identification. Additionally, some studies have investigated the use of attention mechanisms for object recognition in deep learning. For example, Hu et al. [20] proposed the squeeze-and-excitation (SE) block, which uses attention mechanisms to enhance feature maps. Similarly, Bhujel A et al. [21] proposed a lightweight attention-based convolutional neural network for tomato leaf disease classification to improve the accuracy of image classification. In Zhao Y et al.'s study [22], the attention mechanism was embedded into a residual network for plant disease severity detection. Inspired by these studies, we integrated the Res-Attention mechanism into RiceDRA-Net, thus improving the model's ability to identify rice leaf diseases.
Several studies have investigated the use of deep learning for identifying rice diseases. For example, Ghosal S et al. [23] proposed using CNN and transfer learning to classify rice leaf diseases. They created a small dataset of rice diseases and used transfer learning to develop a deep learning model that achieved an accuracy of 92.46%. Similarly, Swathika R et al. [24] used a convolutional neural network to identify rice diseases by training on 3500 images of healthy and diseased rice leaves, achieving an accuracy of nearly 70%. Rahman C R et al. [25] proposed a two-stage small CNN architecture for rice pest identification and detection. Zhou G et al. [26] proposed a fast disease detection method for rice based on the fusion of FCM-KM and Faster R-CNN. Su N T et al. [27] used deep learning techniques targeting mobile devices for rice leaf disease classification and achieved an accuracy of 81.87% on the training data and 81.25% on the validation data. Archana K S et al. [28] proposed a new method to improve the computational and classification performance of rice disease identification. Patil R R et al. [29] proposed a multimodal data fusion framework for rice disease diagnosis, Rice-Fusion. However, these studies all used a relatively simple CNN architecture and did not explore the use of attention mechanisms. Therefore, using RiceDRA-Net with the Res-Attention mechanism can be highly effective in improving the recognition of rice leaf diseases.
None of the models for recognizing rice leaf diseases discussed above have employed attention mechanisms to enhance accuracy. Furthermore, most have focused solely on identifying the presence of disease without precisely locating it in the plant. Moreover, the models proposed previously are typically trained in a single context and do not guarantee reliable recognition in more complex contexts. To address these shortcomings, we have developed a novel approach that offers three key contributions:

1. The novel RiceDRA-Net, with a denser residual structure and a Res-Attention mechanism, was proposed, which effectively improves the accuracy, robustness, and disease localization capabilities of the model for identifying rice leaf diseases.

2. A new single-background rice leaf disease dataset, the SBG-Dataset, was constructed.

3. RiceDRA-Net has better recognition capabilities for rice leaf disease identification with a complex background compared with other classical models.
The rest of the paper is organized as follows: Section 2 presents the details of our dataset and network architecture. In Section 3, we provide all the experimental results and discuss their analysis. Finally, we conclude the paper in Section 4.

Dataset
The dataset used in this study was sourced from a publicly available rice disease dataset obtained via the internet. This dataset contained four different rice diseases, which were selected because of their complex backgrounds: bacterial blight, rice blast, brown spot, and rice tungro virus disease. The dataset contained a total of 5932 rice disease images, including 4153 training samples and 1779 test samples. The distribution of the number of disease images is shown in Table 1. We integrated the 1779 test images of rice leaf diseases with complex backgrounds into a test dataset named the Complex Background test Dataset (CBG-Dataset). The images in the CBG-Dataset are shown in Figure 1. In this study, we used image processing techniques to investigate the relationship between the recognition of rice leaf diseases by different models and the complexity of their backgrounds. We removed the complex background from the CBG-Dataset to obtain a rice leaf disease test dataset with a single background, which we named the Single Background test Dataset (SBG-Dataset). The pictures of rice leaf diseases in the SBG-Dataset are shown in Figure 2. The number of disease pictures in the CBG-Dataset and SBG-Dataset is therefore the same; the two sets differ only in having a complex background and a single background, respectively.

Image Processing
Since the images in the CBG-Dataset were from a public dataset, they were of varying sizes, including 256 × 256 and 300 × 300 pixels. To ensure consistency in the image sizes input into the network, we resized all of the images to 224 × 224 pixels. Additionally, we applied data augmentation techniques, such as random flipping and rotation at random angles, to some samples programmatically [30] to increase the amount of data in both datasets equally.


Deep Residual Network
The emergence of deep residual networks has greatly improved the characterization and learning abilities of deep learning models, and they have become a hot research direction in the field of image classification [31]. The deep residual network is able to use its residual structure to mitigate the information loss generated during the convolution process while performing the feature extraction of an ordinary deep convolutional neural network. This greatly improves the accuracy of rice disease recognition.
DenseNet, a deep residual model proposed at CVPR in 2017 by Huang G et al. [32], uses dense connectivity, in which all layers can access the feature maps from their preceding layers, thus encouraging feature reuse. As a direct result, the model is more compact and less prone to overfitting. In addition, each individual layer receives direct supervision from the loss function via a shortcut path, which provides implicit deep supervision [33].


Dense Connection
In order to achieve the transmission of maximum information flow during image recognition, DenseNet uses a different connection pattern than that used previously: dense connectivity. This means that each layer in DenseNet is connected to all of the previous layers. The purpose of this is to ensure that maximum information flow is achieved during the network transfer process, where each layer of the network receives additional input from all preceding layers and passes its own feature maps to all later layers:

x_l = H_l([x_0, x_1, … , x_{l−1}])

where x_l represents the feature input of layer l; x_0 through x_{l−1} represent all layers before layer l, with layer l receiving input from all previous layers; [x_0, x_1, … , x_{l−1}] represents the merging of the feature maps output by all layers before layer l; and H_l represents the composite function that implements the join operation of multiple features, which includes a batch normalization (BN) layer, a ReLU activation layer, and a (3 × 3) convolutional layer. The DenseNet structure is shown in Figure 3. The network structure of DenseNet-121 is shown in Table 2.
'a' represents the number of 3D tensors.
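The dense connection pattern and the composite function H_l can be sketched in PyTorch as follows; the growth rate and layer count below are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """The composite function H_l: batch normalization -> ReLU -> (3 x 3) convolution."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        # x is the concatenation [x_0, x_1, ..., x_{l-1}]
        return self.conv(self.relu(self.bn(x)))

class DenseBlock(nn.Module):
    """Each layer receives the feature maps of all preceding layers."""
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # x_l = H_l([x_0, ..., x_{l-1}]): concatenate, then apply H_l.
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)
```

Because every layer's output is concatenated to the running feature list, a block with 4 layers and growth rate 32 turns a 64-channel input into a 64 + 4 × 32 = 192-channel output.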


Transition
The transition module is introduced in DenseNet to perform down-sampling operations. It mainly performs two operations, convolution and pooling: a (1 × 1) convolutional layer compresses the model by reducing the number of channels and the number of parameters passed to the next dense module, followed by an average pooling layer of (2 × 2).

Attentional Mechanisms
When we process an image, we want the convolutional neural network to pay attention to the areas of the image that can positively affect the predicted outcome, rather than paying attention to everything. The attention mechanism allows the convolutional neural network to adaptively adjust what to pay attention to. Attention mechanisms have been used in computer vision and natural language processing, and they have been widely used in sequential models with recurrent neural networks and long short-term memory (LSTM) [34]. Attention mechanisms are generally divided into spatial attention mechanisms and channel attention mechanisms. This paper used the CBAM module, which combines both. The CBAM module is a simple and effective attention module for feedforward convolutional neural networks [35], and the structure of the model is shown in Figure 4. Firstly, an intermediate feature map is provided; then, attention weights are inferred sequentially along two different dimensions by the channel attention module and the spatial attention module. The original feature map is then multiplied by the inferred attention weights to obtain the adaptively adjusted feature map. The two sub-modules of the CBAM module, the spatial attention module and the channel attention module, are shown in Figure 5.


The Channel Attention Mechanism Module
The channel attention mechanism first takes the input feature map F ∈ R^(C×H×W) and passes it through global maximum pooling and global average pooling to obtain two (C × 1 × 1) feature maps, F_max^c ∈ R^(C×1×1) and F_avg^c ∈ R^(C×1×1). Then, the resulting feature maps pass through a shared network consisting of a two-layer multilayer perceptron (MLP), where the number of neurons in the first hidden activation layer is C/r (r is the shrinkage rate) and the number of neurons in the second layer is C. Afterwards, the output features of the shared network are summed element-wise and passed through a sigmoid function, and the channel attention mechanism outputs the final feature vector M_C(F). The formula of the channel attention mechanism is shown in Equation (1):

M_C(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))  (1)

where σ denotes the sigmoid function; the MLP weights W_0 and W_1 are shared for both inputs; and W_0 is followed by the ReLU activation function.
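A minimal PyTorch sketch of this channel attention computation, with the shared MLP implemented as two (1 × 1) convolutions acting on the pooled (C × 1 × 1) descriptors:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Channel attention M_C(F): global average and max pooling, a shared
    two-layer MLP (C -> C/r -> C), element-wise sum, then a sigmoid."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),  # W_0
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),  # W_1
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))  # MLP(AvgPool(F))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))   # MLP(MaxPool(F))
        return torch.sigmoid(avg + mx)               # M_C(F), shape (C x 1 x 1)
```

The sigmoid guarantees that every channel weight lies strictly between 0 and 1, so multiplying the input by M_C(F) only rescales channels, never changes their sign.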

Spatial Attention Mechanism Module
The spatial attention mechanism takes the feature vectors obtained from the preceding channel attention module as its input. The input feature map is first subjected to a channel-wise maximum pooling operation and an average pooling operation to obtain two feature maps, F_max^s ∈ R^(1×H×W) and F_avg^s ∈ R^(1×H×W), respectively. Then, the maximum pooled features and average pooled features are concatenated along the channel dimension. Afterwards, the feature maps are reduced to one channel by a (7 × 7) convolution operation. Finally, after a sigmoid function, the spatial attention map M_S(F) ∈ R^(1×H×W) is obtained. The equation of the spatial attention mechanism is shown in Equation (2):

M_S(F) = σ(f^(7×7)([AvgPool(F); MaxPool(F)])) = σ(f^(7×7)([F_avg^s; F_max^s]))  (2)

where σ denotes the sigmoid function and f^(7×7) denotes the convolution operation [36] with a convolution kernel of size (7 × 7).
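The spatial attention computation can be sketched in the same style; the kernel size is kept as a parameter, since the paper's later experiments compare (7 × 7) against (3 × 3):

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention M_S(F): channel-wise average and max pooling,
    channel concatenation, one convolution down to a single channel,
    then a sigmoid."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)     # F_avg^s, shape (1 x H x W)
        mx = x.max(dim=1, keepdim=True)[0]    # F_max^s, shape (1 x H x W)
        cat = torch.cat([avg, mx], dim=1)     # concatenate along channels
        return torch.sigmoid(self.conv(cat))  # M_S(F), shape (1 x H x W)
```

The `padding=kernel_size // 2` choice keeps the attention map at the same H × W resolution as the input for any odd kernel size.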

Rice Leaf Disease Identification Model

Res-Attention
In this study, we developed a new attention mechanism called the Res-Attention module based on the CBAM module. The Res-Attention module aims to reduce information loss during transmission by adding a residual structure. In the Res-Attention module, a portion of the information is retained when the feature map enters the channel attention model. Since the spatial attention model is connected after the channel attention model, the previously retained portion of information is fused with the information output from the spatial attention module. Finally, the fused feature information is output from the Res-Attention module. The structure of the Res-Attention module is shown in Figure 6.
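The description above can be read as a CBAM-style block wrapped in a skip connection. The self-contained sketch below is one plausible realization under that reading: the input is retained before the channel attention model and fused with the spatial attention output by element-wise addition, which is our assumption, as the paper does not state the fusion operation explicitly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResAttention(nn.Module):
    """Res-Attention sketch: channel attention, then spatial attention,
    with a residual path that retains the input and fuses it with the
    spatial attention output. The additive fusion is an assumption."""
    def __init__(self, channels, reduction=16, kernel_size=3):
        super().__init__()
        # Shared MLP of the channel attention (C -> C/r -> C).
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention convolution; (3 x 3) per the kernel-size experiments.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size,
                                      padding=kernel_size // 2, bias=False)

    def forward(self, x):
        identity = x  # information retained before the attention models
        # Channel attention: shared MLP on average- and max-pooled descriptors.
        m_c = torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(x, 1)) +
                            self.mlp(F.adaptive_max_pool2d(x, 1)))
        out = x * m_c
        # Spatial attention on the channel-refined features.
        pooled = torch.cat([out.mean(1, keepdim=True),
                            out.max(1, keepdim=True)[0]], dim=1)
        out = out * torch.sigmoid(self.spatial_conv(pooled))
        # Residual fusion of the retained information.
        return out + identity
```

Because the skip path carries the unattenuated input past both sub-modules, the attention weights can suppress background regions without discarding the original feature information entirely.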

Experimental Platform
This experiment used Ubuntu 20.04.4 LTS 64-bit as the operating system (Canonical Ltd., London, UK), with an Intel(R) Xeon(R) Silver 4214 CPU @ 2.20 GHz and 32 GB of RAM (Intel, Santa Clara, CA, USA). The GPU was an NVIDIA Tesla T4 with 16 GB of video memory (Nvidia, Santa Clara, CA, USA). The programming language was Python, and the PyTorch deep learning framework was used.


Experimental Design
In this study, we conducted several comparative experiments. Firstly, we chose the SBG-Dataset as the test dataset and conducted comparative experiments on the selection of hyperparameters, including the optimizer, learning rate, and convolutional kernel size in the network, to identify the most suitable hyperparameters for the improved model. Finally, we evaluated six different models on the two test datasets, the SBG-Dataset and CBG-Dataset. We compared RiceDRA-Net with other classical models for rice leaf disease recognition and evaluated the recognition performance of the different models on rice leaf diseases with complex and single backgrounds.

Evaluation Indicators
In this study, we used precision, recall, accuracy, and the F1 score as evaluation metrics, and we used the values of these metrics to evaluate the model in a comprehensive manner. Precision, recall, accuracy, and the F1 score were calculated as follows:

Precision = TP / (TP + FP)  (3)

Recall = TP / (TP + FN)  (4)

Accuracy = (TP + TN) / (TP + FN + FP + TN)  (5)

F1 = 2 × Precision × Recall / (Precision + Recall)  (6)

where TP is the number of true positive samples, TN is the number of true negative samples, FP is the number of false positive samples, and FN is the number of false negative samples. Accuracy is the ratio of the number of correctly predicted samples to the total number of samples used for the model experiments. Precision is the ratio of the number of correctly predicted positive samples to the total number of samples predicted as positive, and recall is the ratio of the number of correctly predicted positive samples to the total number of positive samples. The F1 score is a comprehensive evaluation index: the harmonic mean of precision and recall. We chose the cross-entropy loss function as the loss function for our experiments. The cross-entropy loss function is commonly used for classification problems. It measures the difference between the model output and the true label, making it a preferred loss function for training classification models. The cross-entropy loss function is widely used in deep learning because of its good performance, its ability to directly optimize classification probability, and its ability to handle category imbalance. The cross-entropy loss function is shown in Equation (7):

Loss = − Σ_{c=1}^{M} y_{i,c} log(P_{i,c})  (7)

where M, P_{i,c}, and y_{i,c} represent the number of classes, the predicted probability, and the ground truth label, respectively, for a specific image i and class c.
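The four metrics can be computed directly from the confusion-matrix counts; a small helper function illustrates the arithmetic:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, accuracy, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, accuracy, f1
```

For example, with TP = 90, TN = 80, FP = 10, and FN = 20, precision is 0.9, recall is 90/110 ≈ 0.818, accuracy is 0.85, and the F1 score is 6/7 ≈ 0.857.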

Selection of Hyperparameters
The choice of optimizer and learning rate can improve the training of neural networks, making them faster and better at achieving the desired results. In this experiment, we combined four different optimizers (SGD, Adam, Adagrad, and AdaDelta) with five different learning rates (lr): 0.1, 0.05, 0.01, 0.005, and 0.001. We conducted a comparison experiment on the SBG-Dataset test data, with the number of epochs set to 30, to find the most suitable optimizer and learning rate. The comparison results for the different optimizers and learning rates are shown in Tables 3 and 4. According to Tables 3 and 4, the highest accuracy was obtained by using the Adagrad optimizer with a learning rate of 0.01, which resulted in an accuracy of 99.71% and the lowest total loss value on the test set. Therefore, we can conclude that the optimal combination of optimizer and learning rate for this study's model was Adagrad with a learning rate of 0.01.
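The winning training configuration can be sketched as follows. The `nn.Linear` classifier is a hypothetical stand-in: in the paper's setting, RiceDRA-Net and batches of 224 × 224 rice leaf images would take its place.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical stand-in model; the paper trains RiceDRA-Net here.
model = nn.Linear(10, 4)

# The selected combination: Adagrad with lr = 0.01, cross-entropy
# loss, and 30 training epochs.
optimizer = optim.Adagrad(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Synthetic batch standing in for image features and 4 disease classes.
inputs = torch.randn(32, 10)
labels = torch.randint(0, 4, (32,))

losses = []
for epoch in range(30):
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Adagrad adapts the step size per parameter from the accumulated squared gradients, which is one reason a fixed lr of 0.01 can remain stable across all 30 epochs.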
The spatial attention mechanism in the Res-Attention module is impacted by the choice of its convolutional kernel size. For this experiment, we compared the impact of using convolutional kernels of (3 × 3) or (7 × 7) on the recognition accuracy of the model. We used the Adagrad optimizer with a learning rate of 0.01, ran the experiment for 30 epochs, and tested the model on the SBG-Dataset.
Based on Table 5, we observed that modifying the size of the convolutional kernel of the spatial attention mechanism in the Res-Attention module to (3 × 3) was more suitable for identifying rice disease infestations in this network.

Comparison with Different Classical Models
Currently, five models are often used in image classification research: AlexNet [37], Vgg [38], ResNet [39], MobileNet [40], and DenseNet-121; DenseNet-121 is the benchmark model of RiceDRA-Net. Consequently, to ensure the objectivity of the experiments, we compared the model of this study with these classical models. Among them, the Vgg and ResNet families offer variants of different network sizes. Considering the size of the dataset and the experimental hardware, it was not appropriate to use too large a network in this experiment, so we used Vgg-19 and ResNet-101 as the comparison models. We compared the model in this study with the five classical models for rice leaf disease recognition, with the epoch set to 30, the Adagrad optimizer, and a learning rate of 0.01, and applied the trained models to the two test sets, the SBG-Dataset and the CBG-Dataset. The experimental results are shown in Tables 6 and 7 and Figure 9. As can be seen from Table 6 and Figure 9, on the rice leaf disease test set with single backgrounds, the accuracy of the model in this study improved by 15.34%, 6.35%, 5.06%, and 2.08% compared with AlexNet, Vgg-19, MobileNet, and ResNet-101, respectively, and its recognition accuracy was 1.85% higher than that of its benchmark model, DenseNet-121. Figure 9 also shows that the model achieved the highest accuracy, 99.71%, as well as the best precision, recall, and F1 score among the compared models, converged faster than the other classical models, and did not overfit. We can likewise see from Table 7 and Figure 9 that RiceDRA-Net produced good results on the rice leaf disease test set with complex backgrounds, with a 97.86% recognition accuracy on the test set, much higher than the other five models, and it performed very well in terms of precision, recall, and the F1 score.
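The accuracy comparisons above reduce to simple percentage arithmetic. As a minimal sketch (the prediction and label lists below are invented for illustration, not taken from the paper's data):

```python
def accuracy(preds, labels):
    """Top-1 accuracy in percent: share of predictions matching the true class."""
    assert len(preds) == len(labels) and labels
    return 100.0 * sum(p == l for p, l in zip(preds, labels)) / len(labels)

# Hypothetical class indices (0=bacterial blight, 1=blast, 2=brown spot, 3=Tungro).
preds = [0, 1, 2, 3, 1, 2, 0, 3]
labels = [0, 1, 2, 3, 0, 2, 0, 3]
acc = accuracy(preds, labels)  # 7 of 8 correct -> 87.5

# The paper's robustness measure is the accuracy drop from single to complex
# background, e.g. for RiceDRA-Net: 99.71% (SBG) - 97.86% (CBG) = 1.85 points.
drop = round(99.71 - 97.86, 2)
print(acc, drop)
```

The same subtraction applied to each model's two test-set accuracies yields the per-model degradation figures discussed below.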
Therefore, the RiceDRA-Net studied in this paper had the highest recognition accuracy among the six models for rice leaf disease recognition with both single and complex backgrounds. In addition, we compared the total loss values of the six models on the two test sets, as shown in Figure 10. It can be seen very intuitively that, among the six models, the RiceDRA-Net proposed in this paper had the lowest total loss value on the test set, and its loss value decreased the fastest, regardless of whether the background was single or complex.
We compared the recognition accuracy of the six models on the two rice leaf disease test sets, as shown in Figure 11. The recognition accuracy of RiceDRA-Net was 99.71% on the SBG-Dataset and 97.86% on the CBG-Dataset; that is, its accuracy with a complex background decreased by only 1.85% compared with a single background. In contrast, the accuracy of AlexNet decreased by 8.49%, MobileNet by 9.16%, Vgg-19 by 9.16%, ResNet-101 by 7.53%, and DenseNet-121 by 5.34%. The experimental results show that complex backgrounds had the least impact on RiceDRA-Net: its accuracy decreased the least among the six models, demonstrating that RiceDRA-Net has a better recognition effect for rice leaf diseases with complex backgrounds.
We also used the six models with the two test sets, the SBG-Dataset and the CBG-Dataset, to test their accuracy for the four disease categories in the rice leaf disease dataset, as shown in Figure 12, which allows a more intuitive analysis of the recognition effect of the models on different rice leaf diseases. The figure shows that RiceDRA-Net outperformed the other five reference models in recognizing the four rice leaf diseases, with both single and complex backgrounds. With a single background, RiceDRA-Net achieved 100% precision for three diseases: blast, brown spot, and Tungro. Although RiceDRA-Net had lower precision than MobileNet for bacterial blight with a complex background, it had better precision than the other five models for the remaining three diseases. This indicates that RiceDRA-Net has strong robustness and stability as well as high accuracy. Furthermore, as shown in Table 8, RiceDRA-Net had a significantly higher recall and F1 score for the four rice leaf diseases than the five reference models on both test sets, indicating that it has a good recall ability and F1 score and is less influenced by complex backgrounds. Consequently, it is suitable for rice leaf disease identification with complex backgrounds.
Finally, we produced confusion matrix plots for the six models on the two test sets and analyzed them, as shown in Figures 13 and 14, where the horizontal coordinate of the confusion matrix represents the true category, the vertical coordinate represents the predicted category, and the shades of color in the squares represent different quantities, with darker colors representing higher quantities. From Figure 13, we can see that the precision of the proposed model for the three disease categories of blast, brown spot, and Tungro with a single background reached 100%, and the precision for bacterial blight also reached 98.9%. Five experimental samples of blast were misclassified as bacterial blight; these two disease categories are somewhat similar in their disease characteristics, which may cause misclassification. As can be seen from Figure 14, RiceDRA-Net with a complex background had the highest precision for the three diseases blast, brown spot, and Tungro, and its precision for Tungro reached 100%. The experimental results showed that the recall and F1 score of RiceDRA-Net for rice leaf disease identification with both single and complex backgrounds were higher than those of the five comparative models, so RiceDRA-Net had the highest recall and F1 score for each disease and outperformed the classical models in identifying the different categories of rice leaf diseases. Overall, the RiceDRA-Net model studied in this paper has a good recognition effect and is able to meet the requirements of rice leaf disease recognition in general production.
In classification tasks, PR curves are a common performance evaluation method that measures the precision and recall of a classifier and visualizes the trade-off between them. The average PR curve combines the PR curves of multiple categories into a single graph by averaging them, balancing the performance of each category and providing an overall assessment metric. The area under the curve (AUC) is a commonly used performance metric; higher AUC values indicate a better classifier. We compared the average PR curves of the six models on the CBG-Dataset, as shown in Figure 15. The experiments showed that the RiceDRA-Net model proposed in this study had the highest AUC value, indicating the best trade-off between precision and recall; its curve was very steep, showing that the model obtained high precision while maintaining high recall.
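The per-class summary behind such a PR curve can be computed directly from prediction scores and binary labels. A small self-contained sketch of the step-wise average-precision formula (toy data, not the paper's; it assumes no tied scores):

```python
def average_precision(labels, scores):
    """Area under the PR curve via the step-wise average-precision formula:
    AP = sum over each positive of (recall increment) * (precision at that rank)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])  # descending score
    total_pos = sum(labels)
    tp, ap, prev_recall = 0, 0.0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:  # a true positive enters at this threshold
            tp += 1
            precision = tp / rank
            recall = tp / total_pos
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap

# Toy example: two positives, one negative ranked between them.
ap = average_precision([1, 0, 1], [0.9, 0.8, 0.7])
print(ap)  # 0.5*1.0 + 0.5*(2/3) = 5/6
```

Averaging the per-class AP values over the four disease categories gives the kind of class-balanced summary that an averaged PR plot conveys.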
The computational complexity of a convolutional neural network can be characterized by the number of parameters and the computational volume of the model. Specifically, the number of parameters is the total number of weight parameters in all layers with parameters, and the computational volume is the number of floating-point operations (FLOPs) required for forward inference. We compared the FLOPs and parameter counts of the six experimental models; the results are shown in Table 9. From Table 9, we can see that RiceDRA-Net's FLOPs were higher only than those of AlexNet and MobileNet, which are, respectively, an early structurally simple network and a lightweight network, and whose recognition accuracy is far inferior to that of RiceDRA-Net. Compared with Vgg-19 and ResNet-101, the FLOPs of RiceDRA-Net were much lower. Compared with the benchmark model DenseNet-121, RiceDRA-Net improved the accuracy of rice leaf disease recognition without changing the number of floating-point operations. It can also be observed from Table 9 that the number of parameters in RiceDRA-Net was higher only than that of MobileNet, a lightweight model whose recognition accuracy for rice leaf diseases is much lower. It is worth mentioning that, compared with the benchmark model DenseNet-121, RiceDRA-Net reduced the number of parameters while improving accuracy. In summary, RiceDRA-Net performs well compared with the other models in terms of both parameter count and computational cost.
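Per-layer figures of this kind follow from simple arithmetic. A hedged sketch for a single convolutional layer (counting one multiply-add as two FLOPs and ignoring bias FLOPs, a common but not universal convention; the example layer is a generic ResNet/DenseNet-style stem, not a layer reported in Table 9):

```python
def conv2d_cost(c_in, c_out, k, h_in, w_in, stride=1, pad=0):
    """Parameter count and forward-pass FLOPs of one k x k convolution."""
    h_out = (h_in + 2 * pad - k) // stride + 1
    w_out = (w_in + 2 * pad - k) // stride + 1
    params = c_out * (c_in * k * k) + c_out            # weights + biases
    flops = 2 * h_out * w_out * c_out * c_in * k * k   # 2 * output elems * MACs each
    return params, flops

# Example: a 7x7, stride-2 stem convolution on a 224x224 RGB input.
params, flops = conv2d_cost(c_in=3, c_out=64, k=7, h_in=224, w_in=224, stride=2, pad=3)
print(params, flops)  # 9472 parameters, 236027904 FLOPs (~0.24 GFLOPs)
```

Summing these two quantities over every parameterized layer of a model yields the totals compared in Table 9.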

Heat Map Comparison
As the Res-Attention module is integrated in RiceDRA-Net, Grad-CAM was adopted to extract heat maps of the disease pictures after the Res-Attention module, visualizing the features extracted by the convolutional neural network. This experiment extracted the feature maps output from the last layer of the improved Res-Attention module in RiceDRA-Net to better show which parts of the sample images the model focused on; RiceDRA-Net was tested on the CBG-Dataset, and the heat maps are shown in Figure 16. We used three disease images, of bacterial blight, blast, and brown spot, for these experiments, and simultaneously extracted the last convolutional layer of the five classical models AlexNet, Vgg-19, MobileNet, ResNet-101, and DenseNet-121 so that their heat maps could be compared with those of our proposed model. It can be seen from Figure 16 that each of the five classical models was able to identify the exact disease location for some specific disease but was not very accurate for the others. For example, ResNet-101 located bacterial blight accurately, but not blast or brown spot, and DenseNet-121 was accurate for bacterial blight and brown spot, but not for blast. RiceDRA-Net, however, was very accurate at locating all of the diseases in the rice leaves, which contributed to its better recognition effect. This also demonstrates that adding the Res-Attention module to the dense residual network enables more accurate disease identification.
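Grad-CAM's core computation is simple: weight each feature-map channel by the spatial mean of its gradient, sum the weighted channels, and clamp negatives to zero. A framework-free sketch on plain nested lists (the toy activations and gradients are invented; a real pipeline would obtain them from the network via backpropagation hooks):

```python
def grad_cam(activations, gradients):
    """activations, gradients: K channels, each an H x W list of lists.
    Returns the H x W class-activation map ReLU(sum_k alpha_k * A_k),
    where alpha_k is the global average of channel k's gradients."""
    k = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pooled gradient of each channel
    alphas = [sum(sum(row) for row in gradients[c]) / (h * w) for c in range(k)]
    cam = [[max(0.0, sum(alphas[c] * activations[c][i][j] for c in range(k)))
            for j in range(w)] for i in range(h)]
    return cam

# Toy 2-channel, 2x2 example: channel 0 supports the class, channel 1 opposes it.
A = [[[1.0, 0.0], [0.0, 0.0]],
     [[0.0, 2.0], [0.0, 0.0]]]
G = [[[1.0, 1.0], [1.0, 1.0]],
     [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(A, G))  # [[1.0, 0.0], [0.0, 0.0]]
```

Upsampling the resulting map to the input resolution and overlaying it on the image produces heat maps like those in Figure 16.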

Conclusions
In this study, we proposed a deep learning model called RiceDRA-Net for identifying rice leaf diseases in the context of complex rice fields. We incorporated the Res-Attention module into the deep residual structure of the network to form RiceDRA-Net, making the residual connections denser and more suitable for identifying rice leaf diseases in rice fields. We also constructed a test dataset with a single background of rice leaf diseases, the SBG-Dataset, and experimented with the effects of different optimizers, learning rates, and convolutional kernel sizes in Res-Attention on the recognition performance of the model.
RiceDRA-Net extracts image features while retaining as much as possible of the information that might otherwise be lost during the convolution process. This greatly improves the accuracy of identifying rice leaf diseases with complex rice backgrounds. We compared RiceDRA-Net with various classical models on two datasets and found that RiceDRA-Net achieved higher accuracy in identifying rice leaf diseases with complex rice backgrounds and accurately located the diseases in the rice leaves.
In conclusion, our study presents a novel deep learning model, RiceDRA-Net, that performs well in identifying rice leaf diseases with complex rice backgrounds. The Res-Attention module incorporated into the model improves its ability to identify rice leaf diseases and to retain important information, and the SBG-Dataset is a useful dataset for the further study of rice leaf diseases. Our results show that RiceDRA-Net outperforms classical models both in identifying rice leaf diseases and in locating diseases in rice leaves.

2.3. Deep Learning Networks
2.3.1. Deep Residual Network
The emergence of deep residual networks has greatly improved the characterization and learning ability of deep learning models and has become a hot research direction in the field of image classification [31]. The deep residual network is able to use the residual structure to minimize the information loss generated during convolution while performing the feature extraction of an ordinary deep convolutional neural network. This greatly improves the accuracy of rice disease recognition.
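The key idea is the identity shortcut y = F(x) + x: even when the learned transform F contributes little, the input passes through unchanged, so information is not lost as depth grows. A minimal numeric sketch (the transforms below are arbitrary stand-ins for the paper's learned layers):

```python
def residual_block(x, transform):
    """y = F(x) + x: add the block's learned transform back onto its input."""
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]

# Stand-in "learned" transform: halve every component.
halve = lambda v: [0.5 * vi for vi in v]
print(residual_block([1.0, 2.0], halve))  # [1.5, 3.0]

# Degenerate transform (all zeros): the input survives untouched, which is
# why stacking residual blocks does not degrade the signal with depth.
zero = lambda v: [0.0 for _ in v]
print(residual_block([1.0, 2.0], zero))   # [1.0, 2.0]
```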

Figure 3 .
Figure 3. Structure of DenseNet for achieving maximum information flow during image recognition.

Figure 4 .
Figure 4. Structure of the CBAM module for implementing both spatial and channel attention mechanisms in feedforward convolutional neural networks.

Figure 5 .
Figure 5. Structure of the spatial attention module and the channel attention module in the CBAM module for adaptive adjustment of feature maps.

Figure 6 .
Figure 6. Structure of the Res-Attention module for reducing information loss during transmission.

The network flow diagram of the Res-Attention module is shown in Figure 7, which gives a more intuitive view of the transmission process of the feature map in the Res-Attention module.

2.4.2. RiceDRA-Net
In this study, we developed the RiceDRA-Net network model by adding the Res-Attention module to the DenseNet-121 network model in order to improve its accuracy in rice disease identification. The RiceDRA-Net model consists of four Dense Block modules, four Res-Attention modules, and three Transition Layers, which are interconnected and superimposed. The Res-Attention module was placed after each Dense Block module, with the Transition Layer connected afterwards; the first three Res-Attention modules were connected to Transition Layers, and the last Res-Attention module was connected directly to the Classification Layer. Finally, we changed the output features of the Classification Layer to 4, corresponding to the 4 categories of rice diseases identified.
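The channel-then-spatial attention flow inside the Res-Attention module can be sketched without a deep-learning framework. The scalar weights w_c and w_s below stand in for the shared MLP and the 3 × 3 convolution of the real module, so this illustrates only the data flow, not the paper's implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_then_spatial_attention(x, w_c=1.0, w_s=1.0):
    """x: K channels, each an H x W list of lists. Applies channel attention
    (per-channel avg+max pooling -> sigmoid gate), then spatial attention
    (per-pixel avg+max across channels -> sigmoid gate), CBAM-style."""
    k = len(x)
    h, w = len(x[0]), len(x[0][0])
    # Channel attention: one gate per channel from its avg- and max-pooled value.
    gates = []
    for c in range(k):
        flat = [v for row in x[c] for v in row]
        gates.append(sigmoid(w_c * (sum(flat) / len(flat) + max(flat))))
    y = [[[x[c][i][j] * gates[c] for j in range(w)] for i in range(h)]
         for c in range(k)]
    # Spatial attention: one gate per pixel from the cross-channel avg and max.
    out = [[[0.0] * w for _ in range(h)] for _ in range(k)]
    for i in range(h):
        for j in range(w):
            col = [y[c][i][j] for c in range(k)]
            s = sigmoid(w_s * (sum(col) / k + max(col)))
            for c in range(k):
                out[c][i][j] = y[c][i][j] * s
    return out

# Tiny 1-channel, 1x1 check: both gates reduce to sigmoid(2 * value).
print(channel_then_spatial_attention([[[2.0]]]))
```

The output keeps the input's shape, so the gated feature maps can be passed straight to the following Transition Layer or Classification Layer, as described above.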


Figure 7 .
Figure 7. Network flow diagram of the feature map transmission process in the Res-Attention module. In the RiceDRA-Net network model, the feature maps from the Dense Block are passed directly to the Res-Attention module and then through the Channel Attention Module and Spatial Attention Module within it, respectively. The structure of the RiceDRA-Net network is shown in Figure 8.


Figure 8 .
Figure 8. Structure of RiceDRA-Net network model for rice leaf disease identification.


Figure 9 .
Figure 9. Comparison of various classical models in a single context.

Figure 10 .
Figure 10. Comparison of the total loss values of the different models on the test set in a single context.

Figure 11 .
Figure 11. Comparison of model accuracy under the two datasets.

Figure 12 .
Figure 12. Comparison of accuracy rates of different models for each disease type.

Figure 15 .
Figure 15. Average PR curves of the six models using the CBG-Dataset.

Figure 16 .
Figure 16. Heat maps of the disease pictures.

Table 1 .
Number of rice disease datasets.

Table 3 .
Recognition accuracy (%) of RiceDRA-Net on SBG-Dataset under different optimization algorithms and learning rates.

Table 4 .
Total loss values of RiceDRA-Net on SBG-Dataset under different optimization algorithms and learning rates.

Table 5 .
Comparison of different convolution kernel sizes.

Table 6 .
Comparison of various classical models under SBG-Dataset.

Table 7 .
Comparison of various classical models under CBG-Dataset.

Table 8 .
Comparison of multiple evaluation indicators in different models on each disease type.

Table 9 .
Comparison of Computational Complexity.