1. Introduction
Rice cultivation has long occupied an important place in agricultural development. However, poorly managed production practices and a changing climate have affected rice yield and quality, threatening food security. Plant phenotypes are the physical, physiological, and biochemical characteristics that reflect the structural and functional features of plant cells, tissues, organs, whole plants, and communities [
1]. Plant diseases, as important indicators of the morphological and structural characteristics of plant phenotypes, can be captured directly by visible-light cameras (including cell phones, digital cameras, etc.) [
2]. Rice leaf disease identification is an important research topic in plant protection, and efficient disease identification is a major challenge for breeders. Traditional identification methods rely on manual diagnosis by agronomists or on comparison against disease atlases. However, because rice disease spots are complex and often similar in appearance, manual identification requires extensive experience and agricultural knowledge. This places heavy demands on breeders, yet specialized human expertise is difficult to provide at scale in practice. Therefore, intelligent cultivation combining computer vision, image processing, machine learning (ML), and deep learning (DL) algorithms has attracted much attention. Compared with recognition by the human eye, these methods automatically identify rice diseases by learning their phenotypic characteristics, helping farmers detect, diagnose, and treat diseases at the early stage of occurrence and seize the best window for prevention and treatment [
3].
Previous machine learning disease recognition research incorporating image processing techniques achieved considerable success, but these methods still relied on manual feature extraction [
4]. The emergence of deep learning has removed this limitation: its ability to learn features automatically and perform inference has led to major breakthroughs in plant disease phenotype recognition. The convolutional neural network (CNN), a representative deep learning technique, performs far better than other traditional recognition methods in image recognition. Lu et al. [
5] applied a CNN to rice disease recognition research for the first time, and found that CNNs have better convergence and recognition accuracy than other algorithms, such as the standard BP algorithm, Support Vector Machines (SVMs), and Particle Swarm Optimization (PSO). Jiang et al. [
6] combined a CNN and an SVM to improve the recognition performance of rice leaf diseases using a 10-fold cross-validation method. Hu et al. [
7] proposed a multi-scale two-branch structured rice pest recognition model based on a generative adversarial network and improved ResNet, which solved the problems of overfitting caused by a too-small dataset and the difficulty of extracting disease features from the background of the picture.
CNN-based rice disease recognition against simple backgrounds has achieved high accuracy, but applications in natural settings remain scarce. A major reason is that conventional CNNs are too complex to deploy on hardware devices, so lightweight networks are necessary. Cai et al. [
8] designed a corn leaf disease recognition application based on improved EfficientNet with 3.44 M parameters and 339.74 M Flops, which can be quickly deployed and recognized on Android devices. Butera et al. [
9] investigated baseline crop pest detection models that can be applied to real scenarios, taking into account the detection performance and computational resources, in which FasterRCNN with MobileNetV3 as the backbone was outstanding in terms of accuracy and real-time performance. Zhou et al. [
10] proposed a lightweight ShuffleNetV2-based crop leaf disease recognition model, which achieved 96.72% accuracy on a field crop leaf disease dataset with a smaller model size for easy mobile deployment. These lightweight models are now widely used to meet the deployment requirements of natural scenarios.
You Only Look Once (YOLO) is currently a widely used lightweight object detection network. In 2022, the Ultralytics team released YOLOv5-cls [
11], an image classification model derived from YOLOv5, comprising five versions. Compared with other traditional CNNs trained on the ImageNet dataset, YOLOv5-cls maintains accuracy while reducing parameter count, making it practical as a lightweight model. Yang et al. [
12] applied YOLOv5-cls for the first time to a real-world classification task, comparing EfficientNetV2 against the different YOLOv5-cls versions on a free-roaming bird behavior classification task; YOLOv5m-cls performed best, with an average accuracy of 95.3%.
Traditional model training requires constant parameter tuning and large, uniformly distributed datasets to obtain efficient models, yet in practical engineering it is difficult to construct large-scale, uniform, high-quality datasets. Transfer learning realizes cross-domain knowledge transfer by adapting feature representations from pre-trained models to new tasks [
13]. When resources and data are limited, this method can effectively reduce data volume requirements and computational costs, and improve the diagnostic and error-correction capabilities of the model. Hassan et al. [
14] applied transfer learning to MobileNetV2 and EfficientNet-B0 models by freezing the weights of the models before the fully connected layer and training them to achieve the recognition of multiple plant diseases. Zhao et al. [
15] used a model pre-trained on the large ImageNet dataset to initialize the network weight parameters, achieving 92.00% accuracy in recognizing rice plant images. Chen et al. [
16] used the PlantVillage dataset for model pre-training, then fine-tuned the model parameters on a cotton disease image dataset, achieving an accuracy of 97.16%. Wang et al. [
17] proposed a backbone network based on improved SwinT combined with transfer learning to train a cucumber leaf disease recognition model with 98.97% accuracy. Yuan et al. [
18] proposed a framework for rice disease phenotype recognition in a natural context based on transfer learning and SENet with attention policy on a cloud platform that achieved an accuracy rate of 95.73%.
The rice disease recognition models studied so far still face many challenges in real field environments. Different deep learning models adopt different feature extraction strategies, and it is difficult to build a single generalized model that achieves optimal recognition performance across diseases in natural settings. Researchers have therefore introduced ensemble learning into crop disease recognition [
19]. Xu et al. [
20] constructed a tomato pest and disease diagnostic model based on Stacking ensemble learning, with a diagnostic accuracy of 94.84% and a 12.08% reduction in inference time. Mathew et al. [
21] analyzed the symptoms of plant leaves using the GLCM algorithm and applied simple ensemble techniques, such as voting strategies, for disease identification. In addition, deep ensemble learning models [
22] combine the advantages of deep learning and ensemble learning, offering better recognition performance and generalization ability. Palanisamy et al. [
23] used Sequence2D and DenseNet deep learning models combined with ensemble techniques, such as a weighted voting strategy, for maize leaf infection identification. Yang et al. [
24] proposed a Stacking-based deep ensemble model that integrates improved CNNs with an SVM for rice leaf disease recognition.
Some of the above studies introduced voting-based ensemble learning into crop disease recognition and effectively improved recognition accuracy. However, rice leaf disease recognition against natural backgrounds is sensitive to lighting angle, lighting intensity, and occlusion, and few models are available for practical applications. Existing rice leaf disease datasets under natural backgrounds are small and difficult to construct, and the stability and generalization ability of deep CNNs on small datasets are difficult to balance.
We propose CEL-DL-Bagging, an ensemble learning algorithm that combines deep learning (DL) with Bootstrap aggregating (Bagging) and a cross-entropy loss (CEL)-based voting strategy, for small-sample rice leaf disease recognition in natural settings. Four lightweight CNNs are selected as base learners, namely YOLOv5-cls, EfficientNet-B0 [
25], MobileNetV3 [
26], and ShuffleNetV2 [
27], to reduce computational cost while extracting rice leaf disease phenotypic features more comprehensively. Transfer learning is used to train the models to minimize the effect of insufficient data. A new CEL-based weighted voting strategy is proposed to improve identification accuracy. In addition, a salient position attention (SPA) mechanism is introduced to further improve the base learners' ability to extract valid information against complex backgrounds. The main contributions of this work are as follows:
- (1)
A rice disease recognition algorithm based on a salient position attention (SPA) mechanism is designed, which reduces the processing of non-critical information.
- (2)
The DL-Bagging algorithm is proposed, which integrates lightweight models via Bootstrap sampling to improve generalization ability under small sample sizes.
- (3)
A CEL-based weighted voting strategy is proposed to improve the DL-Bagging algorithm; the weights are dynamically adjusted according to each base learner's performance to improve ensemble accuracy.
- (4)
An improved training scheme is proposed that applies label smoothing and a cosine annealing learning-rate schedule during transfer learning.
- (5)
Extensive ablation experiments are designed to validate the performance of the proposed method for rice leaf disease recognition under natural backgrounds.
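To make the ensemble idea concrete, the sketch below illustrates one common way to realize a loss-weighted soft vote over base learners: each learner receives a weight from a softmax over its negative validation cross-entropy loss, so lower-loss learners vote more strongly, and the weighted average of class probabilities decides the label. This is an illustrative assumption only, not the paper's exact CEL weighting formula (given in Section 2); the learner losses and probability vectors are toy values.

```python
import math

def cel_weights(val_losses):
    """Assumed rule: softmax over negative validation cross-entropy losses,
    so base learners with lower loss receive larger voting weights."""
    exps = [math.exp(-loss) for loss in val_losses]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_soft_vote(prob_lists, weights):
    """Fuse per-learner class-probability vectors by weighted averaging,
    then return the arg-max class index and the fused distribution."""
    n_classes = len(prob_lists[0])
    fused = [0.0] * n_classes
    for probs, w in zip(prob_lists, weights):
        for k, p in enumerate(probs):
            fused[k] += w * p
    return fused.index(max(fused)), fused

# Toy example: three base learners, three disease classes.
losses = [0.20, 0.35, 0.50]          # hypothetical validation CE losses
w = cel_weights(losses)              # lowest-loss learner weighted highest
preds = [[0.7, 0.2, 0.1],            # per-learner class probabilities
         [0.4, 0.5, 0.1],
         [0.6, 0.3, 0.1]]
label, fused = weighted_soft_vote(preds, w)
```

In a full DL-Bagging pipeline, each base learner would first be trained on its own Bootstrap sample of the training set; the fusion step above is applied only at inference time.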
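The two training refinements in contribution (4) follow standard formulations, sketched below: label smoothing replaces a one-hot target y with (1 - ε)·y + ε/K over K classes, and cosine annealing decays the learning rate from lr_max to lr_min along a half cosine. The hyperparameter values (ε = 0.1, lr bounds, schedule length) are illustrative assumptions, not the paper's settings.

```python
import math

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: y' = (1 - eps) * y + eps / K for K classes."""
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]

def cosine_annealing_lr(t, t_max, lr_max=1e-3, lr_min=1e-5):
    """Cosine annealing: decay lr from lr_max (t=0) to lr_min (t=t_max)."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / t_max))

smoothed = smooth_labels([0.0, 1.0, 0.0], eps=0.1)  # soft targets, still sum to 1
start_lr = cosine_annealing_lr(0, 100)              # equals lr_max at step 0
end_lr = cosine_annealing_lr(100, 100)              # equals lr_min at step t_max
```

Label smoothing discourages overconfident predictions on a small dataset, while the cosine schedule lets transfer-learning fine-tuning start with larger updates and settle gently near the end of training.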
The proposed method can be applied in real fields to effectively recognize rice leaf diseases in natural settings. The rest of the paper is structured as follows:
Section 2 describes the image acquisition, preprocessing, model construction, parameter settings, and evaluation metrics;
Section 3 analyzes the results of the experiments;
Section 4 discusses the advantages and limitations of the proposed model compared with other related studies; and
Section 5 concludes the paper and outlines future work.
4. Discussion
Recent studies on rice disease classification have made significant progress using deep learning techniques, particularly lightweight models and ensemble methods. For instance, Gan et al. [
35] showed that BiFPN multi-scale fusion combined with the Wise-IoU loss function can improve small-sample generalization, although mAP on small samples in natural environments remained 3.8% lower than under laboratory conditions. Similarly, Wang et al. [
36] proposed a stacked ensemble of CG-EfficientNet models, achieving 96.10% accuracy by leveraging attention mechanisms and sequential least squares programming for weight optimization. While this approach demonstrates commendable performance, it primarily focuses on balancing model accuracy and computational efficiency, with limited emphasis on handling complex field conditions or small-sample datasets. Jia et al. [
37] proposed MobileNet-CA-YOLO, which combines the lightweight MobileNetV3 network, a coordinate attention (CA) mechanism, and the SIoU loss function to significantly improve the efficiency and compactness of rice pest and disease detection. However, their dataset was collected mainly by web crawling, so sample coverage and diversity may be insufficient; moreover, it lacks real-scene data from complex field environments (such as occlusion and lighting changes), which limits the model's generalization in practical applications. These methods typically rely on single-model optimizations or simple ensemble strategies, which may not fully exploit the diversity of feature representations or adapt dynamically to varying field conditions.
Our method, CEL-DL-Bagging, addresses these limitations by integrating a salient position attention (SPA) mechanism with a cross-entropy loss-weighted voting strategy, achieving 98.33% accuracy. Unlike previous approaches, our framework combines multiple lightweight models (YOLOv5s-cls, EfficientNet-B0, MobileNetV3, and ShuffleNetV2) through Bootstrap Aggregating (Bagging), enhancing feature diversity and robustness. The SPA mechanism specifically targets diagnostically relevant regions, reducing interference from complex backgrounds, while the adaptive weighting strategy ensures optimal contributions from each base learner. This dual focus on attention and ensemble diversity allows our model to outperform existing methods in both accuracy and generalization, particularly in natural field environments where lighting and occlusion variations are prevalent.
While the proposed CEL-DL-Bagging framework demonstrates superior performance in rice leaf disease identification, several important considerations and limitations merit discussion.
(1) The field rice disease dataset used in this paper was sourced from green, high-yield, high-efficiency rice demonstration fields in Gaoyou City and Yangzhou City, Jiangsu Province. The experiment used web-sourced data expansion, data augmentation, and transfer learning techniques to mitigate the impact of limited data volume. Further expansion of the field rice disease dataset will require the assistance of agricultural researchers;
(2) The field rice disease recognition model in this paper has not yet been tested with a more diverse range of rice disease phenotypic features. Further efforts are needed to increase data diversity and validate the model’s performance under other disease manifestation scenarios to enhance its generalization capability;
(3) This paper improved the network model’s performance in identifying rice diseases in the field through a salient position attention mechanism and the improved deep learning aggregation (DL-Bagging) algorithm. However, there is still room for improvement. In the future, we will attempt to integrate more advanced deep learning technologies into the identification model to improve identification accuracy, speed, and adaptability under different environmental conditions, thereby optimizing the field rice disease identification model.