1. Introduction
China is one of the world's cradles of rice cultivation and its largest rice producer [1,2]. Yunnan is an important rice cultivation and breeding base in the plateau region of China, with a rice cultivation history of over 4000 years and abundant rice germplasm resources [3]. The rice industry not only bears on national food security but also plays an important role in cultivating disease-resistant strains and developing specialty rice varieties. However, Yunnan's terrain is dominated by mountains and plateaus, with mountainous areas accounting for 88.64% of the province's total area [4]. This terrain not only leaves rice cultivation less mechanized and automated than in plain areas but also poses greater challenges for the precise monitoring of breeding experimental fields and for disease prevention and control in terraced areas [5,6,7]. Moreover, the diversity of altitude, climate, and biodiversity makes the occurrence, development, and outbreak of rice diseases difficult to predict, directly affecting the efficiency of screening disease-resistant strains during breeding and the stability of field yields.
Accurate disease monitoring is crucial for scenarios such as disease-resistance identification in rice breeding experimental fields and fixed-point monitoring of mountain terraces. During breeding, tiny disease spots must be identified precisely to screen excellent disease-resistant strains and enable early prevention and control; in field disease control, visually similar diseases must be distinguished quickly so that targeted strategies can be formulated [8,9]. However, traditional manual detection relies on expert experience: it is inefficient, misses small-target diseases at high rates, and frequently misjudges similar diseases, hindering disease-resistant breeding and delaying field disease control [10]. Therefore, developing a rice disease detection technology that supports fixed-point detection with high accuracy and the ability to detect small targets is of great significance for ensuring stable rice yields and improving breeding efficiency.
The rapid development of artificial intelligence has pushed rice disease detection from traditional expert judgment toward automated, intelligent monitoring. Early studies mainly combined traditional computer vision with machine learning: image processing techniques such as threshold segmentation and edge detection, together with manually designed features, were used to separate disease spots, and machine learning algorithms such as SVM and KNN then classified those features [11,12,13,14]. Although these methods achieved good early results, they were limited by hand-crafted features and struggled to adapt to changing lighting, complex backgrounds, or inconspicuous disease characteristics, constraining their robustness and accuracy [15]. The rise of deep learning, especially the success of convolutional neural networks (CNNs) and the Vision Transformer in general image recognition, has transformed rice disease detection research [16,17]. Improved CNN-based architectures such as VGG16 [18] and ResNet [19], as well as Transformer-based architectures such as ViT [20] and Swin Transformer [21], can better learn deep, high-dimensional disease features and thereby improve recognition accuracy. However, classification models can only determine whether a disease is present in an image; they cannot provide the specific location and extent of the disease spots, so they fall short in applications that require precise lesion localization, such as disease-resistance screening or severity assessment.
To overcome these difficulties, researchers have gradually turned to object detection, which provides category and location information simultaneously. Existing studies are mainly based on mainstream detection frameworks such as YOLO and Faster R-CNN [22] and adapt these models to rice disease detection. For example, within the YOLO framework, models pre-trained on large-scale general datasets such as COCO and ImageNet-1K are used for transfer learning [23,24] to accelerate convergence on specific disease datasets and improve baseline performance. Meanwhile, to strengthen the extraction of disease-specific features, researchers have improved baseline models from several directions. First, attention mechanisms such as GAM [25], Triplet Attention [26], and CBAM [27] are introduced so that the model adaptively focuses on lesion areas and suppresses interference from complex backgrounds. Second, convolution modules are modified to enhance the perception of lesions at different scales, especially tiny lesions [28,29]. Third, loss functions such as Wise-IoU [30,31] and DIoU loss [32] are introduced to address the class imbalance in field scenarios, where background samples far outnumber lesion samples, or to improve localization accuracy. These improvements have achieved good results on specific datasets, moving disease monitoring from category judgment toward precise localization. However, despite this progress, when applied in complex real-world field environments, missed detections, false detections, and low accuracy still commonly occur for early tiny lesions or diseases with similar visual features [33,34]. These problems restrict the practical application of automation in key processes such as disease-resistant breeding screening and precise field management.
To further break through the detection bottleneck of a single model in complex scenarios and enhance detection stability, this study explores the applicability of ensemble learning to rice disease detection. Ensemble learning is widely used in traditional machine learning; its core idea is to make joint decisions by constructing and combining multiple diverse base learners [35,36]. By effectively combining the strengths of different models, it can achieve better generalization and robustness than any single model. However, relatively few studies have applied ensemble strategies to deep learning object detection. For the rice disease detection task in this study, integrating multiple high-performance detectors helps overcome interference from complex backgrounds, resolve misjudgments of similar diseases, and ultimately raise the upper limit of detection accuracy and stability [37].
However, designing an effective integration strategy and fusing prediction results according to the specific requirements of rice disease detection remains a major challenge [38]. To address this, this paper proposes an ensemble learning method based on post-processing integration to tackle missed detections in rice disease detection. The detection boxes of multiple detection models are integrated at the post-processing stage, and the results are optimized and merged using the Weighted Boxes Fusion (WBF) method [39]. Ultimately, high-precision detection of rice diseases under complex backgrounds, especially of tiny and visually similar lesions, is achieved. The innovations and contributions of this paper are as follows:
To address the significant gap between general datasets and the characteristics of agricultural diseases, domain-adaptive pre-training was carried out on the PlantDoc plant-disease dataset. This provided a more targeted feature foundation for the subsequent disease detection model and effectively enhanced its sensitivity to disease characteristics.
To address the missed detection of tiny lesions and hard-to-distinguish samples common in rice diseases, a P2 detection head was introduced into YOLOv8s-transfer to improve small-target detection, and the EMA mechanism and the Focal loss function were combined to strengthen key features and focus training on difficult samples.
To break through the bottlenecks of single models on complex backgrounds and visually similar diseases, an ensemble detection framework based on Weighted Boxes Fusion was designed and implemented. By integrating three high-performance single detectors through WBF post-processing, the accuracy ceiling and robustness of the final detection model were improved.
In this study, a high-precision, high-robustness ensemble detection method was constructed through multiple stages, benefiting the accurate identification of disease resistance in breeding experimental fields and the fixed-point monitoring of diseases in mountain terraces. This has important practical significance for improving rice breeding efficiency, ensuring stable rice yields in plateau areas, and promoting the application of smart agriculture in complex scenarios.
2. Materials and Methods
The overall technical roadmap of this study is shown in Figure 1 and is divided into three parts: data collection, data processing, and model construction. First, data on five rice diseases, namely Bacteria blight, Blast, Brown spot, Entyloma, and Tungro, and one pest, Rice planthopper, were obtained through two channels: self-collection and public datasets. Blurred and duplicate images were then deleted, the disease images were annotated with LabelImg, and the dataset was expanded through five data augmentation methods. Subsequently, five architectures, YOLOv8, YOLOv9, Faster-RCNN, RT-DETR, and EfficientDet, were used to construct disease detection models. After the YOLO framework was selected for subsequent experiments, YOLOv8 and YOLOv9 detection models were first trained on the PlantDoc dataset; the resulting pre-trained weights were then used for training on the dataset constructed in this study, yielding the YOLOv8-transfer and YOLOv9-transfer models. Next, the YOLOv8s-transfer model was improved by adding a P2 small-target detection head and the EMA mechanism and by changing the loss function to Focal loss, aiming to improve its detection ability and prediction accuracy for small-target diseases. To further improve performance, the prediction results of the three base models with the highest mAP_0.5 were integrated in the post-processing stage via ensemble learning; by comparing four post-processing methods, NMS, SNMS, NMW, and WBF, the Ensemble-WBF model was finally constructed. Finally, the advantages and disadvantages of the single models and the Ensemble-WBF model were compared, and the results were analyzed and discussed.
2.1. Data Acquisition
In this study, rice disease and pest data were collected in two ways. First, in May 2024, images were collected at the Rice Research Institute of Yunnan Agricultural University, covering rice planted both inside and outside the greenhouse. Using mobile devices, images of four diseases (Bacteria blight, Blast, Brown spot, Entyloma) and one pest (Rice planthopper) were taken at a distance of 15–30 cm from the diseased leaves. Each image had a resolution of 3120 × 4160 pixels, for a total of 3688 images. To ensure data quality, blurred and duplicate images were removed through manual screening, leaving 2757 high-quality disease and pest images. Second, to improve the model's robustness and adaptability to different natural environments, a rice disease dataset created by Sethy et al. [40] was obtained from Mendeley Data. This dataset is mainly intended for classification tasks and includes four diseases (Bacteria blight, Blast, Brown spot, Tungro) in real field scenes. To adapt it to the rice disease detection task of this study, 2634 images with complex backgrounds were selected from it. In total, 5391 original images of rice diseases and pests were obtained.
Figure 2 shows the collected images of five diseases and one pest.
2.2. Image Processing and Dataset Construction
To construct a rice disease detection model with high accuracy and strong generalization, this study uses data augmentation to simulate rice disease images at different angles and under different lighting conditions while also addressing the limited data volume. First, the 2634 images from the Mendeley dataset were augmented twice using vertical flipping and rotation, resulting in 7902 disease images. In addition, to improve the model's robustness and adaptability to different environments and to reduce the long-tail effect on performance, random augmentation was applied to the manually collected images of Bacteria blight, Blast, Brown spot, and Entyloma, which are relatively scarce, using five methods: vertical flipping, random brightness and contrast adjustment, horizontal flipping, random gamma adjustment, and random hue adjustment. Taking Entyloma as an example, the augmentation effects are shown in Figure 3. The probability of each augmentation method was uniformly set to 0.5, and 3618 images were obtained after two rounds of random augmentation. Finally, the dataset was randomly split into training and validation sets at a ratio of 8:2. The positions and categories of all pests and diseases were manually annotated with the LabelImg software (1.8.1) and saved in the YOLO format. The numbers of pest and disease images and labels in the dataset are detailed in Table 1.
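As a concrete illustration, the per-image random augmentation described above can be sketched in pure Python. The function names, the nested-list image representation, and the parameter ranges below are illustrative assumptions, not the pipeline actually used in this study, and the random hue shift is omitted because it requires an RGB-to-HSV conversion:

```python
import random

random.seed(0)

def vertical_flip(img):
    # img: H x W x 3 nested lists of 0-255 ints; flip top-to-bottom.
    return img[::-1]

def horizontal_flip(img):
    # Flip left-to-right within each row.
    return [row[::-1] for row in img]

def _map_pixels(img, fn):
    # Apply fn to every channel value, preserving the image layout.
    return [[[fn(c) for c in px] for px in row] for row in img]

def adjust_brightness_contrast(img, alpha, beta):
    # alpha scales contrast, beta shifts brightness; clip to [0, 255].
    return _map_pixels(img, lambda c: max(0, min(255, round(alpha * c + beta))))

def adjust_gamma(img, gamma):
    # Gamma correction on normalized pixel values.
    return _map_pixels(img, lambda c: max(0, min(255, round(255 * (c / 255) ** gamma))))

def random_augment(img, p=0.5):
    # Each transform fires independently with probability p, mirroring the
    # per-method probability of 0.5 used in this study.
    if random.random() < p:
        img = vertical_flip(img)
    if random.random() < p:
        img = horizontal_flip(img)
    if random.random() < p:
        img = adjust_brightness_contrast(img, random.uniform(0.8, 1.2), random.uniform(-20, 20))
    if random.random() < p:
        img = adjust_gamma(img, random.uniform(0.7, 1.5))
    return img
```

Note that for detection datasets the flips must also be applied to the YOLO-format bounding-box labels, not just the pixels.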
2.3. Model Construction
2.3.1. YOLOv8s Disease Detection Model Based on Transfer Learning of PlantDoc
Figure 4 illustrates the feature extraction and prediction workflow of YOLOv8s. The input image first undergoes feature extraction in the backbone; after passing through multiple convolutional layers, feature maps at multiple scales are generated. These feature maps are then fused in the neck module to capture context at different levels, supporting the final detection task. Finally, the head module outputs predictions from feature maps of different sizes [41].
The backbone consists of CBS, C2f, and SPPF modules. The backbone networks of both YOLOv5 and YOLOv8 are based on the CSPDarkNet53 architecture [42]. Unlike YOLOv5, YOLOv8 replaces the C3 module with the C2f module, which remains lightweight while providing richer gradient-flow information, improving convergence speed and performance.
The head of YOLOv8 differs from those of YOLOv3 and YOLOv5. YOLOv3 and YOLOv5 adopt coupled heads with anchors, whereas YOLOv8 uses the same decoupled, anchor-free head as YOLOX. The decoupled head removes the objectness prediction branch and splits into two branches for bounding-box (bbox) regression and category prediction, extracting positional and classification features separately; each branch then completes its localization or classification task through convolutional layers, improving detection accuracy and accelerating convergence. For these two tasks, YOLOv8 employs different loss functions: Binary Cross-Entropy loss (BCE loss) for classification, and Distribution Focal Loss (DFL) together with CIoU for bounding-box regression.
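For readers unfamiliar with the regression loss, a minimal scalar sketch of the CIoU term is shown below. This is illustrative only; the Ultralytics implementation differs in details such as batching and numerical safeguards. CIoU augments IoU with a center-distance penalty and an aspect-ratio consistency term:

```python
import math

def ciou_loss(box_p, box_g):
    """Boxes are (x1, y1, x2, y2) with positive width/height.
    Returns (IoU, CIoU loss = 1 - IoU + distance + aspect penalties)."""
    # Intersection and union areas.
    ix = max(0.0, min(box_p[2], box_g[2]) - max(box_p[0], box_g[0]))
    iy = max(0.0, min(box_p[3], box_g[3]) - max(box_p[1], box_g[1]))
    inter = ix * iy
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter + 1e-9)
    # Squared center distance over squared enclosing-box diagonal (DIoU term).
    cxp, cyp = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cxg, cyg = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    rho2 = (cxp - cxg) ** 2 + (cyp - cyg) ** 2
    ex = max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])
    ey = max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])
    c2 = ex ** 2 + ey ** 2 + 1e-9
    # Aspect-ratio consistency term.
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wg, hg = box_g[2] - box_g[0], box_g[3] - box_g[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    return iou, 1 - iou + rho2 / c2 + alpha * v
```

Identical boxes yield a loss near zero, while shifted boxes are penalized for both lower overlap and larger center distance.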
In addition, transfer learning can leverage knowledge gained from one task and apply it to another related task, thereby improving performance [43]. When data is limited, it allows the model to transfer knowledge from a pre-trained model rather than training from scratch [44]. For disease detection, we first trained the YOLO model on the publicly available PlantDoc dataset [45], which is tailored to plant disease detection, and then used the resulting pre-trained weights on the rice disease detection dataset of this experiment, achieving more efficient feature learning on the limited rice disease samples and improving detection accuracy.
2.3.2. EMA Mechanism
The efficient multi-scale attention (EMA) module [46] aims to enhance the feature representation ability of convolutional neural networks (CNNs) in computer vision tasks such as image classification and object detection. Its architecture focuses on retaining information in each channel while reducing computational overhead. The EMA module is implemented in three steps: feature grouping, parallel sub-networks, and cross-spatial learning. Feature grouping and the parallel sub-networks improve computational efficiency, while cross-spatial learning combines features of different scales with global context. A Matmul operation performs the weighted fusion of cross-spatial features; through matrix operations on global and local features, it strengthens the representation of key disease regions. The structure of the EMA mechanism is shown in Figure 5.
2.3.3. Focal Loss
Object detection methods typically use prior boxes to improve prediction performance. An image may generate thousands of prior boxes, but only a small fraction match targets (positive samples), while most match none [47]. This creates an imbalance between positive and negative samples in one-stage object detectors. Focal Loss [48] dynamically adjusts the cross-entropy loss according to prediction confidence, addressing the imbalance from another angle: as the confidence of a correct prediction increases, its loss weight decays toward zero. Training loss therefore concentrates on challenging cases, while the contribution of the many easy instances remains low.
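The down-weighting of easy examples can be seen in a minimal scalar sketch of binary Focal Loss. The defaults α = 0.25 and γ = 2 follow the original paper; this standalone formulation is illustrative, not the YOLOv8 implementation:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for a single prediction.
    p: predicted probability of the positive class; y: label (0 or 1)."""
    p_t = p if y == 1 else 1 - p          # probability assigned to the true class
    a_t = alpha if y == 1 else 1 - alpha  # class-balance weight
    # (1 - p_t)^gamma shrinks the loss of well-classified (high p_t) samples.
    return -a_t * (1 - p_t) ** gamma * math.log(max(p_t, 1e-12))
```

With γ = 0 the modulating factor disappears and the expression reduces to α-weighted cross-entropy; larger γ pushes training to focus harder on misclassified samples.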
2.3.4. Rice Pest and Disease Detection Model Based on Improved YOLOv8s
Although the sample size was increased through local data collection, public datasets, and data augmentation, the limited number of original images in the rice dataset may still constrain the model's generalization ability. Therefore, the backbone, neck, and loss function of YOLOv8s were each improved in this study; the structure of the improved model is shown in Figure 6. Introducing the EMA mechanism after the feature-fusion module SPPF (Spatial Pyramid Pooling-Fast) strengthens the expression of important features, improving stability, reducing instability during training, alleviating overfitting, and enhancing robustness. A P2 detection head was then added as a new output layer; it improves the capture of small-target features and details with only a modest increase in parameters, thereby raising the accuracy of detecting small-target pests and diseases. In addition, Focal loss better fits the target distribution of the dataset, improving detection of difficult samples.
2.3.5. Rice Disease Detection Model Based on Ensemble Learning
In response to the limits of transfer learning and model improvement in enhancing generalization and prediction stability, this study applies ensemble learning to further optimize the overall performance of rice disease detection by integrating multiple object detectors in the post-processing stage. The integration process is shown in Figure 7.
As shown in Figure 7, the proposed method combines multiple trained object detection models by aggregating the large number of candidate bounding boxes they generate during detection, and then applies a suitable post-processing algorithm to select the most appropriate boxes. Applications of ensemble learning in agriculture are still relatively limited. In this study, following the Bagging idea, the best-performing models among the previously constructed pest and disease detectors predict in parallel; each produces prediction boxes with corresponding categories and confidence scores. All prediction boxes are then aggregated, and model fusion is achieved through four commonly used post-processing techniques: Non-Maximum Suppression (NMS) [49], Soft Non-Maximum Suppression (SNMS) [50], Non-Maximum Weighting (NMW) [51], and Weighted Boxes Fusion (WBF) [39]. These methods let different models complement one another, improving overall performance. Notably, NMS is one of the most common post-processing techniques in object detection: it keeps the prediction box with the highest confidence and suppresses boxes with lower confidence that overlap it heavily, selecting the best box from a set of overlapping ones. Soft-NMS is an improved version that, instead of discarding overlapping boxes outright, linearly reduces their confidence scores, retaining useful information. NMW is similar to NMS but assigns a weight to each candidate box, aiming to improve detection under heavy overlap. WBF combines candidate boxes from different models and assigns weights, balancing their predictions and enhancing detection ability.
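The key difference between suppression (NMS) and fusion (WBF) can be sketched with toy implementations. This is a simplified, single-pass version: the reference WBF of Solovyev et al. additionally handles per-model weights and rescales fused scores by the number of contributing models:

```python
def iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2).
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thr=0.6):
    # Keep the highest-score box, drop boxes overlapping it above thr; repeat.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thr]
    return keep

def wbf(boxes, scores, thr=0.6):
    # Greedily cluster overlapping boxes, then fuse each cluster into one box
    # whose coordinates are the confidence-weighted average (nothing is discarded).
    clusters = []
    for i in sorted(range(len(boxes)), key=lambda k: -scores[k]):
        for c in clusters:
            if iou(boxes[c[0]], boxes[i]) >= thr:
                c.append(i)
                break
        else:
            clusters.append([i])
    fused = []
    for c in clusters:
        w = sum(scores[i] for i in c)
        coords = tuple(sum(scores[i] * boxes[i][k] for i in c) / w for k in range(4))
        fused.append((coords, w / len(c)))  # fused box with averaged score
    return fused
```

Where NMS discards all but one of a set of overlapping predictions, WBF blends their coordinates, which is why fusing boxes from several detectors can correct the localization errors of any single one.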
2.4. Experimental Setup
The hardware configuration of this experiment includes an Intel(R) Core(TM) i9-13900K CPU, 32 GB of memory, and an NVIDIA RTX 2080 Ti 11 GB GPU. The software runs on the Windows 10 operating system. All programs are executed under Python 3.11 and the deep-learning framework PyTorch 2.0.1, with the NVIDIA CUDA 11.8 parallel computing driver used to accelerate training. The models are trained with the YOLOv8 and YOLOv9 algorithms defined in Ultralytics version 8.2.36. The parameters for each model are set as follows: 300 epochs, a batch size of 16, an initial learning rate of 0.1, a final learning-rate scaling factor of 0.001, and the AdamW optimizer with a weight decay of 5 × 10^-4; the thresholds for all post-processing methods are set to 0.6; data pre-processing includes pixel-value normalization (/255) and mosaic augmentation (enabled during training), and a cosine annealing strategy is adopted for learning-rate adjustment.
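Under these settings, the learning-rate schedule can be sketched as follows, assuming the Ultralytics convention in which lr0 is the initial rate and lrf the final-rate fraction (the trainer internals may differ in warmup handling):

```python
import math

def cosine_lr(epoch, total_epochs=300, lr0=0.1, lrf=0.001):
    """Cosine-annealed learning rate: decays smoothly from lr0 at epoch 0
    to lr0 * lrf at the final epoch."""
    return lr0 * (lrf + (1 - lrf) * 0.5 * (1 + math.cos(math.pi * epoch / total_epochs)))
```

With the values above, the rate starts at 0.1 and ends at 0.1 × 0.001 = 1e-4, falling fastest in the middle of training.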
2.5. Evaluation Metrics
The YOLO series includes versions such as n(t), s, m, l, and x. Among them, the n(t) and s versions offer excellent real-time performance, maintaining high detection accuracy while ensuring real-time responsiveness, so this study focuses on them when evaluating model accuracy and efficiency on the disease detection task. Correspondingly, EfficientDet-D0, the variant with the fewest parameters, is selected for EfficientDet. Since the Ultralytics library provides only two RT-DETR sizes (L and X), RT-DETR-L is used in the experiments. Three indicators, Precision, Recall, and mAP_0.5, are selected to evaluate model performance.
- (1) Precision
Precision characterizes the proportion of samples that are actually positive among those predicted to be positive, as given in Equation (2):

Precision = TP / (TP + FP)    (2)

where TP (True Positives) denotes instances the model correctly classifies as positive, and FP (False Positives) denotes negative samples the model erroneously predicts as positive.
- (2) Recall
The recall rate characterizes the proportion of samples correctly classified as positive among all samples that are actually positive, as given in Equation (3):

Recall = TP / (TP + FN)    (3)

where FN (False Negatives) denotes positive samples the model erroneously classifies as negative, and TN (True Negatives) denotes negative samples the model predicts correctly.
- (3) Mean Average Precision (mAP)
The mean average precision (mAP) is derived from the average precision (AP); the calculation formulas for AP and mAP are given in Equations (4) and (5) [16]. Average precision describes the prediction accuracy for each class, computed as the area enclosed by that class's precision-recall curve and the coordinate axes; mean average precision is the average of the APs over all classes [16]. Generally, a higher mAP indicates better model performance.

AP = Σ_j (R_j − R_(j−1)) · P_inter(j)    (4)

mAP = (1/N) Σ_(i=1)^N AP_i    (5)

where R_j denotes the recall at the j-th recall threshold point, P_inter(j) is the interpolated precision at the j-th recall threshold, N is the total number of categories, and AP_i is the Average Precision (AP) value of the i-th category.
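Equations (4) and (5) can be sketched directly in code. This is an illustrative all-point interpolation; real evaluation tools additionally handle confidence ranking and IoU matching of detections to ground truth:

```python
def average_precision(recalls, precisions):
    """AP as the area under the interpolated precision-recall curve.
    recalls must be sorted ascending; precision is interpolated to be
    monotonically non-increasing (P_inter), matching Eq. (4)."""
    # Interpolation: P_inter(j) = max precision at any recall >= R_j.
    p_inter = list(precisions)
    for j in range(len(p_inter) - 2, -1, -1):
        p_inter[j] = max(p_inter[j], p_inter[j + 1])
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, p_inter):
        ap += (r - prev_r) * p  # (R_j - R_(j-1)) * P_inter(j)
        prev_r = r
    return ap

def mean_average_precision(ap_per_class):
    # mAP: arithmetic mean of per-class AP values (Eq. (5)).
    return sum(ap_per_class) / len(ap_per_class)
```

For a detector with precision 1.0 up to recall 0.5 and 0.5 thereafter, AP evaluates to 0.5 × 1.0 + 0.5 × 0.5 = 0.75.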
The Intersection over Union (IoU) is a widely used evaluation metric in object detection. It measures the degree of overlap between the predicted bounding boxes and the actual bounding boxes; its value ranges from 0 to 1, with higher values indicating greater overlap and thus more accurate detection. IoU is calculated by dividing the overlapping area of the predicted and actual bounding boxes by the total area of their union, and the result is compared with a predetermined threshold to decide whether to retain the predicted box. The calculation formula is as follows:

IoU = area(B_gt ∩ B_pred) / area(B_gt ∪ B_pred)

where B_gt denotes the true bounding box and B_pred the predicted bounding box; B_gt ∩ B_pred denotes their intersection area and B_gt ∪ B_pred their union area.
3. Results and Discussion
3.1. Analysis of Experimental Results of Different Object Detection Models
To screen object detection models suitable for the improvement and integration experiments, this study selects five models with different architectures, namely Faster-RCNN, EfficientDet, YOLOv8, YOLOv9, and RT-DETR, and compares three accuracy indicators (mAP_0.5, Precision, and Recall) and two efficiency indicators (Params (M) and GFLOPs) on the self-built rice pest and disease dataset. All models are tested under the same parameter configuration, and the results are shown in Table 2. Faster-RCNN and EfficientDet-D0 are trained through the MMDetection framework, while YOLO and RT-DETR-L are trained via Ultralytics.
As shown in Table 2, models with different architectures differ significantly in accuracy and efficiency. RT-DETR achieves the highest detection accuracy, with mAP_0.5 of 0.869 and Recall of 0.848, absolute improvements of 0.9% and 4.3% over YOLOv9s. This may be because the Transformer architecture captures global features more strongly, performing especially well on small-scale rice diseases. However, its Precision of 0.854 is slightly lower than that of YOLOv9s, and its 32.00 M parameters and 103.5 GFLOPs are 4.5 and 3.9 times those of YOLOv9s; this high computational cost makes it ill-suited to batch detection. The traditional two-stage model Faster-RCNN performs worst, with an mAP_0.5 of only 0.791; moreover, its two-stage candidate-box generation is poorly compatible with the post-processing integration methods used in this study, such as WBF and NMS, making integration relatively difficult. Although the real-time-oriented EfficientDet has the lowest GFLOPs at 3.61, its mAP_0.5 is only 0.804, indicating that its lightweight structure struggles to capture the subtle features of rice diseases and lacks detection stability.
In contrast, the YOLO series strikes a better balance between accuracy and efficiency. YOLOv9s reaches an mAP_0.5 of 0.86 and a Precision of 0.858, absolute improvements of 1.6% and 3.7% over YOLOv8s, while the Recall values of the two models are close. Meanwhile, its 7.17 M parameters and 26.7 GFLOPs represent decreases of 35.6% and 6.0% relative to YOLOv8s, indicating better performance after architecture optimization. Even the lightweight YOLOv8n and YOLOv9t, whose mAP_0.5 values of 0.818 and 0.813 are lower than those of YOLOv8s and YOLOv9s, have significantly fewer parameters and GFLOPs, and their overall performance remains superior to Faster-RCNN and EfficientDet, further validating the adaptability of the YOLO series.
In summary, the YOLO series offers advantages in accuracy, efficiency, and ease of integration. Moreover, because it is developed on the Ultralytics library, later versions such as YOLOv10 and YOLOv11 can be adopted directly without reconstructing the data pre-processing and integration interfaces, guaranteeing the scalability of this study. Therefore, the YOLO series is selected for the subsequent improvement and integration experiments to develop a high-precision disease detection framework for specialized agricultural scenarios.
3.2. Performance Analysis of YOLO Pre-Trained Model Based on PlantDoc Dataset
3.2.1. PlantDoc Pre-Trained Model
The default pre-trained models of the YOLO series are built on the COCO dataset, whose general targets, such as vehicles and pedestrians, differ markedly from the characteristics of rice diseases, easily leading to insufficient feature adaptation in agricultural scenarios. To enhance the model's ability to learn plant disease characteristics, this study pre-trains on the PlantDoc plant pest and disease dataset, laying a foundation for subsequent transfer learning. The training results are shown in Table 3.
As can be seen from Table 3, YOLOv9s achieved the best performance on the PlantDoc dataset, with mAP_0.5, Precision, and Recall of 0.667, 0.625, and 0.619, respectively, possibly because it improves on YOLOv8's feature extraction and fusion mechanisms and thus mines disease details more effectively. Notably, the mAP_0.5 and Precision of YOLOv8n are higher than those of YOLOv8s, but its Recall is markedly lower. The streamlined architecture of YOLOv8n may reduce the redundant features introduced by more complex networks, lowering the probability of misjudging non-disease areas and yielding higher precision; however, its limited capacity for extracting shallow features makes it hard to cover the scattered small disease areas in an image, causing missed detections of real disease targets and ultimately reducing Recall. In contrast, YOLOv9t reduced missed detections through structural improvements, but its lightweight structure limits disease-feature extraction, leading to lower precision.
Table 4 presents the detection results of the top five categories for YOLOv9s on the PlantDoc dataset. The mAP_0.5 values for Apple leaf and Apple rust reach 0.902 and 0.912, respectively, indicating that the model has strong feature-learning ability for single-category targets. These results validate the effectiveness of domain-related pre-training, suggesting that close alignment between source-domain crop diseases and target-domain rice diseases is an effective strategy for improving model performance [56,57]. By introducing plant disease features from the PlantDoc dataset, the feature gap between the general targets of the COCO dataset and rice diseases is avoided, laying a foundation for subsequent transfer learning. However, the current pre-training covers only a variety of crop diseases; in the future, pre-training samples of different rice disease types can be added on the basis of this study’s dataset to further enhance feature adaptation.
3.2.2. Validation of the Effectiveness of Transfer Learning
To verify the impact of the pre-trained weights from the PlantDoc dataset (transfer learning) on model accuracy, this study conducted a comparative transfer learning experiment on the rice disease dataset. The experimental results are shown in Table 5.
According to the results in Table 5, all indicators improved after the models were pre-trained on PlantDoc, which verifies the effectiveness of transfer learning. Without the PlantDoc pre-trained weights, YOLOv9s performed best, with mAP_0.5, Precision, and Recall of 0.86, 0.858, and 0.805, respectively, absolute improvements of 1.6%, 3.7%, and 0.3% over YOLOv8s. After applying the PlantDoc pre-trained weights, the mAP_0.5, Precision, and Recall of every model increased; for YOLOv9s, the absolute improvements over its original counterpart were 2.3%, 1.1%, and 4.0%, respectively. YOLOv8s benefited even more from transfer learning, with absolute improvements of 3.2%, 4.7%, and 4.1% in the three indicators, indicating its strong optimization potential under transfer learning.
3.3. Improved YOLOv8s for Rice Disease Detection
Based on the transfer learning results, this study selects YOLOv8s-transfer as the base model for improvement. The improved model, Improved YOLOv8s-transfer, is constructed by introducing the small-target detection head P2, the EMA attention mechanism, and the Focal Loss function. To verify the effectiveness of each component, this study conducts an ablation experiment using the control-variable method, gradually adding components and evaluating the resulting performance changes. The results of the ablation experiment are shown in Table 6.
The ablation results in Table 6 clearly demonstrate the contribution of each improvement. After adding the P2 detection head to the baseline model, the mAP_0.5 increased from 0.876 to 0.888, an absolute improvement of 1.2%; however, the GFLOPs increased from 28.4 to 37.0, an approximately 30.3% increase in computational overhead. This indicates that the P2 small-object detection head is crucial for capturing small disease spots such as Entyloma and Brown Spot, compensating for small-object feature loss by making better use of shallow features. After adding the EMA module, the mAP_0.5 further increased to 0.892, an absolute improvement of 0.4%, while both the parameter count and GFLOPs decreased slightly, showing that the efficient parallel sub-network design of EMA strengthens the focus on key disease features while also improving computational efficiency. After introducing Focal Loss, the mAP_0.5 reached 0.899, a further absolute improvement of 0.7%, indicating that by down-weighting easily classified samples and focusing on hard-to-distinguish samples such as Blast and Brown Spot, it effectively reduces false detections of similar diseases. The final Improved YOLOv8s-transfer model achieves an mAP_0.5 of 0.899, an absolute improvement of 2.3% over the baseline, verifying the rationality of the improvement strategy and showing that the proposed model can effectively alleviate feature loss in small-object detection tasks.
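As an illustration, the down-weighting behaviour of Focal Loss can be sketched in a minimal binary form (a simplified sketch of the standard formulation, not the exact multi-class implementation used in the training code):

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary Focal Loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probability of the positive class; y: label in {0, 1}.
    gamma > 0 shrinks the loss of well-classified samples (p_t near 1),
    shifting the training focus to hard, easily confused samples.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p=0.95) contributes far less loss than a hard one (p=0.3),
# which is how the loss keeps Blast/Brown Spot-like hard samples in focus.
easy = focal_loss(0.95, 1)
hard = focal_loss(0.30, 1)
print(easy, hard)
```

Here the `(1 - p_t)**gamma` modulating factor is what suppresses the contribution of easily classified samples; with `gamma = 0` the expression reduces to ordinary weighted cross-entropy.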
To further analyze the optimization effect, Table 7 compares the per-category performance of the improved model and the baseline. The Improved YOLOv8s-transfer shows the largest gains on small-target diseases: the mAP_0.5 of Entyloma is absolutely increased by 6.7% over YOLOv8s-transfer, and that of Brown Spot by 3.3%, directly verifying the synergistic effect of the P2 detection head and the EMA mechanism. In terms of recall, the improved model achieves absolute increases of 0.9%, 1.2%, 1.3%, 8.0%, and 0.7% for Bacterial blight, Brown spot, Tungro, Entyloma, and Rice planthopper, respectively, indicating that it effectively mitigates missed detections. However, for Blast, the mAP_0.5 decreases from 0.852 to 0.837, an absolute reduction of 1.5%. This may be because, although the improved model is optimized for small targets and hard samples, the hard-sample focusing of Focal Loss assigns insufficient learning weight to the easily classified Blast samples, reducing prediction accuracy for this category.
Furthermore, as Figure 8 shows, in the dense small-target scenario of Entyloma, the detection results of YOLOv8s-transfer and YOLOv9s-transfer are similar, with YOLOv9s-transfer missing the most detections when the lesions are very small. In contrast, Improved YOLOv8s-transfer can identify smaller disease spots and detect more small-target diseases, further confirming the improvement strategy’s enhanced ability to capture subtle features.
Although the performance of the improved model has been significantly enhanced, certain limitations remain. Its ability to distinguish visually similar diseases such as Blast and Brown Spot in complex backgrounds is still insufficient, and tiny targets are still missed in extreme backlight environments, so the model cannot yet fully adapt to complex field conditions. This trade-off reflects the challenge of fine-grained classification in visual tasks: like humans, the model may struggle to distinguish visually similar categories [58,59]. In the future, fine-grained feature modules capturing cues such as lesion texture and color depth can be introduced to further reduce false detections of similar diseases.
3.4. Rice Disease Detection Based on Ensemble Learning
To further enhance detection robustness, this study selected the three best-performing models, Improved YOLOv8s-transfer, YOLOv9s-transfer, and YOLOv8s-transfer (original), to construct an ensemble framework, and compared the fusion effects of four post-processing techniques: NMS, SNMS, NMW, and WBF. The experiment recorded the outputs in a JSON format consistent with the COCO dataset. Since the inference time of the ensemble model was long, for efficient evaluation we randomly selected 600 images from the validation set, 100 for each of the six diseases and pests, to construct a representative test subset. The Pycocotools library was used to calculate the AP_0.5 and AR_0.5:0.95 metrics, with an IoU threshold of 0.60 and all post-processing weights set to 1. The experimental results are shown in Table 8.
As shown in Table 8, among the four ensemble strategies, Ensemble-WBF performs best, with an AP_0.5 of 0.922 and an AR_0.5:0.95 of 0.648, significantly outperforming the single models; compared with Improved YOLOv8s-transfer, these represent absolute increases of 2.2% and 3.2%, respectively. WBF generates new bounding boxes by taking a confidence-weighted average of highly overlapping predicted boxes, rather than simply suppressing redundant boxes as NMS does, which allows it to fully integrate the consensus of different models and improve localization accuracy. In contrast, Ensemble-SNMS performs worse than the single model, possibly because its excessive suppression of bounding boxes discards useful information.
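The weighted averaging at the core of WBF can be illustrated with a simplified single-class sketch (a toy version assuming equal model weights and `[x1, y1, x2, y2]` boxes; the experiments used an existing WBF implementation, not this code):

```python
def iou(a, b):
    """Intersection-over-Union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def wbf(boxes, scores, iou_thr=0.6):
    """Fuse overlapping boxes by confidence-weighted coordinate averaging.

    Unlike NMS, which keeps one box and discards the rest, every
    overlapping prediction contributes to the fused coordinates;
    the fused confidence here is the mean score of the cluster.
    """
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    clusters = []  # each cluster is a list of box indices
    for i in order:
        for c in clusters:
            if iou(boxes[c[0]], boxes[i]) >= iou_thr:
                c.append(i)
                break
        else:
            clusters.append([i])
    fused = []
    for c in clusters:
        w = sum(scores[i] for i in c)
        coords = [sum(scores[i] * boxes[i][k] for i in c) / w for k in range(4)]
        fused.append((coords, w / len(c)))
    return fused

# Two models predict nearly the same lesion; WBF blends them into one box.
out = wbf([[10, 10, 50, 50], [12, 12, 52, 52]], [0.9, 0.8])
print(out)
```

In this toy example the two overlapping predictions collapse into a single fused box whose coordinates lean toward the higher-confidence model, which is the behaviour that improves localization over plain suppression.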
However, the performance gain of the ensemble model comes at a significant computational cost. The inference times of Ensemble-WBF and Improved YOLOv8s-transfer are 86.021 ms/image and 30.028 ms/image, respectively, roughly 2.9 times the single-model latency, which makes it difficult to meet low-latency scenarios such as real-time monitoring by drones. Nevertheless, in scenarios that prioritize high precision and high recall, such as rice breeding greenhouses, the proposed ensemble learning strategy still has practical value. For instance, by regularly collecting images with fixed cameras and uploading them to the cloud, early disease warning can be significantly strengthened at an average inference speed of 86 ms/image.
In addition, Table 9 further compares the per-category performance of Ensemble-WBF and Improved YOLOv8s-transfer. Ensemble-WBF shows the most significant improvement on easily confused diseases: the AP and AR of Bacterial blight are absolutely increased by 4.2% and 7.0%, respectively, and those of Blast by 3.8% and 3.1%, indicating that the ensemble strategy effectively combines the strengths of different models and reduces single-model misjudgments. For Brown Spot, both AP and AR are absolutely increased by 3.3%, further improving detection performance. However, for the small-target disease Entyloma, the accuracy gain is small, an absolute increase of 0.2% with the average recall unchanged, indicating that the small-target optimization of Improved YOLOv8s-transfer has approached its limit under the current strategy, while the ensemble mainly enhances the ability to distinguish similar disease categories.
Figure 9 intuitively demonstrates the advantages of Ensemble-WBF in complex and dense scenarios. It not only reduces missed detections but also enhances the recognition stability of small and dense targets, verifying the application potential of ensemble learning in agricultural disease detection.
Although Ensemble-WBF demonstrates excellent performance, the high computational complexity of multi-model fusion restricts its deployment on portable devices [60]. In the future, knowledge distillation [61] or pruning techniques [62] can be employed to transfer the detection capability of the ensemble model to a single lightweight model, reducing deployment difficulty while maintaining high accuracy.
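For illustration, the soft-target loss at the heart of knowledge distillation can be sketched as follows (hypothetical logits and Hinton-style temperature scaling; this is a generic sketch, not part of this study’s experiments):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    exps = [math.exp(z / temperature) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence between softened teacher and student class outputs.

    Minimizing this pushes a lightweight student model toward the class
    probabilities of a stronger teacher (e.g. an ensemble such as
    Ensemble-WBF), transferring its behaviour into a single model.
    """
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return sum(ti * math.log(ti / si) for ti, si in zip(t, s))

# A student whose logits match the teacher incurs zero loss; a mismatched
# student incurs a positive loss that training would then reduce.
teacher = [2.0, 0.5, -1.0]
print(distillation_loss(teacher, [2.0, 0.5, -1.0]))  # 0.0
print(distillation_loss(teacher, [0.0, 2.0, 0.0]))   # positive
```

In practice this soft-target term is combined with the ordinary hard-label detection loss, but the sketch captures the mechanism by which ensemble behaviour can be compressed into one deployable model.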
3.5. Limitations and Future Prospects
This study has made progress in constructing and optimizing a rice disease detection model for complex environments, but some limitations remain. First, the model’s ability to distinguish visually similar diseases needs improvement; for example, under complex backgrounds there are still recurring false detections between Blast and Brown spot, indicating deficiencies in capturing fine-grained features such as lesion texture and color depth. Second, deployment of the ensemble model is challenging: fusing multiple models increases computational overhead, making deployment on resource-constrained portable devices difficult. Although deployment can be achieved through cloud computing with network-based image capture, this approach is hard to adapt to portable detection devices. In addition, limited by the rice growth cycle, the off-greenhouse cultivation in this study took place in an artificially controlled environment, which differs from real field scenarios with natural weeds and complex lighting. Follow-up work will supplement the data with the next season’s rice cultivation to improve the model’s robustness in field scenarios such as weed backgrounds and backlighting.
Future research will focus on three aspects. First, expand the multi-source dataset by acquiring public rice datasets from different regions to cover disease phenotypes under varied climate and soil conditions, thereby improving the model’s generalization ability. Second, introduce disease-specific feature extraction modules to enhance the recognition of similar diseases. Finally, explore lightweight ensemble strategies: through techniques such as knowledge distillation and model pruning, transfer the detection ability of the Ensemble-WBF model to a single lightweight model, lowering the threshold for deployment. This will promote the large-scale application of high-precision rice disease detection on field portable devices (such as handheld detectors and small drones [63]) and provide efficient, implementable technical support for the early prevention and control of diseases in modern agriculture.
4. Conclusions
To address the missed detection of minor diseases and the misjudgment of similar diseases in rice disease detection under complex field and greenhouse backgrounds, this study collected images of five common rice diseases and one common pest and proposed a phased model optimization method based on transfer learning, model improvement, and ensemble learning. The results show that, to overcome the limitations of a single model, improving the model on the basis of transfer learning, by adding the small-target detection head P2, the EMA mechanism, and the Focal Loss function, is an effective way to reduce missed detections of small-target rice diseases and improve prediction accuracy. On this basis, the ensemble learning method can further address the misjudgment of similar diseases and achieve more accurate disease detection.
The findings of this study contribute to accurate disease detection in core scenarios such as breeding experimental fields in the Yunnan Plateau region and mountainous terraced fields. By constructing an ensemble detection framework, the work holds significant practical implications for enhancing rice breeding efficiency, ensuring food security in the plateau region, and promoting the application of smart agriculture in complex scenarios. Future research can combine methods such as knowledge distillation and pruning to build lightweight single models, further balancing the accuracy and efficiency of the ensemble learning framework.