1. Introduction
Rice is one of the world’s most important food crops and plays a crucial role in feeding the world’s population [
1]. Yield estimation, a critical aspect of rice production, primarily depends on the number of rice panicles per unit area [
2]. Rapid and accurate determination of the number of rice panicles per unit area is important for yield estimation as well as crop breeding and plant phenotyping [
3]. Traditionally, the number of rice panicles has been counted manually, which is time-consuming, labor-intensive and inefficient [
4]. In order to improve the efficiency of rice panicle detection and counting, there is an urgent need for a fast and accurate real-time detection algorithm suitable for portable computing devices in field conditions.
In recent years, with the advent of artificial intelligence and advances in computer hardware, machine learning and deep learning [
5] techniques have been introduced into agriculture. Traditional machine learning excels in handling classification and prediction tasks involving structured data. Algorithms such as support vector machines, random forests, and continuous wavelet transforms have been widely utilized for disease identification [
6,
7], classification [
8], yield prediction [
9,
10], and growth stage monitoring [
11]. However, the above machine learning methods face limitations when applied to complex image-based tasks. For instance, in rice panicle detection [
12], these methods rely on manually designed and selected features, leading to insufficient accuracy in recognizing individual rice panicles in complex field conditions and limited generalization ability.
By contrast, deep learning automatically learns features of a target from massive datasets. In recent years, deep learning techniques have been increasingly applied in agriculture, particularly in object detection, including crop classification [
13], pest detection [
14], fruit identification and counting [
15], plant identification [
16], weed identification [
17], and animal behavior recognition [
18]. To meet diverse application requirements, deep learning-based object detection models are categorized into one-stage and two-stage models, accommodating the real-time and accuracy needs of various detection tasks. Two-stage algorithms are represented by models such as Region-based Convolutional Neural Networks (R-CNNs) [
19], Faster R-CNN [
20], etc., in which candidate regions are generated in the first stage and targets are classified in the second stage. Zhang et al. [
21] proposed a Faster R-CNN-based method to detect rice panicles in indoor environments. Jiang et al. [
22] proposed an improved Faster R-CNN model that enhances the detection of occluded and small-sized rice panicles. These studies have demonstrated the performance of two-stage models for rice panicle detection on their respective datasets. However, the detection speed of two-stage models is generally much lower than that of one-stage methods, and these models tend to be large in size with high computational costs. Consequently, they are not well-suited for real-time applications in the field. One-stage models predict both the target's bounding box and category in a single pass, striking a better balance among detection speed, accuracy, and model size. Among the one-stage object detection models, You Only Look Once (YOLO) [
23] and the Single Shot MultiBox Detector (SSD) [
24] stand out as prominent representatives. The YOLO series of algorithms have become widely popular and are extensively used, owing to their high speed and accuracy. Sun et al. [
25] and Wang et al. [
26] both used a YOLO-based approach to detect rice panicles from images captured by stationary ground-based cameras. However, ground-based cameras have limited coverage for capturing image data and generally operate with lower efficiency, making them unsuitable for large-scale field surveys.
The emergence of drones, also known as unmanned aerial vehicles (UAVs), has significantly expanded coverage and greatly enhanced the efficiency of data acquisition. Equipped with sensors such as high-resolution RGB cameras, multispectral cameras, or radar, drones can rapidly collect large volumes of diverse and comprehensive data, providing valuable insights in a short period of time. Therefore, UAVs have been widely used in agriculture [
27]. In the detection of rice panicles, Zhou et al. [
28] used a drone to capture images of rice panicles at a height of 17 m and applied an improved region-based fully convolutional network, achieving a detection accuracy of 86.8%. Chen et al. [
29] proposed an algorithm named Refined Feature Fusion for Panicle Counting, which extracts and fuses optimal features based on the size distribution of the objects and achieved an average counting accuracy of 89.80%. In practical agricultural applications, it is essential to obtain timely information about rice panicles using portable and low-cost equipment. However, most existing algorithms capable of real-time detection and counting of rice panicles require significant computational power, making them unsuitable for deployment on resource-constrained portable or edge devices such as laptops and microcomputers. Thus, it is crucial to develop efficient rice panicle detection models that can operate on portable, low-cost devices, maintaining high accuracy while enabling real-time detection.
To enable real-time detection and counting of rice panicles with diverse morphologies in complex field environments, this study introduces a lightweight detection model, YOLO-Rice, and demonstrates its deployment on portable computing devices. The main contributions of this work are as follows: (1) Dataset Creation: Image data of rice panicles under varying scales, lighting conditions, and morphologies were collected. Preprocessing steps, including image cropping and data augmentation, were performed to build a comprehensive field dataset of rice panicles. (2) Lightweight Backbone Network: The FasterNet [
30] architecture was adopted as the backbone feature extraction network for the YOLO-Rice model. This choice aimed to reduce model parameters and computational complexity, making the model more lightweight. (3) Enhanced Detection Performance: A Normalization-based Attention Module (NAM) was incorporated into the backbone network of the YOLO architecture to enhance the model’s detection performance specifically for rice panicles. (4) Model Optimization: The original three detection heads of the YOLO model were reduced to two, further decreasing the model size while optimizing the neck network structure. (5) Improved Loss Function: The Minimum Point Distance-based IoU (MPDIoU) was employed as a replacement for the CIoU loss function in the YOLOv8n framework to enhance overall network performance. Finally, the optimized lightweight model was deployed on portable computing devices, and its performance was validated through field tests.
2. Materials and Methods
2.1. Image Data Acquisition
The experimental data were collected in the rice fields at the Zhuanghang Comprehensive Experimental Station of Shanghai Academy of Agricultural Sciences, Fengxian District, Shanghai, China (30.891° N, 121.359° E), as shown in
Figure 1. The rice variety was Shenyou 28; it was sown in late May and had an approximate growth cycle of 150–155 days. The rice fields comprised 36 plots with four nitrogen fertilization levels (0, 100, 200, and 300 kg ha⁻¹), leading to varied panicle densities and morphologies.
Rice panicle images were captured using a DJI Mavic 2 Pro drone (SZ DJI Technology Co., Shenzhen, China) equipped with a Hasselblad L1D-20c camera. The camera has a field of view (FOV) of 77° and 20 million effective pixels. The images were stored in JPG format with a resolution of 5472 × 3648 pixels. Both ISO and aperture were set to automatic mode for optimal image capture. The flight height was set to approximately 3 m above the rice canopy. The flight campaign was performed between 9 a.m. and 2 p.m. on 25 September, when the rice was at the tasseling to ripening stage. The weather was partly cloudy with intermittent sunshine, creating varying lighting conditions.
Figure 2 presents a selection of images from the dataset, revealing the diversity within the rice panicle images. These images capture rice panicles in various sizes, shapes, and developmental stages. Moreover, they illustrate challenging environmental factors such as dense clustering, mutual occlusion, varying lighting conditions, and water surface disturbances, all of which significantly complicate the accurate detection of rice panicles.
2.2. Data Processing
To reduce the time required for labeling rice panicle targets and training the model, each original image was first cropped into multiple sub-images of 640 × 640 pixels. These cropped sub-images, saved in JPG format, contained between 10 and 70 rice panicles each. After collecting the rice panicle sub-images, manual labeling of the targets was conducted using LabelImg software (Version 1.8.6). Each rice panicle was annotated by drawing the smallest enclosing rectangle around it, with the labeling information saved in a corresponding TXT file following the YOLO object detection format.
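For reference, each line of a YOLO-format TXT label file gives the class index followed by the normalized center coordinates, width, and height of one bounding box. A minimal hypothetical example for a sub-image containing three panicles (class 0 denotes a rice panicle; the values are illustrative, not taken from the dataset):

0 0.412 0.305 0.082 0.117
0 0.655 0.521 0.074 0.098
0 0.231 0.774 0.091 0.126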
To prevent model overfitting and to improve model robustness and rice panicle recognition, data augmentation methods, including adding blur and noise, cropping, brightness adjustment, contrast adjustment, rotation, and flipping, were randomly applied to the labeled images before building the dataset. These methods simulate different data acquisition conditions, such as different lighting (via brightness and contrast adjustments) and different shooting angles (via flipping and rotation). Ultimately, 5525 rice panicle images were compiled to form the dataset for this experiment. The augmentation effects are shown in
Figure 3.
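As a minimal sketch of such an augmentation pipeline (implemented here with OpenCV and NumPy; the probabilities and parameter ranges are illustrative assumptions rather than the exact settings used in this study):

import cv2
import numpy as np

def augment(image):
    # Randomly apply blur, noise, brightness/contrast changes, flipping, and rotation.
    if np.random.rand() < 0.5:  # Gaussian blur
        image = cv2.GaussianBlur(image, (5, 5), 0)
    if np.random.rand() < 0.5:  # additive Gaussian noise
        noise = np.random.normal(0, 10, image.shape).astype(np.float32)
        image = np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    if np.random.rand() < 0.5:  # brightness/contrast: out = alpha * img + beta
        alpha = np.random.uniform(0.8, 1.2)
        beta = np.random.uniform(-25, 25)
        image = cv2.convertScaleAbs(image, alpha=alpha, beta=beta)
    if np.random.rand() < 0.5:  # horizontal flip
        image = cv2.flip(image, 1)
    if np.random.rand() < 0.5:  # 90-degree rotation
        image = cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)
    return image

Note that geometric transformations such as cropping, flipping, and rotation must also be applied to the corresponding bounding-box labels.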
The augmented images and label files were divided into training, validation, and test sets in an 8:1:1 ratio. The training set, consisting of 4420 images, contained 223,847 rice panicle targets, while the validation set included 552 images with 27,681 rice panicle targets. The test set was made up of 553 images with 28,048 rice panicle targets.
2.3. YOLOv8 Object Detection Model
The YOLO object detection algorithm was introduced in 2015 [
23] and quickly became popular due to its high speed and accuracy. Ultralytics LLC, the team behind YOLOv5, released YOLOv8 in January 2023. As a one-stage object detection algorithm, YOLO provides a better balance between detection accuracy and speed compared to two-stage algorithms, leading to its wide use in agriculture. Similar to YOLOv5, YOLOv8 offers models of different sizes across five scales: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. This study selected YOLOv8n as the foundational model, taking into account factors such as detection speed and training cost.
2.4. YOLO-Rice Rice Panicle Detection Model
The YOLO-Rice lightweight rice panicle detection model proposed in this study made several improvements compared to the original YOLOv8n model. Firstly, the lightweight feature extraction network, FasterNet, was used to replace the original backbone feature extraction network of YOLOv8n. The neck network employed a two-layer detection head and incorporated a NAM module to enhance feature extraction while reducing the model size. Finally, MPDIoU was utilized to improve network performance. The model structure of the improved YOLO-Rice is shown in
Figure 4.
2.4.1. Lightweight Feature Extraction Network
In order to improve the processing speed of the network while ensuring accuracy, the FasterNet network was used to replace the original backbone feature extraction network of YOLOv8n. The FasterNet architecture diminishes the computational and memory demands, allowing algorithms to operate efficiently in environments with limited resources. It enhances computational efficiency and accelerates model inference by employing Partial Convolution (PConv).
In the submodules of the FasterNet network, PConv is employed to reduce redundant computations and memory access. The convolution module processes only a subset of the channel information, while the remaining channels are not involved in the computation. This approach allows for the extraction of spatial features while simultaneously reducing FLOPs, lowering network latency, and enhancing computational speed. The use of PConv is particularly beneficial when dealing with complex backgrounds in rice panicle detection tasks. Given that rice panicles vary in density and are often influenced by background elements such as leaves and water, FasterNet’s efficient convolution process allows the model to focus on the most relevant features in the image, improving detection speed and accuracy.
A schematic diagram of the PConv structure is shown in
Figure 5. Assuming that the input and output feature maps have height $h$, width $w$, and the same number of channels $c$, and letting $k$ represent the kernel size, the theoretical computational cost in terms of FLOPs for a standard $k \times k$ convolution can be calculated using the following formula:

$$\mathrm{FLOPs}_{\mathrm{Conv}} = h \times w \times k^{2} \times c^{2} \quad (1)$$

For the PConv module, let $c_p$ denote the number of channels processed by PConv. The module processes only $c_p$ channels, while the remaining $c - c_p$ channels are not processed. Therefore, the FLOPs for PConv can be expressed as

$$\mathrm{FLOPs}_{\mathrm{PConv}} = h \times w \times k^{2} \times c_p^{2} \quad (2)$$

When $c_p = c/4$, the FLOPs of PConv are $1/16$ of those of the standard convolution. Additionally, to fully utilize the features from all channels, a Pointwise Convolution (PWConv) module is connected after the PConv, which is analogous to a standard $1 \times 1$ convolution. Therefore, the total FLOPs for the two convolution modules can be expressed as

$$\mathrm{FLOPs}_{\mathrm{PConv+PWConv}} = h \times w \times \left(k^{2} \times c_p^{2} + c^{2}\right) \quad (3)$$
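A minimal PyTorch sketch of Partial Convolution as described above, assuming a partial ratio of 1/4 and the split-and-concatenate forward variant (an illustrative reimplementation, not the authors' code):

import torch
import torch.nn as nn

class PConv(nn.Module):
    # Apply a k x k convolution to only c_p = c / n_div channels; pass the rest through.
    def __init__(self, channels, n_div=4, kernel_size=3):
        super().__init__()
        self.c_p = channels // n_div          # channels that are convolved
        self.c_rest = channels - self.c_p     # channels left untouched
        self.conv = nn.Conv2d(self.c_p, self.c_p, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.c_p, self.c_rest], dim=1)
        x1 = self.conv(x1)                    # spatial features from a channel subset
        return torch.cat((x1, x2), dim=1)     # concatenate with the untouched channels

A 1 × 1 PWConv placed after this module then mixes information across all channels, giving the combined cost in Equation (3).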
The images captured by UAVs are characterized by a wide viewing angle, high resolution, and real-time acquisition, which impose stringent demands on the processing speed and accuracy of detection algorithms. Leveraging its efficient architecture, FasterNet is capable of quickly processing large volumes of UAV-captured images for rice panicle detection. By reducing computational complexity, FasterNet enables the model to effectively handle rice panicles in environments with varying morphologies, densities, and occlusions, while maintaining high accuracy in challenging scenarios, such as overlapping leaves and complex backgrounds. The integration of FasterNet into the YOLO-Rice network significantly boosts processing speed while maintaining high accuracy, effectively meeting the real-time requirements for rice panicle detection and counting.
2.4.2. Normalization-Based Attention Module
In this study, we collected images of rice panicles with various shapes, sizes, colors, morphologies, and complex and variable backgrounds under different lighting conditions. These differences place high demands on the model’s ability to extract features. Therefore, in order to better capture the rice panicle features while constructing a lightweight model, the NAM [
31] module was added. NAM adaptively adjusts the network's attention to rice panicle features, thereby improving detection performance.
The attention mechanism includes channel attention, spatial attention, and a hybrid of both. SENet [
32] pioneered channel attention with its core Squeeze and Excitation module, which uses global average pooling and channel weighting to focus on important channel information, albeit limiting the selection of key regions. To address this limitation, the Convolutional Block Attention Module (CBAM) [
33] combines channel and spatial attention, enabling the network to focus on informative channels and their important regions. However, the spatial attention distribution across all output channels in CBAM remains consistent. NAM builds on this by redesigning the channel and spatial attention submodules, as shown in
Figure 6, to enhance network flexibility and performance.
For the Channel Attention Module, NAM uses the weights from Batch Normalization (BN) [34] to identify relatively important channels, as shown in Equation (4):

$$B_{out} = \mathrm{BN}(B_{in}) = \gamma \frac{B_{in} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}} + \beta \quad (4)$$

where $B_{in}$ and $B_{out}$ are the input and output of the batch normalization layer, respectively; $\mu_{\mathcal{B}}$ and $\sigma_{\mathcal{B}}^{2}$ are the mean and variance of the input batch of data, respectively; $\gamma$ and $\beta$ are the scaling and shifting parameters learned through training; and $\epsilon$ is a small constant used for numerical stabilization. The Channel Attention Module is shown in Figure 6 and formulated in Equation (5):

$$M_{c} = \mathrm{sigmoid}\left(W_{\gamma}\left(\mathrm{BN}(F_{1})\right)\right) \quad (5)$$

where $M_{c}$ represents the output of channel attention, $F_{1}$ is the input feature map, $\gamma$ represents the scaling factor for each channel, and the weight is $W_{\gamma} = \gamma_{i} / \sum_{j} \gamma_{j}$. In addition, the Spatial Attention Module is shown in Figure 6 and formulated in Equation (6):

$$M_{s} = \mathrm{sigmoid}\left(W_{\lambda}\left(\mathrm{BN}_{s}(F_{2})\right)\right) \quad (6)$$

where $M_{s}$ denotes the output of spatial attention, $F_{2}$ is the input feature map, $\lambda$ is the scaling factor in the spatial dimension, and the weight is $W_{\lambda} = \lambda_{i} / \sum_{j} \lambda_{j}$.
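A simplified PyTorch sketch of the channel attention branch in Equation (5), based on the published NAM formulation (illustrative only; the spatial branch in Equation (6) is analogous, with the normalization applied over the spatial dimension):

import torch
import torch.nn as nn

class NAMChannelAttention(nn.Module):
    # Weight each channel by its normalized BatchNorm scaling factor (Equation (5)).
    def __init__(self, channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = x
        x = self.bn(x)
        # W_gamma: normalize the learned BN scaling factors into per-channel weights.
        weight = self.bn.weight.abs() / torch.sum(self.bn.weight.abs())
        x = x * weight.view(1, -1, 1, 1)
        return torch.sigmoid(x) * residual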
By combining the two attention mechanisms, NAM enables the network to selectively focus on informative channels and their corresponding spatial regions. This enhances the model’s ability to identify key features in the image, which is crucial for accurately detecting rice panicles, even in complex background conditions. This targeted emphasis on relevant features plays a crucial role in improving the model’s robustness and its ability to effectively distinguish the target from distracting background elements.
2.4.3. Loss Function Improvement
In the YOLO algorithm, Intersection over Union (IoU) is a measure of how accurately the corresponding object is detected in a given dataset. IoU calculates the overlap between the predicted bounding box and the ground-truth bounding box as the ratio of their intersection to their union, as shown in Equation (7). Ideally, the two boxes overlap completely and the ratio equals 1.

$$\mathrm{IoU} = \frac{\left|A \cap B\right|}{\left|A \cup B\right|} \quad (7)$$

where $A$ and $B$ denote the predicted and ground-truth bounding boxes, respectively.
Complete Intersection over Union Loss (CIoU) [
35] is used in the YOLOv8n algorithm. However, the CIoU formula mainly reflects the difference in aspect ratio rather than the true differences between width and height and their respective confidences, which sometimes hinders model optimization. Moreover, the rice panicle targets in this study had varied morphologies and distribution densities, and the edges of the panicles were irregular, with overlapping in some areas. CIoU therefore has limitations in dealing with these types of rice panicle targets with complex shapes and irregular locations.
To enhance the detection of rice panicles in complex backgrounds, this study optimizes the model by integrating the MPDIoU loss function. MPDIoU extends the traditional IoU method by incorporating the minimum point distance, while also considering key factors such as overlapping areas, centroid distance, and deviations in width and height, thereby improving the precision of bounding box regression. In complex backgrounds, challenges such as background noise, occlusion, and cluttered elements often hinder accurate detection, particularly when the background closely resembles the target object. Traditional IoU methods may struggle to effectively differentiate the target from the background under such conditions. MPDIoU mitigates this issue by calculating the minimum distance between the upper-left and lower-right corners of the predicted and ground truth bounding boxes, ensuring better alignment and more accurate localization. The formulas of the MPDIoU calculation process are as follows:

$$d_{1}^{2} = \left(x_{1}^{B} - x_{1}^{A}\right)^{2} + \left(y_{1}^{B} - y_{1}^{A}\right)^{2} \quad (8)$$

$$d_{2}^{2} = \left(x_{2}^{B} - x_{2}^{A}\right)^{2} + \left(y_{2}^{B} - y_{2}^{A}\right)^{2} \quad (9)$$

$$\mathrm{MPDIoU} = \mathrm{IoU} - \frac{d_{1}^{2}}{w^{2} + h^{2}} - \frac{d_{2}^{2}}{w^{2} + h^{2}} \quad (10)$$

$$L_{\mathrm{MPDIoU}} = 1 - \mathrm{MPDIoU} \quad (11)$$

where $(x_{1}^{A}, y_{1}^{A})$ and $(x_{2}^{A}, y_{2}^{A})$ are the coordinates of the upper-left and lower-right corners of the prediction frame $A$; $(x_{1}^{B}, y_{1}^{B})$ and $(x_{2}^{B}, y_{2}^{B})$ are the coordinates of the upper-left and lower-right corners of the target frame $B$; and $w$ and $h$ are the width and height of the input image.
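A minimal Python sketch of Equations (8)–(11) for axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates (an illustrative implementation of the published formulation, not the exact training code):

def mpdiou_loss(box_a, box_b, img_w, img_h):
    # IoU of the predicted box (box_a) and the ground-truth box (box_b), Equation (7).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter + 1e-9)
    # Squared distances between corresponding corners, Equations (8) and (9).
    d1 = (box_a[0] - box_b[0]) ** 2 + (box_a[1] - box_b[1]) ** 2
    d2 = (box_a[2] - box_b[2]) ** 2 + (box_a[3] - box_b[3]) ** 2
    # MPDIoU and its loss, Equations (10) and (11).
    diag = img_w ** 2 + img_h ** 2
    mpd_iou = iou - d1 / diag - d2 / diag
    return 1.0 - mpd_iou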
2.4.4. Network Neck Improvement
During the UAV flight, rice panicles appear smaller compared to images captured by a close-range handheld device, primarily due to the flight altitude. Additionally, the overhead angle limits the capture to only the upper portions of the rice panicles, leading to reduced pixel coverage for certain targets. To accurately detect these smaller and incomplete targets, the network must focus on more detailed information. The original neck network of the YOLOv8n has three scaled feature layers: P3 (80 × 80), P4 (40 × 40), and P5 (20 × 20). The P5 detection head in the network is often used to process lower resolution feature maps and is primarily used for the detection of medium to large targets. Given the dataset’s limited number of larger rice panicle targets, we opted to remove the P5 detection head and retain only the P3 and P4 layers. This modification improves detection accuracy for small- and medium-sized targets while also reducing model complexity.
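For a 640 × 640 input, the three detection scales correspond to strides of 8, 16, and 32, so removing the P5 head discards only a small fraction of the prediction cells while eliminating an entire detection branch (a simple back-of-the-envelope illustration, not implementation code):

# Grid sizes for a 640 x 640 input at strides 8, 16, and 32.
grids = [640 // s for s in (8, 16, 32)]   # [80, 40, 20]
print(sum(g * g for g in grids))          # 8400 prediction cells with P3 + P4 + P5
print(sum(g * g for g in grids[:2]))      # 8000 prediction cells with only P3 + P4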
2.5. Experiment Platform
The training and testing of models were conducted on a high-performance computer (HPC) running the Windows 11 operating system. The software environment for model training included Python 3.11, CUDA 12.3, and cuDNN 8.9.6.50, with the deep learning framework being PyTorch 2.2.1.
Table 1 presents some of the training parameter configurations used in this study.
In this study, the original training platform, a high-performance computing (HPC) system, offers substantial computational power and processing speed. However, it does not align with the practical application requirements of this research. To enable real-time processing and analysis of field data, the proposed lightweight model must be deployable on resource-constrained portable devices or embedded systems, which are typically limited in computational power, memory, and storage capacity.
To comprehensively evaluate the model’s performance in real-world scenarios, three representative test platforms were selected: the ThinkPad X13 Gen 2, the Apple Mac mini M2, and the Raspberry Pi 5. The ThinkPad X13 Gen 2 and Apple Mac mini M2 exemplify portable computing devices optimized for scenarios demanding both high performance and compact form factors. In contrast, the Raspberry Pi 5 represents a low-power embedded device widely used in edge computing applications. These platforms also span a diverse range of operating systems—Windows, macOS, and Linux—and processor architectures, enabling a well-rounded assessment of the model under varying computational constraints.
The main hardware specifications of these devices are summarized in
Table 2. The software environment for model testing includes Python 3.11, with PyTorch 2.2.1 serving as the deep learning framework.
2.6. Evaluation Metrics
The study used four evaluation metrics to assess the performance of the models for rice panicle detection, namely Average Precision (AP), mean Average Precision (mAP), Precision, and Recall, as shown in Equations (12)–(15).

$$\mathrm{AP} = \int_{0}^{1} P(R)\, dR \quad (12)$$

$$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_{i} \quad (13)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (14)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (15)$$
where Average Precision (AP) denotes the precision for individual categories. The mAP is defined as the average of the AP values across all categories, representing the overall precision of the model. From Equations (14) and (15), it can be inferred that the calculations of Precision and Recall are contingent upon the counts of True Positives (TP), False Positives (FP), and False Negatives (FN). TP indicates that positive sample targets are correctly identified as positive, while True Negative (TN) signifies that negative samples are correctly identified as negative. FP refers to negative samples that are incorrectly classified as positive, and FN denotes positive samples that are incorrectly classified as negative. Precision is the proportion of model-identified positive samples that are actually positive, while Recall is the proportion of actual positive samples correctly identified by the model.
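As a worked illustration of Equations (14) and (15) (the counts below are hypothetical, not results from this study):

def precision_recall(tp, fp, fn):
    # Precision = TP / (TP + FP); Recall = TP / (TP + FN).
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Example: 900 correctly detected panicles, 60 false detections, 70 missed panicles.
p, r = precision_recall(tp=900, fp=60, fn=70)  # p ~= 0.938, r ~= 0.928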
To evaluate the complexity and detection speed of the models, this study utilized three metrics: Parameters, Model Size (MS), and Frames Per Second (FPS). Parameters refers to the total number of learnable and optimizable parameters within the model. MS denotes the amount of disk space occupied by the model file after training is completed. FPS is calculated based on the total time consumed by each module of the algorithm.
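A minimal sketch of how FPS can be measured by timing inference over the test images (the model and image-loading interfaces are hypothetical placeholders):

import time

def measure_fps(model, images):
    # Average frames per second over a list of preprocessed images.
    start = time.perf_counter()
    for img in images:
        model(img)  # run detection on one image
    elapsed = time.perf_counter() - start
    return len(images) / elapsed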
The counting performance of the models was evaluated using the coefficient of determination (R²) and the Root Mean Square Error (RMSE). A higher R² value indicates a stronger relationship between the predicted and actual values. RMSE quantifies the deviation between predicted and actual values, with a smaller RMSE signifying that the predicted values are closer to the actual values. Therefore, an R² value closer to 1, combined with a lower RMSE, indicates greater accuracy in the model's counting results. The calculations for R² and RMSE are presented in Equations (16) and (17).

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i} - \bar{y}\right)^{2}} \quad (16)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}} \quad (17)$$

where $y_{i}$, $\hat{y}_{i}$, and $\bar{y}$ are the actual value, predicted value, and average of the actual values, respectively, and $n$ is the number of samples.
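A minimal NumPy sketch of Equations (16) and (17) applied to per-image panicle counts (illustrative only):

import numpy as np

def r2_rmse(y_true, y_pred):
    # Coefficient of determination (Equation (16)) and RMSE (Equation (17)).
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return r2, rmse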
2.7. Experiment Setting
We conducted several experiments to validate the performance of YOLO-Rice, using the dataset developed in this study for training and testing all comparative models. The specific experimental procedures are as follows: (1) To determine the optimal lightweight network, FasterNet, MobileNetv3 [
36], GhostNet [
37], and ShuffleNet [
38] were integrated into the backbone of YOLOv8n, and their performance was evaluated. (2) An ablation study was performed to assess the impact of various improvement mechanisms on model performance. (3) YOLO-Rice was compared with several mainstream deep learning object detection algorithms from recent years. (4) Finally, to verify the lightweight nature of the proposed model and its performance on resource-constrained devices, all models were deployed on three devices, and the impact of varying input image resolutions on performance in embedded systems was further evaluated. The technical flow of this study is shown in
Figure 7.
5. Conclusions
This study presents YOLO-Rice, a lightweight and efficient rice panicle detection model tailored for large-field applications using UAV imagery. The model leverages a modified YOLOv8n framework, integrating a lightweight backbone network, FasterNet, and a two-layer detection head to enhance detection performance while reducing computational overhead. The inclusion of the NAM and the MPDIoU loss function further improves the model's ability to accurately detect rice panicles in complex agricultural settings. The experimental results validate the effectiveness of YOLO-Rice, achieving an object detection accuracy and mAP of 93.5% and 95.9%, respectively. The model's compact size, with parameters reduced to only 32.6% of the original YOLOv8n model, facilitates deployment on resource-constrained platforms such as the Raspberry Pi 5. The model demonstrated a significant increase in FPS, with a 15.3% reduction in average detection time per image compared to YOLOv8n, highlighting its suitability for real-time applications. The performance of YOLO-Rice on various computing devices, including high-performance computers and low-cost, portable or edge devices like the Raspberry Pi 5, underscores its versatility and practicality for in-field use. The model's robustness against challenges such as optical distortion and varying growth conditions of rice panicles was also demonstrated, although some missed detections in densely packed areas were noted, suggesting potential areas for future improvement. In conclusion, YOLO-Rice offers a promising solution for accurate and efficient rice panicle detection, contributing valuable support to rice yield estimation and related agricultural practices.
In this study, to ensure detection accuracy, our experimental design prioritized capturing the clearest possible images with minimal disruption to the canopy caused by drone-induced wind effects. However, this approach, while effective for improving detection precision under controlled conditions, may not fully account for more variable scenarios, such as different drone models, camera specifications, and diverse rice field conditions. Future research should address these limitations by collecting more diverse datasets. This includes using drones and cameras of various models, acquiring images at multiple altitudes, and covering a wider range of rice varieties and planting methods. Such efforts will contribute to the development of rice panicle detection models with broader applicability and improved performance across diverse environments, enabling the models to be effectively utilized in real-world crop monitoring scenarios.