Development of a 2D Image-Based Rice Panicle-Level Yield Prediction Framework Using Image-Based Reconstruction Technique

Kim, Daehong; Lim, Hyeongjun; Kim, Sojung

doi:10.3390/agronomy16090896

Open AccessArticle

Development of a 2D Image-Based Rice Panicle-Level Yield Prediction Framework Using Image-Based Reconstruction Technique

by

Daehong Kim

,

Hyeongjun Lim

and

Sojung Kim

^*

Department of Industrial and Systems Engineering, Dongguk University-Seoul, Seoul 04620, Republic of Korea

^*

Author to whom correspondence should be addressed.

Agronomy 2026, 16(9), 896; https://doi.org/10.3390/agronomy16090896

Submission received: 20 March 2026 / Revised: 26 April 2026 / Accepted: 28 April 2026 / Published: 29 April 2026

(This article belongs to the Special Issue Advanced Machine Learning in Agriculture—2nd Edition)

Download

Browse Figures

Versions Notes

Abstract

Asian countries, which account for more than 60% of global rice consumption, are expanding the adoption of precision agriculture technology using image sensors to increase the profitability of rice production. This requires the development of technology to process 2D images that can be obtained by individual farmers instead of expensive 3D scanners. This study aims to quantitatively extract grain-level shape information necessary for yield prediction using 2D rice panicle images. To achieve this, a framework for predicting rice panicle yield from 2D images that uses a convolutional neural network (CNN) to detect grains is developed. Unlike existing approaches that measure grain length, width, and thickness using vernier calipers or 3D scanners to reconstruct 3D volume and estimate yield factors through volume-weight relationships, this methodology utilizes panicle length and projected grain area, which are relatively stable shape indices derived from 2D panicle images, to accurately describe weight variation within the same variety (e.g., Huaidao, Sidao, Suxiu, Jingjing). Experiments are conducted using panicle image data of Chinese Japonica rice varieties collected in Jiangsu Province, China. The proposed methodology demonstrates high prediction accuracy, with coefficients of determination ranging from 0.89 to 0.96, by combining panicle length and projected grain area information.

Keywords:

rice; yield prediction; machine learning; convolutional neural network; image reconstruction

1. Introduction

Rice is a staple food consumed by over 50% of the world’s population, playing a pivotal role in global food security and agricultural production systems [1,2]. In particular, since a significant portion of global rice production is concentrated in Asia to meet this demand, the stability of rice production is directly linked to the stability of the global food system as a whole, transcending regional boundaries [3]. According to the latest cereal outlook of the Food and Agriculture Organization (FAO), global rice production is projected to reach approximately 560 million tons in 2025–2026, a record high, representing an increase of approximately 1.6% from the previous year. Simultaneously, global rice consumption is also projected to increase by an average of approximately 2.4% annually, reaching 550 million tons [4]. This increase in demand is driven by population growth and shifting dietary patterns. However, the physical expansion of arable land is limited, and climate change presents uncertainties in the production environment. Therefore, accurate yield prediction technology is emerging as a key technology for enhancing production efficiency and ensuring a stable food supply [5].

Climate change is a major factor increasing rice yield variability [6]. High temperature stress and changes in precipitation patterns have been reported to inhibit reproductive growth during the heading and flowering stages, increase sterile grains, and reduce the ripening rate, which have a significant correlation with reduced filled grain weight and panicle yield [7]. In such an environment, empirical or simple regression models based solely on past weather and yield statistics have difficulty sufficiently reflecting differences in responses by variety and growth stage, which increases prediction errors [8]. In other words, the increased yield variability implies the importance of real-time observations utilizing precision agriculture technology and yield prediction based on these observations.

Rice yield is determined by complex yield components, such as the number of grains per panicle, grain size (length, area, and volume), the percentage of filled grains, and the spatial arrangement within the panicle [9]. Therefore, recent studies in high-throughput phenotyping have utilized image-based approaches that directly utilize such shape information to estimate yield [10]. Existing two-dimensional (2D) image-based studies have mainly estimated yield components using individual shape indices, such as the projected area or length, derived from single-view images [11]. However, rice panicles have structural characteristics that cause overlap and occlusion between grains, making it difficult to sufficiently reflect the actual grain distribution and filled grains within the panicle with only a single shape index [12]. This increases the sensitivity depending on the shooting angle, arrangement conditions, and resolution, and even for panicles with the same projected area, the actual number of grains or filled grain weight may differ, increasing the uncertainty in yield prediction [13]. To overcome these shortcomings, three-dimensional (3D) crop phenotyping techniques utilizing multi-view images, photogrammetry, point clouds, light detection and ranging (LiDAR), and Neural Radiance Fields (NeRF) have been proposed [14,15,16]. 3D-based approaches have been reported to directly restore structural traits such as grain volume, surface area, and spatial distribution, thereby providing shape indices that are highly correlated with panicle grain weight and final yield [14]. However, 3D-based reconstruction techniques require multi-view photography, sensor equipment, and computational resources. Furthermore, management of shooting conditions and the reconstruction process present limitations in practical agricultural field applications and large-scale data processing [17].

This study proposes an alternative framework that complements the limitations of existing 2D approaches that rely on single shape metrics, rather than directly reconstructing the spatial structure of the rice panicle through 3D reconstruction, by leveraging composite shape information, including grain count, length, and area. Specifically, the proposed framework (1) quantifies the number of individual grains within the panicle based on object detection, (2) simultaneously derives grain length and area information through binarization-based image processing, and (3) combines these metrics to estimate whole-grain weight per rice panicle. While this approach does not directly reconstruct the spatial arrangement within the panicle, it mitigates the uncertainty of single-metric estimation due to overlap and occlusion by integrating complementary yield components, grain count, and area. This framework presents a practical image-based analysis technique that can provide meaningful information for predicting panicle-level yield without the need for complex 3D reconstruction.

2. Materials and Methods

2.1. 2D & 3D Modeling Method of Crops

In precision agriculture, image-based plant phenotype analysis technology is an essential tool for quantifying crop morphological traits and estimating yield components [18]. Crop modeling techniques for precision agriculture, which ensure data processing efficiency and precise analyses, are broadly categorized into 2D image-based modeling and 3D reconstruction-based modeling, depending on data dimensionality and the characteristics of shape restoration algorithms [19]. Table 1 summarizes the core algorithms, key extracted traits, and technical limitations of these two modeling techniques.

2D image-based modeling is a technique for analyzing the projected shape of crops using image data, such as Red-Green-Blue (RGB), grayscale, or other planar scan data acquired from a single viewpoint or limited angles as input [20,21]. From a technical perspective, this modeling performs adaptive binarization based on the Otsu algorithm [22] and Connected Component Analysis (CCA) [23] on the input high-resolution images to quantitatively calculate the projected area, maximum length, and contour-based shape indices [24]. Recently, with the introduction of deep learning-based architectures such as YOLO (You Only Look Once) and Mask R-CNN (Mask Region Convolutional Neural Network), technological advancements have been made in precisely segmenting and identifying the boundaries of individual grains within dense panicle structures [25]. As described in Table 1, this method exhibits high efficiency in high-throughput phenotyping (HTP) environments that require rapid processing of large samples due to its simple hardware configuration and low computational cost [26]. However, there are technical limitations, such as information loss in the process of projecting 3D physical entities onto a 2D plane, and the inability to fully reflect the complex internal structures of the panicle due to self-occlusion and overlapping between grains [27].

Alternatively, 3D reconstruction-based modeling focuses on restoring the three-dimensional spatial structure of crops using multi-view images or active sensors [28]. Representative examples include Structure from Motion (SfM) and Multi-view Stereo (MVS) pipelines, which generate high-density point clouds by matching feature points between images captured from different viewpoints. LiDAR-based approaches acquire direct 3D coordinate information through laser distance measurements [29,30]. Additionally, methods for reconstructing high-resolution shapes from sequential viewpoint images using neural network-based rendering techniques, such as Neural Radiance Fields (NeRF), have been proposed [31]. 3D modeling can directly derive structural characteristics such as volume, surface area, and spatial distribution, ensuring exceptional precision in quantitative analysis. However, as shown in the analysis results in Table 1, it requires expensive sensor equipment, high-performance GPU resources, and a complex post-processing pipeline [32]. These economic and technical requirements are major factors hindering the spread of data-driven precision agriculture and intelligent farming systems that are reshaping modern agriculture dissemination in the field [33].

Existing studies have attempted to address the technical tradeoff between the high precision of 3D modeling and the analytical efficiency of 2D modeling [34]. In particular, most 2D-based previous studies have relied on pixel counts rather than actual physical units (cm, cm²), limiting their versatility in cross-cultivar weight prediction [35]. Consequently, this study combines high-precision grain detection using YOLOv12 [36] with a dots per inch (DPI)-based resolution compensation algorithm, while maintaining the economic feasibility of 2D imaging, to bridge this technological gap. This provides an alternative technology solution that simultaneously secures the reliability of quantity prediction and field applicability by accurately calculating actual physical areas and lengths from 2D projection data without complex 3D hardware.

2.2. 2D Image-Based Rice Yield Prediction Framework

Figure 1 shows an overview of the proposed 2D image-based rice yield prediction framework, which consists of four modules: (1) data acquisition module, (2) rice panicle detection and 2D surface area estimation module, (3) unit correction and verification module, and (4) rice panicle weight prediction module.

In the data acquisition module, the image dataset of rice panicle is used as input, and through image preprocessing and labeling, object detection model learning and image-based shape analysis are constructed. The learning data generated in this module is then used as the common input of the shape detection module and shape information extraction module.

In the rice panicle detection and 2D surface area estimation module, two independent analysis processes are carried out in parallel: (1) detection of individual rice panicles using a YOLOv12-based object detection model, which identifies their locations as bounding boxes and sums the detected instances to estimate the number of panicles per image, and (2) 2D surface area estimation, performed simultaneously but independently from the panicle detection process. Considering the importance of minimizing missed detections in image environments where small objects, such as rice panicles, are densely distributed and frequently overlap, this study compared and evaluated YOLO models under the same dataset and learning conditions. As a result, YOLOv12, which demonstrated stable performance in terms of Recall and F1-score, is selected as the final object detection model for shape index calculation and yield prediction. Next, a binarization-based image processing technique is applied to the original panicle image, converting pixel values into binary (black-and-white) images based on threshold values. This process is used to isolate grain regions and calculate the pixel-level projected grain area. This process is performed independently of the object detection results and is an image processing-based approach for precisely calculating continuous shape indices (area, length).

The unit correction and verification module converts the pixel-level shape indices produced during the image processing step into actual physical units (cm, cm²) based on the image resolution (DPI). The converted projected grain area and grain count are compared with the measured values included in the dataset to calculate the difference value and verification index, thereby quantitatively assessing the reliability of the image-based shape indices. Since the dataset does not provide a reference value for rice panicle length, a separate quantitative verification step is not performed.

Finally, the rice panicle weight prediction module constructs a regression-based prediction model using the shape indices calibrated to physical units as input variables. In this step, single and multivariable linear regression models are constructed, selecting panicle length, projected grain area, and grain count as independent variables, to predict rice grain weight per panicle. The predictive performance of the developed model is evaluated using the coefficient of determination and error index, and the differences in prediction performance across shape indicator combinations are compared and analyzed. The specific algorithms, formulas, and quantitative performance evaluations used in each module are described in detail in subsequent sections.

2.2.1. Data Module

The data module performs image preprocessing and labeling for object detection and shape analysis. All rice panicle samples used in this study were harvested at the fully mature stage, ensuring that the analyzed morphological traits reflect the final outcome of grain development. During image preprocessing, the original rice panicle images are organized to ensure consistency in file format and resolution, minimizing errors due to resolution differences that may arise during subsequent image processing-based analysis. Labeling is then performed using the LabelMe tool, which processes individual grains within the panicle images into bounding boxes. Each bounding box is defined based on a single grain, and the annotation results are converted into a format suitable for training the YOLOv12 object detection model and used as training data. Figure 2 shows an example of grain labeling performed on a panicle image.

The preprocessed and labeled data are then used as input data for the learning and detection performance evaluation of a CNN-based model (i.e., YOLOv12 [38]) in the grain detection module. Furthermore, the measured quantity information included in the dataset serves as a comparison and verification reference value with image-based shape indices and is used as a dependent variable in a regression-based panicle-unit filled grain weight prediction model.

This study utilized the publicly available 2D image dataset of rice panicles from [39]. This dataset contains rice panicle images along with measured yield information, such as filled grain weight per panicle, projected grain area, and number of grains. It serves as a foundation for validating image-based shape indices and developing yield prediction models. The dataset was constructed based on rice panicle samples of the japonica variety collected in Huaian City, Jiangsu Province, China in 2017. The total data size consists of 1200 high-resolution 2D RGB color panicle images and a corresponding dataset of 1198 panicle-level precision measurements. The samples are classified into six cultivar groups: Huaidao330, Huaidao268, Sidao233, Sidao26, Suxiu867, and Jingjing1. Each cultivar group serves as an independent analysis unit for quantifying the relationship between shape indices and yield. For each panicle sample, quantitative measurement information such as panicle weight, panicle weight without unfilled grain weight, filled grain weight per panicle, total grain weight, total number of grains per panicle, number of filled grains per panicle, setting percentage, and grain area are included. All images were acquired as two-dimensional (2D) planar images using a scanner at a fixed resolution of 72 DPI. This standardized setting ensures a consistent spatial scale of approximately 0.35 mm per pixel, providing a uniform geometric foundation. By maintaining this constant resolution, we ensured that the morphological metrics such as grain area, length, and width are directly comparable to actual measured variables without scale-induced errors, thereby providing reliable input for weight prediction.

2.2.2. Rice Panicle and 2D Surface Area Estimation Module

This module automatically estimates the location and number of grains from rice panicle images and applies an analysis procedure combining deep learning-based object detection and binarization-based image processing to derive quantitative shape metrics such as projected grain area and particle length. This process is a key step in ensuring the reliability of input variables used in subsequent yield prediction models. Rice panicle images (see Figure 2) feature densely arranged individual grains with frequent overlap and occlusion between grains. Therefore, it is difficult to consistently separate grain-level objects using traditional threshold-based segmentation or simple image processing techniques. Considering these characteristics, this study utilized YOLOv12 to reliably obtain location information for rice panicles, which are dense, small objects. YOLOv12 is a model that enhances feature extraction performance by combining the attention mechanism with the existing CNN structure [40]. The core technology, Area-Attention, is a technique that divides the entire image feature map into small unit areas and performs parallel operations. Compared to standard self-attention, it drastically reduces computational complexity while securing a wide receptive field, clearly identifying fine boundaries between dense grains. The Residual Efficient Layer Aggregation Networks (R-ELAN) structure introduced in the feature aggregation stage applies residual connection and scaling when transferring information between layers to increase the learning stability of large-scale models and plays a role in capturing grains of various sizes without false detection [41]. Furthermore, the Position Perceiver technique is utilized to efficiently learn location information. This method implicitly identifies location information through a 7 × 7 separable convolutional layer instead of a separate explicit location encoding, thereby simultaneously achieving model lightweighting and improved computational speed. Compatibility with FlashAttention technology minimizes memory access overhead and ensures real-time inference performance [38]. This technical configuration compensates for counting errors due to object overlap, a chronic problem in 2D image analysis, and serves as a crucial mechanism for ensuring the reliability of shape indicators, which serve as input variables for quantity prediction models.

The bounding boxes derived through YOLO-based object detection are used to identify individual grains within the panicle image as objects, and the total number of detected bounding box instances is defined as an image-based estimate of the number of grains per image. In this study, the reliability of the object detection-based counting method is evaluated by comparing the predicted values with the actual number of grains in the dataset (see Section 3.1).

Furthermore, continuous shape metrics, such as the projected area and panicle length, are derived independently from the object detection results by applying a binarization-based image processing technique to the original panicle image. This approach is based on the belief that while object detection models are effective in identifying the location and presence of grains, image processing-based approaches are more appropriate for precisely calculating shape metrics such as area and length at the pixel level. First, the original image is converted to a grayscale image to remove the influence of color information, enabling image analysis based on brightness information. Next, thresholding using the Otsu algorithm is applied to binarize the grain region and background. The Otsu algorithm analyzes the image’s intensity histogram and automatically selects a threshold that maximizes the variance between two classes (background and grain). This approach reduces sensitivity to lighting conditions and background brightness variations, providing relatively stable binarization results. The binarized image, based on the threshold T, is defined in Equation (1).

g (x, y) = \{\begin{matrix} 1, f (x, y) \geq T \\ 0, f (x, y) < T \end{matrix}

(1)

where

g (x, y) = 1

represents the grain area, and

g (x, y) = 0

represents the background area. After binarization, morphological operations are applied to remove the stem area and background noise, and the white space inside the grain is corrected to generate a final binary image (rice panicle) that contains only the grain area. The number of pixels in the grain area represented by white pixels in the final binary image is calculated to obtain the grain projection area, and the panicle length is estimated by applying an image processing technique based on the same segmentation result. Figure 3 shows an example of rice panicle area segmentation based on binarization.

The rice panicle detection and binarization-based shape information extraction process presented in this section is a method for extracting shape indices, such as the number of rice panicles, projected area of rice panicles, and length of rice panicles, at the pixel level from rice panicle images. These shape indices are then calibrated and verified in cm and cm² units, and used as basic input variables for constructing a panicle-level filled grain weight prediction model.

2.2.3. Unit Correction and Verification Module

Shape indices, such as projected grain area and panicle length, derived through binarization-based image processing, are expressed in pixels of digital images. However, pixel-based shape indices are dependent on image resolution and scale. Therefore, conversion to actual physical units is essential for comparison between different images and quantitative interpretation of shape indices. Therefore, this study applied a Pixel-to-Physical Scale Calibration procedure to normalize image-derived shape indices into actual units (cm, cm²).

Digital images have a two-dimensional grid structure composed of pixels of equal size, with each pixel defined as the smallest unit representing the same area in real space. The actual spatial size of a pixel is determined by the image resolution, and DPI typically refers to the number of pixels contained within 1 inch (2.54 cm). Therefore, the actual length occupied by a single pixel,

l_{p x}

, can be defined in Equation (2).

l_{p x} = \frac{2.54}{D P I}

(2)

Therefore, the actual area represented by a single pixel,

A_{p x}

, is calculated as the square of the pixel length shown in Equation (3).

A_{p x} = ({\frac{2.54}{D P I})}^{2}

(3)

This study applies the standard resolution of 72 DPI to all images, converting shape parameters calculated in pixels into cm, cm² units. This conversion serves as a baseline to maintain a consistent scale across images within the dataset and ensure comparability of shape parameters. The 72 DPI value corresponds to the fixed setting of the scanner used for data acquisition. Since scanner-acquired images are captured under controlled illumination and a fixed imaging angle, the influence of shadows and perspective distortion is minimized, allowing for stable Otsu-based binarization. While 72 DPI was used as the baseline for this specific dataset, the framework is designed to maintain generalizability across different imaging sensors. For images acquired at different resolutions, the corresponding DPI or pixels-per-inch (PPI) value can be substituted into Equations (2) and (3) to ensure consistent physical scale calibration.

When the number of white pixels corresponding to grain areas in the binarized image is N, the actual projected area of the grain,

{A r e a}_{g r a i n}

, is calculated in Equation (4).

{A r e a}_{g r a i n} = N \times A_{p x}

(4)

The projected grain area calculated in this way serves as a key indicator reflecting the shape characteristics of each rice panicle and is subsequently used as an input variable for rice panicle length estimation and filled grain weight prediction models. Rice panicle length is also calculated in pixels and normalized to physical units (cm) through a resolution-based conversion.

The pixel-to-physical unit calibration procedure performed in this study aims to ensure the reproducibility and comparability of shape indices by eliminating the influence of differences in image resolution, shooting conditions, and scale. This establishes a foundation for directly linking image-based shape indices to actual growth characteristics, thereby quantitatively improving the reliability of rice panicle weight prediction models.

2.2.4. Rice Panicle Weight Prediction Module

The proposed regression-based analytical model for predicts the filled grain weight per panicle utilizes shape indices calibrated to physical units as independent variables. The model predicts not the final yield per unit area, but rather the filled grain weight per panicle, which directly contributes to yield formation. This approach can be used not only for panicle-level validation but also for overall rice yield prediction.

For regression analysis, a multiple linear regression approach is adopted to construct predictive models for each cultivar. Variable selection was conducted using stepwise regression, which iteratively adds or removes predictors based on statistical significance, and backward elimination, which begins with the full model and sequentially removes non-significant predictors until a parsimonious model is achieved. The filled grain weight per panicle (G) is set as the dependent variable. Initially, three independent variables are simultaneously considered.

To explicitly establish the connection between the defined metrics and the weight prediction, these three variables are integrated into a multiple linear regression equation (i.e.,

G = β_{1} L + β_{2} A + β_{3} C + β_{0}

) to estimate the final weight. It should be noted that panicle length (L) is not obtained from direct filed measurement but is estimated through image-based shape analysis using binarization-based image processing. By contrast, projected panicle area (A) and grain count (C) are obtained from the field-based dataset used in this study. To account for cultivar-specific differences in panicle structure and grain distribution, regression coefficients are estimated separately for each cultivar rather than being integrated into a single model. Furthermore, backward elimination is applied to optimize the models and identify the most influential predictive variables. Equations (5)–(8) present the initial full models formulated with all three independent variables (L, A, and C). Equations (5)–(8) correspond to the Huaidao, Sidao, Suxiu, and Jingjing cultivars, respectively.

G = 0.048 \times L + 0.153 \times A - 0.005 \times C - 0.438

(5)

G = 0.019 \times L + 0.106 \times A + 0.008 \times C - 0.146

(6)

G = 0.022 \times L + 0.071 \times A + 0.013 \times C - 0.013

(7)

G = 0.015 \times L + 0.149 \times A - 0.003 \times C - 0.230

(8)

In the backward elimination process, variables with p-values greater than 0.05 are initially treated as statistically insignificant and are sequentially removed. In addition, variables with p-values close to the significance threshold are further screened when a simpler model provides comparable predictive performance and clearer interpretability.

Detailed statistical results, including p-values and model performance metrics, are presented in Section 3.3. Based on this procedure, projected panicle area (A) is excluded from the Huaidao model, whereas panicle length (L) is excluded from the Sidao, Suxiu and Jingjing models. Equations (9)–(12) present the reconstructed models after this first elimination step. Equation (9) correspond to the Huaidao cultivars after excluding projected panicle are, whereas Equations (10)–(12) correspond to the Sidao, Suxiu and Jingjing cultivars after excluding panicle length.

G = 0.125 \times L + 0.015 \times C - 1.095

(9)

G = 0.110 \times A + 0.008 \times C + 0.002

(10)

G = 0.079 \times A + 0.013 \times C - 0.095

(11)

G = 0.157 \times A - 0.003 \times C - 0.064

(12)

For the Jingjing cultivar, further analysis of the reduced model in Equation (12) shows that grain count (C) still has a p-value greater than 0.05. Consequently, grain count is also removed in the final step. Equation (13) presents the final predictive model for Jingjing, which uses projected panicle area (A) as the sole explanatory variable.

G = 0.139 \times A - 0.112

(13)

Prediction accuracy of the proposed models is evaluated using the coefficient of determination (

R^{2}

) and error indicators. The coefficient of determination (

R^{2}

) represents the ratio of the variation explained by the model (SSR: Sum of Squared Residuals) to the total variation (SST: Total Sum of Squares), as shown in Equation (14). A higher

R^{2}

value indicates that the shape index-based regression model effectively explains the variation in the filled grain weight.

R^{2} = 1 - \frac{S S R}{S S T}

(14)

To quantitatively assess prediction errors, the Mean Absolute Error (MAE) in Equation (15), Root Mean Square Error (RMSE) in Equation (16), and Mean Absolute Percentage Error (MAPE) in Equation (17) are used.

G_{i}

is the measured filled grain weight per panicle (G) for rice panicle i,

\hat{G_{i}}

is the estimated filled grain weight per panicle (G) for rice panicle i, and

n

is the number of sample rice panicles considered in the analysis.

M A E = \frac{1}{n} \sum_{i = 1}^{n} | G_{i} - \hat{G_{i}} |

(15)

R M S E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(G_{i} - \hat{G_{i}})}^{2}}

(16)

M A P E = \frac{100 %}{n} \sum_{i = 1}^{n} |\frac{G_{i} - \hat{G_{i}}}{G_{i}}|

(17)

3. Results

3.1. Grain Detection Performance

This section quantitatively evaluates the performance of object detection models for identifying rice panicles, the results of which are subsequently used for shape index calculation. Specifically, the proposed YOLOv12 model is assessed and compared with six baseline architectures, including YOLO-family models (YOLOv5, YOLOv8, and YOLOv10) and CNN-based object detectors (Faster R-CNN, RetinaNet, and SSD), to demonstrate the superiority of the proposed approach.

To compare the performance of YOLO algorithms and other baseline architectures, the dataset is randomly sampled and split into training, validation, and test sets at an 8:1:1 ratio across five repeated random subsampling validation runs. The same experimental conditions are applied: an image resolution of 840 DPI, and each neural network is trained for 500 epochs. Model performance is evaluated on the test dataset using the optimal weights saved from each model, rather than during the training process, and comparisons are made based on generalization performance. Rice panicle detection performance was quantitatively evaluated using Precision, Recall, mAP@0.5, and F1 score metrics. Precision and Recall are metrics that evaluate false positives and false negatives, respectively, and are defined as follows.

P r e c i s i o n = \frac{T P}{T P + F P}

(18)

R e c a l l = \frac{T P}{T P + F N}

(19)

I o U = \frac{|B_{p r e d} {\cap B}_{t r u t h}|}{|B_{p r e d} {\cup B}_{t r u t h}|}

(20)

m A P @ 50 = \int_{0}^{1} P r e c i s i o n (R e c a l l) d (R e c a l l), I o U \geq 0.5

(21)

F 1 s c o r e = 2 \times \frac{P r e c i s i o n \times R e c a l l}{P r e c i s i o n + R e c a l l}

(22)

In Equations (18) and (19), TP (True Positive) indicates a correct detection of a true grain; FP (False Positive) indicates a false detection of a non-grain; and FN (False Negative) indicates a true grain that exists but is not detected. Intersection over Union (IoU) is an indicator of the degree of overlap between the predicted bounding box

B_{p r e d}

and the actual bounding

B_{t r u t h}

, and is defined as in Equation (20). mAP@50 in Equation (21) is defined as the area under the Precision-Recall curve calculated under the conditions after determining a True Positive by setting the IoU threshold to 0.5. The F1 score in Equation (22) is a harmonic mean metric that simultaneously reflects false detection and non-detection. It is useful for comparing the overall detection stability between models in environments like this study, where grain objects are densely distributed and frequently overlap. Table 2 compares the Precision, Recall, mAP@50, F1-score, and training time of the YOLO algorithms.

YOLOv5 showed the weakest detection performance, with a Recall about 28.5% lower than YOLOv12, indicating it misses grain objects more often. YOLOv8’s Recall and F1-score show approximately 23.0% and 27.6% improvements over YOLOv5. YOLOv10 maintained stable detection; differences with YOLOv8 were small, at 1.2% for Recall and 3.1% for F1-score. YOLOv12 achieved the highest Recall and F1-score, surpassing YOLOv8 by 5.5% and YOLOv10 by 6.7%. Precision and mAP@50 remained similar to YOLOv8 and YOLOv10, confirming that improved Recall did not lead to excessive false positives.

In terms of training time, YOLOv5 had the shortest time at 250 min, followed by YOLOv8 (314 min), YOLOv10 (432 min), and YOLOv12 (476 min). Training was conducted under identical hardware configurations, batch sizes, and training conditions for 500 epochs. Although the number of parameters for YOLOv12n (2.6 M) is smaller than that of YOLOv8n (3.2 M) and comparable to YOLOv10n (2.3 M), its training time was the longest. This discrepancy indicates that training time is not determined solely by the parameter count but is significantly influenced by architectural design and computational complexity. Specifically, the integration of the Area-Attention mechanism in YOLOv12 introduces additional computational overhead during training compared to standard CNN architectures. Consequently, YOLOv12 required approximately 162 min (51.6%) more training time than YOLOv8, and approximately 44 min (10.2%) more than YOLOv10. However, considering that model training is typically performed offline in agricultural applications and that minimizing missed detections is paramount for accurate yield estimation, this tradeoff between increased training time and superior predictive performance is considered highly economically feasible and justified.

Rice panicles are small, densely packed, and frequently overlap each other, so Recall degradation has a greater impact on yield prediction accuracy than Precision degradation. Although YOLOv8 achieves the highest Precision, YOLOv12 actually improves Recall by 5.5% compared to YOLOv8 and by 6.7% compared to YOLOv10. This indicates a substantial reduction in the omission rate of grain objects under the same training conditions, making the proposed YOLOv12 suitable for rice panicle detection. The high Recall performance observed in YOLOv12 is explained by its structural differences from other YOLO algorithms. Compared to existing YOLO algorithms, YOLOv12 adopts a structure that maintains more stable feature extraction resolution at the backbone stage while strengthening the multi-scale feature fusion pathway. This structural characteristic directly contributes to improving detection sensitivity for small objects by mitigating feature loss issues that occur in environments with small object sizes or frequent object overlap. Furthermore, YOLOv12 explicitly incorporates an attention-based feature stagnation mechanism, which selectively emphasizes features with high spatial and channel significance. This process structurally reduces the number of grain object misdetections by more clearly distinguishing between object and non-object regions in grain image environments where the boundaries between the background and objects are unclear. Table 3 presents a comparative evaluation of CNN-based object detection models in terms of Precision, Recall, mAP@50, F1-score, and training time. Faster R-CNN is a two-stage object detection model that first generates candidate regions using a Region Proposal Network (RPN) and then performs classification and position correction on each region. Its structural characteristics enable precise candidate region extraction even in dense environments with small objects [42]. RetinaNet is based on a single-stage architecture and incorporates Focal Loss to mitigate class imbalance. However, in environments with dense distribution of small objects, the candidate region generation stage is not separated, which limits the number of non-detections [43]. SSD is a lightweight single-stage model that simultaneously detects objects from feature maps of multiple resolutions. While computationally efficient, it struggles to maintain feature resolution for small objects, resulting in significant performance degradation in dense environments [44].

The results in Table 3 indicate that, in environments with densely distributed small objects, CNN-based baseline models exhibit a tradeoff between detection performance and training time. For example, RetinaNet, compared to Faster R-CNN, increases Precision by 27.1%, Recall by 25.0%, mAP@50 by 53.3%, and F1-score by 25.9%, while decreasing training time by approximately 12.0%.

From the perspective of large-scale data processing, training time is also an important consideration. Faster R-CNN and RetinaNet require relatively long training time, whereas SSD achieves shorter training time but shows a substantial degradation in detection performance in dense small-object environments. In contrast, YOLO-based detectors employ a single-stage architecture that processes the entire image simultaneously, thereby improving training efficiency, and such a framework is required to minimize missed detections for small and densely distributed objects such as rice grains. Therefore, this study adopts a YOLO-based model as the final grain detection model to ensure reliable grain counting and subsequent shape-index extraction.

In Figure 4, the bounding box samples detected by YOLOv12 can be visualized overlaid on the original rice panicle image. The predicted bounding boxes accurately reflect the actual panicle locations, and even in areas where panicles are densely packed or overlap each other, individual panicles can be separated. This demonstrates that YOLOv12-based object detection can effectively extract grain-level location information even in structurally complex rice panicle images.

Additionally, Table 4 describes the results of evaluating image-level panicle count prediction performance of YOLOv12. The evaluation defines the total number of detected bounding box instances as the image-based grain count prediction value and compares it to the ground-truth grain counts in the dataset. The coefficient of determination for grain count prediction is

R^{2}

of 0.95, with MAE and RMSE of 4.68 and 6.24, respectively. This demonstrates that YOLOv12-based object detection achieves quantitative reliability in both grain location estimation and image-level grain count calculation.

3.2. Performance of Pixel-to-Physical Scale Calibration

This section evaluates the quantitative accuracy of projected grain area metrics derived through Otsu-based binarization and pixel-to-physical unit correction after YOLO-based object detection, for each variety. Table 5 compares the image-based projected grain area values with the measured values for each variety.

R^{2}

is 0.98 or higher for all varieties, 0.99 for Huaidao, Sidao, and Jingjing varieties, and 0.98 for Suxiu. This indicates that the image-based projected area metrics account for most of the variation in measured area, regardless of variety, demonstrating limited variety-specific bias in the area calculation process. MAE is 0.24 for Huaidao, 0.37 for Sidao, 0.52 for Suxiu, and 0.54 for Jingjing. The difference between the minimum value for Huaidao and the maximum value for Jingjing is approximately 55.6%, indicating that the level of area calculation error varied across varieties. RMSE also ranged from 0.20 (Huaidao) and 0.43 (Jingjing), with the maximum value approximately 53.5% higher than the minimum. MAPE, which represents the relative error, remains low across most varieties: 1.00 for Huaidao, 1.41 for Sidao, and 2.20 for Jingjing. Conversely, Suxiu showed the highest value at 5.10, approximately 80.4% higher than the lowest value for Huaidao. This increase in relative error is interpreted as a result of increased variability in segmented areas after binarization due to differences in grain density and spatial arrangement characteristics compared to the same projected area.

In summary, the projected grain area index derived through Otsu-based binarization and pixel-to-physical unit correction simultaneously achieves high independent power and low absolute and relative errors across all varieties. This confirms that the area index satisfies cross-cultivar comparability and reproducibility, and is a shape index with sufficient quantitative reliability to be used as an input variable in models for estimating panicle length and predicting panicle-level filled grain weight. It should be noted that the direct quantitative validation for panicle length is omitted in this section, as the manual measurement of actual panicle length was not included in the ground truth field dataset. However, the estimated length, consistently calculated using the validated calibration metric, is directly utilized as an independent variable in the sub-sequent regression analysis.

3.3. Shape Metrics Based Grain Panicle-Level Grain Weight Reconstruction

This section presents the results of linear regression analyses of the relationship between panicle shape indices (i.e., panicle length, projected grain area, and number of grains), confirmed through quantitative validation in Section 3.2, and panicle filled grain weight. For each cultivar, both full and reduced regression models were evaluated using 200 panicle samples randomly divided into training and test sets at an 8:2 ratio. This repeated random subsampling procedure was performed five times, and the average results were reported.

Table 6 presents the statistical significance (p-value) of the independent variables panicle length, projected grain are, and grain count for each rice cultivar in the full regression model. In Huaidao, all three variables are statistically criterion (

p < 0.05

). Conversely, for both Sidao and Suxiu, only projected grain area and grain count are significant, whereas panicle length is not. Specifically, for Suxiu, projected grain area and grain count are highly significant, but panicle length exceeds the threshold with a p-value of 0.1. Furthermore, in Jingjing, only projected grain area proves significant, as both panicle length (p = 0.23) and grain count (p = 0.15) fail to meet the criterion. These findings confirm that the contribution of morphological traits is cultivar-specific under the same set of candidate variables, thereby necessitating cultivar-specific variable optimization.

Table 7 presents the regression performance of the full model across the cultivars. While Suxiu and Sidao both yielded an

R^{2}

of 0.96, their error margins differed notably, the MAPE of Suxiu (1.15%) was approximately one-third that of Sidao (3.83%). Furthermore, the prediction error widened for Huaidao (

R^{2} = 0.89

) and Jingjing (

R^{2}

= 0.92), with Jingjing MAPE reaching 7.34%, a maximum difference of 6.19 percentage points compared to Suxiu. This substantial variation in error metrics demonstrates that a uniform variable set cannot ensure consistent prediction, thereby necessitating variable optimization through a backward elimination process.

Table 8 presents the p-values of the independent variables in the reduced regression model for Huaidao. During the variable selection process, a variable yielding a p-value of 0.041 was excluded. Although technically satisfying the

p < 0.05

threshold, it was considered marginally significant; thus, it was removed to prioritize model parsimony and mitigate the risk of overfitting. The final reduced model for Huaidao retains only panicle length and grain count. Both variables satisfy the significance criterion (

p < 0.05

), with p-values of

4.2 \times 10^{- 9}

for panicle length and

1.1 \times 10^{- 29}

for grain count, indicating a clear difference in their relative statistical contributions.

Table 9 presents the regression performance of the reduced model for Huaidao, which utilizes only panicle length and grain count. Compared to the full model (Table 7), the exclusion of projected grain area resulted in a moderate decrease in predictive accuracy. Specifically, the

R^{2}

value decreased from 0.89 to 0.84. Correspondingly, the overall prediction error increased; both RMSE (0.32 g) and MAE (0.24 g) experienced an approximate 20% increase compared to the full model. Furthermore, the MAPE rose from 6.45% to 7.80%, representing a 1.35% increase. These findings illustrate the performance tradeoff in the reduced model: while removing the marginally significant variable improves model parsimony, it results in a slight sacrifice in predictive precision.

Table 10 describes the p-values of the independent variables in the reduced model using projected grain area and grain count. For Suxiu, both projected grain area and grain count satisfy the significance criterion (

p < 0.05

), with p-values of

4.2 \times 10^{- 6}

and

7.8 \times 10^{- 8},

respectively. Among the two variables, grain count has a smaller p-value than projected grain area. For Jingjing, projected grain area remains statistically significant, with a p-value

2.7 \times 10^{- 31}

, whereas grain count does not satisfy the significance criterion, with a p-value of 0.098. These results indicate that, under the same two-variable specifications, the statistical significance of grain count differs between cultivars, while projected grain area remains significant in both cultivars.

Table 11 presents the regression performance of the reduced models for Sidao, Suxiu, and Jingjing, which retain projected grain area and grain count by excluding panicle length. Compared to the full model (Table 7), the

R^{2}

values remain unchanged: 0.96 for both Sidao and Suxiu, and 0.92 for Jingjing. However, the prediction error metrics show cultivar-specific changes. For Suxiu, the MAPE increases from 1.15% to 4.70%. In contrast, the error metrics for Sidao and Jingjing show minor variations, with Jingjing MAPE decreasing from 7.34% to 7.20%. These results indicate that excluding panicle length maintains the overall explanatory power (

R^{2}

) but results in cultivar-dependent changes in prediction error.

Table 12 describes the p-value of the projected grain area variable in the single-variable model for Jingjing. The variable remains statistically significant when used as the sole independent variable, with a p-value of

5.4 \times 10^{- 108}

. This p-value indicates that projected grain area alone is statistically associated with filled grain weight in Jingjing.

Table 13 presents the prediction performance of the single-variable model using projected grain area only. The model shows a coefficient of determination of

R^{2}

of 0.92, with RMSE, MAE, and MAPE values of 0.24, 0.19, and 7.39%, respectively. Compared with the two-variable models in Table 11, which use projected grain area and grain count, the

R^{2}

remains at 0.92. RMSE and MAE also remain at 0.24 and 0.19, respectively, while MAPE changes slightly from 7.20% to 7.39%, corresponding to a 2.64% increase. These results indicate that the prediction performance changes only slightly when grain count is excluded from the model.

Based on the backward elimination results, the retained independent variables differ across cultivars. Although the initial elimination criterion is set at

α = 0.05

, independent variables with p-values close to the threshold are considered weak contributors and are excluded when a simpler model can be obtained without substantial loss of predictive performance. For Huaidao, the full model includes three statistically significant variables, but grain count shows only marginal significance, with a p-value of 0.041. In contrast, the reduced model using panicle length and grain count retains both variables with smaller p-values and is therefore selected as the final model despite a moderate decrease in predictive performance. For Sidao, the reduced model using panicle length and grain count is selected because both variables remain statistically significant and the predictive performance is nearly identical to that of the full model. For Suxiu, panicle length is not significant in the full model (

p = 0.10

) and is therefore excluded. The reduced model using projected grain area and grain count is selected because both retained variables are significant and yield comparable predictive performance. For Jingjing, projected grain area remains highly significant across all model specifications, whereas grain count is not significant in the reduced model; therefore, the final model is simplified to a single-variable model using projected grain area only. Overall, these results indicate that the contribution of each panicle shape index to filled grain weight varies across cultivars and support the selection of cultivar-specific parsimonious models for regression-based prediction. Accordingly, the final selected models are the reduced length-count model for Huaidao, the reduced length-count model for Sidao, the reduced area-count model for Suxiu, and the single-variable area model for Jingjing.

4. Discussion

In Section 3, this study evaluates the performance of the proposed 2D image-based framework for rice panicle yield estimation. The framework consists of three analytical components, including grain detection using deep learning, panicle shape metric extraction through image processing, and regression-based grain weight prediction. The experimental results demonstrate that the proposed approach can effectively extract panicle morphological features from 2D images and utilize them for panicle-level grain weight estimation without requiring complex 3D reconstruction. The evaluation results further indicate that the framework achieves high prediction accuracy across multiple analysis stages, including grain detection, shape metric extraction, and regression-based weight estimation.

First, the object detection results confirm that YOLO-based models are well suited for detecting densely distributed rice grains. Among the evaluated models, YOLOv12 achieves the highest Recall (0.921) and F1-score (0.903), while maintaining comparable precision and mAP@50 values relative to YOLOv8 and YOLOv10. The Recall improvement of approximately 5.5% over YOLOv8 and 6.7% over YOLOv10 indicates that YOLOv12 reduces missed detections of small grain objects in densely packed panicle images. This characteristic is particularly important for grain counting tasks because missed detections directly propagate errors to subsequent yield estimation models. Although YOLOv12 requires longer training time it increases by approximately 51.6% compared with YOLOv8. This increase is acceptable in relation to the improvement in Recall performance. These results indicate that Recall is an important factor in image-based grain counting tasks.

The comparison with baseline detectors shows that the YOLO-based model achieves higher Recall than Faster R-CNN and RetinaNet (0.921, 0.560, and 0.700, respectively), demonstrating the suitability of the proposed model. SSD shows lower performance across all metrics, indicating limitations in detecting densely distributed small objects. Recall is important as it represents the proportion of true objects correctly detected, reflecting the model’s ability to minimize missed detections in dense small-object scenarios.

The high detection reliability of YOLOv12 also translates into accurate grain count estimation. The predicted grain counts achieve a coefficient of determination of

R^{2}

of 0.95 with relatively small prediction errors (MAE of 4.68 and RMSE of 6.24). This result indicates that object detection outputs can serve as reliable quantitative inputs for subsequent yield prediction models.

The regression analysis results show that panicle shape indices explain variation in filled grain weight. Across cultivars, the full regression models achieve coefficients of determination ranging from 0.89 to 0.97. However, the statistical significance of individual predictors varies among cultivars based on p-values, indicating differences in predictor contributions. The backward elimination procedure removes predictors with p-values greater than 0.05 and identifies cultivar-specific reduced models that retain statistically significant variables while maintaining similar prediction performance. For Huaidao, panicle length and grain count remain significant predictors. For Sidao and Suxiu, projected grain area and grain count remain significant (

p < 0.05

), whereas for Jingjing, projected grain area alone remains significant and maintains prediction accuracy (

R^{2}

of 0.92), indicating that additional predictors are not required. These results indicate that cultivar-specific modeling approaches are necessary, as the contribution and relevance of predictors differ across cultivars.

The proposed rice grain weight prediction module can be extended to a field-scale scenario. Specifically, cultivation of the Jingjing variety is assumed over a 200 m² field with an average panicle density of 300 panicles/m² [45]. The field is divided into 1 m² plots, and one panicle is sampled from each plot (n = 200) [37]. Using the mean predicted grain weight of 2.71 g per panicle from the Jingjing model (see Table 13), the total predicted yield is estimated as

200 \times 300 \times 2.71 = 162.6 k g

. To align with standard agronomic reporting, this estimation is extrapolated to a hectare scale:

(162.6 k g / {200 m}^{2}) \times 10,000 m^{2} / h a = 8130 k g / h a

. This result is comparable to the reported yield of

7912 k g / h a

in Huaiyin District [46], a specific area within the Huaian region where the study dataset was acquired.

The mean absolute prediction error and its standard deviation are 0.19 g and 0.15 g, respectively. Accordingly, the standard error of the mean absolute error is calculated as

S E = 0.15 / \sqrt{200} \approx 0.0106

g, with a 95% t-critical value of

t_{0.025.199} \approx 1.972

. When extrapolated to the field scale, the estimated total absolute error is

E_{t o t a l} = 200 \times 300 \times (0.19 \pm S E \times t_{0.025.199}) = 11.40 \pm 1.25

kg corresponding to an approximate error range of

10.15 - 12.65

kg relative to the total predicted yield of 162.6 kg.

The scanner-based acquisition employed in this study provides a standardized environment that minimizes external variables, such as lighting fluctuations and camera angles, allowing for the most precise quantification of panicle geometry. Establishing such a high-fidelity dataset is a fundamental step in ensuring the reliability of image-based prediction models [34]. These precise indices serve as a critical reference point for evaluating the accuracy of diverse data captured in less controlled field environments, thereby contributing to the fundamental standardization of precision agriculture technology [47].

5. Conclusions

This study proposes a 2D image-based framework for estimating rice panicle-level filled grain weight using deep learning-based grain detection, image processing-based shape metric extraction, and regression-based modeling. The proposed approach extracts key panicle traits, including grain count, projected grain area, and panicle length, from 2D images and uses them to estimate filled grain weight.

The experimental results show that YOLOv12 provides stable grain detection performance in densely distributed panicle images, achieving a Recall of 0.921 and an F1-score of 0.903. The model reduces missed detections of densely packed grain objects, supporting reliable grain counting. The detection outputs enable accurate grain count estimation, with an

R^{2}

of 0.95. In addition, panicle length is extracted from 2D images without manual measurement. Binarization-based shape analysis combined with pixel-to-physical scale calibration provides consistent projected grain area measurements, with

R^{2}

values exceeding 0.98 across cultivars. Using these shape metrics, regression models achieve coefficients of determination ranging from 0.89 to 0.97, depending on the cultivar. The results show that the contribution of individual predictors varies across cultivars, indicating that model structure should be defined separately for each cultivar.

Despite these results, several limitations remain. First, the framework is validated using a publicly available dataset collected under controlled scanning conditions. Specifically, the proposed framework was developed for scanner-acquired images under controlled acquisition conditions, where variations in illumination and imaging angle are minimized. Because precision agriculture inherently operates under natural and continuously changing field conditions, a scanner-based model has limited direct utility in field, real-time monitoring. Furthermore, Otsu thresholding may be sensitive to variable lighting conditions in field images; variations in lighting conditions, camera angles, and field environments may influence detection and segmentation accuracy in real-world applications. In addition, grain parameters may vary with grain filling status and environmental conditions. However, the present study focused on mature harvested samples, in which the observed morphology reflects the outcome of growth under those conditions. The framework was also applied across multiple cultivars to reduce cultivar-specific bias. Second, the study focuses on panicle-level weight prediction rather than field-scale yield estimation. Additional research is required to integrate the proposed framework with canopy-level monitoring systems and large-scale crop yield forecasting models.

Future research may extend the framework by incorporating adaptive binarization or deep learning-based semantic segmentation methods. Furthermore, broader validation under diverse environmental conditions remains a subject for future study. Future research may also involve incorporating multi-view images, temporal growth observations, or hybrid modeling approaches that combine image-derived phenotypic traits with environmental variables. Such extensions could further improve prediction robustness and support the development of practical precision agriculture systems.

Author Contributions

Conceptualization, D.K., H.L. and S.K.; methodology, D.K., H.L. and S.K.; software, D.K., H.L. and S.K.; validation, D.K., H.L. and S.K.; formal analysis, D.K., H.L. and S.K.; investigation, D.K., H.L. and S.K.; resources, S.K.; writing—original draft, D.K., H.L. and S.K.; writing—review and editing, D.K., H.L. and S.K.; visualization, D.K. and H.L.; funding acquisition, S.K.; supervision, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. RS-2023-00239448).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors gratefully acknowledge the support of the National Research Foundation of Korea (NRF) of Korea and the Ministry of Education. The views expressed in this paper are solely those of the authors and do not represent the opinions of the funding agency.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Food and Agriculture Organization of the United Nations (FAO). FAO Rice Information; FAO: Rome, Italy, 2002; Volume 3, Available online: https://www.fao.org/4/y4347e/y4347e01.htm (accessed on 22 January 2026).
Shi, Y.; Guo, Y.; Wang, Y.; Li, M.; Li, K.; Liu, X.; Fang, C.; Luo, J. Metabolomic analysis reveals nutritional diversity among three staple crops and three fruits. Foods 2022, 11, 550. [Google Scholar] [CrossRef]
Korea Rural Economic Institute (KREI). World Grain Market Outlook and Current Status. Available online: https://repository.krei.re.kr (accessed on 22 January 2026).
Food and Agriculture Organization of the United Nations (FAO). Cereal Supply and Demand Brief: World Food Situation; FAO: Rome, Italy, 2025; Available online: https://www.fao.org/worldfoodsituation/csdb/en (accessed on 23 January 2026).
Jabed, M.A.; Murad, M.A.A. Crop yield prediction in agriculture: A comprehensive review of machine learning and deep learning approaches, with insights for future research and sustainability. Heliyon 2024, 10, e40836. [Google Scholar] [CrossRef] [PubMed]
Wang, L.; Zhong, D.; Chen, X.; Niu, Z.; Cao, Q. Impact of climate change on rice growth and yield in China: Analysis based on climate year type. Geogr. Sustain. 2024, 5, 548–560. [Google Scholar] [CrossRef]
Yu, J.; Du, T.; Zhang, P.; Ma, Z.; Chen, X.; Cao, J.; Li, H.; Li, T.; Zhu, Y.; Xu, F. Impacts of high temperatures on the growth and development of rice and measures for heat tolerance regulation: A review. Agronomy 2024, 14, 2811. [Google Scholar] [CrossRef]
Hu, T.; Zhang, X.; Khanal, S.; Wilson, R.; Leng, G.; Toman, E.M.; Wang, X.; Li, Y.; Zhao, K. Climate change impacts on crop yields: A review of empirical findings, statistical crop models, and machine learning methods. Environ. Model. Softw. 2024, 179, 106119. [Google Scholar] [CrossRef]
Bai, S.; Hong, J.; Li, L.; Su, S.; Li, Z.; Wang, W.; Zhang, F.; Liang, W.; Zhang, D. Dissection of the genetic basis of rice panicle architecture using a genome-wide association study. Rice 2021, 14, 77. [Google Scholar] [CrossRef]
Whan, A.P.; Smith, A.B.; Cavanagh, C.R.; Ral, J.-P.F.; Shaw, L.M.; Howitt, C.A.; Bischof, L. GrainScan: A low cost, fast method for grain size and colour measurements. Plant Methods 2014, 10, 23. [Google Scholar] [CrossRef] [PubMed]
Yin, C.; Zhu, Y.; Li, X.; Lin, Y. Molecular and genetic aspects of grain number determination in rice (Oryza sativa L.). Int. J. Mol. Sci. 2021, 22, 728. [Google Scholar] [CrossRef]
Sim, J.; Cho, J.; Lee, K.; Lee, Y. AI-Based Paddy Rice Yield Prediction Using Satellite Images, Meteorological Data, and Digital Elevation Model: Case Study of South Korea, 2000–2023. Korean J. Remote Sens. 2024, 40, 1195–1208. [Google Scholar] [CrossRef]
Zhao, S.; Gu, J.; Zhao, Y.; Hassan, M.; Li, Y.; Ding, W. A method for estimating spikelet number per panicle: Integrating image analysis and a 5-point calibration model. Sci. Descr. 2015, 5, 16241. [Google Scholar] [CrossRef]
Akhtar, M.S.; Zafar, Z.; Nawaz, R.; Fraz, M.M. Unlocking plant secrets: A systematic review of 3D imaging in plant phenotyping techniques. Comput. Electron. Agric. 2024, 222, 109033. [Google Scholar] [CrossRef]
Gong, L.; Lin, K.; Wang, T.; Liu, C.; Yuan, Z.; Zhang, D.; Hong, J. Image-based on-panicle rice [Oryza sativa L.] grain counting with a prior edge wavelet correction model. Agronomy 2018, 8, 91. [Google Scholar] [CrossRef]
Tang, R. Mathematical Methods for Camera Self-Calibration in Photogrammetry and Computer Vision. Ph.D. Thesis, University of Stuttgart, Stuttgart, Germany, 2013. [Google Scholar]
Forero, M.G.; Murcia, H.F.; Méndez, D.; Betancourt-Lozano, J. LiDAR platform for acquisition of 3D plant phenotyping database. Plants 2022, 11, 2199. [Google Scholar] [CrossRef]
Yang, X.; Lu, X.; Xie, P.; Guo, Z.; Fang, H.; Fu, H.; Hu, X.; Sun, Z.; Cen, H. PanicleNeRF: Low-cost, high-precision in-field phenotyping of rice panicles with smartphone. Plant Phenomics 2024, 6, 0279. [Google Scholar] [CrossRef]
An, N.; Welch, S.M.; Markelz, R.C.; Baker, R.L.; Palmer, C.M.; Ta, J.; Maloof, J.N.; Weinig, C. Quantifying time-series of leaf morphology using 2D and 3D photogrammetry methods for high-throughput plant phenotyping. Comput. Electron. Agric. 2017, 135, 222–232. [Google Scholar] [CrossRef]
Fahlgren, N.; Gehan, M.A.; Baxter, I. Lights, camera, action: High-throughput plant phenotyping is ready for a close-up. Curr. Opin. Plant Biol. 2015, 24, 93–99. [Google Scholar] [CrossRef]
Grundland, M.; Dodgson, N.A. Decolorize: Fast, contrast enhancing, color to grayscale conversion. Pattern Recognit. 2007, 40, 2891–2896. [Google Scholar] [CrossRef]
Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef]
Rosenfeld, A.; Pfaltz, J.L. Sequential operations in digital picture processing. J. ACM (JACM) 1966, 13, 471–494. [Google Scholar] [CrossRef]
Li, L.; Zhang, Q.; Huang, D. A review of imaging techniques for plant phenotyping. Sensors 2014, 14, 20078–20111. [Google Scholar] [CrossRef]
Walsh, J.J.; Mangina, E.; Negrão, S. Advancements in imaging sensors and AI for plant stress detection: A systematic literature review. Plant Phenomics 2024, 6, 0153. [Google Scholar] [CrossRef]
Tanabata, T.; Shibaya, T.; Hori, K.; Ebana, K.; Yano, M. SmartGrain: High-throughput phenotyping software for measuring seed shape through image analysis. Plant Physiol. 2012, 160, 1871–1880. [Google Scholar] [CrossRef] [PubMed]
Zu, Q.; Liu, T.; Zhu, W.; Pan, Y.; Wang, J.; Song, X.; Yu, J.; Dang, S.; Yu, X.; Zhang, Z. Automated seed counting using image processing and deep learning. Front. Plant Sci. 2025, 16, 1659781. [Google Scholar] [CrossRef] [PubMed]
Singh, A.; Ganapathysubramanian, B.; Singh, A.K.; Sarkar, S. Machine learning for high-throughput stress phenotyping in plants. Trends Plant Sci. 2016, 21, 110–124. [Google Scholar] [CrossRef]
Itakura, K.; Hosoi, F. Automatic leaf segmentation for estimating leaf area and leaf inclination angle in 3D plant images. Sensors 2018, 18, 3576. [Google Scholar] [CrossRef] [PubMed]
Rose, J.C.; Paulus, S.; Kuhlmann, H. Accuracy analysis of a multi-view stereo approach for phenotyping of tomato plants at the organ level. Sensors 2015, 15, 9651–9665. [Google Scholar] [CrossRef]
Mach, J.; Svatý, Z.; Šoupa, O.; Nouzovský, L.; Halecký, M. Implementation of an SfM-MVS-based photogrammetry approach for detailed 3D reconstruction of plants. Plant Methods 2025, 21, 127. [Google Scholar] [CrossRef]
Lin, Y. LiDAR: An important tool for next-generation phenotyping technology of high potential for plant phenomics? Comput. Electron. Agric. 2015, 119, 61–73. [Google Scholar] [CrossRef]
Gao, K.; Gao, Y.; He, H.; Lu, D.; Xu, L.; Li, J. Nerf: Neural radiance field in 3d vision, a comprehensive review. arXiv 2022, arXiv:2210.00379. [Google Scholar]
Araus, J.L.; Cairns, J.E. Field high-throughput phenotyping: The new crop breeding frontier. Trends Plant Sci. 2014, 19, 52–61. [Google Scholar] [CrossRef]
Munjal, R.; Benıwal, J.; Dhundwal, A.; Goyal, A.; Kumarı, A.; Behl, R.K. Accelerating crop breeding in the 21st century: A comprehensive review of next generation phenotyping techniques and strategies. Ekin J. Crop Breed. Genet. 2023, 9, 160–171. [Google Scholar]
Jegham, N.; Koh, C.Y.; Abdelatti, M.; Hendawi, A. Yolo evolution: A comprehensive benchmark and architectural review of yolov12, yolo11, and their previous versions. arXiv 2024, arXiv:2411.00201. [Google Scholar]
Zhao, S.; Zheng, H.; Chi, M.; Chai, X.; Liu, Y. Rapid yield prediction in paddy fields based on 2D image modelling of rice panicles. Comput. Electron. Agric. 2019, 162, 759–766. [Google Scholar] [CrossRef]
Tian, Y.; Ye, Q.; Doermann, D. Yolov12: Attention-centric real-time object detectors. arXiv 2025, arXiv:2502.12524. [Google Scholar]
Yang, D.; Yang, H.; Liu, D.; Wang, X. Research on automatic 3D reconstruction of plant phenotype based on Multi-View images. Comput. Electron. Agric. 2024, 220, 108866. [Google Scholar] [CrossRef]
Ultralytics. YOLO12: Attention-Centric Object Detection. Available online: https://docs.ultralytics.com/ko/models/yolo12/ (accessed on 20 February 2026).
Su, Y.; Xiao, L.-T. 3D visualization and volume-based quantification of rice chalkiness in vivo by using high resolution micro-CT. Rice 2020, 13, 69. [Google Scholar] [CrossRef] [PubMed]
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28; MIT Press: Cambridge, MA, USA, 2015. [Google Scholar]
Zhang, H.; Chang, H.; Ma, B.; Shan, S.; Chen, X. Cascade RetinaNet: Maintaining consistency for single-stage object detection. arXiv 2019, arXiv:1907.06881. [Google Scholar]
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
Li, G.; Zhang, J.; Yang, C.; Song, Y.; Zheng, C.; Wang, S.; Liu, Z.; Ding, Y. Optimal yield-related attributes of irrigated rice for high yield potential based on path analysis and stability analysis. Crop J. 2014, 2, 235–243. [Google Scholar] [CrossRef]
Yuqi, P.; Penghui, J.; Manchun, L.; Dengshuai, C. Temporal Variation Analysis of Rice Yield in the Jiangsu Province, China: Application of Decision Support System for Agrotechnology Transfer Model. Glob. J. Agric. Innov. Res. Dev. 2022, 9, 81–99. [Google Scholar] [CrossRef]
van Dijk, A.D.J.; Kootstra, G.; Kruijer, W.; de Ridder, D. Machine learning in plant science and plant breeding. iScience 2021, 24, 101890. [Google Scholar] [CrossRef]

Figure 1. Overview of the 2D image-based rice yield prediction framework.

Figure 2. Example of 2D image labeling of rice panicles [37].

Figure 3. Example of rice panicle area segmentation based on binarization.

Figure 4. YOLOv12-based automatic grain detection sample.

Table 1. Comparison of crop modeling techniques based on technical characteristics.

Analysis Category	2D Image-Based Modeling	3D Reconstruction-Based Modeling
Core Algorithms	Otsu, CCA, YOLO, CNN	SfM, MVS, Point Cloud, NeRF
Data Processing	Planar image analysis	Multi-view 3D reconstruction
Extracted Features	Area, Length, Grain count	Volume, Area, Spatial distribution
Computational Cost	Low	High
Limitations	Occlusion, stem loss	High cost, Complex pipeline

Table 2. Comparison of detection performance of YOLO algorithms.

Model	Precision	Recall	mAP@50	F1-Score	Training Time (min)
YOLOv5	0.615	0.636	0.649	0.625	250 min
YOLOv8	0.904	0.866	0.926	0.901	314 min
YOLOv10	0.886	0.854	0.933	0.87	432 min
YOLOv12	0.886	0.921	0.927	0.903	476 min

Table 3. Comparison results of CNN-based object detection models.

Model	Precision	Recall	mAP@50	F1-Score	Training Time (min)
Faster-R-CNN	0.749	0.560	0.454	0.641	1151 min
RetinaNet	0.952	0.700	0.696	0.807	1013 min
SSD	0.163	0.120	0.029	0.138	281 min
YOLOv12	0.886	0.921	0.927	0.903	476 min

Table 4. Prediction performance of YOLOv12 for the number of panicles per image.

Model	$R^{2}$	MAE (Counts)	RMSE (Counts)	MAPE (%)
YOLOv12	0.95	4.68	6.24	3.56

Table 5. Comparison of estimated area values with measured values.

Cultivar	$R^{2}$	MAE (cm²)	RMSE (cm²)	MAPE (%)
Huaidao	0.99	0.24	0.20	1.00
Sidao	0.99	0.37	0.27	1.41
Suxiu	0.98	0.52	0.33	5.10
Jingjing	0.99	0.54	0.43	2.20

Table 6. Statistical significance of independent variables in the full regression model across cultivars.

Cultivar	p-Value of Independent Variables
Cultivar	Length	Area	Count
Huaidao	$9.9 \times 10^{- 3}$	$2.4 \times 10^{- 18}$	0.041
Sidao	0.30	$2.4 \times 10^{- 11}$	$1.0 \times 10^{- 4}$
Suxiu	0.10	$6.0 \times 10^{- 5}$	$6.0 \times 10^{- 8}$
Jingjing	0.23	$1.4 \times 10^{- 24}$	0.15

Table 7. Comparison of regression performance across cultivars in the full model.

Cultivar	$R^{2}$	RMSE (g)	MAE (g)	MAPE (%)
Huaidao	0.89	0.26	0.20	6.45
Sidao	0.97	0.16	0.12	3.79
Suxiu	0.96	0.16	0.11	1.15
Jingjing	0.92	0.24	0.19	7.34

Table 8. Statistical significance of independent variables in the reduced regression model using panicle length and grain count.

Cultivar	p-Value of Independent Variables
Cultivar	Length	Count
Huaidao	$4.2 \times 10^{- 9}$	$1.1 \times 10^{- 29}$

Table 9. Comparison of regression performance in the reduced model using panicle length and grain count.

Cultivar	$R^{2}$	RMSE (g)	MAE (g)	MAPE (%)
Huaidao	0.84	0.32	0.24	7.80

Table 10. Statistical significance of independent variables in the reduced regression model using projected grain area and grain count.

Cultivar	p-Value of Independent Variables
Cultivar	Area	Count
Sidao	$7.8 \times 10^{- 12}$	$6.2 \times 10^{- 5}$
Suxiu	$4.2 \times 10^{- 6}$	$7.8 \times 10^{- 8}$
Jingjing	$2.7 \times 10^{- 31}$	0.098

Table 11. Comparison of regression performance in the reduced model using projected grain area and grain count.

Cultivar	$R^{2}$	RMSE (g)	MAE (g)	MAPE (%)
Sidao	0.96	0.17	0.13	3.82
Suxiu	0.96	0.16	0.11	4.70
Jingjing	0.92	0.24	0.19	7.20

Table 12. Statistical significance of independent variables in the single-variable regression model using projected grain area.

Cultivar	p-Value of Independent Variables
Cultivar	Area
Jingjing	$5.4 \times 10^{- 108}$

Table 13. Regression performance of the single-variable model using projected grain area for Jingjing.

Cultivar	$R^{2}$	RMSE (g)	MAE (g)	MAPE (%)
Jingjing	0.92	0.24	0.19	7.39

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kim, D.; Lim, H.; Kim, S. Development of a 2D Image-Based Rice Panicle-Level Yield Prediction Framework Using Image-Based Reconstruction Technique. Agronomy 2026, 16, 896. https://doi.org/10.3390/agronomy16090896

AMA Style

Kim D, Lim H, Kim S. Development of a 2D Image-Based Rice Panicle-Level Yield Prediction Framework Using Image-Based Reconstruction Technique. Agronomy. 2026; 16(9):896. https://doi.org/10.3390/agronomy16090896

Chicago/Turabian Style

Kim, Daehong, Hyeongjun Lim, and Sojung Kim. 2026. "Development of a 2D Image-Based Rice Panicle-Level Yield Prediction Framework Using Image-Based Reconstruction Technique" Agronomy 16, no. 9: 896. https://doi.org/10.3390/agronomy16090896

APA Style

Kim, D., Lim, H., & Kim, S. (2026). Development of a 2D Image-Based Rice Panicle-Level Yield Prediction Framework Using Image-Based Reconstruction Technique. Agronomy, 16(9), 896. https://doi.org/10.3390/agronomy16090896

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Development of a 2D Image-Based Rice Panicle-Level Yield Prediction Framework Using Image-Based Reconstruction Technique

Abstract

1. Introduction

2. Materials and Methods

2.1. 2D & 3D Modeling Method of Crops

2.2. 2D Image-Based Rice Yield Prediction Framework

2.2.1. Data Module

2.2.2. Rice Panicle and 2D Surface Area Estimation Module

2.2.3. Unit Correction and Verification Module

2.2.4. Rice Panicle Weight Prediction Module

3. Results

3.1. Grain Detection Performance

3.2. Performance of Pixel-to-Physical Scale Calibration

3.3. Shape Metrics Based Grain Panicle-Level Grain Weight Reconstruction

4. Discussion

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI