Semi-Supervised Density Estimation with Background-Augmented Data for In Situ Seed Counting

Sung, Baek-Gyeom; Lee, Chun-Gu; Kang, Yeong-Ho; Yu, Seung-Hwa; Lee, Dae-Hyun

doi:10.3390/agriculture15151682

by Baek-Gyeom Sung ¹, Chun-Gu Lee ², Yeong-Ho Kang ³, Seung-Hwa Yu ²,* and Dae-Hyun Lee ¹,*

¹ Department of Smart Agriculture Systems Machinery Engineering, Chungnam National University, Daejeon 34134, Republic of Korea

² Department of Agriculture Engineering, National Institute of Agricultural Sciences, Jeonju 54875, Republic of Korea

³ Department of Crops and Food, Jeonbuk State Agricultural Research and Extension Services, Iksan 54591, Republic of Korea

* Authors to whom correspondence should be addressed.

Agriculture 2025, 15(15), 1682; https://doi.org/10.3390/agriculture15151682

Submission received: 21 May 2025 / Revised: 26 July 2025 / Accepted: 31 July 2025 / Published: 4 August 2025

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

Abstract

Direct seeding has gained prominence as a labor-efficient and environmentally sustainable alternative to conventional transplanting in rice cultivation. In direct seeding systems, early-stage management is crucial for stable seedling establishment, with sowing uniformity measured by seed counts being a critical indicator of success. However, conventional manual seed counting methods are time-consuming, prone to human error, and impractical for large-scale or repetitive tasks, necessitating advanced automated solutions. Recent advances in computer vision technologies and precision agriculture tools offer the potential to automate seed counting tasks. Nevertheless, challenges such as domain discrepancies and limited labeled data restrict robust real-world deployment. To address these issues, we propose a density estimation-based seed counting framework integrating semi-supervised learning and background augmentation. This framework includes a cost-effective data acquisition system enabling diverse domain data collection through indoor background augmentation, combined with semi-supervised learning to utilize augmented data effectively while minimizing labeling costs. The experimental results on field data from unknown domains show that our approach reduces seed counting errors by up to 58.5% compared to conventional methods, highlighting its potential as a scalable and effective solution for agricultural applications in real-world environments.

Keywords:

direct seeding; sowing uniformity; seed counting; density estimation; semi-supervised learning; background augmentation

1. Introduction

Rice, a staple food in many countries [1], is a globally important food crop cultivated through transplanting and direct seeding. Transplanting offers higher yields because growth management, such as weed control, is easier [2]; therefore, most Asian countries rely on rice transplanting. The Republic of Korea is one of the major users of this method, with direct seeding accounting for only 1.5% of its rice production. However, transplanting requires considerable labor and water resources and is increasingly viewed as inefficient in the context of an aging agricultural workforce and the ongoing water crisis [3]. Although several advanced transplanting machines and techniques have reduced the labor needed for transplanting, over 60% of the labor is still required for seeding preparation, such as raising seedlings and puddling fields, compared to direct seeding methods [4]. This problem has prompted Asian countries to shift their rice planting methods from transplanting to direct seeding, a transition that has been further accelerated by advancements in precision farming systems such as unmanned aerial vehicles [5].

Direct seeding systems sow seeds directly onto the soil with minimal input requirements, and ensuring an appropriate seeding rate per unit area is critical for crop establishment. In other words, it is essential to ensure that the seeding density is suitable for the required production yield. Uneven seeding can lead to excessive competition among crops or facilitate the invasion of weeds, and it can also result in variations in growth rates, ultimately reducing overall productivity [6]. Therefore, ensuring sowing uniformity, which represents how evenly seeds are spatially distributed across the field after sowing, is critical in direct rice seeding [7,8]. Sowing uniformity is generally evaluated based on the number of seeds after sowing, often using indicators such as the coefficient of variation [9]. Conventionally, seed counting has been performed manually. This manual monitoring not only results in high error rates but also demands considerable time and labor, making it difficult to apply to large-scale operations [10]. Therefore, alternative methods that can automatically count seeds are needed, and they must be capable of covering large-scale fields with consistent performance. Automated seed counting can practically contribute to the improvement of yield and survival rates in the direct seeding of rice [11].
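As a brief illustration of how the coefficient of variation can summarize sowing uniformity, the following sketch computes it from per-plot seed counts. The function name, the sample counts, and the use of the population standard deviation are assumptions made here for illustration; the exact evaluation protocol is not specified in this section.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Return the CV (%) of per-plot seed counts; lower values mean more uniform sowing."""
    avg = mean(counts)
    if avg == 0:
        raise ValueError("CV is undefined when the mean seed count is zero")
    # Assumption: population standard deviation; a sample standard deviation could also be used.
    return pstdev(counts) / avg * 100.0


# Hypothetical seed counts from five equally sized sampling plots.
uniform_plots = [100, 102, 98, 101, 99]
uneven_plots = [40, 160, 95, 130, 75]

print(f"uniform sowing CV: {coefficient_of_variation(uniform_plots):.2f}%")
print(f"uneven sowing CV:  {coefficient_of_variation(uneven_plots):.2f}%")
```

Because the CV depends only on seed counts per plot, any automated counter that returns reliable counts can feed this indicator directly.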

Several researchers have investigated the automatic evaluation of rice seed counts using image processing methods. Lin et al. [12] proposed a node matching algorithm based on the contour curvature analysis of seeds. Jing et al. [13] performed rice seed counting using a binary segmentation technique based on a dynamic thresholding method. Wu et al. [14] proposed “Gain TKW”, a system for the measurement of thousand-kernel weight using the k-means algorithm and marker-controlled watershed algorithm. Methods that perform counting using geometric features via image processing require complex feature extraction processes and deliver lower generalization performance when seeds overlap or have irregular shapes. Therefore, researchers have turned to deep learning for seed counting to overcome these limitations [15]. Deep learning models, particularly convolutional neural networks (CNNs), automatically extract multilevel features from raw images, learning simple patterns such as edges and shapes in early layers and more complex seed structures in deeper layers. This hierarchical learning process enables robust detection, even under challenging conditions such as occlusions and irregular seed shapes. Sun et al. [16] proposed a rice seed counting algorithm based on Faster R-CNN, reporting high counting accuracy and efficient computational performance. Feng et al. [17] introduced a polished rice counting algorithm using a multicolumn CNN-based density map estimation method, which was originally applied in the crowd counting domain. This approach achieved high counting precision when applied to large quantities of rice, demonstrating its effectiveness in handling dense, overlapping grain distributions. While these studies reported high counting performance, they were primarily conducted in indoor environments, which are more stable and less disruptive than field conditions. In addition, previous methods counted seeds by detecting individual seeds, an approach that often suffers from performance degradation when applied to small or congested objects in field conditions, such as seeds on the soil.

Density map estimation, which enables the detection of both object distribution and object count, has demonstrated robustness and accuracy in highly congested scenes, as recently reported in the crowd counting domain [18]. In general, density map estimation is implemented by supervised learning methods and requires a large amount of labeled field data to ensure robustness under field conditions. For example, a representative crowd counting dataset, NWPU-Crowd, consists of 5109 images with a total of 2,133,375 annotated instances—a process that reportedly required 3000 h for annotation [19]. In contrast, the diversity of seed counting backgrounds is hard to sufficiently represent due to the high complexity and variability of soil conditions in agricultural environments [20], making it much more difficult to collect diverse labeled data than in the crowd counting domain. As a result, seed counting tasks often have to rely on much smaller datasets, which limits the effectiveness of supervised density map estimation under real-world applications.

Semi-supervised learning aims to improve training stability and model generalization when only a small amount of labeled data is available. It is usually combined with data augmentation methods such as Mixup [21], Cutout [22], and Cutmix [23], which have been shown to be effective in preventing model overfitting and enhancing generalization ability. However, these methods mainly manipulate pixel-level features without considering background variation, which restricts their ability to generalize—particularly in real-world agricultural environments where lighting, soil textures, and background diversity can vary considerably. In contrast, background augmentation offers a more robust solution by introducing realistic environmental variability. It modifies the background context around target objects while preserving their spatial integrity, which helps models adapt better to domain shifts [24]. This technique has shown notable effectiveness in domains requiring high model robustness and is therefore expected to enhance the generalization ability of models in agricultural applications with diverse and complex backgrounds.

Therefore, this study aims to establish a robust model for the automatic evaluation of direct rice seeding performance with limited labeled data. Rice seeds were counted based on density map estimation using a semi-supervised domain adaptation framework with a background augmented dataset. The rice seed dataset was constructed as pairs of training examples with identical seed distributions but different backgrounds. A semi-supervised learning framework was implemented to extract rice seed features and estimate seed density in a background-invariant manner. Our approach allows us to focus on an easier representation of seed features rather than complex background features, even with a small-scale dataset. This method is well suited for agriculture, a low-resource domain, and is expected to contribute to the automatic evaluation of seeding performance. The main contributions of this paper are as follows:

We employ a framework that combines semi-supervised learning and density map estimation to improve domain-invariant representation for seed counting under diverse agricultural environments.
We show that in situ seed density can be estimated using small-scale labeled data by leveraging a paired dataset constructed with background augmentation.
The proposed method outperforms previous methods on field data, thereby demonstrating superior generalization ability.

2. Materials and Methods

2.1. Task Overview

In this study, a seed counting framework is proposed to achieve practical counting performance in real-world applications, even with a small amount of labeled data. Ensuring consistent counting results across diverse agricultural backgrounds is a challenge for conventional supervised learning models trained on limited data. This study therefore combines consistency regularization in semi-supervised learning with background augmentation to encourage stable counting across different backgrounds, and the traditional mean teacher framework was adopted for this purpose. The mean teacher framework is designed to enhance the generalization performance of models on datasets with limited labeled data; strong performance has also been reported in the semi-supervised crowd counting domain [25]. The framework comprises a student model and a teacher model with the same structure: the student model is a general model that is gradually updated through learning, whereas the teacher model is updated based on the exponential moving average (EMA) of the student model’s weights, allowing the framework to produce more stable predictions. For labeled data, the framework is trained via general supervised learning; for unlabeled data, the same data are input into both the student model and the teacher model, and learning is performed so as to ensure prediction consistency between the two outputs [26]. Figure 1 shows this study’s approach, based on the mean teacher framework. In the supervised flow, labeled source data are used for density map estimation to facilitate representation learning for seed counting. In the unsupervised flow, background augmentation is applied to unlabeled source data and common outputs between unlabeled target data and unlabeled source data are encouraged, enabling the model to extract consistent features across various domain backgrounds. This approach enables the model to count seeds through generalized features and density map estimation across various domain backgrounds. The following sections describe the dataset, model architecture, and training methods used in the seed counting framework, followed by a discussion of the key results and conclusions.
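The EMA update of the teacher model described above can be sketched in a few lines. This is a minimal illustration on plain dictionaries of scalar weights rather than on a neural network; the function name and the decay value are assumptions, since the decay used in this study is not stated here.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the student weight: t <- decay * t + (1 - decay) * s."""
    for name, s_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * s_value
    return teacher


# Toy weights standing in for model parameters (hypothetical values).
teacher_weights = {"w": 0.0, "b": 1.0}
student_weights = {"w": 1.0, "b": 1.0}

# After each student optimization step, the teacher takes one EMA step.
for step in range(3):
    ema_update(teacher_weights, student_weights, decay=0.9)
    print(step, teacher_weights)
```

A decay close to 1 makes the teacher change slowly, which is what lets it smooth out step-to-step noise in the student and provide more stable targets.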

2.2. Dataset Construction

This study develops a device capable of acquiring images indoors via background augmentation. The method is designed to synchronize the domains of field backgrounds and indoor environments. Figure 2 shows the configuration of the device, which consists of four components: a reference surface, on which rice seeds are sown; a camera (RealSense D435i, Intel Corporation, Santa Clara, CA, USA) for capturing data; a reference background, which is a white background designed to facilitate the extraction of rice seed features without requiring additional domain knowledge for data labeling; and an augmented background, which simulates noise and background conditions similar to those of field data. The reference surface of the configured system is a transparent acrylic plate with dimensions of 0.42 m × 0.3 m (width × length), allowing images to be captured with only the background switched while maintaining the same seed distribution. In addition, the camera is positioned at a height of 0.5 m above the ground, aligned with the center of the system, enabling the top-view imaging of the entire system area, which corresponds to a capture area of 0.5 m × 0.32 m (width × length). An augmented background composed of soil and water was constructed to simulate an environment similar to that of a paddy field where rice seeding is performed. Data collection was conducted by manually seeding a random number of rice seeds on the reference surface and acquiring images while changing the background. During acquisition, the lighting condition was fixed in a top-down configuration. The seed arrangement was randomly adjusted by vibrating the surface. For each seed distribution, multiple images were collected by replacing only the background, thereby ensuring domain diversity. The seed count range during sowing was set to 0–250 seeds/m², more than twice the recommended seeding rate of 80–120 seeds/m² for rice direct seeding systems in the Republic of Korea [27]. A total of 250 experiments were conducted, resulting in 250 pairs of image samples, wherein each pair comprised a source image collected from the reference background and a target image collected from the augmented background.

Pairs of image samples collected through the device were processed through region of interest (ROI) extraction and pixel-level annotation and were converted into training data pairs consisting of source data representing labeled data, target data representing augmented background, and ground truth (GT), which contained the number of seeds in the image. Figure 3 shows the process of generating pairs of training examples from the source and target images.

The source data were extracted from a source image with a resolution of 1920 × 1080 pixels, from which a region of 1088 × 714 pixels was selected as the ROI to eliminate noise. The target data were also constructed by applying the same process to the target image. The GT was generated via pixel-level labeling based on the coordinates of seeds in the source data. Thereafter, a density map was created using the adaptive Gaussian kernel density function, as shown in Equation (1). The density map generated provided comprehensive information, not only representing the number of objects in the image but also capturing their spatial distribution, making it more effective for object counting than traditional object detection-based approaches. Furthermore, the sum of all pixel values across the density map corresponded to the total number of objects in the image [18,28].

$$F(x, y) = \sum_{i=1}^{P} \delta(x - x_i, y - y_i) * G_{\sigma_i}(x - x_i, y - y_i) \qquad (1)$$

where $(x_i, y_i)$ denotes the two-dimensional (2D) position of the $i$th seed, modeled as a delta function $\delta(x - x_i, y - y_i)$; $P$ is the number of annotated seed pixel points; $G_{\sigma_i}(x - x_i, y - y_i)$ is an adaptive 2D Gaussian kernel; and $\sigma_i$ is the average distance from the $i$th seed to its $k$ nearest neighbors. To generate the density map, we convolve each delta function with its 2D Gaussian kernel; in addition, the total number of seeds in the image can be calculated by summing all the pixel values over the generated density map.
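The construction in Equation (1) can be sketched with NumPy as follows. This is an illustrative implementation under stated assumptions, not the code used in this study: the function name, the value of `k`, the optional `scale` factor on the neighbor distance, the lower bound `min_sigma`, and the renormalization of each kernel (so that a seed near the image border still contributes exactly one count) are all choices made here.

```python
import numpy as np


def density_map(shape, points, k=3, scale=1.0, min_sigma=1.0):
    """Build a density map whose pixel sum equals the number of seed points.

    shape  : (height, width) of the output map
    points : iterable of (x, y) seed coordinates in pixels
    k, scale, min_sigma : illustrative parameters, not values from the study
    """
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    density = np.zeros(shape, dtype=float)
    for i, (x_i, y_i) in enumerate(points):
        others = np.delete(points, i, axis=0)
        if len(others) > 0:
            # sigma_i: average distance to the k nearest neighboring seeds.
            dists = np.sort(np.linalg.norm(others - points[i], axis=1))
            sigma = scale * dists[:k].mean()
        else:
            sigma = min_sigma  # single seed: fall back to a fixed kernel width
        sigma = max(sigma, min_sigma)
        kernel = np.exp(-((xs - x_i) ** 2 + (ys - y_i) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()  # each seed adds a unit-sum Gaussian
    return density


# Hypothetical seed coordinates on a small 64 x 64 image.
seeds = [(10, 12), (30, 40), (50, 20), (32, 32)]
dm = density_map((64, 64), seeds)
print("seed count from density map:", round(float(dm.sum()), 6))
```

Summing the map recovers the seed count, while the spread of each Gaussian adapts to local crowding: closely spaced seeds receive narrow kernels and isolated seeds receive wide ones.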

In this study, each training example comprised two images with different backgrounds and one GT, all derived from the same seed distribution. A total of 250 such training examples were created and randomly divided into train, validation, and test sets at a ratio of 4:1:1 for model training.

In addition, a field dataset was prepared to evaluate the generalization ability of the model. The field dataset consisted of 73 images collected under real-world soil backgrounds different from the domain used herein. The images were captured from a top view at varying ROIs, 0.5 m above the ground, using the same camera that was used during dataset construction. This dataset was obtained from two distinct field environments: an upland field at Chungnam National University (36°22′03.8″ N, 127°21′10.6″ E) and a paddy field in Songsan-myeon, Dangjin-si, Chungcheongnam-do (36°55′49.0″ N, 126°38′02.1″ E). These field data were not used during model training, and all images were labeled according to the same procedure as the training dataset. The number of images in each dataset, along with their domain type, is summarized in Table 1.

2.3. Density Estimation Model

The density map data used in the framework contained objects of various scales, which were observed to be distributed and overlapped. The learning model therefore required the capability to extract multiscale features and preserve the spatial information of features. Accordingly, the feature pyramid network (FPN) was adopted as the student–teacher model, enabling the extraction and fusion of feature vectors of various sizes to obtain a robust feature map while restoring resolution to preserve positional information [29]. Figure 4 shows the FPN model used in the seed counting framework. Features (z) extracted by the “backbone” pass through a top-down pathway, where high-level features are up-sampled and then concatenated with low-level features. Following this, the “head” receives the combined feature map and performs density map estimation. The backbone of the FPN adopts ResNet-50 [30], with the pooling layer removed, to extract the multiscale features of seeds. This approach has been reported to not only enhance feature extraction performance but also effectively extract features of various dimensions [31]. The head, which replaces the removed pooling layer, employs joint regression and classification modeling and is composed of a regression head, which outputs a density map, and a classification head, which outputs a density level class map. This method provides additional information regarding the density levels of pixels within the image, helping prevent learning from incorrect pseudo labels in semi-supervised learning. In addition, because the two distinct objectives are learned in a mutually complementary manner, the generalization of the model is enhanced and overfitting is prevented [32]. The regression head follows the common configuration of models used in the crowd counting task, comprising three convolutional layers, where the kernel size of each of the first two layers is set to 3 and that of the final layer to 1. The classification head is configured with two convolutional layers, each with a kernel size of 1, allowing it to restore key features while introducing only a small number of additional parameters [25].

The input size of the model was 512 × 512 × 3 pixels (height × width × channel). In the top-down pathway, a downsampling ratio of 8, commonly used in the crowd counting domain utilizing density map estimation, was maintained. Based on a feature vector (z) with a size of 16 × 16 × 2048 pixels, feature maps of sizes 32 × 32 × 1024 pixels and 64 × 64 × 512 pixels were sequentially fused to construct a final feature map with a size of 64 × 64 × 512 pixels. The final feature map was fed into the head, which produced two outputs: a density map with a size of 64 × 64 × 1 pixels, representing the seed density within the image, and a class map with a size of 64 × 64 × 25 pixels, representing pixel-level density classes. These specifications are summarized in Table 2.
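The feature-map sizes quoted above follow from dividing the input size by the stride of each backbone stage. The short check below reproduces that arithmetic; the stage labels `C3` to `C5` are conventional names used here for illustration and do not appear in the original text.

```python
INPUT_SIZE = 512  # model input is 512 x 512 x 3

# (label, stride relative to the input, channels); labels are illustrative only.
STAGES = [("C3", 8, 512), ("C4", 16, 1024), ("C5", 32, 2048)]


def feature_shape(input_size, stride, channels):
    """Spatial size shrinks by the stride; channel depth is fixed per stage."""
    side = input_size // stride
    return (side, side, channels)


shapes = {name: feature_shape(INPUT_SIZE, stride, ch) for name, stride, ch in STAGES}
for name, shape in shapes.items():
    print(name, shape)

# The head keeps the stride-8 resolution, so each output cell covers 8 x 8 input pixels.
density_shape = feature_shape(INPUT_SIZE, 8, 1)
class_shape = feature_shape(INPUT_SIZE, 8, 25)
print("density map:", density_shape, "class map:", class_shape)
```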

2.4. Semi-Supervised Training and Implementation

In this paper, a perturbation strategy was used in the mean teacher framework to enhance the robustness of semi-supervised learning. This method comprises weak perturbation and strong perturbation and has been reported to help the semi-supervised learning model maintain stable prediction performance under various noise conditions [33,34]. Figure 5 shows the semi-supervised learning process of the seed counting framework with perturbation applied.

We now discuss the supervised flow of the student model, in which only weak perturbation was applied. Weak perturbation involved cropping a random section of 512 × 512 × 3 pixels from the original image of a size of 1088 × 714 × 3 pixels, followed by applying random grayscale transformation and horizontal flipping to generate the input image. This method enabled the model to learn from diverse distributions without relying on color statistics. The training process in the supervised flow was conducted as standard supervised learning. The student model generated outputs using labeled source data with weak perturbation applied. The supervised loss was calculated by comparing the model’s output with the GT, and the student model’s parameters were updated through backpropagation. The supervised loss used in the training process was the sum of loss functions designed for the regression and classification tasks of the respective heads. In the regression head, the regression loss was based on the structural similarity index measure (SSIM) between the density map output by the model and the GT density map. In the classification head, the classification loss comprised cross-entropy loss and dice loss between the probability map output by the model and the GT probability map. The GT probability map was generated by defining 25 density levels in the GT density map, wherein the density value of each pixel in the density map was converted into a corresponding density level class. Consequently, supervised learning was performed on the labeled source data, wherein the distinction between the background and seeds was clear, enabling the student model to extract features of the seeds and acquire counting capabilities based on density map estimation. After the supervised flow, the parameters of the teacher model were updated through the EMA of the weights from the student model. This allowed the teacher model to avoid the overfitting that may have originated from the student model’s supervised learning on limited labeled data and to deliver generalizable feature extraction performance. Furthermore, the quality of pseudo labels provided by the teacher model to the student model for unlabeled data was observed to be enhanced [25,35].
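The weak perturbation used in the supervised flow (random 512 × 512 crop, random grayscale, and horizontal flip) can be sketched as below. The function name and the probabilities `p_gray` and `p_flip` are assumptions, as the probabilities are not reported here, and the grayscale conversion is simplified to a channel mean. The same crop and flip are applied to the GT density map so that the image and its label stay aligned.

```python
import numpy as np


def weak_perturbation(image, density, rng, crop=512, p_gray=0.2, p_flip=0.5):
    """Apply a shared random crop and flip to an image and its density map.

    image   : array of shape (H, W, 3)
    density : array of shape (H, W)
    p_gray, p_flip : hypothetical probabilities, not taken from the study
    """
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    img = image[top:top + crop, left:left + crop].astype(np.float32)
    den = density[top:top + crop, left:left + crop]
    if rng.random() < p_gray:
        # Simplified grayscale: replace every channel by the channel mean.
        gray = img.mean(axis=2, keepdims=True)
        img = np.repeat(gray, 3, axis=2)
    if rng.random() < p_flip:
        # Horizontal flip of both the image and the density map.
        img = img[:, ::-1]
        den = den[:, ::-1]
    return np.ascontiguousarray(img), np.ascontiguousarray(den)


rng = np.random.default_rng(0)
# A 1088 x 714 (width x height) ROI is stored as a (714, 1088, 3) array.
roi = rng.integers(0, 256, size=(714, 1088, 3)).astype(np.uint8)
gt_density = np.zeros((714, 1088))
patch, patch_density = weak_perturbation(roi, gt_density, rng)
print(patch.shape, patch_density.shape)
```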

We now discuss the unsupervised flow of semi-supervised learning, wherein both weak perturbation and strong perturbation were applied. The training process of the unsupervised flow proceeded by using the teacher model’s output obtained on unlabeled source data, with weak perturbation applied, as pseudo labels. The unsupervised loss was then calculated between the student model’s output on unlabeled target data, with strong perturbation applied, and the pseudo labels. Finally, the student model was updated through backpropagation. The strong perturbation used in the training followed the same process as that for weak perturbation, with the addition of cutout. Cutout is a technique that removes a specific region of an image and fills it with zero pixels. This ensured that the student model did not overfit to the pixel-level features of the augmented background, allowing it to effectively extract features by capturing the contextual distribution of seeds across the entire image [33,34]. Herein, 32 × 32 masks were applied to occlude 30% of the image area, a configuration reported to have delivered high performance. The training objective of the unsupervised flow was to maintain prediction consistency between the teacher model and the student model, and therefore, the unsupervised loss employed the same L1 loss for both heads. Consequently, the student model was updated on unlabeled data, enabling it to acquire robust feature extraction capabilities for seeds and perform density map reconstruction, irrespective of the background. Moreover, the teacher model, updated through EMAs of the student model’s parameters, was able to provide pseudo labels with background-invariant features to the student model [19,25].
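The cutout step and the L1 consistency objective can be sketched as follows. How the 32 × 32 masks are positioned is not described here, so this sketch assumes non-overlapping patches chosen at random on a regular grid until roughly 30% of the area is covered; the function names are illustrative.

```python
import numpy as np


def cutout(image, rng, patch=32, ratio=0.3):
    """Zero out randomly chosen, non-overlapping patch x patch cells of the image.

    Assumption: cells lie on a regular grid; the source only states the mask
    size (32 x 32) and the occluded fraction (30%).
    """
    h, w = image.shape[:2]
    rows, cols = h // patch, w // patch
    n_mask = int(round(ratio * rows * cols))
    chosen = rng.choice(rows * cols, size=n_mask, replace=False)
    out = image.copy()
    for idx in chosen:
        r, c = divmod(int(idx), cols)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0
    return out


def l1_consistency(student_out, teacher_out):
    """Mean absolute difference between student output and teacher pseudo label."""
    return float(np.mean(np.abs(student_out - teacher_out)))


rng = np.random.default_rng(0)
target_patch = np.ones((512, 512, 3), dtype=np.float32)
masked = cutout(target_patch, rng)
print("occluded fraction:", float((masked == 0).mean()))

# Toy outputs standing in for 64 x 64 density maps from the two models.
student_map = rng.random((64, 64))
teacher_map = rng.random((64, 64))
print("L1 consistency loss:", l1_consistency(student_map, teacher_map))
```

In the framework described above, the same L1 form would be applied to both the density map and the class map, with the teacher outputs treated as fixed targets.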

In the training implementation, the model was initialized with weights pretrained on ImageNet. The AdamW optimizer was used for updating the model parameters, as it efficiently applies weight decay to prevent overfitting [36]. The learning rate was set to 2 × 10⁻⁵ and the weight decay to 1 × 10⁻⁴. The model was trained for 100 epochs, and the saved model parameters were updated only when the validation loss reached a new minimum. During training, the batch size of the training data was set to eight, with a 1:1 ratio of labeled data to unlabeled data, meaning that each batch contained four labeled images and four unlabeled images. The training was conducted with a predefined dataset split, ensuring that samples assigned as unlabeled data were consistently treated as such throughout the learning process. The model training was implemented using Python 3.10.11 on PyTorch 2.4.1. The system hardware specifications included an Intel Core i7-12700F CPU (Intel Corporation, Santa Clara, CA, USA) and an NVIDIA GeForce RTX 4090 GPU (NVIDIA Corporation, Santa Clara, CA, USA) with 16,384 CUDA cores and a 2520 MHz boost clock.

2.5. Evaluation

In this study, the seed counting framework was evaluated by comparing the seed counting performance based on density map estimation with general supervised learning (baseline). Density map estimation was performed on the target data and evaluated by comparing it with the GT from the source data before background augmentation. The counting performance was evaluated as per the proportion of labeled data relative to the training datasets: 5%, 10%, and 40% [37]. In the baseline, supervised learning was conducted using four types of training datasets: only source data and mixed datasets wherein the target data constituted 5%, 10%, and 40% of the total training data (source-to-target ratios of 95:5, 90:10, and 60:40). In the proposed framework, supervised flow was performed on 5%, 10%, and 40% of the source data in the training dataset, with the remaining source data and the corresponding target data used for unsupervised flow. The following evaluation metrics were used to compare the methods in terms of seed counting performance: mean absolute error (MAE), which quantifies the absolute difference between measured and predicted seed counts (as shown in Equation (2)), mean relative error (MRE), which measures the relative difference based on the seed count range (as shown in Equation (3)), and coefficient of determination (R²), which indicates the reliability of model predictions. Moreover, a paired t-test was performed to evaluate the statistical significance of the detection results across labeled-data proportions.

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| C_{Est}^{i} - C_{GT}^{i} \right| \qquad (2)$$

$$\mathrm{MRE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| C_{Est}^{i} - C_{GT}^{i} \right|}{C_{GT}^{i}} \times 100\% \qquad (3)$$

where $N$ denotes the number of test samples, $C_{Est}^{i}$ the estimated seed count in the $i$th test sample, and $C_{GT}^{i}$ the GT seed count in the $i$th test sample.
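The metrics in Equations (2) and (3), together with the coefficient of determination, can be computed as in the following sketch. The function names and sample counts are illustrative. Because Equation (3) is undefined when the GT count is zero, this sketch skips such samples when computing the MRE, which is an assumption rather than a documented choice of this study.

```python
def mae(est, gt):
    """Mean absolute error between estimated and GT seed counts (Equation (2))."""
    return sum(abs(e - g) for e, g in zip(est, gt)) / len(gt)


def mre(est, gt):
    """Mean relative error in percent (Equation (3)); zero-count GT samples are skipped."""
    pairs = [(e, g) for e, g in zip(est, gt) if g != 0]
    return 100.0 * sum(abs(e - g) / g for e, g in pairs) / len(pairs)


def r_squared(est, gt):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    gt_mean = sum(gt) / len(gt)
    ss_res = sum((g - e) ** 2 for e, g in zip(est, gt))
    ss_tot = sum((g - gt_mean) ** 2 for g in gt)
    return 1.0 - ss_res / ss_tot


# Hypothetical estimated and GT counts for three test images.
estimated = [98, 105, 52]
ground_truth = [100, 100, 50]

print("MAE:", mae(estimated, ground_truth))
print("MRE (%):", round(mre(estimated, ground_truth), 3))
print("R^2:", round(r_squared(estimated, ground_truth), 4))
```

The MAE weights every image equally in absolute seeds, whereas the MRE penalizes the same absolute error more heavily on sparsely sown images, which is why the two metrics can rank methods differently.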

The seed counting framework was also compared with supervised learning methods that incorporated other data augmentation techniques, namely Mixup [21], Cutout [22], and Cutmix [23], as shown in Figure 6. It was further compared with state-of-the-art detection-based counting methods, including Faster R-CNN [16] and YOLOv8-nano [38]. Statistical comparisons among strategies were conducted using a one-way analysis of variance (ANOVA) and Tukey’s honest significant difference (HSD) test. Furthermore, the quality of the density maps was evaluated using two standard metrics: the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) [39]. To further assess the robustness of each method, Gaussian noise was added to the input images at three levels: weak (σ = 10), moderate (σ = 20), and strong (σ = 30), and seed counting performance was compared across methods [40].
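The noise-robustness test and the PSNR metric mentioned above can be sketched as follows. This is an illustration only: it assumes that σ is expressed in 8-bit intensity units (0 to 255), which is not stated explicitly, and the function names are hypothetical.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to an 8-bit image and clip back to [0, 255]."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB; data_range is the maximum possible value span."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(data_range ** 2 / mse))


rng = np.random.default_rng(0)
clean = np.full((64, 64, 3), 128, dtype=np.uint8)  # hypothetical mid-gray test image

# Weak, moderate, and strong noise levels used in the robustness comparison.
for label, sigma in [("weak", 10), ("moderate", 20), ("strong", 30)]:
    noisy = add_gaussian_noise(clean, sigma, rng)
    print(label, "PSNR of noisy input (dB):", round(psnr(clean, noisy, 255.0), 2))
```

For density maps, the same `psnr` function applies with `data_range` set to the value span of the GT density map instead of 255.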

3. Results and Discussion

3.1. Model Training

Figure 7 shows the training loss and mean squared error (MSE) of seed counting for different learning methods under the 40% labeled data. The training loss and MSE were calculated on the target data from the validation dataset across training epochs. The training was terminated at 100 epochs, as we did not observe any further substantial decrease in the supervised validation loss. The model weights were selected at the point where the validation loss reached its minimum, indicating that overfitting could be effectively prevented. In the supervised loss curves, different convergence patterns were observed depending on the learning methods. The baseline showed rapid fluctuations prior to convergence, whereas our method exhibited a sharp decline followed by a steady decrease. This difference indicated that our framework provided a more stable learning process than supervised learning. In the MSE plots, our method initially exhibited a higher MSE than the baseline in the early stages of training but quickly stabilized. The baseline showed a stable initial trend but exhibited large fluctuations after 10 epochs, particularly on the mixed dataset, wherein extreme variations were observed after 20 epochs. These fluctuations suggested that when learning from datasets with mixed domains, our method was more effective at suppressing erroneous learning flow in comparison with supervised learning. For all methods, both the losses and MSE converged, confirming that the models had been successfully trained to perform seed counting through density map estimation on the target data [41,42].

3.2. Density Map Estimation

Figure 8 shows the density map estimations and seed counting results obtained on six representative test samples. The results are organized into rows, displaying the source data, target data, density maps used as the GT for seed count, and results predicted by two baseline methods and the proposed framework. The number at the bottom-right corner of each density map denotes the estimated seed count. For the baseline trained exclusively on the source data, poor-quality density maps and notable seed counting errors were observed because of the lack of feature learning for the target data. By contrast, the baseline trained on datasets incorporating the target data produced density maps less influenced by external noise and delivered a relatively acceptable counting performance. However, these density maps often incorrectly represented only high-density regions, which were inconsistent with the GT. This result suggested that the baseline overfitted to specific data or high-density regions during the training process that included target data. However, the proposed method delivered comparable counting performance to supervised learning methods while generating density maps that closely resembled the GT in terms of visual quality. These results highlighted that labeled data and feature representation learning for the detection target were necessary for supervised learning. However, our framework offered the advantage of avoiding overfitting by not requiring direct feature learning for the augmented background. In addition, it was able to learn domain-invariant features that were not biased toward specific environments, providing robustness and generalization for the evaluation of seed counting performance across different backgrounds.

3.3. Seed Counting

The seed counting performance obtained on the test data for the different learning methods and labeled-data proportions is shown in Figure 9, evaluated using the MAE and MRE metrics. For the baseline, the evaluation metrics fluctuated more substantially than those of our method as the labeled-data proportion was increased. This result highlighted the dependence of traditional supervised learning on the scale of labeled data, whereas the method used herein exhibited less label dependence. In particular, except for the case of 40% labeled data, our method reduced MAE by 67.3–71.4% and MRE by 71.9–77.5% relative to the baseline. At 40% labeled data, the baseline reduced MAE by 23.3% and MRE by 35.8%; however, this variation was smaller than in the other comparisons. Comparing our method at 5% labeled data, which resulted in the lowest error, with the baseline at 40% labeled data, the baseline exhibited a 2.9% reduction in MAE but a 4.9% increase in MRE. This comparison demonstrated that the proposed method was able to deliver performance comparable to that of supervised learning, even with a small amount of labeled data, while also ensuring more generalizable counting performance across diverse seed count ranges.
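The two metrics and the relative reductions quoted above can be illustrated with a short sketch. It assumes the usual definitions (MAE as the mean absolute count difference, MRE as the mean absolute difference divided by the measured count); the function names and the example counts are hypothetical and do not come from the experiments.

```python
def mean_absolute_error(predicted, measured):
    """MAE: average absolute difference between predicted and measured counts."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


def mean_relative_error(predicted, measured):
    """MRE (%): average absolute difference relative to the measured count.

    Undefined when a measured count is zero.
    """
    ratios = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return 100.0 * sum(ratios) / len(ratios)


def relative_reduction(reference_error, new_error):
    """Percentage by which new_error is lower than reference_error."""
    return 100.0 * (reference_error - new_error) / reference_error


# Hypothetical counts: every sample is off by exactly 5 seeds.
measured = [10, 50, 200]
predicted = [15, 55, 205]

mae = mean_absolute_error(predicted, measured)
mre = mean_relative_error(predicted, measured)
print(f"MAE = {mae:.2f} seeds, MRE = {mre:.2f}%")
```

In this toy case the MAE is 5 seeds for every sample, yet the sample with only 10 seeds contributes a 50% relative error and dominates the MRE. This is why the two metrics can move in opposite directions when methods are compared across different seed count ranges.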

The results of the statistical analysis of the predicted counts are presented in Table 3. The baseline differed significantly from the GT at both 5% and 10% labeled data, whereas our method exhibited no statistically significant difference from the GT at any labeled-data proportion. These findings indicated that the proposed method delivered more stable seed counting performance than supervised learning under conditions of data scarcity.

Figure 10 shows the seed counting performance evaluated on field data. The results were obtained using the baseline and our method, both trained on 5% of the labeled data. In addition, to compare performance variation across domains, results on the test data are also provided. The seed counting was conducted without further training, and the evaluation was performed using the MAE (left) and MRE (right). Both methods exhibited substantial performance degradation on the field data. However, our method achieved 56.2% and 3.1% lower MAE and MRE, respectively, compared to the baseline. This comparison demonstrates the superior robustness and generalization performance of the proposed method across diverse environments.

While our method exhibited lower MAE and MRE than the baseline, the MRE remained high in absolute terms. This may be attributed to the complexity of real-world conditions and the inherent characteristics of MRE, which tends to amplify relative errors in samples with low seed counts. To further investigate this issue, we additionally performed a linear regression analysis between the estimated and measured seed counts to assess the prediction trend. Figure 11 shows the results of the linear regression analysis performed on the test data (left column) and the field data (right column). Both methods exhibited a strong linear relationship on the 51 test samples; notably, our method (lower row) achieved an R² of 0.99, which is higher than that of the baseline (upper row). On the 73 field samples, the baseline exhibited almost no linear correlation, with an R² of 0.01, whereas the proposed method showed a reduced R² of 0.82 but still maintained a strong linear correlation. These results validated that our method delivered better seed counting performance than the conventional process while exhibiting higher robustness to variations in data distribution, even with a limited amount of labeled data. This confirmed that the background augmentation method utilizing the proposed device and the learning strategy of the presented framework could be successfully adapted as a seed counting tool to ensure generalization performance across various domains. Furthermore, the fact that sufficient performance was delivered using only a small amount of labeled data collected indoors at low cost suggested that our method possessed the potential for versatile application across diverse environments.
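The regression analysis can be reproduced in outline with an ordinary least-squares fit and the coefficient of determination. The sketch below is illustrative only: `linear_fit_r2` is a hypothetical helper, and the counts are invented rather than taken from the test or field samples.

```python
def linear_fit_r2(x, y):
    """Fit y = slope * x + intercept by least squares; return (slope, intercept, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical measured vs. estimated seed counts (illustrative only).
measured = [20, 40, 60, 80, 100]
estimated = [22, 39, 63, 78, 103]
slope, intercept, r2 = linear_fit_r2(measured, estimated)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r2:.4f}")
```

Unlike the MRE, R² is not inflated by samples with low seed counts, so it gives a complementary view of whether the predictions follow the measured trend.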

3.4. Comparison and Discussion

The framework presented in this study enabled seed counting, even in the case of limited labeled data, by utilizing background augmentation and semi-supervised learning. This approach offered potential advantages over conventional processes in the agricultural domain, where field data are often scarce. However, our framework has several limitations despite these advantages. First, throughout the evaluation of seed counting performance, the detection performance on the target data declined as the proportion of labeled data was increased. This decline originated from the reduction in unlabeled data in the unsupervised flow. This phenomenon indicated that, while our method was less dependent on the scale of labeled data than conventional supervised learning-based approaches for seed counting, it still exhibited some dependency on the data scale. Nevertheless, by utilizing the proposed indoor test equipment, data collection could be conducted at a lower cost than traditional field experiments. Moreover, labeling is performed on an easily distinguishable background that does not require domain expertise, and the training process requires only a small amount of labeled data, which remains a notable advantage. Second, real-world applications involve diverse soil conditions. In particular, soil color, which varies depending on its composition, is a visually dominant characteristic and may influence counting performance [20]. Consequently, the generalization of the seed counting framework could not be fully ensured because the data used in our method were augmented based on only a single background environment.

Figure 12 shows the density maps generated by conventional data augmentation techniques used in supervised learning and by the proposed method. The comparison was conducted by applying Mixup [21], Cutout [22], and Cutmix [23] to the baseline training method at 5% labeled data. The density estimation was conducted using ten representative samples extracted from the field data. The results indicated that the baseline and Mixup failed to generate valid density maps on the field samples, making seed counting infeasible. In contrast, Cutout, Cutmix, and our method were able to generate density maps that were visually similar to the GT seed distributions. However, Cutout and Cutmix exhibited a tendency to overestimate both the density and the number of seeds compared to our method. This overestimation appeared to result from misidentifying background noise or surrounding patterns as seeds in domain environments that were not included during training. These results demonstrate that our approach, despite the use of a single background during augmentation, can still generalize effectively to unknown domains. The proposed approach, which focuses on robust seed feature extraction, produced stable and visually consistent density maps that were less affected by background variations. This suggests its potential to mitigate domain-specific biases, even under limited diversity in training conditions.

Figure 13 shows two representative cases of count overestimation: Case A shows overestimation caused by overlapping seeds, and Case B shows false positives produced by background noise such as straw. Accurately determining the number and distribution of seeds in real-world scenarios is challenging when using two-dimensional images, particularly in cases of severe three-dimensional stacking [10,12,15]. Such difficulties can limit the practical accuracy of seed counting, especially under improper sowing conditions or when background objects resemble seeds [7,8,9]. These results suggest that, although the proposed method provides more generalized density map estimation under background variation than the other methods, performance degradation due to overlapping seeds and the occurrence of overestimation remain limitations.

Table 4 presents a comparison of seed counting errors and density map quality between our method and other methods on both the test data and the field data. When trained with 5% labeled data, all methods reduced the seed counting error of the baseline on the test samples. Among the density estimation-based methods, Cutmix showed the lowest error, while Faster R-CNN [16] achieved the lowest error among the detection-based methods. However, our method achieved a 34.4% lower MAE compared to both Cutmix and Faster R-CNN. For the field samples, Cutmix showed an even higher error than the baseline, which appears to be due to the negative impact of pixel-level feature manipulation on inference in real-world backgrounds in contrast to its effect on test samples. Similarly, Mixup, which also manipulates pixel-level features, showed only a marginal reduction in MAE. In contrast, Cutout, which removes parts of the image, demonstrated the highest generalization ability among the density estimation-based methods. Furthermore, our proposed method achieved an 11.2% lower MAE than Cutout, confirming that background augmentation can provide more robust feature representations for seed counting. Detection-based methods exhibited lower errors than density estimation-based methods in the field samples, likely due to the limitations of density map estimation observed in the failure cases (as shown in Figure 13). However, our method achieved a 6.4% lower MAE than the best-performing detection-based method, Faster R-CNN, indicating that effective seed counting can be achieved through density estimation with background augmentation, even in challenging field environments. Additionally, the proposed method was consistently included in the best-performing statistical group for error across test and field datasets. This supports the finding that our method achieved comparable or superior results to other methods.
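The relative MAE reductions quoted above follow directly from the mean values reported in Table 4. The short check below recomputes them; only the helper name `percent_lower` is introduced here for illustration.

```python
# Mean MAE values as reported in Table 4: (test, field).
mae = {
    "Cutout": (10.19, 25.14),
    "Cutmix": (5.14, 53.74),
    "Faster R-CNN": (5.14, 23.86),
    "Our method": (3.37, 22.33),
}


def percent_lower(reference, value):
    """How much lower `value` is than `reference`, in percent."""
    return 100.0 * (reference - value) / reference


ours_test, ours_field = mae["Our method"]
test_vs_cutmix = percent_lower(mae["Cutmix"][0], ours_test)
field_vs_cutout = percent_lower(mae["Cutout"][1], ours_field)
field_vs_frcnn = percent_lower(mae["Faster R-CNN"][1], ours_field)

print(round(test_vs_cutmix, 1), round(field_vs_cutout, 1), round(field_vs_frcnn, 1))
# prints: 34.4 11.2 6.4
```

Cutmix and Faster R-CNN share the same mean test MAE (5.14), which is why a single 34.4% figure applies to both.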

With respect to density map quality, the proposed method produced the highest-quality density maps on the test samples, while other augmentation methods resulted in lower-quality maps. The pronounced differences in the PSNR suggest that conventional augmentation methods may have overfitted to artificial patterns introduced during augmentation, rather than capturing real-world field distributions. A similar pattern was observed in the field samples. The proposed method achieved relatively high-quality density maps among all methods that produced visually valid estimations. However, methods that failed to generate reasonable density maps, such as the baseline and Mixup (as shown in Figure 12), showed the highest SSIM and PSNR values. This is because these metrics focus on structural similarity and pixel-level differences, which do not always reflect human visual perception. Moreover, since seeds occupy only a small area of the image, missing detections have a limited impact on these scores, whereas severe overestimation introduces structural distortions that reduce the SSIM and PSNR. These findings indicate that conventional augmentation was insufficient for learning background-invariant representations with limited labeled data. In contrast, background augmentation provided natural variations that improved model robustness and adaptability in diverse real-world applications. Thus, our framework demonstrated clear advantages over other augmentation methods in addressing generalization challenges in seed counting for diverse agricultural environments.
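The effect described above, in which a map that misses the seeds scores better than one that overestimates them, can be reproduced with a toy PSNR calculation. The sketch assumes density maps scaled to a peak value of 1.0 and uses hypothetical 32 × 32 maps; it does not use the evaluation settings of the experiments.

```python
import math


def psnr(reference, prediction, peak=1.0):
    """Peak signal-to-noise ratio (dB) between two equally sized 2-D maps."""
    n_pixels = len(reference) * len(reference[0])
    mse = sum((r - p) ** 2
              for ref_row, pred_row in zip(reference, prediction)
              for r, p in zip(ref_row, pred_row)) / n_pixels
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


SIZE = 32
# Hypothetical ground truth: one small 2x2 seed blob on an empty background.
ground_truth = [[0.0] * SIZE for _ in range(SIZE)]
for row_idx in (10, 11):
    for col_idx in (10, 11):
        ground_truth[row_idx][col_idx] = 1.0

# Prediction A misses the seed entirely (all zeros).
missed = [[0.0] * SIZE for _ in range(SIZE)]

# Prediction B finds the seed but also responds weakly to the whole background.
overestimated = [[max(value, 0.2) for value in row] for row in ground_truth]

psnr_missed = psnr(ground_truth, missed)
psnr_overestimated = psnr(ground_truth, overestimated)
print(f"missed: {psnr_missed:.1f} dB, overestimated: {psnr_overestimated:.1f} dB")
```

The all-zero map is wrong at only 4 of 1024 pixels and therefore obtains the higher PSNR, even though it is useless for counting, whereas the weak background response is penalized at almost every pixel.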

Figure 14 shows the results of seed counting after adding Gaussian noise to the field samples [40]. As the noise level increased, most methods exhibited an increase in MAE, and under strong noise conditions, all methods—both density estimation-based and state-of-the-art detection-based—showed substantially high errors. However, some methods, such as the baseline and Mixup, exhibited relatively small or inconsistent changes in MAE despite increasing noise levels. This may be due to the already low reliability of seed counting in real-world field images, which are inherently noisy and complex. In contrast, the proposed method consistently maintained the lowest errors across varying noise levels and demonstrated a relatively stable performance. These results suggest that the proposed approach maintains robustness and stability in seed counting performance under noisy conditions compared to the other methods.
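A noise perturbation of this kind can be sketched as follows. The sketch assumes zero-mean Gaussian noise with σ expressed in 8-bit intensity units and clipping to the valid range; the function `add_gaussian_noise`, the fixed random seed, and the 4 × 4 gray patch are hypothetical and are not the authors' implementation.

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise (std = sigma) to a 2-D image and clip to [0, 255]."""
    rng = random.Random(seed)  # fixed seed so that the example is reproducible
    return [
        [min(255, max(0, round(pixel + rng.gauss(0.0, sigma)))) for pixel in row]
        for row in image
    ]


# Hypothetical 4x4 mid-gray patch standing in for a field image.
image = [[128] * 4 for _ in range(4)]

for sigma in (0, 10, 20, 30):  # noise levels listed in the caption of Figure 14
    noisy = add_gaussian_noise(image, sigma)
    deviation = sum(abs(n - p)
                    for noisy_row, row in zip(noisy, image)
                    for n, p in zip(noisy_row, row)) / 16
    print(f"sigma = {sigma:2d}: mean absolute pixel change = {deviation:.1f}")
```

At σ = 0 the image is returned unchanged, which corresponds to the noise-free reference condition.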

4. Conclusions

This study proposed a density map-based rice seed counting method using a semi-supervised framework with background-augmented data to automate the evaluation of direct seeding performance. The results showed that our method generated density maps that were visually most similar to the GT and delivered reasonable seed counting performance, even in field domains that had not been included during training. However, some counting errors remained due to the model’s insufficient density estimation ability. In particular, some samples exhibited a tendency to overestimate density relative to the GT, especially in regions with stacked seeds, which is a limitation of two-dimensional image analysis.

Nevertheless, the proposed method offers a more practical approach to seed counting compared to existing methods. This method has the potential to enable low-cost data collection and reduce reliance on large labeled datasets, making it particularly promising in agricultural environments with diverse backgrounds. However, for our method to be efficiently utilized as an automatic seed counting system for real-world evaluation, additional improvements are needed, such as achieving real-time performance on edge devices. In particular, although real-time counting is essential for practical deployment, this requirement is not met by the proposed framework. For practical deployment in agricultural applications, future studies should focus on reducing model complexity through techniques such as model compression. If future research successfully addresses these requirements, the proposed approach has the potential to support the scalable and cost-effective monitoring of sowing uniformity in large-scale and unknown environments. Furthermore, this ability is expected to provide promising prospects for advancing precision agriculture. In particular, by offering timely visual information to support data-driven decision-making, our approach may enhance operational efficiency and promote long-term sustainability in agricultural systems.

Author Contributions

Conceptualization, B.-G.S. and D.-H.L.; methodology, B.-G.S. and D.-H.L.; software, B.-G.S.; validation, C.-G.L., S.-H.Y. and D.-H.L.; formal analysis, C.-G.L. and Y.-H.K.; investigation, C.-G.L.; resources, Y.-H.K.; data curation, B.-G.S.; writing—original draft preparation, B.-G.S.; writing—review and editing, B.-G.S., S.-H.Y. and D.-H.L.; visualization, B.-G.S. and D.-H.L.; supervision, S.-H.Y.; project administration, S.-H.Y.; funding acquisition, S.-H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Cooperative Research Program for Agriculture Science and Technology Development (Combined Work Type High Performance Field Crop Precision Planting & Transplanting Technology Development, RS-2021-RD009653), Rural Development Administration (RDA), Republic of Korea, and Chungnam National University (CNU), Republic of Korea (2022-0694-01).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Fukagawa, N.K.; Ziska, L.H. Rice: Importance for global nutrition. J. Nutr. Sci. Vitaminol. 2019, 65, S2–S3.
2. Wang, X.; Li, Z.; Tan, S.; Li, H.; Qi, L.; Wang, Y.; Chen, J.; Yang, C.; Chen, J.; Qin, Y.; et al. Research on density grading of hybrid rice machine-transplanted blanket-seedlings based on multi-source unmanned aerial vehicle data and mechanized transplanting test. Comput. Electron. Agric. 2024, 222, 109070.
3. Ali, A.M.; Thind, H.S.; Singh, V.; Singh, B. A framework for refining nitrogen management in dry direct-seeded rice using GreenSeeker™ optical sensor. Comput. Electron. Agric. 2015, 110, 114–120.
4. Hwang, W.H.; Jeong, J.H.; Lee, H.S.; Yang, S.Y.; Lee, C.K.; Lim, Y.H.; Cho, S.H.; Min, H.K.; Kim, S.K.; Nam, J.W.; et al. Proper growing regions and management practices for improving production stability in direct-seeded rice cultivation. Korean J. Crop Sci. 2019, 64, 336–343.
5. Lee, C.M.; Kim, C.S.; Shin, W.C.; Baek, M.K.; Park, H.S.; Ko, J.C.; Kim, J.J.; Suh, J.P.; Jeong, O.Y.; Lee, K.M.; et al. ‘Saebonghwang’: A high grain quality mid-late-maturing rice cultivar adaptable to direct seeding and transplanting cultivation. Korean J. Breed. Sci. 2024, 56, 147–159.
6. Xie, C.; Zhang, D.; Yang, L.; Cui, T.; Yu, T.; Wang, D.; Xiao, T. Experimental analysis on the variation law of sensor monitoring accuracy under different seeding speed and seeding spacing. Comput. Electron. Agric. 2021, 189, 106369.
7. Liu, W.; Zhou, Z.; Xu, X.; Gu, Q.; Zou, S.; He, W.; Luo, X.; Huang, J.; Lin, J.; Jiang, R. Evaluation method of rowing performance and its optimization for UAV-based shot seeding device on rice sowing. Comput. Electron. Agric. 2023, 207, 107718.
8. Moreno, F.G.; Zimmermann, G.G.; Jasper, S.P.; da Silva Ferraz, R.; Savi, D. Sensors installation position and its interference on the precision of monitoring maize sowing. Smart Agric. Technol. 2023, 4, 100150.
9. Xie, C.; Zhang, D.; Yang, L.; Cui, T.; He, X.; Du, Z. Precision seeding parameter monitoring system based on laser sensor and wireless serial port communication. Comput. Electron. Agric. 2021, 190, 106429.
10. Peng, J.; Yang, Z.; Lv, D.; Yuan, Z. A dynamic rice seed counting algorithm based on stack elimination. Measurement 2024, 227, 114275.
11. Zhang, W.; Zhao, B.; Gao, S.; Zhu, Y.; Zhou, L.; Niu, K.; Qiu, Z.; Jin, X. Design and experiment of an intelligent testing bench for air-suction seed metering devices for small vegetable seeds. Biosyst. Eng. 2024, 245, 84–95.
12. Lin, P.; Chen, Y.M.; He, Y.; Hu, G.W. A novel matching algorithm for splitting touching rice kernels based on contour curvature analysis. Comput. Electron. Agric. 2014, 109, 124–133.
13. Jing, H.; Peiyuan, L.; Hanwei, C. Research on the rice counting method based on connected component labeling. In Proceedings of the Sixth International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), Zhangjiajie, China, 10–11 January 2014; pp. 552–555.
14. Wu, W.; Zhou, L.; Chen, J.; Qiu, Z.; He, Y. GainTKW: A measurement system of thousand kernel weight based on the android platform. Agronomy 2018, 8, 178.
15. Tan, S.; Ma, X.; Mai, Z.; Qi, L.; Wang, Y. Segmentation and counting algorithm for touching hybrid rice grains. Comput. Electron. Agric. 2019, 162, 493–504.
16. Sun, J.; Zhang, Y.; Zhu, X.; Zhang, Y.D. Deep learning optimization method for counting overlapping rice seeds. J. Food Process Eng. 2021, 44, e13787.
17. Feng, A.; Li, H.; Liu, Z.; Luo, Y.; Pu, H.; Lin, B.; Liu, T. Research on a rice counting algorithm based on an improved MCNN and a density map. Entropy 2021, 23, 721.
18. Zhang, Y.; Zhou, D.; Chen, S.; Gao, S.; Ma, Y. Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 589–597.
19. Qian, Y.; Zhang, L.; Guo, Z.; Hong, X.; Arandjelović, O.; Donovan, C.R. Perspective-assisted prototype-based learning for semi-supervised crowd counting. Pattern Recognit. 2025, 158, 111073.
20. Ibáñez-Asensio, S.; Marqués-Mateu, A.; Moreno-Ramón, H.; Balasch, S. Statistical relationships between soil colour and soil attributes in semiarid areas. Biosyst. Eng. 2013, 116, 120–129.
21. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412.
22. DeVries, T.; Taylor, G.W. Improved regularization of convolutional neural networks with cutout. arXiv 2017, arXiv:1708.04552.
23. Yun, S.; Han, D.; Chon, S.; Oh, S.J.; Yoo, Y.; Choe, J. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2019), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6022–6031.
24. Divyanth, L.G.; Guru, D.S.; Soni, P.; Machavaram, R.; Nadimi, M.; Paliwal, J. Image-to-image translation-based data augmentation for improving crop weed classification models for precision agriculture applications. Algorithms 2022, 15, 401.
25. Qian, Y.; Hong, X.; Guo, Z.; Arandjelović, O.; Donovan, C.R. Semi-supervised crowd counting with contextual modeling: Facilitating holistic understanding of crowd scenes. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 8230–8241.
26. Tarvainen, A.; Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; p. 30.
27. Rural Development Administration. Agricultural Technology Guide 169: Cultivation Method by Direct Sowing, 2nd ed.; RDA: Jeonju, Republic of Korea, 2018.
28. Kim, T.; Lee, D.H.; Kim, W.S.; Zhang, B.T. Domain adapted broiler density map estimation using negative-patch data augmentation. Biosyst. Eng. 2023, 231, 165–177.
29. Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
30. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
31. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241.
32. Liu, M.; Zhang, J.; Adeli, E.; Shen, D. Joint classification and regression via deep multi-task multi-channel learning for Alzheimer’s disease diagnosis. IEEE Trans. Biomed. Eng. 2019, 66, 1195–1206.
33. Sohn, K.; Berthelot, D.; Carlini, N.; Zhang, Z.; Zhang, H.; Raffel, C.A.; Cubuk, E.D.; Kurakin, A.; Li, C.L. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. In Proceedings of the 33rd Annual Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada, 6–12 December 2020; pp. 596–608.
34. Yang, Y.; Sun, G.; Zhang, T.; Wang, R.; Su, J. Semi-supervised medical image segmentation via weak-to-strong perturbation consistency and edge-aware contrastive representation. Med. Image Anal. 2025, 101, 103450.
35. Qian, Y.; Zhang, L.; Hong, X.; Donovan, C.; Arandjelović, O. Segmentation assisted u-shaped multi-scale transformer for crowd counting. In Proceedings of the British Machine Vision Conference (BMVC 2022), London, UK, 21–24 November 2022; p. 397.
36. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101.
37. Lin, H.; Ma, Z.; Hong, X.; Wang, Y.; Su, Z. Semi-supervised crowd counting via density agency. In Proceedings of the 30th ACM International Conference on Multimedia (MM 2022), New York, NY, USA, 10–14 October 2022; pp. 1416–1426.
38. Ma, N.; Su, Y.; Yang, L.; Li, Z.; Yan, H. Wheat Seed Detection and Counting Method Based on Improved YOLOv8 Model. Sensors 2024, 24, 1654.
39. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
40. Hendrycks, D.; Dietterich, T. Benchmarking neural network robustness to common corruptions and perturbations. arXiv 2019.
41. Krijthe, J.H.; Loog, M. The peaking phenomenon in semi-supervised learning. In Proceedings of the Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR 2016), Mérida, Mexico, 30 November–2 December 2016; pp. 299–309.
42. Yang, X.; Song, Z.; King, I.; Xu, Z. A survey on deep semi-supervised learning. IEEE Trans. Knowl. Data Eng. 2023, 35, 8934–8954.

Figure 1. Overall seed counting framework based on semi-supervised learning with background augmentation.

Figure 2. A seeding image acquisition device capable of background augmentation, comprising a reference surface, a camera, a reference background, and an augmented background.

Figure 3. Dataset construction process in which each pair of training examples consisted of a reference background image and an augmented background image for semi-supervised learning.

Figure 4. The feature pyramid network framework for seed counting with ResNet-50 as the backbone, including a top-down pathway, a regression head for density map estimation, and a classification head for density level classification.

Figure 5. Semi-supervised learning framework for seed counting. The student model is trained to estimate the density map and class map through supervised learning, while knowledge is transferred to the teacher model via EMA. The teacher model, which maintains an EMA of the student model’s parameters, generates more stable predictions. This interaction between the teacher and student models enables unsupervised learning, improving generalization and robustness by leveraging both labeled and unlabeled data.

Figure 6. Data augmentation methods compared with our approach: (a) source data, (b) target data, (c) Mixup, (d) Cutout, and (e) Cutmix.

Figure 7. Supervised loss curves (left) and MSE plots for seed counting (right) for different learning methods with 40% labeled data.

Figure 8. Estimated density maps and predicted seed counts for the test samples. The results of six samples are shown in six columns, with each sample comprising six rows: the source image, the target image, the ground truth, and the outputs from three methods, namely the baseline (only source data), the baseline (a source-to-target ratio of 60:40), and our framework (40% labeled data). Color represents predicted density, with red indicating higher density and blue indicating lower density. The number in the lower-right corner of each image indicates the predicted seed count.

Figure 9. Seed counting performance measured by MAE (left) and MRE (right) across learning methods and labeled-data proportions.

Figure 10. Comparison of seed counting performance between test and field data, evaluated by MAE (left) and MRE (right).

Figure 11. Scatter plots showing the relationships between estimated and measured seed counts for the baseline (a) and our method (b). Linear regressions are shown for the test samples (left) and for the field samples (right).

Figure 12. Density map estimation results using field samples. Ten samples are shown in columns, and each sample consists of seven rows: the input image, the ground truth, and the outputs from the baseline and four data augmentation methods, namely Mixup [21], Cutout [22], Cutmix [23], and our method. Color represents predicted density, with red indicating higher density and blue indicating lower density. The number in the lower-right corner of each image indicates the predicted seed count.

Figure 13. Example of failure cases in seed counting observed in the proposed method. Color represents predicted density, with red indicating higher density and blue indicating lower density. The number in the lower-right corner of each image indicates the predicted seed count.

Figure 14. Comparison of seed counting performance under different levels of Gaussian noise (σ = 0, 10, 20, 30), measured by MAE on field samples.

Table 1. Dataset composition with number of samples by data acquisition environment.

Data Source	Data Domain	Training Samples	Validation Samples	Test Samples	Total Samples
Indoor	Known	159	40	51	250
Field	Unknown	-	-	73	73

Table 2. Summary of model configuration.

Backbone	Input Size (Height × Width × Channel)	Params (M)	FLOPs (M)
ResNet-50 [30]	512 × 512 × 3	72.0	222.7

Table 3. Comparison of differences between measured and predicted values of seed counting.

Method	Ground Truth	p Value (5% Labeled Data)	p Value (10% Labeled Data)	p Value (40% Labeled Data)
Baseline	88.5 ± 58.7	0.007 **	0.002 **	0.104
Our method	88.5 ± 58.7	0.055	0.736	0.065

Average ± standard deviation. ** p < 0.01.

Table 4. Evaluation results of counting performance on test and field datasets.

Method	MAE (Test)	MAE (Field)	SSIM (Test)	SSIM (Field)	PSNR (dB, Test)	PSNR (dB, Field)
Baseline	11.93 ± 13.10 ^a	50.91 ± 55.29 ^a	0.87	0.94	18.9	21.1
Mixup [21]	7.25 ± 5.16 ^b	47.98 ± 50.54 ^a	0.86	0.94	16.7	21.9
Cutout [22]	10.19 ± 7.25 ^a,b	25.14 ± 21.75 ^b	0.85	0.88	15.1	16.8
Cutmix [23]	5.14 ± 4.98 ^b,c	53.74 ± 45.71 ^a	0.86	0.84	17.1	16.0
Faster R-CNN [16]	5.14 ± 7.41 ^b,c	23.86 ± 30.72 ^b	-	-	-	-
YOLOv8n [38]	7.49 ± 9.10 ^a,b	35.85 ± 43.14 ^a,b	-	-	-	-
Our method	3.37 ± 3.57 ^b,c	22.33 ± 20.03 ^b	0.87	0.88	20.5	19.0

Mean ± standard deviation. Different superscripts (a, b, c) within a column indicate significant differences at p < 0.05 (Tukey’s test). Abbreviations: MAE = mean absolute error; PSNR = peak signal-to-noise ratio; SSIM = structural similarity index.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

