1. Introduction
Blood cells play an important role in blood circulation, and the concentrations of the different types of blood cells usually reflect the health of the body [1]. Even experienced hematologists can find it difficult to detect blood disorders with a microscope alone, because the different types of blood cells differ only subtly in texture, color, size, and morphology [2]. Blood cell detection is a common and necessary medical examination used to evaluate the types and status of cells in the blood, including white blood cells, red blood cells, and platelets. With the help of computer vision, recent research on blood cell detection can be broadly divided into two categories. The first is the traditional approach, which uses microscope images to detect the shape, size, or other surface features of blood cells. Although traditional methods do not require much training data, their recognition accuracy is strongly affected by microscopy conditions: microscope images may suffer from uneven illumination and from variations in color, contrast, and background caused by environmental changes. Moreover, these methods cannot effectively handle feature differences between similar samples [3], so they do not guarantee reliable blood cell detection. The second category is based on deep learning. Deep learning-based detection methods trained on large-scale labeled data can reduce the heavy workload of cytologists and achieve satisfactory diagnostic accuracy [4,5]. However, these methods rely on large amounts of manually annotated data, which are expensive and laborious to produce and require specialized knowledge [6].
To address these problems, some researchers have studied semi-supervised blood cell detection methods, which exploit reliable information from large-scale unlabeled data together with limited labeled data and thereby substantially reduce labeling costs. Owing to the flexibility of semi-supervised learning, many such methods have been developed for computer-aided diagnosis [7]. Building on this line of work, we propose a new semi-supervised blood cell detection method based on YOLOv5-ALT to assist doctors in accurately diagnosing blood cell-related diseases.
In summary, to address the issues of high annotation cost for blood cell images and the heavy reliance of existing detection methods on large amounts of labeled data, this paper proposes a consistency regularization-based semi-supervised blood cell detection method called CRS-YOLOv5-ALT, which is built upon the YOLOv5-ALT framework. This method improves the utilization efficiency of unlabeled data and the generalization capability of the model under low-annotation conditions through dual-threshold pseudo-label filtering, consistency regularization, and Mixup data augmentation. In addition to detection accuracy comparisons, this study further analyzes repeated-run stability, pseudo-label evolution, filtering acceptance rate, pseudo-label quality, training convergence, ablation effects, and generalization performance to provide more comprehensive experimental evidence for the proposed method. The main contributions of this paper are as follows:
First, considering the characteristics of blood cell images such as small object scales and dense distribution, the C3SE module, SPPF module, and EIoU loss function are introduced based on YOLOv5 to construct a fundamental detection framework called YOLOv5-ALT tailored for blood cell detection tasks. This framework enhances the feature representation capability while improving detection speed and bounding box regression accuracy.
Second, a semi-supervised learning strategy combining dual-threshold pseudo-label filtering and consistency regularization is proposed. This strategy selects relatively reliable pseudo-label candidates by jointly applying a confidence threshold and a prediction entropy threshold, and it imposes consistency constraints on unlabeled images and their perturbed augmented versions, thereby improving the quality of unlabeled data utilization.
Third, a Mixup data augmentation strategy tailored for object detection tasks is introduced. By mixing pseudo-label-related data with data obtained from consistency learning, the diversity of training sample distribution is enhanced, which alleviates the overfitting problem under small-sample conditions and further improves the generalization capability and detection performance of the model.
Fourth, comprehensive experiments are conducted to evaluate the effectiveness and reliability of the proposed method. In addition to comparisons with representative semi-supervised and fully supervised detection algorithms, repeated-run statistics, pseudo-label evolution analysis, filtering acceptance-rate analysis, confidence- and entropy-based pseudo-label reliability estimation, training convergence analysis, and expanded ablation experiments are provided to support the analysis of pseudo-label reliability, training stability, and experimental consistency.
2. Related Work
This section reviews the related work involved in this study, mainly including blood cell detection methods, fully supervised blood cell detection methods based on YOLO, and the research progress of semi-supervised object detection methods in blood cell detection tasks. By organizing the above studies, the existing problems in method design and data utilization in current blood cell detection tasks can be further clarified, thereby providing a basis for the research motivation and technical route of the proposed method.
2.1. Blood Cell Detection Methods
Blood cell detection is primarily used for blood cell counting in practical applications. Traditional methods mainly include instrument-based counting, manual counting, and image processing-based approaches. The first two types are highly reliable but usually time-consuming and labor-intensive, while image processing-based methods offer some degree of automation but rely on hand-crafted features, which limits their robustness in complex scenarios. For example, Di Ruberto et al. [8] proposed a detection framework combining Edge Boxes candidate region generation with cellular morphological constraints, which performed well on datasets such as ALL-IDB but still showed deficiencies under complex backgrounds.
In recent years, with the widespread application of deep learning to object detection, medical imaging has also begun to adopt deep learning technologies. Deep learning-based object detection algorithms can be broadly divided into two categories. Two-stage detectors, such as R-CNN and Faster R-CNN, first generate a large number of candidate regions (in Faster R-CNN, with a Region Proposal Network) to separate foreground from background, and then refine the candidate boxes through bounding box regression to obtain precise locations. One-stage detectors treat object detection as a single regression problem applied directly to the input image and output category and location information in one pass (in anchor-based models, relative to preset anchor boxes). Representative one-stage algorithms include the YOLO series and the DETR series.
Compared with two-stage detectors, one-stage detectors require no separate candidate region generation, have higher detection efficiency and simpler structures, and therefore show greater application potential in tasks that demand real-time performance and efficient deployment, such as industrial inspection [9] and medical image analysis. This is especially relevant to blood cell detection, where targets are often small and densely distributed: two-stage methods, with their complex structures and high inference overhead, are easily constrained by detection efficiency and practical application costs. Therefore, researchers have recently paid more attention to one-stage detection methods represented by YOLO and DETR.
2.2. Fully Supervised Blood Cell Detection Methods Based on YOLO
Among these one-stage detection methods, the DETR series provides new research ideas for object detection. For example, Leng et al. [10] proposed an end-to-end detection network based on an improved DETR for white blood cell detection. However, DETR-series models generally suffer from slow convergence, high training resource requirements, and unstable performance on small objects, whereas blood cell images are characterized precisely by small targets, dense distributions, and subtle category differences. In contrast, YOLO-series models, with their end-to-end structure, efficient inference, and flexibility for network modification, are better suited to blood cell detection. Consequently, fully supervised YOLO-based blood cell detection has gradually become an important research direction in this field.
For example, He [3] improved YOLOv5s by integrating Transformer, BiFPN, CBAM, and the EIoU loss function, achieving an excellent balance between accuracy and efficiency at only one-sixth of the computational cost of the original model. Liu Baizhen [11] evaluated and optimized the YOLOv5 architecture, and the experimental results showed that YOLOv5 performed better on the BCCD dataset. Shakarami et al. [12] improved YOLOv3 by introducing an EfficientNet backbone and the DIoU loss function, achieving a mean Average Precision (mAP) of 89.86% on the BCCD dataset. In specific pathology detection scenarios, Naing et al. [13] employed four YOLO algorithms (YOLOv3, YOLOv3-Tiny, YOLOv2, and YOLOv2-Tiny) to detect 15 types of AML blood cells in examination images and found YOLOv3 to be more reliable than the other three. Shah et al. [14] compared four mainstream image processing algorithms, including YOLOv3, on a dataset of 364 images with 4888 annotations, and YOLOv3 outperformed the other models in both speed and accuracy. Furthermore, Xu et al. [15] introduced an enhanced channel attention mechanism into the TE-YOLOF model, increasing detection accuracy to 90.3% while maintaining a low parameter count.
In summary, although fully supervised YOLO-based blood cell detection methods have made significant progress, they generally rely on large amounts of high-quality annotated data. In clinical practice, accurately annotating blood cell images typically requires substantial time and effort from hematology experts, which leads to high annotation costs and difficulties in data acquisition and in turn restricts further model deployment. Therefore, making full use of large amounts of unlabeled data to improve detection performance under limited annotation has become an urgent problem. This motivates the introduction of semi-supervised learning into blood cell detection.
2.3. Semi-Supervised Object Detection Methods and Their Limitations in Blood Cell Detection
In deep learning-based blood cell detection methods, model performance depends not only on the algorithm design itself, but also to a large extent on sufficient and high-quality annotated data. However, in practical applications, a large number of blood cell images are often in an unannotated state, which, to some extent, limits the further optimization and promotion of detection models. Therefore, how to fully utilize large amounts of unlabeled data under limited annotated sample conditions has become a problem worthy of attention in the field of blood cell detection. Semi-supervised learning can jointly utilize a small number of annotated samples and a large number of unlabeled samples, offering significant advantages in scenarios where annotated data is scarce. In recent years, the application of semi-supervised learning in medical image analysis has become increasingly widespread, providing an effective approach to alleviate problems such as high annotation costs and insufficient annotated samples in medical images.
In semi-supervised learning research, Lee et al. [16] first proposed the concept of pseudo-labeling, using unlabeled data to improve model performance through self-training. Many methods have since built on this idea. For example, Arazo et al. [17] introduced Mixup data augmentation to alleviate overfitting caused by pseudo-label errors, and Laine et al. [18] proposed a consistency regularization method that improves the use of unlabeled data by requiring the predictions for a sample to remain consistent before and after perturbation. Semi-supervised learning has also shown good application potential in medical image classification, recognition, and auxiliary diagnosis, with studies on single-cell recognition, red blood cell classification, breast cancer recognition, and COVID-19 diagnosis all verifying its effectiveness [19,20,21,22,23,24].
Note that the above studies mostly target classification or recognition tasks. With the rapid development of object detection, researchers have begun to explore semi-supervised learning for object detection as well. Jeong et al. [25] proposed CSD, based on consistency regularization, which exploits unlabeled data by computing a consistency loss between the predictions for unlabeled images before and after flipping. Sohn et al. [26] proposed STAC, which combines self-training with consistency regularization: the model first generates pseudo-labels for unlabeled data, low-confidence pseudo-labels are removed by threshold filtering, and the augmented unlabeled data are then used together with the original annotated data for training, jointly optimizing the supervised and unsupervised losses. These studies show that semi-supervised learning has gradually expanded from classification to object detection, providing important references for its application to blood cell detection.
Although semi-supervised object detection methods have achieved certain progress, there are still shortcomings in blood cell detection tasks: Existing methods mostly lack specialized design targeting the characteristics of blood cell images, and issues of pseudo-label errors and data augmentation adaptation still affect model performance. Based on this, this paper introduces semi-supervised learning into the blood cell detection task and constructs the CRS-YOLOv5-ALT model based on YOLOv5-ALT in order to fully utilize unlabeled data under limited annotation conditions and improve detection performance.
3. Method
This section provides a detailed description of the proposed CRS-YOLOv5-ALT method. To address the problems of limited labeled data and insufficient utilization of unlabeled data in blood cell detection tasks, this paper constructs a semi-supervised detection framework based on the YOLOv5-ALT baseline model, integrating dual-threshold pseudo-label filtering, consistency regularization, and Mixup augmentation strategies. The method is described in detail from the aspects of problem formulation, baseline model, pseudo-label filtering strategy, consistency regularization mechanism, and Mixup augmentation method.
3.1. Problem Statement
Given the high annotation cost of blood cell images and the relative ease of obtaining unlabeled data, this paper proposes a consistency regularization-based semi-supervised blood cell detection method called CRS-YOLOv5-ALT, whose structure is illustrated in Figure 1. This method adopts YOLOv5-ALT as the baseline model. Starting from a small amount of labeled data, it enhances the model’s ability to utilize unlabeled data through dual-threshold pseudo-label filtering, consistency regularization, and Mixup data augmentation, thereby improving blood cell detection performance in low-annotation scenarios.
Specifically, this paper first performs initial supervised training on YOLOv5-ALT using a small number of labeled blood cell images to obtain an initial model with basic detection capabilities. Subsequently, this model is used to perform inference on unlabeled images to generate candidate pseudo-labels, which are then filtered using confidence thresholds and prediction entropy thresholds to reduce the noise interference caused by low-quality pseudo-labels. On this basis, consistency regularization constraints are applied to unlabeled images and their perturbed augmented versions, enabling the model to maintain relatively stable prediction results under different input perturbations. Meanwhile, the Mixup data augmentation strategy is introduced during the subsequent training process to enhance the diversity of training samples and improve the generalization capability of the model. Through the above design, this paper constructs a semi-supervised blood cell detection framework suitable for low-annotation scenarios.
3.2. Baseline Model YOLOv5-ALT
Considering the characteristics of blood cell images, such as small objects, dense distribution, and significant morphological differences, this paper adopts YOLOv5-ALT as the baseline model, as shown in Figure 2. For the blood cell detection task, this model introduces three improvements based on YOLOv5: (1) the C3SE channel attention module is introduced to enhance the representation capability of key features; (2) the SPPF module replaces the original SPP module to improve the efficiency of multi-scale feature fusion; (3) the bounding box regression loss is changed from CIoU Loss to EIoU Loss to improve localization accuracy and accelerate model convergence. Through these improvements, YOLOv5-ALT balances detection accuracy and computational efficiency while being better suited to blood cell detection.
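To make the EIoU change concrete, the following is a minimal sketch of the EIoU regression loss for a single pair of boxes, following its commonly published form: 1 − IoU, plus a center-distance penalty normalized by the squared diagonal of the smallest enclosing box, plus separate width and height penalties normalized by the enclosing box's width and height. The function name `eiou_loss` and the (x1, y1, x2, y2) box format are assumptions for illustration; the YOLOv5-ALT implementation presumably works on batched tensors and may differ in details such as the epsilon value.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # IoU term
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Smallest enclosing box: width cw, height ch, squared diagonal c2
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2

    # Squared distance between the two box centers
    rho2 = ((px1 + px2 - tx1 - tx2) / 2) ** 2 + ((py1 + py2 - ty1 - ty2) / 2) ** 2

    # 1 - IoU + center-distance penalty + separate width and height penalties
    return (1.0 - iou
            + rho2 / (c2 + eps)
            + (pw - tw) ** 2 / (cw ** 2 + eps)
            + (ph - th) ** 2 / (ch ** 2 + eps))


print(eiou_loss((0, 0, 10, 10), (5, 0, 15, 10)))  # about 0.7436
```

Unlike CIoU, which couples width and height through an aspect-ratio term, EIoU penalizes the width and height errors directly; in the example above the boxes have identical sizes, so only the IoU and center-distance terms contribute.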
3.3. Dual-Threshold Pseudo-Label Filtering
The pseudo-label method is an important approach for utilizing unlabeled data in semi-supervised object detection. In this paper, initial supervised training is first performed on YOLOv5-ALT using labeled data. The trained model is then used to perform inference on unlabeled images, obtaining candidate detection boxes and their corresponding category prediction results. To improve the reliability of pseudo-labels, this paper does not directly use all the prediction results. Instead, the candidate detection results are filtered, and only prediction boxes with high confidence and low uncertainty are retained as pseudo-labels.
Let the confidence of a candidate detection result be $s$, and let its category probability distribution be
$$\mathbf{p} = (p_1, p_2, \ldots, p_C),$$
where $C$ denotes the number of categories. Its prediction entropy is then defined as
$$H(\mathbf{p}) = -\sum_{c=1}^{C} p_c \log p_c.$$
Prediction entropy characterizes the uncertainty of the class probability distribution of a candidate detection result and can therefore be used to measure its reliability at the level of category discrimination: the smaller the prediction entropy, the more reliable the category decision. Based on this property, this paper adopts a dual-threshold filtering strategy that combines a confidence threshold and a prediction entropy threshold. During pseudo-label generation, each candidate detection result is evaluated by its confidence and prediction entropy, and it is retained only when it satisfies
$$s \geq \tau_c \quad \text{and} \quad H(\mathbf{p}) \leq \tau_e,$$
where $\tau_c$ and $\tau_e$ denote the confidence threshold and the entropy threshold, respectively. In the experiments, $\tau_c$ is set to 0.10 and $\tau_e$ is set to 0.50. These two thresholds were selected based on preliminary validation under the low-annotation setting. Since only a small proportion of labeled samples is available, an overly high confidence threshold may discard many potentially useful candidate detections. Therefore, a relatively low confidence threshold is adopted to retain sufficient candidate boxes, while the entropy threshold further suppresses predictions with high category uncertainty. This combination balances pseudo-label coverage against uncertainty control in the current experimental setting. Candidate detections satisfying both thresholds are retained, while low-confidence or high-uncertainty predictions are discarded. Non-maximum suppression (NMS) is then applied to remove redundant overlapping boxes, and the remaining boxes are used as pseudo-labels for subsequent semi-supervised training. No separate filter is applied to the localization quality of the candidate boxes; it relies on the bounding box regression design inherited from the baseline detector.
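The filtering rule can be sketched in a few lines. This is a minimal illustration, assuming natural-log entropy, per-candidate dictionaries with hypothetical keys `box`, `conf`, and `probs`, greedy class-wise NMS, and an NMS IoU threshold of 0.45 (the YOLOv5 inference default, assumed here because the paper does not state its value). It is not the authors' implementation.

```python
import math

def prediction_entropy(probs):
    """Shannon entropy (natural log) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_pseudo_labels(dets, tau_c=0.10, tau_e=0.50, iou_thr=0.45):
    """Dual-threshold filter followed by greedy class-wise NMS.

    dets: list of dicts with hypothetical keys 'box', 'conf', 'probs'.
    """
    # Keep candidates with high confidence AND low prediction entropy
    kept = [d for d in dets
            if d["conf"] >= tau_c and prediction_entropy(d["probs"]) <= tau_e]
    kept.sort(key=lambda d: d["conf"], reverse=True)
    result = []
    for d in kept:
        cls = max(range(len(d["probs"])), key=lambda c: d["probs"][c])
        # Suppress d if a higher-confidence box of the same class overlaps it
        if all(r["cls"] != cls or iou(r["box"], d["box"]) < iou_thr for r in result):
            result.append({"box": d["box"], "cls": cls, "conf": d["conf"]})
    return result
```

For reference, a uniform distribution over the three blood cell classes (white blood cells, red blood cells, platelets) has entropy $\ln 3 \approx 1.10$, so under the natural-log assumption the threshold of 0.50 rejects candidates whose class scores are spread out even when their confidence exceeds 0.10.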
During training, pseudo-labels are not generated once at the initial stage and then fixed; instead, they are dynamically updated as the model parameters are updated. Specifically, this paper adopts a single-model pseudo-label updating strategy: at the beginning of each training epoch, the current model makes predictions on the unlabeled data, and the candidate detection results are filtered according to the confidence threshold and the entropy threshold to generate the corresponding pseudo-label set. As model performance gradually improves, the quality of the generated pseudo-labels also improves, which further enhances the effective use of unlabeled data during training. Because this strategy may suffer from the accumulation of pseudo-label errors, this paper further combines consistency regularization with relatively mild data perturbations to constrain the predictions for unlabeled samples under different input conditions, thereby reducing, to a certain extent, the adverse impact of low-quality pseudo-labels on subsequent training. After this filtering, the retained pseudo-labeled samples are used together with labeled samples in subsequent training, so that unlabeled data enter the model optimization process as auxiliary supervision.
3.4. Consistency Regularization
The fundamental idea of consistency regularization is that, for the same unlabeled image, the model should output predictions that are as consistent as possible after different perturbations or augmentations are applied. Based on this idea, this paper applies perturbation augmentation to unlabeled images, including common operations such as scaling, cropping, and flipping. Because targets in blood cell images are generally small and densely distributed, the augmentation magnitude is kept within a range that preserves the overall structural integrity of the cells, avoiding significant distortion of morphological features by overly strong transformations. Unlike methods such as STAC and FixMatch, which explicitly distinguish between weak and strong augmentations, this paper does not construct a two-level augmentation strategy. Instead, a single, relatively mild perturbation is used to construct consistency sample pairs, which better suits the characteristics of blood cell detection. The model is required to maintain stable predictions on the original unlabeled images and their augmented versions; examples of the augmented images are shown in Figure 3.
To constrain the consistency of the predictions between an original unlabeled image and its perturbed augmented version, let the unlabeled image be $x_u$ and its perturbed augmented version be $\tilde{x}_u$. The consistency loss is defined as
$$\mathcal{L}_{con} = D\big(f(x_u),\, f(\tilde{x}_u)\big),$$
where $f(\cdot)$ denotes the output of the detection model, and $D(\cdot,\cdot)$ denotes the discrepancy measure between the predictions for the original unlabeled image $x_u$ and its perturbed augmented image $\tilde{x}_u$. In this paper, $D$ is implemented as the cross-entropy loss applied to their class probability distributions, measuring the consistency of the category predictions before and after perturbation. For the original unlabeled image $x_u$ and its augmented image $\tilde{x}_u$, this paper establishes the correspondence between their predictions by recording the geometric transformation parameters. For geometric transformations such as scaling, cropping, and flipping, the coordinates of the candidate bounding boxes are mapped synchronously with the input images, which ensures the spatial alignment of the predictions before and after augmentation. On this basis, predictions are made on the original image and the augmented image separately, and consistency constraints are imposed on the aligned class predictions, improving the reliability of model predictions under input perturbations and making more effective use of unlabeled data.
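The alignment-then-compare step can be illustrated with a sketch for the horizontal-flip case. Several details here are assumptions rather than the paper's specification: boxes from the original image are mapped with the recorded flip (image width `img_w`), each mapped box is matched to the augmented prediction with the highest IoU (with a hypothetical match threshold of 0.5), and the original-image class distribution is treated as the cross-entropy target. Scaling and cropping would replace `hflip_box` with the corresponding coordinate mapping.

```python
import math

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def hflip_box(box, img_w):
    """Map an (x1, y1, x2, y2) box into a horizontally flipped image of width img_w."""
    x1, y1, x2, y2 = box
    return (img_w - x2, y1, img_w - x1, y2)

def cross_entropy(target, pred, eps=1e-12):
    """H(target, pred) for two class-probability vectors."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))

def consistency_loss(orig_preds, aug_preds, img_w, match_thr=0.5):
    """orig_preds / aug_preds: lists of (box, class_probs) for x_u and its flipped copy.

    Original boxes are mapped into the flipped frame, matched to augmented
    predictions by IoU, and the cross-entropy of matched class distributions
    is averaged. In real training the original-image probabilities would act
    as a fixed target (no gradient).
    """
    losses = []
    for box, probs in orig_preds:
        mapped = hflip_box(box, img_w)
        best = max(aug_preds, key=lambda ap: iou(mapped, ap[0]), default=None)
        if best is not None and iou(mapped, best[0]) >= match_thr:
            losses.append(cross_entropy(probs, best[1]))
    return sum(losses) / len(losses) if losses else 0.0
```

Note that cross-entropy does not reach zero when the two distributions agree; its minimum over the augmented prediction equals the entropy of the target distribution, so the loss still rewards confident, consistent predictions.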
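Furthermore, for labeled samples, this paper adopts the standard supervised training approach of YOLOv5-ALT. Let the supervised loss on labeled samples be $\mathcal{L}_{sup}$, which can be expressed as
$$\mathcal{L}_{sup} = \mathcal{L}_{box}^{l} + \mathcal{L}_{obj}^{l} + \mathcal{L}_{cls}^{l},$$
where $\mathcal{L}_{box}^{l}$, $\mathcal{L}_{obj}^{l}$, and $\mathcal{L}_{cls}^{l}$ denote the bounding box regression loss, object confidence loss, and classification loss on labeled samples, respectively.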
For pseudo-labeled samples, the same detection loss formulation is used for optimization. Let the loss on pseudo-labeled samples be $\mathcal{L}_{pse}$, which can be expressed as
$$\mathcal{L}_{pse} = \mathcal{L}_{box}^{u} + \mathcal{L}_{obj}^{u} + \mathcal{L}_{cls}^{u},$$
where $\mathcal{L}_{box}^{u}$, $\mathcal{L}_{obj}^{u}$, and $\mathcal{L}_{cls}^{u}$ denote the bounding box regression loss, object confidence loss, and classification loss on pseudo-labeled samples, respectively.
Integrating supervised training, pseudo-label learning, and consistency regularization, the total loss function of the CRS-YOLOv5-ALT model is defined as
$$\mathcal{L} = \mathcal{L}_{sup} + \lambda_{pse}\,\mathcal{L}_{pse} + \lambda_{con}\,\mathcal{L}_{con},$$
where $\lambda_{pse}$ and $\lambda_{con}$ denote the weight coefficients of the pseudo-label loss term and the consistency regularization loss term, respectively, which balance the contributions of the different loss components during training. To reduce the impact of differences in sample size and loss scale on training, each loss term is averaged over its corresponding sample set before the weighted summation: the supervised loss $\mathcal{L}_{sup}$ is averaged over labeled samples, the pseudo-label loss $\mathcal{L}_{pse}$ over pseudo-labeled samples, and the consistency loss $\mathcal{L}_{con}$ over consistency sample pairs. This processing alleviates, to a certain extent, the inconsistency in scale among the different loss terms and thereby improves the stability of training.
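The per-set averaging can be shown with a small numeric sketch. The helper names and the weight values $\lambda_{pse} = 0.5$ and $\lambda_{con} = 1.0$ are hypothetical (the text does not report the weights), and each per-sample detection loss is taken as the plain sum of its box, objectness, and classification terms, mirroring the equations above.

```python
def mean(values):
    """Average of a list, or 0.0 for an empty set (e.g. no pseudo-labels yet)."""
    return sum(values) / len(values) if values else 0.0

def detection_loss(box, obj, cls):
    """Per-sample detection loss: box regression + objectness + classification."""
    return box + obj + cls

def total_loss(sup_losses, pse_losses, con_losses, lam_pse, lam_con):
    """Average each set over its own size, then take the weighted sum.

    Averaging first means a large pseudo-labeled set does not dominate the
    total simply because it contains more samples than the labeled set.
    """
    return mean(sup_losses) + lam_pse * mean(pse_losses) + lam_con * mean(con_losses)


sup = [detection_loss(0.5, 0.3, 0.2), detection_loss(1.5, 1.0, 0.5)]  # 1.0 and 3.0
pse = [detection_loss(0.2, 0.2, 0.1)] * 40                            # 0.5 each
con = [0.2, 0.4]
print(total_loss(sup, pse, con, lam_pse=0.5, lam_con=1.0))  # about 2.55
```

Had the 40 pseudo-label losses been summed instead of averaged, that term alone would contribute 10.0 rather than 0.25, swamping the supervised signal from only two labeled samples.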
Based on this loss design, the model uses a small amount of labeled data to provide reliable supervision, mines useful information from unlabeled data through high-quality pseudo-labels, and further exploits the information in perturbed samples through consistency regularization. This optimization strategy partly draws on the idea of dynamic task weight allocation in related studies [27], enabling the proposed method to achieve more effective blood cell detection under low-annotation conditions.
3.5. Mixup Augmentation Strategy
To further enhance the diversity of training samples, this paper introduces the Mixup data augmentation strategy in the subsequent training process. On the one hand, Mixup expands the training data distribution by linearly combining samples, thereby alleviating the overfitting problem under small-sample conditions; on the other hand, appropriate sample interpolation helps improve the generalization capability of the model. Considering that the traditional Mixup method is mainly designed for image classification tasks, while object detection tasks involve image content, bounding box locations, and category labels simultaneously, this paper makes corresponding adjustments to its usage in the detection scenario.
Specifically, let the two input images be denoted as $x_i$ and $x_j$, and let their corresponding bounding box sets and category label sets be $(B_i, Y_i)$ and $(B_j, Y_j)$, respectively. After Mixup processing, the mixed image is expressed as
$$\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j,$$
where $\lambda$ is the mixing coefficient, which follows a Beta distribution:
$$\lambda \sim \mathrm{Beta}(\alpha, \alpha),$$
where $\alpha$ is set to 0.2 to control the mixing intensity of the two samples. A smaller $\alpha$ helps avoid significant damage to the structural information of small blood cell targets caused by overly strong mixing, making it more suitable for blood cell detection, in which targets are small and densely distributed.
For the mixed supervision information, this paper does not linearly interpolate the bounding box coordinates but retains all valid target annotations from both input images. Accordingly, the bounding box set of the mixed sample is
$$\tilde{B} = B_i \cup B_j,$$
and its category label set is
$$\tilde{Y} = Y_i \cup Y_j.$$
When the input images contain multiple targets, the mixed sample retains all target bounding boxes and their category information from both images. For cases such as partial bounding box overlap, duplicate bounding boxes, or coexisting targets of different categories caused by image superposition, no additional bounding box fusion is performed; these are retained as valid supervision information in the mixed sample and are learned and distinguished by the detection model during training. During loss calculation, the supervision terms from the two input images are weighted by $\lambda$ and $1 - \lambda$, respectively, to realize Mixup training. This strategy enhances sample diversity while avoiding the invalid or inconsistent annotations that linear combinations of bounding box coordinates would produce.
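The detection-style Mixup described above can be sketched as follows. To keep the example self-contained, images are stand-in flat lists of pixel values of equal size, labels are hypothetical (class_id, box) tuples, and λ is drawn with Python's `random.betavariate`; a real pipeline would operate on image arrays and pass the returned per-label weights into the detection loss.

```python
import random

def detection_mixup(img_a, labels_a, img_b, labels_b, alpha=0.2, rng=random):
    """Mixup for detection: blend pixels, keep all boxes, weight their losses.

    img_a, img_b: equally sized flat lists of pixel values (stand-ins for images).
    labels_a, labels_b: lists of (class_id, (x1, y1, x2, y2)).
    Returns the mixed image, the union of both label lists (boxes are not
    interpolated), a per-label loss weight (lambda or 1 - lambda), and lambda.
    """
    lam = rng.betavariate(alpha, alpha)  # lambda ~ Beta(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(img_a, img_b)]
    labels = list(labels_a) + list(labels_b)
    weights = [lam] * len(labels_a) + [1.0 - lam] * len(labels_b)
    return mixed, labels, weights, lam


rng = random.Random(0)
mixed, labels, weights, lam = detection_mixup(
    [1.0] * 4, [(0, (0, 0, 5, 5))],
    [0.0] * 4, [(2, (3, 3, 9, 9)), (1, (6, 0, 8, 2))],
    rng=rng)
```

With $\alpha = 0.2$, $\mathrm{Beta}(\alpha, \alpha)$ is U-shaped: roughly two thirds of the draws fall below 0.1 or above 0.9, so one image usually dominates the blend. This matches the stated goal of mild mixing that preserves the structure of small cells.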
To facilitate the description of the overall training process of the proposed semi-supervised detection method, Algorithm 1 presents the main training procedure of the model, including key components such as pseudo-label generation, consistency regularization, and Mixup data augmentation. The outer loop corresponds to the dynamic updating of pseudo-labels in each training epoch, while the inner loop corresponds to parameter optimization based on mini-batches. Through this process, labeled data and unlabeled data can jointly participate in model training, thereby improving the detection performance of the model under low-annotation conditions.
In the proposed framework, Mixup is used as an independent data augmentation module during semi-supervised training. It is complementary to dual-threshold pseudo-label filtering and consistency regularization. Specifically, pseudo-label filtering mainly improves the reliability of auxiliary supervision from unlabeled samples, consistency regularization enhances prediction stability under input perturbations, and Mixup further enriches the training sample distribution. Therefore, these components work at different stages of the semi-supervised learning process and jointly improve model performance under low-annotation conditions.
Algorithm 1: Semi-supervised object detection method
Input: labeled dataset $D_l$, unlabeled dataset $D_u$, number of training epochs $Q$, confidence threshold $\tau_c$, entropy threshold $\tau_e$, loss weights $\lambda_{pse}$ and $\lambda_{con}$
Output: model parameters $\theta$
1: Initialize the model parameters $\theta$
2: Perform initial supervised training on $D_l$ to obtain the initial parameters $\theta_0$
3: for q = 1 to Q do
4:   Use the current model to make predictions on $D_u$
5:   Calculate the confidence $s$ and prediction entropy $H(\mathbf{p})$ of each candidate detection box
6:   Retain the detection results satisfying $s \geq \tau_c$ and $H(\mathbf{p}) \leq \tau_e$
7:   Apply NMS to remove redundant overlapping boxes
8:   Obtain the pseudo-label dataset $D_p$
9:   Apply random perturbations to samples in $D_u$ to construct the consistency sample pairs $D_c$
10:  Calculate the consistency loss $\mathcal{L}_{con}$
11:  for each mini-batch do
12:    Sample data from $D_p$ and $D_c$ and perform Mixup augmentation
13:    Obtain the mixed dataset $D_m$
14:    Construct the training set $D_t$
15:    Calculate the total loss $\mathcal{L}$
16:    Update the model parameters $\theta$ using $D_t$
17:  end for
18: end for
19: return $\theta$
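The loop structure of Algorithm 1 (pseudo-labels regenerated once per epoch, parameters updated per mini-batch) can be illustrated with a runnable skeleton in which every model-specific operation is a hypothetical callable supplied by the caller. This is a simplified reading of the algorithm, not the authors' code: here Mixup is applied to each mini-batch drawn from the labeled and pseudo-labeled data, and the consistency pairs are simply passed to the update step.

```python
def train_semi_supervised(model, labeled, unlabeled, epochs, predict, select,
                          perturb, mix, update, batch_size=4):
    """Skeleton of the outer (epoch) and inner (mini-batch) loops.

    predict(model, img)         -> raw candidate detections
    select(candidates)          -> filtered pseudo-labels (may be empty)
    perturb(img)                -> mildly augmented copy of img
    mix(batch)                  -> Mixup-augmented batch
    update(model, batch, pairs) -> model after one optimization step
    All of these are hypothetical hooks, not a real library API.
    """
    for _ in range(epochs):
        # Regenerate pseudo-labels with the current model at the start of each epoch
        pseudo = []
        for img in unlabeled:
            labels = select(predict(model, img))
            if labels:
                pseudo.append((img, labels))
        # Consistency pairs: each unlabeled image with a perturbed version
        pairs = [(img, perturb(img)) for img in unlabeled]
        # Mini-batch optimization on labeled plus pseudo-labeled data
        train_set = list(labeled) + pseudo
        for start in range(0, len(train_set), batch_size):
            batch = mix(train_set[start:start + batch_size])
            model = update(model, batch, pairs)
    return model
```

Because pseudo-labels are rebuilt from the current model each epoch, the size of the training set can change between epochs as more (or fewer) candidates pass the two thresholds; the default `batch_size=4` mirrors the batch size reported in the experimental settings.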
4. Experimental Results and Analyses
To verify the effectiveness of the proposed method in blood cell detection tasks, this section is organized into four parts: the experimental environment and parameter settings, the experimental dataset, the evaluation metrics, and the analysis of the experimental results. Under unified experimental conditions, the proposed method is evaluated in terms of detection performance and practical application value in low-annotation scenarios through comparisons with multiple semi-supervised and fully supervised methods, together with ablation experiments.
4.1. The Experimental Environment and Parameter Settings
The experiments were conducted on a workstation equipped with an Intel(R) Core(TM) i7-8700 CPU (Intel Corporation, Santa Clara, CA, USA), 16 GB RAM, and an NVIDIA GeForce GTX 1080 GPU (NVIDIA Corporation, Santa Clara, CA, USA), with Windows 10 (64-bit) (Microsoft Corporation, Redmond, WA, USA) as the operating system. The entire experiment was implemented using Python 3.7 (Python Software Foundation, Wilmington, DE, USA), PyTorch 1.12 (Meta Platforms, Inc., Menlo Park, CA, USA), and CUDA 11.3 (NVIDIA Corporation, Santa Clara, CA, USA). The model was trained using the stochastic gradient descent (SGD) optimizer, with a batch size of 4, a momentum of 0.937, an initial learning rate of 0.01, and a weight decay of 0.0001. The training process was conducted for 150 epochs.
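In PyTorch, these optimizer settings correspond to `torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937, weight_decay=1e-4)`. The minimal scalar sketch below makes the interaction of these hyperparameters concrete. It follows the update rule documented for `torch.optim.SGD` with dampening 0 and Nesterov momentum disabled. It is an illustration under assumptions: the YOLOv5 framework configures its own optimizer parameter groups and options, so the exact update used in the paper may differ. The function name `sgd_step` is hypothetical.

```python
def sgd_step(param, grad, buf=None, lr=0.01, momentum=0.937, weight_decay=1e-4):
    """One SGD update with momentum and L2 weight decay (dampening 0, no Nesterov).

    Returns the updated parameter and the updated momentum buffer.
    """
    d_p = grad + weight_decay * param                    # weight decay folded into the gradient
    buf = d_p if buf is None else momentum * buf + d_p   # first step initializes the buffer
    return param - lr * buf, buf


p, b = 1.0, None
for _ in range(3):
    p, b = sgd_step(p, grad=0.5, buf=b)  # constant toy gradient
```

Because the momentum coefficient 0.937 is close to 1, the buffer effectively averages roughly the last 1/(1 − 0.937) ≈ 16 gradients. This averaging helps smooth the noisier gradient estimates that come with a batch size of 4.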
In this paper, YOLOv5-ALT is adopted as the baseline detection model. When the input image resolution is 640 × 480, the model has 1.67M parameters, 4.2 GFLOPs of computational cost, and a model size of 3.7 MB. The model training is implemented based on the YOLOv5 framework and follows its standard training strategies, including Mosaic data augmentation, automatic anchor box mechanism, and learning rate warm-up strategy. The remaining training parameters adopt the default settings of YOLOv5 to ensure comparability with existing methods.
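The learning-rate warm-up mentioned above can be sketched as a linear ramp from zero over the first few epochs, followed by a linear decay over the 150 training epochs. This is a simplified, hypothetical schedule. The warm-up length (3 epochs) and the final learning-rate factor (0.01) are illustrative assumptions rather than values reported in the paper. The YOLOv5 framework's own warm-up may also adjust other optimizer settings per iteration.

```python
def learning_rate(epoch: float, total_epochs: int = 150, lr0: float = 0.01,
                  warmup_epochs: float = 3.0, lrf: float = 0.01) -> float:
    """Linear warm-up followed by linear decay from lr0 to lr0 * lrf.

    warmup_epochs and lrf are illustrative assumptions, not values from the paper.
    """
    decay = (1.0 - epoch / total_epochs) * (1.0 - lrf) + lrf  # 1.0 at start, lrf at the end
    target = lr0 * decay
    if epoch < warmup_epochs:
        return target * epoch / warmup_epochs  # ramp up linearly from 0
    return target
```

Starting from a small learning rate avoids large, destabilizing updates while batch-normalization statistics and the detection heads are still far from their final values.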
Considering the hardware conditions and the characteristics of blood cell images, where targets are small in scale and densely distributed, the batch size is set to 4 to ensure stable training under the input resolution of 640 × 480. Although a smaller batch size may lead to increased fluctuation in gradient estimation, all comparative experiments in this paper are conducted under a unified experimental environment and consistent key hyperparameter settings, with only the model structure or training strategy of different methods being replaced for comparison, thereby ensuring, as much as possible, the fairness of comparison and the stability of the training process.
4.2. Experimental Dataset
Experiments are conducted on the BCCD (Blood Cell Count and Detection) dataset in this paper. This dataset contains three types of blood cell targets, namely red blood cells, white blood cells, and platelets, with standardized bounding box annotations, and has been widely used for performance validation of blood cell detection and related improved models. As a public benchmark, the BCCD dataset is suitable for validating the effectiveness of the proposed semi-supervised detection method. To ensure consistency with the experimental settings of existing semi-supervised object detection methods (such as STAC, Instant Teacher, DSL, and FixMatch), the BCCD dataset is divided into labeled data and unlabeled data, so as to evaluate the detection performance of semi-supervised algorithms under different labeling ratios.
Specifically, the BCCD dataset is first randomly divided into a training set and a validation set at a ratio of 8:2, with the training set used for model training and the validation set used for performance evaluation. Subsequently, samples are randomly drawn from the training set at ratios of 1%, 5%, and 10% as labeled data, with the remaining training samples serving as unlabeled data, thereby constructing three sets of semi-supervised training data under different labeling ratios. The model is trained under each of the above three settings, and the detection performance is uniformly evaluated on the validation set.
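The splitting protocol can be sketched as follows. The paper does not state how fractional image counts are rounded, so flooring the counts and keeping at least one labeled image are assumptions. The function name `semi_supervised_split` and the fixed seed are also hypothetical. Reusing the same seed keeps the validation set identical across the three labeling ratios.

```python
import random
from typing import List, Sequence, Tuple


def semi_supervised_split(image_ids: Sequence[str], label_ratio: float, seed: int = 0,
                          train_frac: float = 0.8) -> Tuple[List[str], List[str], List[str]]:
    """Random 8:2 train/val split, then draw a labeled subset from the training set.

    Returns (labeled, unlabeled, val). Flooring and the minimum of one labeled
    image are assumptions; the paper does not state its rounding rule.
    """
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)  # same seed -> same train/val split for every label ratio
    n_train = int(len(ids) * train_frac)
    train, val = ids[:n_train], ids[n_train:]
    n_labeled = max(1, int(len(train) * label_ratio))
    labeled = rng.sample(train, n_labeled)
    labeled_set = set(labeled)
    unlabeled = [i for i in train if i not in labeled_set]  # remaining training images
    return labeled, unlabeled, val
```

With the 364 BCCD images reported in Section 4.4.2, this sketch yields 291 training and 73 validation images, and 2, 14, and 29 labeled images at the 1%, 5%, and 10% ratios. These counts are outputs of the sketch's flooring rule, not numbers reported by the paper.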
This division simulates the practical medical-imaging scenario of “a small amount of labeled data + a large amount of unlabeled data” while keeping the comparison between the proposed method and the comparison methods fair. Note that all unlabeled data come from the remaining samples of the training set rather than from additional external data. In addition, most cell images in the BCCD dataset are cropped samples whose scene complexity is lower than that of full-field blood cell images in real clinical scenarios. The experimental results in this paper should therefore be regarded mainly as a feasibility validation of the proposed method on a public low-annotation benchmark, and a more comprehensive evaluation on more challenging clinical image data is still needed in future work.
4.3. Evaluation Metrics
In the experiments, model performance was evaluated with standard object detection metrics, namely mAP, mAP50, and mAP75. mAP is the mean of the per-class average precision (AP) computed at IoU thresholds from 0.5 to 0.95 in steps of 0.05; mAP50 is the class-averaged AP at an IoU threshold of 0.5; and mAP75 is the class-averaged AP at an IoU threshold of 0.75. For all three metrics, higher values indicate better detection performance, and mAP75 reflects the localization accuracy of the network better than mAP50. AP is computed from precision (P) and recall (R):
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad AP = \int_{0}^{1} P(R)\, dR$$
where TP is the number of detection boxes whose IoU with a ground-truth box is greater than the specified threshold; FP is the number of detection boxes whose IoU is less than the specified threshold; and FN is the number of ground-truth boxes that are missed.
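As a worked illustration of these definitions, the sketch below computes IoU, greedy TP/FP matching, all-point interpolated AP, and AP averaged over the ten IoU thresholds 0.50:0.05:0.95 for one class in one image. It is a simplified, hypothetical implementation, and all function names are illustrative. Dataset-level evaluation pools the detections of a class across all images before sorting by confidence. COCO-style tools use 101-point interpolation, so values can differ slightly from the paper's evaluation pipeline. Averaging the per-class results over RBC, WBC, and platelets gives mAP.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0.0 else 0.0


def tp_flags(dets: Sequence[Tuple[float, Box]], gts: Sequence[Box], thr: float) -> List[int]:
    """Greedily match detections (highest confidence first) to unmatched ground truths."""
    used = [False] * len(gts)
    flags = []
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            overlap = box_iou(box, gt)
            if not used[j] and overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou > thr:   # TP: IoU above the threshold
            used[best_j] = True
            flags.append(1)
        else:                                # FP: no sufficiently overlapping free ground truth
            flags.append(0)
    return flags


def average_precision(flags: Sequence[int], num_gt: int) -> float:
    """All-point interpolated AP from TP/FP flags sorted by descending confidence."""
    if num_gt == 0:
        return 0.0
    tp = fp = 0
    recall, precision = [], []
    for f in flags:
        tp += f
        fp += 1 - f
        recall.append(tp / num_gt)          # R = TP / (TP + FN)
        precision.append(tp / (tp + fp))    # P = TP / (TP + FP)
    mrec = [0.0] + recall + [1.0]
    mpre = [0.0] + precision + [0.0]
    for i in range(len(mpre) - 2, -1, -1):  # monotone (non-increasing) precision envelope
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


IOU_THRESHOLDS = [0.5 + 0.05 * k for k in range(10)]  # 0.50, 0.55, ..., 0.95


def ap_50_95(dets: Sequence[Tuple[float, Box]], gts: Sequence[Box]) -> float:
    """AP averaged over the ten IoU thresholds for one class in one image."""
    return sum(average_precision(tp_flags(dets, gts, t), len(gts))
               for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)
```

For example, three detections flagged TP, FP, TP against two ground-truth boxes give precision values 1, 1/2, 2/3 at recall 0.5, 0.5, 1.0. The resulting AP is 0.5 × 1 + 0.5 × 2/3 ≈ 0.833.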
4.4. Experiments and Analyses
4.4.1. Comparison Experiments Between the Proposed Algorithm and Other Semi-Supervised Algorithms
To evaluate the effectiveness of the proposed CRS-YOLOv5-ALT under low-annotation conditions, comparison experiments were conducted on the BCCD dataset at a 5% annotation ratio. The proposed method was compared with representative semi-supervised object detection methods, including STAC, Instant Teacher, and CSD. To reduce the influence of run-to-run randomness on the evaluation, the main comparison experiments were independently repeated three times under the same experimental settings, and the results are reported as mean ± standard deviation (mean ± std). Only the random seed was changed between repeated runs; all other training parameters remained the same.
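The mean ± std summary over repeated runs can be computed as sketched below. The paper does not state whether the sample (n − 1) or population (n) estimator is used, so the default here (sample) is an assumption. The helper names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def mean_std(values: Sequence[float], sample: bool = True) -> Tuple[float, float]:
    """Mean and standard deviation over repeated runs.

    sample=True uses the n-1 (sample) estimator; sample=False uses the population
    estimator. Which one the paper uses is not stated, so this is an assumption.
    """
    spread = statistics.stdev(values) if sample else statistics.pstdev(values)
    return statistics.mean(values), spread


def format_mean_std(values: Sequence[float], sample: bool = True) -> str:
    """Format per-run metric values (in %) as 'mean ± std' with two decimals."""
    m, s = mean_std(values, sample)
    return f"{m:.2f} ± {s:.2f}"
```

With only three runs, the two estimators differ by a factor of √(3/2) ≈ 1.22. The choice therefore visibly changes the reported ± values and is worth stating alongside them.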
As shown in
Table 1, under the 5% annotation ratio on the BCCD dataset, CRS-YOLOv5-ALT achieves 59.73 ± 0.16%, 91.97 ± 0.13%, and 65.82 ± 0.19% in mAP, mAP50, and mAP75, respectively. Compared with the best comparison method, CSD, the three metrics are improved by 5.57, 1.85, and 3.24 percentage points, respectively. These results indicate that the proposed method achieves better detection performance than representative semi-supervised object detection methods under low-annotation conditions.
Meanwhile, the standard deviations of all three metrics for CRS-YOLOv5-ALT are small (at most 0.19 percentage points), indicating that the performance improvement is unlikely to result mainly from random fluctuation in a single run. To show the run-to-run variation of the proposed method, the detailed results of its three independent runs are provided in
Table 2. The three runs yield close performance values, further supporting the repeatability of the proposed method.
In addition to the numerical comparison results, the training convergence curves of CRS-YOLOv5-ALT on the BCCD dataset under the 5% annotation ratio are shown in
Figure 4. The curves include training losses, validation losses, precision, recall, mAP50, and mAP during training. It can be observed that the main losses on both the training set and the validation set decrease overall and gradually stabilize, while precision, recall, mAP50, and mAP increase overall and remain stable in the later stage. These results indicate that the model training process can converge normally without obvious performance degradation, demonstrating that the training process maintains good stability after introducing the semi-supervised strategy.
Overall, CRS-YOLOv5-ALT demonstrates higher detection accuracy and better result consistency in the main comparison experiments, providing quantitative support for its effectiveness in low-annotation blood cell detection tasks.
4.4.2. Quantitative Analysis of Pseudo-Labels
Since the proposed method relies on pseudo-labels to utilize unlabeled samples, the quantity, evolution, acceptance rate, and quality of pseudo-labels are important for evaluating the reliability of semi-supervised training. To further analyze the role of the dual-threshold pseudo-label filtering strategy, this section reports the number of candidate detection boxes, the number of retained pseudo-labels, the acceptance rate after filtering, the class-wise pseudo-label evolution, and the confidence-entropy-based quality metrics during training. The analysis was conducted on the BCCD dataset under the 5% annotation ratio setting. The BCCD dataset contains 364 images and 4888 annotated instances. It was divided into a training set and a validation set at a ratio of 8:2. Subsequently, 5% of the training images were randomly selected as labeled data, and the remaining training images were used as unlabeled data for pseudo-label generation and semi-supervised training.
Table 3 reports the variation in the number of candidate detection boxes, retained pseudo-labels, and acceptance rate at different training epochs. With the progression of training, the number of candidate detection boxes increases from 4620 at epoch 1 to 5546 at epoch 150. After dual-threshold filtering, the number of retained pseudo-labels increases from 1285 to 2568, and the acceptance rate increases from 27.81% to 46.30%.
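The acceptance rate is the ratio of retained pseudo-labels to candidate boxes. The short check below recomputes it from the epoch-1 and epoch-150 counts reported here. It also verifies that the class-wise counts reported for Figure 5 (RBC, WBC, platelets) sum to the same totals. Only the reported numbers are used; the helper name is hypothetical.

```python
# Epoch -> (candidate boxes, retained pseudo-labels), as reported in Table 3.
REPORTED = {1: (4620, 1285), 150: (5546, 2568)}
# Class-wise retained pseudo-labels (RBC, WBC, platelets), as reported for Figure 5.
CLASS_WISE = {1: (930, 245, 110), 150: (1908, 463, 197)}


def acceptance_rate(candidates: int, retained: int) -> float:
    """Percentage of candidate boxes that survive dual-threshold filtering."""
    return 100.0 * retained / candidates


for epoch, (cand, kept) in REPORTED.items():
    rate = acceptance_rate(cand, kept)
    # The class-wise counts should add up to the total retained count.
    assert sum(CLASS_WISE[epoch]) == kept
    shares = ", ".join(f"{100 * c / kept:.1f}%" for c in CLASS_WISE[epoch])
    print(f"epoch {epoch}: {rate:.2f}% accepted; class shares (RBC, WBC, PLT): {shares}")
```

The computation reproduces the reported 27.81% and 46.30%. It also shows that RBC accounts for roughly 72% to 74% of the retained pseudo-labels at both epochs, which reflects the class imbalance of the dataset.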
As shown in
Table 3, the number of retained pseudo-labels and the acceptance rate both increase as training proceeds. In the early stage of training, the predictions on unlabeled samples still contain a relatively large number of candidate boxes with low confidence or high uncertainty; therefore, the number of retained pseudo-labels after dual-threshold filtering is limited. As the detection capability of the model gradually improves, more candidate boxes satisfy both the confidence and prediction entropy constraints, resulting in an overall upward trend in the number of retained pseudo-labels and acceptance rate. In the later training stage, the increase becomes more gradual, indicating that the pseudo-label generation process tends to stabilize. These results show that the dual-threshold filtering strategy can control the introduction of low-quality candidate boxes while progressively expanding the utilization scale of valid supervisory information from unlabeled samples.
Figure 5 further presents the class-wise evolution of retained pseudo-labels during training.
As shown in
Figure 5, the number of retained pseudo-labels increases for all three categories as training proceeds. Specifically, the number of retained pseudo-labels increases from 930 at epoch 1 to 1908 at epoch 150 for RBC, from 245 to 463 for WBC, and from 110 to 197 for platelets. RBC has the largest number of retained pseudo-labels, followed by WBC and platelets, which is generally consistent with the class distribution of the BCCD dataset. The gradual increase and later stabilization of pseudo-labels across categories indicate that the proposed method can progressively exploit pseudo-label information from unlabeled samples during training.
In addition to the number of retained pseudo-labels, confidence and prediction entropy were used as indirect indicators to estimate pseudo-label reliability.
Figure 6 shows the variation in these two reliability-related indicators during training.
As shown in
Figure 6, as training progresses, the average confidence of the retained pseudo-labels increases from 0.70 to 0.89, while their average prediction entropy decreases from 0.36 to 0.18. The retained pseudo-labels are therefore increasingly associated with higher model confidence and lower prediction uncertainty. It should be emphasized that confidence and entropy are used here only as indirect indicators of pseudo-label reliability: they cannot directly verify pseudo-label correctness and cannot replace an annotation-based evaluation of pseudo-label accuracy.
Overall, the results in Table 3, Figure 5, and Figure 6 describe the pseudo-label generation process quantitatively from three perspectives: the number of retained pseudo-labels, their class-wise evolution, and confidence- and entropy-based reliability estimation. These results suggest that the proposed filtering strategy filters out low-confidence and high-uncertainty candidate predictions while progressively increasing the amount of auxiliary supervision obtained from unlabeled samples. However, this analysis should be interpreted as indirect evidence of pseudo-label reliability rather than direct validation of pseudo-label correctness.
4.4.3. Single-Category Comparison
To further analyze the category-wise detection performance of the proposed method under different annotation ratios, CRS-YOLOv5-ALT was compared with the classical semi-supervised object detection method CSD on RBC, WBC, and platelets. The experiments were conducted under 1%, 5%, and 10% annotation ratios on the BCCD dataset.
Table 4 and
Table 5 present the single-category detection results of CRS-YOLOv5-ALT and CSD, respectively.
As shown in
Table 4 and
Table 5, CRS-YOLOv5-ALT achieves better overall performance than CSD under different annotation ratios. In terms of the overall metric “All”, CRS-YOLOv5-ALT obtains mAP values of 56.61%, 59.73%, and 59.91% under 1%, 5%, and 10% annotation ratios, respectively, which are higher than the corresponding CSD results of 54.49%, 53.32%, and 53.30%. Similar advantages can also be observed in mAP50 and mAP75. This indicates that the proposed method can effectively utilize unlabeled samples and improve overall detection performance under low-annotation conditions.
Further analysis of the category-wise results shows that the proposed method performs particularly well on RBC and WBC. Under the 5% annotation ratio, CRS-YOLOv5-ALT achieves 59.29%, 91.36%, and 64.08% on RBC in terms of mAP, mAP50, and mAP75, respectively, which are higher than the corresponding CSD results. For WBC, CRS-YOLOv5-ALT also achieves clear improvements, with mAP, mAP50, and mAP75 reaching 68.41%, 94.07%, and 70.25%, respectively. These results indicate that the proposed method can effectively improve the detection performance of major blood cell categories under low-annotation conditions.
It should be noted that the single-category detection results do not increase strictly monotonically with the annotation ratio. For example, for platelets, the mAP of the proposed method decreases from 44.37% under the 1% annotation ratio to 42.11% under the 5% annotation ratio and then increases to 43.07% under the 10% annotation ratio. This phenomenon may be attributed to several factors. First, platelet targets are small and have relatively limited feature representation, making them inherently more difficult to detect. Second, pseudo-labels inevitably contain some noise during semi-supervised learning, which can interfere with the learning of difficult categories. Third, platelets account for a relatively low proportion of samples in the dataset, which may also limit how well the model learns their features. Therefore, changes in detection performance are influenced not only by the annotation ratio but also by category characteristics, sample composition, and pseudo-label quality. Similar complexity has also been reported in other data-driven decision-making tasks, where multiple interacting factors may lead to nonlinear changes in model performance [
29].
Although the proposed method does not always achieve the highest mAP for platelets, it obtains higher mAP50 and mAP75 than CSD under the 5% and 10% annotation ratios. This suggests that CRS-YOLOv5-ALT still improves localization-related detection performance on platelets to some extent, even though the overall mAP for this small and under-represented category remains challenging. This observation is consistent with the pseudo-label analysis in
Section 4.4.2, where platelets yield the fewest retained pseudo-labels of the three categories (110 at epoch 1 and 197 at epoch 150).
From the perspective of the method mechanism, the introduction of dual-threshold pseudo-label filtering helps reduce the influence of low-confidence and high-uncertainty candidate predictions, while consistency regularization improves the stability of model predictions under perturbation conditions. Based on the results in
Table 4 and
Table 5, CRS-YOLOv5-ALT outperforms CSD in the overall metrics and in most major category metrics under the current experimental settings, thereby further validating the effectiveness of the proposed method in low-annotation semi-supervised blood cell detection tasks.
4.4.4. Comparison with Fully Supervised Algorithms
To further evaluate the practical application value of the proposed method, CRS-YOLOv5-ALT was compared with several fully supervised object detection algorithms on the BCCD dataset. In this experiment, the same labeled training set as that used in fully supervised training was adopted, while unlabeled data were additionally introduced into the semi-supervised framework. This comparison was conducted to analyze whether the proposed semi-supervised strategy can still bring additional benefits when labeled data are relatively sufficient.
As shown in
Figure 7 and
Table 6, under fully supervised training conditions, the detection performance of YOLOv5-ALT and the proposed CRS-YOLOv5-ALT on the BCCD dataset is relatively close. Specifically, the mAP50 of YOLOv5-ALT is 97.40%, while that of CRS-YOLOv5-ALT is 97.58%. Compared with other fully supervised detection algorithms, including Faster-RCNN, SSD, YOLOv3, and YOLOv4, both YOLOv5-ALT and CRS-YOLOv5-ALT achieve higher detection performance on the BCCD dataset. This indicates that the YOLOv5-ALT baseline already has strong detection capability under sufficient annotation conditions, and the proposed semi-supervised framework does not degrade its performance.
The results show that when labeled data are relatively sufficient, the performance of the baseline YOLOv5-ALT has already reached a high level, and the additional information provided by unlabeled data leads to only a slight improvement. Therefore, the main advantage of the proposed method does not lie in further improving performance under high-annotation conditions, but in effectively utilizing unlabeled samples when labeled data are insufficient. This conclusion is consistent with the low-annotation experiments in
Section 4.4.1 and
Section 4.4.3, where CRS-YOLOv5-ALT shows more obvious advantages under limited labeled data.
It is worth noting that semi-supervised learning is likely to be more valuable in scenarios with higher complexity and scarcer annotations. In complex multi-task environments, such as digital twin-driven human–robot collaboration systems [
30], the ability to utilize unlabeled data is important for adapting to diverse and dynamic conditions. Similarly, the proposed semi-supervised method may offer greater practical value for more challenging whole-slide blood cell images and clinical applications with limited annotations. The practical significance of CRS-YOLOv5-ALT therefore lies mainly in low-annotation blood cell detection, where it can reduce manual annotation costs while maintaining stable detection performance.
4.4.5. Ablation Experiments
To analyze the influence of each component in CRS-YOLOv5-ALT on detection performance, ablation experiments were conducted on the BCCD dataset under the 5% annotation ratio. The proposed method mainly includes three components: Mixup, dual-threshold pseudo-label filtering (DPF), and consistency regularization (CR). YOLOv5-ALT trained with 5% labeled data was used as the low-annotation baseline model, and the effects of different components and their combinations were evaluated. The results are shown in
Table 7. It should be noted that the ablation experiments in
Table 7 were conducted as single-run diagnostic experiments under the same data split and training settings. Therefore, these results are used mainly to analyze the relative contribution trends of different components rather than to provide statistically conclusive evidence for every individual configuration. The statistical consistency of the complete CRS-YOLOv5-ALT framework has been evaluated separately in the main comparison experiments through three independent runs, as reported in
Table 1 and
Table 2.
As shown in
Table 7, when only 5% of the labeled data are used for training, the performance of YOLOv5-ALT drops markedly compared with the fully supervised setting: the mAP decreases from 62.82% with 100% labeled data to 54.97% with 5% labeled data. This indicates that insufficient labeled samples limit the feature learning capability and detection performance of the model, which motivates exploiting effective information from unlabeled data under low-annotation conditions.
From the single-component ablation results, introducing Mixup, DPF, or CR separately raises the mAP from 54.97% to 55.34%, 55.46%, and 55.63%, respectively, suggesting that each component may contribute positively under the current experimental setting. Mixup expands the training data distribution through sample mixing, which helps alleviate overfitting when labeled data are limited. DPF retains relatively reliable candidate predictions through confidence and prediction entropy constraints, thereby providing auxiliary supervision from unlabeled samples. CR improves the stability of the model against input perturbations by constraining prediction consistency between original and perturbed images. However, the isolated gains are small (0.37 to 0.66 percentage points), so these single-component results are regarded as indicative trends rather than statistically conclusive evidence; they also suggest that no single strategy is sufficient to fully exploit unlabeled data.
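The consistency constraint described for CR can be sketched as a loss between predictions on an original image and on a perturbed copy. This sketch makes several assumptions that the paper does not state. The perturbation is taken to be a horizontal flip, so boxes predicted on the flipped image are mapped back before comparison. Predictions are assumed to be index-matched, for example by anchor or grid cell. The loss combines a class-probability MSE with a box L1 term weighted by a hypothetical `box_weight`. All names are illustrative.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def unflip_boxes(boxes: Sequence[Box], img_w: float) -> List[Box]:
    """Map boxes predicted on a horizontally flipped image back to original coordinates."""
    return [(img_w - x2, y1, img_w - x1, y2) for x1, y1, x2, y2 in boxes]


def consistency_loss(probs_orig, probs_pert, boxes_orig, boxes_pert, box_weight=1.0):
    """Class-probability MSE plus box L1 between index-matched predictions.

    Assumes prediction i on the perturbed image corresponds to prediction i on the
    original image (e.g. the same anchor/grid cell); the loss form is illustrative.
    """
    n = len(probs_orig)
    if n == 0:
        return 0.0
    # Mean squared difference of class probabilities, averaged over predictions.
    cls_term = sum(
        sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)
        for p, q in zip(probs_orig, probs_pert)
    ) / n
    # Mean absolute difference of the four box coordinates, averaged over predictions.
    box_term = sum(
        sum(abs(a - b) for a, b in zip(u, v)) / 4.0
        for u, v in zip(boxes_orig, boxes_pert)
    ) / n
    return cls_term + box_weight * box_term
```

The loss is zero when the detector gives the same class distribution and the same (un-flipped) box for both views. Minimizing it therefore pushes the model toward predictions that are stable under the perturbation, without requiring any ground-truth label for the unlabeled image.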
Further analysis of the two-component results suggests a complementary trend among the components. Introducing Mixup and DPF together yields an mAP of 55.84%, and introducing Mixup and CR together yields 56.02%. Both exceed the corresponding single-component results, indicating that expanding the sample distribution can improve training jointly with pseudo-label supervision or consistency constraints. When DPF and CR are introduced simultaneously, the mAP reaches 57.93%, the best result among the two-component settings. This suggests that pseudo-label filtering and consistency regularization may be complementary under the current experimental setting: DPF helps retain candidate predictions with relatively higher confidence and lower uncertainty, while CR further encourages the model to maintain stable predictions under input perturbations.
When Mixup, DPF, and CR are jointly introduced, the complete CRS-YOLOv5-ALT achieves the best performance, with mAP50, mAP75, and mAP reaching 91.97%, 65.82%, and 59.73%, respectively. Compared with the YOLOv5-ALT (5%) baseline model, the mAP50, mAP75, and mAP are improved by 2.34, 3.24, and 4.76 percentage points, respectively. Compared with the best two-component setting, namely the combination of DPF and CR, the complete method further improves mAP by 1.80 percentage points. These results suggest that the performance improvement of CRS-YOLOv5-ALT is more likely related to the combined effect of sample distribution expansion, pseudo-label candidate filtering, and prediction consistency constraints, rather than to a single component alone.
Overall, the ablation results provide diagnostic evidence for the usefulness and potential complementarity of the three components under the current setting. Mixup is intended to increase sample diversity, DPF to improve pseudo-label reliability, and CR to improve prediction stability. Their combination enables the model to make more effective use of unlabeled data and improves semi-supervised blood cell detection performance under low-annotation conditions.
4.4.6. Generalization Experiments
To further evaluate the applicability and generalization ability of the proposed method on different datasets, generalization experiments were conducted on the TXL-PBC blood cell dataset, a publicly available, expert-annotated blood cell image dataset containing 1440 images. It contains images of different blood cell types, including white blood cells, red blood cells, and platelets, and each image is annotated with cell types and their corresponding states, making the dataset suitable for blood cell detection and classification tasks.
Under the setting of 5% labeled data, CRS-YOLOv5-ALT was compared with representative semi-supervised object detection methods, including STAC, Instant Teacher, and CSD, as well as the baseline model YOLOv5-ALT. The experimental results are shown in
Table 8. It should be noted that the TXL-PBC experiment was conducted as a single-run supplementary evaluation under the same 5% labeled-data setting. Therefore, the results are used as supplementary evidence for cross-dataset applicability rather than as definitive validation of generalization performance.
As shown in
Table 8, CRS-YOLOv5-ALT obtains competitive results among the compared methods on the TXL-PBC dataset under this single-run supplementary setting. Specifically, the proposed method obtains 72.95%, 96.16%, and 85.18% in mAP, mAP50, and mAP75, respectively. Compared with the baseline YOLOv5-ALT, CRS-YOLOv5-ALT improves mAP, mAP50, and mAP75 by 2.53, 1.58, and 2.81 percentage points, respectively. Compared with the best semi-supervised comparison method, CSD, the proposed method improves the three metrics by 3.72, 2.20, and 8.97 percentage points, respectively.
These results suggest that CRS-YOLOv5-ALT can maintain competitive detection performance on another public blood cell dataset under the current experimental setting. The improvement over YOLOv5-ALT suggests that the introduced semi-supervised learning strategy can further enhance the baseline detector by utilizing unlabeled data. In addition, the comparison with STAC, Instant Teacher, and CSD provides supplementary evidence that the combination of dual-threshold pseudo-label filtering, consistency regularization, and Mixup augmentation may also be useful under the TXL-PBC dataset setting.
Overall, the TXL-PBC experiment provides supplementary evidence that CRS-YOLOv5-ALT may be applicable to another public blood cell dataset under low-annotation conditions. However, since both BCCD and TXL-PBC are public benchmark datasets and are less complex than real clinical full-field blood cell images, these results should be interpreted as preliminary cross-dataset evidence rather than definitive validation of generalization ability. Further evaluation on larger-scale and more clinically diverse blood cell images is still needed.
5. Conclusions
To address the problems of high manual annotation cost and insufficient utilization of unlabeled data in blood cell detection, this paper proposes a semi-supervised blood cell detection method named CRS-YOLOv5-ALT. The proposed method uses YOLOv5-ALT as the baseline detection framework and integrates dual-threshold pseudo-label filtering, consistency regularization, and Mixup data augmentation. By combining a small number of labeled samples with unlabeled samples, CRS-YOLOv5-ALT aims to improve blood cell detection performance under low-annotation conditions while reducing the dependence on large-scale manually annotated data.
The experimental results on the BCCD dataset demonstrate that CRS-YOLOv5-ALT achieves better performance than representative semi-supervised object detection methods under the 5% annotation ratio. Specifically, the proposed method achieves 59.73 ± 0.16%, 91.97 ± 0.13%, and 65.82 ± 0.19% in mAP, mAP50, and mAP75, respectively, outperforming STAC, Instant Teacher, and CSD. Compared with the best comparison method, CSD, the proposed method improves the three metrics by 5.57, 1.85, and 3.24 percentage points, respectively. The detailed results of three independent runs further show that the performance of CRS-YOLOv5-ALT is relatively stable and is not mainly caused by random fluctuations from a single experiment.
The quantitative pseudo-label analysis further supports the effectiveness of the proposed semi-supervised strategy. During training, the number of retained pseudo-labels increases from 1285 at epoch 1 to 2568 at epoch 150, and the acceptance rate increases from 27.81% to 46.30%. The class-wise pseudo-label evolution shows that the number of retained pseudo-labels for RBC, WBC, and platelets gradually increases and tends to stabilize in the later training stage. In addition, the average confidence of retained pseudo-labels increases from 0.70 to 0.89, while the average prediction entropy decreases from 0.36 to 0.18. These results suggest that the proposed dual-threshold filtering strategy can progressively retain pseudo-label candidates with higher confidence and lower prediction uncertainty, thereby improving the utilization of unlabeled samples. However, confidence and entropy should be regarded as indirect indicators rather than direct verification of pseudo-label correctness.
The training convergence curves show that the main losses on both the training set and the validation set decrease overall and gradually stabilize, while precision, recall, mAP50, and mAP increase and remain stable in the later stage, indicating that the proposed semi-supervised training process converges normally without obvious performance degradation. The ablation experiments provide diagnostic evidence for the contribution trends and potential complementarity of the different components. When Mixup, dual-threshold pseudo-label filtering, and consistency regularization are introduced separately, the detection performance improves to different degrees; when all three are introduced jointly, CRS-YOLOv5-ALT achieves the best performance, with mAP50, mAP75, and mAP reaching 91.97%, 65.82%, and 59.73%, respectively. Because the isolated gains of single components are relatively small and were obtained from single-run ablation settings, these results indicate possible contribution trends rather than statistically conclusive effects of each individual component. They suggest that the performance improvement is more likely attributable to the combined effect of sample distribution expansion, pseudo-label quality control, and prediction consistency constraints than to any single component alone.
The comparison with fully supervised algorithms shows that, when labeled data are relatively sufficient, YOLOv5-ALT already achieves strong detection performance, and the additional gain brought by unlabeled data becomes limited. Specifically, CRS-YOLOv5-ALT achieves an mAP50 of 97.58%, only 0.18 percentage points higher than the 97.40% obtained by YOLOv5-ALT. This result suggests that the main advantage of the proposed method lies in low-annotation scenarios rather than in further improving performance when labeled data are already sufficient. In addition, the generalization experiment on the TXL-PBC dataset shows that CRS-YOLOv5-ALT achieves 72.95%, 96.16%, and 85.18% in mAP, mAP50, and mAP75, respectively, outperforming YOLOv5-ALT by 2.53, 1.58, and 2.81 percentage points. This provides supplementary evidence for the potential cross-dataset applicability of the proposed method, but it should not be regarded as definitive validation of generalization ability, because the TXL-PBC evaluation was conducted as a single-run supplementary experiment.
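The distinction between absolute (percentage-point) and relative gains matters when reading these comparisons. The short Python calculation below uses only the figures reported above; the YOLOv5-ALT scores on TXL-PBC are implied by subtracting the stated gains and are not separately reported in this section, and the helper names are illustrative.

```python
# Figures reported in the text, in percent.
bccd_map50 = {"YOLOv5-ALT": 97.40, "CRS-YOLOv5-ALT": 97.58}   # sufficient labeled data
txl_crs = {"mAP": 72.95, "mAP50": 96.16, "mAP75": 85.18}      # CRS-YOLOv5-ALT on TXL-PBC
txl_gain_pp = {"mAP": 2.53, "mAP50": 1.58, "mAP75": 2.81}     # gain over YOLOv5-ALT (pp)

def pp_gain(new, old):
    """Absolute gain in percentage points."""
    return round(new - old, 2)

def relative_gain(new, old):
    """Relative gain in percent of the baseline value."""
    return round(100.0 * (new - old) / old, 2)

bccd_pp = pp_gain(bccd_map50["CRS-YOLOv5-ALT"], bccd_map50["YOLOv5-ALT"])
# Implied YOLOv5-ALT scores on TXL-PBC (derived, not separately reported here).
txl_base = {k: round(txl_crs[k] - txl_gain_pp[k], 2) for k in txl_crs}
txl_rel = {k: relative_gain(txl_crs[k], txl_base[k]) for k in txl_crs}

print(bccd_pp)   # 0.18
print(txl_base)  # {'mAP': 70.42, 'mAP50': 94.58, 'mAP75': 82.37}
print(txl_rel)   # {'mAP': 3.59, 'mAP50': 1.67, 'mAP75': 3.41}
```

On BCCD with sufficient labels, the 0.18 percentage-point mAP50 gain is also about 0.18% in relative terms, whereas on TXL-PBC the 2.53 percentage-point mAP gain corresponds to roughly 3.6% of the implied baseline.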
Overall, CRS-YOLOv5-ALT provides a feasible semi-supervised learning scheme for blood cell detection under low-annotation conditions, reducing the dependence on manual annotation while maintaining favorable detection performance. However, this study still has several limitations. First, the experiments are mainly conducted on public blood cell datasets, and the scale of the unlabeled data is still limited. Second, BCCD and TXL-PBC images are less complex than real clinical full-field blood cell images, so the applicability of the proposed method in more complex clinical scenarios requires further validation. Third, small and under-represented categories such as platelets still show performance fluctuations, indicating that pseudo-label generation and feature learning for difficult categories need further improvement. In future work, larger-scale unlabeled data and more complex clinical blood cell images will be used to further evaluate the proposed method, and repeated-run validation will be performed on additional datasets. More advanced mechanisms for evaluating pseudo-label correctness, data augmentation strategies, and semi-supervised training methods will also be explored to improve the robustness, cross-dataset applicability, and practical value of the model.