Automated Blood Cell Detection and Counting Based on Improved Object Detection Algorithm

Liu, Zeyu; Yuan, Dan; Zhu, Guohun

doi:10.3390/math13183023


by Zeyu Liu ¹, Dan Yuan ²,* and Guohun Zhu ¹,*

¹ School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane, QLD 4072, Australia

² School of Mechanical and Mining Engineering, The University of Queensland, Brisbane, QLD 4072, Australia

* Authors to whom correspondence should be addressed.

Mathematics 2025, 13(18), 3023; https://doi.org/10.3390/math13183023

Submission received: 13 August 2025 / Revised: 10 September 2025 / Accepted: 11 September 2025 / Published: 18 September 2025


Abstract

Blood cell detection and enumeration play a crucial role in medical diagnostics. However, traditional methods often face limitations in accurately detecting smaller or overlapping cells, which can result in misclassifications and reduced reliability. To overcome these challenges related to detection failures and classification inaccuracies, this study presents an enhanced YOLO-based algorithm, specifically designed for blood cell detection, referred to as YOLO-BC. This novel approach aims to improve both detection precision and classification accuracy, particularly in complex scenarios where cells are difficult to distinguish due to size variability and overlapping. The Efficient Multi-Scale Attention (EMSA) is integrated into the C2f module, enhancing feature maps by applying attention across multiple scales to refine the representation of blood cell features. Omni-Dimensional Dynamic Convolution (ODConv) is employed to replace the standard convolution module, adaptively combining kernels from multiple dimensions to improve feature representation for diverse blood cell types. For the experiments, the BCCD (Blood Cell Count and Detection) dataset is utilized, alongside data augmentation techniques. In terms of experimental evaluation, YOLO-BC outperforms YOLOv8 with a 3.1% improvement in mAP@50, a 3.7% increase in mAP@50:95, and a 2% increase in F1-score, all based on the same dataset and IoU parameters. Notably, small objects such as platelets are also detected with high accuracy. These findings highlight the effectiveness and potential clinical applicability of YOLO-BC for automated blood cell detection.

Keywords:

blood cell detection; object detection; YOLOv8; efficient multi-scale attention; omni-dimensional dynamic convolution; automated

MSC:

68U10

1. Introduction

In the field of medicine, the diagnosis of patients’ medical conditions can be made by detecting the quantity of three types of blood cells (red blood cells, white blood cells and platelets) [1] in the human bloodstream. For example, an excessive number of erythrocytes may indicate vascular obstruction, while a deficiency may suggest anemia. The quantity of leukocytes can provide indications of bacterial invasion [2]. Diagnosing acute myeloid leukemia and related neoplasms in adults is challenging [3], due to the need for integrating clinical, morphological, immunophenotypic, cytogenetic, and molecular genetic data.

Traditional blood cell detection methods often rely on manual inspection by professionals using microscopes and other devices [4], which is both labor-intensive and resource-demanding. Traditional image processing methods, including thresholding, edge detection, and morphological operations, are commonly employed in blood cell identification. Nevertheless, these approaches often struggle with variations in erythrocyte morphology and the presence of overlapping or clustered cells, resulting in suboptimal performance.

In contrast, machine learning-based approaches, particularly deep learning methods, have shown superior accuracy in erythrocyte recognition, even in the presence of challenging cell image features.

One notable study investigated the YOLO-Dense [5], which integrates dense connections into the YOLO architecture. This adjustment enhanced detection precision, reaching an mAP@50 of 86%, where accuracy is determined by over 50% intersection between predicted and ground truth bounding boxes. The YOLO-Dense model demonstrated superior performance compared to the Faster R-CNN, underscoring the advantages of deep learning techniques in tackling the complexities associated with blood cell identification tasks.

Meanwhile, data augmentation methods have been widely applied in the field of medical imaging. Chlap et al. [6] explore the application of deep learning in identifying medical conditions from images and discuss the potential for augmenting medical datasets using biologically informed methods. Generative Adversarial Networks (GANs) [7] were used to generate synthetic medical images that respect biological constraints, improving CNN performance in liver lesion classification. Akhil et al. [8] discuss the use of deep convolutional generative adversarial networks (DCGANs) for synthesizing medical images with realistic biological features, contributing to the advancement of data augmentation strategies in medical imaging.

Based on the background outlined above, we propose the YOLO-BC (YOLOv8 for Blood Cell) algorithm. It is an enhanced version of the existing YOLOv8 architecture, specifically designed for blood cell detection. This improvement is achieved by incorporating an Efficient Multi-Scale Attention (EMSA) and Omni-Dimensional Dynamic Convolution (ODConv) into the backbone section. YOLO-BC demonstrates superior performance over YOLOv8, achieving a higher F1-score and outperforming YOLOv8 in both mAP@50 and mAP@50:95. The proposed YOLO-BC algorithm addresses important issues such as overlapping blood cells and enhances accuracy in detecting small cells, highlighting its potential applicability and generalizability across various blood cell detection scenarios.

There are three main contributions of this study:

(1): Dataset Augmentation for Blood Cell Variations: To mitigate the issue of limited cell samples and variability in cell shapes, a small-scale blood cell dataset is augmented using advanced digital image processing techniques, including image flipping, scaling, and brightness adjustment. This augmentation effectively expands the dataset, enriching the detection scenario and bringing it closer to real-world conditions.
(2): Integration of EMSA: An EMSA mechanism is introduced within the specific C2f module of YOLOv8, where it enhances feature maps by incorporating attention mechanisms across multiple scales. This allows the network to focus on critical regions of interest, improving detection accuracy for diverse blood cell types, especially in challenging scenarios with occluded or overlapping cells.
(3): ODConv for small cells: The conventional convolution module of YOLOv8 is replaced with an ODConv submodule, which adaptively combines convolution kernels across multiple dimensions. This modification improves the model’s ability to represent features from various blood cell types, enhancing its performance in identifying and differentiating small and irregularly shaped blood cells.

2. Related Works

In recent years, significant advancements have been made in the field of blood cell recognition. Since the introduction of LeNet [9], a series of influential architectures and technologies have emerged, driving progress in this domain. A major breakthrough occurred in the 2012 ImageNet image classification challenge [10], with the introduction of deep convolutional neural networks (CNNs), which significantly improved image recognition performance. In the 2014 ImageNet competition, VGGNet [10] advanced the field further by employing smaller convolutional kernels and deeper network layers, thereby enhancing both feature extraction and overall performance. The 2015 ImageNet competition introduced ResNet [11], which incorporated residual connections to address critical issues such as vanishing gradients and model degradation, enabling the training of much deeper networks without sacrificing performance.

Specifically, for the task of medical cell counting, researchers have proposed a series of deep learning network architectures. Faster R-CNN [12] introduces a region proposal network (RPN) and an ROI pooling layer, combining object localization and classification. Recent advancements in medical imaging have seen transformer-based models gaining traction due to their superior performance over traditional CNNs in various detection tasks. For instance, Swin Transformer has been applied to retinal vessel segmentation [13] and demonstrated state-of-the-art results, outperforming CNN-based methods in terms of accuracy and robustness. Another significant study by Zhu et al. [14] utilized a Vision Transformer (ViT) for lung cancer detection, where the model outperformed CNNs in terms of sensitivity and specificity, particularly in challenging low-resolution images. Moreover, Dosovitskiy et al. [15] proposed a hybrid model combining transformers and CNNs, which showed significant improvement in detecting anomalies in chest X-rays, with a notable reduction in false positives compared to traditional methods.

YOLO is renowned for its real-time inference speed [16], transforming the object detection task into a regression problem to predict object positions and categories simultaneously. YOLOv3 adopted the Darknet-53 backbone, enhancing feature extraction and refining object classification and localization accuracy. YOLOv4 emphasized efficiency, introducing the CSPDarknet53 backbone, SPP (Spatial Pyramid Pooling), and PANet (Path Aggregation Network). YOLOv5 offered improvements in model training, including auto-learning anchor boxes, model pruning, and advanced data augmentation techniques. YOLOv6 and YOLOv7 further optimized performance: YOLOv6 focused on computational efficiency, while YOLOv7 introduced advanced networks like dynamic convolution and hybrid task attention, improving both speed and accuracy in complex detection scenarios.

However, as highlighted by Shakarami et al. [17], detecting small and overlapping cells in medical images remains a challenge. They found that YOLOv3-based methods failed to consistently detect small-sized cells in histopathology slides due to spatial resolution limitations in deeper layers of the network. The performance of YOLO-based cell detectors is heavily influenced by the dataset used for training. Medical image datasets are often small, highly specialized, and difficult to annotate, which may lead to overfitting and poor generalization. Nair et al. [18] emphasize the need for larger and more diverse datasets to improve the generalization of models. Their study on YOLOv4 for cell detection in histopathology images showed that model performance improves significantly with the inclusion of more annotated samples, especially when incorporating a variety of cellular types and pathological conditions. Furthermore, the complexity of medical image annotation, where cells must be identified with high precision, poses challenges for both training and validation of models.

Data augmentation methods and semi-supervised learning approaches have been suggested to overcome these limitations, as explored by Yang et al. [19] in their hybrid YOLO-based method that utilizes synthetic image generation to expand dataset sizes. To further improve the detection performance of small cells, Shao et al. [20] proposed an attention-based YOLO network. This attention mechanism enables the model to focus on relevant regions of the image, enhancing detection accuracy for cells that may otherwise be overlooked in complex medical images. YOLO-Dense introduces dense connections in YOLO [21], achieving better detection performance compared to Faster R-CNN with an mAP@50 score of 86%.

Due to the limited size of medical cell image datasets, data augmentation techniques are frequently employed to expand the variability of training samples. Transformations such as mirror flipping, rotation, scaling, and translation can be applied using affine transformation methods, which significantly diversify the set of available cell images. These augmentations not only enhance the range of training data but also contribute to improving the model’s robustness and its ability to generalize across various real-world scenarios. By artificially expanding the dataset, these techniques help mitigate overfitting and improve the model’s performance on unseen data.

In this study, we conduct an in-depth investigation of various object detection algorithms for blood cell recognition and counting, with a particular focus on the explainability and clinical applicability of our proposed model. We propose an improved YOLO-BC algorithm based on YOLOv8, incorporating Efficient Multi-Scale Attention (EMSA) and Omni-Dimensional Dynamic Convolution (ODConv) modules. These innovations not only enhance the model’s performance but also contribute to its interpretability, which is crucial for clinical adoption. The BCCD (Blood Cell Count and Detection) dataset [21] is selected for experiments, with data augmentation techniques applied to enhance the dataset. In addition to performance metrics such as precision, recall, mAP, and F1 score, we emphasize the importance of transparent model predictions for clinical settings. To this end, we detail the impact of EMSA and ODConv on model explainability, providing insights into how these components aid in understanding the detection process.

3. Methods

3.1. The Design of YOLO-BC Detection Pipeline

For blood cell detection and counting, YOLO-BC is designed to automatically identify and count different types of blood cells in blood images, such as red blood cells, white blood cells, and platelets.

YOLO-BC employs a convolutional neural network (CNN) for feature extraction and object detection, partitioning the image into grids and predicting candidate object bounding boxes along with their corresponding classes in each grid (refer to Figure 1a). This design allows YOLO-BC to deliver rapid detection and high accuracy, effectively handling large datasets of blood cell images.

As illustrated in Figure 1b, taking white blood cell detection as an example, the YOLO-BC algorithm generates bounding boxes along with the corresponding classification probabilities for the detected objects, thereby enabling the identification of various blood cell types. This design is crucial for enhancing the practical application of YOLO-BC in medical diagnostics.

Conventional medical diagnostics typically depend on manual visual assessment and analysis. In contrast, incorporating YOLO-BC for automated blood cell detection presents a promising method for offering clinicians a rapid and dependable diagnostic tool for conditions related to blood cells.

This study designed the training framework and model architecture for the YOLOv8-BC model, as illustrated in Figure 2. The initial dataset contained only 364 image samples, which were captured using a regular light microscope at a resolution of 640 × 480. First, the test set is separated, and data augmentation is then applied to the training and validation sets.

Through data augmentation, a more comprehensive blood cell dataset is generated, ensuring a broader representation of the blood cell detection scenario. The final dataset includes a training set (765 images), a validation set (73 images), and a test set (36 images). Although the number of test images is limited, each image contains a large number of blood cells, providing a relatively rich sample size.

In total, the dataset encompassed 11,789 individual blood cell samples, comprising 10,031 red blood cells, 898 white blood cells, and 851 platelets. After training on this augmented dataset, the YOLOv8-BC model is developed.

In the context of analyzing microscope images, medical professionals typically need to examine and assess a substantial number of blood samples to detect potential abnormalities. Traditionally, this process needs significant manual effort and time. However, with the implementation of the YOLO-BC pipeline, which incorporates advanced visual detection software, clinicians can simply input microscope images into the system. The embedded software autonomously detects and localizes blood cells within the images, providing rapid and precise diagnostic insights. This automated approach significantly enhances the efficiency and accuracy of blood cell analysis, supporting timely and reliable clinical decision-making.

3.2. YOLO-BC Algorithm

Based on the YOLOv8 model [22] and incorporating the aforementioned EMSA and ODConv, this study designed and implemented an improved network architecture called YOLO-BC (YOLOv8 for Blood Cell), as illustrated in Figure 3.

In this study, we introduce several novel modifications to the YOLOv8 architecture with the goal of enhancing feature aggregation and detection accuracy. The Efficient Multi-Scale Attention (EMSA) module is strategically integrated at the end of the C2f (Faster CSP Bottleneck with 2 Convolutions) block, which connects to the second concatenation layer in the neck section. This positioning allows the EMSA module to optimize the aggregation of multi-scale feature information, significantly improving the network’s capability to capture and process features across varying resolutions. Additionally, the third convolutional layer in the backbone is substituted with the Omni-Dimensional Dynamic Convolution (ODConv) module, which enhances the network’s ability to extract target-specific features. This modification further refines the detection performance, enabling more accurate identification of objects in complex scenarios. These architectural improvements collectively strengthen the overall efficiency and effectiveness of the YOLOv8 model in feature extraction and detection tasks.

In blood cell imagery, occlusion and overlapping among cells present significant challenges for object detection. To address these issues, an attention mechanism is introduced in Figure 3, aimed at mitigating the limitations of YOLOv8 in blood cell detection, including problems such as redundant counting in overlapping cells and reduced accuracy in detecting smaller blood cells.

3.3. Efficient Multi-Scale Attention

The Efficient Multi-Scale Attention (EMSA) mechanism [23] is an innovative attention module that has recently garnered significant attention in the literature. Its core principle is to reduce computational complexity while maintaining crucial information across each channel.

The EMSA mechanism achieves this by reshaping a subset of the channels into the batch dimension, which allows for more efficient processing of channel information, as illustrated in Figure 4.

By partitioning the channel dimension into multiple sub-features, the module ensures a balanced distribution of spatial semantic features across each feature group. Assume that the input feature map is X, there are m samples in total, each sample has n features, and the feature map is divided into G groups. For each group g, calculate the mean $\mu_g$ and variance $\sigma_g^2$ of the features within the group, defined as [23]:

$$\mu_g = \frac{\sum X_i}{(n/G)\,H\,W}, \qquad \sigma_g^2 = \frac{\sum (X_i - \mu_g)^2}{(n/G)\,H\,W} \tag{1}$$

Here, H and W represent the height and width of the feature map, respectively, and G is the number of feature map groups; the sums run over all elements $X_i$ in group g.

Then, it normalizes the features $X_i$ within the group as:

$$Y_i = \frac{X_i - \mu_g}{\sqrt{\sigma_g^2 + \varepsilon}} \tag{2}$$

where ε is a small positive number to prevent the denominator from being zero. Finally, the normalized feature Y is restored to its original shape. This grouping improves the expression capability of different blood cells’ pixel features and reduces computational costs, enhancing model efficiency.
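The group-wise normalization of Equations (1) and (2) can be illustrated with a minimal pure-Python sketch. It assumes a single sample stored as nested lists `x[channel][row][col]`, contiguous channel groups, and a default `eps` of 1e-5; the function name `group_normalize` and these choices are illustrative and are not taken from the EMSA implementation.

```python
import math


def group_normalize(x, num_groups, eps=1e-5):
    """Normalize one sample x[channel][row][col] group by group.

    Channels are split into `num_groups` contiguous groups (an assumption);
    each group uses its own mean and variance, as in Equations (1) and (2).
    """
    n = len(x)  # number of channels, n in the text
    if n % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    per_group = n // num_groups
    out = []
    for g in range(num_groups):
        channels = x[g * per_group:(g + 1) * per_group]
        values = [v for ch in channels for row in ch for v in row]
        count = len(values)  # equals (n / G) * H * W
        mean = sum(values) / count
        var = sum((v - mean) ** 2 for v in values) / count
        scale = 1.0 / math.sqrt(var + eps)
        # Keep the original channel/row/column layout in the output.
        for ch in channels:
            out.append([[(v - mean) * scale for v in row] for row in ch])
    return out


# Two channels of size 2 x 2, normalized as two separate groups.
sample = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
normalized = group_normalize(sample, num_groups=2)
```

After normalization each group has approximately zero mean and unit variance, and the output keeps the shape of the input.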

The EMSA mechanism is designed to encode global context information while dynamically adjusting the channel weights across each parallel branch, thereby enhancing the calibration of channel significance. Leveraging the advantages of the innovative EMSA architecture, the C2f module of YOLOv8 is optimized by incorporating a multi-scale attention mechanism. This addition effectively addresses the variances in pixel-level features across distinct blood cell types, with particular emphasis on small and challenging targets such as platelets. Furthermore, the backbone P4/16 layer is adapted post-C2f to improve the model’s ability to process these fine-grained features, ultimately boosting the network’s performance in detecting small blood cell categories. This modification enhances the overall feature extraction and detection capability, especially in scenarios involving subtle or difficult-to-detect objects.

Additionally, the EMSA mechanism leverages cross-dimensional interactions to aggregate output features from two parallel branches, enhancing the model’s ability to capture fine-grained pairwise relationships at the pixel level. This interaction is particularly beneficial for blood cell detection, as it allows for a deeper understanding of the spatial correlations between different regions within the image. By improving feature consistency and precision, the EMSA mechanism strengthens the model’s capability to represent complex patterns with greater efficacy. Within the network architecture, the EMSA module is strategically positioned before the bottleneck module (Figure 5), ensuring that the enriched feature representation is processed before any dimensionality reduction takes place. This design choice further optimizes the model’s performance by preserving detailed spatial information.

3.4. Omni-Dimensional Dynamic Convolution

ODConv (Omni-Dimensional Dynamic Convolution) [24] is an advanced dynamic convolution method that extends and builds upon the CondConv framework. It incorporates the dynamics of multiple dimensions, such as the spatial domain, input channels, and output channels, to enhance model performance through parallel strategies and multi-dimensional attention mechanisms. This enables ODConv to learn complementary attention features, thereby improving feature representation.
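For reference, the multi-dimensional attention described above is usually written as the following weighted kernel combination in the ODConv formulation [24]. The symbols are not defined in this article and are introduced here only for illustration: $W_i$ is the i-th of $n$ candidate kernels, $x$ is the input, $\odot$ denotes multiplication along the corresponding kernel dimension, and $\alpha_{si}$, $\alpha_{ci}$, $\alpha_{fi}$, $\alpha_{wi}$ are attentions over the spatial, input-channel, output-channel, and kernel dimensions.

```latex
y = \left( \alpha_{w1} \odot \alpha_{f1} \odot \alpha_{c1} \odot \alpha_{s1} \odot W_1
    + \dots
    + \alpha_{wn} \odot \alpha_{fn} \odot \alpha_{cn} \odot \alpha_{sn} \odot W_n \right) * x
```

Under this reading, setting the spatial, input-channel, and output-channel attentions to one leaves only the kernel-wise weights $\alpha_{wi}$, which corresponds to the CondConv-style combination that ODConv extends.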

The core concept behind ODConv is to decompose the standard convolution operation into multiple sub-operations and introduce dynamic weights to optimize feature representation. By leveraging matrix decomposition [25], ODConv represents the convolutional kernel as the product of two low-rank matrices. This decomposition significantly reduces the number of parameters and the computational complexity associated with traditional convolutional operations. As a result, YOLOv8 becomes more computationally efficient while maintaining high performance. This reduction in parameter space is particularly advantageous for blood cell detection tasks, where it facilitates faster inference and more efficient use of resources, ultimately enhancing the overall model performance without sacrificing accuracy.

ODConv is designed as a modular, plug-and-play component, allowing for seamless integration into the existing YOLOv8 architecture. It effectively addresses the issue of suboptimal detection performance, particularly in the detection of red blood cells (RBCs), by selectively replacing specific convolution modules within YOLOv8. Furthermore, this work introduces an enhanced iteration of ODConv, referred to as ODConv v2, which incorporates several key improvements to elevate its functionality. Notably, a batch normalization layer [26] is added after the convolution operation to improve feature stability and training efficiency, while the SiLU activation function is employed to provide enhanced non-linearity and better gradient flow.

In addition, several enhancements are implemented within the convolutional module at the P3/8 pixel level of the YOLOv8 architecture. These adjustments encompass changes to the spatial resolution as well as modifications to the input and output channel dimensions, all of which contribute to the optimization of the model’s overall performance. By fine-tuning these parameters, the model is able to better capture hierarchical features at varying scales, thereby improving its efficiency and accuracy. The experimental results presented in this study are based on the integration of ODConv v2 within the YOLO-BC framework, ensuring that the outcomes reflect the improved model architecture and its enhanced capabilities.

3.5. Dataset

Data augmentation is performed on the original BCCD dataset [21], thereby improving the model’s ability to generalize over unseen data. Each image has a 50% probability of being horizontally flipped. This transformation aids the model in learning to recognize objects from different perspectives, enhancing its ability to generalize and potentially improving detection performance. Similarly, each image has a 50% chance of undergoing a vertical flip.

This data augmentation approach is particularly beneficial in scenarios where objects may appear in inverted or varied vertical orientations, which is a common occurrence in real-world imaging environments. To further enhance the model’s robustness, a random cropping technique is applied to a subset of images, with the cropping percentage ranging from 0% to 15%. This encourages the model to focus on diverse regions within the images, thereby promoting better spatial understanding and improving generalization. Additionally, the exposure levels of the images are adjusted within a range of −20% to +20%, simulating variations in lighting conditions and further enhancing the model’s ability to generalize across different environmental factors of blood samples. The dataset consists of a total of 874 clinical blood cell images, with 765 images allocated to the training set, 73 to the validation set, and 36 to the test set. This partitioning provides separate image sets for model training, validation, and final evaluation.
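The flip and exposure steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the pipeline used in the study: images are nested lists of 8-bit intensities, boxes are normalized `(cls, xc, yc, w, h)` tuples, "exposure" is approximated by a linear intensity scaling, and random cropping is omitted. All function names are hypothetical.

```python
import random


def hflip_boxes(boxes):
    """Mirror normalized (cls, xc, yc, w, h) boxes left to right."""
    return [(c, 1.0 - xc, yc, w, h) for c, xc, yc, w, h in boxes]


def vflip_boxes(boxes):
    """Mirror normalized (cls, xc, yc, w, h) boxes top to bottom."""
    return [(c, xc, 1.0 - yc, w, h) for c, xc, yc, w, h in boxes]


def adjust_exposure(image, factor):
    """Scale intensities by (1 + factor) and clip to [0, 255].

    A linear scaling is only an assumed stand-in for the exposure change.
    """
    return [[min(255, max(0, round(p * (1.0 + factor)))) for p in row]
            for row in image]


def augment(image, boxes, rng=random):
    """Apply each flip with probability 0.5, then a +/-20% exposure change."""
    if rng.random() < 0.5:  # horizontal flip
        image = [row[::-1] for row in image]
        boxes = hflip_boxes(boxes)
    if rng.random() < 0.5:  # vertical flip
        image = image[::-1]
        boxes = vflip_boxes(boxes)
    image = adjust_exposure(image, rng.uniform(-0.20, 0.20))
    return image, boxes


# Tiny 2 x 3 "image" with one annotated cell.
img, lbl = augment([[10, 20, 30], [40, 50, 60]], [(0, 0.2, 0.3, 0.1, 0.1)])
```

Flipping the image without flipping the box centers would leave the labels pointing at the wrong cells, which is why both are updated together.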

As is shown in Figure 6, within the BCCD dataset used for training, there are a total of 739 platelet instances, 8814 red blood cell instances, and 789 white blood cell instances (Figure 6a). Furthermore, normalized statistics were computed for the positions of the detection bounding boxes, scaling their coordinates to a range between 0 and 1. Analysis reveals that the dimensions (length and width) of the detection boxes are predominantly concentrated around the 0.2 mark. In this context, the x- and y-axes in Figure 6c represent the relative positions of the detection boxes within the blood cell dataset, with values being unitless, while the width and height axes in Figure 6d illustrate the relative sizes of the bounding boxes. These normalized statistics provide a more consistent and scalable representation of object locations and dimensions, improving model performance.
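The normalization of box coordinates to the range 0 to 1 can be sketched as follows, assuming pixel-space corner coordinates and the 640 × 480 image size reported for the dataset. The function name `to_normalized_xywh` and the example box are hypothetical.

```python
def to_normalized_xywh(x_min, y_min, x_max, y_max, img_w=640, img_h=480):
    """Convert a pixel-space corner box to normalized (xc, yc, w, h)."""
    xc = (x_min + x_max) / 2.0 / img_w  # box centre, as a fraction of image width
    yc = (y_min + y_max) / 2.0 / img_h  # box centre, as a fraction of image height
    w = (x_max - x_min) / img_w         # relative box width
    h = (y_max - y_min) / img_h         # relative box height
    return xc, yc, w, h


# A hypothetical 128 x 96 pixel box centred in a 640 x 480 image.
print(to_normalized_xywh(256, 192, 384, 288))
```

For this example box the relative width and height both come out at 0.2, the region where the text reports most box dimensions to be concentrated.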

Label correlograms are an effective tool for identifying spatial patterns or correlations within blood cell object annotations, particularly in datasets containing multiple classes and scales. This technique enables the detection of instances where certain classes exhibit significant co-occurrence within an image, or where specific classes are more frequently observed at particular scales. The following figure shows a label correlation plot for a blood cell dataset, visualized in the xywh space (Figure 7). This plot illustrates the relationships between the x, y coordinates, as well as the width and height variables of the detection bounding boxes for each label. Additionally, the size distribution of the blood cell dataset is relatively well-balanced, making it more conducive to training robust models. This balanced distribution of object sizes ensures that the model learns to detect cells of varying dimensions, enhancing its performance across different scenarios.

3.6. Statistics

In this study, detection performance was evaluated using several metrics, including precision, recall, F1 score (F1), mean average precision (mAP) [27,28], number of parameters (Params), computational complexity (FLOPs), and frames per second (FPS) [29]. These metrics and their corresponding formulas are detailed below:

(1): Precision

Precision measures the ratio of true positives (TP) among the instances predicted as positive by the model, i.e., the proportion of correctly identified blood cells out of all the detected instances. In a medical context, high precision is crucial to minimize the number of false positives (FP), ensuring that the model does not incorrectly identify healthy cells as abnormal, which could lead to unnecessary treatments or interventions. The formula for precision is [27]:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{3}$$

In Formula (3), TP represents the number of true positives and FP represents the number of false positives.

(2): Recall

Recall, on the other hand, measures the ability of the model to identify all relevant instances (i.e., all the positive cases), reflecting the proportion of true positives detected out of all actual positive samples. In the context of medical imaging, recall is particularly important as it ensures that the model does not miss any actual abnormalities, which is critical for early detection and prevention of diseases. However, an overly high recall may increase false positives, which is a trade-off that must be balanced. Its formula is:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$

In Formula (4), TP represents the number of true positives, and FN represents the number of false negatives.

(3): F1 score

The F1-score combines precision and recall into a single metric, providing a balance between the two. It is especially useful when there is an imbalance between the precision and recall, which is often the case in medical applications where both false positives and false negatives can have significant consequences. The F1-score highlights cases where the model is struggling to maintain both high precision and recall simultaneously. Its formula is:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5}$$
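Formulas (3) to (5) translate directly into code. The sketch below uses hypothetical counts (90 true positives, 10 false positives, 30 false negatives) purely for illustration; they are not results from this study, and returning 0.0 for an empty denominator is an assumed convention.

```python
def precision(tp, fp):
    """Formula (3): share of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp, fn):
    """Formula (4): share of actual positives that are detected."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def f1_score(p, r):
    """Formula (5): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# Hypothetical detection counts for one class.
p = precision(tp=90, fp=10)  # 0.9
r = recall(tp=90, fn=30)     # 0.75
print(p, r, round(f1_score(p, r), 4))
```

Because F1 is a harmonic mean, it stays close to the lower of the two values, so a detector cannot reach a high F1 by maximizing only precision or only recall.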

(4): mAP@50 and mAP@50-95

The Mean Average Precision (mAP) is commonly used to evaluate object detection models, including those in medical imaging tasks like blood cell detection. mAP measures the overall accuracy of the model by averaging the precision at different recall levels, providing insight into both the model’s ability to detect true positives and the trade-offs in its decision threshold.

Specifically, this study uses mAP at two different IoU (Intersection over Union) thresholds: mAP@50 and mAP@50-95, which provide different levels of sensitivity for detection. Higher mAP values indicate better model performance in detecting blood cells with high precision. The formula for mAP is:

$$\mathrm{mAP} = \frac{1}{n} \sum_{i=1}^{n} AP_i \tag{6}$$

In Formula (6), n represents the number of object categories, and $AP_i$ represents the Average Precision of the i-th category. mAP@50 is the mAP at an IoU threshold of 0.5, and mAP@50-95 is the mAP averaged over IoU thresholds from 0.5 to 0.95.
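The two ingredients of these metrics, the IoU between a predicted and a ground-truth box and the class-wise averaging of Formula (6), can be sketched as below. Boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples, the AP values are hypothetical, and the 0.05 threshold step for mAP@50-95 follows the common COCO-style convention rather than anything stated in this article.

```python
def iou(a, b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_average_precision(ap_per_class):
    """Formula (6): average the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# IoU thresholds 0.50, 0.55, ..., 0.95 (assumed COCO-style step of 0.05).
IOU_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))                 # 1 / 7
is_match_at_50 = overlap >= 0.5                           # False for this pair
example_map = mean_average_precision([0.9, 0.8, 0.7])     # hypothetical APs
```

A prediction counts as a true positive at a given threshold only if its IoU with a ground-truth box of the same class reaches that threshold, which is why mAP@50-95 is always the stricter of the two figures.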

4. Experiments and Results

4.1. Experimental Environment

The experiments were conducted using the PyTorch 2.0.0 training framework [30] on a Linux operating system with an NVIDIA GeForce RTX 3090 graphics card with 24 GB of memory [31]. This study uses the default YOLOv8 hyperparameters, which are well-tuned for a wide range of tasks.

The model was trained for 100 epochs using the Adam optimizer with an initial learning rate of 0.01 and a momentum of 0.937. The object confidence threshold is set to 0.001 for validation and testing, which is also the default adopted by YOLOv8, and to 0.25 for visualization.
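The settings above could be expressed roughly as follows, assuming the Ultralytics Python package is the YOLOv8 implementation in use (the article does not name it). The model file `yolov8n.yaml`, the dataset file argument, and the helper name `train_and_validate` are placeholders; the batch size of 16 is the value stated in Section 4.2.

```python
# Hyperparameters reported in the text, using Ultralytics argument names.
TRAIN_ARGS = {
    "epochs": 100,
    "batch": 16,          # stated in Section 4.2
    "optimizer": "Adam",
    "lr0": 0.01,          # initial learning rate
    "momentum": 0.937,
}
VAL_CONF = 0.001  # confidence threshold for validation and testing
VIS_CONF = 0.25   # confidence threshold for visualization


def train_and_validate(data_yaml, model_cfg="yolov8n.yaml"):
    """Train and validate with the settings above (placeholder file names)."""
    from ultralytics import YOLO  # imported lazily; requires the package

    model = YOLO(model_cfg)
    model.train(data=data_yaml, **TRAIN_ARGS)
    return model.val(data=data_yaml, conf=VAL_CONF)
```

The low validation threshold keeps nearly all candidate detections so that the full precision-recall curve can be traced for mAP, whereas the higher visualization threshold hides low-confidence boxes in the rendered images.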

4.2. Model Training

The training process of the improved model is shown in Figure 8, where the x-axis represents the number of epochs and the y-axis represents the corresponding metric value. The YOLO-BC model has 3,072,484 parameters, requires 8.3 GFLOPs, and runs at 22.4 frames/s.

For comparison, the YOLO-BC model uses the default YOLOv8 hyperparameters, with a batch size of 16 and 100 epochs. At the end of training, all performance indicators [32] of the model converge, with an mAP@50 of 0.927 and an mAP@50:95 of 0.653 on the training dataset. Considering the different blood cell types, the recall rates for WBC, RBC, and Platelets are 99.9%, 79.2%, and 87.5%, respectively, with mAP@50 values of 0.994, 0.834, and 0.873 for the three categories. This indicates that none of the categories exhibits significantly low accuracy.

The results reveal that when the IoU threshold is set to 0.5, the YOLO-BC model exhibits a modest improvement over YOLOv8 on the blood cell training dataset (see Figure 9, panels (b), (c), and (d) compared to (a)). However, as the IoU threshold increases, the model demonstrates a marked improvement in mean Average Precision (mAP), although the absolute value remains comparatively lower. Furthermore, both recall and precision metrics show varying levels of enhancement.

However, the true performance of the blood cell detection model must be evaluated on a test set to accurately assess the robustness of the YOLO-BC model. The experimental results from the training set serve as a reference for tracking improvements in the model.

4.3. Ablation Experiment

An ablation experiment on the test dataset was designed to validate the effectiveness of the proposed YOLO-BC model. The experimental results are shown in Table 1. The detection accuracy of YOLO-BC, as measured by mAP@50 and mAP@50:95, reached 0.901 and 0.62, respectively.

While the YOLOv8 model achieves 20.6 FPS, the integration of ODConv increases the FPS to 25.9, but the final YOLO-BC model reports a lower FPS of 22.4. This FPS trade-off can be attributed primarily to the addition of the EMSA. While ODConv contributes to an increase in FPS by optimizing convolution operations through dynamic weight adjustments, EMSA introduces additional computational complexity due to its multi-dimensional attention. This mechanism, while improving feature representation, requires more computations, particularly in terms of multi-scale feature interactions, which ultimately results in a slight reduction in FPS in the YOLO-BC model (22.4 FPS). However, this trade-off is justified by the significant performance gains in other key metrics, particularly in mAP, which is the most critical metric for blood cell detection. The improvements in accuracy and robustness achieved through EMSA, alongside the dynamic convolution capabilities of ODConv, result in a more accurate and efficient detection system despite the minor FPS reduction.

The mAP values achieved by YOLO-BC were the highest among the compared models, highlighting the advantages of the improved model (Table 1). In the Precision-Recall comparison (Figure 10), the YOLOv8 + ODConv model performs slightly better in Precision, while YOLOv8 + EMSA exhibits a marginally lower Recall. In terms of mean Average Precision (mAP), which serves as the primary evaluation metric for blood cell object detection, the YOLO-BC model surpassed all competing approaches, achieving the highest scores for both mAP@50 and mAP@50:95. The results of the ablation studies further validate the contribution of the two key enhancements, EMSA and ODConv. These experiments, as depicted in Figure 11, underscore the impact of these modifications on the model’s performance and demonstrate their effectiveness in improving detection accuracy.

When compared to the baseline YOLOv8 model, the integration of the EMSA and ODConv modules has resulted in a substantial enhancement in the core performance metric, mAP@50. Specifically, the improvements are evident across various blood cell types. For platelets, the mAP@50 has increased from 0.855 to 0.873, while the mAP for Red Blood Cells (RBCs) has risen from 0.782 to 0.834. Additionally, the mAP for White Blood Cells (WBCs) has seen a remarkable increase from 0.974 to 0.994 (as shown in Figure 12). Consequently, the final YOLO-BC model achieves a mAP@50 of 0.901, an increase from the original 0.87 achieved by YOLOv8, demonstrating a significant improvement in detection accuracy. These enhancements highlight the effectiveness of the EMSA and ODConv modules in refining the model’s ability to capture and classify different blood cell types with higher precision.

Subsequently, this study employs the RMSE metric to perform a statistical analysis of the counting errors for each blood cell type. YOLO-BC demonstrates a lower RMSE for RBC than the baseline, indicating improved performance in RBC count prediction. The RMSE values are 10.91 and 12.04 for the final and baseline models, respectively, a difference of approximately 1.13. This reduction suggests that the final model enhances RBC prediction accuracy, contributing to an improvement in overall performance, especially in tasks where RBC counting is crucial.

The RMSE for WBC shows no difference between the final and baseline models, both having a value of 0.17. Given the small RMSE, both models exhibit high accuracy in predicting WBC counts, and the prediction error has a negligible impact on overall model performance. Finally, the RMSE for platelet prediction is reduced from 0.83 in the baseline model to 0.73 in YOLO-BC, indicating a slight improvement in the final model’s platelet prediction accuracy. Though the difference is modest, it represents a step forward in prediction performance for this category.

The final model’s total RMSE is about 1.14 lower than that of the baseline, indicating a notable improvement in overall prediction accuracy. This improvement is primarily driven by the reduction in RBC RMSE, which has a significant contribution to the total error. The YOLO-BC’s optimization in RBC prediction notably enhances the overall performance.
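The counting-error analysis above can be sketched as follows, under the assumption that the RMSE is taken over per-image differences between predicted and ground-truth cell counts for a given class. The function name `count_rmse` and the sample counts are hypothetical and are not taken from the BCCD data.

```python
import math
from typing import Sequence


def count_rmse(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Root mean squared error between predicted and true per-image counts."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical RBC counts for four images (illustrative values only).
predicted_rbc = [18, 22, 15, 30]
actual_rbc = [20, 22, 19, 28]
print(round(count_rmse(predicted_rbc, actual_rbc), 3))  # 2.449
```

Because the errors are squared before averaging, images with many densely packed cells contribute disproportionately, which is consistent with RBC dominating the total RMSE.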

4.4. Comparison Experiment

To further verify the superiority of the YOLO-BC model, this study selected several mainstream object detection models, including SSD [33,34], Faster-RCNN [35,36], YOLOv5 [37,38,39], RT-DETR [40], and GroundingDINO [41], for comparative experiments. The results are shown in Table 2, where the dataset splits, input resolution, and training epochs are the same for all models.

As shown in Table 2, YOLO-BC achieves the highest F1 score (85%) and mAP@50 (90.1%), which indicates better overall performance in blood cell object detection.

Furthermore, it has the highest precision (80.9%) and recall (88.9%) among all the models, which demonstrates its ability to both detect and identify objects accurately. The GFLOPs of YOLO-BC (8.3) are nearly identical to those of YOLOv8 (8.1), yet it outperforms YOLOv8 with a 3.1% improvement in mAP@0.5. This shows that YOLO-BC enhances object detection accuracy by incorporating modules such as EMSA and ODConv, while the computational cost remains almost unchanged.

4.5. Visualization

After completing the ablation and comparison experiments, feature map visualization [42,43] is conducted on YOLO-BC detection for blood cells. Feature map visualization refers to the visual display of the intermediate-layer feature maps of the neural network [44] during inference, which helps to better understand the working principle of the model and the features it has learned.

As shown in Figure 13, YOLO-BC effectively extracts the pixel-level features of various types of blood cells and clearly delineates the contours of different blood cell categories. However, YOLO-BC still requires further processing of these features to accurately detect and classify the blood cells. By leveraging a multi-layer neural network architecture, YOLO-BC is able to thoroughly learn and capture the intricate pixel characteristics of blood cells, thereby improving detection and classification performance.

To provide a more intuitive evaluation of the proposed algorithm’s performance, we present the detection results of both YOLOv8 and YOLO-BC on the test dataset (Figure 14).

It is evident that YOLOv8 demonstrates lower confidence when detecting RBCs in the same region, leading to a significant degradation in RBC counting accuracy, particularly at the positions marked by the dotted lines at both ends (Figure 14). In contrast, the improved YOLO-BC model achieves precise detection and accurate counting of the same blood cell targets. This clearly demonstrates that YOLO-BC is more effective at localizing and recognizing blood cells with higher precision.

Taking into account the practical requirements of medical professionals for blood cell diagnostics, the YOLO-BC algorithm provides the flexibility to tailor its functionality to specific needs. For instance, in cases where only the detection and quantification of red blood cells (RBCs) is necessary for diagnosing conditions related to RBC abnormalities [45,46,47], the algorithm allows users to adjust the category label to exclusively detect and count RBCs, as shown in Figure 15. This customization feature enables medical personnel to concentrate solely on the blood cell type of interest, minimizing visual distractions that may arise from the simultaneous detection of other cell types. Consequently, this targeted detection capability enhances the efficiency of the diagnostic process by streamlining the focus on relevant cellular features.
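The class-specific counting described above amounts to filtering the detections by category label before counting. A minimal Python sketch is given below; the detection record format and the helper `count_cells` are assumptions for illustration and do not reproduce the original implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def count_cells(detections: List[Dict], keep: Optional[Iterable[str]] = None) -> Counter:
    """Count detections per class, optionally restricted to selected labels."""
    counts = Counter(d["label"] for d in detections)
    if keep is None:
        return counts
    wanted = set(keep)
    return Counter({label: n for label, n in counts.items() if label in wanted})


# Hypothetical detections for one image (illustrative values only).
detections = [
    {"label": "RBC", "conf": 0.93},
    {"label": "RBC", "conf": 0.88},
    {"label": "WBC", "conf": 0.97},
    {"label": "Platelets", "conf": 0.61},
    {"label": "RBC", "conf": 0.54},
]

print(count_cells(detections))                # counts for all three classes
print(count_cells(detections, keep=["RBC"]))  # RBC only
```

Restricting the output in this way leaves the detector itself unchanged; only the reported categories differ.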

It is also noteworthy that YOLO-BC detects small targets such as platelets more accurately, indicating that the model has a degree of scalability in general cell diagnosis and the potential to detect small objects in microfluidic images.

4.6. The 5-Fold Cross Validation

To reduce the impact of data imbalance and of the bias from small dataset partitioning on result evaluation, we employ 5-fold cross-validation during the training process, as shown in Table 3. It can be observed that the averaged performance obtained through 5-fold cross-validation is no weaker than that of the YOLO-BC model built in the previous sections, especially in the core indicator mAP.

This also mitigates the risk of model overfitting when YOLO-BC is applied to more blood cell detection scenarios.
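For reference, the partitioning step of a 5-fold cross-validation can be sketched in plain Python as follows. The helper `k_fold_splits`, the shuffling seed, and the sample count are assumptions made for this illustration; the exact splitting procedure used in the experiments is not specified here.

```python
import random
from typing import List, Tuple


def k_fold_splits(n_samples: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, val_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits


# Hypothetical dataset of 100 images (illustrative size only).
for fold_id, (train_idx, val_idx) in enumerate(k_fold_splits(100, k=5)):
    print(fold_id, len(train_idx), len(val_idx))
```

Each image is used for validation exactly once and for training in the remaining four folds, so the averaged metrics depend less on any single partition.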

4.7. Generalization Validation

To validate the generalization and robustness of the proposed YOLO-BC model, this study selects the BCDv4 (Blood Cell Detection, version 4) dataset from the Roboflow AI platform for further experimentation.

In the ablation experiment conducted on the BCDv4 testing dataset, the improved YOLO-BC model shows notable advantages, as shown in Table 4. YOLO-BC achieves a Precision of 92.7%, a Recall of 90.6%, and an mAP@50 of 0.939, the highest Recall and mAP@50 among the ablation variants. It also strikes a favorable balance between computational efficiency and performance, with 3,062,116 parameters, 8.0 GFLOPs, and an FPS of 6.48.

In the comparison experiments on the BCDv4 dataset, the YOLO-BC model shows clear advantages in detection accuracy and computational cost. As shown in Table 5, YOLO-BC achieves a Precision of 92.7%, a Recall of 90.6%, and an mAP@0.5 of 0.939, outperforming the other models in mAP@0.5 while requiring only 8.0 GFLOPs. YOLO-BC reaches 6.48 FPS at a far lower computational cost than heavier models such as Faster-RCNN (134.0 GFLOPs) and RT-DETR (103.4 GFLOPs).

While models such as YOLOv5 and YOLOv8 provide strong performance with high precision, YOLO-BC stands out with a balanced trade-off between accuracy and efficiency, surpassing the YOLOv8 baseline in both mAP@0.5 and FPS. The model’s efficiency, with fewer GFLOPs and higher FPS than the baseline, positions YOLO-BC as a good choice for real-time, high-performance applications in medical imaging.

As shown in Figure 16, YOLO-BC can still stably detect blood cells and distinguish their types under different microscope magnification settings.

4.8. Deployment on Edge Computing Device

Considering practical clinical applications, this study selects the RK3576 edge chip produced by Rockchip for deploying the YOLO-BC model. The RK3576 features an octa-core CPU (4× Cortex-A72 + 4× Cortex-A53), a 6 TOPS NPU (Neural Processing Unit) for AI computations, and a Mali-G52 MC3 GPU, making it capable of handling AI-intensive and graphics-intensive applications. The whole deployment uses the NanoPi M5 edge computing device, which is built on the RK3576 chip, as shown in Figure 17.

The experimental results show an average inference time of 35.658 ms, with a total memory usage of only 12.28 MB. These findings indicate that YOLO-BC is sufficiently lightweight for deployment on various small-scale medical devices for blood cell detection. The low price of the RK3576 hardware further facilitates the broader adoption and practical application of the YOLO-BC model in clinical diagnostics.
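To relate the measured latency to the FPS figures used elsewhere in this paper, the average inference time can be converted to an approximate throughput. The sketch below assumes one image per inference and ignores pre- and post-processing time; the helper name `latency_ms_to_fps` is hypothetical.

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Convert an average per-image latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Average inference time measured on the RK3576 device.
print(round(latency_ms_to_fps(35.658), 1))  # 28.0
```

Under these assumptions, 35.658 ms per image corresponds to roughly 28 frames per second of pure inference on the edge device.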

5. Conclusions

In this work, we present an enhanced method for blood cell image detection and counting, referred to as YOLO-BC, which is built upon the YOLOv8 architecture. To improve feature representation, we use the EMSA module prior to the bottleneck layer of the backbone, enabling the model to better capture spatial relationships between features across different regions of the image. Furthermore, we replace the P3/8 pixel convolutional module with the ODConv module, strategically enhancing the model’s ability to leverage multi-dimensional contextual information for more robust feature extraction. Experimental results indicate that the revised architecture leads to a substantial enhancement in detection performance, achieving a mean Average Precision of 0.901 at an Intersection over Union (IoU) threshold of 50% (mAP@50), and a mAP of 0.62 across a range of IoU thresholds (mAP@50:95). These outcomes confirm the efficacy of the proposed approach in improving the precision of blood cell detection.

YOLO-BC effectively resolves pixel-level discrepancies between the three blood cell categories, surpassing the constraints of conventional counting techniques. Through the integration of the EMSA and ODConv submodules, the model addresses prevalent issues such as missed detections, false positives, and redundant counting, especially in intricate scenarios. These advancements considerably improve the model’s accuracy and reliability in blood cell detection and quantification tasks, enhancing its overall robustness.

Compared with SSD, Faster R-CNN, YOLOv5, YOLOv8, RT-DETR, and GroundingDINO, YOLO-BC achieves the highest F1 score (85%) and mAP@50 (90.1%), demonstrating superior overall performance in object detection. Furthermore, it exhibits the highest precision (80.9%) and recall (88.9%) among all the models, highlighting its capability in both detecting and accurately identifying objects. A 5-fold cross-validation was then conducted to reduce the risk of overfitting and the bias from small dataset partitioning. The validation on the BCDv4 dataset also demonstrates the generalization ability of YOLO-BC. Therefore, YOLO-BC offers significant advantages in accuracy over other widely used models, positioning it as a promising choice for object detection applications.

Finally, deployment experiments on the RK3576 edge computing device show that YOLO-BC achieves millisecond-scale inference for a single blood cell image, while using only approximately 12 MB of memory. Its low computational cost makes it well suited to real-time hospital workflow integration. In future work, data augmentation methods [48,49,50], lightweight model implementations [51,52], and fine-tuning of the model will be pursued. Data augmentation can compensate for the incomplete scene coverage that results from a small number of samples, while a lighter YOLO-BC model will offer faster inference, speeding up blood cell diagnosis for medical workers and making deployment on portable medical devices more convenient.

Author Contributions

Methodology, Z.L.; Software, Z.L.; Validation, G.Z.; Investigation, G.Z.; Writing—original draft, Z.L.; Writing—review & editing, D.Y. and G.Z.; Supervision, D.Y. and G.Z.; Project administration, D.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within the article. The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Sackmann, E.K.; Fulton, A.L.; Beebe, D.J. The present and future role of microfluidics in biomedical research. Nature 2014, 507, 181–189.
2. Yousif, T.Y.E. Impact of abnormal leukocyte count in the pathophysiology of sickle cell anemia. J. Blood Med. 2022, 13, 673–679.
3. George, T.I.; Bajel, A. Diagnosis of rare subtypes of acute myeloid leukaemia and related neoplasms. Pathology 2021, 53, 312–327.
4. İnce, F.D.; Ellidağ, H.Y.; Koseoğlu, M.; Şimşek, N.; Yalçın, H.; Zengin, M.O. The comparison of automated urine analyzers with manual microscopic examination for urinalysis automated urine analyzers and manual urinalysis. Pract. Lab. Med. 2016, 5, 14–20.
5. Wang, X.; Liu, J. Tomato anomalies detection in greenhouse scenarios based on YOLO-Dense. Front. Plant Sci. 2021, 12, 634103.
6. Chlap, P.; Min, H.; Vandenberg, N.; Dowling, J.; Holloway, L.; Haworth, A. A review of medical image data augmentation techniques for deep learning applications. J. Med. Imaging Radiat. Oncol. 2021, 65, 545–563.
7. Dikici, E.; Bigelow, M.; White, R.D.; Erdal, B.S.; Prevedello, L.M. Constrained generative adversarial network ensembles for sharable synthetic medical images. J. Med. Imaging 2021, 8, 024004.
8. Akhil, M.S.; Sharma, B.S.; Kodipalli, A.; Rao, T. Medical image synthesis using DCGAN for chest X-ray images. In Proceedings of the 2024 International Conference on Knowledge Engineering and Communication Systems (ICKECS), Coimbatore, India, 14–15 March 2024; IEEE: New York, NY, USA, 2024; Volume 1, pp. 1–8.
9. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324.
10. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 25.
11. Shafiq, M.; Gu, Z. Deep residual learning for image recognition: A survey. Appl. Sci. 2022, 12, 8972.
12. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
13. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 10012–10022.
14. Kumar, A.; Mehta, R.; Reddy, B.R.; Singh, K.K. Vision transformer based effective model for early detection and classification of lung cancer. SN Comput. Sci. 2024, 5, 839.
15. Singh, S. Computer-aided diagnosis of thoracic diseases in chest X-rays using hybrid cnn-transformer architecture. arXiv 2024, arXiv:2404.11843.
16. Cong, X.; Li, S.; Chen, F.; Liu, C.; Meng, Y. A Review of YOLO Object Detection Algorithms based on Deep Learning. Front. Comput. Intell. Syst. 2023, 4, 17–20.
17. Shakarami, A.; Menhaj, M.B.; Mahdavi-Hormat, A.; Tarrah, H. A fast and yet efficient YOLOv3 for blood cell detection. Biomed. Signal Process. Control 2021, 66, 102495.
18. Nair, L.S.; Prabhu, R.; Sugathan, G.; Gireesh, K.V.; Nair, A.S. Mitotic nuclei detection in breast histopathology images using YOLOv4. In Proceedings of the 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India, 6–8 July 2021; IEEE: New York, NY, USA, 2021.
19. Yang, X.; Song, Z.; King, I.; Xu, Z. A survey on deep semi-supervised learning. IEEE Trans. Knowl. Data Eng. 2022, 35, 8934–8954.
20. Shao, Y.; Xu, Z.; Zhu, Q. SH-YOLO: Enhanced Real-Time Detection of Laparoscopic Surgical Instruments in Computer-aided Surgery based on Star Operation and Hybrid Attention Mechanisms. IEEE Access 2025, 13, 135179–135195.
21. Zhong, T. Research on Blood Cell Recognition and Counting Based on Improved YOLO v7. Adv. Appl. Math. 2023, 12, 1083–1089.
22. Terven, J.; Cordova-Esparza, D. A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond. arXiv 2023, arXiv:2304.00501.
23. Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the ICASSP 2023, Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
24. Li, C.; Zhou, A.; Yao, A. Omni-dimensional dynamic convolution. arXiv 2022, arXiv:2209.07947.
25. Hsu, D.; Kakade, S.M.; Zhang, T. Robust matrix decomposition with sparse corruptions. IEEE Trans. Inf. Theory 2011, 57, 7221–7234.
26. Kutlu, H.; Avci, E.; Özyurt, F. White blood cells detection and classification based on regional convolutional neural networks. Med. Hypotheses 2020, 135, 109472.
27. Flach, P.; Kull, M. Precision-recall-gain curves: PR analysis done right. In Proceedings of the 29th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015.
28. Roy, A.M.; Bhaduri, J. A deep learning enabled multi-class plant disease detection model based on computer vision. AI 2021, 2, 413–428.
29. Lee, Y.; Hwang, J.W.; Lee, S.; Bae, Y.; Park, J. An energy and GPU-computation efficient backbone network for real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019.
30. Wu, P. PyTorch 2.0: The Journey to Bringing Compiler Technologies to the Core of PyTorch (Keynote). In Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, Montréal, QC, Canada, 25 February–1 March 2023; p. 1.
31. Oakden, T.; Kavakli, M. Performance Analysis of RTX Architecture in Virtual Production and Graphics Processing. In Proceedings of the 2022 IEEE 42nd International Conference on Distributed Computing Systems Workshops (ICDCSW), Bologna, Italy, 10 July 2022; IEEE: New York, NY, USA, 2022; pp. 215–220.
32. Bolton, T.; Bass, J.; Gaber, T.; Mansouri, T. Comparing Object Recognition Models and Studying Hyperparameter Selection for the Detection of Bolts. In Natural Language Processing and Information Systems, Proceedings of the International Conference on Applications of Natural Language to Information Systems, Derby, UK, 21–23 June 2023; Springer Nature: Cham, Switzerland, 2023; pp. 186–200.
33. Wang, Q.; Bi, S.; Sun, M.; Wang, Y.; Wang, D.; Yang, S. Deep learning approach to peripheral leukocyte recognition. PLoS ONE 2019, 14, e0218808.
34. Shah, R.; Shastri, J.; Bohara, M.H.; Panchal, B.Y.; Goel, P. Detection of different types of blood cells: A comparative analysis. In Proceedings of the 2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), Ballari, India, 23–24 April 2022; IEEE: New York, NY, USA, 2022; pp. 1–5.
35. Yang, S.; Fang, B.; Tang, W.; Wu, X.; Qian, J.; Yang, W. Faster R-CNN based microscopic cell detection. In Proceedings of the 2017 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), Shenzhen, China, 15–17 December 2017; IEEE: New York, NY, USA, 2017; pp. 345–350.
36. Wen, T.; Wu, H.; Du, Y.; Huang, C. Faster R-CNN with improved anchor box for cell recognition. Math. Biosci. Eng. 2020, 17, 7772–7786.
37. Zhang, Z.; Deng, Z.; Wu, Z.; Lai, G. An Improved EIoU-Yolov5 Algorithm for Blood Cell Detection and Counting. In Proceedings of the 2022 5th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), Chengdu, China, 19–21 August 2022; IEEE: New York, NY, USA, 2022; pp. 989–993.
38. Nayer, M.M.A.; Rahaman, K.M.A.; Hossen, M.N. DCBC DeepL: Detection and Counting of Blood Cells Employing Deep Learning and YOLOv5 Model. In Proceedings of the Artificial Intelligence and Data Science: First International Conference, ICAIDS 2021, Hyderabad, India, 17–18 December 2021; Springer Nature: Cham, Switzerland, 2022; pp. 203–214.
39. Luong, D.T.; Anh, D.D.; Thang, T.X.; Huong, H.T.L.; Hanh, T.T.; Khanh, D.M. Distinguish normal white blood cells from leukemia cells by detection, classification, and counting blood cells using YOLOv5. In Proceedings of the 2022 7th National Scientific Conference on Applying New Technology in Green Buildings (ATiGB), Da Nang, Vietnam, 11–12 November 2022; IEEE: New York, NY, USA, 2022; pp. 156–160.
40. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 16965–16974.
41. Liu, S.; Zeng, Z.; Ren, T.; Li, F.; Zhang, H.; Yang, J.; Jiang, Q.; Li, C.; Yang, J.; Su, H.; et al. Grounding dino: Marrying dino with grounded pre-training for open-set object detection. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 20–21 August 2025; Springer: Cham, Switzerland, 2025; pp. 38–55.
42. Chen, Y.; Zhao, Z.; Yu, Y.; Wang, W.; Tang, C. Understanding IFRA for Detecting Synchronous Machine Winding Short Circuit Faults Based on Image Classification and Smooth Grad-CAM++. IEEE Sens. J. 2022, 23, 2422–2432.
43. Menikdiwela, M.; Nguyen, C.; Li, H.; Shaw, M. CNN-based small object detection and visualization with feature activation mapping. In Proceedings of the 2017 International Conference on Image and Vision Computing New Zealand (IVCNZ), Christchurch, New Zealand, 4–6 December 2017; IEEE: New York, NY, USA, 2017; pp. 1–5.
44. Ma, W.; Wu, Y.; Cen, F.; Wang, G. Mdfn: Multi-scale deep feature learning network for object detection. Pattern Recognit. 2020, 100, 107149.
45. Georgatzakou, H.T.; Antonelou, M.H.; Papassideri, I.S.; Kriebardis, A.G. Red blood cell abnormalities and the pathogenesis of anemia in end-stage renal disease. Proteom.–Clin. Appl. 2016, 10, 778–790.
46. Krishnevskaya, E.; Molero, M.; Ancochea, Á.; Hernández, I.; Vives-Corrons, J.L. New-Generation Ektacytometry Study of Red Blood Cells in Different Hemoglobinopathies and Thalassemia. Thalass. Rep. 2023, 13, 70–76.
47. Peng, S.; Li, W.; Ke, W. Association between red blood cell distribution width and all-cause mortality in unselected critically ill patients: Analysis of the mimic-iii database. Front. Med. 2023, 10, 1152058.
48. Bhuiyan, M.; Islam, M.S. A new ensemble learning approach to detect malaria from microscopic red blood cell images. Sens. Int. 2023, 4, 100209.
49. Alomar, K.; Aysel, H.I.; Cai, X. Data augmentation in classification and segmentation: A survey and new strategies. J. Imaging 2023, 9, 46.
50. Hao, X.; Zhu, Y.; Appalaraju, S.; Zhang, A.; Zhang, W.; Li, B.; Li, M. Mixgen: A new multi-modal data augmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–7 January 2023; pp. 379–389.
51. Wu, Y.; Han, Q.; Jin, Q.; Li, J.; Zhang, Y. LCA-YOLOv8-Seg: An Improved Lightweight YOLOv8-Seg for Real-Time Pixel-Level Crack Detection of Dams and Bridges. Appl. Sci. 2023, 13, 10583.
52. Du, Y.; Liu, X.; Yi, Y.; Wei, K. Optimizing Road Safety: Advancements in Lightweight YOLOv8 Models and GhostC2f Design for Real-Time Distracted Driving Detection. Sensors 2023, 23, 8844.

Figure 1. (a) Feature extraction of blood cell (b) YOLO-BC’s bounding box and class probability for detected blood cell.

Figure 2. YOLO-BC training mechanism and model’s architecture.

Figure 3. The schematic diagram of the proposed YOLO-BC.

Figure 4. Efficient Multi-Scale Attention Module.

Figure 5. The EMSA module before bottleneck in YOLO-BC.

Figure 6. Distribution of blood cell labels in training set ((a) Blood cell instances counting; (b) Length and width of the detection boxes; (c) The relative positions of the detection boxes; (d) The relative sizes of the bounding boxes).

Figure 7. Labels correlogram of blood cell dataset.

Figure 8. YOLO-BC Model Training (The loss function is formulated to measure the discrepancy between the model’s predictions and the ground truth, incorporating key components such as box loss, classification loss, and distribution focal loss. A reduced loss value signifies enhanced model robustness and greater flexibility in adapting to data variations. The reported results include the raw values of various training metrics, while the “smoothed” values reflect the metrics after applying a smoothing trick, offering a more refined and consistent representation of the overall training trajectory. This smoothing process helps to mitigate fluctuations and provides a clearer view of the model’s performance evolution throughout the training process).

Figure 9. The training results of models based on YOLOv8 ((a) precision curve of models from YOLOv8 to YOLO-BC; (b) recall curve of models; (c) mAP@50 curve of models, where the IoU threshold is 0.5; (d) mAP@50:95 curve of models, where the IoU threshold is 0.5–0.95).

Figure 10. Comparison of precision and recall rate of different models.

Figure 11. Comparison of mAP@50 and mAP@50:95 for different models.

Figure 12. Comparison results between the optimized YOLOv8 model and the original model. (a) Precision-Recall curve of YOLOv8 on three types of blood cells; (b) Precision-Recall curve of YOLOv8 with EMSA improvement; (c) Precision-Recall curve of YOLOv8 with ODConv improvement; (d) Precision-Recall curve of YOLO-BC, including both EMSA and ODConv. For YOLO-BC, the mAP@0.5 values for Platelets, RBC, and WBC are 87.3%, 83.4%, and 99.4%, respectively.

Figure 13. YOLO-BC’s feature maps for blood cell detection (By designating a specific feature layer as the output target, the backpropagation algorithm is utilized to calculate the gradient of the input image relative to this feature map. These computed gradients are subsequently employed to construct a heatmap, which visually represents the regions of the input that most strongly influence the activations within the selected feature layer).

Figure 14. Comparison of predicted results. The improved YOLO-BC identifies some red blood cells that YOLOv8 misses.

Figure 15. Separate diagnosis of corresponding types of blood cells.

Figure 16. Blood cell detection for images captured with varying microscope settings.

Figure 17. The NanoPi M5 edge computing device with RK3576 chip.

Table 1. Ablation experiment on the test set of BCCD.

Model	P/%	R/%	mAP@50/%	mAP@50:95/%	GFLOPs	FPS
YOLOv8	79.9	87.5	87	58.3	8.1	20.6
YOLOv8 + EMSA	78.9	87.2	88	59.8	8.2	22.6
YOLOv8 + ODConv	81.4	86.5	88.2	61.1	7.9	25.9
YOLO-BC	80.9	88.9	90.1	62	8.3	22.4
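The gains of YOLO-BC over the YOLOv8 baseline can be read off Table 1 as absolute differences in percentage points. The short sketch below copies the two relevant rows from the table; the helper name `point_gain` is hypothetical.

```python
# Rows copied from Table 1 (all values in %).
ablation = {
    "YOLOv8": {"P": 79.9, "R": 87.5, "mAP@50": 87.0, "mAP@50:95": 58.3},
    "YOLO-BC": {"P": 80.9, "R": 88.9, "mAP@50": 90.1, "mAP@50:95": 62.0},
}


def point_gain(table, baseline, improved):
    """Absolute difference (percentage points) per metric, rounded to 0.1."""
    return {metric: round(table[improved][metric] - table[baseline][metric], 1)
            for metric in table[baseline]}


gains = point_gain(ablation, "YOLOv8", "YOLO-BC")
# gains["mAP@50"] is 3.1 and gains["mAP@50:95"] is 3.7
```

These differences match the 3.1 and 3.7 improvements quoted in the abstract, which are therefore absolute percentage points rather than relative changes.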

Table 2. Comparison experiment of different models.

Model	F1	P/%	R/%	mAP@0.5/%	GFLOPs
SSD	66.3	80.4	64.9	75.7	62.4
Faster-RCNN	56.3	53.9	72.8	67.2	205.1
YOLOv5	81.9	78.8	85.8	87	24.1
YOLOv8	83	79.9	87.5	87	8.1
RT-DETR	82.5	78.8	87.9	86.7	100.9
GroundingDINO	84.6	80.7	88.3	88.5	64.8
YOLO-BC	85	80.9	88.9	90.1	8.3
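For reference, the F1-score is conventionally the harmonic mean of precision and recall. The sketch below applies that standard formula to the YOLO-BC row; it is a quick check under the assumption that F1 is computed from the P and R columns as shown. The table's F1 column may be derived differently (for example per class or at another confidence threshold), so the two need not agree exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# YOLO-BC row of Table 2: P = 80.9 %, R = 88.9 %.
f1_yolo_bc = f1_score(80.9, 88.9)  # about 84.7, close to the reported 85
```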

Table 3. Performance of 5-fold cross-validation on BCCD dataset.

Model	P/%	R/%	mAP@0.5/%	GFLOPs	FPS
baseline	79.9	87.5	87	8.1	20.6
YOLO-BC	80.9	88.9	90.1	8.3	22.4
5-Fold CV	87.2	88.7	91.6	8.11	24.9
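As a minimal sketch of how a 5-fold split can be organised, the function below partitions sample indices into five contiguous validation folds, each paired with the remaining indices for training. The function name `k_fold_indices` and the sample count are hypothetical, and no shuffling is applied; the source does not describe how its folds were drawn.

```python
def k_fold_indices(n_samples, k=5):
    """Return k (train_indices, val_indices) pairs covering range(n_samples)."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, val))
        start = stop
    return folds


# Hypothetical dataset of 12 images, for illustration only.
folds = k_fold_indices(12, k=5)
val_sizes = [len(val) for _, val in folds]  # [3, 3, 2, 2, 2]
```

Each image appears in exactly one validation fold, and the per-fold metrics are typically averaged to give a single cross-validated figure.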

Table 4. Ablation experiment on the test set of BCDv4.

Model	P/%	R/%	mAP@50	mAP@50:95	Params	GFLOPs	FPS
YOLOv8	93.3	86.7	0.915	0.665	3,006,233	8.1	5.44
YOLOv8 + EMSA	94.1	88.9	0.928	0.620	3,040,601	8.2	5.79
YOLOv8 + ODConv	91.2	89.9	0.927	0.608	3,027,748	7.9	5.67
YOLO-BC	92.7	90.6	0.939	0.653	3,062,116	8.0	6.48

Table 5. Comparison experiment of different models on BCDv4.

Model	P	R	mAP@0.5	GFLOPs	FPS
SSD	0.663	0.617	0.899	30.5	14.3
Faster-RCNN	0.650	0.634	0.904	134.0	6.2
YOLOv5	0.964	0.939	0.906	24.2	8.77
YOLOv8	0.933	0.867	0.915	8.1	5.44
RT-DETR	0.935	0.833	0.899	103.4	13.88
GroundingDINO	0.653	0.705	0.896	52.4	11.1
YOLO-BC	0.927	0.906	0.939	8.0	6.48

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liu, Z.; Yuan, D.; Zhu, G. Automated Blood Cell Detection and Counting Based on Improved Object Detection Algorithm. Mathematics 2025, 13, 3023. https://doi.org/10.3390/math13183023

