Article

Improved YOLOv7 Algorithm for Detecting Bone Marrow Cells

School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan 430205, China
* Author to whom correspondence should be addressed.
Sensors 2023, 23(17), 7640; https://doi.org/10.3390/s23177640
Submission received: 31 July 2023 / Revised: 29 August 2023 / Accepted: 31 August 2023 / Published: 3 September 2023
(This article belongs to the Section Biomedical Sensors)

Abstract

The detection and classification of bone marrow (BM) cells is a critical cornerstone of hematology diagnosis. However, because of the low accuracy caused by scarce BM-cell data samples, subtle differences between classes, and small target sizes, pathologists still need to perform thousands of manual identifications daily. To address these issues, we propose an improved BM-cell-detection algorithm in this paper, called YOLOv7-CTA. Firstly, to enhance the model’s sensitivity to fine-grained features, we design a new module called CoTLAN in the backbone network to enable the model to perform long-term modeling of target feature information. Then, in order to cooperate with the CoTLAN module and focus more attention on the features in the area to be detected, we integrate the coordinate attention (CoordAtt) module between the CoTLAN modules to improve the model’s attention to small target features. Finally, we cluster the target boxes of the BM-cell dataset based on K-means++ to generate more suitable anchor boxes, which accelerates the convergence of the improved model. In addition, in order to solve the imbalance between positive and negative samples in BM-cell images, we use the Focal loss function to replace the multi-class cross-entropy loss. Experimental results demonstrate that the best mean average precision (mAP) of the proposed model reaches 88.6%, which is an improvement of 12.9%, 8.3%, and 6.7% over the Faster R-CNN, YOLOv5l, and YOLOv7 models, respectively. This verifies the effectiveness and superiority of the YOLOv7-CTA model in BM-cell-detection tasks.

1. Introduction

The examination and differentiation of BM-cell morphology is an important means of diagnosing many blood and BM diseases, and it is also a common laboratory procedure in clinical workflows [1,2]. Although a host of other methods are now available, such as cytogenetics [3], immunophenotyping [4], and, increasingly, molecular genetics [5], these methods are very sophisticated and highly dependent on experimental manipulation [6]. Furthermore, not all institutions have the corresponding facilities and capabilities. Therefore, cytological examination remains an important first step in the diagnostic workup of many intramedullary and extramedullary pathologies [7]. By observing and analyzing the morphological characteristics of BM cells, we can quantify cells of different lineages and determine the proportion of each cell type, which helps in the classification of many benign and malignant hematological disorders [8]. However, BM aspirate differential cell counts (DCCs) have been difficult to automate, which is why microscopic examination and classification of single-cell morphology is still largely performed by human experts in routine clinical workflows.
The manual analysis of DCCs currently performed in clinical laboratories is suboptimal for several reasons [9]. Firstly, manual assessment of BM smears requires labor-intensive visual inspection of the smears under a microscope, which can strain the examiner’s eyesight, shoulders, and neck. Secondly, different examiners select different areas of the smear to observe and interpret the test criteria somewhat differently, so the results are subject to subjectivity [10]. Thirdly, the relatively small number of cells typically counted inherently results in statistical imprecision. Conventional automated hematology analyzers that do not utilize digital images have certainly been explored for performing DCCs. However, this approach has some limitations, including the failure to count nucleated red blood cells and to differentiate stages of cell development, as well as interference by BM lipids [11,12]. If digital images of BM smears are used to develop an automatic DCC system, these problems can be addressed [13].
Over the past few years, cell detection and classification has become the most widely studied problem in computational pathology. Commercial hematology analyzers have begun using automated image analysis of Wright-stained smears [14]. However, achieving accurate results heavily depends on the precise control of variables, such as reducing staining variations and cell crowding, while simultaneously maximizing the retention of cytological detail. Cell- and nucleus-detection algorithms often rely on regular shapes and may fail to correctly detect cells with irregular or multilobed nuclei. Accurate cell detection and localization of cell boundaries are crucial for correct classification, especially in the BM, where subtle differences in cytologic features can complicate the process.
With the rapid development of deep learning, object-detection algorithms have gradually been applied to cell detection. However, most previous studies on automated cytomorphological detection and classification have focused on leukocyte types in peripheral blood smears or on physiological cell types [15,16,17,18,19,20], limiting their usefulness for the diagnostic detection of leukocytes in hematological malignancies of the BM. Blood cells in peripheral blood smears are easier to recognize, as they contain only five types of WBCs: segmented neutrophils, eosinophils, basophils, lymphocytes, and monocytes. Compared with peripheral blood smears, BM smears have a higher density of contacting and overlapping cells, more cell types, and more complex cell morphologies, so the object-detection task is more challenging. The current literature on BM aspirate smear image analysis lacks high-speed and efficient cell-detection methods and has succeeded in only a few cytological categories, which limits its potential clinical application and remains a particularly challenging problem [21,22].
Object-detection models can be broadly divided into two-stage models and one-stage models. At the same model size, two-stage models usually exhibit higher classification accuracy, but their time cost is much higher than that of one-stage models. Among the two-stage models, a classic example is Faster R-CNN, proposed by S. Ren et al. [23]. Among the one-stage object-detection models, You Only Look Once (YOLO) is representative, including YOLOv3, proposed by J. Redmon et al. [24], and YOLOv5, which is very popular in industry. Compared with the above algorithms, the YOLOv7 model published by Wang et al. [25] at CVPR in 2023 achieves faster speed and higher accuracy on the COCO dataset. YOLOv7 is a more advanced, more accurate, and faster object-detection algorithm that can meet the needs of different application scenarios, supporting both mobile GPUs and GPU devices from edge devices to the cloud. Therefore, it can be combined with an electron microscope scanner and applied to the Wise Information Technology of med (WITMED) setting to quickly and effectively automate BM aspirate DCCs. However, there are very few studies on the application of the YOLOv7 model to BM cell detection. At the same time, the accuracy of the model in detecting BM cells also has room for improvement, because this detection task is complicated by the subtle differences between myeloid cell types, the small cell size, and the dense cell distribution.
In response to the above problems, this paper proposes an improved BM-cell-detection algorithm, YOLOv7-CTA, for the detection of bone marrow cells in BM aspirate smears. This manuscript provides the following contributions:
  • In order to improve the sensitivity of the model to the fine-grained features of BM cells, the model should be able to perform long-term modeling of the target feature information in the image and extract fine-grained features more effectively. Therefore, a new feature-extraction network module, CoTLAN, is designed in this paper for this purpose.
  • To alleviate the impact of image complex background on object detection, this paper introduces the self-attention mechanism CoordAtt module and combines it with the CoTLAN module.
  • Through comparative experiments, this paper selects the most suitable loss function CIoU as the localization loss function of the network.
  • To address the slow convergence caused by the small size of BM cells, this paper uses the K-means++ algorithm to cluster the target boxes in the BM-cell dataset and generate more suitable anchor boxes. In addition, this paper uses the Focal loss function to replace the original multi-class cross-entropy loss to solve the imbalance between positive and negative samples.
The experimental results show that the improved YOLOv7-CTA achieves a better detection effect than other models on the 15 types of BM cells contained in BM aspirate smears.

2. Materials and Methods

2.1. Dataset Preparation and Preprocessing

2.1.1. Image Acquisition

Since public datasets of BM cells are scarce and involve too few cell types, we sought the help of a medical testing institution. We devoted considerable human and material resources to obtaining real BM aspirate smear samples. With the help of experts, we objectively labeled the collected cell images to build a real and valid dataset. The specific workflow for producing the BM-cell image dataset is illustrated in Figure 1.
The process of producing the BM-cell dataset can be broadly divided into four steps. Firstly, a small amount of BM aspirate is obtained from patients with hematological disorders, and a smear is made after Wright staining. Secondly, the smear is scanned using an electron microscope, resulting in an image with a resolution of 88,070 × 89,011 pixels. Then, an image-acquisition application is developed by analyzing the scanned image file to collect BM-cell images at a resolution of 600 × 600 pixels, and a preview function is provided to facilitate the collection process. Finally, with the assistance of doctors and experts, the cell images are annotated.

2.1.2. Image Database and Label Database

In this experiment, the ground-truth boxes were labeled on the cell images using the Labelme tool [26]. Because cells such as macrophages and plasma cells were too few in number, they were not included in this experiment. We selected 1204 BM-cell images containing a total of 13,059 BM cells across 15 categories. Figure 2 shows these 15 types of BM cells in our database, each of which includes various shapes.
The dataset of labeled BM cells was partitioned into training and validation sets, with a 7:3 split ratio. Here, we counted the number of different types of cells and plotted the size distribution of cell labels; the results are shown in Figure 3.
In Figure 3a, the vertical axis is the number of labels, and the horizontal axis is the label name. It can be seen from the figure that the numbers of the different cell types to be detected differ considerably, but this distribution is in line with the actual situation. In Figure 3b, the abscissa represents the ratio of the label width to the image width, while the ordinate represents the ratio of the label height to the image height. The target size in this dataset is relatively small; its width and height are about 9% of the image size.

2.2. Proposed Methods

The entire detection framework proposed in this paper is shown in Figure 4, which is divided into four parts.
First of all, this paper changes the network structure of YOLOv7. Due to the small size of BM cells, the feature difference between cells is not obvious. In order to capture the fine-grained features of BM cells effectively, we design a novel feature-extraction network module called CoTLAN in this paper. This module facilitates the long-term modeling of target feature information, thereby enhancing the model’s sensitivity to the subtle characteristics of BM cells. At the same time, in order to cooperate with the CoTLAN module’s attention to small target features, this paper introduces the self-attention mechanism CoordAtt module to reduce the impact of irrelevant features on object detection. Secondly, this paper replaces the initial anchor boxes. The K-means++ algorithm is utilized to cluster all ground-truth boxes of BM-cell labels in the dataset and generate anchor boxes of varying sizes. This ensures that the initial anchor box size of the model aligns with the target size, thereby accelerating the model convergence. Thirdly, this paper replaces the loss function. Focal loss is used as a classification loss in the model to solve the imbalance between positive and negative samples [27]. Finally, this paper chooses to use CIoU loss as the localization loss in the model, and adopts the CIoU non-maximum suppression (NMS) method to avoid multiple detections of the target and make the model output more accurate. The improved model improves the convergence speed and detection performance.

2.3. Backbone Network

The detection model for BM cells proposed in this paper, YOLOv7-CTA, is shown in Figure 5. The whole network structure comprises four parts: input, backbone network, neck network, and head network.
In order to increase the robustness of the model, this paper also adopts the mosaic data augmentation method [28] on the input side. The idea is to stitch four images together with operations such as random scaling, random cropping, and random layout to generate new data samples, as illustrated in the sketch below.
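As a rough illustration of this augmentation, the following Python sketch stitches four images into a 2 × 2 mosaic and shifts their bounding boxes accordingly. The fixed grid layout and nearest-neighbour resizing are simplifications of the randomized mosaic described in [28]; the function name and sizes are placeholders, not the authors' implementation.
```python
import numpy as np

def simple_mosaic(images, boxes_list, out_size=1280):
    """Stitch four images into a 2x2 mosaic (simplified: fixed grid instead of
    the random center/scale used in YOLO-style mosaic augmentation).

    images:     list of four HxWx3 uint8 arrays.
    boxes_list: list of four (N_i, 4) arrays of [x1, y1, x2, y2] pixel boxes.
    Returns the mosaic image and the shifted boxes.
    """
    assert len(images) == 4 and len(boxes_list) == 4
    half = out_size // 2
    mosaic = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    offsets = [(0, 0), (half, 0), (0, half), (half, half)]  # (x_off, y_off) per quadrant
    merged_boxes = []
    for img, boxes, (x_off, y_off) in zip(images, boxes_list, offsets):
        h, w = img.shape[:2]
        scale_x, scale_y = half / w, half / h
        # nearest-neighbour resize to keep the example dependency-free
        ys = (np.arange(half) / scale_y).astype(int).clip(0, h - 1)
        xs = (np.arange(half) / scale_x).astype(int).clip(0, w - 1)
        mosaic[y_off:y_off + half, x_off:x_off + half] = img[ys][:, xs]
        if len(boxes):
            b = boxes.astype(np.float32)
            b[:, [0, 2]] = b[:, [0, 2]] * scale_x + x_off
            b[:, [1, 3]] = b[:, [1, 3]] * scale_y + y_off
            merged_boxes.append(b)
    merged = np.concatenate(merged_boxes, 0) if merged_boxes else np.zeros((0, 4))
    return mosaic, merged
```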
The backbone network consists of CBS, MP, CoordAtt, and CoTLAN modules. The CBS module consists of a regular convolution, batch normalization, and an activation function. The MP module consists of max-pooling and CBS modules; a minimal sketch of both is given after this paragraph. The CoTLAN module is a new module designed in this paper; it is structurally similar to the ELAN module in YOLOv7 and retains its advantages. Specifically, the last 3 × 3 convolutional module of a standard ELAN module is replaced by a CoT-Block module, which aggregates dynamic contextual information with the local static contextual information from convolution, effectively improving the model’s ability to express fine-grained features. The CoTLAN module does not change the size of the feature map, but the number of output channels at the end of the module is changed. In addition, CoordAtt modules are added after the MP modules of the backbone, which helps the model to more accurately locate and identify the targets of interest.
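For reference, a minimal PyTorch sketch of the CBS and MP blocks as described above; the SiLU activation, kernel sizes, and channel counts are assumptions (the released YOLOv7-CTA code may differ, and the official YOLOv7 MP block is a two-branch variant).
```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + activation block (SiLU assumed, as in YOLOv7)."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class MP(nn.Module):
    """Downsampling block: max-pooling followed by a CBS, as described in the text."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.cbs = CBS(c_in, c_out, k=1)

    def forward(self, x):
        return self.cbs(self.pool(x))

# Example: a 640x640 RGB input passed through CBS then MP
x = torch.randn(1, 3, 640, 640)
y = MP(32, 64)(CBS(3, 32)(x))   # -> torch.Size([1, 64, 320, 320])
```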
The neck network consists of a feature pyramid network (FPN) and path aggregation network (PAN). The backbone network’s feature map output undergoes 32-fold downsampling and is then processed by the SPPCSP module, which reduces the channel count from 1024 to 512. Subsequently, the feature map undergoes feature fusion using both top-down and bottom-up approaches. The neck network structure effectively combines feature maps at various levels. Finally, the RepC and Conv modules in the head are utilized to output the prediction results.

2.3.1. CoT-Block Module

In recent years, Dosovitskiy et al. [29] adapted the self-attention-based Transformer structure to computer vision tasks and achieved good results. However, the traditional self-attention mechanism is based only on the interaction between the query and the key, and the rich contextual connections between adjacent keys are not explored, which limits the feature-expression ability of the self-attention mechanism. Therefore, this paper adopts a novel Transformer-style self-attention module, the Contextual Transformer block [30], referred to as CoT-Block. This module explores rich contextual connections between adjacent keys while capturing long-range dependencies across spatial locations. Its detailed structure is shown in Figure 6.
As can be seen from Figure 6, for an input feature $X \in \mathbb{R}^{H \times W \times C}$ ($H$: height, $W$: width, $C$: number of channels), the CoT-Block module first defines three variables: the queries $Q = X$, the keys $K = X$, and the values $V = X W_v$, where $W_v$ is an embedding matrix implemented as a 1 × 1 convolution. Firstly, a 3 × 3 group convolution is applied to $K$ to obtain the feature $K^1 \in \mathbb{R}^{H \times W \times C}$, which carries local static context information, as shown in Formula (1):
$K^1 = \mathrm{Conv}_{3 \times 3}^{\mathrm{group}}(K)$
Secondly, conditioned on the concatenation of $K^1$ and $Q$, two consecutive 1 × 1 convolutions are applied to generate the attention matrix $A \in \mathbb{R}^{H \times W \times (3 \times 3 \times C_h)}$, where $C_h$ denotes the number of heads and the local attention matrix at each spatial location of $A$ is learned from the query feature and the contextualized key feature, as shown in Formula (2), where $\delta$ is a non-linear activation function:
$A = \mathrm{Conv}_{1 \times 1}\big(\delta\big(\mathrm{Conv}_{1 \times 1}([K^1, Q])\big)\big)$
The attention matrix $A$ and the values $V$ are then combined by local matrix multiplication, denoted $\circledast$, to obtain the feature $K^2 \in \mathbb{R}^{H \times W \times C}$ carrying global dynamic context information, as shown in Formula (3):
$K^2 = A \circledast V$
Finally, the fusion of the feature $K^1$ (local static context) and the feature $K^2$ (global dynamic context) is taken as the output of CoT-Block.
Therefore, inspired by the above research, this paper uses CoT-Block to replace the last 3 × 3 convolution of the ELAN structure in the original YOLOv7 model and designs a new CoTLAN structure to enhance the feature-extraction ability of the original YOLOv7 backbone. The self-attention mechanism in CoTLAN makes each pixel in the current feature map attend to the features of the other pixels and explores the rich contextual connections between adjacent pixels. Learning the relationships between different pixels through pixel-wise weights helps the model extract more effective feature information from the feature map. Replacing ELAN with CoTLAN in the feature-extraction network can effectively improve the model’s feature-extraction ability for small targets, especially in the BM-cell image detection scenario.
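The following PyTorch sketch loosely follows Formulas (1)–(3) for one CoT-Block. The kernel size, group count, and reduction factor are illustrative assumptions, and the local matrix multiplication of Formula (3) is approximated here by averaging the k × k attention maps and applying a softmax over spatial positions, a common simplification rather than the exact implementation used in YOLOv7-CTA.
```python
import torch
import torch.nn as nn

class CoTBlock(nn.Module):
    """Sketch of a Contextual Transformer block; dim must be divisible by the group count."""
    def __init__(self, dim, kernel_size=3, groups=4, factor=4):
        super().__init__()
        self.kernel_size = kernel_size
        # Formula (1): 3x3 group convolution producing the static context K^1
        self.key_embed = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=groups, bias=False),
            nn.BatchNorm2d(dim), nn.ReLU())
        # V = X W_v, a 1x1 embedding
        self.value_embed = nn.Sequential(nn.Conv2d(dim, dim, 1, bias=False), nn.BatchNorm2d(dim))
        # Formula (2): two consecutive 1x1 convolutions applied to [K^1, Q]
        self.attention_embed = nn.Sequential(
            nn.Conv2d(2 * dim, 2 * dim // factor, 1, bias=False),
            nn.BatchNorm2d(2 * dim // factor), nn.ReLU(),
            nn.Conv2d(2 * dim // factor, kernel_size * kernel_size * dim, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        k1 = self.key_embed(x)                                  # static context K^1
        v = self.value_embed(x).view(b, c, -1)
        att = self.attention_embed(torch.cat([k1, x], dim=1))   # Q = X
        att = att.reshape(b, c, self.kernel_size ** 2, h, w).mean(2).view(b, c, -1)
        k2 = (torch.softmax(att, dim=-1) * v).view(b, c, h, w)  # approximated dynamic context K^2
        return k1 + k2                                          # fuse static and dynamic context

# Example: a CoTBlock keeps the spatial size and channel count of its input
y = CoTBlock(64)(torch.randn(1, 64, 80, 80))                    # -> torch.Size([1, 64, 80, 80])
```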

2.3.2. CoordAtt Module

Because the environment of BM cells is complex and variable, an attention module is added between every two CoTLAN modules of the backbone network to improve the model’s feature-expression ability for BM cells.
Attention mechanisms can be categorized into channel attention mechanisms, spatial attention mechanisms, and a combination of both. While traditional attention modules like Squeeze-and-Excitation attention (SE) [31] and Convolutional Block Attention Module (CBAM) [32] excel at modeling channel-to-channel relationships, they tend to overlook spatial positional information. On the other hand, other attention modules that address this issue have excessive parameters, which hinder their deployment in real-world applications.
In contrast, the CoordAtt mechanism [33] captures direction-aware and position-sensitive information in addition to cross-channel information, thus facilitating precise localization and identification of the targeted objects by the model. Additionally, the CoordAtt module is versatile and can be added at multiple locations in existing models. The framework diagram of the CoordAtt attention mechanism is shown in Figure 7.
As can be seen from Figure 7, in order to avoid all spatial information being compressed into the channels, the global average pooling of the input is decomposed. Specifically, given an input $X$, each channel is encoded along the horizontal and vertical coordinates using two pooling kernels with spatial extents $(H, 1)$ and $(1, W)$, respectively. The output of the $c$-th channel at height $h$ and the output of the $c$-th channel at width $w$ are expressed as Formulas (4) and (5), respectively:
$z_c^h(h) = \dfrac{1}{W} \sum_{0 \le i < W} x_c(h, i)$
$z_c^w(w) = \dfrac{1}{H} \sum_{0 \le j < H} x_c(j, w)$
Secondly, the feature map $z^w \in \mathbb{R}^{1 \times W \times C}$ is transposed and concatenated with the feature map $z^h \in \mathbb{R}^{H \times 1 \times C}$, and a 1 × 1 convolution with activation is applied to generate the feature map $f \in \mathbb{R}^{1 \times (H + W) \times C / r}$, as shown in Formula (6):
$f = \delta\big(\mathrm{Conv}_{1 \times 1}([z^h, z^w])\big)$
Along the spatial dimension, $f$ is split into $f^h \in \mathbb{R}^{1 \times H \times C / r}$ and $f^w \in \mathbb{R}^{1 \times W \times C / r}$. Then, 1 × 1 convolutions are used to restore the channel dimension, followed by the sigmoid activation function, to obtain the final attention vectors $g^h \in \mathbb{R}^{1 \times H \times C}$ and $g^w \in \mathbb{R}^{1 \times W \times C}$, as shown in Formulas (7) and (8):
$g^h = \delta\big(\mathrm{Conv}_{1 \times 1}(f^h)\big)$
$g^w = \delta\big(\mathrm{Conv}_{1 \times 1}(f^w)\big)$
Finally, the outputs $g^h$ and $g^w$ are expanded and used as attention weights, and the output $Y$ of the coordinate attention block can be written as Formula (9):
$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$
The input undergoes horizontal and vertical pooling to retain long-range dependencies in both directions. The resulting information is then concatenated and split, followed by convolutions that emphasize both horizontal and vertical directions. The resultant two-part feature map can be precisely localized to the target object’s row and column of interest.
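A compact PyTorch rendering of Formulas (4)–(9) is given below. The reduction ratio r and the hard-swish non-linearity used for δ follow the original CoordAtt paper and are assumptions here, not settings confirmed by this article.
```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Coordinate attention (Formulas (4)-(9)); r is the channel reduction ratio."""
    def __init__(self, channels, r=32):
        super().__init__()
        mid = max(8, channels // r)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))    # Formula (4): pool along the width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))    # Formula (5): pool along the height
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()                        # delta, assumed hard-swish
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                             # (b, c, h, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)         # (b, c, w, 1), transposed for concat
        f = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))   # Formula (6)
        f_h, f_w = torch.split(f, [h, w], dim=2)
        g_h = torch.sigmoid(self.conv_h(f_h))            # Formula (7), shape (b, c, h, 1)
        g_w = torch.sigmoid(self.conv_w(f_w.permute(0, 1, 3, 2)))  # Formula (8), (b, c, 1, w)
        return x * g_h * g_w                             # Formula (9), broadcast over H and W

# Example: attention applied to a 512-channel feature map keeps its shape
y = CoordAtt(512)(torch.randn(1, 512, 40, 40))
```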

2.4. Anchor Box Optimization

In order to improve the detection accuracy and efficiency of the improved model, this paper uses the K-means++ [34] algorithm to cluster the target boxes in the BM-cell dataset and generate more suitable anchor boxes. The K-means++ algorithm selects the initial cluster centers one at a time, which prevents a poor random initialization from trapping the clustering in a local optimum and yields a solution closer to the global optimum.
Specific steps are as follows:
(1)
Randomly select a BM-cell sample target box as the initial cluster center, and then compute the minimum IoU-based distance D(x) between each remaining sample box and the selected cluster centers.
(2)
Compute the probability P(x) of each BM-cell sample box being selected as the next cluster center, and employ the roulette method to determine the next cluster center.
(3)
Iteratively perform steps (1) and (2) until all cluster centers have been selected.
(4)
Compute the distance between each BM-cell sample box and the cluster centers. Assign each sample to the class of the cluster center with the smallest distance, and update the cluster center c_i of each category. Repeat the assignment and center update until the cluster assignments no longer change.
The relevant formulas are as follows:
$D(x) = 1 - \mathrm{IoU}(x, c)$
$P(x) = \dfrac{D(x)^2}{\sum_{x_i \in X} D(x_i)^2}$
$c_i = \dfrac{1}{|c_i|} \sum_{x \in c_i} x$
where $x$ denotes a ground-truth target box, $c$ denotes a cluster center, IoU denotes the intersection-over-union between two rectangular boxes, $X$ denotes the full set of target boxes, and $|c_i|$ denotes the number of samples assigned to cluster center $c_i$, with $i = 1, \dots, k$. In this paper, the detection model produces three detection feature maps, and each feature map corresponds to three anchor boxes, so $k = 9$. Table 1 shows the optimized anchor boxes corresponding to the three feature-map sizes.
The large-sized anchor box corresponds to the 20 × 20 feature map and is responsible for detecting large objects in the image. The medium-sized anchor box corresponds to the 40 × 40 feature map and is responsible for detecting medium-sized objects. The small-sized anchor box corresponds to the 80 × 80 feature map and is responsible for detecting small objects.
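The clustering described in steps (1)–(4) can be sketched in a few lines of NumPy. The 1 − IoU distance and k = 9 follow the text above; the stopping rule, random seed, and the toy input at the end are placeholders rather than the authors' settings.
```python
import numpy as np

def iou_wh(box, centers):
    """IoU between one (w, h) box and k (w, h) cluster centers, all anchored at the origin."""
    inter = np.minimum(box[0], centers[:, 0]) * np.minimum(box[1], centers[:, 1])
    union = box[0] * box[1] + centers[:, 0] * centers[:, 1] - inter
    return inter / union

def kmeanspp_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (w, h) ground-truth boxes into k anchors using 1 - IoU as the distance."""
    rng = np.random.default_rng(seed)
    centers = [boxes[rng.integers(len(boxes))]]
    # K-means++ seeding: pick each new center with probability proportional to D(x)^2
    while len(centers) < k:
        d = np.array([1.0 - iou_wh(b, np.array(centers)).max() for b in boxes])
        probs = d ** 2 / np.sum(d ** 2)
        centers.append(boxes[rng.choice(len(boxes), p=probs)])
    centers = np.array(centers, dtype=float)
    # Standard k-means refinement until the assignments stop changing
    assign = None
    for _ in range(iters):
        dist = np.stack([1.0 - iou_wh(b, centers) for b in boxes])   # (N, k)
        new_assign = dist.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break
        assign = new_assign
        for j in range(k):
            if np.any(assign == j):
                centers[j] = boxes[assign == j].mean(axis=0)
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]        # sorted by box area

# Example with random (w, h) boxes in pixels (placeholder data, not the BM dataset)
anchors = kmeanspp_anchors(np.random.default_rng(1).uniform(20, 90, size=(500, 2)))
```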

2.5. Loss Function

The loss function of the model comprises three components: localization loss ($box\_loss$), confidence loss ($obj\_loss$), and classification loss ($cls\_loss$). The total loss is the weighted sum of the three losses; the confidence loss uses the binary cross-entropy loss.
$Loss = w_1 \times box\_loss + w_2 \times cls\_loss + w_3 \times obj\_loss$
where $w_1$, $w_2$, and $w_3$ are the weights of the three loss terms, respectively.
In a given image sample, the BM cells correspond to the foreground, while the remaining regions are regarded as the background. Nevertheless, the YOLO series, as one-stage object detectors, suffer from class imbalance caused by a disproportionate number of positive and negative samples [27]. This problem can adversely affect the direction of the network’s gradient updates and lead to lower accuracy. To address this issue, the Focal loss function [27], an improved version of the cross-entropy loss designed specifically for the class imbalance problem in object detection, is adopted. By weighting the loss according to sample difficulty, the Focal loss assigns less weight to easily distinguishable samples and greater weight to hard-to-distinguish samples.
The formula for the classification loss function is as follows:
$cls\_loss = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$
$p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases}$
$\alpha_t = \begin{cases} \alpha & \text{if } y = 1 \\ 1 - \alpha & \text{otherwise} \end{cases}$
where $y = 1$ denotes a positive sample and $p$ is the probability that the model predicts the sample to belong to the foreground. To address the imbalance between sample categories, the weighting parameter $\alpha$ is introduced, and the modulating factor $\gamma$ is added to the cross-entropy loss.
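A minimal PyTorch sketch of the classification loss defined above; α = 0.25 and γ = 1.5 are placeholder values rather than the hyperparameters used for YOLOv7-CTA.
```python
import torch

def focal_loss(pred_logits, targets, alpha=0.25, gamma=1.5):
    """Binary focal loss: cls_loss = -alpha_t * (1 - p_t)^gamma * log(p_t).

    pred_logits: raw class logits, same shape as targets.
    targets:     0/1 ground-truth labels (1 = the box belongs to this class).
    alpha, gamma: placeholder hyperparameters (assumed, not from the paper).
    """
    p = torch.sigmoid(pred_logits)
    ce = torch.nn.functional.binary_cross_entropy_with_logits(
        pred_logits, targets, reduction="none")            # equals -log(p_t)
    p_t = targets * p + (1 - targets) * (1 - p)             # p if y = 1, else 1 - p
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Example: 15-class one-hot targets for 4 predicted boxes
logits = torch.randn(4, 15)
labels = torch.nn.functional.one_hot(torch.tensor([0, 3, 7, 14]), 15).float()
loss = focal_loss(logits, labels)
```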

2.6. Non-Maximum Suppression

This paper conducts comparative experiments on the current mainstream IoU-based loss functions and finally chooses the CIoU loss as the localization loss function of the model. At the same time, this paper uses the CIoU-NMS [35,36] algorithm to filter the preliminary prediction boxes output for the image to be recognized. The final prediction boxes are obtained with the following steps:
(1)
Set both the confidence threshold and the CIoU threshold.
(2)
Compute the confidence of all prediction boxes output by the network model. Sort the prediction boxes that exceed the confidence threshold in descending order of confidence and put them into the candidate list.
(3)
Select the prediction box with the highest confidence from the candidate list, save it to the output list, and remove it from the candidate list.
(4)
Compute the CIoU between the highest-confidence prediction box obtained in the previous step and all other prediction boxes in the candidate list. Remove a prediction box from the candidate list if its overlap exceeds the CIoU threshold.
(5)
Iteratively perform steps (3) and (4) until the candidate list is depleted.
(6)
Use the prediction boxes in the output list as the final detection results.
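A NumPy sketch of steps (1)–(6) is given below. The CIoU term follows the formulation in [36] (IoU minus a normalized center-distance penalty and an aspect-ratio term), and the thresholds are placeholders; this illustrates the procedure rather than reproducing the authors' implementation.
```python
import numpy as np

def ciou(box, boxes):
    """CIoU between one box and an array of boxes, all in [x1, y1, x2, y2] format."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    iou = inter / (area(box) + area(boxes) - inter + 1e-9)
    # normalized squared distance between box centers (rho^2 / c^2)
    cw = np.maximum(box[2], boxes[:, 2]) - np.minimum(box[0], boxes[:, 0])
    ch = np.maximum(box[3], boxes[:, 3]) - np.minimum(box[1], boxes[:, 1])
    rho2 = ((box[0] + box[2] - boxes[:, 0] - boxes[:, 2]) ** 2 +
            (box[1] + box[3] - boxes[:, 1] - boxes[:, 3]) ** 2) / 4
    # aspect-ratio consistency term
    v = (4 / np.pi ** 2) * (np.arctan((box[2] - box[0]) / (box[3] - box[1] + 1e-9)) -
                            np.arctan((boxes[:, 2] - boxes[:, 0]) /
                                      (boxes[:, 3] - boxes[:, 1] + 1e-9))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - rho2 / (cw ** 2 + ch ** 2 + 1e-9) - alpha * v

def ciou_nms(boxes, scores, conf_thr=0.25, ciou_thr=0.5):
    """Greedy NMS following steps (1)-(6): keep the highest-confidence box,
    drop candidates whose CIoU with it exceeds the threshold, and repeat."""
    keep_idx = []
    order = [i for i in np.argsort(-scores) if scores[i] >= conf_thr]   # candidate list
    while order:
        best = order.pop(0)
        keep_idx.append(best)
        if order:
            overlaps = ciou(boxes[best], boxes[np.array(order)])
            order = [i for i, o in zip(order, overlaps) if o <= ciou_thr]
    return keep_idx
```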

3. Results

3.1. Experimental Environment and Evaluation Metrics

To validate the efficacy of our model, we conducted neural network training and testing using the computer configuration parameters and hyperparameter settings in Table 2.
The evaluation indicators of the model mainly included precision and recall, whose expressions are as follows:
$Precision = \dfrac{TP}{TP + FP}$
$Recall = \dfrac{TP}{TP + FN}$
where TP corresponds to true positives where the model prediction is positive and the actual value is also positive; FN represents false negatives where the model prediction is negative, but the actual value is positive; FP indicates false positives where the model prediction is positive, but the actual value is negative; and TN denotes true negatives where the model prediction is negative, and the actual value is also negative.
The accuracy of the model was assessed using the mean average precision (mAP), which considers both precision and recall.
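For completeness, a short sketch of how precision, recall, and per-class average precision (AP, averaged over the 15 classes to obtain mAP) can be computed; it uses the standard all-point interpolation of the PR curve and is not taken from the authors' evaluation code.
```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation);
    recalls must be sorted in ascending order."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]          # make precision monotonically decreasing
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Example: a toy PR curve for one cell class; mAP is the mean of AP over all classes
ap = average_precision(np.array([0.2, 0.5, 0.8]), np.array([1.0, 0.8, 0.6]))
```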

3.2. Experimental Results and Analysis

Firstly, we conducted comparative experiments and ablation experiments to improve the model network structure using the YOLOv7 model as the baseline. The comparative experiment analyzed three of the most popular attention mechanisms to identify the optimal one, while the ablation experiment aimed to validate the effectiveness of the proposed model-structure improvement method. Secondly, we used the improved YOLOv7-CTA model as a baseline to determine a suitable localization loss function, and conducted ablation experiments on the setting of anchor box hyperparameters and the use of the improved loss function to further optimize the model. Finally, we compared the improved model’s performance with the original model and other classic models and verified its superiority. Since the main purpose of this paper is to improve the detection performance of the model, we used metrics such as precision, recall, mAP, and FPS (Frames Per Second) to evaluate the results.
To incorporate suitable attention mechanisms into the network, this paper added SE, CBAM, and CoordAtt attention mechanisms between ELAN modules for training and comparison. The results of the experiments are presented in Table 3.
The impact of adding different attention mechanisms to the backbone on detection varies. The SE attention mechanism considers channel attention but ignores positional information. The results show that adding SE attention to the backbone improves the mAP, but the improvement is the smallest of the three. CBAM aims to leverage positional information by reducing the input tensor’s channel dimension and computing spatial attention through convolutional operations. However, convolution can only capture local relationships and cannot model the long-range dependencies required for visual tasks. Incorporating CBAM attention into the backbone improves the mAP, but it significantly increases the parameter count and computational cost and reduces the speed by 4 FPS. In contrast, the CoordAtt attention mechanism embeds positional information into channel attention, capturing long-range dependencies along one spatial direction while preserving accurate positional information along the other. The results show that after adding CoordAtt to the backbone network, although the recall is reduced by 1.2%, the precision and the most important evaluation index, mAP, increase by 8.0% and 3.6%, respectively, which are markedly higher gains than those of the previous two methods, and the increases in parameters and computation are not significant. This indicates that the CoordAtt attention mechanism enables the network to detect targets more comprehensively and improves its detection ability.
Ablation experiments were conducted in this study to validate the beneficial effects of the improved strategy proposed on the network. The results are presented in Table 4, where the adoption of the corresponding enhancement method is denoted by “√” and the absence of the enhancement method is denoted by “×”.
The detection outcomes of the original YOLOv7 network are presented in the first row of the table. As can be seen from the table, when the ELAN module of the original model is replaced with the newly designed structure, CoTLAN, precision, recall, and mAP improve by 6.4%, 0.4%, and 3.6%, respectively, compared with the original YOLOv7 without any improvement strategy. When the network model integrates both the CoTLAN and CoordAtt modules, the recall and FPS of the combined method decrease, but the precision and mAP improve greatly: the detection precision and mAP increase by 14.9% and 5.2%, respectively, which effectively improves the detection ability of the model.
After the network model was determined, we used the improved YOLOv7-CTA model as the baseline and conducted comparative experiments on different localization loss functions. We compared the performances of three new methods: SIoU [37], EIoU [38], and CIoU (ours). The results are shown in Table 5.
As seen from Table 5, when the CIoU loss function is used, the accuracy value and mAP value of the improved model are the highest. From this, we conclude that using the CIoU loss function can help the improved model to have a better detection effect on BM cells.
This paper finally uses CIoU loss as the localization loss function of the YOLOv7-CTA model, and conducts ablation experiments on the setting of anchor box hyperparameters and the use of the improved loss function, as shown in Table 6.
After replacing the original initial anchor box with the new anchor box generated by the K-means++ algorithm, the mAP of the improved model increased by 0.4%. This proves the effectiveness of the method. When the cross-entropy loss function in the improved model is replaced by the Focal loss function, the improved model improves mAP by 0.6%. For the combination of these two improvements, the mAP is significantly improved to 88.6%, which surpasses the previous two methods. For the above improvement schemes, this paper selects the improvement scheme with the highest mAP to participate in the subsequent experimental comparison.
Since the PR curve comprehensively considers the precision and recall of each category detected by the model, we further used it to measure the model’s performance.
Figure 8 depicts the PR curves of the models, where the horizontal axis indicates the recall and the vertical axis represents the precision. The PR curve visualizes how precision changes as recall increases: the larger the area enclosed by the curve and the coordinate axes, the better the overall performance of the model. Compared with the original model, the performance improvement is very obvious. It can be seen from the figure that the mAP of N-segmented and basophil cells decreased slightly, while the mAP of the other cell types increased to varying degrees. Among them, the mAP of seven cell types reached more than 92.5%, indicating the effectiveness of the YOLOv7-CTA model.
While the ablation experiments confirmed the effectiveness of the improved strategy proposed in this study relative to the original algorithm, it remains to be established whether it can achieve an advanced level. As such, a series of comparative experiments were conducted on BM-cell datasets, under identical experimental conditions, to compare the proposed method with the current mainstream object-detection approach. Figure 9 illustrates the comparison of training outcomes among multiple models. The results demonstrate that the proposed algorithm in this paper achieved notably higher precision and mAP than the other two models.
The training loss curves of the various models are compared in Figure 10. The curves of all models stabilized and converged after 250 iterations. The results indicate that YOLOv5l performed significantly worse than YOLOv7 in terms of both regression and classification losses. As can be seen from the figure, the improved network structure in this paper resulted in a slightly slower convergence rate than YOLOv7 at first; nonetheless, after approximately 70 iterations, the proposed model exhibited a faster loss decline and better convergence than YOLOv7, suggesting that the loss-function adjustment enhanced the network’s convergence performance.
Finally, this paper lists the comparison results of the performance metrics of different models, as shown in Table 7. The table shows that, compared with other models, the recall, precision, and mAP of YOLOv7-CTA are significantly higher. Among them, compared with YOLOv7, the precision, recall, and mAP of the improved model have increased by 10.1%, 0.6%, and 6.7%, respectively. Although the processing speed of YOLOv7-CTA is slightly lower than that of YOLOv7, it is still much faster than the two-stage model Faster R-CNN. The FPS indicator shows that YOLOv7-CTA can recognize 22 images of BM cells with a resolution of 600 × 600 per second, so the expected running time for the detection and recognition of 500 BM cells is <2.5 s, which can help pathologists greatly reduce their workload. In conclusion, the improved model has achieved high speed and high efficiency in predicting the task of BM cells.
In addition, we calculated the parameter count and computational cost of these models. Although the parameter count of the two-stage model is small, its structure is complicated because detection and classification are performed separately, resulting in a particularly large amount of computation and high training costs. In contrast, our proposed YOLOv7-CTA has the lowest computational cost, reducing the computation by 2.7 GFLOPs relative to YOLOv7, and its parameter count is also reduced by 0.6 M. Therefore, YOLOv7-CTA does not place high demands on hardware and requires less than the other models.

4. Discussion

In order to verify more intuitively the actual detection effect of the model after improving the feature-extraction network, this paper selects three images from the test set of the BM-cell dataset for detection. The detection results are shown in Figure 11, where the red arrows indicate false detections. The improved model, YOLOv7-CTA, shows significantly better detection ability than the other models.
Limited research has been conducted on automating BM aspirate DCCs using image analysis. Choi et al. [21] presented preliminary outcomes utilizing convolutional networks for cell classification in DCCs. However, their dataset consisted solely of 2174 non-neoplastic erythroid and myeloid precursor cells, excluding other crucial cell types such as eosinophils, basophils, monocytes, lymphocytes, and plasma cells. Their study focused primarily on classification and did not consider detection, relying solely on manually cropped cell images for classifier development and validation. Reta et al. [22] devised a comprehensive software pipeline for cell detection and classification to identify acute leukemia subtypes. Their dataset involved 633 cells from acute lymphoblastic leukemias and acute myeloid leukemias. They implemented advanced image-processing techniques to detect and segment leukocytes in digital images of Wright-stained BM aspirate smears. The segmented cells were then characterized using various features describing their shape, color, and texture. Subsequently, basic machine learning algorithms were employed to classify individual cells, and these cell classifications were aggregated to provide a single diagnosis for the sample. Although their approach is limited in scope, targeting a few specific cell types, they reported high segmentation accuracy. Wu et al. [39] proposed the BMSNet model, a one-stage model based on YOLOv3, for BM aspirate cell recognition. Their model performed well with a small number of cells and few classes; however, when they evaluated its predictive performance on bounding boxes, the average precision without considering classification was only 67.4%, which compares poorly with our mAP for identifying all cells in the validation set.
Our model builds upon the groundwork laid by these earlier publications, highlighting the potential of deep learning methods for automating DCCs. However, there is still room to improve the model. The improvements reported in this work are primarily attributed to modifications of the backbone structure, while the detection head also plays a crucial role in feature fusion and model optimization; to optimize our network more comprehensively, we will conduct further research in the future. Compared with the original model, the improved model has a slower inference speed, and we may consider using knowledge distillation to compress the model and speed up inference. In addition, we did relatively little data augmentation, because we have not yet found additional augmentation methods suited to the BM-cell scenario, and we hope that more researchers will investigate this further.

5. Conclusions

In this paper, an improved BM-cell-detection algorithm, YOLOv7-CTA, is proposed. This method can identify BM-cell images more accurately than similar models. A new feature-extraction module, CoTLAN, is designed in the backbone network, which improves the extraction of fine-grained features. The CoordAtt module is combined with it in the network to make the model pay more attention to the features in the area to be detected and suppress irrelevant features, thereby enhancing the feature-extraction ability for small targets. Finally, under the determined network structure, the model is optimized through the selection of the localization loss function, the use of the K-means++ algorithm to cluster the target boxes of the BM-cell dataset, and the replacement of the cross-entropy classification loss with Focal loss. The experimental results show that the mAP of the optimized model reaches 88.6%, surpassing the Faster R-CNN, YOLOv5l, and YOLOv7 models by 12.9%, 8.3%, and 6.7%, respectively. Furthermore, the detection speed of this model is 22 FPS, effectively satisfying the requirements for high performance. Compared with other models, the YOLOv7-CTA model is superior in the BM-cell-detection task.
There are still improvements to be made. At present, in the BM aspirate smear image dataset, some cell types are rare, and some types are not included in the dataset at all. In the image data, the imaging quality is mediocre because the microscope scanner is relatively ordinary, and the cytoplasm and nucleus features of some stained cells are not distinct. Therefore, we plan to improve the dataset and obtain a BM aspirate smear image dataset with more comprehensive cell coverage, clearer cell imaging, and more uniform cell positions, so as to support further research on BM-cell detection. In addition, our model still makes recognition errors when the differences between cell types are extremely subtle, which shows that there is still room for improvement; at the same time, how to reduce the differences between labeling standards is a problem that needs to be solved. Importantly, we mainly focused on improving the model and did not conduct auxiliary diagnosis experiments for specific BM or blood-related diseases. We hope that, in the future, there will be opportunities to quantify the effectiveness of the model for medical diagnosis, making it a valuable addition to the medical field.

Author Contributions

Methodology, Z.C. and Y.L.; software, Z.C.; data curation, Z.C.; writing—original draft preparation, Z.C. and Y.L.; supervision, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 12001408).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Written informed consent has been obtained from the patients to publish this paper.

Data Availability Statement

The code link: https://github.com/Gameness-czz/YOLOv7-CTA, accessed on 27 July 2023. The BM dataset link: https://doi.org/10.6084/m9.figshare.23805324.v1, accessed on 30 July 2023.

Acknowledgments

The authors thank all the teachers and students from the same team for their constructive comments on the article before submission. Their thoughtful suggestions have helped to improve the paper significantly. The authors also thank Wuhan Qianxing Technology Co., Ltd. for the use of experimental equipment. At the same time, the authors would like to thank Kangsheng Global Medical Technology (Wuhan) Co., Ltd. for the data support of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Döhner, H.; Estey, E.; Grimwade, D.; Amadori, S.; Appelbaum, F.R.; Büchner, T.; Dombret, H.; Ebert, B.L.; Fenaux, P.; Larson, R.A.; et al. Diagnosis and management of AML in adults: 2017 ELN recommendations from an international expert panel. Blood J. Am. Soc. Hematol. 2017, 129, 424–447. [Google Scholar]
  2. Lee, S.H.; Erber, W.N.; Porwit, A.; Tomonaga, M.; Peterson, L.C.; International Councilfor Standardization In Hematology. ICSH guidelines for the standardization of bone marrow specimens and reports. Int. J. Lab. Hematol. 2008, 30, 349–364. [Google Scholar]
  3. Giagounidis, A.; Haase, D. Morphology, cytogenetics and classification of MDS. Best Pract. Res. Clin. Haematol. 2013, 26, 337–353. [Google Scholar]
  4. Li, L.; Yu, S.; Hu, X.; Liu, Z.; Tian, X.; Ren, X.; Guo, X.; Fu, R. Immunophenotypic changes of monocytes in myelodysplastic syndrome and clinical significance. Clin. Exp. Med. 2023, 23, 787–801. [Google Scholar]
  5. Suguna, E.; Farhana, R.; Kanimozhi, E.; Kumar, P.S.; Kumaramanickavel, G.; Kumar, C.S. Acute myeloid leukemia: Diagnosis and management based on current molecular genetics approach. Cardiovasc. Haematol. Disord.-Drug Targets (Former. Curr. Drug Targets-Cardiovasc. Hematol. Disord.) 2018, 18, 199–207. [Google Scholar]
  6. Xu, J.; Jorgensen, J.L.; Wang, S.A. How do we use multicolor flow cytometry to detect minimal residual disease in acute myeloid leukemia? Clin. Lab. Med. 2017, 37, 787–802. [Google Scholar]
  7. Matek, C.; Krappe, S.; Münzenmayer, C.; Haferlach, T.; Marr, C. Highly accurate differentiation of bone marrow cell morphologies using deep neural networks on a large image data set. Blood J. Am. Soc. Hematol. 2021, 138, 1917–1927. [Google Scholar]
  8. Anilkumar, K.K.; Manoj, V.J.; Sagi, T.M. A survey on image segmentation of blood and bone marrow smear images with emphasis to automated detection of Leukemia. Biocybernet. Biomed. Eng. 2020, 40, 1406–1420. [Google Scholar]
  9. Abdulrahman, A.A.; Patel, K.H.; Yang, T.; Koch, D.D.; Sivers, S.M.; Smith, G.H.; Jaye, D.L. Is a 500-cell count necessary for bone marrow differentials? A proposed analytical method for validating a lower cutoff. Am. J. Clin. Pathol. 2018, 150, 84–91. [Google Scholar] [CrossRef]
  10. Fuentes-Arderiu, X.; Dot-Bach, D. Measurement uncertainty in manual differential leukocyte counting. Clin. Chem. Lab. Med. 2009, 47, 112–115. [Google Scholar] [CrossRef] [PubMed]
  11. d’Onofrio, G.; Zini, G. Analysis of bone marrow aspiration fluid using automated blood cell counters. Clin. Lab. Med. 2015, 35, 25–42. [Google Scholar] [PubMed]
  12. Mori, Y.; Mizukami, T.; Hamaguchi, Y.; Tsuruda, K.; Yamada, Y.; Kamihira, S. Automation of bone marrow aspirate examination using the XE-2100 automated hematology analyzer. Cytom. Part B Clin. Cytom. 2004, 58, 25–31. [Google Scholar]
  13. Khamael, A.L.D.; Banks, J.; Nugyen, K.; Al-Sabaawi, A.; Tomeo-Reyes, I.; Chandran, V. Segmentation of white blood cell, nucleus and cytoplasm in digital haematology microscope images: A review–challenges, current and future potential techniques. IEEE Rev. Biomed. Eng. 2020, 14, 290–306. [Google Scholar]
  14. Kratz, A.; Bengtsson, H.I.; Casey, J.E.; Keefe, J.M.; Beatrice, G.H.; Grzybek, D.Y.; Lewandrowski, K.B.; Van Cott, E.M. Performance evaluation of the CellaVision DM96 system: WBC differentials by automated digital image analysis supported by an artificial neural network. Am. J. Clin. Pathol. 2005, 124, 770–781. [Google Scholar] [CrossRef] [PubMed]
  15. Wang, Q.; Bi, S.; Sun, M.; Wang, Y.; Wang, D.; Yang, S. Deep learning approach to peripheral leukocyte recognition. PLoS ONE 2019, 14, e0218808. [Google Scholar]
  16. Fan, H.; Zhang, F.; Xi, L.; Li, Z.; Liu, G.; Xu, Y. LeukocyteMask: An automated localization and segmentation method for leukocyte in blood smear images using deep neural networks. J. Biophotonics 2019, 12, e201800488. [Google Scholar]
  17. Banik, P.P.; Saha, R.; Kim, K.D. An automatic nucleus segmentation and CNN model based classification method of white blood cell. Expert Syst. Appl. 2020, 149, 113211. [Google Scholar]
  18. Reena, M.R.; Ameer, P.M. Localization and recognition of leukocytes in peripheral blood: A deep learning approach. Comput. Biol. Med. 2020, 126, 104034. [Google Scholar]
  19. Khandekar, R.; Shastry, P.; Jaishankar, S.; Faust, O.; Sampathila, N. Automated blast cell detection for Acute Lymphoblastic Leukemia diagnosis. Biomed. Signal Process. Control 2021, 68, 102690. [Google Scholar]
  20. Leng, B.; Leng, M.; Ge, M.; Dong, W. Knowledge distillation-based deep learning classification network for peripheral blood leukocytes. Biomed. Signal Process. Control 2022, 75, 103590. [Google Scholar]
  21. Choi, J.W.; Ku, Y.; Yoo, B.W.; Kim, J.-A.; Lee, D.S.; Chai, Y.J.; Kong, H.-J.; Kim, H.C. White blood cell differential count of maturation stages in bone marrow smear using dual-stage convolutional neural networks. PLoS ONE 2017, 12, e0189259. [Google Scholar] [CrossRef]
  22. Reta, C.; Altamirano, L.; Gonzalez, J.A.; Diaz-Hernandez, R.; Peregrina, H.; Olmos, I.; Alonso, J.E.; Lobato, R. Segmentation and classification of bone marrow cells images using contextual information for medical diagnosis of acute leukemias. PLoS ONE 2015, 10, e0130805, Correction in PLoS ONE 2015, 10, e0134066. [Google Scholar]
  23. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar] [CrossRef]
  24. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  25. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  26. Russell, B.C.; Torralba, A.; Murphy, K.P.; Freeman, W.T. LabelMe: A database and web-based tool for image annotation. Int. J. Comput. Vis. 2008, 77, 157–173. [Google Scholar] [CrossRef]
  27. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  28. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  29. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  30. Li, Y.; Yao, T.; Pan, Y.; Mei, T. Contextual transformer networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1489–1500. [Google Scholar] [CrossRef] [PubMed]
  31. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  32. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  33. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar]
  34. Ismkhan, H. I-k-means−+: An iterative clustering algorithm based on an enhanced version of the k-means. Pattern Recognit. 2018, 79, 402–413. [Google Scholar] [CrossRef]
  35. Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; Volume 3, pp. 850–855. [Google Scholar]
  36. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. AAAI Conf. Artif. Intell. 2020, 34, 12993–13000. [Google Scholar] [CrossRef]
  37. Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740. [Google Scholar]
  38. Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
  39. Wu, Y.-Y.; Huang, T.-C.; Ye, R.-H.; Fang, W.-H.; Lai, S.-W.; Chang, P.-Y.; Liu, W.-N.; Kuo, T.-Y.; Lee, C.-H.; Tsai, W.-C.; et al. A hematologist-level deep learning algorithm (BMSNet) for assessing the morphologies of single nuclear balls in bone marrow smears: Algorithm development. JMIR Med. Inform. 2020, 8, e15963. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The specific process of making a BM aspirate cells dataset.
Figure 2. There are images of 15 types of cells collected from BM aspirate. (N: neutrophil, E: eosinophilic, P: polychromatic, O: orthochromatic).
Figure 3. (a) Number of BM cell labels, (b) label size distribution.
Figure 4. Detection framework of the proposed method.
Figure 5. Structure of the YOLOv7-CTA network.
Figure 6. CoT-Block module structure diagram.
Figure 7. The schematic diagram of CoordAtt.
Figure 8. Precision–recall curve of different models: (a) YOLOv7 model, (b) YOLOv7-CTA.
Figure 9. The training results of various models: (a) precision curve, (b) mAP@0.5 curve.
Figure 10. The training loss curves of various models: (a) cls_loss curve, (b) box_loss curve.
Figure 11. (a) Since we use boxes to label cells, the surrounding feature information of the cells in the boxes is also learned. When the cell environment is complex, other models tend to treat the background as cells, but our method can avoid such errors. (b) This is the recognition effect on denser cells, and our model has better classification accuracy than the other models. (c) Our model also makes recognition errors, but its error rate is lower than that of the other models. The three wrongly classified cell categories are very similar to the correct ones, which is related to the labeling standard. Subtle differences in the standards for cell-class labeling have been problematic. In addition, there is still room for improvement in the model, and we will conduct further research in the future.
Table 1. Optimized anchor box parameters.
Feature Map Size | 20 × 20 | 40 × 40 | 80 × 80
YOLOv7-CTA | (61, 55) | (54, 56) | (38, 39)
 | (67, 64) | (57, 64) | (42, 57)
 | (74, 75) | (60, 40) | (51, 49)
Table 2. Experimental environment configuration.
Parameter | Configuration
CPU | Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz
GPU | NVIDIA GeForce GTX 1660 Ti (6 GB)
Operating System | Windows 10
CUDA | 11.6
Python | 3.8.5
Torch | 1.12.1
Momentum | 0.937
Weight decay | 0.0005
Batch size | 4
Table 3. Performance comparison of models equipped with various attention mechanisms.
Attention Mechanisms | GFLOPs | Params | Precision/% | Recall/% | mAP@0.5/% | FPS
None | 103.4 | 36.6 M | 68.3 | 84.5 | 81.9 | 26
SE | 103.6 | 36.7 M | 68.2 | 84.9 | 83.5 | 26
CBAM | 113.9 | 38.1 M | 72.6 | 85.1 | 83.8 | 22
CoordAtt (ours) | 103.8 | 36.7 M | 76.3 | 83.3 | 85.5 | 25
Table 4. Ablation experiment results.
Models | CoTLAN | CoordAtt | Precision/% | Recall/% | mAP@0.5/% | FPS
YOLOv7 | × | × | 68.3 | 84.5 | 81.9 | 26
 | √ | × | 74.7 | 85.1 | 85.5 | 23
 | × | √ | 76.3 | 83.3 | 85.5 | 25
 | √ | √ | 83.2 | 80.4 | 87.1 | 22
Table 5. Performance comparison of models equipped with various localization loss functions.
Loss | Precision/% | Recall/% | mAP@0.5/%
EIoU | 66.3 | 84.3 | 79.2
SIoU | 76.5 | 84.5 | 86.4
CIoU (ours) | 83.2 | 80.4 | 87.1
Table 6. Ablation experiment results.
Models | K-Means++ | Focal Loss | Precision/% | Recall/% | mAP@0.5/%
YOLOv7-CTA | × | × | 83.2 | 80.4 | 87.1
 | √ | × | 80.4 | 82.6 | 87.5
 | × | √ | 80.0 | 84.6 | 87.7
 | √ | √ | 78.4 | 85.1 | 88.6
Table 7. Comparison of performance metrics of various models.
Models | Backbone | GFLOPs | Params | Precision/% | Recall/% | mAP@0.5/% | FPS
Faster R-CNN | ResNet50 | 507.6 | 28.4 M | 68.3 | 74.9 | 75.7 | 7
YOLOv5l | CSPDarkNet53 | 114.2 | 46.7 M | 70.2 | 83.6 | 80.3 | 25
YOLOv7 | CBS + ELAN | 103.4 | 36.6 M | 68.3 | 84.5 | 81.9 | 26
YOLOv7-CTA | CBS + CoTLAN + CoordAtt | 100.7 | 36.0 M | 78.4 | 85.1 | 88.6 | 22

