Automatic Recognition of Blood Cell Images with Dense Distributions Based on a Faster Region-Based Convolutional Neural Network

Abstract: In modern clinical medicine, important information about red blood cells, such as shape and number, is used to detect blood diseases. However, the automatic recognition of single cells and adherent cells remains a problem in densely distributed medical scenes. It is difficult to solve both for traditional detection algorithms, which have lower recognition rates, and for conventional networks, which have weaker feature extraction capabilities. In this paper, an automatic recognition method for adherent blood cells with dense distribution is proposed. Based on the Faster R-CNN, a balanced feature pyramid structure, a deformable convolution network and an efficient pyramid split attention mechanism are adopted to automatically recognize blood cells under the conditions of dense distribution, extrusion deformation, adhesion and overlap. In addition, the region of interest (ROI) Align algorithm also contributes to improving the accuracy of the recognition results. The experimental results show that the mean average precision of cell detection is 0.895, which is 24.5% higher than that of the original network model. Compared with the mainstream one-stage networks, the presented network has a stronger feature extraction capability. The proposed method is suitable for identifying single cells and adherent cells with dense distribution in actual medical scenes.


Introduction
Cell detection techniques have played an increasingly important role in modern clinical medicine and pathological diagnosis, because the shape and number of red blood cells contribute to detecting blood diseases. Manual cell detection is inefficient and easily affected by subjective factors, while automatic instruments, such as cell analyzers, are not only expensive but also subject to interference from impurities like white blood cells, resulting in a lower detection accuracy. Therefore, more and more researchers have been focusing on the automatic detection of cells.
The traditional cell detection methods may be divided into three categories. The first is cell detection based on traditional segmentation algorithms [1], commonly including watershed segmentation [2,3], global threshold segmentation [4] and color segmentation [5,6]. These methods make use of the gray or color features of the image to realize cell segmentation and recognition. The second is recognition algorithms based on edge detection, such as the Canny operator [7,8] and the Sobel operator [9,10]. These algorithms use the object edge information as the detection result. The third is the traditional machine learning algorithms, for example, the support vector machine (SVM) [11] and neural networks [12]. They are used to classify the obtained features and further perform border regression according to the target category. However, the traditional detection methods above need a suitable model to be selected for the automatic recognition of samples with different features; therefore, they are poor in generalization and robustness.
As deep learning is increasingly integrated into the medical field, much research on deep-learning cell detection methods has been carried out. Typical target detection networks include the faster region-based convolutional neural network (Faster R-CNN) [13], you only look once (YOLO) [14] and the single shot multi-box detector (SSD) [15], and their applications mainly focus on cell segmentation or single-cell recognition. In 2018, Han et al. [16] proposed a method based on a generative adversarial network to detect cancer cells in pathological sections of breast cancer, in which U-Net served as the generative network. In 2020, Banik et al. [17] used the K-means algorithm and a CNN to classify and locate white blood cells, respectively, thus separating them from whole blood smear images. In the same year, Chen et al. [18], by referring to the LeNet-5 network, constructed a model using a traditional texture feature classifier and a shallow CNN to effectively identify the type of milk somatic cells, which provided a new idea for subsequent research on classification methods. In 2021, Lavitt et al. [19] combined the xResNet architecture of a CNN with transfer learning to realize cell counting, in which the cell counting problem was transformed into a regression task. In 2022, based on the design criteria of optimizing the accuracy and speed of detection with minimal resources, Anand et al. [20] developed a customized deep-learning architecture (YOLO-mp) and obtained the number of pathogens in microscope images of thick blood smears.
Although these methods realize cell recognition and classification, most of them perform single-cell recognition under the premise of sparse distribution. However, dense cell distribution is inevitable in actual medical scenarios, in which adhesion and overlap produce targets of different sizes and squeezing produces irregular changes in cell morphology. As a result, adherent cells go unrecognized and the false detection rate increases. Current algorithm research on these cases is usually limited to general target detection or detection in specific scenes. Few studies address the feature extraction of adherent cells, so feature extraction is inadequate and the recognition rate is low.
In this paper, a deep-learning target recognition network based on Faster R-CNN is proposed in order to recognize both single cells and adherent cells in a densely distributed cell scene. On the basis of the Faster R-CNN, a balanced feature pyramid (BFP) [21] structure, a deformable convolution network (DCN) [22] and an efficient pyramid split attention (EPSA) mechanism [23] are adopted to form a method named BDE_RCNN. This method can effectively extract multi-scale features of cells and thus realizes automatic binary classification of single cells and adherent cells. The proposed method has a better feature extraction capability than other networks in terms of cell recognition. It provides an effective solution for the automatic recognition of densely distributed cells in an actual medical scene.

Network Structure
Faster R-CNN is a two-stage target detection model that can be trained end to end. It is mainly composed of a backbone network, a region proposal network (RPN), region of interest pooling (ROI Pooling), and classification and regression layers. The backbone network is a basic CNN, in which the input image is processed by convolution, rectified linear unit (ReLU) activation, pooling, etc. Its output, a feature map with high-dimensional features, is fed into the RPN and processed by a 3 × 3 convolutional layer to generate anchors with multiple sizes and proportions. For every anchor, two outputs are obtained, one from the classification layer and one from the regression layer.
A medical scene with densely distributed cells has several characteristics, such as a large number of cell samples, dense and disorderly distributions and a large morphological difference between adherent cells and single cells. Thus, the recognition problems of the conventional Faster R-CNN are as follows:
1. The labeling results inevitably contain some non-target information due to the dense distributions of cells in the complex environment, which leads to an increased false detection rate.
2. The squeezing of cells may happen, resulting in irregular morphology. This increases the difficulty of feature extraction.
3. There are single red cells, agglomerated red cells, white blood cells or other impurities in an actual medical scene. These cause the problem of multi-scale recognition and reduce the range of the receptive field.
In the BDE_RCNN method, ResNet50 is selected to replace VGG16 as the backbone network in the Faster R-CNN. Firstly, a feature pyramid network (FPN) [24] is added to the feature extraction network to make full use of the extracted visual features. Considering the dense distributions of cells, the FPN is integrated with a non-local module [25] to produce a BFP structure, in which the feature extraction capability is enhanced by correctly identifying multi-scale targets. Secondly, due to the diversity of irregular changes in cell morphology, it is difficult for conventional convolutional networks to learn the multi-posture features caused by cell squeezing and deformation. The DCN is adopted to reform the C3 and C4 layers in the backbone feature extraction network, thus improving the extraction of the multi-posture features. Finally, the EPSA module is introduced to extract the multi-scale spatial information in each channel feature map by splitting the channels. It addresses the loss of feature details caused by the different sizes of cells and the overlap of adherent cells. In addition, in conventional target detection networks the resolution of the feature map decreases as the network deepens, so the top-level feature map loses cell details. By using ROI Align in the improved network instead of ROI Pooling, the region mismatch caused by two quantization errors in the pooling process can be avoided. Figure 1 shows the structure of BDE_RCNN.

BFP Module
The BFP is a feature integration method. It combines the respective advantages of the FPN and the non-local module to address the imbalance between feature levels in FPN feature fusion. The BFP uses the feature map information of multiple levels to enhance the expression ability of each level in the feature map. The feature extraction network is a ResNet50. Its last four convolutional layers are selected as the bottom-up network of the feature pyramid, defined as C2, C3, C4 and C5. After the C5 layer, lateral connections and a top-down network of new feature layers are established, namely P5, P4, P3 and P2. The complementary high- and low-level information can improve the detection performance of the network and produce richer semantic features.
In order to integrate the features of multiple layers while preserving their respective semantic levels, the BFP unifies the feature maps of different levels to the size of the C4 layer by adaptive max pooling or interpolation. Max pooling downsamples the feature maps with a higher resolution than C4, while bilinear interpolation upsamples those with a lower resolution. By averaging, the balanced semantic feature is expressed as:
$$C = \frac{1}{L}\sum_{l=l_{\min}}^{l_{\max}} C_l$$
where $C_l$ is the prediction feature layer, $l$ is the hierarchy, $L$ is the total number of predicted feature layers, and $l_{\max}$ and $l_{\min}$ are the highest level and lowest level, respectively. The balanced semantic features are refined by an embedded non-local module, which integrates the global information, and are then re-scaled to enhance the original features. This way of feature integration allows each scale to use richer details for the recognition of dense cells at multiple scales.
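The resize-and-average step above can be sketched in a few lines. This is a minimal illustration on single-channel maps stored as nested lists, not the authors' implementation: the helper names, the toy map sizes and the choice of the middle map as the reference level are assumptions, and the non-local refinement is left out. It assumes that maps with a higher resolution than the reference are max-pooled down and maps with a lower resolution are interpolated up.

```python
# Sketch of BFP-style feature balancing on single-channel 2-D maps.
# All names and toy values are illustrative assumptions, not the paper's code.

def adaptive_max_pool(fm, out_h, out_w):
    """Downsample by taking the max over the input window of each output cell."""
    in_h, in_w = len(fm), len(fm[0])
    out = []
    for i in range(out_h):
        r0 = (i * in_h) // out_h
        r1 = -(-((i + 1) * in_h) // out_h)  # ceiling division
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = -(-((j + 1) * in_w) // out_w)
            row.append(max(fm[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        out.append(row)
    return out

def bilinear_resize(fm, out_h, out_w):
    """Upsample with bilinear interpolation (pixel centres aligned, edges clamped)."""
    in_h, in_w = len(fm), len(fm[0])
    out = []
    for i in range(out_h):
        y = min(max((i + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = min(max((j + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            top = fm[y0][x0] * (1 - dx) + fm[y0][x1] * dx
            bottom = fm[y1][x0] * (1 - dx) + fm[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out

def balanced_feature(levels, ref_index):
    """Resize every level to the reference level's size, then average them."""
    ref_h, ref_w = len(levels[ref_index]), len(levels[ref_index][0])
    resized = []
    for fm in levels:
        if len(fm) > ref_h:
            resized.append(adaptive_max_pool(fm, ref_h, ref_w))
        elif len(fm) < ref_h:
            resized.append(bilinear_resize(fm, ref_h, ref_w))
        else:
            resized.append(fm)
    n_levels = len(resized)
    return [[sum(fm[i][j] for fm in resized) / n_levels for j in range(ref_w)]
            for i in range(ref_h)]

# Toy pyramid: a 4x4 map, a 2x2 reference map and a 1x1 map.
c3 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
c4 = [[1.0, 2.0], [3.0, 4.0]]
c5 = [[8.0]]
balanced = balanced_feature([c3, c4, c5], ref_index=1)
print(balanced)
```

In a full BFP, the balanced map would then be refined and redistributed to every level by the inverse resizing; only the averaging in the equation is shown here.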

DCN Module
Cases of cell adhesion, squeezing and morphological change exist in a cell dataset, resulting in irregular deformations of cell morphology. To solve this problem, the DCN module is introduced in the C3 and C4 layers of the backbone. On the basis of traditional convolution, the DCN adds direction vectors that adjust the sampling positions of the convolution kernel, so that it covers more of the area of interest according to the sample shape. This allows the kernel to align closely with the target objects, facilitating the learning of more complex transformations and reducing the background information in the receptive field. Figure 2 shows the sampling location diagrams of standard convolution and DCN, respectively. The differently colored balls denote the sampling locations, in which the orange ball is the initial position in each diagram. It can be seen that the DCN enables the sampling points of the convolution kernel to shift in the input feature map, which makes the convolutional module extract more accurate target features.
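The shift of the sampling points can be illustrated with a small sketch. It only computes where a 3 × 3 kernel would sample, with and without offsets. The offset values are made-up numbers chosen for illustration; in a DCN they are predicted from the input by an additional convolutional layer, and the function names here are hypothetical.

```python
# Sketch: sampling positions of a standard versus a deformable 3x3 convolution.
# Offsets are illustrative assumptions, not learned values.

def regular_grid(center_y, center_x, kernel_size=3):
    """Sampling positions of a standard k x k convolution centred on one pixel."""
    r = kernel_size // 2
    return [(center_y + dy, center_x + dx)
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)]

def deformed_grid(center_y, center_x, offsets, kernel_size=3):
    """Add one (dy, dx) offset to every regular sampling position."""
    grid = regular_grid(center_y, center_x, kernel_size)
    if len(offsets) != len(grid):
        raise ValueError("need exactly one offset per kernel tap")
    return [(y + oy, x + ox) for (y, x), (oy, ox) in zip(grid, offsets)]

# Hypothetical offsets that push the outer taps outwards by half a pixel.
offsets = [(-0.5, -0.5), (-0.5, 0.0), (-0.5, 0.5),
           (0.0, -0.5), (0.0, 0.0), (0.0, 0.5),
           (0.5, -0.5), (0.5, 0.0), (0.5, 0.5)]
standard = regular_grid(5, 5)
deformed = deformed_grid(5, 5, offsets)
for s, d in zip(standard, deformed):
    print(s, "->", d)
```

Because the shifted positions are generally fractional, the feature value at each of them is obtained by bilinear interpolation of the neighbouring pixels.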

EPSA Module
In order to effectively acquire the characteristic information of different sizes for single cells and adherent cells, the lightweight EPSA module is adopted, which replaces the 3 × 3 convolution in the residual network bottleneck with a pyramid split attention (PSA) module.
First of all, the spatial pyramid convolutional (SPC) module splits the input channels and extracts multi-scale features from the spatial information in the feature map of each channel group, which improves the multi-scale representation capability at a finer level. Secondly, the squeeze-and-excitation weight (SEWeight) module extracts the channel attention from the feature maps of different scales to obtain the attention vector of each channel. Then, the softmax function recalibrates the attention vectors of the multi-scale channels to obtain the attention weights. Finally, the recalibrated weights are multiplied element-wise with the corresponding feature maps. The EPSA module not only enlarges the range of backbone feature extraction and the receptive field, but also separates the important context information more clearly. Therefore, EPSA is better suited than the existing attention modules to multi-scale cell recognition.
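The recalibration and weighting steps can be sketched as follows. The sketch assumes that the softmax is taken across the scale branches for each channel, and it reduces every channel's feature map to a single number to keep the example short. The function names and the input scores are illustrative assumptions.

```python
import math

# Sketch of the softmax recalibration and element-wise weighting in a PSA-style module.
# Names and numbers are illustrative assumptions, not the authors' code.

def softmax(values):
    """Numerically stable softmax over a list of scores."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def recalibrate(branch_attention):
    """branch_attention[s][c] is the attention score of channel c in scale branch s.
    The softmax is taken across the branches separately for each channel."""
    n_branches, n_channels = len(branch_attention), len(branch_attention[0])
    weights = [[0.0] * n_channels for _ in range(n_branches)]
    for c in range(n_channels):
        column = softmax([branch_attention[s][c] for s in range(n_branches)])
        for s in range(n_branches):
            weights[s][c] = column[s]
    return weights

def apply_weights(branch_features, weights):
    """Multiply each channel of each branch by its recalibrated weight."""
    return [[f * w for f, w in zip(feats, ws)]
            for feats, ws in zip(branch_features, weights)]

# Two scale branches with two channels each (hypothetical scores and features).
branch_attention = [[0.0, 1.0], [0.0, 3.0]]
branch_features = [[2.0, 4.0], [6.0, 8.0]]
weights = recalibrate(branch_attention)
weighted = apply_weights(branch_features, weights)
print(weights)
print(weighted)
```

For a given channel, the branch with the larger attention score receives the larger share of the weight, and the shares across branches sum to one.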

ROI Align Module
In the Faster R-CNN, the input of ROI Pooling is the candidate region coordinates calculated by the RPN. Because there is only one regression process, the coordinates mapped from the candidate region in the original image are floating point numbers in the feature map. However, ROI Pooling quantizes the coordinates to integers twice, once when the candidate region is mapped onto the feature map and again when it is divided into pooling bins, and these two quantization errors shift the position of the candidate region in the feature map. This affects the accuracy of the detection results. In order to resolve the problem, ROI Align [26] is chosen to replace ROI Pooling. Its principle is to cancel the quantization operation and to obtain the feature values at the floating point coordinates by bilinear interpolation, so that the whole operation is continuous. When the feature information enters the pyramid, the data of the last four layers are fed into the ROI Align layer for the prediction of output features, thereby increasing the feature extraction capability of the network.
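The difference between quantized and interpolated sampling can be seen on a tiny example. The sketch below is a simplified illustration and not a full ROI Align layer: it reads a feature map at one floating point location, first by rounding down as quantization would and then by bilinear interpolation. The stride of 16 and the toy feature map are assumptions.

```python
import math

# Sketch: quantized lookup versus bilinear sampling at a floating point location.
# The stride and the feature values are illustrative assumptions.

def bilinear_sample(fm, y, x):
    """Interpolate the feature map at a floating point (y, x) position."""
    h, w = len(fm), len(fm[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fm[y0][x0] * (1 - dx) + fm[y0][x1] * dx
    bottom = fm[y1][x0] * (1 - dx) + fm[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def quantized_sample(fm, y, x):
    """Round the coordinates down to integers before the lookup."""
    return fm[int(math.floor(y))][int(math.floor(x))]

feature_map = [[0.0, 10.0],
               [20.0, 30.0]]
stride = 16                      # assumed downsampling factor of the backbone
image_y, image_x = 8.0, 8.0      # a point of a candidate region in image coordinates
fy, fx = image_y / stride, image_x / stride   # 0.5, 0.5 on the feature map

aligned = bilinear_sample(feature_map, fy, fx)
pooled = quantized_sample(feature_map, fy, fx)
print(aligned, pooled)  # 15.0 0.0
```

The quantized lookup snaps to the top-left cell and ignores the other three neighbours, while the interpolated value blends all four according to their distance from the sampling point.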

Dataset
The dataset is derived from the Isfahan Medical Image and Signal Processing (MISP) dataset. The blood smear images were taken by a Nikon ECLIPSE 50i microscope with a magnification of 100 times, and a large number of single cells and adherent cells appear in them in dense distributions. These images reflect the diversity of cell morphology and the environmental complexity of actual medical scenes, both of which affect the accuracy of cell recognition. They are neither labeled nor preprocessed, so it is necessary to draw the bounding boxes on the cell images with the labelme 5.0.2 software. The annotation and labeling accurately mark single cells and adherent cells so that the model can be trained and evaluated. Examples of the original dataset are shown in Figure 3. These images were obtained under different conditions, such as different shooting angles, lighting conditions and equipment settings. In order to deal with the variety of images in actual scenes, the original dataset is expanded to 260 images by random rotation, scaling, cropping and adjustment of contrast and brightness. The images are 775 × 519 pixels. The expanded dataset is randomly divided into a test set, a validation set and a training set according to the ratio of 3:3:20. Among them, the numbers of labeled single cells and adherent cells are 11,350 and 9414, respectively.
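As a quick check of the split sizes, a 3:3:20 ratio over 260 images gives 30 test, 30 validation and 200 training images. The sketch below shows one way such a random split could be done; the function name, the image identifiers and the fixed seed are assumptions made for the example.

```python
import random

# Sketch of a random 3:3:20 split into test, validation and training sets.
# Identifiers and the seed are hypothetical.

def split_dataset(items, ratios=(3, 3, 20), seed=0):
    """Shuffle the items and cut them into (test, validation, training) parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_test = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return test, val, train

image_ids = [f"img_{i:03d}" for i in range(260)]
test_set, val_set, train_set = split_dataset(image_ids)
print(len(test_set), len(val_set), len(train_set))  # 30 30 200
```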

Evaluation Indexes
In order to comprehensively assess the performance of the proposed model, four evaluation indexes are adopted here: recall, precision, mean average precision (mAP) and F1 score. All of them are calculated based on the confusion matrix [27].
In target detection, the intersection over union (IoU) is the overlap rate between the candidate bounding box and the ground truth box. DR indicates the candidate bounding box and GT indicates the ground truth box, and the IoU is given as:
$$IoU = \frac{S_{GT} \cap S_{DR}}{S_{GT} \cup S_{DR}}$$
where $S_{GT}$ represents the area of the ground truth box, $S_{DR}$ represents the area of the candidate bounding box, $S_{GT} \cap S_{DR}$ represents the intersection area of the two and $S_{GT} \cup S_{DR}$ represents the area of their union.
Recall is the ratio of the number of correctly detected targets to the number of all real labeled targets. It is expressed as:
$$Recall = \frac{T_P}{T_P + F_N}$$
where $T_P$ refers to the number of true positive samples, that is, the positive samples correctly identified as positive, and $F_N$ refers to the number of false negative samples, i.e., the positive samples incorrectly identified as negative. Precision is the ratio of the number of correctly detected targets to the number of all predicted targets. It is given as:
$$Precision = \frac{T_P}{T_P + F_P}$$
where $F_P$ refers to the number of false positive samples, that is, the negative samples incorrectly identified as positive.
Average precision (AP) is used to comprehensively measure the model quality by considering recall and precision together. It is evaluated at an IoU threshold $t$ over $N$ selected confidence thresholds, stepping from each recall value to the next recall value $\tilde{R}$; $c$ denotes the number of categories. Because there are only two categories, namely the single type and the adherent type, $c = 2$.
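The mAP is the average of the AP of all categories. It is calculated as:
$$mAP = \frac{1}{c}\sum_{i=1}^{c} AP_i$$
where $AP_i$ is the AP of the $i$-th category and $c$ is the number of categories. For cell target detection, the larger the mAP value, the better the detection performance of the model and thus the higher the recognition rate.
The F1 score also combines precision and recall, as their harmonic mean. It prevents a single high value of either precision or recall from dominating, so it reflects the overall performance. F1 is expressed as:
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$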
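The indexes in this section can be computed with a few helper functions. The sketch below assumes boxes given as (x1, y1, x2, y2) corner coordinates, and the counts and per-category AP values in the example are invented for illustration rather than taken from the experiments.

```python
# Sketch of the evaluation indexes: IoU, precision, recall, F1 and mAP.
# Box format and all example numbers are assumptions.

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def mean_average_precision(ap_per_category):
    """Average the AP values over all categories."""
    return sum(ap_per_category) / len(ap_per_category)

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))            # intersection 1, union 7
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
m_ap = mean_average_precision([0.5, 1.0])            # two hypothetical categories
print(overlap, precision, recall, f1, m_ap)
```

With two categories, as in the single and adherent cell setting, the mAP is simply the mean of the two per-category AP values.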

Results of Ablation Experiments
The ablation experiments are carried out on the original network and three improved network models. The network parameters are as follows: the momentum is 0.9, the initial learning rate is 0.005, the weight decay is 0.0005, the batch size of the input images is 4 during training and each experiment is trained for 40 epochs. These networks all use the same dataset and training parameters. The original network selects VGG16 as the backbone, while the three improved networks use ResNet50 as the backbone. The improved model 1 adds EPSA and ROI Align, the improved model 2 adds BFP on the basis of model 1 and the improved model 3, namely BDE_RCNN, adds DCN on the basis of model 2. When the IoU between the candidate bounding box and the ground truth box is larger than 0.7, the sample is considered a positive sample; when it is smaller than 0.3, it is considered a negative sample. The positive and negative samples are used to train the classification function of the RPN, and the proposed regions are then output for the subsequent fully connected layer.
Figure 4 shows the detection results of the four network models, in which the green marks and blue marks denote the detection results of single red blood cells and adherent red blood cells, respectively. The detection results of the original network contain missed cells, as shown in the red boxes in Figure 4a. This is mainly because the feature extraction stride of the original model is larger, resulting in inadequate learning of small features in densely distributed scenes. Compared to the original network model, the three improved models significantly improve the cell detection results. Model 1 and model 2 still miss a few cells, as shown by the red boxes marked in Figure 4b,c, but BDE_RCNN correctly identifies almost all single cells and adherent cells, as shown in Figure 4d. The recognition results show that the proposed BDE_RCNN has the best effect for cell detection.
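The positive and negative sample rule used for the RPN can be sketched as below. The thresholds of 0.7 and 0.3 come from the text; treating candidates that fall between the two thresholds as ignored is an assumption borrowed from common Faster R-CNN practice, and the box format, function names and example boxes are illustrative.

```python
# Sketch of IoU-based sample labelling for RPN training.
# Label meanings: 1 = positive, 0 = negative, -1 = ignored (assumed behaviour).

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def label_candidate(candidate, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Label a candidate box by its best IoU with any ground truth box."""
    best = max((iou(candidate, gt) for gt in gt_boxes), default=0.0)
    if best > pos_thr:
        return 1
    if best < neg_thr:
        return 0
    return -1

gt_boxes = [(0, 0, 10, 10)]                            # one hypothetical labelled cell
tight = label_candidate((0, 0, 10, 9), gt_boxes)       # IoU 0.9
partial = label_candidate((0, 0, 10, 5), gt_boxes)     # IoU 0.5
far = label_candidate((20, 20, 30, 30), gt_boxes)      # IoU 0.0
print(tight, partial, far)  # 1 -1 0
```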
In order to compare the recognition effect of each algorithm more directly, the evaluation indexes of the four models are listed in Table 1. A check indicates that a module is included in the model and a cross indicates that it is not. The experimental datasets include three types: the dataset of single cells (SRBC), the dataset of adherent cells (ARBC) and the mixed dataset of single cells and adherent cells (MIX). The recall, precision, mAP and F1 values show that BDE_RCNN is better than the other models on SRBC, ARBC and MIX. On SRBC, the mAP of BDE_RCNN increases by 25.7%, 15.9% and 1.5% and F1 increases by 27.1%, 16.1% and 1.5%, respectively, compared to the original model, model 1 and model 2. On ARBC, the mAP of BDE_RCNN increases by 54.4%, 18.6% and 2.3% and F1 increases by 54.5%, 18.8% and 2.2%, respectively, compared to the other three models. On MIX, the mAP of BDE_RCNN increases by 24.5%, 16.4% and 2.4% and F1 correspondingly increases by 26.6%, 17.2% and 3.2%, respectively. The experimental results indicate that the BDE_RCNN method, which adopts the BFP, DCN, EPSA and ROI Align modules, performs best in multiple scenes with different datasets, especially for adherent cells with dense distribution. The proposed method effectively addresses the automatic recognition of red blood cells under the conditions of dense distribution, extrusion deformation, adhesion and overlap.
The convergence of the training loss function and the changes in the model learning rate (lr) for the four models are shown in Figure 5. The first three models tend to converge at 1750 iterations, as shown in Figure 5a-c, while BDE_RCNN converges at 600 iterations, as shown in Figure 5d. The BDE_RCNN method is therefore superior to the other models in terms of convergence speed, and as the loss curve falls, a better-trained model is obtained in fewer iterations. In addition, the learning rate curve of BDE_RCNN begins to decline when the number of iterations reaches about 250, as shown in Figure 5d. Compared with the other models, BDE_RCNN more easily approaches the optimal solution of the training parameters during training.
Appl. Sci. 2023, 13, 12412

Results of Comparison Experiments
In order to further verify the effectiveness of the proposed algorithm, several mainstream single-stage models, namely SSD, YOLOv5 and RetinaNet [28], are compared to BDE_RCNN, as shown in Figure 6. Among them, the SSD network is based on a feed-forward convolutional network that generates fixed-size bounding boxes and target scores for the sample. Its final detection result is obtained by non-maximum suppression. YOLOv5 is one of the typical single-stage algorithms. By dividing the feature map into multiple grid cells and detecting the targets in each cell, it predicts the position and category of the targets within the cells at once. Using the FPN structure and the focal loss function, the RetinaNet network addresses the imbalance of positive and negative samples in the target detection network. Figure 6 shows that, for the same dataset, the three single-stage models fail to detect whole cells and miss some cells, as shown by the red boxes marked in Figure 6a-c, respectively. Meanwhile, BDE_RCNN can correctly recognize the single cells and adherent cells, as shown in Figure 6d. The misses of the single-stage models are attributed to the lack of a candidate region extraction step: a single network stage has to complete both the classification and the regression tasks.
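The non-maximum suppression step mentioned for SSD can be sketched as a greedy procedure: keep the highest-scoring box, discard the boxes that overlap it too much, and repeat. The IoU threshold of 0.5, the box format and the example boxes are assumptions chosen for illustration.

```python
# Sketch of greedy non-maximum suppression (NMS).
# Threshold, box format and example values are assumptions.

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thr=0.5):
    """Return the indices of the boxes kept after suppression, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

In densely packed scenes the threshold matters: a low value can suppress a genuine neighbouring or overlapping cell, which is one way adherent cells end up missed.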
Table 2 shows the comparison results of SSD, YOLOv5, RetinaNet and BDE_RCNN in the evaluation indexes. The overall detection accuracy of the improved BDE_RCNN is better than that of the single-stage networks. For the mAP and F1 indexes, BDE_RCNN is 46.5% and 44% better than SSD, respectively. Compared to YOLOv5, the two indexes increase by 17.1% and 17.8%, respectively. Similarly, they increase by 6.9% and 7.5% compared to RetinaNet, respectively. These results indicate that BDE_RCNN has a stronger feature extraction capability than the mainstream single-stage networks in the actual scene, thus effectively improving the automatic recognition effect for cells.

Conclusions
In this paper, an automatic recognition method, BDE_RCNN, for adherent cells in densely distributed scenes is proposed based on Faster R-CNN. To improve the network performance, BFP, DCN and EPSA are applied to automatically recognize the cells under the conditions of dense distribution, extrusion deformation, adhesion and overlap. In addition, the region mismatch caused by quantization error is avoided by the ROI Align module. The experimental results show that BDE_RCNN obtains a better-trained model in fewer iterations and that its mAP and F1 values are 24.5% and 26.6% higher than those of the original Faster R-CNN, respectively. Compared with several mainstream single-stage models, BDE_RCNN has a stronger feature extraction capability, thus effectively improving the cell recognition effect. The proposed method improves the network recognition rate and realizes the automatic recognition of single cells and adherent cells in an actual medical scene.

Table 1. Index comparison results of four models.

Table 2. Index comparison results of mainstream models and BDE_RCNN.