Identifying and Counting Avian Blood Cells in Whole Slide Images via Deep Learning

Simple Summary: Avian blood analysis is crucial for understanding the health of birds. Currently, avian blood cells are often counted manually in microscopic images, which is time-consuming, expensive, and prone to errors. In this article, we present a novel deep learning approach to automate the quantification of different types of avian red and white blood cells in whole slide images of avian blood smears. Our approach supports ornithologists in terms of hematological data acquisition, accelerates avian blood analysis, and achieves high accuracy in counting different types of avian blood cells.

Abstract: Avian blood analysis is a fundamental method for investigating a wide range of topics concerning individual birds and populations of birds. Determining precise blood cell counts helps researchers gain insights into the health condition of birds. For example, the ratio of heterophils to lymphocytes (H/L ratio) is a well-established index for comparing relative stress load. However, such measurements are currently often obtained manually by human experts. In this article, we present a novel approach to automatically quantify avian red and white blood cells in whole slide images. Our approach is based on two deep neural network models. The first model determines image regions that are suitable for counting blood cells, and the second model is an instance segmentation model that detects the cells in the determined image regions. The region selection model achieves up to 97.3% in terms of F1 score (i.e., the harmonic mean of precision and recall), and the instance segmentation model achieves up to 90.7% in terms of mean average precision. Our approach helps ornithologists acquire hematological data from avian blood smears more precisely and efficiently.


Introduction
Automated visual and acoustic monitoring methods for birds can provide information about the presence and the number of bird species [1] or individuals [2] in certain areas, but analyzing the physiological conditions of individual birds allows us to understand potential causes of negative population trends. For example, measuring the physiological stress of birds can serve as a valuable early warning indicator for conservation efforts. The physiological conditions and the stress of birds can be determined in several ways, e.g., by assessing the body weight or the fat and muscle scores in migratory birds [3,4]. Other frequently used methods are investigating the parasite loads, measuring the heart rates, and measuring the levels of circulating stress hormones, such as corticosterone [5–9]. Depending on the research questions studied, these methods can be a good choice for assessing long-term stress or the investment in immunity.
The method investigated in this article comprises analyzing blood smears and counting blood cells [10]. Not only are white blood cells, i.e., leukocytes, an important part of the immune system of vertebrates such as mammals or birds, but also the composition of leukocytes is known to change in response to elevated stress hormones (glucocorticoids) and can, therefore, be used to assess stress levels [10]. In particular, the ratio of heterophils to lymphocytes (H/L ratio) is considered to be a well-established stress index for assessing long-term stress in birds [10,11]. Since the H/L ratio changes only 30 to 60 min after the onset of an acute stress event, it is possible to measure stress without mirroring the influence of the capture event [12]. It is also possible to calculate the leukocyte concentration (leukocytes per 10,000 erythrocytes) or the concentration of specific leukocyte cell types for gaining an understanding of the current health status of a bird and the investment in immunity [13–15].
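The two indices above are simple ratios over raw cell counts. The following is a minimal sketch; the counts used here are made up purely for illustration:

```python
def hl_ratio(heterophils, lymphocytes):
    """Heterophil-to-lymphocyte (H/L) stress index."""
    return heterophils / lymphocytes

def leukocyte_concentration(leukocytes, erythrocytes, per=10_000):
    """Leukocytes per `per` erythrocytes (default: per 10,000)."""
    return leukocytes * per / erythrocytes

# illustrative counts, not real data
counts = {"heterophil": 120, "lymphocyte": 300, "erythrocyte": 25_000}
print(hl_ratio(counts["heterophil"], counts["lymphocyte"]))  # 0.4
print(leukocyte_concentration(420, counts["erythrocyte"]))   # 168.0
```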
Leukocyte counts are quite cost-effective since they do not require complex laboratory techniques. However, evaluation under the microscope often requires manual interpretation by human experts, is time-consuming, and can only assess small portions of the entire smear. Typically, leukocytes are counted until 100 leukocytes are reached [16]. Consequently, the counted values and subsequent ratios are not always reproducible, and the result depends on the section counted. Furthermore, the method is prone to observer errors. Therefore, there is an urgent need for automated methods to perform leukocyte counts in avian blood smears.
Bird and human blood cell analysis have some aspects in common. The counted leukocytes are similar: lymphocytes, eosinophils, basophils, and monocytes can be found in mammalian as well as avian blood. However, there are some significant differences that make the automated counting of avian blood cells more difficult [17]. The neutrophils in human blood are equivalent to heterophils in birds. One of the main differences between bird and human blood, however, is the presence of nuclei in bird erythrocytes (i.e., red blood cells) and thrombocytes, whereas there is no nucleus in mammalian erythrocytes and thrombocytes [17]. The presence of a nucleus in erythrocytes makes the cell identification process more complicated since lysed and ruptured erythrocytes can be mistaken for other cell types. Lastly, during ornithological field studies, the bird blood samples are usually not taken in a sterile environment, leading to dirt contaminating the smears. Such contaminants and stain remnants can further lead to confusion. Because of these differences from human blood and the associated challenges, it is necessary to develop dedicated solutions instead of relying on existing machine learning approaches for human blood analysis to automatically analyze bird blood samples.
A solid understanding of the different leukocyte types is necessary when analyzing avian blood samples since some are quite similar to each other. Figure 1 shows examples of each blood cell type as well as two challenging anomalies, i.e., stain remnants and lysed cells (Figure 1d) as well as ruptured cells (Figure 1h). Lymphocytes appear small, round, and blue in blood smears and are most common in passerine birds. Their nuclei usually take up more than 90% of the cell (see Figure 1b). Heterophils can be identified by their lobed cell nuclei and rod-shaped granules in the cytoplasm, as shown in Figure 1e. In birds, heterophils and lymphocytes make up approximately 80% of the leukocytes [18]. Eosinophils are similar to heterophils but have round granules (see Figure 1c). Basophils can be recognized by their purple-staining granules, as shown in Figure 1f, but they are rarely found. Monocytes are larger cells that can be confused with lymphocytes, but their nucleus often has a kidney-shaped appearance and takes up only up to 75% of the cell (see Figure 1g) [19]. Additionally, it is important to be aware of possible variations regarding the morphology and staining characteristics of these cell types between different avian species, which may affect their identification and interpretation.

Avian blood counts are still mostly obtained manually. However, there are several approaches for more systematic, automated ways of counting avian blood cells. For instance, Meechart et al. (2020) [20] developed a simple computer vision algorithm based on Otsu's thresholding method [21] to automatically segment and count erythrocytes in chicken blood samples. Beaufrère et al. (2013) [22] used image cytometry, i.e., the analysis of blood in microscopy images, in combination with the open-source software CellProfiler [23,24] to classify each cell using handcrafted features as well as machine learning algorithms. However, they stated that their results were not satisfactory.
Another way of automating avian blood counts is the use of hardware devices for blood analysis. For example, the Abbott Cell-Dyn 3500 hematology analyzer [25] (Abbott, Abbott Park, IL, USA) was used in studies analyzing chicken blood samples [26,27]. The Cell-Dyn 3500 works on whole blood samples and relies on flow cytometry, i.e., the analysis of a stream of cells by a laser beam and electric impedance measurements. The device was standardized for poultry blood.
The CellaVision® DC-1 analyzer [28] (CellaVision AB, Lund, Sweden) scans blood smears and pre-classifies erythrocytes as well as leukocytes. In combination with the proprietary CellaVision® VET software [29], the device can be used to analyze animal blood, including bird blood. However, the pre-classification results still need to be verified by a human expert. The device can hold only a single slide at a time and is able to process roughly 10 slides per hour, according to the manufacturer [28]. This throughput does not appear to reduce turnaround times in (human) blood analysis [30]. Yet, in a distributed laboratory network, the device could indeed contribute to reduced turnaround times [31].
In the last decade, deep learning models, in particular convolutional neural networks (CNNs), have become the state of the art in many computer vision tasks, such as image classification, object detection, and semantic segmentation. These deep neural networks are highly suitable for image processing since they can learn complex image features directly from the image data in an end-to-end manner. Apart from their success in natural image processing, they have also contributed to biological and medical imaging tasks, e.g., in cell detection and segmentation [32,33], blood sample diagnostics [34,35], histopathological sample diagnostics [36], such as breast cancer detection [37], and magnetic resonance imaging (MRI) analysis [38].
However, only a few deep learning approaches are available for avian blood cell analysis. For instance, Govind et al. (2018) [39] presented a system for automatically detecting and classifying avian erythrocytes in whole slide images. Initially, they extract optimal areas from the whole slide images for analyzing erythrocytes. In the first step, regions are chosen from low-resolution windows using a quadratic determinant analysis classifier. These optimal areas are then refined at higher resolution using an algorithm based on binary object sizes. This algorithm identifies overlapping cells that need to be split. The actual separation is conducted in a multi-step handcrafted algorithm. Intensity- and texture-based features are used to distinguish between erythrocytes and leukocytes, but the latter are not actually detected. In the final step, all detected erythrocytes, i.e., solitary and separated from clumps, are classified. This is the only part of the approach that relies on deep learning. Each detected cell is cropped and fed to a GoogLeNet deep neural network [40]. The resulting model can classify the detected erythrocytes as mammalian, reptilian, or avian. Furthermore, the model can categorize erythrocytes into one of thirteen species. However, only one of these is a bird species. Kittichai et al. (2021) [41] used different CNN models to detect infections of an avian malaria parasite (Plasmodium gallinaceum) in domestic chickens. Initially, a YOLOv3 [42] deep learning model was used to detect erythrocytes in thin blood smear images. Then, four CNN architectures were employed for the classification of the detected cells to characterize the different avian malaria blood stages.
However, to the best of our knowledge, there is no hardware-independent and publicly available approach for the automated segmentation and classification of avian blood cells, i.e., erythrocytes as well as leukocytes.
In this article, we present a novel deep learning approach for the automated analysis of avian blood smears. It is based on two deep neural networks to automatically quantify avian red and white blood cells in whole slide images, i.e., digital images produced by scanning microscopic glass slides [43]. The first neural network model determines image regions that are suitable for counting blood cells. The second neural network model performs instance segmentation to detect blood cells in the determined image regions. For both models, we investigate different neural network architectures and different backbone networks for feature extraction in cell instance segmentation. We provide an open-source software tool to automate and speed up blood cell counts in avian blood smears. We make the annotated dataset used in our work publicly available, along with the trained neural network models and source code [44]. In this way, we enable ornithologists and other interested researchers to build on our work.

Materials and Methods
We present a deep learning approach for automatically identifying and counting avian blood cells. The approach is divided into three main phases. Figure 2 gives an overview of the entire process from acquiring blood smears to automatically emitting a blood cell count for a whole slide image. In the first phase, avian blood samples are acquired and digitized. Next, the resulting images are split into tiles that are uploaded to a web-based annotation tool used by a human expert to thoroughly annotate tiles for both tasks, i.e., tile selection and cell instance segmentation. In the second phase, adequate deep neural networks are trained for both tasks. The deep neural network models assist a human expert during annotation by providing pre-annotations for further labeling. In the third phase, the trained deep neural network models are applied to process whole slide images that were not used during training. These images are split into tiles that are analyzed by the tile selection model. Only tiles approved as countable are forwarded to the instance segmentation model. The final blood cell counts are determined based on the outputs of the instance segmentation model.

Data Acquisition and Annotation
To be able to train deep learning models at scale, we used avian blood smear samples from an ornithological field study in the Marburg Open Forest (MOF), a 250-hectare beech-dominated forest in Central Hesse, Germany. The data collection took place over four consecutive years, from 2019 to 2022, during the breeding seasons of the entire forest bird community (29 species of 16 families and 5 orders) between mid-March and August. The birds were captured using mist nets 12 m in length and 2.5 m in height; the mesh size was 16 × 16 mm. Each bird was marked with a ring for re-capture identification with the necessary permissions obtained from the Heligoland Bird Observatory (Institut für Vogelforschung Heligoland, Germany). A blood sample was taken within 30 min from capture by puncturing the brachial vein and using heparinized capillary tubes, following the animal testing approval granted by the regional council of Giessen, Hesse, Germany (V 54-19 c 20 15 h 01 MR 20/15 Nr. G 10/2019). The blood was then used to create one or two whole blood air dry smears of each bird right after sampling in the field. In the laboratory, the blood smears were fixed in methanol within 24 h and stained with Giemsa within 21 days, following standard protocols [45].
Our work relies on two data sources. We created a first, small dataset consisting of 160 images through manual acquisition. The images were digitized with a Zeiss [46] AxioCam ERc 5s Rev.2 (Carl Zeiss AG, Oberkochen, Germany) in combination with a Zeiss Primo Star microscope (Carl Zeiss AG, Oberkochen, Germany) at 100× magnification with oil immersion and saved in the PNG image format with a resolution of 2560 × 1920 px. However, this method is not suitable for creating a large dataset since there is a significant manual effort involved, which makes the method very time-consuming.
To create a second, larger dataset, we digitized one or two blood smears per bird during the initial capture and, in recapture cases, we digitized another one or two blood smears of the same bird by scanning the blood smears with a Leica Aperio AT2 Scanner [47] (Leica Biosystems Nussloch GmbH, Nussloch, Germany) at 40× magnification. We selected the highest quality smear of each bird per capture for our analysis. Aperio scanners generate high-resolution images that are stored in the SVS file format, which consists of a series of TIFF images. The first image is always the full-resolution scan. In subsequent layers, it is split into tiles that are decreasing in size with each layer. Overall, we obtained 527 whole slide images from 459 individual birds of 29 species. These images range from 47,807 px to 205,176 px in width and from 35,045 px to 93,827 px in height. To be able to process these huge images with deep learning models, we used OpenSlide [48] to crop them into tiles.
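The tiling arithmetic for such a slide can be sketched as follows. This is a pure-Python sketch with illustrative tile dimensions; with OpenSlide, each resulting (x, y, w, h) tuple would typically be read from level 0 via `read_region`:

```python
def tile_grid(width, height, tile_w, tile_h):
    """Yield (x, y, w, h) tile origins covering a width x height slide;
    tiles at the right and bottom edges are clipped to the slide boundary."""
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            yield x, y, min(tile_w, width - x), min(tile_h, height - y)

# e.g., the smallest slide in the dataset: 47,807 x 35,045 px
tiles = list(tile_grid(47_807, 35_045, 2048, 1536))
print(len(tiles))  # 24 columns x 23 rows = 552 tiles
```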
The complete and accurate annotation of the dataset is crucial for training a high-quality deep learning model. To reduce the impact of human errors and eliminate consistency issues in the annotations, we relied on a single human expert for the labeling task. In the following, we describe the annotation process in more detail for both of our tasks, i.e., tile selection and cell detection and segmentation.
There are several criteria for classifying an avian blood smear image tile as either countable or non-countable. Figure 3 shows examples of positive (i.e., countable) and negative (i.e., non-countable) image crops. Cells should be equally distributed, as shown in Figure 3b, without large empty spaces, as shown in Figure 3a. Furthermore, there should be only a few overlapping cells and especially no overlapping nuclei, as contrasted in Figure 3c,d. In general, good image quality is desirable (Figure 3e,f). To be able to train a deep learning model to classify blood smear image tiles as countable or non-countable, a human expert manually selected image tiles and classified them accordingly. This process led to a dataset consisting of 2288 positive and 2372 negative examples. While it is sufficient for the tile selection task to simply annotate each sample as countable or not, we fully annotated the dataset for detection with segmentation masks instead of using bounding boxes. Hence, each single cell instance needs to be precisely covered by a mask and tagged with the corresponding class label. Although this annotation method takes even more time, it improves the performance of the final model [49]. Providing exact cell boundaries is particularly beneficial in crowded image regions, where several bounding boxes may overlap.
We used the web platform Labelbox [50] for annotating images of our datasets. The instance segmentation dataset was annotated in an iterative, model-assisted manner. This means that we used the tile selection network to propose regions to be annotated and eventually selected them based on how many rare cells had been detected by an intermediate instance segmentation model. In the very first iteration, we used a superpixel algorithm to generate simple instance masks. In each iteration, we uploaded the corresponding instance segmentation masks to Labelbox to be refined by our human expert. This procedure significantly reduces the time needed to fully annotate an image file with masks and class labels compared to annotating from scratch. Overall, we went through four iterations of labeling. For the annotated cell instances, we established two primary categories: erythrocyte, with only the nucleus annotated, and leukocyte. The latter was further split into five subtypes, namely, lymphocyte, eosinophil, heterophil, basophil, and monocyte. Thrombocytes were not explicitly annotated; they were considered to be part of the background during training. Thus, our trained neural network model can distinguish between non-relevant thrombocytes and other annotated cell types, e.g., erythrocytes. By annotating only the nucleus of each erythrocyte rather than the entire cell including the cytoplasm, we maintained the option to label parasite-infected instances individually in future work. Cells infected with parasites may be annotated by masking the entire cell including the cytoplasm. Because of the distinct annotation regions, one erythrocyte can then be counted simultaneously as both an erythrocyte and a cell with a blood parasite.

Overall, our segmentation dataset consisted of 1810 fully annotated images. As Table 1 shows, the dataset contained 226,781 annotated cell instances that were unevenly spread across 5 taxonomic bird orders, namely, Accipitriformes, Columbiformes, Falconiformes, Passeriformes, and Piciformes. The orders Passeriformes and Accipitriformes dominated the dataset. Moreover, with a share of 98% of all cells, erythrocytes were by far the most frequent type of blood cells in our dataset. Among the leukocytes, the subtypes were also not distributed equally. While the numbers of lymphocytes, eosinophils, and heterophils were between 1000 and 2000 samples each, basophils and monocytes were, as expected, very rare, as Figure 4 demonstrates. This imbalance is often challenging for machine learning approaches. The annotation of the instance segmentation dataset took our human expert more than 70 h.

Deep Learning Approach
Our novel approach for analyzing whole slide avian blood smear images consists of two stages, i.e., tile selection and instance segmentation. The tile selection process is shown in Figure 5. Initially, we decompose the input whole slide image into tiles. For each tile, we perform a binary classification to ensure that the contained cells fulfill the requirements to be countable. Next, an instance segmentation model is applied to all tiles that are classified as countable. This model detects all cells in the image and classifies each one as either an erythrocyte or as one of the subtypes of leukocytes. Figure 6 illustrates this procedure. Each step is explained in more detail in the following sections.
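The control flow of this two-stage procedure can be sketched as follows. The two model callbacks below are hypothetical stand-in stubs for illustration, not the actual networks:

```python
from collections import Counter

def count_cells(tiles, is_countable, segment_cells):
    """Two-stage pipeline: filter tiles with the selection model,
    then accumulate per-class counts from the segmentation model."""
    totals = Counter()
    for tile in tiles:
        if not is_countable(tile):        # stage 1: tile selection
            continue
        for cell in segment_cells(tile):  # stage 2: instance segmentation
            totals[cell["class"]] += 1
    return totals

# stand-in stubs for illustration only
tiles = ["t0", "t1", "t2"]
is_countable = lambda t: t != "t1"  # pretend t1 has overlapping nuclei
segment_cells = lambda t: [{"class": "erythrocyte"}, {"class": "lymphocyte"}]
print(count_cells(tiles, is_countable, segment_cells))
# Counter({'erythrocyte': 2, 'lymphocyte': 2})
```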

Tile Selection
For the tile selection model, we used EfficientNet [51] as our architecture. EfficientNet is a family of neural network architectures that has proven to be an excellent choice for image classification. We experimented with the two smallest versions of EfficientNet, namely EfficientNet-B0 and EfficientNet-B1. In a pre-processing step, we randomly applied data augmentation to prevent our model from overfitting the training data. Besides random contrast, random hue, random cropping, and horizontal as well as vertical flipping, these augmentations included elastic transformations that have proven to be very beneficial for cell recognition tasks [32,33]. The input size of our model was 512 × 384 px. Our training started from a well-established model pre-trained on ImageNet, which we fine-tuned in two phases using the Adam optimizer [52] and binary cross-entropy loss. By experimenting with different learning rates, we found an initial learning rate of 1 × 10⁻⁴ to work best in our case. During the first training phase, we kept the majority of the model parameters fixed. We made an exception for the last layer, where we introduced a new set of weights. This approach ensured that the new layer would not interfere with the pre-trained weights in the rest of the model. In this way, the randomly initialized last layer could adapt to the training data. We trained the network for 20 epochs, i.e., iterating over the whole training dataset 20 times. In the second training phase, we lowered the initial learning rate by a factor of 10 and trained the last 20 layers of the model to ensure that the model would learn useful features for the tile selection task. The loss converged again after up to 30 more epochs. Furthermore, in both phases, the learning rate was reduced whenever the validation accuracy stagnated in order to help the model find the optimal set of weights.
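The plateau-based learning rate reduction used in both phases can be sketched as follows. This is a simplified stand-in for the usual reduce-on-plateau callback; the `patience` and `eps` values are illustrative assumptions, not the values used in the article:

```python
def reduce_on_plateau(val_accs, lr=1e-4, factor=0.1, patience=3, eps=1e-4):
    """Scan a history of validation accuracies and multiply the learning
    rate by `factor` whenever `patience` epochs pass without an
    improvement of at least `eps`. Returns the final learning rate."""
    best, wait = float("-inf"), 0
    for acc in val_accs:
        if acc > best + eps:
            best, wait = acc, 0   # new best accuracy: reset the counter
        else:
            wait += 1
            if wait >= patience:  # accuracy stagnated: reduce the rate
                lr *= factor
                wait = 0
    return lr

history = [0.90, 0.94, 0.96, 0.96, 0.96, 0.96, 0.97]
new_lr = reduce_on_plateau(history)
# one reduction after three stagnant epochs: new_lr is roughly 1e-5
```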

Detection and Segmentation
For the instance segmentation task, we used the CondInst architecture [53], as shown in Figure 6. This neural network is based on the anchor-free object detector FCOS [54] that tackles object detection in a fully convolutional way. To solve object detection or instance segmentation tasks, many approaches rely on anchor boxes and proposals, e.g., Faster R-CNN [55] and most versions of YOLO [56].
For instance, Faster R-CNN generates bounding box proposals based on pre-defined anchor boxes. Anchor boxes of different scales and aspect ratios are placed in each area of the image and are assigned a score based on how likely they are to contain a relevant object. High-scored proposals are resized to a common size and processed in parallel in two different branches of the network, the so-called heads. One refines the proposed bounding boxes, i.e., the box regression head, while the other predicts the corresponding class label, i.e., the classification head.
However, using anchor boxes has several drawbacks. First, deep neural networks based on anchor boxes are sensitive to the choice of hyperparameters. For example, since anchor box scales and aspect ratios are fixed, they cannot easily adapt to new tasks. Furthermore, such networks are computationally inefficient. They produce many negative boxes and have to calculate many Intersection over Union (IoU) values to find the optimal proposals. The object detector FCOS [54] relies on neither anchor boxes nor proposals. Instead, one of its heads simply predicts, for each location, a 4D vector (l, t, r, b) encoding the distances from this location to the four sides of the bounding box.
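Decoding such a 4D regression vector back into box coordinates is a one-liner; a minimal sketch:

```python
def decode_fcos_box(x, y, l, t, r, b):
    """Turn an FCOS-style regression vector (l, t, r, b) -- the distances
    from the feature location (x, y) to the left, top, right, and bottom
    box edges -- back into (x1, y1, x2, y2) corner coordinates."""
    return x - l, y - t, x + r, y + b

print(decode_fcos_box(100, 80, 10, 20, 30, 5))  # (90, 60, 130, 85)
```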
While fully convolutional networks have been commonly used for semantic segmentation for many years, the CondInst architecture [53] successfully applies this type of neural network to instance segmentation. To be able to predict instance segmentation masks rather than bounding boxes, CondInst adds a mask branch to FCOS and inserts a so-called controller head, along with the classification and bounding box heads for each location (x, y). The controller head dynamically generates convolutional filters specifically built for each instance in an image by predicting its parameters. In this way, the mask branch becomes instance aware, i.e., it can predict one segmentation mask for each object instance in the image. This strategy yields very good results for irregular shapes that are challenging to tightly enclose within a bounding box.
To train our model, we used two different kinds of input data. The first part of our training dataset consisted of 138 images with a resolution of 2560 × 1920 px from the manually acquired data source. Since these images were captured at a different magnification than the whole slide images, we needed to choose the tile size of the crops such that they contained a comparable number of similarly sized cells. This resulted in a tile size of 512 × 384 px.
In a pre-processing step, we resized each image to 1066 × 800 px, matching the maximal short-edge input size of CondInst. Furthermore, we applied extensive data augmentation to enrich the dataset and prevent the model from overfitting to the training data. In particular, we applied random horizontal and vertical flipping as well as random adaptations of brightness, contrast, and saturation, each in an empirically chosen range of 0.7 to 1.3. Additionally, we applied random elastic transformations, which are particularly useful for cell segmentation tasks [32,33] since they produce realistic alterations of the cells by overlaying an x × x grid and distorting it with random displacement vectors. We empirically chose x ∈ {6, 7, 8, 9}. The displacement magnitude ranged from 5 to 10. Through visual inspection, we made sure that deformations based on these parameters did not produce unrealistic cell structures.
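The random displacement grid underlying such an elastic transformation can be sketched as follows. This is a pure-Python stand-in that only generates the displacement field; the actual image warping in this work was performed with the Augmentor library:

```python
import random

def displacement_grid(grid_size, magnitude, seed=None):
    """Generate a grid_size x grid_size field of random (dx, dy)
    displacement vectors, each component drawn uniformly from
    [-magnitude, magnitude]. Warping the image along this field
    produces the elastic deformation."""
    rng = random.Random(seed)
    return [[(rng.uniform(-magnitude, magnitude),
              rng.uniform(-magnitude, magnitude))
             for _ in range(grid_size)]
            for _ in range(grid_size)]

# grid size from {6, 7, 8, 9} and magnitude from [5, 10], as in the text
grid = displacement_grid(8, magnitude=8, seed=0)
print(len(grid), len(grid[0]))  # 8 8
```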
We built our CondInst model on a ResNet-101 backbone architecture. To enable transfer learning, the entire CondInst network was initialized with weights pre-trained on the COCO (Common Objects in Context) object detection dataset [57]. Since we started from pre-trained weights, we lowered the initial learning rate by a factor of 10, resulting in a learning rate of 0.001. To match the number of classes, we modified the classification head accordingly. We optimized the network for 16,600 iterations at a batch size of 4, i.e., more than 50 epochs. When approaching the end of the training, the learning rate was decreased twice according to the scheduler used in CondInst [53].

Hardware and Software
We implemented our method using the AdelaiDet [58] and Detectron2 [59] frameworks and utilized the Augmentor Python library [60] for pre-processing and, in particular, for generating elastic transformations. All our experiments were conducted on a workstation equipped with an AMD EPYC™ 7702P 64-Core CPU, 256 GB RAM, and four NVIDIA® A100-PCIe-80GB GPUs.
For runtime measurements, we used only a single GPU to run the instance segmentation model.The tile selection model was applied as a parallelized pre-processing step on the CPU.

Quality Metrics
We evaluated the EfficientNet approach for choosing countable image regions by using the accuracy score, a widely used metric defined as the proportion of correct predictions, i.e., both true positives and true negatives, among all predictions made by the model. Furthermore, we calculated the F1 score, which is the harmonic mean of precision and recall, i.e., F1 = 2 · p · r / (p + r). Precision and recall were computed as p = TP / (TP + FP) and r = TP / (TP + FN), where TP, TN, FP, and FN are true positives, true negatives, false positives, and false negatives, respectively. We evaluated our instance segmentation models in terms of average precision (AP), a common metric for object detection tasks. The AP is defined as the mean precision for equally spaced recall values. The metric corresponds to the area under the precision-recall curve, where predicted bounding boxes with an Intersection over Union (IoU) of more than a threshold t are considered to be true positives (TPs). For two sets of pixels, the IoU is defined as IoU(A, B) = |A ∩ B| / |A ∪ B|. Predictions that have no matching ground truth boxes are false positives (FPs), and ground truth boxes with no matching prediction are false negatives (FNs). For a given IoU threshold t, the AP is computed as an interpolation based on 101 equally spaced recall values [57]:

AP = (1/101) · Σ_{r ∈ {0, 0.01, …, 1}} p_interp(r), where p_interp(r) = max_{r̃ : r̃ ≥ r} p(r̃)

and p(r̃) is the measured precision at recall r̃. In our experiments, we set t = 0.5, i.e., an IoU of 50% between ground truth and the predicted bounding box or segmentation is required for a proposal to be counted as a true positive. We denote this metric as AP@50.
To evaluate the overall performance, we calculated the mean AP (mAP) score by taking the mean value of the AP scores from the different classes.
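The metrics above can be sketched in a few lines; the AP routine implements the 101-point interpolation on toy (recall, precision) pairs:

```python
def f1(tp, fp, fn):
    """F1 score from true positives, false positives, and false negatives."""
    p, r = tp / (tp + fp), tp / (tp + fn)  # precision, recall
    return 2 * p * r / (p + r)

def iou(a, b):
    """Intersection over Union of two pixel sets."""
    return len(a & b) / len(a | b)

def ap_101(points):
    """101-point interpolated AP from (recall, precision) pairs: for each
    recall level r in {0, 0.01, ..., 1}, take the maximum precision
    observed at any recall >= r, then average over the 101 levels."""
    total = 0.0
    for i in range(101):
        r = i / 100
        total += max((p for rec, p in points if rec >= r), default=0.0)
    return total / 101

print(f1(90, 5, 10))              # roughly 0.923 (= 12/13)
print(iou({1, 2, 3}, {2, 3, 4}))  # 0.5
print(ap_101([(0.5, 1.0), (1.0, 0.5)]))
```

The mAP is then simply the mean of the per-class AP values.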
We performed each experiment five times with different random seeds and report the standard deviation to ensure that we present reliable results.

Tile Selection
The models were evaluated on a held-out dataset consisting of 298 positive and 346 negative examples.
As Table 2 shows, both of the models performed very well with accuracies and F1 scores above 96%. The smaller version, i.e., EfficientNet-B0, performed better with an accuracy of 97.5% and an F1 score of 97.3%.

Detection and Segmentation
First, we performed the training with no augmentation at all. The results are summarized in Table 3 for the detection task. Adding the default data augmentation of CondInst, i.e., random horizontal flipping, improved the results by roughly 1.9% in terms of mAP. The application of further data augmentation, namely, random vertical flipping, random brightness, random contrast, and random saturation, again improved the score by roughly 2.8%. If we instead applied horizontal flipping and elastic deformations, the models still achieved an mAP of 87.7%. Combining all data augmentation methods in one model resulted in the best model, achieving 98.9% for erythrocytes, 90.2% for lymphocytes, 87.3% for eosinophils, and 86.3% for heterophils in terms of AP and, hence, an mAP score of 90.7%. Overall, combining all data augmentation methods resulted in an improvement of roughly 5.2% in terms of mAP.

Table 3. Detection results for erythrocytes and several subtypes of leukocytes. We experimented with different methods for data augmentation, i.e., random horizontal flipping (HFlip), random combinations of vertical flipping, brightness, contrast and saturation (DA), and elastic deformations (ED). We experimented with different architectures, i.e., CondInst and Mask R-CNN, and backbone models, i.e., ResNet-50 (R-50) and ResNet-101 (R-101). We report average precision at an Intersection over Union (IoU) of 50% (AP@50) for each class and the corresponding mean average precision (mAP). Values in bold indicate the best results for each metric.

The results of the instance segmentation shown in Table 4 are similar to those obtained for the corresponding bounding box detections. Each additional data augmentation step increased the mAP scores, and the best result was achieved by applying all random data augmentation techniques. However, the best model was not as dominant as in the detection task and could not outperform the other approaches in every category.

We observe that erythrocytes were consistently recognized almost perfectly, with 98.9% and 99.0% AP for detection and segmentation, respectively. Thus, the model learned not to confuse thrombocytes or immature erythrocytes with erythrocytes. However, there was also an obvious drop in performance for all leukocyte subclasses. On the one hand, erythrocytes were the easiest to identify because of their characteristic nucleus; on the other hand, they were by far the most frequent cell type in avian blood samples, i.e., roughly 98% of all instances in our dataset. Therefore, the model could learn better features from this large set of samples. Among the leukocytes, this trend was evident as well. The most frequent leukocyte class, i.e., lymphocytes, still achieved 90.0% in terms of AP, while eosinophils and heterophils achieved 87.3% and 85.2%, respectively. Thus, there appears to be a correlation between the number of training samples and the performance, which is, however, not statistically significant.
To analyze the relation between precision and recall in more detail, we plotted the precision-recall curve of our best model in Figure 7. The curve of the erythrocyte class (blue line) is almost perfect, as expected, with an AP and, hence, an area under the curve of 99%. The curves for the other classes start descending sooner, but by choosing the best threshold, in our case 0.5, a good balance between precision and recall could be achieved.
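For readers unfamiliar with how such a curve is traced, the following sketch shows the standard procedure: detections are sorted by confidence, and a (precision, recall) point is emitted after each one. The scored predictions here are made up, and the true-positive flags stand in for an IoU ≥ 0.5 match against ground truth; this is not our evaluation code.

```python
# Sketch: tracing a precision-recall curve from scored detections.
# Each prediction is (confidence score, matched a ground-truth cell?).
def precision_recall_curve(scored_preds, num_gt):
    """scored_preds: list of (score, is_true_positive); num_gt: # ground truths."""
    points = []
    tp = fp = 0
    for score, is_tp in sorted(scored_preds, key=lambda x: -x[0]):
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / num_gt))  # (precision, recall)
    return points

# Hypothetical detections for a tile with four ground-truth cells.
preds = [(0.95, True), (0.9, True), (0.7, False), (0.6, True), (0.3, False)]
curve = precision_recall_curve(preds, num_gt=4)
```

The AP is then the area under this curve; lowering the score threshold moves along the curve toward higher recall at the cost of precision.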
CondInst with a smaller backbone, namely, ResNet-50, performed very well but could not compete with the model based on ResNet-101. In comparison, the performance deteriorated by roughly 2.5% in terms of mAP. However, the anchor-based Mask R-CNN approach using a ResNet-101 backbone showed a clear drop in performance of roughly 6.8% compared to the anchor-free CondInst approach using the identical backbone.

Table 4. Instance segmentation results for erythrocytes and several subtypes of leukocytes. We experimented with different methods for data augmentation, i.e., random horizontal flipping (HFlip), random combinations of vertical flipping, brightness, contrast, and saturation (DA), and elastic deformations (ED). We experimented with different architectures, i.e., CondInst and Mask R-CNN, and backbone models, i.e., ResNet-50 (R-50) and ResNet-101 (R-101). We report the average precision at an IoU of 50% (AP@50) and the corresponding mean average precision (mAP). Values in bold indicate the best results for each metric.

We did not have enough samples of basophils and monocytes for a comprehensive evaluation of their respective classes, but these samples could be aggregated into their superclass, leukocytes. We trained a binary CondInst model that classifies avian blood cells into erythrocytes and leukocytes. As Tables 5 and 6 show, our model performed very well on this task, achieving more than 93% and 98.8% in terms of AP for leukocytes and erythrocytes, respectively. As before, the larger backbone, i.e., ResNet-101, pushed the model to a better performance on leukocytes.

The AP score for the aggregated leukocyte class was higher than the AP for any of its subclasses. Presumably, the multi-class model confused cell instances of the different subclasses.

Inference Runtimes
The inference runtimes for samples of different sizes are shown in Table 7. We included the largest whole slide image (i.e., sample 8_036), consisting of more than 19 billion pixels, as well as the smallest sample (5_055), with only roughly 2.5 billion pixels. However, in addition to the size of the image, the fraction of actually countable tiles played a crucial role in the processing times. For the largest file (8_036), containing 97,200 tiles with a countable fraction of roughly one-fourth, our approach took roughly 25 min. In contrast, another sample (1_023), with only 91,059 tiles but more than half of them classified as countable, took roughly 52 min. Processing the three smaller samples took less than 15 min each. In general, none of the selected images needed more than one hour to determine the cell counts in the corresponding blood smear. Depending on the mentioned factors, processing mostly took less than a tenth of a second per countable tile, including tile selection, segmentation, and identification, as well as counting of the respective cell instances. In contrast, our human expert took an average of roughly two minutes to annotate a tile with labels and segmentation masks in our semi-automated setting.
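A rough consistency check of these numbers: the total runtime is driven by the number of countable tiles rather than the raw image size. The sketch below uses the tile count and countable fraction of sample 8_036 from the text; the per-tile time is an assumed value below the reported "less than a tenth of a second" bound, so the resulting estimate is only approximate.

```python
# Back-of-the-envelope runtime estimate for sample 8_036.
total_tiles = 97_200
countable_fraction = 0.25          # roughly one-fourth countable (from the text)
seconds_per_countable_tile = 0.06  # assumed; below the reported < 0.1 s per tile

countable_tiles = int(total_tiles * countable_fraction)
minutes = countable_tiles * seconds_per_countable_tile / 60  # ~24 min
```

This lands close to the roughly 25 min reported for this sample and explains why sample 1_023, with fewer tiles but a countable fraction above one-half, took about twice as long.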

Discussion
Our novel approach offers a proficient assessment of avian blood scans, which significantly speeds up the workflow of blood cell counting compared to the traditional method of visually counting under a microscope. Compared to existing hardware devices for automated blood analysis [25,28], which are usually quite expensive, our approach is freely available. Hence, we enable researchers who do not have access to such devices used in veterinary laboratories to utilize an automated cell-counting method. The CellaVision® DC-1 analyzer has been evaluated for mammalian, reptilian, and avian blood by comparing its preclassification to the final results after review by veterinarians [61]. The agreement was very good for neutrophils, heterophils, and lymphocytes (each >90%) and good for monocytes (81%). However, eosinophils and basophils needed extensive re-classification by human experts. Interestingly, while we agree that achieving good performance for basophils is a challenge, our model appears to be more reliable for eosinophils. However, we could not evaluate our model on monocytes, which were recognized in a satisfactory way by the CellaVision® DC-1. Moreover, our approach can be more efficient than hardware-based approaches. The DC-1 analyzer [28] processes given slides sequentially, achieving a throughput of no more than roughly 10 slides per hour. Our approach allows users to scan slides with various methods, e.g., with microscope cameras or high-throughput scanners, like the Leica Aperio AT2 Scanner [47] with a capacity of 400 slides, as used in our study. The Leica Aperio AT2 Scanner can be used to digitize a large number of slides in a very time-efficient manner. Our approach can be arbitrarily scaled by processing several slide images in parallel and is only limited by the available hardware resources. Furthermore, our approach can handle low-quality blood smears because it has been trained under such conditions, while the CellaVision® DC-1 analyzer is primarily designed for usage in veterinary laboratories. Moreover, because of its proprietary design, it is not possible to use custom training data to adapt its classification approach. Hence, regarding the large numbers of avian blood samples collected in ornithological field studies, our approach opens new possibilities for bird-related research.
While our approach shows that it is feasible to automatically count not only red but also white avian blood cells with open-source software, it still has some downsides.Because of the low number of samples in our training set, our neural network model is not yet able to reliably recognize basophils or monocytes.Furthermore, the model is trained on a limited number of bird species.Because of potential variations in staining intensity, coloration, and cell morphology, it may be a challenge to detect cells of other bird species as reliably as for the given species [62].In particular, eosinophils may be quite different between bird species.
However, these issues indicate several areas for future work. The model performance can be further improved by extending the dataset in general and particularly for the rare classes, i.e., basophils and monocytes. Instead of indiscriminately annotating more images that barely contain any of these cells, this can be done using an active learning approach, which reliably provides unlabeled images that contain these types of avian white blood cells. Moreover, generating more training samples with generative deep learning approaches, like GANs [63] or image generation models based on latent diffusion [64], is a promising direction. Furthermore, our approach can be extended to recognize and count blood parasites (e.g., Haemosporida and Trypanosoma). Another interesting aspect is investigating and improving the generalization ability of our neural network model in cross-domain scenarios. This can include different techniques for creating blood smears for different bird species. We plan to include further bird species in our model, e.g., penguins.
Several studies have indicated that extreme ecological conditions can significantly increase hematocrit levels in birds. For example, a female great tit from the northernmost populations in Northern Finland showed a hematocrit level of 0.83 [65]. This makes the blood viscous and leads to densely packed cells in the blood smear image, which can be challenging for automated counting approaches. Since we trained our model to count only areas matching human quality standards, we only counted tiles from the monolayer. Hence, a high hematocrit level may lead to significantly more rejected tiles. However, our approach is adaptable to new annotated data sources. Thus, providing our models with manually labeled images with high hematocrit levels in future training iterations will improve their ability to process and count cells under such rare conditions. In general, our approach is based on open-source software. Therefore, the models can easily be adapted to other datasets or extended to recognize further cell types. So far, our approach aims to automate the tedious task of manually counting avian blood cells. Furthermore, it eliminates inter-observer errors. However, it still counts cells only in the monolayer. Future work may expand the countable areas, as achieved by handcrafted feature algorithms [39]. For a deep learning approach like ours, this can be achieved by training the model with data involving lower-quality areas. By learning useful features from the annotated samples, the resulting models may be capable of achieving superhuman performance.
Our deep learning model opens up new opportunities in ornithology and ecology for efficiently documenting and evaluating the stress levels and health conditions of bird populations and communities. It can, therefore, be used as an early warning indicator to detect physiological changes within populations or communities even before a population declines. With this fast, reliable, and automated approach, even old collection samples may be incorporated into modern ornithological research. Our approach is currently used in practice for research on the relative stress load of forest birds by automatically determining H/L ratios.
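Deriving the H/L ratio from the per-class counts our pipeline outputs is straightforward; the sketch below illustrates the calculation with hypothetical counts, not data from our study.

```python
# Sketch: H/L ratio (heterophils to lymphocytes), a common relative
# stress index, computed from per-class cell counts. Counts are hypothetical.
def hl_ratio(counts):
    """counts: mapping from cell type to detected instance count."""
    return counts["heterophil"] / counts["lymphocyte"]

counts = {"erythrocyte": 48_210, "heterophil": 120, "lymphocyte": 300}
ratio = hl_ratio(counts)  # 120 / 300 = 0.4
```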

Conclusions
We presented a fully automated open-source software approach to determine not only the total erythrocyte count but also the total and differentiated leukocyte counts of avian blood samples. Our approach operates on whole slide blood smear images. First, we select tiles using a deep neural network model that determines the areas of the images that are suitable for counting the contained cells by classifying all tiles into countable and non-countable ones. Each tile classified as countable is then fed into another deep neural network model. This model is capable of detecting and classifying avian blood cells, i.e., erythrocytes and leukocytes, with 96.1% in terms of mAP. Furthermore, if the model is trained to also recognize subtypes of leukocytes, it achieves up to 98.9%, 90.2%, 87.3%, and 86.3% in terms of AP for erythrocytes (with nuclei), lymphocytes, eosinophils, and heterophils, respectively.
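The two-stage pipeline summarized above can be condensed into a short sketch. The names `split_into_tiles`, `classify_tile`, and `segment_cells` are hypothetical placeholders for the tile selection and instance segmentation models, not our actual implementation.

```python
# Sketch of the two-stage counting pipeline: tile selection followed by
# instance segmentation, with per-class aggregation of the detected cells.
from collections import Counter

def count_cells(slide_image, split_into_tiles, classify_tile, segment_cells):
    counts = Counter()
    for tile in split_into_tiles(slide_image):
        if classify_tile(tile) == "countable":   # stage 1: tile selection
            for cell in segment_cells(tile):     # stage 2: instance segmentation
                counts[cell.cell_type] += 1      # aggregate per cell type
    return counts
```

From these aggregated counts, derived statistics such as the H/L ratio follow directly.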

Figure 1 .
Figure 1. Examples of different cell types: (a) erythrocytes of a blue tit (Cyanistes caeruleus) and a Eurasian blackcap (Sylvia atricapilla); (b) lymphocytes of a common blackbird (Turdus merula) and a common buzzard (Buteo buteo); (c) eosinophils of a common buzzard and a European robin (Erithacus rubecula); (d) stain remnants and lysed cells in the blood sample of a black woodpecker (Dryocopus martius); (e) heterophils of a common blackbird and a common buzzard; (f) basophils of a common blackbird and a common buzzard; (g) monocytes of a common blackbird and a European robin; and (h) ruptured cells in a blood sample of a Eurasian blue tit. The blood smears were stained with Giemsa and scanned at 40× magnification.

Figure 2 .
Figure 2. Overview of our approach for counting avian blood cells in whole slide images.Data acquisition: Blood smears are prepared and scanned.The images are then cut into tiles and annotated by a human expert via a web-based annotation tool.Training: The neural network models are trained using annotated data for tile selection and instance segmentation.The models also assist in iteratively annotating images.Inference: An input image is tiled and fed into the tile selection model.Countable tiles are passed to the instance segmentation model before the final counts are determined.

Figure 3 .
Figure 3. Positive and negative examples for image tiles that are suitable for counting cells.Subfigures (a,c,e) show non-countable examples, while Subfigures (b,d,f) show countable tiles.

Figure 4 .
Figure 4. Distribution of annotated bird blood cell types.The plot shows how the annotated cell instances are spread among different cell types, i.e., erythrocytes, lymphocytes, eosinophils, heterophils, basophils, and monocytes.

Figure 5 .
Figure 5. Overview of the tile selection phase.The original image is split up into single tiles.Each tile is fed into the EfficientNet CNN model that classifies it as either countable or non-countable.We finally visualize the results by blacking out all tiles classified as non-countable.For visualization purposes, we aggregated 16 × 16 tiles to one patch in the depicted output.

Figure 6 .
Figure 6. Overview of the cell instance segmentation phase. Countable image tiles serve as inputs to the CondInst instance segmentation model. On the right, we visualize the predictions made by the model. Each color corresponds to one cell type. In the example, the model reliably recognized one lymphocyte (red), one heterophil (orange), four eosinophils (green), and many erythrocytes (blue).

Table 1 .
Summary of annotated bird blood cells across 24 bird species and 1810 blood smear images.Numbers are given for each order of birds occurring in the dataset and for each blood cell type, i.e., erythrocytes as well as five subtypes of leukocytes.

Table 2 .
Results for tile selection model.We present accuracy and F1 score for the smallest available EfficientNet architectures, i.e., B0 and B1.All experiments were performed five times, and the standard deviation is reported.Values in bold indicate the best results for each metric.

Figure 7 .
Figure 7. Precision-recall curve. The curves for the corresponding cell types are drawn in different colors, i.e., blue for erythrocytes, orange for eosinophils, green for lymphocytes, and red for heterophils.

Table 5 .
Detection results for binary model.We report the AP at an IoU of 50% (AP@50) based on the predicted bounding boxes for the CondInst architecture with two different backbone models, namely, ResNet-50 (R-50) and ResNet-101 (R-101).Values in bold indicate the best results for each metric.

Table 6 .
Segmentation results for binary model.We report the AP at an IoU of 50% (AP@50) based on the predicted segmentation masks for the CondInst architecture with two different backbone models, namely, ResNet-50 (R-50) and ResNet-101 (R-101).Values in bold indicate the best results for each metric.

Table 7 .
Inference runtimes. We present details on the samples regarding their size and countability, along with the inference time for the respective image. The number of countable tiles is determined by our best tile selection model based on the EfficientNet-B0 architecture.