AI-Enhanced Blood Cell Recognition and Analysis: Advancing Traditional Microscopy with the Web-Based Platform IKOSA

Abstract: Microscopy of stained blood smears is still a ubiquitous technique in pathology. It is often used in addition to automated electronic counters or flow cytometers to evaluate leukocytes and their morphologies in a rather simple manner and has low requirements for resources and equipment. However, despite the constant advances in microscopy, computer science, and pathology, it still usually follows the traditional approach of manual assessment by humans. We aimed to extend this technique using AI-based automated cell recognition methods while maintaining its technical simplicity. Using the web platform IKOSA, we developed an AI-based workflow to segment and identify all blood cells in DAPI-Giemsa co-stained blood smears. Thereby, we could automatically detect and classify neutrophils (young and segmented), lymphocytes, eosinophils, and monocytes, in addition to erythrocytes and platelets, in contrast to previously published algorithms, which usually focus on only one type of blood cell. Furthermore, our method delivers quantitative measurements, unattainable by the classical method or formerly published AI techniques, and it provides more sophisticated analyses based on entropy or gray-level co-occurrence matrices (GLCMs), which have the potential to monitor changes in internal cellular structures associated with disease states or responses to treatment. We conclude that AI-based automated blood cell evaluation has the potential to facilitate and improve routine diagnostics by adding quantitative shape and structure parameters to simple leukocyte counts of classical analysis.


Introduction
Microscopy of stained peripheral blood smears still holds significant relevance in hematological studies [1-3]. This simple protocol requires the expertise of physicians, who provide the robustness and versatility of human pattern recognition; at the same time, the protocol has the limitation of being time-consuming. Despite recent reports showing that blood stains can be conflicting and imprecise, morphological modifications associated with leukemia and cancer observed on blood smears still play a significant role in diagnostics [4,5]. Routine analyses use automated methods, such as flow cytometry or electronic cell counters, to gather a sufficient quantity and quality of measures but lack the structural and morphological assessment of blood cells. On the other hand, the morphological evaluation of leukocytes by a pathologist usually lacks quantitative numerical values, which are practically unattainable by human means [6-21].
Giemsa stains were reported as a possible histological staining method at the beginning of the 20th century [22]; however, the first written report of their use on peripheral blood smears dates to 1968 [1]. Since then, they have been extensively used for a myriad of diagnostics, ranging from parasitic infections, kidney or liver diseases, myeloproliferative or lymphoproliferative disorders, and several types of cancer to different types of anemia [2,3,23,24], in addition to their routine use to determine percentages of the leukocyte subtypes. The longevity of this diagnostic method relies on its simplicity, fast output, low demand for resources, and the expertise of physicians. The low demand allows its applicability in areas with minimal resources. In contrast to microscopy-based methods, high-throughput instruments, like flow cytometers, are often combined with immunofluorescence protocols to obtain quantitative information on cell markers. However, this requires far more complex methods than peripheral blood smears and an investment that would surpass the resources needed for brightfield microscopy of peripheral blood smears several-fold. Furthermore, these methods do not report on cellular morphologies, which are often linked to important signaling networks [25,26] of biological processes, such as the ERK and PI3K/AKT/mTOR pathways [17,27-33], which have been shown to be associated with pathological conditions and morphological changes. In addition, it has been reported that the mechanical qualities of immune cells are closely related to their activity state [34-37], that the membrane flexibility of erythrocytes reflects their ability to take up oxygen [38-42], and that the size distribution of platelets can be used to trace back abnormal physiological conditions [43]. It should also be considered that changes in plasma composition regarding lipid, protein, and ionic concentrations can lead to structural changes in blood cells that are measurable in peripheral blood smears [41,44].
Commonly, blood stains are analyzed only in a qualitative manner, accompanied by quantitative laboratory assays ordered by physicians. The diagnosis and treatment of various diseases rely on the combined application of these protocols. Understandably, quantitative measurement of cell morphology by human means is unfeasible due to the time demands and the impracticality of the measurements. Therefore, blood stains are mainly used for qualitative evaluations, while treatment decisions are made on the basis of quantitative laboratory tests. However, by enhancing the information output of blood stains and quantifying cellular morphologies, the evaluation of a patient could be refined by this technically simple method, and the battery of laboratory assays might be reduced. This could enable a more personalized and tailored treatment at an early stage.
The field of hematology is well-suited for AI applications, and several algorithms have been developed to classify blood cells. These algorithms have pros and cons, but overall, they have proven quite effective in identifying cell types, with a minimal rate of misclassifications [7-10,45-49]. However, so far, they have usually only been applied to one blood cell type, and in most cases have not been combined with cellular border detection, leaving the potential for the multifactorial quantification of morphology at the single-cell level unrealized. On the other hand, high-throughput methods, such as flow cytometry, which have the required quantitative power, lack morphological information [13,16]. It seems evident that multidimensional quantifications of cellular morphology bear a great potential for precision diagnostics [12,15,17,50-56]. Up to now, quantitative morphological assessments of leukocytes have mainly focused on neutrophils and lymphocytes, the more abundant immune cell types [15,16,19,20,57-62], while less common cells, like eosinophils, are hardly analyzed in these correlative analyses. The aim of our study was to assess all cell types of blood circulation and fill the gap between non-numerical/morphological techniques and numerical/non-morphological methods, such as flow cytometry, expanding the realm of AI blood cell identification from late to early diagnostics. A current challenge of pathology is that the time capacity of expert pathologists is quite limited, meaning that they can investigate only a limited number of patient samples in more detail by microscopy; the overwhelming majority of samples are analyzed by automated cell counters, delivering only percentages and very basic quantitative measures. AI-based analysis of (automatically acquired) microscopy images can address this challenge by taking over an important part of human expertise in morphological assessment. Furthermore, it can outperform the human observer by delivering precise numerical values for geometries or shape factors. In our study, we achieved that by developing an artificial intelligence algorithm for microscopy images of blood stains to classify all circulatory cells into their specific subtypes and deliver quantitative and numerical values in a way that can be automated and that allows for applying multidimensional metrics and new methods of diagnostics. In essence, the more sophisticated multidimensional quantifications that we present in this study aim to transform the merely qualitative morphological assessments of physicians into quantitative analyses that can be used for precision diagnostics of physiological and pathological conditions.

Materials and Methods
Blood was collected from the ring fingers of nine healthy subjects, which were pricked with a lancet (Heinz Herenz Hamburg #1110102), and spread over microscopy glass slides as a monocellular layer following the traditional blood smear method described below.
For leukocyte concentrates, a phlebotomy was performed to collect blood in EDTA-coated tubes from six healthy individuals. These six individuals are included in the peripheral blood smear group. A total of 9 mL of freshly collected blood was centrifuged in a Sigma 4-16 KS centrifuge (Sigma Zentrifugen, Osterode am Harz, Germany) at 300× g for 7 min. After the blood was separated into an erythrocyte-rich (bottom) and erythrocyte-poor (top) liquid-liquid 2-phase system, the erythrocyte-poor layer was put into a separate 10 mL tube, which was further centrifuged at 700× g for 10 min. The resulting pellet was resuspended in 150 µL of the supernatant. The resuspended pellet was subsequently spread out on microscopy slides as the WBC concentrate.
A drop of blood or a WBC concentrate was placed on the end of a glass microscope slide opposite the frosted end. A beveled microscope glass slide (Knittel Glass, Braunschweig, Germany, VS1147#1FKB.01) was placed at a 45° angle in front of the drop. The drop was touched with the beveled slide, and once the fluid was distributed on the edge of the slide, it was smeared toward the frosted end of the microscope slide. The smeared fluid was left to dry for 15 min. The dry slides were fixed with −20 °C methanol for 9 min and air-dried again. A 1 µg/mL solution of DAPI, diluted in distilled water, was used to label the fixed smears for 5 min, followed by air drying. The DAPI-labeled smears were then stained with a Giemsa solution for 30 min, according to the manufacturer's specifications (Chemlab, Zedelgem, Belgium, CL04.0502), and mounted with a 125 g/L glycerin solution, followed by sealing of the edges with nail polish.
The DAPI-Giemsa co-stained smears were imaged with an Olympus IX71 inverted microscope (Tokyo, Japan) equipped with a DFK 72BUC02 color camera (Middleton, WI, USA), an iXon Ultra 888 EMCCD camera (Abingdon, UK), a 73006081 Ludl filter wheel (Hawthorne, CA, USA), a CoolLED pE-4000 light source (Andover, UK), a 100× Olympus oil objective, and a halogen lamp. The setup was controlled using the free and open-source software "Micro-Manager 2.0" (https://micro-manager.org/Citing_Micro-Manager, accessed on 3 January 2023). Four images were taken for each field of view selected: one color image, one transmission image, one image using the 385 nm excitation line and a 447/60 blue emission filter (DAPI), and one image using the 565 nm excitation line and a 593/40 red emission filter (red equivalent).
The algorithms that we present in this study are based on 2138 images taken from 31 peripheral blood smears and 20 smears of white blood cell concentrates. As mentioned above, images were acquired using two different cameras: a color camera and a monochromatic camera. The color images were used for the manual classification of blood cells. Due to the difference in pixel size and field of view, an algorithm was developed to match the segments and sizes of the images. The code for matching the segments is available in our repository (https://github.com/MCamposMedina/Image-matching, accessed on 9 November 2023). Once the images were processed, they were uploaded to the online image analysis software platform IKOSA 1.3, a product provided by our industrial partner (KML Vision GmbH, Graz, Austria, available from https://app.ikosa.ai/, accessed on 27 December 2023). The development of the algorithms was based on providing initial annotations of the different cell types by an experienced observer. The computational training of the AI was then autonomous, and all internal pre-existing parameters were application-agnostic. All internal variables were calculated during the AI training phase based on the information provided by the training dataset. An instance segmentation AI algorithm was trained with 80% of the images and validated with the remaining 20%. The images used for the training and validation were randomly allocated. Each developed algorithm was later tested with images excluded from the training/validation set. In addition to the commonly used 80/20 ratio of training versus validation datasets, other ratios were tested as well.
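The random allocation of images to training and validation sets can be illustrated with a minimal Python sketch. It is not the IKOSA implementation, whose internal splitting procedure is not described here; the function name `split_train_validation`, the fixed seed, and the use of integer image identifiers are assumptions made for illustration only.

```python
import random


def split_train_validation(image_ids, train_fraction=0.8, seed=0):
    """Randomly allocate image identifiers to a training and a validation set.

    Hypothetical helper: the seed only makes the example reproducible.
    """
    ids = list(image_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)  # random allocation, independent of acquisition order
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Example with the 2138 images of this study, identified here by index only.
train_ids, validation_ids = split_train_validation(range(2138))
print(len(train_ids), len(validation_ids))
```

Calling the function with a different `train_fraction` gives the alternative training/validation ratios, and a different `seed` gives an alternative randomization of the same ratio.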
In order to quantify the performance of the algorithms, we used three statistical parameters: accuracy, recall, and IoU (intersection over union, Jaccard index), as specified below. These quantifications take into account the observational errors (FPs = false positives, FNs = false negatives) and compare them to the correct observations (TPs = true positives, TNs = true negatives). Their corresponding equations, in their standard form, are:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$$
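As a worked illustration, the three performance parameters can be computed directly from the four counts. The sketch below assumes the standard definitions (accuracy as the fraction of correct observations, recall as TP/(TP + FN), and IoU as TP/(TP + FP + FN)); the counts in the example are hypothetical and are not results of this study.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all observations that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, fn):
    """Fraction of true objects that were detected."""
    return tp / (tp + fn)


def iou(tp, fp, fn):
    """Intersection over union (Jaccard index)."""
    return tp / (tp + fp + fn)


# Hypothetical counts, for illustration only.
tp, tn, fp, fn = 90, 50, 5, 5
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"recall   = {recall(tp, fn):.3f}")
print(f"IoU      = {iou(tp, fp, fn):.3f}")
```

Note that IoU ignores true negatives, which makes it less sensitive than accuracy to large background areas without objects.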

Development of a Workflow for Image Analysis
Based on our camera hardware and our goal to compare color with monochrome/fluorescence image acquisition, we first had to establish a workflow for matching the images of the two different cameras.

Image Matching Segment Extraction
The three monochromatic images covered a field of view of 1024 × 1024 pixels (79 × 79 µm), whereas the color image covered a field of 648 × 484 pixels (48 × 36 µm). The color images were white-normalized using an ad hoc Python script, which is publicly available in our repository (https://github.com/MCamposMedina/Image-matching, accessed on 9 November 2023). The monochromatic images were resized and realigned accordingly using an in-house Python 3.9.13 script. The field of view captured by the color image was then extracted from the monochromatic images using the open-source Python library OpenCV (https://www.oreilly.com/library/view/mastering-opencv-4/9781789344912/c7cd6c68-e4be-46d3-b5b1-474066284ee6.xhtml, accessed on 4 April 2023). Once the corresponding image segment was extracted, the monochromatic images were saved in a three-channel stack.
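The geometry of this matching step can be sketched as follows. This is not the script from our repository: it is a simplified NumPy illustration that rescales each monochromatic channel to the pixel size of the color image, crops a 648 × 484 pixel window, and stacks the three channels. The nearest-neighbor resampling (an interpolating resize such as OpenCV's `cv2.resize` would normally be used) and the default centered crop are assumptions; in practice the offset comes from the realignment of the two cameras.

```python
import numpy as np

MONO_UM_PER_PX = 79.0 / 1024   # monochrome camera: 1024 px span 79 µm
COLOR_UM_PER_PX = 48.0 / 648   # color camera: 648 px span 48 µm
COLOR_SHAPE = (484, 648)       # rows, columns of the color image


def resize_nearest(img, scale):
    """Nearest-neighbor rescaling (simple stand-in for an interpolating resize)."""
    rows = int(round(img.shape[0] * scale))
    cols = int(round(img.shape[1] * scale))
    r_idx = np.minimum((np.arange(rows) / scale).astype(int), img.shape[0] - 1)
    c_idx = np.minimum((np.arange(cols) / scale).astype(int), img.shape[1] - 1)
    return img[r_idx[:, None], c_idx[None, :]]


def match_to_color(mono_channels, offset=None):
    """Rescale the monochrome channels to the color pixel size, crop, and stack."""
    scale = MONO_UM_PER_PX / COLOR_UM_PER_PX   # about 1.04
    resized = [resize_nearest(ch, scale) for ch in mono_channels]
    h, w = COLOR_SHAPE
    if offset is None:
        # Assumption: the color field of view is centered in the monochrome one.
        offset = ((resized[0].shape[0] - h) // 2, (resized[0].shape[1] - w) // 2)
    r0, c0 = offset
    return np.stack([ch[r0:r0 + h, c0:c0 + w] for ch in resized], axis=-1)


# Placeholder transmission, DAPI, and red-equivalent images.
channels = [np.zeros((1024, 1024), dtype=np.uint16) for _ in range(3)]
stack = match_to_color(channels)
print(stack.shape)
```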

Algorithm Training and Validation
Each object was identified and annotated in the color images according to one of the following seven labels: erythrocyte, young neutrophil, segmented neutrophil, lymphocyte, eosinophil, monocyte, and platelet. The annotation of an object consisted of using a freehand drawing tool to highlight the area that corresponds to the label assigned to the object (Figure 1). The distribution of objects across the 2138 images is shown in Table 1.
Interestingly, when we tested Giemsa stains on the fluorescence channels, we noticed that they provided a clear signal in the red channel (Figure 1). By taking advantage of this endogenous fluorescence, we characterized cells based on their cytoplasm red-equivalent fluorescence together with the DAPI fluorescence of the nucleus for automated cell recognition.
Considering that our algorithm had to assign multiple labels of the same type in a single image, as is obviously the case for erythrocytes, we selected an instance segmentation-oriented algorithm, which is appropriate for object-based classification (while semantic segmentation is a pixel-based classification). The AI instance segmentation algorithm was initially trained using 80% of the annotated images. The remaining 20% was used for the validation of the algorithm. The 80/20 ratio is a well-established convention that helps to guard against overfitting. To verify the robustness of the 80/20 ratio against other ratios of training/validation, we tested three other ratios (see Supplementary Table S1) to ensure the best-performing algorithm. The one with the lowest observational errors was designated as the main functioning algorithm. Since the training and validation images were randomly selected, this configuration avoids introducing selection bias and provides cross-validation. Two primary types of cell detection and classification algorithms were developed. One was trained with color images, and the other one was trained with the three-channel monochromatic stack (transmission, DAPI, red; TDR). The annotations made on the color images were copied to the TDR image set. Consequently, both algorithms contain the same number of images and annotations.

Multidimensional Quantifications
In addition to the AI-based segmentation of objects (cells), we aimed for a deeper analysis of the identified objects based on internal structures and textures. Similar to the GLCM quantification [63-65], our interpretation of textures focused on quantifying the distribution of intensities according to their spatial distribution. However, we collected the intensities of the eight pixels surrounding each pixel that belonged to the realization of an object. In this way, pixels at the edges and corners of an object only take into account neighbors that lie within the detected area. The intensities 1-255 were quantified according to the intensity values surrounding them. Based on the matrix of intensities, the corresponding probability distribution was calculated as:
$$x_{ij} = \frac{N_{ij}}{\sum_{i,j} N_{ij}}$$
Finally, texture quantification was based on Shannon's definition of entropy [66], which follows the equation:
$$S = -\sum_{i,j} x_{ij} \log x_{ij}$$
where: S = texture entropy of an object; x_ij = probability of a pixel having a value of i to be surrounded by a pixel of value j; i, j = eight-bit-based intensity value; N_ij = number of pixels with the value of i surrounded by pixels of value j.
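The texture entropy described above can be sketched in Python with NumPy. This is an illustrative reimplementation under stated assumptions, not the code used in the study: the function name `texture_entropy` is hypothetical, the logarithm is taken to base 2 (the base is not specified in the text), and all eight-bit values are counted, with the object mask deciding which pixels contribute.

```python
import numpy as np


def texture_entropy(image, mask):
    """Shannon entropy of 8-neighbor intensity co-occurrences inside an object.

    image: 2D array of eight-bit intensities; mask: 2D boolean object mask.
    """
    image = np.asarray(image, dtype=np.uint8)
    mask = np.asarray(mask, dtype=bool)
    counts = np.zeros((256, 256), dtype=np.int64)  # N_ij
    rows, cols = image.shape
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            # Overlapping windows: center pixels and their (dr, dc) neighbors.
            r0, r1 = max(0, -dr), min(rows, rows - dr)
            c0, c1 = max(0, -dc), min(cols, cols - dc)
            center = (slice(r0, r1), slice(c0, c1))
            neigh = (slice(r0 + dr, r1 + dr), slice(c0 + dc, c1 + dc))
            # Keep only pairs in which both pixels lie inside the object.
            both = mask[center] & mask[neigh]
            np.add.at(counts, (image[center][both], image[neigh][both]), 1)
    total = counts.sum()
    if total == 0:
        return 0.0
    x = counts[counts > 0] / total  # x_ij, zero entries contribute nothing
    return float(-(x * np.log2(x)).sum())


# A uniform object has zero entropy; a noisy object has a higher value.
rng = np.random.default_rng(0)
mask = np.ones((32, 32), dtype=bool)
print(texture_entropy(np.full((32, 32), 120), mask))
print(texture_entropy(rng.integers(1, 256, size=(32, 32)), mask))
```

Restricting the neighbor pairs to the mask is what keeps border pixels from picking up background intensities, in line with the description of edges and corners above.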

Cell Detection and Measurements
As can be observed in Figure 2, both algorithms (the one trained with color images and the one trained with fluorescence/monochrome-transmission images) could detect the blood cells that they were trained to classify, except for young neutrophils. There was no significant difference in the area detected by either of the algorithms. It should be considered that the objects are detected regardless of the number of realizations, as can be seen for the erythrocytes and the platelets. Additionally, it is crucial to mention that all the measurements and calculations were performed with cells that were not touching the border of the field of view. A future version of the IKOSA platform will exclude these objects automatically.
To corroborate the reliability of the measurements that follow the object detection, we determined the diameters of each type of blood cell identified by the algorithm and compared them to the values reported in the literature [4,6,43,67-72]. This demonstrated that our cell recognition delivered the diameters that are typical for each object's respective cell type (see Table 2). Furthermore, we tested our algorithm on publicly available images of leukocytes classified by independent pathologists. It has to be noted that the techniques presented in this study are only capable of correctly detecting the blood cells included in the development of the algorithm. Therefore, using images containing other cells, structures, textures, or an arrangement of colors similar to those found in any of the identifiable blood cells (erythrocytes, platelets, young neutrophils, segmented neutrophils, monocytes, lymphocytes, and eosinophils) could lead to erroneous detections (if the algorithm is applied to the wrong type of sample). Supplementary Figure S1 shows the results of the algorithm applied to images that do not contain any blood cells but rather adherent endothelial cells, demonstrating that the error rate is low with unrelated cells. Nevertheless, it is evident that the algorithm should only be applied to the type of sample (blood and leukocyte smears) for which it had been developed.
We found very narrow distributions of erythrocyte and platelet diameters, while leukocytes exhibited a broader variability of sizes (Figure 3).
According to the observational error metrics (Tables 3 and 4), the algorithm trained using color images was slightly more efficient than its monochromatic counterpart. However, as presented in Figure 2, as well as Tables 3 and 4, only the algorithm based on the monochromatic images was able to correctly differentiate between young neutrophils and segmented neutrophils. Hence, the multidimensional calculations shown below were based on detections with the TDR algorithm. We also used alternative randomizations of images into training and validation datasets (in total three different randomizations) to build up additional algorithms and tested their performances and variation. This revealed very similar results with only very low coefficients of variation (Supplementary Tables S2 and S3).
By combining the spatial distribution of pixels with their corresponding intensities, a multidimensional analysis allowed the quantification of the smoothness or roughness observed in the transmission, DAPI, or red-equivalent channels. Shannon's entropy interpretation suggests that a cell that shows a uniform distribution of intensities throughout would show lower entropy values. On the contrary, cells with high variability in their internal structures comprise a more diverse histogram of spatially dependent intensities and are thus associated with higher entropy levels. Moreover, discontinuities in the intensities of a cell's cytoplasm would also correlate with higher entropy values.
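For the size measurements reported in this section, two steps lend themselves to a short illustration: discarding objects that touch the border of the field of view, and converting a segmented area into a diameter. The sketch below is a hedged example rather than the IKOSA procedure. The equivalent circular diameter is an assumption, since the exact way in which diameters were derived is not detailed here, and the function names are hypothetical; the pixel size follows from the 648 pixels spanning 48 µm in the color images.

```python
import numpy as np

UM_PER_PX = 48.0 / 648  # pixel size of the color images, taken from the text


def touches_border(mask):
    """True if any pixel of the object lies on the edge of the field of view."""
    mask = np.asarray(mask, dtype=bool)
    return bool(mask[0, :].any() or mask[-1, :].any()
                or mask[:, 0].any() or mask[:, -1].any())


def equivalent_diameter_um(mask, um_per_px=UM_PER_PX):
    """Diameter of the circle with the same area as the object (assumption)."""
    area_um2 = np.count_nonzero(mask) * um_per_px ** 2
    return 2.0 * np.sqrt(area_um2 / np.pi)


# Hypothetical object: a filled disc with a radius of 50 pixels.
yy, xx = np.mgrid[:484, :648]
disc = (yy - 240) ** 2 + (xx - 320) ** 2 <= 50 ** 2
if not touches_border(disc):
    print(f"{equivalent_diameter_um(disc):.2f} µm")
```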
The multidimensional quantifications of the transmission images showed the lowest entropy values for erythrocytes, similarly low values for lymphocytes and platelets, and increasingly higher values for the remaining leukocyte subtypes (Figure 4). Nonetheless, there was no clear separation, especially between the leukocyte subtypes, except for the lymphocytes. However, the DAPI staining-based multidimensional quantification revealed a more cell type-specific grouping. Young neutrophils, segmented neutrophils, and monocytes yielded similar values, whereas the other leukocyte subtypes were clearly separated. Interestingly, erythrocytes were separated into two distinct populations in the DAPI quantifications, similar to the red fluorescence-based multidimensional analysis. After surveying the images, we found that the distinct values in the DAPI and red-equivalent quantifications were caused by a population of erythrocytes that had platelets on top of them mimicking internal structures, thereby causing a higher entropy. All the erythrocytes in that population belonged to the white blood cell concentrate smears (which always had some remaining erythrocytes), indicating that this was a preparation artifact.
In order to test the applicability of our method to the assessment of disease states, we applied our algorithm to images from a public database (provided by the American Society of Hematology: https://imagebank.hematology.org/, accessed on 10 November 2023) containing pathological conditions. The results presented in Figure 5 show highly segmented neutrophils from an anemia patient and erythrocytes with Pappenheimer bodies and basophilic stippling obtained from a patient with sickle cell disease. Neutrophils with a clear pathological condition deviated from the entropy/diameter clusters of normal cells. The separation observed in the entropy-cell diameter plane was higher for cells with more drastic structural modifications than for cells with minor morphological changes. In the case of sickle cell erythrocytes, the separation was smaller than for the heavily segmented neutrophils, apparently due to lower aberrations of internal structures.

Discussion
The big advantage of an AI-based workflow used to identify the various cell types of the blood is that objects can not only be detected and classified in an automated manner without any human workload; these objects can also be measured and quantified in a way that exceeds the possibilities of human observers. In our study, we were able to demonstrate that adding a nuclear dye and taking advantage of the endogenous fluorescence of the Giemsa components can improve the AI-based algorithm and cell recognition, thereby extending the exploratory potential of a traditional hematological diagnostic method, like blood smear microscopy. Importantly, the inclusion of the DAPI staining in the Giemsa protocol does not significantly increase the requirements of the method regarding resources or equipment.
The algorithms that we developed with our samples and images correctly detected and identified blood cells taken from publicly available databases and delivered the expected parameters for cell diameters, demonstrating their robustness for quantitative measurements. Moreover, the color-trained algorithm was able to nullify the effect of different tonalities in the cells' cytoplasm, abnormal shapes, and foreign bodies in their internal structures. As is known from various pathological conditions, the shape of cells can change with disease state. Erythrocytes and platelets usually exhibit a very well-defined size, and deviations from it are clear signs of pathological conditions [40,43,73]. Since we used blood from healthy subjects in this study, the distribution of cell diameters that we observed for both these cell types was compact, even after considering the overlapping in the erythrocytes. However, it can be anticipated that, for example, the erythrocyte distribution width changes with chemotherapy treatment of cancer patients, given that this affects cell division and thus erythropoiesis, and that this can be used for therapy monitoring purposes [74].
In contrast to erythrocytes and platelets, the leukocytes' diameters measured in our images showed a higher variance.Lymphocytes and monocytes increase in size as a sign of maturity [21,60].Monocytes become larger before they migrate into the lymphatic system, followed by their differentiation into macrophages [42,75,76].Similarly, neutrophils have been characterized as leukocytes, whose mechanical properties are intrinsically related to their activation state [11,36,77].Although activated neutrophils are usually not found in the blood, their plasticity leads to deformations that are expected to slightly broaden their size distribution once a blood smear is performed.Testing our algorithm with blood stain images from megaloblastic anemia patients verified that aberrations of neutrophils can be detected in a quantitative manner via entropy measures, emphasizing the potential of this automated image analysis for diagnostic purposes.Interestingly, little is known about the morphodynamics of eosinophils.Instead, an elevation or a reduction in their numbers has been used as a marker for various disease states.Considering eosinophils' crucial role in parasitic infections and allergies, quantifying their morphology might help to find trends in the dynamics of such pathologies.
It is clear that our study is not the first one on the application of AI methods to the classification of blood cells. There are several published reports on related approaches, especially for pathological conditions, such as anemia and leukemia [2,7,47,78]. However, most reported models have focused on structural modifications in only one specific type of blood cell, usually erythrocytes or lymphocytes. Furthermore, previously published AI models yield qualitative information about the cells studied, while our model yields quantitative information on the blood cells' morphology. This opens new possibilities to assess structural modifications associated with diseases that affect hematopoiesis or cellular activation states. It might be possible, for example, to detect developing leukemia by morphological anomalies that are imperceptible to the human eye but noticeable by AI algorithms, even before a significant increase in leukocyte numbers is observed. As shown in Figure 5, these imperceptible abnormalities can manifest as deviations from entropy or shape factor values expected for normal cells. As mentioned before, most of the previously reported algorithms were, by design, constrained to detect only one type of blood cell, and some required up to 10,000 images. The algorithms that we present in this study achieve comparable values of accuracy (>90%) and recall (>95%) for all our detectable objects [8,9,45,46,48,49,78,79]. However, the similarities between segmented neutrophils and young neutrophils required a higher number of cells for exact discrimination. In combination with the rather low number of young neutrophils (comprising about 3-5% of the leukocytes), it is clear that this limits the algorithm's ability to discriminate between these two states of neutrophils. An over-sampling of neutrophils was consequently required to differentiate young and mature neutrophils. Nevertheless, this had to be performed only for the training of the AI so that subsequent analyses can discriminate them precisely, at least with the algorithm that was based on a combination of transmitted light and DAPI, as well as red fluorescence (the TDR algorithm). Naturally, having a limited number of examples of an object causes some loss of accuracy of the algorithm, and objects that slightly differ from what is found in the database might be overlooked.
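The text above does not state how the over-sampling of neutrophils was carried out. As an illustration only, the following Python sketch shows one generic option, random over-sampling by duplication, in which training examples of a rare class are drawn repeatedly until the class is better represented. The function name, the label strings, and the factor of 5 are hypothetical and are not taken from the study.

```python
import numpy as np

def oversample_minority(labels, target_label, factor, seed=0):
    """Return training indices in which `target_label` is over-represented.

    Every original index is kept once. Indices of the target class are
    additionally drawn with replacement until that class occurs about
    `factor` times as often as in the original data.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    all_idx = np.arange(labels.size)
    minority_idx = all_idx[labels == target_label]
    if minority_idx.size == 0 or factor <= 1:
        return all_idx  # nothing to over-sample
    n_extra = int(round((factor - 1) * minority_idx.size))
    extra = rng.choice(minority_idx, size=n_extra, replace=True)
    return np.concatenate([all_idx, extra])

# Hypothetical label list: 4 young neutrophils among 100 leukocytes (4%).
labels = (["young_neutrophil"] * 4
          + ["segmented_neutrophil"] * 60
          + ["lymphocyte"] * 36)
idx = oversample_minority(labels, "young_neutrophil", factor=5)
resampled = np.asarray(labels)[idx]
```

Such resampling would apply to the training set only; validation images should keep their natural class proportions so that the reported accuracy and recall remain meaningful.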
An interesting observation during our study was that the AI algorithms initially had difficulties distinguishing between certain cells, such as eosinophils and erythrocytes, which are easily discriminated by human observers. The reason for this was apparently that the red color of erythrocytes was similar to the red staining of eosinophils, despite their clear differences in size and structure. However, with an increasing number of objects in the training dataset, this issue was solved. This finding emphasizes that AI algorithms based on neural networks reach their decisions in a way that differs significantly from humans. Consequently, either sufficient numbers of objects have to be provided for the training of the algorithm, or additional parameters, such as cell diameters, have to be used as "gating" values to deliver precise classifications. The latter would have required a completely different set-up of machine learning, which was not feasible in the current study. Nevertheless, our approach demonstrated a clear advantage of analyzing the pixels' intensities and arrangement to achieve an entropy quantification, which turned out to be a simple yet efficient way of clustering cells according to their visual features. This method quantifies how complex a cell's internal structure is. Consequently, cells with similar internal structures in a given staining are not easily separable. However, adding DAPI fluorescence to the entropy analysis proved to be an effective way to improve the classification. All the leukocyte subtypes showed individual clustering, except for the segmented neutrophils, which could not be separated from the young neutrophils. As expected, the red-fluorescence entropy values of the leukocyte subtypes did not differ significantly. In addition, the second population of erythrocytes that we observed only in the white blood cell-concentrated smears implies that cell preparation routines can affect the analysis. Nevertheless, our entropy-related measurement method detected internal structure modifications that are not reflected in differences in geometric parameters, such as cell diameters or perimeters.
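To make the idea of an entropy derived from pixel intensities and their arrangement more concrete, the following Python sketch computes a gray-level co-occurrence matrix (GLCM) for a single pixel offset and the Shannon entropy of that matrix. It is a minimal illustration under stated assumptions, not the implementation used on the IKOSA platform: the number of gray levels, the offset, the symmetrization, and the use of plain Shannon entropy (rather than the generalized GLCM entropy mentioned in the Conclusions) are all choices made here for simplicity. The sketch also works on a rectangular crop and ignores the segmentation mask of the cell.

```python
import numpy as np

def glcm(gray, levels=8, dx=1, dy=0):
    """Symmetric, normalized GLCM for one non-negative offset (dy, dx).

    Assumes a 2D image that is larger than the offset in both directions.
    """
    gray = np.asarray(gray, dtype=float)
    lo, hi = gray.min(), gray.max()
    if hi == lo:
        q = np.zeros(gray.shape, dtype=int)  # flat image: one gray level
    else:
        # Quantize intensities into `levels` bins spanning the image range.
        q = np.minimum((levels * (gray - lo) / (hi - lo)).astype(int),
                       levels - 1)
    h, w = q.shape
    ref = q[: h - dy, : w - dx].ravel()  # reference pixels
    nbr = q[dy:, dx:].ravel()            # neighbors at the given offset
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (ref, nbr), 1.0)        # count co-occurring level pairs
    m = m + m.T                          # count each pair in both directions
    return m / m.sum()

def glcm_entropy(gray, **kwargs):
    """Shannon entropy (in bits) of the normalized GLCM."""
    p = glcm(gray, **kwargs)
    nz = p[p > 0]                        # skip empty cells, since log2(0) is undefined
    return float(-(nz * np.log2(nz)).sum())

# Hypothetical 32 x 32 crop with random texture, standing in for a cell image.
rng = np.random.default_rng(0)
crop = rng.integers(0, 256, size=(32, 32))
textured = glcm_entropy(crop)
flat = glcm_entropy(np.ones((32, 32)))
```

A homogeneous crop yields an entropy of zero, whereas a crop with varied internal texture yields a higher value, which is the property that allows cells to be clustered by internal structure independently of their diameter or perimeter.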

Conclusions
In conclusion, we think that our multidimensional analysis, including the entropy measurements, bears the potential to detect alterations in internal structures that are associated with pathological states.
Although extensive research has been published on correlations between pathological conditions (such as depression, chemotherapy effects, and stroke risk) and leukocyte size, erythrocyte diameter, or leukocyte type distribution, none of these reports have explored internal cellular structures. Our algorithm can potentially follow the development of diseases and responses to therapies by tracking subtle morphological changes whose later stages are well-defined in histological studies, including parasite infections, various types of cancer, hematological conditions, or autoimmune diseases. We conclude that the algorithms developed during this study provide a means for automating the visual recognition and morphological quantification of different types of blood cells. In addition, combining the traditional Giemsa stain with DAPI enhances the recognition rate of otherwise similar cells, particularly young neutrophils and segmented neutrophils. Furthermore, our results shed light on the potential role of multidimensional analysis employing measurements of generalized GLCM entropy as a marker for blood cells' general condition.
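As a rough illustration of how such tracking could be set up, the Python sketch below compares the entropy and diameter of individual cells with a healthy reference population by means of robust z-scores based on the median and the median absolute deviation (MAD). This is an assumption made for illustration and not a procedure described in this study; the function names, the threshold of 3, and all numerical values are hypothetical.

```python
import numpy as np

def robust_z(reference, sample):
    """Robust z-scores of `sample` rows relative to `reference` rows.

    Columns are features (here: entropy, diameter). Assumes that the MAD
    of every reference feature is non-zero.
    """
    reference = np.asarray(reference, dtype=float)
    sample = np.asarray(sample, dtype=float)
    med = np.median(reference, axis=0)
    mad = np.median(np.abs(reference - med), axis=0)
    scale = 1.4826 * mad  # MAD rescaled to match the SD of normal data
    return (sample - med) / scale

def flag_deviating_cells(reference, sample, threshold=3.0):
    """True for each sample cell whose z-score exceeds the threshold in any feature."""
    z = robust_z(reference, sample)
    return np.any(np.abs(z) > threshold, axis=-1)

# Hypothetical (entropy, diameter in µm) values for healthy reference cells.
healthy = [[5.0, 12.0], [5.2, 12.5], [4.8, 11.5], [5.1, 12.2], [4.9, 11.8]]
# Hypothetical cells from a new sample: one typical, one with raised entropy.
cells = [[5.05, 12.1], [6.5, 12.0]]
flags = flag_deviating_cells(healthy, cells)
```

In this toy example, the second cell has a normal diameter and is flagged on its entropy alone, which mirrors the kind of structural change that geometric parameters would miss.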
The algorithms that we present in this study have been established for the blood of healthy individuals and proved able to detect pathological alterations seen in public image databases. Nevertheless, the algorithms and the AI training have to be extended to samples comprising well-defined disease states to unleash their full potential. We are confident that our approach lays the foundation for a new range of studies to quantify and characterize the blood cells of patients in various disease states. Since blood smear microscopy is technically very simple and does not require any expensive equipment, our AI-based analysis approach has the potential for widespread application. Even small diagnostic labs or general practitioners could use it for routine analysis of their patients. Comparing individual results with the database of common features and following personal changes over time, for instance, during treatment phases, has the potential to contribute to personalized precision health care.
errors for the color-trained algorithms; Table S2: Statistical quantification of the observational errors for three color-trained algorithms using three different randomizations of images into training and validation datasets; Table S3: Statistical quantification of the observational errors for three TDR-trained algorithms using three different randomizations of images into training and validation datasets.

Figure 1. Example of annotations in a color image (top row) and a monochromatic stack (bottom row). The three channels that constitute the stack are shown separately as images (A) (transmission), (B) (DAPI), and (C) (red fluorescence channel). Only the cells that are present in the field of view are labelled, according to the legend's pseudo-colors.

Figure 2. Example of the automatic detection of blood cells in color images (first and third row) and TDR stacks (second and fourth row). TDR images are transmission, DAPI, and red-fluorescence channels merged into one. The rainbow-colored areas with thin white borders represent objects (cells) that had been automatically identified by the algorithm. Red areas in rows two and four without white borders represent the red fluorescence of the Giemsa stain.

Figure 3. Distribution of the measured diameters for the identified blood cells. (A) Cells detected using the color-trained algorithm. (B) Cells detected using the TDR stack-trained algorithm.

Figure 4. Clustering of detected blood cells for transmission images, DAPI fluorescence, and red-equivalent fluorescence, in descending order. Blue denotes a lower occurrence density, whereas yellow indicates a higher density (see colorbar). In addition, due to their significantly higher occurrence, erythrocytes and platelets are given a transparency of 90%. The black dots represent the median values for each of the measurements used for clustering. A = platelets, B = erythrocytes, C = young neutrophils, D = segmented neutrophils, E = monocytes, F = lymphocytes, G = eosinophils. The erythrocytes whose quantification belongs to the highlighted population are circled with a dotted line.

Figure 5. Entropy/cell diameter scattergrams of healthy versus pathological blood cells. (A) Neutrophils from a patient with megaloblastic anemia (red dots, https://imagebank.hematology.org/collection/62441 accessed on 10 November 2023) compared to healthy controls. (B) Erythrocytes from a sickle cell disease patient (red dots, https://imagebank.hematology.org/collection/3980 accessed on 10 November 2023) compared to healthy controls. Blue denotes a lower occurrence density, whereas yellow indicates a higher density (see colorbar). The black dots represent median values for each of the measurements used for the clustering of healthy cells. Arrows in the original microscopy images come from the image database and did not impair the analysis by the algorithm.

Table 1. Number of objects annotated for each label used in the algorithm.

Table 2. Comparison of cell diameters [µm] obtained with the two AI-based instance segmentation algorithms and values reported in the literature.

Table 3. Statistical quantification of the observational errors for the color-trained algorithm (table entries are colored according to the respective values, from 0: white to 1: full green).

Table 4. Statistical quantification of the observational errors for the TDR-trained algorithm (table entries are colored according to the respective values, from 0: white to 1: full green).