Article

AI-Driven Analysis for Real-Time Detection of Unstained Microscopic Cell Culture Images

1 Pathology, Faculty of Medicine, University of Augsburg, 86156 Augsburg, Germany
2 LABMaiTE GmbH, 79110 Freiburg, Germany
3 Institute of Mathematics, University of Augsburg, 86159 Augsburg, Germany
4 Hematology and Oncology, Faculty of Medicine, University of Augsburg, 86156 Augsburg, Germany
5 Comprehensive Cancer Center Augsburg (CCCA), 86156 Augsburg, Germany
6 Bavarian Cancer Research Center (BZKF), 86156 Augsburg, Germany
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
AI 2025, 6(10), 271; https://doi.org/10.3390/ai6100271
Submission received: 12 September 2025 / Revised: 14 October 2025 / Accepted: 16 October 2025 / Published: 18 October 2025
(This article belongs to the Special Issue AI in Bio and Healthcare Informatics)

Abstract

Staining-based assays are widely used for cell analysis but are invasive, alter physiology, and prevent longitudinal monitoring. Label-free, morphology-based approaches could enable real-time, non-invasive drug testing, yet detection of subtle and dynamic changes has remained difficult. We developed a deep learning framework for stain-free monitoring of leukemia cell cultures using automated bright-field microscopy in a semi-automated culture system (AICE3, LABMaiTE, Freiburg, Germany). YOLOv8 models were trained on images from K562, HL-60, and Kasumi-1 cells on an NVIDIA DGX A100 GPU and tested in both GPU and CPU environments for real-time performance. Comparative benchmarking against RT-DETR and interpretability analyses using Eigen-CAM and radiomics (RedTell) were performed. YOLOv8 achieved high accuracy (mAP@0.5 > 98%, precision/sensitivity > 97%), with reproducibility confirmed on an independent dataset from a second laboratory and an AICE3 setup. The model distinguished between morphologically similar leukemia lines and reliably classified untreated versus differentiated K562 cells (hemin-induced erythroid and PMA-induced megakaryocytic; >95% accuracy). Incorporation of decitabine-treated cells demonstrated applicability to drug testing, revealing treatment-specific and intermediate phenotypes. Longitudinal monitoring captured culture- and time-dependent drift, enabling separation of temporal from drug-induced changes. Radiomics highlighted interpretable features such as size, elongation, and texture, but with lower accuracy than the deep learning approach. To our knowledge, this is the first demonstration that deep learning resolves subtle, drug-induced, and time-dependent morphological changes in unstained leukemia cells in real time. This approach provides a robust, accessible framework for label-free longitudinal drug testing and establishes a foundation for future autonomous, feedback-driven platforms in precision oncology. Ultimately, this approach may also contribute to more precise and adaptive clinical decision-making, advancing the field of personalized medicine.

1. Introduction

In the search for new cancer therapeutics, in vitro drug testing and drug screening in cell lines and on primary samples and their derivatives have proven to be indispensable tools that accelerate the drug discovery process. This approach allows researchers to evaluate the efficacy and safety of potential drug candidates before advancing to more complex and ethically challenging in vivo studies [1,2,3,4]. One of the primary advantages of in vitro drug testing is the ability to conduct experiments in a highly controlled environment. Cell lines provide a simplified yet representative model of cellular behavior. This enables precise investigation of the impact of drug candidates on phenotypic changes and specific molecular pathways. Furthermore, it facilitates high-throughput screening, allowing many compounds to be assessed efficiently and accelerating the drug discovery timeline by quickly identifying promising candidates for further investigation. As such, in vitro drug testing may have utmost relevance in personalized medicine, allowing researchers to tailor drug-screening assays to individual genetic profiles, predicting how a patient might respond to a specific treatment and optimizing therapeutic strategies [5]. Automated and advanced imaging technologies further enhance the speed and accuracy of compound screening, enabling researchers to sift through vast libraries of potential drugs and combinations [6]. However, the ultimate utility of cellular models for drug testing depends on accurate real-time longitudinal monitoring and recognizing relevant molecular and/or phenotypic changes such as differentiation, induction of cell death, and others [7]. The ability to monitor such experiments in real time, without staining or fixation, remains a critical unmet need.
Conventional staining methods such as immunofluorescence provide valuable molecular insights, but they have fundamental drawbacks. Stains and labels are often invasive, may alter cellular physiology, and prevent true longitudinal observations. Moreover, marker expression is not always specific or universally available, and labeling can reduce viability or even activate signaling pathways [8]. These limitations make unbiased, real-time monitoring of cells in their native state essentially impossible with staining-based techniques.
Computer-aided approaches based on unstained, morphology-based analysis represent an attractive alternative. Yet, previous approaches to label-free phenotyping have been restricted to relatively coarse distinctions, such as separating cell groups with overt morphological differences (e.g., lymphocytes, granulocytes, and erythrocytes), or to viability assessments in simple contexts [9,10,11]. Classical radiomics-based pipelines, while interpretable, rely on handcrafted features such as size, shape, and texture of the cytoplasm and nucleus [9,12] and therefore lack the sensitivity to capture the subtle and dynamic phenotypes required for modern drug testing. Crucially, no method has yet demonstrated the ability to track fine-grained, time- and drug-dependent morphological changes in morphologically similar cells under fully stain-free conditions.
Beyond their methodological importance, such approaches are also relevant from a cellular pathology perspective. Reliable, morphology-based recognition could support not only preclinical drug testing but also key clinical tasks such as diagnosis, treatment monitoring, and prognosis (e.g., remission or relapse assessment, risk stratification, and detection of metastatic spread). Continuous, non-invasive observation has potential for disease surveillance and could provide insight into clonal evolution, a major contributor to relapse and therapy resistance. Equally important, minimizing false negatives is critical to avoid missing rare but clinically relevant subclones. These expectations underscore the broader significance of developing robust, real-time, label-free monitoring systems.
In this work, we address this gap by introducing a deep learning–based system capable of real-time analysis of unstained suspension cell cultures. Our model distinguishes between closely related leukemia cell lines, detects differentiation states induced by specific agents, and separates drug-induced morphological effects from gradual culture-time drift. This establishes a non-invasive framework for longitudinal drug testing, enabling continuous monitoring of living cells in their native state without staining, fixation, or disruption. By closing this methodological gap, our approach lays the foundation for autonomous, feedback-driven drug testing platforms in precision oncology.

2. Materials and Methods

2.1. Cell Culture and In Vitro Treatment

The myeloid cell lines K562, HL-60, and Kasumi-1 (German Collection of Microorganisms and Cell Cultures GmbH, DSMZ, Braunschweig, Germany) were cultured in RPMI 1640 medium (Gibco, Waltham, MA, USA) containing 5% Penicillin/Streptomycin (Gibco) and 10% (K562) or 20% (HL-60, Kasumi-1) heat-inactivated fetal calf serum (Sigma-Aldrich, St. Louis, MO, USA). Cells were maintained at 0.2 × 10⁶ cells/mL in T75 flasks and split every 3 to 4 days.
K562 cells were treated with 50 µM hemin (Sigma-Aldrich) for six days and with 5 nM phorbol myristate acetate (PMA, Sigma-Aldrich) for 48 h for induction of erythroid differentiation and megakaryocytic differentiation, respectively. Decitabine (DAC, Sigma-Aldrich) treatment was conducted with three pulses of 20 or 100 nM at 24 h intervals, and cells were harvested after six days, as previously described by Stomper et al. [13]. Cell numbers were counted by 0.4% trypan blue staining and viability was measured using the NucleoCounter system (ChemoMetec A/S, Allerød, Denmark).

2.2. Hemoglobin (HBA) ELISA

Hemoglobin α-subunit (HBA) levels in cell culture supernatants from leukemic cell lines were quantified using the Human Hemoglobin Alpha ELISA Kit MBS167177 (MyBioSource, Inc., San Diego, CA, USA) according to the manufacturer’s instructions. Supernatants and recombinant standards were assayed in duplicate, and measurements were performed in three independent biological replicates. Absorbance was read at 450 nm, and concentrations were calculated from the standard curve with adjustment for dilution factors.
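For illustration, concentrations can be read off the standard curve by interpolation and then corrected for dilution; the sketch below uses hypothetical standard values (in practice the curve comes from the kit's recombinant standards):

```python
import numpy as np

# Hypothetical standard curve: absorbance at 450 nm for known HBA standards.
std_conc = np.array([0.0, 12.5, 25.0, 50.0, 100.0, 200.0])  # ng/mL
std_abs = np.array([0.05, 0.12, 0.22, 0.41, 0.78, 1.45])    # OD450

def hba_concentration(sample_abs, dilution_factor=1.0):
    # Interpolate along the monotonic standard curve, then correct for dilution.
    return np.interp(sample_abs, std_abs, std_conc) * dilution_factor

print(hba_concentration(0.60, dilution_factor=2.0))  # ng/mL in the original supernatant
```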

2.3. Automated Cell Culture and Image Generation

For image generation, cells were transferred to a 50 mL Falcon tube (reactor) at a minimum concentration of 0.2 × 10⁶ cells/mL in 20 mL media. All preparations of the reactor were carried out under laminar flow. The reactor containing the cells was assembled for automated cell culture and image generation in a custom-developed semi-automated AICE3 device (LABMaiTE, Freiburg, Germany) inside an incubator, as described in Figure 1A. The device comprises a peristaltic pump system and a microscope with a 20× objective and an integrated camera and is kept in an incubator at 37 °C and a 5% CO2 atmosphere to ensure the correct conditions for cell culture during imaging. Cell suspension mixing was performed automatically by the integrated peristaltic pump of the AICE3 system, operating at a flow rate of 35 mL/min to ensure homogeneous distribution of cells and medium. At each step of image generation, five mL of cell suspension were pumped from the reactor through a 0.2 µm Luer slide (ibidi GmbH, Gräfelfing, Germany) attached to the microscopic device. Cells were allowed to sediment on the slide for three minutes to ensure optimal focus before image generation. Images were acquired at a native resolution of 1920 × 1080 pixels with 24-bit RGB color depth (8 bits per channel) using the integrated bright-field camera of the AICE3 system. At each step, four different images were generated and cell counts were documented as the mean of the detected cells over all four images. With this setup, image generation was performed for 1.5 h, resulting in 30 images per experimental run. Images without cells were used for background training. For longer experiments, half of the reactor volume was drained and refilled with fresh media about every 2.5 days, and the interval between images was extended to 2 h. Cell imaging was performed independently at two laboratories using separate AICE3 setups, which introduced variation in illumination, focus, and culture conditions. Distinct cell batches from different passages (2–19) were included.

2.4. Image Processing

Roboflow [14] was used for annotation, splitting, and augmentation (flips, rotations). The annotation process was performed manually for each cell type and segmentation masks were created using Roboflow’s smart polygon tool. Annotation of training images was performed exclusively on mono-culture datasets, avoiding any mixed-cell populations. For each image, only two categories were distinguished: cell or non-cell (the latter including bubbles, debris, or other background structures, Supplementary Figure S1). An object was annotated as a cell if it displayed an intact cell body with clearly defined borders; objects without these features were labeled as non-cell. Annotation was conducted independently by three researchers experienced in cell culture. Although inter-rater reliability was not systematically quantified, the binary nature of the task and use of mono-cultures ensured a high degree of concordance across annotators.

2.5. Model Training and Validation

The data was split into training, validation, and test subsets (ratio 70:20:10) to facilitate effective model training, hyperparameter tuning, and performance evaluation and to ensure reliable and robust machine learning applications.
The models were trained on an NVIDIA DGX A100 using the Ultralytics YOLOv8 environment [15]. YOLOv8 is a single-shot, state-of-the-art object detection algorithm with high reported accuracy on COCO [16] and Roboflow benchmarks. Differently sized YOLOv8 models (n, s, m, l) and the transformer-based Real-Time Detection Transformer (RT-DETR) were evaluated; YOLOv8 was chosen for all subsequent analyses. The overall detection framework, including integration of the semi-automated culture system, imaging pipeline, YOLOv8-based analysis, and interpretability components, is illustrated in Figure 1A. A more detailed depiction of the training and evaluation workflow is provided in Figure 1B. The code snippets used for training, validation, and prediction are listed in Supplementary Figure S2; the confidence thresholds (objects with scores below the threshold were discarded as false positives) and the remaining hyperparameters (image size, training epochs) are given in Supplementary Table S1. To investigate the interpretability of the model, we applied both salience-based visualization and radiomics-based approaches.
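For orientation, training, validation, and prediction with the Ultralytics API follow the pattern sketched below; the dataset path and hyperparameter values are placeholders, as the values actually used are listed in Supplementary Figure S2 and Table S1.

```python
from ultralytics import YOLO

# Start from a pretrained YOLOv8-small checkpoint (transfer learning).
model = YOLO("yolov8s.pt")

# Train on the Roboflow-exported dataset; epochs/imgsz are placeholders.
model.train(data="dataset/data.yaml", epochs=100, imgsz=640)

# Evaluate on the held-out test split.
metrics = model.val(split="test")
print(metrics.box.map50)  # mAP@0.5

# Predict on new images; detections below the confidence threshold are discarded.
results = model.predict("new_images/", conf=0.5)
```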
Eigen-CAM was used to highlight the image regions contributing most strongly to predictions of the YOLOv8 model [17]. We used a public Eigen-CAM library for YOLOv8 (https://github.com/rigvedrs/YOLO-V12-CAM, accessed on 20 June 2024) on randomly selected cell crops. Activations of the final convolutional layers were projected onto their first principal component, and the resulting class-activation maps were used to visualize the areas of highest model attention in each detection. These maps were overlaid on the original microscopy images to allow qualitative assessment of whether the model’s focus corresponded to biologically meaningful regions.
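In essence, Eigen-CAM projects the feature maps of a chosen convolutional layer onto their first principal component, requiring no gradients or class labels. A minimal NumPy sketch of that core computation (the cited library wraps this for YOLOv8 with its own interface):

```python
import numpy as np

def eigen_cam(activations):
    """Eigen-CAM core idea: project the feature maps of one conv layer
    onto their first principal component.

    activations: array of shape (C, H, W) from a late convolutional layer.
    Returns an (H, W) saliency map scaled to [0, 1].
    """
    c, h, w = activations.shape
    # Treat each spatial location as a sample with C channel features.
    flat = activations.reshape(c, h * w).T          # (H*W, C)
    flat = flat - flat.mean(axis=0, keepdims=True)  # center the features
    # First right-singular vector = first principal-component direction.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    cam = (flat @ vt[0]).reshape(h, w)              # project onto PC1
    cam = np.maximum(cam, 0)                        # keep positive activations
    return cam / (cam.max() + 1e-8)                 # normalize for overlay
```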
To complement this qualitative approach, we employed RedTell, a radiomics-based feature extraction tool for single-cell imaging data [12]. From each segmented cell, a set of morphological and texture descriptors was calculated, including size, elongation, solidity, perimeter ratios, and gray-level texture metrics such as variance and run-length emphasis. These features provided quantitative descriptors that could be related to known biological changes such as cell enlargement during differentiation, alterations in nuclear condensation, or irregular outlines associated with apoptosis.
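As an illustration of this feature family (not RedTell's actual interface), per-cell descriptors of this kind can be computed from a segmentation mask with scikit-image:

```python
import numpy as np
from skimage.measure import label, regionprops

def basic_cell_features(mask, image):
    """Illustrative subset of RedTell-style per-cell descriptors:
    size, elongation, solidity, perimeter ratio, intensity statistics."""
    features = []
    for region in regionprops(label(mask), intensity_image=image):
        pixels = image[region.coords[:, 0], region.coords[:, 1]]
        features.append({
            "area": region.area,
            "perimeter": region.perimeter,
            "elongation": region.major_axis_length / max(region.minor_axis_length, 1e-8),
            "solidity": region.solidity,                        # area / convex hull area
            "perimeter_surface_ratio": region.perimeter / region.area,
            "intensity_mean": region.mean_intensity,
            "intensity_var": np.var(pixels),                    # simple texture proxy
        })
    return features

if __name__ == "__main__":
    img = np.zeros((64, 64)); img[20:40, 24:36] = 0.8  # one synthetic "cell"
    print(basic_cell_features(img > 0, img))
```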
In addition, we applied Orthogonal Matching Pursuit (OMP) for sparse feature selection within the radiomics feature space. OMP identifies the most discriminative subset of features that contribute to class separation, thereby enhancing interpretability and reducing redundancy. This allowed us to focus on a compact set of features with direct biological relevance.
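A minimal sketch of this selection step with scikit-learn, using a synthetic stand-in for the radiomics matrix:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

# Synthetic stand-in for the radiomics matrix: 200 cells x 74 features,
# with the binary label (one cell line vs. rest) driven by two of them.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 74))
y = (X[:, 3] + 0.5 * X[:, 10] > 0).astype(float)

# Cap the sparse model at 8 features (the per-line subsets reported in
# Section 3.6 ranged from 4 to 8 features).
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=8)
omp.fit(X, y)

selected = np.flatnonzero(omp.coef_)  # indices of the retained features
print("most discriminative feature indices:", selected)
```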

3. Results

To explore whether deep learning (DL) can detect morphological differences in unstained cells, we used suspension cell lines cultured with the semi-automated AICE3 device. An overview of all baseline and application datasets is provided in Table 1 and Table 2.

3.1. Single-Class Cell Line Detection

We first tested the human chronic myeloid leukemia cell line K562 (Figure 2A). To build a robust dataset, we included not only in-focus cells but also images with out-of-focus cells, debris, bubbles, and background-only samples as negative controls (22 images), as exemplified in Supplementary Figure S1. The final dataset contained 436 images with an average of 15.5 annotations per image (range 0–54), yielding a total of 6764 annotations (Table 1).
All YOLOv8 detection models (s, n, m, l) achieved similar metrics, but larger models required longer training. To balance speed and accuracy, we selected YOLOv8-s for further analysis. In the test set, the model correctly identified 570 true-positive K562 cells (95.6%). Misclassifications were rare: 15 false positives (mainly bubbles or debris) and 11 false negatives (out-of-focus cells) (Figure 2B). Precision and recall were both above 97% (Figure 2C). Standard detection metrics, including mean average precision (mAP@[0.5:0.95]) and per-class average precision (AP), are summarized in Table 3.
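These rates follow directly from the reported counts; a quick check:

```python
tp, fp, fn = 570, 15, 11    # counts from the K562 test set (Figure 2B)
precision = tp / (tp + fp)  # 570/585 ≈ 0.974
recall = tp / (tp + fn)     # 570/581 ≈ 0.981
```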
To assess robustness and reproducibility, we validated the model in a second laboratory using an independent AICE3 device in Freiburg. K562 cells were cultured and imaged on this system, generating a dataset of 75 images with varying cell densities (Supplementary Figure S3). This dataset inherently included variation in illumination, focus, confluence, and culture conditions, as well as distinct cell batches (passages 2–19). Despite this technical and biological variability, the model reliably detected unstained cells. Precision was slightly reduced due to occasional misinterpretation of background structures; however, all cells were detected, and no false negatives occurred. These results demonstrate that the YOLOv8-based approach is stable, reliable, and accurate across independent laboratories and imaging setups.
To test real-time applicability, we compared latency and throughput on GPU and CPU environments. On the GPU, prediction required 5.4 ms, with preprocessing and postprocessing adding 0.9 ms, and an average of 33.3 ms for 75 images (>370 cells). CPU-based deployments were substantially slower: on a Kubernetes/Kubeflow environment preprocessing, prediction, and postprocessing took 14.1 ms, 2468.9 ms, and 38.4 ms, while on a local CPU they took 16.3 ms, 1301.9 ms, and 1.5 ms. These results show that GPU-based inference is preferable for real-time detection, although CPU-based execution remains feasible in laboratory settings without dedicated GPUs.
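Such measurements can be reproduced from the per-stage timings reported by the Ultralytics results objects; a minimal sketch (the weight path is a placeholder):

```python
import time
from ultralytics import YOLO

model = YOLO("best.pt")                     # placeholder path to trained weights

model.predict("sample.png", verbose=False)  # warm-up so initialization is not timed

t0 = time.perf_counter()
results = model.predict("test_images/", verbose=False)
per_image_ms = (time.perf_counter() - t0) * 1e3 / max(len(results), 1)

print(f"end-to-end: {per_image_ms:.1f} ms/image")
print(results[0].speed)  # per-stage ms: preprocess, inference, postprocess
```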
Finally, we compared YOLOv8 with the transformer-based RT-DETR architecture. RT-DETR achieved competitive accuracy but was slower and less consistent. YOLOv8 provided a superior balance of accuracy and real-time performance, which was critical for our setup, and was therefore chosen for all subsequent analyses.

3.2. Detection of Longitudinal Morphological Changes

Prolonged culture is expected to induce subtle morphological changes, independent of drug treatment. To separate such effects from substance-specific changes in a longitudinal drug-testing context, we analyzed the impact of culture time on K562 cells over six days. Cells were maintained under standard conditions without splitting or media change and imaged after six days, referred to as K562d6. The dataset comprised 156 images with 3913 annotations, averaging 25.1 cells per image (range 0–79). Five images contained only background. A subset of the baseline K562 dataset was used as reference. Using this setup, the YOLOv8-n model achieved high precision and sensitivity (>95%) and ~99% specificity in distinguishing fresh K562 cells from K562d6. The high mean average precision (mAP) scores further confirmed accurate cell detection (Table 3).
To test performance in a dynamic longitudinal experiment, K562 cells were continuously cultured for over seven days (190 h) in the AICE3 device. Images were acquired every two hours, generating 380 images in total. Across this time span, the proportion of cells classified as K562d6 gradually increased, reflecting the natural morphological drift associated with prolonged culture (Figure 2D). These results show that the AI model can detect subtle, time-dependent morphological changes in unstained cells that are not obvious to the human eye, allowing correction for culture-associated effects in longitudinal analyses. The validity of this detection is supported by the fact that the virtual mixing ratios of ‘young’ and ‘old’ cells closely mirrored the expected gradual culture-associated drift over time.
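The drift quantification itself is simple bookkeeping over per-image detections; a sketch with synthetic counts standing in for the model outputs:

```python
import numpy as np

# Per-time-point detection counts for the two classes; synthetic stand-ins
# for the model outputs collected every 2 h over the ~190 h run.
rng = np.random.default_rng(1)
hours = np.arange(0, 190, 2)
drift = hours / hours.max()                 # gradual shift toward 'K562d6'
k562d6 = rng.binomial(100, drift)           # 'old' detections per time point
k562 = 100 - k562d6                         # 'young' detections per time point

frac_old = k562d6 / (k562 + k562d6)         # class proportion per time point

# Moving average (window of 5 points = 10 h), as in Figure 2D's trend line.
window = np.ones(5) / 5
trend = np.convolve(frac_old, window, mode="same")
```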

3.3. Multiclass Leukemia Cell Line Discrimination

We next extended the model to a multi-class setup by adding two morphologically similar acute myeloid leukemia cell lines, HL-60 and Kasumi-1, alongside K562. To maintain a balanced dataset, only a subset of K562 images was used, considering both image numbers and cell density. In total, the dataset comprised 399 images with an average of 23.8 annotations per image (range 0–139), resulting in 2682 annotations for K562, 3319 for HL-60, and 3498 for Kasumi-1. Among the tested architectures, YOLOv8-m performed best. High mAP values (mAP@0.5 = 98.3%; mAP@[0.5:0.95] = 74.3%) indicated strong overlap between ground-truth annotations and the model’s bounding box predictions. The model achieved a very high rate of true positives across all classes, with misclassifications occurring only rarely and mainly involving background (e.g., out-of-focus cells or bubbles). Only four K562 cells were incorrectly classified as HL-60. Average sensitivity and specificity exceeded 97%, and precision was 94.6% (Figure 3A,B). Detailed metrics are shown in Table 3. These results demonstrate that the trained model can reliably distinguish three morphologically similar leukemia cell lines based solely on unstained microscopic images.

3.4. Applying the Cell Line Recognition Model to a Cellular Differentiation Model

We next applied the recognition model to a previously described K562-based differentiation assay [13]. K562 cells were treated with hemin to induce erythroid differentiation and with PMA to induce megakaryocytic differentiation. In this model, not all cells fully differentiate; rather, they display gradual changes in morphology. As illustrated in Figure 4B, untreated, hemin-treated, and PMA-treated K562 cells exhibit clear morphological differences. Hemin treatment increased cell size and cytoplasmic opacity, while PMA treatment induced irregular morphology and greater granularity, in contrast to the baseline morphology of untreated K562 cells. The Differentiation Model dataset contained 216 images with 3041 annotations, averaging 14.1 annotations per image. Misclassifications were rare: three untreated K562 cells were incorrectly labeled as PMA-treated, and 15 background structures (debris or bubbles) were initially annotated as cells. Despite these few errors, YOLOv8-s achieved high performance, with precision, sensitivity, and specificity all around 95% (Table 3). These results show that the trained model successfully captured treatment-specific morphological differences and reliably distinguished untreated K562 cells from those treated with hemin or PMA.

3.5. Applicability of the Cell Differentiation Model to a Drug Testing Setup

We next tested the hypomethylating agent decitabine (DAC) (Figure 4A). In patients with acute myeloid leukemia (AML), DAC induces fetal hemoglobin expression, which serves as a dynamic biomarker of outcome. In vitro, DAC preferentially promotes erythroid over megakaryocytic differentiation [13]. We therefore introduced DAC-treated K562 cells as a new class in the training model.
We first assessed whether object detection could distinguish between treatment concentrations. The dataset included 2005 annotations from cells treated with 20 nM DAC and 1537 annotations from cells treated with 100 nM DAC. None of the models reliably separated these two concentrations (Table 3). Consequently, both concentrations were merged into a single “DAC” class, and 20 additional images with 1194 annotations were added.
The resulting dataset included four classes: untreated K562, hemin-treated, PMA-treated, and DAC-treated cells. In this setup, YOLOv8-m achieved high precision, sensitivity, specificity, and strong mAP scores (Table 3). To validate performance, we tested predefined mixing ratios of the four classes (Figure 4C,D). DAC cells were only detected in experiments where they were included (DAC–hemin/DAC–K562). In 50:50 DAC–hemin mixtures, hemin cells were identified with high precision but slightly underestimated (−10%), while DAC cells were substantially underestimated (−38%) and often misclassified as untreated K562. A similar pattern was observed in DAC–PMA mixtures, where DAC cells were frequently misclassified as untreated K562. In DAC–K562 mixtures, DAC cells were underestimated by ~25%, whereas untreated K562 cells were almost perfectly classified (98%). To provide orthogonal support for these differentiation assays, we measured hemoglobin induction by ELISA (Supplementary Figure S4). Hemin treatment caused a robust increase in hemoglobin concentration, consistent with erythroid differentiation. In contrast, PMA and DAC (20 nM or 100 nM, alone or in combination) produced little or no induction. These findings are consistent with previous reports [13] confirming robust hemoglobin induction in the hemin model. Although the ELISA does not provide single-cell resolution or proportional quantification of differentiation, it supports the differentiation capacity detected by the trained model.

3.6. Morphological Insights

To add interpretability to the model’s performance, we first applied Eigen-CAM to generate heatmaps highlighting image regions that contributed to classification of the three leukemia cell lines (Figure 5). As shown in Figure 5A, Eigen-CAM occasionally highlighted background regions, making its results not always conclusive.
To complement this, we performed classical radiomic feature extraction using RedTell, comparing conventional shape and texture descriptors with the deep learning model [12]. A total of 74 features were extracted per cell (examples in Figure 5B). To identify the most informative features, we applied Orthogonal Matching Pursuit (OMP), which iteratively selects the best feature combinations. This approach was tested on the three cell line datasets, resulting in a binary classification for each line. The selected variables are summarized in Supplementary Table S2.
For HL-60 cells, the model relied on intensity energy, RMS intensity, solidity, elongation, gray-level run-length matrix (GLRLM) variance, and perimeter–surface ratio, yielding 71% accuracy. For K562 cells, the most relevant features were perimeter and three gray-level emphasis measures, resulting in 82% accuracy. For Kasumi-1, eight features (Long-Run Low Gray-Level Emphasis/LRLGLE, minimum intensity, gray-level variance, diameter, Short-Run Low Gray-Level Emphasis/SRLGLE, maximum intensity, Low Gray-Level Run Emphasis/LGLRE, lmc12) were identified, achieving 78% accuracy. Precision–recall curves for all classes are shown in Figure 6. Overall, the extracted radiomic features provided some explanatory insight into morphological differences between leukemia cell lines. However, none of the explainable models reached the high accuracy of the deep learning approach.

4. Discussion

In vitro drug testing has become an important tool in cancer research. It enables high-throughput screening under controlled conditions and supports personalized medicine by linking therapeutic responses to individual molecular and genetic profiles. Conventional staining-based assays, however, are invasive, may alter physiology, and prevent sequential monitoring of dynamics. Label-free, morphology-based analysis offers a promising alternative, allowing real-time, non-invasive observation. Advances in DL and automated microscopy now make unbiased high-throughput phenotyping feasible, but detecting subtle drug-induced changes remains challenging.
Previous studies have shown that unstained cells can be detected [18,19,20], yet these were limited to cell groups with distinct morphologies. Here, we investigated whether DL models can discriminate unstained leukemia cells with highly similar morphology. AML cell lines provide a suitable test case: they grow in suspension, enabling automated culture in closed systems, and are increasingly used in drug-testing approaches [21].
Our models reliably detected unstained cells and distinguished them from non-cellular components. They also classified highly similar leukemia cell lines with high accuracy. While real-time detection usually requires GPU-based execution, we showed that even CPU-based systems can deliver rapid predictions, broadening applicability to routine laboratories without specialized infrastructure. YOLOv8 proved particularly effective due to its accuracy, efficiency, and real-time capability, while RT-DETR showed comparable accuracy but slower, less stable inference. At the time of study initiation (early 2023), YOLOv8 represented the state of the art. Since then, further refinements (e.g., TSD-YOLO, YOLOv12) and transformer-based segmentation models (e.g., Segment Anything) have been introduced [22,23,24,25,26,27,28,29]. Although systematic benchmarking against these newer models was beyond our scope, they represent promising directions for future work.
In summary, these results lay the foundation for complex experiments with a straightforward, accessible, and fast readout, enabling measurement of subtle changes (e.g., specific responses to certain substances) in a drug screening setup in near real-time.
By eliminating pre-processing steps like staining and fixation, cytotoxic effects are avoided. This makes it possible to keep the cells in a closed system, fostering a feedback loop that enhances experimental efficiency and allows longitudinal experiments [30]. However, cells in cultivation carry out physiological processes naturally, such as metabolizing and consuming media over time. Additionally, cells undergo nutrient depletion and waste accumulation [31]. These effects can lead to subtle but significant morphological changes, which may play a critical role in longitudinal drug testing experiments [32]. In longitudinal experiments, our AI model detected culture- and time-associated drift, demonstrating sensitivity to gradual changes not evident to the human eye. This ability is critical in drug testing, as it allows separation of temporal effects from drug-specific responses. Applied to differentiation assays, the model accurately classified erythroid (hemin) and megakaryocytic (PMA) induction, as well as decitabine treatment, with high precision. Importantly, it appeared sensitive to intermediate phenotypes, suggesting that the model captures differentiation as a continuum rather than as binary states, an important consideration for tracking dynamic cellular responses.
However, one of the key challenges in AI applications for biomedical imaging is understanding how models make their decisions. While Eigen-CAM provided initial insights into the model’s focus, we observed that this technique is prone to background interference, with non-cellular regions occasionally highlighted. Such artifacts illustrate a general limitation of saliency-map approaches in microscopy, where illumination or background can confound interpretability. To address this, we complemented CAM analyses with RedTell-based radiomics, which yield biologically interpretable descriptors. Several of the most discriminative radiomics features correspond closely to known morphological characteristics of leukemia cells. For instance, measures of size and elongation reflect differentiation-associated shifts in cell geometry, while texture features (e.g., gray-level variance, run-length emphasis) align with cytoplasmic granularity and nuclear condensation. Solidity and perimeter ratios capture irregular cell outlines, often linked to stress or apoptosis. By connecting these descriptors to established morphological traits, we provide a logical bridge between the model’s data-driven outputs and the underlying biology.
Importantly, our YOLO-based model extends beyond these explainability approaches. While Eigen-CAM highlights image regions and RedTell captures predefined morphological descriptors, both remain limited in sensitivity and accuracy. In contrast, our deep learning framework not only achieves superior performance but also resolves subtle, drug-induced, and time-dependent morphological changes that were previously inaccessible with stain-free or radiomics-based methods. To our knowledge, this is the first demonstration of a deep learning system enabling such real-time, non-invasive monitoring of morphologically similar leukemia cell lines in a drug testing context.
Together, our results demonstrate the feasibility of robust, label-free, real-time monitoring of leukemia cells and highlight the potential of AI-driven morphology analysis to complement conventional assays. Furthermore, they lay the groundwork for fully automated, feedback-driven drug testing platforms. Ultimately, such approaches could speed drug discovery and enable rapid, individualized treatment adaptation in precision oncology.
Beyond methodological advances, our results highlight potential applications aligned with standard expectations in cellular pathology. Accurate morphology-based detection can support diagnosis, treatment monitoring, and prognosis, for instance by stratifying patients based on drug sensitivity or remission dynamics. Continuous label-free monitoring could also contribute to disease surveillance, enabling early detection of relapse or treatment resistance. Importantly, by capturing subtle morphological adaptations, AI-driven models may provide insights into clonal evolution, which often underlies therapy failure. At the same time, reducing false negatives is essential in a clinical context, as undetected abnormal cells could delay treatment adaptation or miss emerging subclones.
Several limitations of this study should be acknowledged. First, our approach relied primarily on bounding box annotations, which, while effective for detection, do not capture detailed cell morphology. Segmentation-based approaches, now more accessible through tools such as Roboflow, could provide richer morphological information but will require additional computational resources and validation. Second, our evaluation was restricted to two object detection architectures, YOLOv8 and RT-DETR, selected for their strong real-time performance at the time of the study. Although these delivered robust results, broader benchmarking against alternative or more recent models may further optimize performance. Third, the absence of a definitive ground truth remains a key limitation. Although image recognition provides relative quantification and pattern detection, we could not fully validate differentiation states (e.g., erythroid versus megakaryocytic) at the single-cell level, since confirmatory assays such as flow cytometry or immunophenotyping of the exact same cells were not included. Instead, we relied on supportive evidence such as cytospin preparations and hemoglobin ELISA, which confirmed differentiation trends but lack proportional single-cell resolution. Fourth, the biological scope of our work was intentionally narrow. Experiments were restricted to suspension leukemia cell lines and selected substances, reflecting a proof-of-concept design. While this demonstrates feasibility in a controlled setting, transferability to other biological systems, such as adherent cells, primary patient samples, or co-culture environments, remains uncertain. Extending validation to these contexts will be essential for broader applicability in drug testing. Fifth, although imaging was performed in two independent laboratories on distinct AICE3 devices, ensuring some variability in illumination, focus, and cell batches, we did not systematically enforce leakage-safe splits by plate, day, or batch, nor did we provide per-batch confidence intervals. These factors could influence generalizability and should be addressed in follow-up studies. Finally, we did not perform systematic ablation experiments or extended benchmarking on diverse hardware platforms. While our GPU versus CPU comparison reflects practically relevant scenarios in standard laboratory environments, future studies should extend latency and throughput analyses to edge devices (e.g., laptops, Jetson boards, Coral TPUs) and current-generation GPUs such as NVIDIA Blackwell, which are optimized for real-time AI workloads.

5. Conclusions

Our study demonstrates that deep learning can resolve subtle, drug-induced, and time-dependent morphological changes in unstained leukemia cell cultures in real time. By avoiding invasive staining and fixation, this approach enables robust, label-free longitudinal drug testing in a closed system and lays the groundwork for automated, feedback-driven drug screening platforms. Beyond drug testing, the capabilities demonstrated here highlight broader potential applications of morphology-based AI models, including diagnosis, prognosis, and disease surveillance, where minimizing false negatives and capturing clonal evolution are of critical clinical importance.
Looking ahead, future work will expand applicability across additional biological systems, cell types, and technical contexts, including segmentation-based analyses for richer morphological detail and integration with complementary assays. A particularly promising direction is the incorporation of AI-driven models into feedback-loop systems for personalized medicine, where real-time monitoring of treatment responses could enable highly adaptive and individualized therapeutic strategies. Ultimately, such frameworks may advance precision oncology by providing a reliable and scalable platform for clinical decision-making.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ai6100271/s1, Figure S1: Examples of images with out-of-focus cells, debris, bubbles, and background only, included to build a robust dataset; Figure S2: YOLOv8 command-line scripts for model training, validation, and prediction; Figure S3: Testing robustness of model prediction on the dataset provided by another machine; Figure S4: Hemoglobin induction after differentiation treatments; Table S1: Overview of selected model size and confidence threshold for the different datasets; Table S2: The most relevant descriptive morphological features for each cell line using the OMP model.

Author Contributions

Conception and design: R.C. and M.T. Provision of study materials: J.B. Collection and assembly of data: T.M., M.K., D.R., E.M., and A.S. Data analysis and interpretation: K.H., T.M., A.R., and S.S. Manuscript writing: K.H., T.M., and R.C. Visualization: K.H. Final approval of manuscript: all authors. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Mertelsmann Foundation.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Code snippets used for model training and evaluation are provided in the Supplementary Materials. The de-identified dataset, together with trained model weights and representative images, is accessible at https://cloud.fwd.uk-augsburg.science/index.php/s/6bi96bCSL4PNzpq (pw: AI_at_AML_w&d). Annotation guidelines are described in the Methods and included with the dataset. Detailed information on the software environment and versioned dependencies is explicitly given in the Methods Section to ensure reproducibility.

Acknowledgments

We would like to thank all staff of the Mertelsmann Foundation and LABMaiTE.

Conflicts of Interest

Dennis Raith, Eelco Meerdink, Avani Sapre, and Jonas Bermeitinger are employees of LABMaiTE GmbH. Jonas Bermeitinger, Avani Sapre, and Dennis Raith own shares of LABMaiTE GmbH. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

DL: deep learning
CNN: convolutional neural network
DAC: decitabine
RT-DETR: real-time detection transformer
AML: acute myeloid leukemia
AI: artificial intelligence
mAP: mean average precision
OMP: orthogonal matching pursuit
PCA: principal component analysis

References

  1. van Rijt, A.; Stefanek, E.; Valente, K. Preclinical Testing Techniques: Paving the Way for New Oncology Screening Approaches. Cancers 2023, 15, 4466. [Google Scholar] [CrossRef]
  2. Horak, P.; Heining, C.; Kreutzfeldt, S.; Hutter, B.; Mock, A.; Hullein, J.; Frohlich, M.; Uhrig, S.; Jahn, A.; Rump, A.; et al. Comprehensive Genomic and Transcriptomic Analysis for Guiding Therapeutic Decisions in Patients with Rare Cancers. Cancer Discov. 2021, 11, 2780–2795. [Google Scholar] [CrossRef]
  3. Letai, A. Functional precision cancer medicine-moving beyond pure genomics. Nat. Med. 2017, 23, 1028–1035. [Google Scholar] [CrossRef]
  4. Prasad, V. Perspective: The precision-oncology illusion. Nature 2016, 537, S63. [Google Scholar] [CrossRef]
  5. Vlachogiannis, G.; Hedayat, S.; Vatsiou, A.; Jamin, Y.; Fernandez-Mateos, J.; Khan, K.; Lampis, A.; Eason, K.; Huntingford, I.; Burke, R.; et al. Patient-derived organoids model treatment response of metastatic gastrointestinal cancers. Science 2018, 359, 920–926. [Google Scholar] [CrossRef] [PubMed]
  6. Schilling, M.P.; El Khaled El Faraj, R.; Urrutia Gomez, J.E.; Sonnentag, S.J.; Wang, F.; Nestler, B.; Orian-Rousseau, V.; Popova, A.A.; Levkin, P.A.; Reischl, M. Automated high-throughput image processing as part of the screening platform for personalized oncology. Sci. Rep. 2023, 13, 5107. [Google Scholar] [CrossRef] [PubMed]
  7. Tebon, P.J.; Wang, B.; Markowitz, A.L.; Davarifar, A.; Tsai, B.L.; Krawczuk, P.; Gonzalez, A.E.; Sartini, S.; Murray, G.F.; Nguyen, H.T.L.; et al. Drug screening at single-organoid resolution via bioprinting and interferometry. Nat. Commun. 2023, 14, 3168. [Google Scholar] [CrossRef] [PubMed]
  8. Serafini, C.E.; Charles, S.; Casteleiro Costa, P.; Niu, W.; Cheng, B.; Wen, Z.; Lu, H.; Robles, F.E. Non-invasive label-free imaging analysis pipeline for in situ characterization of 3D brain organoids. Sci. Rep. 2024, 14, 22331. [Google Scholar] [CrossRef]
  9. Issa, J.; Abou Chaar, M.; Kempisty, B.; Gasiorowski, L.; Olszewski, R.; Mozdziak, P.; Dyszkiewicz-Konwińska, M. Artificial-intelligence-based imaging analysis of stem cells: A systematic scoping review. Biology 2022, 11, 1412. [Google Scholar] [CrossRef]
  10. Zhu, Y.; Huang, R.; Wu, Z.; Song, S.; Cheng, L.; Zhu, R. Deep learning-based predictive identification of neural stem cell differentiation. Nat. Commun. 2021, 12, 2614. [Google Scholar] [CrossRef]
  11. Das, P.K.; Diya, V.; Meher, S.; Panda, R.; Abraham, A. A systematic review on recent advancements in deep and machine learning based detection and classification of acute lymphoblastic leukemia. IEEE Access 2022, 10, 81741–81763. [Google Scholar] [CrossRef]
  12. Sadafi, A.; Bordukova, M.; Makhro, A.; Navab, N.; Bogdanova, A.; Marr, C. RedTell: An AI tool for interpretable analysis of red blood cell morphology. Front. Physiol. 2023, 14, 1058720. [Google Scholar] [CrossRef] [PubMed]
  13. Stomper, J.; Ihorst, G.; Suciu, S.; Sander, P.N.; Becker, H.; Wijermans, P.W.; Plass, C.; Weichenhan, D.; Bissé, E.; Claus, R. Fetal hemoglobin induction during decitabine treatment of elderly patients with high-risk myelodysplastic syndrome or acute myeloid leukemia: A potential dynamic biomarker of outcome. Haematologica 2019, 104, 59. [Google Scholar] [CrossRef]
  14. Dwyer, B.; Nelson, J.; Hansen, T. Roboflow (Version 1.0) [Software]. Computer Vision. 2025. Available online: https://roboflow.com (accessed on 10 April 2024).
  15. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8 (Version 8.0.0) [Software]. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 10 April 2024).
  16. Papers with Code—Real-Time Object Detection on COCO Benchmark. Available online: https://paperswithcode.com/sota/real-time-object-detection-on-coco (accessed on 10 April 2024).
  17. Muhammad, M.B.; Yeasin, M. Eigen-CAM: Class Activation Map using Principal Components. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Virtual, 19–24 July 2020; pp. 1–7. [Google Scholar]
  18. Long, X.; Cleveland, W.L.; Yao, Y.L. Automatic detection of unstained viable cells in bright field images using a support vector machine with an improved training procedure. Comput. Biol. Med. 2006, 36, 339–362. [Google Scholar] [CrossRef]
  19. Thite, N.G.; Tuberty-Vaughan, E.; Wilcox, P.; Wallace, N.; Calderon, C.P.; Randolph, T.W. Stain-Free Approach to Determine and Monitor Cell Health Using Supervised and Unsupervised Image-Based Deep Learning. J. Pharm. Sci. 2024, 113, 2114–2127. [Google Scholar] [CrossRef]
  20. Pan, F.; Wu, Y.; Cui, K.; Chen, S.; Li, Y.; Liu, Y.; Shakoor, A.; Zhao, H.; Lu, B.; Zhi, S.; et al. Accurate detection and instance segmentation of unstained living adherent cells in differential interference contrast images. Comput. Biol. Med. 2024, 182, 109151. [Google Scholar] [CrossRef]
  21. Dombret, H.; Gardin, C. An update of current treatments for adult acute myeloid leukemia. Blood 2016, 127, 53–61. [Google Scholar] [CrossRef]
  22. Aldughayfiq, B.; Ashfaq, F.; Jhanjhi, N.Z.; Humayun, M. YOLOv5-FPN: A Robust Framework for Multi-Sized Cell Counting in Fluorescence Images. Diagnostics 2023, 13, 2280. [Google Scholar] [CrossRef]
  23. Cui, C.; Chen, X.; He, L.; Li, F. CA-YOLO: An Efficient YOLO-Based Algorithm with Context-Awareness and Attention Mechanism for Clue Cell Detection in Fluorescence Microscopy Images. Sensors 2025, 25, 6001. [Google Scholar] [CrossRef]
  24. Kang, M.; Ting, C.-M.; Fung Ting, F.; Phan, R. CST-YOLO: A Novel Method for Blood Cell Detection Based on Improved YOLOv7 and CNN-Swin Transformer. arXiv 2023, arXiv:2306.14590. [Google Scholar] [CrossRef]
  25. Voloshin, N.; Putlyaev, E.; Chechekhina, E.; Usachev, V.; Karagyaur, M.; Bozov, K.; Grigorieva, O.; Tyurin-Kuzmin, P.; Kulebyakin, K. NuclePhaser: A YOLO-based framework for cell nuclei detection and counting in phase contrast images of arbitrary size with support of fast calibration and testing on specific use cases. bioRxiv 2025. [Google Scholar] [CrossRef]
  26. Du, S.; Pan, W.; Li, N.; Dai, S.; Xu, B.; Liu, H.; Xu, C.; Li, X. TSD-YOLO: Small traffic sign detection based on improved YOLO v8. IET Image Process. 2024, 18, 2884–2898. [Google Scholar] [CrossRef]
  27. Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv 2025, arXiv:2502.12524. [Google Scholar] [CrossRef]
  28. Archit, A.; Freckmann, L.; Nair, S.; Khalid, N.; Hilt, P.; Rajashekar, V.; Freitag, M.; Teuber, C.; Spitzner, M.; Tapia Contreras, C.; et al. Segment Anything for Microscopy. Nat. Methods 2025, 22, 579–591. [Google Scholar] [CrossRef] [PubMed]
  29. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment Anything. arXiv 2023, arXiv:2304.02643. [Google Scholar] [CrossRef]
  30. Stover, A.E.; Herculian, S.; Banuelos, M.G.; Navarro, S.L.; Jenkins, M.P.; Schwartz, P.H. Culturing Human Pluripotent and Neural Stem Cells in an Enclosed Cell Culture System for Basic and Preclinical Research. J. Vis. Exp. 2016, 112, e53685. [Google Scholar] [CrossRef]
  31. Griffiths, J.B. The effect of medium changes on the growth and metabolism of the human diploid cell, W1-38. J. Cell. Sci. 1971, 8, 43–52. [Google Scholar] [CrossRef]
  32. Golikov, M.V.; Valuev-Elliston, V.T.; Smirnova, O.A.; Ivanov, A.V. Physiological Media in Studies of Cell Metabolism. Mol. Biol. 2022, 56, 629–637. [Google Scholar] [CrossRef]
Figure 1. Overview of the deep learning workflow and model evaluation strategy. (A) Comprehensive schematic of the overall detection framework, illustrating the integration of the semi-automated culture and microscopy system (AICE3), the imaging pipeline including annotation and augmentation (Roboflow), the deep learning–based object detection, and the interpretability analyses (by Eigen-CAM and radiomic feature extraction using RedTell). Both YOLOv8 and RT-DETR architectures were initially evaluated; YOLOv8 was ultimately selected for subsequent analyses based on its superior balance of speed and accuracy in our real-time setup. (Created in https://BioRender.com). (B) Workflow of data generation and analysis, illustrating sequential steps from live-cell imaging to dataset preparation, model training, validation, and performance evaluation. (C) Schematic representation of the model training, validation, and application pipeline, showing dataset allocation (training, validation, and test sets) and iterative optimization cycles leading to final model deployment.
Figure 2. Performance evaluation and application of a deep learning model for cell classification. (A) Exemplary microscopic picture of K562 cells generated in the AICE3 device with a 20× objective. A scale bar representing 10 µm is shown in the lower right corner. (B) Confusion matrix illustrating the classification performance of the YOLOv8 model in distinguishing K562 cells from background objects in the test set. Darker shades of blue indicate higher counts of correctly or incorrectly classified objects. (C) Precision–recall curve showing the average precision (AP) for the K562 class and the overall mean average precision (mAP) across an intersection-over-union (IoU) threshold of 0.5. (D) Stacked area plot displaying the temporal distribution of K562 (blue) and K562d6 (orange) cells over a 192 h culture period. The y-axis represents the proportion of each class relative to the total detected cell population, and the white dashed line indicates the moving average trend.
Figure 3. Performance evaluation of the deep learning model for myeloid cell lines. (A) Confusion matrix of predicted (y axis) versus true classes (x axis) for the myeloid cell lines HL-60, K562, and Kasumi-1. Higher numbers are represented by darker shades of blue. Correct classification is shown along the diagonal. The background class holds cells that were not detected at all. (B) Precision–recall curves with the average precisions (AP) for HL-60 (light blue), K562 (mid blue), and Kasumi-1 (dark blue). The overall mean average precision (mAP) at an IoU threshold of 0.5 is shown as a grey dashed line.
Figure 4. Application of the cell line recognition model to a cellular differentiation and drug testing setup. (A) Schematic representation of the differentiation and treatment model showing K562 cells undergoing erythroid (Hemin) or megakaryocytic (PMA) differentiation, and treatment with the hypomethylating agent Decitabine (DAC). Hemin promotes differentiation towards red blood cell (RBC)-like morphology, PMA induces platelet (Plt)-like features, and DAC treatment partially overlaps with erythroid characteristics. (B) Representative unstained bright-field images illustrating morphological differences between untreated K562, Hemin-, PMA-, and DAC-treated cells. Hemin-treated cells exhibit increased size and cytoplasmic opacity consistent with erythroid differentiation, PMA-treated cells display irregular shapes and enhanced granularity characteristic of megakaryocytic maturation, while DAC-treated cells show intermediate features between untreated and Hemin-treated morphologies. Scale bar: 10 µm. (C) Confusion matrix validating the classification of predefined mixtures of cell populations (x-axis: experimental composition; y-axis: predicted classes). Overestimation of a class by the model is shown in red shades, underestimation in blue, and gray indicates model detection of a class not present in the respective mixture. (D) Density plots illustrating class probability distributions for K562, Hemin, PMA, and DAC across different mixture experiments (100% PMA, DAC–Hemin, DAC–K562, Hemin–K562). The x-axis represents the predicted probability per class, and the y-axis represents density. Vertical dashed lines indicate the mean predicted probability for each experiment, showing high classification accuracy and sensitivity to intermediate phenotypes.
Figure 4. Application of the cell line recognition model to a cellular differentiation and drug testing setup. (A) Schematic representation of the differentiation and treatment model showing K562 cells undergoing erythroid (Hemin) or megakaryocytic (PMA) differentiation, and treatment with the hypomethylating agent Decitabine (DAC). Hemin promotes differentiation towards red blood cell (RBC)-like morphology, PMA induces platelet (Plt)-like features, and DAC treatment partially overlaps with erythroid characteristics. (B) Representative unstained bright-field images illustrating morphological differences between untreated K562, Hemin-, PMA-, and DAC-treated cells. Hemin-treated cells exhibit increased size and cytoplasmic opacity consistent with erythroid differentiation, PMA-treated cells display irregular shapes and enhanced granularity characteristic of megakaryocytic maturation, while DAC-treated cells show intermediate features between untreated and Hemin-treated morphologies. Scale bar: 10 µm. (C) Confusion matrix validating the classification of predefined mixtures of cell populations (x-axis: experimental composition; y-axis: predicted classes). Overestimation of a class by the model is shown in red shades, underestimation in blue, and gray indicates model detection of a class not present in the respective mixture. (D) Density plots illustrating class probability distributions for K562, Hemin, PMA, and DAC across different mixture experiments (100% PMA, DAC–Hemin, DAC–K562, Hemin–K562). The x-axis represents the predicted probability per class, and the y-axis represents density. Vertical dashed lines indicate the mean predicted probability for each experiment, showing high classification accuracy and sensitivity to intermediate phenotypes.
Figure 5. Visualization of cell-specific features contributing to deep learning classification. (A) Class activation maps (Eigen-CAMs) overlaid on unstained cell images for three leukemia cell lines (HL-60, K562, and Kasumi-1). Each pair shows the original image (left) and the corresponding heatmap (right), highlighting the image regions most influential for model decision-making. The color scale represents the relative activation intensity, with red indicating the highest relevance and blue the lowest. While relevant cellular regions are often emphasized, background areas may also appear activated, illustrating the limitations of CAM-based interpretability in bright-field microscopy. (B) Schematic illustration of selected morphological and texture-based radiomics features extracted using RedTell, including major and minor axis length, entropy, sphericity, and intensity-related metrics (minimum and maximum pixel values). These biologically interpretable descriptors partially align with known cell-type-specific morphologies but overall yielded lower predictive accuracy compared to the YOLOv8 deep learning model.
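Eigen-CAM heatmaps of this kind can be generated without class-specific gradients, since the method projects the feature maps of a chosen layer onto their first principal component. A minimal sketch using the open-source pytorch-grad-cam package follows; the weight file, image path, choice of target layer, and the thin wrapper around the detector are assumptions for illustration rather than the configuration used in this study.

```python
# Minimal Eigen-CAM sketch (assumptions: weights, image path, and target
# layer are hypothetical; pytorch-grad-cam is the open-source CAM package).
import cv2
import numpy as np
import torch
from pytorch_grad_cam import EigenCAM
from pytorch_grad_cam.utils.image import show_cam_on_image
from ultralytics import YOLO


class DetectorWrapper(torch.nn.Module):
    """Return a single tensor from the detector, as the CAM API expects;
    Eigen-CAM never uses the output itself, only the hooked activations."""

    def __init__(self, det_model):
        super().__init__()
        self.det_model = det_model

    def forward(self, x):
        out = self.det_model(x)
        return out[0] if isinstance(out, (list, tuple)) else out


det = YOLO("leukemia_lines.pt").model.eval()    # hypothetical trained weights
wrapped = DetectorWrapper(det)
target_layers = [det.model[-2]]                 # a late backbone layer (assumption)

img = cv2.cvtColor(cv2.imread("k562_example.png"), cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (640, 640)).astype(np.float32) / 255.0
tensor = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)

cam = EigenCAM(model=wrapped, target_layers=target_layers)
heatmap = cam(input_tensor=tensor)[0]           # (H, W) map scaled to [0, 1]
overlay = show_cam_on_image(img, heatmap, use_rgb=True)
cv2.imwrite("k562_eigencam.png", cv2.cvtColor(overlay, cv2.COLOR_RGB2BGR))
```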
Figure 6. Precision–recall performance of the Orthogonal Matching Pursuit (OMP) model applied to radiomics-derived features. (A–C) Precision–recall (PR) curves illustrating the classification performance of the Orthogonal Matching Pursuit (OMP) model for HL-60 (A), K562 (B), and Kasumi-1 (C) cells. The OMP model was trained on morphological and texture-based radiomics features extracted using the RedTell pipeline, including shape descriptors (e.g., axis lengths, sphericity) and intensity metrics (e.g., entropy, gray-level variance). Precision (y-axis) is plotted against recall (x-axis), providing a visual summary of the trade-off between classification accuracy and sensitivity for each leukemia cell line.
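For orientation, a minimal scikit-learn sketch of this evaluation strategy is given below: a sparse OMP regressor is fitted one-vs-rest per cell line, and its continuous output is used as the ranking score for the PR curve. The synthetic feature matrix (a stand-in for RedTell-exported features), the sparsity level, and the one-vs-rest reduction are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (assumptions: synthetic data stands in for RedTell features;
# one-vs-rest reduction and sparsity level are illustrative choices).
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.metrics import auc, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 40))      # stand-in for the (cells x features) matrix
y = rng.choice(["HL-60", "K562", "Kasumi-1"], size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

for cls in ("HL-60", "K562", "Kasumi-1"):
    # Sparse linear fit against the binary one-vs-rest target; the number of
    # non-zero coefficients caps how many radiomics features OMP may select.
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10)
    omp.fit(X_tr, (y_tr == cls).astype(float))
    scores = omp.predict(X_te)      # continuous scores used for ranking
    prec, rec, _ = precision_recall_curve((y_te == cls).astype(int), scores)
    plt.plot(rec, prec, label=f"{cls} (PR-AUC = {auc(rec, prec):.2f})")

plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()
```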
Table 1. Overview of the baseline datasets, including name and description, together with per-dataset information: the number of images (# Images), the average number of annotations per image (Avg. Ann./Image), and the number of instances per class (# Classes).
| Dataset | Description | # Images | Avg. Ann./Image | # Classes |
|---|---|---|---|---|
| Single Class Cell Detection | Object detection based on K562 cells | 436 | 15.5 | K562: 6764 |
| Multiclass Cell Detection | Testing for discrimination between myeloid cell lines (K562, HL-60, and Kasumi-1) | 399 | 23.8 | K562: 2682; HL-60: 3319; Kasumi-1: 3498 |
| Cell culture time | Comparing fresh and 6-day-old K562 | 156 | 25.1 | K562: 1924; K562d6: 1989 |
| Differentiation Model | Using known substances (Hemin and PMA) to differentiate K562 cells | 216 | 14.1 | Hemin: 1112; K562: 1298; PMA: 631 |
| Decitabine (DAC) | Applying 20 nM and 100 nM Decitabine to K562 cells | 62 | 57.1 | DAC 20 nM: 2005; DAC 100 nM: 1537 |
| Decitabine + Differentiation Model | Decitabine-treated K562 added to the Differentiation Model, independent of concentration | 236 | 17.9 | DAC: 1194; Hemin: 1112; K562: 1298; PMA: 631 |
Table 2. Overview of the validation datasets, including abbreviation, name, description, and the number of images.
| Abbr. | Dataset | Description | Number of Images |
|---|---|---|---|
| LM | LABMaiTE K562 dataset | A different passage of K562 cells on a similar AICE system in Freiburg | 275 |
| Mixture | Mixture dataset | Cells with different substance treatments mixed in a 50:50 ratio | 128 |
| GC | Growth curve dataset | Two runs of K562 over nearly eight days in the AICE system; images taken every two hours | 716 |
Table 3. Performance metrics across different datasets used for model development and evaluation. Summary of precision (P), sensitivity (Sens), specificity (Spec), and mean average precision at IoU thresholds of 0.5 (mAP@0.5) and 0.5:0.95 (mAP@0.5:0.95) for the detection and classification tasks across various datasets. Precision, sensitivity, and specificity are provided overall and per class.
| Dataset | P | Sens | Spec | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|
| Cell Detection | 97.4% | 98.1% | - | 98.9% | 76.9% |
| Leukemia cell lines (overall) | 94.6% | 97.7% | 97.3% | 98.3% | 74.3% |
|   HL-60 | 92.3% | 99.2% | 97.0% | | |
|   K562 | 95.5% | 96.2% | 98.2% | | |
|   Kasumi-1 | 96.0% | 97.6% | 98.1% | | |
| Aging (overall) | 95.6% | 99.2% | 95.8% | 98.8% | 74.9% |
|   K562 | 94.0% | 100% | 94.6% | | |
|   K562d6 | 97.2% | 98.3% | 96.9% | | |
| Differentiation Model (overall) | 94.6% | 96.2% | 97.1% | 97.8% | 77.7% |
|   Hemin | 96.9% | 98.4% | 98.0% | | |
|   K562 | 93.3% | 95.2% | 94.8% | | |
|   PMA | 93.5% | 95.1% | 98.5% | | |
| Decitabine (overall) | 71.5% | 78.4% | 71.8% | 82.6% | 64.5% |
|   DAC20 | 77.9% | 80.5% | 69.4% | | |
|   DAC100 | 66.1% | 76.2% | 74.2% | | |
| Decitabine added to differentiation model (overall) | 92.9% | 94.5% | 97.1% | 97.3% | 77.4% |
|   DAC | 94.3% | 95.1% | 97.8% | | |
|   Hemin | 91.7% | 96.8% | 96.6% | | |
|   K562 | 89.2% | 95.9% | 94.5% | | |
|   PMA | 96.5% | 90.2% | 99.5% | | |
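The mAP, precision, and sensitivity (recall) values above correspond to the standard Ultralytics validation output; a minimal sketch of retrieving them is shown below, with the weight and dataset-YAML paths as illustrative assumptions. Specificity is not part of the standard detector report and would be derived separately from the per-class confusion matrix.

```python
# Minimal sketch (assumptions: weight file and dataset YAML are hypothetical).
from ultralytics import YOLO

model = YOLO("differentiation_model.pt")          # hypothetical trained weights
metrics = model.val(data="differentiation.yaml")  # hypothetical dataset config

print(f"mAP@0.5       : {metrics.box.map50:.3f}")
print(f"mAP@0.5:0.95  : {metrics.box.map:.3f}")
print(f"Precision (P) : {metrics.box.mp:.3f}")    # mean precision over classes
print(f"Sensitivity   : {metrics.box.mr:.3f}")    # mean recall over classes
# Specificity is not reported by the validator; it must be computed from the
# confusion matrix (true negatives are background regions in detection).
```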