Automated Single-Cell Analysis in the Liquid Biopsy of Breast Cancer

Shishido, Stephanie N.; Courcoubetis, George; Kuhn, Peter; Mason, Jeremy

doi:10.3390/cancers17172779

¹ Convergent Science Institute in Cancer, Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA

² Catherine and Joseph Aresty Department of Urology, Institute of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA

³ Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA

⁴ Department of Biological Sciences, Dornsife College of Letters, Arts, and Sciences, University of Southern California, Los Angeles, CA 90089, USA

⁵ Department of Aerospace and Mechanical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA 90089, USA

⁶ Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA 90089, USA

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Cancers 2025, 17(17), 2779; https://doi.org/10.3390/cancers17172779

Submission received: 13 August 2025 / Revised: 22 August 2025 / Accepted: 23 August 2025 / Published: 26 August 2025

(This article belongs to the Section Cancer Causes, Screening and Diagnosis)

Simple Summary

Breast cancer (BC) diagnostic methods, such as mammography and tissue biopsy, have inherent limitations in both accuracy and accessibility. A standard blood draw allows for the detection of rare events, such as circulating tumor cells (CTCs), that can be indicative of both cancer and the extent of the disease within the body. This study demonstrates the potential of a fully automated, liquid biopsy workflow as a highly scalable and minimally invasive companion to current methods that can detect rare events useful for identifying and characterizing BC within an individual.

Abstract

Background/Objectives: Breast cancer (BC) is the most prevalent cancer worldwide, with approximately 40% of early-stage BC patients developing recurrence despite initial treatments. Current diagnostic methods, such as mammography and solid tissue biopsies, face limitations in sensitivity, accessibility, and the ability to characterize tumor heterogeneity or monitor systemic disease progression. Methods: To address these gaps, this study investigates a fully automated analysis workflow using data derived from fluorescent Whole-Slide Imaging (fWSI) for detecting and classifying rare cells (circulating tumor and tumor microenvironment cells) in peripheral blood samples. Our methodology integrates supervised machine learning algorithms for rare event detection, immunofluorescence-based classification, and statistical quantification of cellular features. Results: Using a fWSI dataset of 534 cancer and non-cancer peripheral blood samples, the automated model demonstrated high concordance with manual annotation, achieving up to 98.9% accuracy and a precision-sensitivity AUC of 83.2%. Morphometric analysis of rare cells identified significant differences between normal donors, early-stage BC, and late-stage BC cohorts, with distinct clusters emerging in late-stage BC. Conclusions: These findings highlight the potential of liquid biopsy and algorithmic approaches for improving BC diagnostics and staging, offering a scalable, minimally invasive solution to enhance clinical decision-making. Future work aims to refine the automated framework to minimize errors and improve the robustness across diverse cohorts.

Keywords: breast cancer; liquid biopsy; peripheral blood; fluorescent whole-slide imaging; circulating tumor cell; screening; mathematical modeling; automation

1. Introduction

Breast cancer (BC), with 7.8 million global cases diagnosed in the past 5 years, is the most common cancer in women and the most prevalent overall [1,2,3]. Most women are diagnosed with early-stage BC (94%), without evidence of widespread disease; however, despite this and with the administration of subsequent treatments, 40% of these patients will experience recurrence in their lifetime [4,5,6,7,8,9]. Late-stage BC (i.e., relapse, progression, and onset of distant metastasis) has a 5-year survival rate of less than 30%, significantly lower than that of early-stage BC at 91% [1,3]. Given this, it is vital that screening methods be improved, and that robust stratification of early-stage BC be made possible at the time of the initial diagnostic workup.

Currently, the standard screening method for BC is mammography, with a tissue biopsy to confirm diagnosis [3,4]. Despite mammography being common practice, only about 60% of all cases are currently diagnosed via the screening pathway. This is primarily due to two factors: first, the limited (86.9%) sensitivity of mammography, and second, its limited use across the at-risk patient population, with only 76.4% of women aged 50–74 years at average risk for BC having regularly scheduled mammograms every two years [10,11,12,13]. In addition, only limited screening pathways exist for younger patients. For patients diagnosed with BC, the extent of tumor burden and prediction of treatment response are assessed via imaging and clinical evaluation of symptoms [4]. To identify the spread of disease, cross-sectional advanced imaging is sometimes used; however, it is often inconclusive, expensive, and unable to provide deeper insights into the molecular profile of the tumor.

Solid tissue biopsies are widely used in current clinical care as they contain a wealth of information. They can provide information on molecular profiles, histological subtyping, and tumor biomarkers, and they can inform treatment planning. However, several caveats persist. First, it is not always easy or straightforward to access and biopsy the primary tumor or metastatic lesions. Second, these biopsies are designed to survey a precise sampling area and may fail to capture the true heterogeneity of the tumor [14,15,16,17,18]. Circulating tumor cells (CTCs) have the potential to resolve the spatial heterogeneity problem, as they have been shown to shed from primary and metastatic tumor sites [19,20,21,22,23,24,25]. Third, solid biopsies will inherently fail to characterize subclinical systemic disease spread. Lastly, they are infeasible for longitudinal monitoring because of their invasiveness and the pain and potential risk they impose on the patient [26,27,28,29,30].

Consequently, better screening methods accessible to all individuals and more precise staging at diagnosis and prognosis are essential components for improving clinical management of BC patients. To evaluate the potential of a liquid biopsy for early BC detection and assessment, in a previous study, we utilized a customized fluorescent assay on whole-slide cell monolayer preparations to identify and analyze rare cells, such as CTCs, in peripheral blood (PB) samples [31]. By analyzing a cohort of early-stage BC patients, late-stage BC patients, and age- and gender-matched controls with a labor-intensive manual process, we revealed statistically significant differences in circulating rare cell populations, including CTCs [31]. Patient-level modeling to predict Cancer vs. Normal and Early-stage vs. Late-stage achieved Area Under the Curve (AUC) values of 0.99 and 0.91, respectively, indicating near-perfect discrimination of cancerous from non-cancerous samples and strong discrimination between disease stages. While these results were obtained through a manual approach supported by computation, these imaging datasets require extensive human interpretation. We hypothesize that an automated data science approach could aid in this analysis and accelerate the discovery process.

In this study, we developed a fully automated methodology to investigate the scalability of a liquid biopsy test using an expanded validation set. The approach incorporates supervised machine learning algorithms for rare event identification, event classification based on IF expression, and quantification of results. Our automated system generates cell-level prediction and classification models that detect and assign IF phenotypes to rare events. The results highlight the potential of algorithmic approaches to enhance annotation efficiency and accelerate the discovery phase of liquid biopsy analyses, while emphasizing the need for manual intervention to minimize errors.

2. Materials and Methods

2.1. Study Design, Patient Information

A total of 1070 deidentified fluorescent whole-slide images (fWSIs) from 534 PB samples, drawn from carcinoma patients and non-cancerous normal donors (NDs), were used to design the automated approach (Design Set). The Design Set was split into 1030 fWSIs for training the model and 40 fWSIs for testing. To focus on an application in BC, the test set comprised only BC samples. An additional 779 fWSIs from 410 PB samples were used to apply and assess the automated approach (Apply Set). All samples were previously published: BC (n = 575) [31,32], pancreatic cancer (n = 123) [33], upper tract urothelial carcinoma (n = 51) [34,35], bladder cancer (n = 50) [36], lung cancer (n = 40) [37], colorectal cancer (n = 18) [38], and NDs (n = 87) [31,33,39]. Patient recruitment occurred according to Institutional Review Board-approved protocols, with all participants providing written informed consent. All samples were collected between 5 April 2013 and 22 September 2022.

2.2. LBx Acquisition, Processing, and Cryobanking

The fWSI workflow we utilize here is aimed at the identification and characterization of analytes from liquid biopsy samples. Approximately 7.5 mL of PB was collected in 10 mL Cell-free DNA blood collection tubes (Streck, Omaha, NE, USA) at the clinical site and shipped to our laboratory for processing within 48 h as previously described [36,40]. In brief, the samples were subjected to red blood cell lysis in isotonic ammonium chloride solution. All nucleated cells were plated as a monolayer on custom glass slides (Marienfeld, Lauda, Baden-Württemberg, Germany) with approximately 3 million cells per slide. The cells were then blocked with 7% bovine serum albumin (BSA), dried, and cryopreserved at −80 °C until analysis. Each sample yields approximately 14 prepared glass slides, resulting in ~0.5 mL of PB plated on each.

2.3. Staining, Scanning, and Pre-Processing

Prior to IF staining via an IntelliPATH FLX autostainer (Biocare Medical LLC, Pacheco, CA, USA), 2 glass slides per PB sample were thawed for 1 h at room temperature and subsequently fixed using 2% paraformaldehyde (PFA) for 20 min. We then used 10% goat serum (Millipore, Billerica, MA, USA) for 20 min to block nonspecific binding sites. The markers utilized in the customized fluorescent assay are the nuclear stain DAPI (D; nuclear identification) and IF antibodies against cytokeratin (CK; epithelial cells), CD45 (white blood cells), CD31 (endothelial cells), and Vimentin (V; mesenchymal cells). CD45 and CD31 are visualized in the same IF channel (CD). Specifically, the following steps were applied to each slide as detailed previously [40]:

1. Incubated for 4 h with a conjugate containing the following:
   ○ 2.5 μg/mL of a mouse IgG1 anti-human CD31:Alexa Fluor 647 mAb (clone: WM59, MCA1738A647, BioRad, Hercules, CA, USA);
   ○ 100 μg/mL of goat anti-mouse IgG monoclonal Fab fragments (115–007–003, Jackson ImmunoResearch, West Grove, PA, USA).
2. Permeabilized with cold methanol for 5 min.
3. Incubated for 2 h with an antibody cocktail consisting of the following:
   ○ mouse IgG1/IgG2a anti-human CK 1, 4, 5, 6, 8, 10, 13, 18, and 19 (clones: C-11, PCK-26, CY-90, KS-1A3, M20, A53-B/A2, C2562, Sigma, St. Louis, MO, USA);
   ○ mouse IgG1 anti-human CK 19 (clone: RCK108, GA61561–2, Dako, Carpinteria, CA, USA);
   ○ mouse anti-human CD45:Alexa Fluor 647 (clone: F10–89–4, MCA87A647, AbD Serotec, Raleigh, NC, USA);
   ○ rabbit IgG anti-human V:Alexa Fluor 488 (clone: D21H3, 9854BC, Cell Signaling Technology, Danvers, MA, USA).
4. Incubated for 1 h with Alexa Fluor 555 goat anti-mouse IgG1 antibody (A21127, Invitrogen, Carlsbad, CA, USA) and 4′,6-diamidino-2-phenylindole (DAPI; D1306, Thermo Fisher Scientific, Waltham, MA, USA).
5. Mounted with a glycerol-based aqueous mounting media.
6. Coverslipped to maintain cell integrity.

After IF staining, slides were scanned at 100× magnification in each of the IF channels as previously performed [41]. Given the staining assay used for these data, the scanning process produces 9216 total frames of view (i.e., 2304 frames × 4 IF channels) that together comprise a single fWSI. The EBImage package in R (version 4.22.1) was used to mask each cellular (DAPI+) event and subsequently generate morphological features. Across the individual IF channels and their paired combinations (e.g., CK + V), each event produces 761 descriptive features. Examples include eccentricity, area, mean radius, major axis, perimeter, and channel intensity.
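To make the idea of per-event morphological features concrete, the following is a minimal sketch in Python of how area, centroid, and eccentricity can be derived from a binary cell mask using second-order image moments. It is not the EBImage implementation used in this study; the function name `mask_features` and the list-of-rows mask format are assumptions made purely for illustration.

```python
import math


def mask_features(mask):
    """Area, centroid, and eccentricity of a binary mask.

    `mask` is a list of rows containing 0/1 values (hypothetical input
    format). Eccentricity comes from the eigenvalues of the pixel
    coordinate covariance matrix: 0 for a circle-like shape, approaching
    1 for an elongated one.
    """
    pixels = [(r, c) for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    area = len(pixels)
    if area == 0:
        raise ValueError("mask contains no foreground pixels")

    # Centroid (mean row and column of the foreground pixels).
    cy = sum(r for r, _ in pixels) / area
    cx = sum(c for _, c in pixels) / area

    # Second-order central moments (coordinate covariance).
    myy = sum((r - cy) ** 2 for r, _ in pixels) / area
    mxx = sum((c - cx) ** 2 for _, c in pixels) / area
    mxy = sum((r - cy) * (c - cx) for r, c in pixels) / area

    # Eigenvalues of [[mxx, mxy], [mxy, myy]] in closed form.
    half_trace = (mxx + myy) / 2
    root = math.sqrt(((mxx - myy) / 2) ** 2 + mxy ** 2)
    lam_max, lam_min = half_trace + root, half_trace - root

    if lam_max > 0:
        eccentricity = math.sqrt(max(0.0, 1 - lam_min / lam_max))
    else:
        eccentricity = 0.0  # single pixel: no measurable elongation
    return {"area": area, "centroid": (cy, cx), "eccentricity": eccentricity}


# A 2 x 6 rectangle is elongated, so its eccentricity is close to 1.
elongated = [[1] * 6, [1] * 6]
print(mask_features(elongated))
```

In a full pipeline, features of this kind would be computed per channel and per channel pair, which is how a few base descriptors expand into hundreds of features per event.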

2.4. Rare Event Detection, Identification, Classification, and Enumeration

To detect rare events, we used our previously reported framework of OCULAR (Outlier Clustering Unsupervised Learning Automated Report) [36,40]. This methodology uses principal component analysis (PCA) to reduce the 761 descriptive features to 350 principal components. It then utilizes the reduced dimension space and hierarchical clustering to identify clusters with a small number of cells as well as individual cells that are distinctly different from the median cell (i.e., large computational distance). For further discrimination, and filtering out technical artifacts, we developed a machine learning classification model to predict whether an event is interesting (i.e., biologically relevant) or not. The positive class consisted of previously identified rare events that were deemed to be biologically relevant to the disease under study. The negative class consisted of previously identified rare events that were deemed to be not biologically relevant (e.g., technical artifacts). The events in both classes were categorized as such due to a multitude of reasons including their shape, fluorescent intensities, localizations of signals within the segmented area, as well as the characteristics of the neighboring events. We utilized a histogram gradient boosting algorithm [42] with k-fold cross-validation [43,44] over 1000 iterations to select the best performing models. We also employed hyperparameter grid searches [45] on the morphometric features to identify the subsets that yielded the best accuracy. The individual prediction confidences of each event were used as cutpoints to evaluate the model and balance sensitivity (i.e., correctly identifying interesting events) with specificity (i.e., correctly identifying non-interesting events).
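The use of prediction confidences as cutpoints can be illustrated with a small, self-contained Python sketch. It assumes each event carries a model confidence between 0 and 1 and a manual label (1 = interesting, 0 = not interesting); the function name `metrics_at_cutpoint` and the toy values are hypothetical and are not taken from the study data.

```python
def metrics_at_cutpoint(confidences, labels, cutpoint):
    """Confusion-matrix metrics when events with confidence >= cutpoint
    are called 'interesting'. `labels` holds the manual annotation."""
    tp = fp = tn = fn = 0
    for confidence, is_interesting in zip(confidences, labels):
        predicted = confidence >= cutpoint
        if predicted and is_interesting:
            tp += 1
        elif predicted and not is_interesting:
            fp += 1
        elif not predicted and is_interesting:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # interesting events recovered
        "specificity": ratio(tn, tn + fp),   # non-interesting events rejected
        "precision": ratio(tp, tp + fp),     # predicted events that are real
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Toy example (values invented for illustration only).
toy_confidences = [0.95, 0.92, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
for cut in (0.5, 0.9):
    print(cut, metrics_at_cutpoint(toy_confidences, toy_labels, cut))
```

Raising the cutpoint in the toy data trades sensitivity for specificity and precision, which is the same trade-off the cutpoint evaluation is meant to expose.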

Additionally, we utilized a trio of machine learning classification models (random forest architectures) to predict the CK, V, and CD channels to classify the rare, interesting cells by their IF expression level [33,34,39]. With these, we stratified the identified cells into 8 distinct types and subsequently enumerated each category for a given sample. We normalized these values based on the amount of blood analyzed using the total number of cells detected and the automatically determined white blood cell counts for the sample, allowing for fractional counts of cells/mL.
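As a sketch of how three binary channel calls yield 8 cell types, and of one plausible way to normalize counts to cells/mL, consider the Python example below. The normalization shown (volume analyzed = total cells on the slide divided by the white blood cell count per mL) is an assumption about how the described quantities combine, and the names `phenotype` and `cells_per_ml` are hypothetical.

```python
from itertools import product

CHANNELS = ("CK", "V", "CD")


def phenotype(ck_positive, v_positive, cd_positive):
    """Build a label such as 'D|CK|V' from the three channel calls.
    Every detected event is DAPI-positive, so 'D' is always present."""
    calls = (ck_positive, v_positive, cd_positive)
    parts = ["D"] + [name for name, positive in zip(CHANNELS, calls) if positive]
    return "|".join(parts)


def cells_per_ml(rare_counts, total_cells_on_slide, wbc_per_ml):
    """Convert raw per-slide counts to cells/mL.

    Assumption: the blood volume represented on a slide is estimated as
    total nucleated cells detected divided by the sample's WBC count per mL.
    """
    volume_ml = total_cells_on_slide / wbc_per_ml
    return {label: count / volume_ml for label, count in rare_counts.items()}


# Three binary channels give 2**3 = 8 phenotypes.
all_types = sorted({phenotype(*calls) for calls in product([False, True], repeat=3)})
print(all_types)

# Example with invented numbers: 3 million cells plated from blood with
# 6 million WBC/mL corresponds to 0.5 mL analyzed.
print(cells_per_ml({"D|CK": 3, "D|V": 10}, 3_000_000, 6_000_000))
```

Because the volume estimate is a ratio of measured counts, the resulting concentrations are generally fractional rather than integer values.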

2.5. Cellular Morphometric Comparison and Statistical Analysis

We investigated the differences at the liquid biopsy level between NDs, early-stage BC patients, and late-stage BC patients via cell-level enumerations. First, we created Uniform Manifold Approximation and Projection (UMAP) plots of the predicted interesting, rare events based on a subset of the 761 morphometric features, color-coded by their predicted channel type. We then enumerated the 8 cell types across the cohorts and statistically compared them using the Mann–Whitney U Test, with statistical significance set at 0.05.
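For readers unfamiliar with the Mann–Whitney U test, the following Python sketch computes the U statistic with average ranks for ties and a two-sided p-value from the tie-corrected normal approximation, without continuity correction. It is an illustrative implementation under those stated assumptions, not the statistical software used in this study, and the approximation is rough for very small samples.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value).

    Uses average ranks for ties and the normal approximation with tie
    correction (no continuity correction).
    """
    combined = sorted([(value, 0) for value in x] + [(value, 1) for value in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2

    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        # Positions i..j-1 share ranks i+1..j; assign their average.
        average_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = average_rank
        tie_size = j - i
        tie_term += tie_size ** 3 - tie_size
        i = j

    rank_sum_x = sum(rank for rank, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a shift
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value


# Invented cells/mL values for two cohorts, for illustration only.
cohort_a = [0.0, 0.5, 1.0, 1.5, 2.0]
cohort_b = [1.8, 2.5, 3.0, 4.2, 6.0]
u, p = mann_whitney_u(cohort_a, cohort_b)
print(u, p, p < 0.05)
```

A rank-based test is a natural choice here because rare cell counts per mL are typically skewed and contain many zeros, so no normality assumption is made about the counts themselves.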

3. Results

3.1. Development of Automated Rare Cell Stratification Model

The 1070 fWSIs used for training and testing were first processed with the OCULAR framework, reducing the ~2 billion cells to ~1 to 2 million rare events. This subset comprises ~50,000 biologically relevant cells (positive class), with the remaining events deemed not interesting (negative class); both classes were confirmed by manual review. The performance of the rare cell identification model was evaluated based on concordance with manual analyst annotations, with metrics provided in Table 1. Overall, the model demonstrated high accuracy, ranging from 96.5% at a 50% confidence threshold to 98.9% at a 90% threshold. Increasing the confidence threshold improved precision (from 37.1% at 50% to 68.4% at 90%) and specificity (from 96.4% to 99.1%) but reduced sensitivity (from 97.6% to 85.5%). As the confidence threshold increased, fewer non-interesting events were predicted as interesting (average of 97.6 at 50% vs. 23.2 at 90%), but more true positives were missed (average of 1.4 missed events at 50% vs. 8.6 at 90%). Notably, many of the false positives were attributed to biologically irrelevant artifacts such as bubbles and fluorescent flares. These findings underscore the adaptability of the model to different application contexts, where confidence thresholding can be tuned to prioritize either sensitivity or precision depending on experimental needs.
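The reported metrics can be cross-checked against one another with simple arithmetic. The Python sketch below assumes that the averages quoted above are per test fWSI, and that the 97.6 and 23.2 figures are false positives while the 1.4 and 8.6 figures are false negatives; these interpretations and the function name `implied_precision` are assumptions, so the result is a rough consistency check rather than a reproduction of Table 1.

```python
def implied_precision(sensitivity, missed_per_slide, false_pos_per_slide):
    """Back out precision from sensitivity and average error counts.

    positives = FN / (1 - sensitivity), TP = positives - FN,
    precision = TP / (TP + FP).
    """
    positives = missed_per_slide / (1 - sensitivity)
    true_pos = positives - missed_per_slide
    return true_pos / (true_pos + false_pos_per_slide)


# Values quoted in the text for the 50% and 90% confidence thresholds.
p50 = implied_precision(0.976, 1.4, 97.6)
p90 = implied_precision(0.855, 8.6, 23.2)
print(f"implied precision at 50%: {p50:.1%} (reported 37.1%)")
print(f"implied precision at 90%: {p90:.1%} (reported 68.4%)")
```

Under these assumptions the implied precisions come out at roughly 37% and 69%, close to the reported 37.1% and 68.4%; small differences are expected because ratios of averages need not equal averaged ratios.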

3.2. Morphometric Analysis

For downstream investigation, the automated rare cell identification model at the 50% threshold was applied to the Apply Set, which consisted of 779 fWSIs from non-cancer NDs (n = 74; samples = 37), early-stage BC (n = 248; samples = 125), and late-stage BC (n = 457; samples = 248). These were excluded from training and testing to validate the approach. To compare the rare cells detected, UMAP, a dimensionality reduction technique that projects high-dimensional data onto a low-dimensional space while preserving the local and global structure of the data, was used to generate visualizations (Figure 1). UMAP analysis confirms overlap in the rare cell phenotypes identified by the assay in ND, early-stage BC, and late-stage BC cohorts, as well as distinct signatures relative to each sample source. This validates both that the assay is consistently labeling and that the algorithm is consistently identifying the rare (outlier) events in any sample, regardless of origin. Across all cohorts, the DAPI-only cells form a distinct cluster, although it exhibits some heterogeneity. The remaining three major clusters observed are primarily composed of D|V, D|V|CD, and D|CK|V|CD cells. Interestingly, a well-defined cluster emerges specifically in the late-stage BC cohort for D|CK cells. This is consistent with previous findings, in which enumerations of D|CK were significantly higher in late-stage BC cohorts compared to ND or early-stage BC cohorts.

3.3. Cohort Level Analysis

Rare cells detected in each cohort were normalized to count/mL for comparison (Figure 2). There is an observable statistically significant difference between ND and early-stage BC for D|CK|V|CD (p-value = 0.003) and D|CK|CD (p-value = 0.009) rare cells, and between ND and late-stage BC for D|CK (p-value = 8 × 10⁻⁴) and D|V (p-value = 0.006) rare cells. Additionally, there is a significant difference between early-stage and late-stage rare cell counts for D|CK (p-value = 0.003), D|CK|V|CD (p-value = 4 × 10⁻⁹), D|CK|CD (p-value = 0.03), D|V|CD (p-value = 0.005), and D|V (p-value = 0.01). There were no significant differences measured in the D|CD, D|CK|V, and D rare cell groups across any of the cohorts.

4. Discussion

This study highlights the ability of automated single-cell analysis of fWSI data, as a liquid biopsy approach, to effectively identify and classify rare circulating cells from PB samples without prior enrichment. Using machine learning algorithms and five biomarkers, the platform identified and characterized circulating cells of epithelial, mesenchymal, endothelial, and hematologic origins. Importantly, the study observed a statistically significant increase in CTCs in late-stage BC compared to early-stage, consistent with prior research [31] linking higher CTC counts to metastatic dissemination [46]. This is expected given that these late-stage patients have active disease that is metastasizing via the circulatory system and therefore would yield increased activity within the blood. Additionally, specific phenotypic subtypes of rare circulating cells, with morphology consistent with CTCs [31,40,47] and endothelial cells [38,48], were automatically detected and phenotyped, underscoring the heterogeneity of circulating tumor-related analytes. These findings demonstrate the feasibility of a fully automated, non-enrichment liquid biopsy approach to provide robust diagnostic and prognostic insights.

Identification of rare cells from fWSI datasets typically requires extensive human interpretation, performed by a pathologist-trained technician supported by computational algorithms as previously described, which restricts their scalability. The fully automated approach developed here addresses these limitations by enhancing the identification of rare cells and subsequent phenotyping using advanced computational methods toward an operator-independent solution by leveraging the substantial human-annotated data we have curated.

We have previously revealed profound heterogeneity and plasticity among CTCs, underscoring the need for single-cell analyses in understanding cancer progression [40,47,49]. These insights point to the need for progress in developing robust, automated, and efficient methodologies for detecting rare cells and phenotyping them once identified. Automated analysis for rare cell detection is crucial for advancing precision medicine due to its scalability, efficiency, and accuracy. Manual analysis, while valuable, is labor-intensive and prone to human error, limiting its feasibility for large-scale studies or clinical applications. Automation enables the rapid processing of vast datasets, facilitating the detection of rare cellular events with higher consistency and reproducibility. This speed and scalability are essential for timely decision-making in diagnostics, monitoring treatment efficacy, and conducting large population-based studies. Furthermore, automated systems can integrate complex algorithms to identify subtle patterns in data that may be overlooked by manual methods, improving sensitivity and specificity. By reducing the reliance on skilled personnel for routine tasks, automation also lowers costs and makes cutting-edge technologies accessible to more laboratories and clinics, ultimately accelerating research and improving patient care.

Automated cell identification and classification is not a novel concept within the field of fluorescence microscopy. In fact, many other groups have investigated this application as it relates to their workflows [50,51,52]; however, each has specific limitations that our approach does not share. Namely, they rely on a secondary data collection (e.g., scRNA-seq data), are designed for the identification of a single phenotype, or consider only cancer cells vs. non-cancer cells (i.e., they ignore tumor microenvironment cells). Our platform is designed to work with fWSIs alone, generating the relevant morphometric parameters to identify every tumor-related cell of interest. Given this generality, we have also replicated this framework with a secondary immunofluorescence assay designed for the identification and monitoring of multiple myeloma cells on fWSIs.

While these automated tools boast many advantages within the field of biological sciences, they are not without their challenges. Although the cell-level models described here to identify and classify biologically relevant events perform exceptionally well, they do suffer from a quantifiable and consistent level of error in incorrectly identifying rare events. Additionally, the accuracy is directly linked to the training data used to build the models. In essence, if a new and relevant biological cell type is presented for identification and classification, there is a chance that the model incorrectly predicts it and filters all such cells into the negative class, potentially overlooking a significant biomarker for a subset of patients. It is also important to understand how the challenges and errors at the cell level extrapolate up to patient-level models. The cellular landscape reflects the presence or absence of disease as well as the diversity of the individual. Significant shifts in this landscape can mean the difference between unnecessary medical tests and missed diagnoses. Thus, it is critical to maximize both sensitivity and specificity at all levels of model building to ensure maximum benefit for the patients. At the cell level, we addressed this by including samples from a variety of different cancer types in our training data. In doing so, we were able to maximize the number of correctly identified biologically relevant cells (sensitivity) while simultaneously minimizing the number of irrelevant events, such as technical artifacts, that were incorrectly retained (specificity).

In short, automation is a game-changer in the healthcare setting, transforming how diagnostic tests are carried out, impacting efficiency and accuracy. The integration of automated processes has streamlined analyses, resulting in quicker turnaround times, increased throughput, and improved precision in testing [53]. Manual input can be time-consuming with a risk of error that may compromise outcomes [54,55], especially in the healthcare environment. AI algorithms can analyze datasets quickly with precision to interpret results for diagnostic accuracy [56]. It is important to approach automated implementation strategies with caution and acknowledge the challenges.

5. Conclusions

This study demonstrates the capabilities of automating and scaling a liquid biopsy framework for the detection and classification of rare cells from a PB sample, enabling earlier detection, enhanced accessibility, and improved patient outcomes. Herein, we find that the rare cell profile automatically identified and classified is heterogeneous and distinct in patients with BC and in NDs. While the results are promising, additional studies are warranted to investigate the liquid biopsy signal within specific cohorts as well as those with benign, non-cancerous conditions.

Author Contributions

Conceptualization, S.N.S., G.C., P.K. and J.M.; Data curation, S.N.S., G.C. and J.M.; Formal analysis, S.N.S., G.C. and J.M.; Funding acquisition, P.K.; Investigation, G.C. and J.M.; Project administration, P.K. and J.M.; Software, J.M.; Validation, S.N.S., G.C. and J.M.; Visualization, G.C. and J.M.; Writing—original draft, S.N.S., G.C. and J.M.; Writing—review & editing, S.N.S., G.C., P.K. and J.M. All authors have read and agreed to the published version of the manuscript.

Funding

Supported in part by grants from the Breast Cancer Research Foundation BCRF-23-089 (P.K.), the US National Institutes of Health, National Cancer Institute grants U01CA285013 (P.K., J.M., S.N.S.), Dr. Miriam and Sheldon G. Adelson Medical Research Foundation (P.K.), Ming Hsieh Institute for Research on Engineering-Medicine for Cancer (P.K., J.M.), Epic Sciences, and USC Norris Comprehensive Cancer Center (CORE) Support 5P30CA014089-40 (P.K., J.M.). This work also received institutional support from the USC Michelson Center Convergent Science Institute in Cancer, Vassiliadis Research Fund, and Hart Family Research Fund.

Institutional Review Board Statement

Ethical review and approval were waived for this study because all data elements used are from previously published sources and are referenced appropriately.

Informed Consent Statement

Patient consent was waived because all data elements used are from previously published sources and are referenced appropriately.

Data Availability Statement

All data discussed in this manuscript are included in the main manuscript text. The imaging data are available through the BloodPAC Data Commons utilized for the previous publications. The code for training the machine learning models is available for download on GitHub (https://github.com/CSI-Cancer/ocular_streamlining).

Acknowledgments

We thank the patients and their caregivers who consented to this study. We also thank our clinical collaborators whose continued support and contributions have enabled these large-scale computational studies. Namely Donna E Hansel, Hooman Djaladat, Inderbir S Gill, Jorge Nieva, Kelly Bethel, Kenneth J Pienta, Scott D Patterson, Shelley Hwang, Siamak Daneshmand, Simon K. Lo, and Stephen Pandol. We are grateful to the clinical research staff who contributed to the study, as well as to past and current technical staff at CSI-Cancer for processing the samples.

Conflicts of Interest

All authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

BC	Breast cancer
CTC	Circulating tumor cell
PB	Peripheral blood
IF	Immunofluorescence
AUC	Area under the curve
DAPI	4′,6-diamidino-2-phenylindole
D	DAPI
CK	Cytokeratin
V	Vimentin
CD	CD45/CD31
OCULAR	Outlier Clustering Unsupervised Learning Automated Report
PCA	Principal component analysis
ND	Normal donor
UMAP	Uniform Manifold Approximation and Projection
BSA	Bovine Serum Albumin
fWSI	Fluorescent Whole Slide Images

References

1. Siegel, R.L.; Miller, K.D.; Fuchs, H.E.; Jemal, A. Cancer Statistics, 2021. CA Cancer J. Clin. 2021, 71, 7.
2. American Cancer Society. Cancer Facts & Figures 2021. Available online: https://www.cancer.org/research/cancer-facts-statistics/all-cancer-facts-figures/cancer-facts-figures-2021.html (accessed on 1 January 2023).
3. American Cancer Society. Breast Cancer Facts & Figures 2019–2020. Available online: https://www.cancer.org/content/dam/cancer-org/research/cancer-facts-and-statistics/breast-cancer-facts-and-figures/breast-cancer-facts-and-figures-2019-2020.pdf (accessed on 1 January 2023).
4. AJCC Cancer Staging Manual, 8th ed.; Springer: Cham, Switzerland, 2017.
5. Mariotto, A.B.; Etzioni, R.; Hurlbert, M.; Penberthy, L.; Mayer, M. Estimation of the Number of Women Living with Metastatic Breast Cancer in the United States. Cancer Epidemiol. Biomarkers Prev. 2017, 26, 809.
6. Pan, H.; Gray, R.; Braybrooke, J.; Davies, C.; Taylor, C.; McGale, P.; Peto, R.; Pritchard, K.I.; Bergh, J.; Dowsett, M.; et al. 20-Year Risks of Breast-Cancer Recurrence after Stopping Endocrine Therapy at 5 Years. N. Engl. J. Med. 2017, 377, 1836.
7. Colleoni, M.; Sun, Z.; Price, K.N.; Karlsson, P.; Forbes, J.F.; Thürlimann, B.; Gianni, L.; Castiglione, M.; Gelber, R.D.; Coates, A.S.; et al. Annual Hazard Rates of Recurrence for Breast Cancer During 24 Years of Follow-Up: Results From the International Breast Cancer Study Group Trials I to V. J. Clin. Oncol. 2016, 34, 927.
8. Sestak, I.; Dowsett, M.; Zabaglo, L.; Lopez-Knowles, E.; Ferree, S.; Cowens, J.W.; Cuzick, J. Factors predicting late recurrence for estrogen receptor-positive breast cancer. J. Natl. Cancer Inst. 2013, 105, 1504.
9. Nishimura, R.; Osako, T.; Nishiyama, Y.; Tashima, R.; Nakano, M.; Fujisue, M.; Toyozumi, Y.; Arima, N. Evaluation of factors related to late recurrence--later than 10 years after the initial treatment--in primary breast cancer. Oncology 2013, 85, 100.
10. Davis, M. Nearly 4 in 5 Women Getting Recommended Routine Mammograms, Though Rates Rise in Some States, Fall in Others. Available online: https://www.valuepenguin.com/mammogram-rates-study (accessed on 10 March 2022).
11. Sabatino, S.A.; Thompson, T.D.; White, M.C.; Shapiro, J.A.; de Moor, J.; Doria-Rose, V.P.; Clarke, T.; Richardson, L.C. Cancer Screening Test Receipt—United States, 2018. MMWR Morb. Mortal. Wkly. Rep. 2021, 70, 29.
12. Berkowitz, Z.; Zhang, X.; Richards, T.B.; Sabatino, S.A.; Peipins, L.A.; Smith, J.L. Multilevel Small Area Estimation for County-Level Prevalence of Mammography Use in the United States Using 2018 Data. J. Womens Health 2022, 32, 216–223.
13. Sabatino, S.A.; Thompson, T.D.; White, M.C.; Shapiro, J.A.; Clarke, T.C.; Croswell, J.M.; Richardson, L.C. Cancer Screening Test Use-U.S., 2019. Am. J. Prev. Med. 2022, 63, 431.
14. Gerlinger, M.; Rowan, A.J.; Horswell, S.; Math, M.; Larkin, J.; Endesfelder, D.; Gronroos, E.; Martinez, P.; Matthews, N.; Stewart, A.; et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 2012, 366, 883.
15. Hinohara, K.; Polyak, K. Intratumoral Heterogeneity: More Than Just Mutations. Trends Cell Biol. 2019, 29, 569.
16. Dagogo-Jack, I.; Shaw, A.T. Tumour heterogeneity and resistance to cancer therapies. Nat. Rev. Clin. Oncol. 2018, 15, 81.
17. Polyak, K. Heterogeneity in breast cancer. J. Clin. Investig. 2011, 121, 3786.
18. Zardavas, D.; Irrthum, A.; Swanton, C.; Piccart, M. Clinical management of breast cancer heterogeneity. Nat. Rev. Clin. Oncol. 2015, 12, 381.
19. Welter, L.; Xu, L.; McKinley, D.; Dago, A.E.; Prabakar, R.K.; Restrepo-Vassalli, S.; Xu, K.; Rodriguez-Lee, M.; Kolatkar, A.; Nevarez, R.; et al. Treatment response and tumor evolution: Lessons from an extended series of multianalyte liquid biopsies in a metastatic breast cancer patient. Cold Spring Harb. Mol. Case Stud. 2020, 6, a005819.
20. Fehm, T.; Müller, V.; Aktas, B.; Janni, W.; Schneeweiss, A.; Stickeler, E.; Lattrich, C.; Löhberg, C.R.; Solomayer, E.; Rack, B.; et al. HER2 status of circulating tumor cells in patients with metastatic breast cancer: A prospective, multicenter trial. Breast Cancer Res. Treat. 2010, 124, 403.
Babayan, A.; Hannemann, J.; Spötter, J.; Müller, V.; Pantel, K.; Joosse, S.A. Heterogeneity of estrogen receptor expression in circulating tumor cells from metastatic breast cancer patients. PLoS ONE 2013, 8, e75038. [Google Scholar] [CrossRef]
Miyamoto, D.T.; Lee, R.J.; Stott, S.L.; Ting, D.T.; Wittner, B.S.; Ulman, M.; Smas, M.E.; Lord, J.B.; Brannigan, B.W.; Trautwein, J.; et al. Androgen receptor signaling in circulating tumor cells as a marker of hormonally responsive prostate cancer. Cancer Discov. 2012, 2, 995. [Google Scholar] [CrossRef]
Miyamoto, D.T.; Zheng, Y.; Wittner, B.S.; Lee, R.J.; Zhu, H.; Broderick, K.T.; Desai, R.; Fox, D.B.; Brannigan, B.W.; Trautwein, J.; et al. RNA-Seq of single prostate CTCs implicates noncanonical Wnt signaling in antiandrogen resistance. Science 2015, 349, 1351. [Google Scholar] [CrossRef]
Scher, H.I.; Lu, D.; Schreiber, N.A.; Louw, J.; Graf, R.P.; Vargas, H.A.; Johnson, A.; Jendrisak, A.; Bambury, R.; Danila, D.; et al. Association of AR-V7 on Circulating Tumor Cells as a Treatment-Specific Biomarker With Outcomes and Survival in Castration-Resistant Prostate Cancer. JAMA Oncol. 2016, 2, 1441. [Google Scholar] [CrossRef]
Guibert, N.; Delaunay, M.; Lusque, A.; Boubekeur, N.; Rouquette, I.; Clermont, E.; Mourlanette, J.; Gouin, S.; Dormoy, I.; Favre, G.; et al. PD-L1 expression in circulating tumor cells of advanced non-small cell lung cancer patients treated with nivolumab. Lung Cancer 2018, 120, 108–112. [Google Scholar] [CrossRef]
Alieva, M.; van Rheenen, J.; Broekman, M.L.D. Potential impact of invasive surgical procedures on primary tumor growth and metastasis. Clin. Exp. Metastasis 2018, 35, 319. [Google Scholar] [CrossRef]
Griffiths, J.I.; Chen, J.; Cosgrove, P.A.; O’Dea, A.; Sharma, P.; Ma, C.; Trivedi, M.; Kalinsky, K.; Wisinski, K.B.; O’Regan, R.; et al. Serial single-cell genomics reveals convergent subclonal evolution of resistance as early-stage breast cancer patients progress on endocrine plus CDK4/6 therapy. Nat. Cancer 2021, 2, 658. [Google Scholar] [CrossRef]
Harbeck, N.; Penault-Llorca, F.; Cortes, J.; Gnant, M.; Houssami, N.; Poortmans, P.; Ruddy, K.; Tsang, J.; Cardoso, F. Breast cancer. Nat. Rev. Dis. Primers 2019, 5, 66. [Google Scholar] [CrossRef]
Almendro, V.; Marusyk, A.; Polyak, K. Cellular heterogeneity and molecular evolution in cancer. Annu. Rev. Pathol. 2013, 8, 277–302. [Google Scholar] [CrossRef]
Fazel, R.; Krumholz, H.M.; Wang, Y.; Ross, J.S.; Chen, J.; Ting, H.H.; Shah, N.D.; Nasir, K.; Einstein, A.J.; Nallamothu, B.K. Exposure to low-dose ionizing radiation from medical imaging procedures. N. Engl. J. Med. 2009, 361, 849. [Google Scholar] [CrossRef]
Setayesh, S.M.; Hart, O.; Naghdloo, A.; Higa, N.; Nieva, J.; Lu, J.; Hwang, S.; Wilkinson, K.; Kidd, M.; Anderson, A.; et al. Multianalyte liquid biopsy to aid the diagnostic workup of breast cancer. NPJ Breast Cancer 2022, 8, 112. [Google Scholar] [CrossRef]
Shishido, S.N.; Welter, L.; Rodriguez-Lee, M.; Kolatkar, A.; Xu, L.; Ruiz, C.; Gerdtsson, A.S.; Restrepo-Vassalli, S.; Carlsson, A.; Larsen, J.; et al. Preanalytical Variables for the Genomic Assessment of the Cellular and Acellular Fractions of the Liquid Biopsy in a Cohort of Breast Cancer Patients. J. Mol. Diagn. 2020, 22, 319. [Google Scholar] [CrossRef]
Shishido, S.N.; Lin, E.; Nissen, N.; Courcoubetis, G.; Suresh, D.; Mason, J.; Osipov, A.; Hendifar, A.E.; Lewis, M.; Gaddam, S.; et al. Cancer-related cells and oncosomes in the liquid biopsy of pancreatic cancer patients undergoing surgery. NPJ Precis. Oncol. 2024, 8, 36. [Google Scholar] [CrossRef]
Ghoreifi, A.; Shishido, S.N.; Sayeed, S.; Courcoubetis, G.; Huang, A.; Schuckman, A.; Aron, M.; Desai, M.; Daneshmand, S.; Gill, I.S.; et al. Blood-based liquid biopsy: A promising noninvasive test in diagnosis, surveillance, and prognosis of patients with upper tract urothelial carcinoma. Urol. Oncol. 2024, 42, 118.e9. [Google Scholar] [CrossRef]
Shishido, S.N.; Ghoreifi, A.; Sayeed, S.; Courcoubetis, G.; Huang, A.; Ye, B.; Mrutyunjaya, S.; Gill, I.S.; Kuhn, P.; Mason, J.; et al. Liquid Biopsy Landscape in Patients with Primary Upper Tract Urothelial Carcinoma. Cancers 2022, 14, 3007. [Google Scholar] [CrossRef]
Shishido, S.N.; Sayeed, S.; Courcoubetis, G.; Djaladat, H.; Miranda, G.; Pienta, K.J.; Nieva, J.; Hansel, D.E.; Desai, M.; Gill, I.S.; et al. Characterization of Cellular and Acellular Analytes from Pre-Cystectomy Liquid Biopsies in Patients Newly Diagnosed with Primary Bladder Cancer. Cancers 2022, 14, 758. [Google Scholar] [CrossRef]
Bai, L.; Courcoubetis, G.; Mason, J.; Hicks, J.B.; Nieva, J.; Kuhn, P.; Shishido, S.N. Longitudinal tracking of circulating rare events in the liquid biopsy of stage III-IV non-small cell lung cancer patients. Discov. Oncol. 2024, 15, 142. [Google Scholar] [CrossRef]
Narayan, S.; Courcoubetis, G.; Mason, J.; Naghdloo, A.; Kolenčík, D.; Patterson, S.D.; Kuhn, P.; Shishido, S.N. Defining A Liquid Biopsy Profile of Circulating Tumor Cells and Oncosomes in Metastatic Colorectal Cancer for Clinical Utility. Cancers 2022, 14, 4891. [Google Scholar] [CrossRef]
Resnick, K.; Shah, A.; Mason, J.; Kuhn, P.; Nieva, J.; Shishido, S.N. Circulation of rare events in the liquid biopsy for early detection of lung mass lesions. Thorac. Cancer 2024, 15, 2100. [Google Scholar] [CrossRef]
Chai, S.; Matsumoto, N.; Storgard, R.; Peng, C.C.; Aparicio, A.; Ormseth, B.; Rappard, K.; Cunningham, K.; Kolatkar, A.; Nevarez, R.; et al. Platelet-Coated Circulating Tumor Cells Are a Predictive Biomarker in Patients with Metastatic Castrate-Resistant Prostate Cancer. Mol. Cancer Res. 2021, 19, 2036. [Google Scholar] [CrossRef]
Marrinucci, D.; Bethel, K.; Kolatkar, A.; Luttgen, M.S.; Malchiodi, M.; Baehring, F.; Voigt, K.; Lazar, D.; Nieva, J.; Bazhenova, L.; et al. Fluid biopsy in patients with metastatic prostate, pancreatic and breast cancers. Phys. Biol. 2012, 9, 016003. [Google Scholar] [CrossRef]
Singh, P.; Gupta, S.; Gupta, V. Multi-objective hyperparameter optimization on gradient-boosting for breast cancer detection. Int. J. Syst. Assur. Eng. Manag. 2024, 15, 1676. [Google Scholar] [CrossRef]
Nematzadeh, Z.; Ibrahim, R.; Selamat, A. Comparative studies on breast cancer classifications with k-fold cross validations using machine learning techniques. In Proceedings of the 2015 10th Asian Control Conference (ASCC), Kota Kinabalu, Malaysia, 31 May–3 June 2015. [Google Scholar]
Prusty, S.; Patnaik, S.; Dash, S.K. SKCV: Stratified K-fold cross-validation on ML classifiers for predicting cervical cancer. Front. Nanotechnol. 2022, 4, 972421. [Google Scholar] [CrossRef]
Shekar, B.H.; Dagnew, G. Grid Search-Based Hyperparameter Tuning and Classification of Microarray Cancer Data. In Proceedings of the 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), Gangtok, India, 25–28 February 2019. [Google Scholar]
Dasgupta, A.; Lim, A.R.; Ghajar, C.M. Circulating and disseminated tumor cells: Harbingers or initiators of metastasis? Mol. Oncol. 2017, 11, 40. [Google Scholar] [CrossRef]
Chai, S.; Ruiz-Velasco, C.; Naghdloo, A.; Pore, M.; Singh, M.; Matsumoto, N.; Kolatkar, A.; Xu, L.; Shishido, S.; Aparicio, A.; et al. Identification of epithelial and mesenchymal circulating tumor cells in clonal lineage of an aggressive prostate cancer case. NPJ Precis. Oncol. 2022, 6, 41. [Google Scholar] [CrossRef]
Welter, L.; Zheng, S.; Setayesh, S.M.; Morikado, M.; Agrawal, A.; Nevarez, R.; Naghdloo, A.; Pore, M.; Higa, N.; Kolatkar, A.; et al. Cell State and Cell Type: Deconvoluting Circulating Tumor Cell Populations in Liquid Biopsies by Multi-Omics. Cancers 2023, 15, 3949. [Google Scholar] [CrossRef]
Seo, J.; Kumar, M.; Mason, J.; Blackhall, F.; Matsumoto, N.; Dive, C.; Hicks, J.; Kuhn, P.; Shishido, S.N. Plasticity of circulating tumor cells in small cell lung cancer. Sci. Rep. 2023, 13, 11775. [Google Scholar] [CrossRef]
Ramirez, A.B.; Bhat, R.; Sahay, D.; De Angelis, C.; Thangavel, H.; Hedayatpour, S.; Dobrolecki, L.E.; Nardone, A.; Giuliano, M.; Nagi, C.; et al. Circulating tumor cell investigation in breast cancer patient-derived xenograft models by automated immunofluorescence staining, image acquisition, and single cell retrieval and analysis. BMC Cancer 2019, 19, 220. [Google Scholar] [CrossRef]
Ianevski, A.; Giri, A.K.; Aittokallio, T. Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data. Nat. Commun. 2022, 13, 1246. [Google Scholar] [CrossRef]
Mousavikhamene, Z.; Sykora, D.J.; Mrksich, M.; Bagheri, N. Morphological features of single cells enable accurate automated classification of cancer from non-cancer cell lines. Sci. Rep. 2021, 11, 24375. [Google Scholar] [CrossRef]
Alhammad, L.A.; Ainosah, T.K.; Ahmad, A.M.; Samarkandi, M.S.; Jawi, N.H.; Alharthi, M.A.; Alsharif, A.M.; Anazi, E.A.A.; Aldugeshem, S.A.; Johali, F.Y. The impact of laboratory automation on efficiency and accuracy in healthcare settings. Int. J. Community Med. Public Health 2023, 11, 459. [Google Scholar] [CrossRef]
Holland, I.; Davies, J.A. Automation in the Life Science Research Laboratory. Front. Bioeng. Biotechnol. 2020, 8, 571777. [Google Scholar] [CrossRef]
Mrazek, C.; Lippi, G.; Keppel, M.H.; Felder, T.K.; Oberkofler, H.; Haschke-Becher, E.; Cadamuro, J. Errors within the total laboratory testing process, from test selection to medical decision-making—A review of causes, consequences, surveillance and solutions. Biochem. Med. 2020, 30, 020502. [Google Scholar] [CrossRef]
Al-Antari, M.A. Artificial Intelligence for Medical Diagnostics-Existing and Future AI Technology! Diagnostics 2023, 13, 688. [Google Scholar] [CrossRef]

Figure 1. UMAP visualization of rare cell morphometrics for each cohort, color-coded by channel-type classification. The embedding was generated using cellular and nuclear area and the mean intensity of each fluorescent channel.

Figure 2. Cohort-level analysis of rare cells using the automated approach. Box plots of channel-type rare cell enumerations (log counts) detected in each cohort. Pairwise comparisons for each predicted cell type were performed using the Mann–Whitney U test.
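
The Mann–Whitney U test named in the Figure 2 caption can be illustrated with a short sketch. The caption does not say which software or test options the authors used, so everything below is an assumption for illustration: the function names `average_ranks` and `mann_whitney_u`, the two-sided normal approximation with tie correction (no continuity correction), and the two made-up lists of log counts. In practice a library routine such as `scipy.stats.mannwhitneyu` would typically be used instead.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())

    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p_value


# Hypothetical log counts of one cell type in two cohorts (not study data).
cohort_a = [1.2, 2.5, 3.1, 3.8, 4.0]
cohort_b = [0.3, 0.9, 1.0, 1.5, 2.0]

u_stat, p = mann_whitney_u(cohort_a, cohort_b)
print(f"U = {u_stat:.1f}, p = {p:.4f}")
```

Because the test uses only ranks, running it on raw counts or on log counts gives the same result. The normal approximation is rough for samples this small; an exact method is preferable there.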

Table 1. Performance metrics for the automated rare cell identification model at various confidence thresholds.

Confidence Thresholds	50%	60%	70%	80%	90%
Accuracy (%)	96.5	97.1	97.7	98.3	98.9
Precision (%)	37.1	42.0	47.7	55.7	68.4
Sensitivity (%)	97.6	96.6	95.4	92.8	85.5
Specificity (%)	96.4	97.1	97.7	98.4	99.1
Average False Negative (Rare Events Missed)	1.4	2.0	2.7	4.2	8.6
Average False Positive (Common Predicted Rare)	97.6	78.9	61.7	43.6	23.2
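
The relationship between the rows of Table 1 can be made concrete with a minimal sketch of how such metrics are conventionally derived from a confusion matrix at a given confidence threshold. This is not the authors' code. It assumes that each event receives a confidence score for being rare and is called rare when that score meets the threshold; the names `Confusion`, `confusion_at_threshold`, and `metrics`, and the ten example scores and labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # rare events predicted rare
    fp: int  # common events predicted rare
    fn: int  # rare events missed
    tn: int  # common events predicted common


def confusion_at_threshold(scores, labels, threshold):
    """Count outcomes when an event is called rare if score >= threshold."""
    tp = fp = fn = tn = 0
    for score, is_rare in zip(scores, labels):
        predicted_rare = score >= threshold
        if predicted_rare and is_rare:
            tp += 1
        elif predicted_rare:
            fp += 1
        elif is_rare:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def metrics(c):
    """Standard definitions of the four percentage rows, as fractions."""
    return {
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "precision": ratio(c.tp, c.tp + c.fp),
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
    }


# Hypothetical confidence scores and ground-truth labels (True = rare event).
scores = [0.95, 0.92, 0.85, 0.55, 0.75, 0.52, 0.51, 0.30, 0.10, 0.05]
labels = [True, True, True, True, False, False, False, False, False, False]

for threshold in (0.5, 0.7, 0.9):
    c = confusion_at_threshold(scores, labels, threshold)
    m = metrics(c)
    print(threshold, c, {name: round(value, 3) for name, value in m.items()})
```

On this toy data, raising the threshold removes false positives and adds false negatives, so precision and specificity rise while sensitivity falls. Table 1 shows the same direction of change across its 50% to 90% thresholds.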

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

