Establishing a Highly Accurate Circulating Tumor Cell Image Recognition System for Human Lung Cancer by Pre-Training on Lung Cancer Cell Lines
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. CTC-Chip Preparation
2.2. Cell Lines
2.3. Clinical Samples
2.4. Fixation, Staining, and CTC Identification on the Chip
2.5. Processing of Captured Images Using AI and Classification by Computation
2.6. Data Augmentation
2.7. Hardware and Software Environment Used for Computation
2.8. Acquisition of Pre-Trained Models
2.8.1. Data Conditions for Pre-Training
2.8.2. Addressing Data Imbalance During Pre-Training
2.8.3. Model Conditions for Pre-Training
2.9. Evaluation of Transfer Learning
2.9.1. Comparison of Classification Accuracy Between Models with and Without Transfer Learning (Using Clinical Data Only)
- Data Splitting: CTCs and non-CTCs obtained from clinical samples were divided into training and testing datasets. The model was trained on the training data, and its accuracy was evaluated on the test data. One "sample" here denotes one complete run of this process, from data splitting to accuracy evaluation.
- Evaluation Method: Training and testing were repeated on 100 samples to assess the classification accuracy. Accuracy was used as the evaluation metric.
- Data: The numbers of CTCs and non-CTCs from the patient samples assigned to the training and test datasets are shown in Table 1. A total of 201 CTCs were visually identified in the clinical samples. Although non-CTCs were more abundant, 201 non-CTC images were randomly selected to match the number of CTCs, thereby preventing data imbalance.
- Hyperparameters: The CNN model structure and epoch count were adjusted during pre-training. The hyperparameters, including the learning rate, batch size, and dropout rate, were optimized using a systematic grid search that evaluated multiple epoch numbers and sampling methods; the most effective parameters were selected based on validation accuracy.
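The evaluation protocol described in the list above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `train_and_score` callback, and the choice to draw the balanced non-CTC subset once before the repeats are all assumptions made for the example.

```python
import random

def balanced_subsample(ctc_ids, non_ctc_ids, rng):
    """Randomly keep as many non-CTCs as there are CTCs (undersampling)."""
    return list(ctc_ids), rng.sample(list(non_ctc_ids), len(ctc_ids))

def split(ids, n_train, rng):
    """Shuffle a copy of ids and cut it into (train, test)."""
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def repeated_evaluation(ctc_ids, non_ctc_ids, n_train_per_class,
                        train_and_score, n_repeats=100, seed=0):
    """Repeat the split -> train -> test cycle and collect the accuracies.

    train_and_score is a hypothetical callback: it receives lists of
    (image_id, label) pairs for training and testing and returns accuracy.
    """
    rng = random.Random(seed)
    # Assumption: the balanced subset is drawn once, before the repeats.
    ctcs, non_ctcs = balanced_subsample(ctc_ids, non_ctc_ids, rng)
    accuracies = []
    for _ in range(n_repeats):
        ctc_train, ctc_test = split(ctcs, n_train_per_class, rng)
        neg_train, neg_test = split(non_ctcs, n_train_per_class, rng)
        train = [(i, 1) for i in ctc_train] + [(i, 0) for i in neg_train]
        test = [(i, 1) for i in ctc_test] + [(i, 0) for i in neg_test]
        accuracies.append(train_and_score(train, test))
    return accuracies
```

Passing the same seed and splits to a model with and without pre-trained weights would give two lists of 100 accuracies that can then be compared.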
2.9.2. Comparison of Classification Accuracy Between Models with and Without Transfer Learning (Using Only Pre-Training)
2.10. Statistical Analysis
3. Results and Discussion
3.1. Does Training on Cell Lines Actually Help?
- Classification accuracy improved for all training sample sizes when transfer learning was performed.
- A statistically significant difference was observed between the two groups, particularly when the number of training samples was 19 or fewer.
3.2. Does Training on Cell Lines Negatively Affect Performance?
- Minimum classification accuracy with transfer learning: 98.33%.
- Minimum classification accuracy without transfer learning: 97.5%.
- Improvement in minimum accuracy with transfer learning: 0.83 percentage points.
- Average classification accuracy with transfer learning: 99.51%.
- Average classification accuracy without transfer learning: 99.46%.
3.3. Is Pre-Training on Cell Lines Alone Sufficient?
- Average recognition accuracy with pre-training only: 96.96%.
- Average recognition accuracy with transfer learning: 99.51%.
- Accuracy improvement with transfer learning: 2.55 percentage points (statistically significant).
- Minimum recognition accuracy improvement: from 84.16% to 96.67% (+12.51 percentage points).
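The gains listed above are differences between two accuracies that are themselves expressed in percent, so they are percentage points rather than relative changes. The short sketch below reproduces the arithmetic; the helper names are illustrative only.

```python
def percentage_point_gain(before, after):
    """Absolute difference between two accuracies given in percent."""
    return round(after - before, 2)

def relative_gain(before, after):
    """Change of `after` relative to `before`, expressed in percent."""
    return round(100.0 * (after - before) / before, 2)

# Average accuracy: pre-training only vs. transfer learning.
mean_gain = percentage_point_gain(96.96, 99.51)   # 2.55
# Minimum accuracy: pre-training only vs. transfer learning.
min_gain = percentage_point_gain(84.16, 96.67)    # 12.51

print(mean_gain, min_gain)
```

The relative gain is larger than the percentage-point gain whenever the baseline is below 100%, which is why the two should not be used interchangeably.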
3.4. Effect of Training Sample Size
3.5. Biological Differences Between Cell Lines and Clinical Lung Cancer CTCs
3.6. Limitations
- Bias in cell line selection: This study focused on pre-training using lung cancer cell lines; however, further validation is required to determine whether the same approach applies to other cancer types and cell lines. For example, verifying the effectiveness of this method in CTC detection systems for breast or colorectal cancer would help establish its generalizability.
- Diversity of clinical sample data: The clinical samples used in this study were collected under limited conditions that do not fully reflect the diversity encountered in real-world clinical settings. It is necessary to investigate how differences in patient backgrounds (e.g., age, sex, and disease stage) and variations in sample collection protocols across different institutions may have affected the results.
- Limited dataset size: Although this study demonstrated the effectiveness of pre-training and transfer learning, a larger dataset is needed for further validation. Particularly, multi-institutional collaboration is crucial for collecting diverse patient samples and conducting comprehensive evaluations to enhance the clinical applicability of the model.
- Impact on the overall CTC detection process: This study primarily focused on the effectiveness of transfer learning in image classification but did not evaluate its impact on the entire CTC detection pipeline, including the capture, staining, and data acquisition processes. Therefore, a more comprehensive investigation is required to assess these aspects.
- Evaluation of other deep learning architectures: Future studies should explore and compare alternative deep learning architectures, such as Vision Transformer or hybrid models (CNN + LSTM), to potentially improve classification accuracy and yield new insights into CTC recognition. Comparative studies will elucidate the optimal architecture suitable for specific imaging tasks and conditions.
3.7. Future Directions
3.8. Direct Comparison with Expert Pathologist Evaluation
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
AI | Artificial Intelligence |
CK | Cytokeratin |
CNN | Convolutional Neural Network |
CTC | Circulating Tumor Cell |
CUDA | Compute Unified Device Architecture |
IgG | Immunoglobulin G |
RPMI | Roswell Park Memorial Institute (medium) |
SGD | Stochastic Gradient Descent |
V-RAM | Video Random Access Memory |
References
- Castro-Giner, F.; Aceto, N. Tracking cancer progression: From circulating tumor cells to metastasis. Genome Med. 2020, 12, 31.
- Xu, J.; Liao, K.; Yang, X.; Wu, C.; Wu, W. Using single-cell sequencing technology to detect circulating tumor cells in solid tumors. Mol. Cancer 2021, 20, 104.
- Hamilton, G.; Rath, B.; Stickler, S. Significance of circulating tumor cells in lung cancer: A narrative review. Transl. Lung Cancer Res. 2023, 12, 877–894.
- Wang, X.; Bai, L.; Kong, L.; Guo, Z. Advances in circulating tumor cells for early detection, prognosis and metastasis reduction in lung cancer. Front. Oncol. 2024, 14, 1411731.
- Chen, M.; Xu, K.; Li, B.; Wang, N.; Zhang, Q.; Chen, L.; Zhang, D.; Yang, L.; Xu, Z.; Xu, H. HMGA1 regulates the stem cell-like properties of circulating tumor cells from GIST patients via Wnt/β-catenin pathway. Onco Targets Ther. 2020, 13, 4943–4956.
- Childs, A.; Steele, C.D.; Vesely, C.; Rizzo, F.M.; Ensell, L.; Lowe, H.; Dhami, P.; Vaikkinen, H.; Luong, T.V.; Conde, L.; et al. Whole-genome sequencing of single circulating tumor cells from neuroendocrine neoplasms. Endocr. Relat. Cancer 2021, 28, 631–644.
- Wang, C.; Luo, Q.; Huang, W.; Zhang, C.; Liao, H.; Chen, K.; Pan, M. Correlation between circulating tumor cell DNA genomic alterations and mesenchymal CTCs or CTC-associated white blood cell clusters in hepatocellular carcinoma. Front. Oncol. 2021, 11, 686365.
- Laprovitera, N.; Salamon, I.; Gelsomino, F.; Porcellini, E.; Riefolo, M.; Garonzi, M.; Tononi, P.; Valente, S.; Sabbioni, S.; Fontana, F.; et al. Genetic characterization of cancer of unknown primary using liquid biopsy approaches. Front. Cell Dev. Biol. 2021, 9, 666156.
- Cappelletti, V.; Verzoni, E.; Ratta, R.; Vismara, M.; Silvestri, M.; Montone, R.; Miodini, P.; Reduzzi, C.; Claps, M.; Sepe, P.; et al. Analysis of single circulating tumor cells in renal cell carcinoma reveals phenotypic heterogeneity and genomic alterations related to progression. Int. J. Mol. Sci. 2020, 21, 1475.
- Allard, W.J.; Matera, J.; Miller, M.C.; Repollet, M.; Connelly, M.C.; Rao, C.; Tibbe, A.G.J.; Uhr, J.W.; Terstappen, L.W.M.M. Tumor cells circulate in the peripheral blood of all major carcinomas but not in healthy subjects or patients with nonmalignant diseases. Clin. Cancer Res. 2004, 10, 6897–6904.
- Riethdorf, S.; Fritsche, H.; Müller, V.; Rau, T.; Schindlbeck, C.; Rack, B.; Janni, W.; Coith, C.; Beck, K.; Jänicke, F. Detection of circulating tumor cells in peripheral blood of patients with metastatic breast cancer: A validation study of the CellSearch System. Clin. Cancer Res. 2007, 13, 920–928.
- Tanaka, F.; Yoneda, K.; Kondo, N.; Hashimoto, M.; Takuwa, T.; Matsumoto, S.; Okumura, Y.; Rahman, S.; Tsubota, N.; Tsujimura, T.; et al. Circulating tumor cell as a diagnostic marker in primary lung cancer. Clin. Cancer Res. 2009, 15, 6980–6986.
- Ohnaga, T.; Shimada, Y.; Moriyama, M.; Kishi, H.; Obata, T.; Takata, K.; Okumura, T.; Nagata, T.; Muraguchi, A.; Tsukada, K. Polymeric microfluidic devices exhibiting sufficient capture of cancer cell line for isolation of circulating tumor cells. Biomed. Microdevices 2013, 15, 611–616.
- Chikaishi, Y.; Yoneda, K.; Ohnaga, T.; Tanaka, F. EpCAM-independent capture of circulating tumor cells with a ‘universal CTC-chip’. Oncol. Rep. 2017, 37, 77–82.
- Kanayama, M.; Kuwata, T.; Mori, M.; Nemoto, Y.; Nishizawa, N.; Oyama, R.; Matsumiya, H.; Taira, A.; Shinohara, S.; Takenaka, M.; et al. Prognostic impact of circulating tumor cells detected with the microfluidic “universal CTC-chip” for primary lung cancer. Cancer Sci. 2022, 113, 1028–1037.
- Toseland, C.P. Fluorescent labeling and modification of proteins. J. Chem. Biol. 2013, 6, 85–95.
- Kulkarni, M.B.; Reed, M.S.; Cao, X.; García, H.A.; Ochoa, M.I.; Jiang, S.; Hasan, T.; Doyley, M.M.; Pogue, B.W. Combined dual-channel fluorescence depth sensing of indocyanine green and protoporphyrin IX kinetics in subcutaneous murine tumors. J. Biomed. Opt. 2025, 30 (Suppl 1), S13709.
- Lannin, T.B.; Thege, F.I.; Kirby, B.J. Comparison and optimization of machine learning methods for automated classification of circulating tumor cells. Cytom. A 2016, 89, 922–931.
- Toratani, M.; Konno, M.; Asai, A.; Koseki, J.; Kawamoto, K.; Tamari, K.; Li, Z.; Sakai, D.; Kudo, T.; Satoh, T.; et al. A convolutional neural network uses microscopic images to differentiate between mouse and human cell lines and their radioresistant clones. Cancer Res. 2018, 78, 6703–6707.
- Zeune, L.L.; Boink, Y.E.; van Dalum, G.; Nanou, A.; de Wit, S.; Andree, K.C.; Swennenhuis, J.F.; van Gils, S.A.; Terstappen, L.W.M.M.; Brune, C. Deep learning of circulating tumour cells. Nat. Mach. Intell. 2020, 2, 124–133.
- Wang, S.; Zhou, Y.; Qin, X.; Nair, S.; Huang, X.; Liu, Y. Label-free detection of rare circulating tumor cells by image analysis and machine learning. Sci. Rep. 2020, 10, 12226.
- Chen, X.; Wang, X.; Zhang, K.; Fung, K.M.; Thai, T.C.; Moore, K.; Mannel, R.S.; Liu, H.; Zheng, B.; Qiu, Y. Recent advances and clinical applications of deep learning in medical image analysis. Med. Image Anal. 2022, 79, 102444.
- He, B.; Lu, Q.; Lang, J.; Yu, H.; Peng, C.; Bing, P.; Li, S.; Zhou, Q.; Liang, Y.; Tian, G. A new method for CTC images recognition based on machine learning. Front. Bioeng. Biotechnol. 2020, 8, 897.
- Hong, B.; Zu, Y. Detecting circulating tumor cells: Current challenges and new trends. Theranostics 2013, 3, 377–394.
- Paterlini-Brechot, P.; Benali, N.L. Circulating tumor cells (CTC) detection: Clinical impact and future directions. Cancer Lett. 2007, 253, 180–204.
- Alunni-Fabbroni, M.; Sandri, M.T. Circulating tumour cells in clinical practice: Methods of detection and possible characterization. Methods 2010, 50, 289–297.
- Alix-Panabières, C.; Pantel, K. Challenges in circulating tumour cell research. Nat. Rev. Cancer 2014, 14, 623–631.
- Akashi, T.; Okumura, T.; Terabayashi, K.; Yoshino, Y.; Tanaka, H.; Yamazaki, T.; Numata, Y.; Fukuda, T.; Manabe, T.; Baba, H.; et al. The use of an artificial intelligence algorithm for circulating tumor cell detection in patients with esophageal cancer. Oncol. Lett. 2023, 26, 320.
- Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
- Yan, Y.; Chen, M.; Shyu, M.L.; Chen, S.C. Deep learning for imbalanced multimedia data classification. In Proceedings of the 2015 IEEE International Symposium on Multimedia, Miami, FL, USA, 14–16 December 2015; pp. 483–488.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Alzubaidi, L.; Bai, J.; Al-Sabaawi, A.; Santamaría, J.; Albahri, A.S.; Al-Dabbagh, B.S.N.; Fadhel, M.A.; Manoufali, M.; Zhang, J.; Al-Timemy, A.H.; et al. A survey on deep learning tools dealing with data scarcity: Definitions, challenges, solutions, tips, and applications. J. Big Data 2023, 10, 46.
- Pullikuth, A.K.; Routh, E.D.; Zimmerman, K.D.; Chifman, J.; Chou, J.W.; Soike, M.H.; Jin, G.; Su, J.; Song, Q.; Black, M.A.; et al. Bulk and single-cell profiling of breast tumors identifies TREM-1 as a dominant immune suppressive marker associated with poor outcomes. Front. Oncol. 2021, 11, 734959.
Cell type | A549 | H441 | PC9 | Healthy control 1 | Healthy control 2 |
---|---|---|---|---|---|
Group | Cancer cell line | Cancer cell line | Cancer cell line | Healthy control | Healthy control |
Number of samples obtained | 4289 | 1555 | 3412 | 21,648 | 9287 |

Number of pre-training samples | Cancer cell lines | Healthy controls |
---|---|---|
Training samples | 9106 | 30,785 |
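The pre-training set above holds about 3.4 healthy-control images for every cancer cell line image. One common way to counter such an imbalance is to build every minibatch with equal numbers from each class. The sketch below illustrates that idea only; the source does not describe how its balanced minibatches were built, and the function name and the batch size are hypothetical.

```python
import random

def balanced_minibatches(pos_ids, neg_ids, batch_size, n_batches, seed=0):
    """Yield minibatches holding batch_size // 2 items from each class.

    pos_ids / neg_ids are sequences of image identifiers; each yielded
    batch is a shuffled list of (image_id, label) pairs.
    """
    rng = random.Random(seed)
    half = batch_size // 2
    for _ in range(n_batches):
        batch = [(i, 1) for i in rng.sample(pos_ids, half)]
        batch += [(i, 0) for i in rng.sample(neg_ids, half)]
        rng.shuffle(batch)
        yield batch

# Class sizes taken from the table of pre-training samples.
n_cancer, n_healthy = 9106, 30785
print(f"healthy : cancer ratio = {n_healthy / n_cancer:.2f}")

# Hypothetical batch size of 32, for illustration only.
first_batch = next(balanced_minibatches(range(n_cancer), range(n_healthy), 32, 1))
```

Sampling per batch keeps all healthy-control images available across training while each gradient step still sees both classes equally often.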
Category | Hyperparameter | Value |
---|---|---|
Unchanged | Iteration | 20 |
Unchanged | Learning rate | 0.001 |
Unchanged | Optimizer | SGD (momentum: 0.9) |
Unchanged | Loss function | Cross-entropy |
Changed | Epoch | 2, 6, 12, 25, 50, 100, and 300 |
Changed | Sampling method | Balanced minibatch, All data |
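The hyperparameters in the table above define a small search grid: seven epoch counts crossed with two sampling methods, with the remaining settings held fixed. A minimal sketch of enumerating that grid follows. The dictionary keys and the `validation_accuracy` callback are assumptions for illustration, and the 0.001 entry is taken to be the learning rate.

```python
from itertools import product

# Settings held constant across all runs (values from the table).
FIXED = {
    "iteration": 20,
    "learning_rate": 0.001,   # assumption: the 0.001 entry is the learning rate
    "optimizer": "SGD",
    "momentum": 0.9,
    "loss": "cross_entropy",
}
# Settings varied during pre-training (values from the table).
EPOCHS = [2, 6, 12, 25, 50, 100, 300]
SAMPLING = ["balanced_minibatch", "all_data"]

def build_grid():
    """Return one configuration dict per (epoch, sampling) combination."""
    return [dict(FIXED, epochs=e, sampling=s) for e, s in product(EPOCHS, SAMPLING)]

def select_best(configs, validation_accuracy):
    """Pick the configuration with the highest validation accuracy.

    validation_accuracy is a hypothetical callback that trains a model
    with the given configuration and returns its validation accuracy.
    """
    return max(configs, key=validation_accuracy)

grid = build_grid()
print(len(grid))  # 14 configurations
```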
Author (Year) | Methodology | Cancer Type | Training Data | Accuracy |
---|---|---|---|---|
He et al. (2020) [23] | Machine learning | General cancers | Clinical images | 92.5% |
Zeune et al. (2020) [20] | Deep learning | Various cancers | Clinical images | 93.8% |
Wang et al. (2020) [21] | Label-free detection and deep learning | General cancers | Clinical images | 94.3% |
Akashi et al. (2023) [28] | AI-based detection | Esophageal cancer | Clinical images | 95.0% |
Matsumiya et al. (Present study) | Transfer learning CNN | Lung cancer | Cell lines + Clinical images | 99.51% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Matsumiya, H.; Terabayashi, K.; Kishi, Y.; Yoshino, Y.; Mori, M.; Kanayama, M.; Oyama, R.; Nemoto, Y.; Nishizawa, N.; Honda, Y.; et al. Establishing a Highly Accurate Circulating Tumor Cell Image Recognition System for Human Lung Cancer by Pre-Training on Lung Cancer Cell Lines. Cancers 2025, 17, 2289. https://doi.org/10.3390/cancers17142289