Deep Learning-Based In Vitro Detection Method for Cellular Impurities in Human Cell-Processed Therapeutic Products

Featured Application: The described method can be applied for detecting cellular impurities in images.

Abstract: Automated detection of impurities is in demand for evaluating the quality and safety of human cell-processed therapeutic products in regenerative medicine. Deep learning (DL) is a powerful method for classifying and recognizing images in cell biology, diagnostic medicine, and other fields because it automatically extracts features from complex cell morphologies. In the present study, we constructed prediction models that recognize cancer-cell contamination in continuous long-term (four-day) cell cultures. After dividing the whole dataset into Early- and Late-stage cell images, we found that Late-stage images improved the DL performance. The performance was further improved by optimizing the DL hyperparameters (batch size and learning rate). These findings are the first report of implementing DL-based systems for cell-type classification in human cell-processed therapeutic products (hCTPs), which are expected to enable the rapid, automatic classification of induced pluripotent stem cells and other cell treatments for life-threatening or chronic diseases.


Introduction
In regenerative medicine, human cell-processed therapeutic products (hCTPs) derived from human pluripotent stem cells (hPSCs), such as induced pluripotent stem cells (iPSCs) and embryonic stem cells, are promising treatment modalities for life-threatening or incurable diseases owing to their pleiotropic effects. Specifically, living cells have infinite self-renewal capacity and can differentiate into various cell types, providing new resources with high efficiency [1][2][3][4]. However, managing the quality of the new cells is difficult, and quality is highly correlated with the efficacy and safety of the hCTPs, which is essential for a stable supply that meets clinical needs. The generation protocol of hPSCs must be down-scaled for monolayer cell culture conditions, and tumor formation has been reported in patients receiving transplantation of human somatic cells [1,2,[5][6][7][8][9][10][11][12][13][14]. By integrating control systems, we could realize the automatic management of raw materials, intermediates, final products, and process parameters for good manufacturing practice on a simple and cost-effective platform [15][16][17][18][19][20]. In particular, hCTPs can become contaminated with tumor cells and other abnormal cells during the manufacturing process, which hinders their application in regenerative medicine. During long-term suboptimal culturing, contaminants are generated by two main mechanisms: (1) spontaneous transformation of the original component cells induced by the accumulation of abnormal karyotypes and (2) cross-contamination with other cell types such as fibrosarcoma, osteosarcoma, and glioma cell lines. Therefore, establishing a highly accurate method that detects abnormal cells at each step of the hCTP production process is essential [10,11,[21][22][23][24][25][26][27][28][29][30][31].
For quality and safety reasons, hCTPs are comprehensively evaluated in multiple tumorigenicity-related tests to remove the risk of tumorigenic cells in transplants [1,[9][10][11][32][33][34]. However, the throughput, accuracy, and cost of these time-consuming tests must be improved before administering hCTPs to humans. This problem might be resolved by deep learning (DL) approaches that automatically and rapidly classify different cell types by detecting small differences among a large number of phase-contrast cell images. DL algorithms can morphologically classify cells into fibroblast-like (elongated), epithelial-like (polygonal), and lymphoblast-like (spherical). The main component in DL algorithms is a convolutional neural network (CNN), which has proven successful in the classification and semantic segmentation of live cell images [31,[35][36][37][38][39][40][41][42][43]. The other classification approach is conventional machine learning (ML), which differentiates cellular morphologies, predicts cell differentiation potential, and provides non-invasive evaluations of human iPSCs. When surveying cellular conditions, ML requires trial-and-error feature extraction, feature selection, and classification steps. DL generally outperforms ML because it leverages the powerful capability of the CNN to automatically extract and learn from image data representing cells with complicated and non-uniform morphologies [44][45][46][47][48][49]. CNNs have become useful tools for quality control of cultured cells because they successfully discriminate between abnormal and normal cells. After training, a CNN automatically classifies and recognizes cells in different types of images [31,[50][51][52][53][54][55][56][57][58][59]. For example, Toratani et al. [53] recently reported applying a CNN to microscopic images of patient-derived cancer and radiation-resistant cells to determine the effectiveness of radiation therapy.
CNNs make it possible to discriminate small differences in cell images that cannot be discriminated by the human eye. Novel CNN-based technologies are expected to use images to predict not only the cell type but also its responsiveness to treatment. Isozaki et al. [60] reported a DL-based technology that uses image analysis to identify individual cells at high speed. This technology can be used to image large cell populations at high speed and to discriminate specific cells in real time based on information processing technology.
In an experiment to demonstrate the versatility of their technology, they imaged cells of different sizes (3-30 µm) at high speed and identified potential cancer cells circulating in the blood of cancer patients. However, a manufacturing process has not yet been developed that can ensure high-quality and safe hCTP production.
In this study, we constructed model systems for distinguishing abnormal cervical cancer cells (HeLa-GFP) when mixed together with human embryonic lung fibroblasts (MRC-5). Images of the cells at different stages of a continuous culture period were input to a DL algorithm. The prediction performance was improved in the late stage of the culture period. The prediction performances of the constructed models were further improved by optimizing the DL hyperparameters, namely, the batch size (BS) and learning rate (LR). These results suggest that DL-based prediction models can effectively classify different cell types in cell images; in particular, DL can automatically extract the meaningful features from complex morphological cell information.

Preparation of the Image Data of Culture Cells
The images of the MRC-5 cells alone and co-cultured MRC-5 cells and HeLa-GFP cells were taken on a Confocal Quantitative Image Cytometer, CQ1 (Yokogawa Electric Corporation, Tokyo, Japan) every 20 min for 4 days. Thirty positive areas (pArea) in which normal cells and cancer cells coexist in co-cultured wells and 60 negative areas (nArea) in which normal cells alone exist in monoculture wells were excised from whole-well bright-field images over time acquired with CQ1, as inputs for a DL algorithm (Figure 1b). The dataset was split into training (Tra) and validation (Val) datasets at a ratio of 1:1. The Total dataset of the cell culture period (Days 0-4) comprised 4320 images in each of the Tra and Val datasets of the positive area and 8640 images in each of the Tra and Val datasets of the negative area (Table S1). The Total dataset was then divided into two groups, Early_stage (Days 1-2) and Late_stage (Days 3-4) (Table 1).

Figure 1. The culture model for assessing the presence of abnormal cells in a product. In this suspension culture system, MRC-5 (normal cells, as a product model) and HeLa-GFP (abnormal cells, as a cellular impurity model) were co-cultured (a). Positive areas (pArea) in which normal cells and cancer cells coexist in co-cultured wells and negative areas (nArea) in which normal cells alone exist in monoculture wells were excised from whole-well bright-field images over time in each well, as inputs for a DL algorithm (b).

Deep Learning
All image files produced using fluorescence microscopy were resized using NVIDIA DL GPU Training System (DIGITS) version 4.0.0 software (NVIDIA, Santa Clara, CA, USA). The final image files, which had a fixed resolution of 256 × 256 pixels, were used as input for DL. The DL process included data management, design and training of the neural networks on a four-GPU system (Tesla-V100-PCIE; 31.7 GB), real-time monitoring of the model performance, and selection of the best-performing model from the results.
The Total, Early_stage, and Late_stage prediction models were constructed using the corresponding Tra datasets and were evaluated using the corresponding Val datasets. The evaluation indices were the loss and accuracy. The loss was the binary cross-entropy,

$\mathrm{loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$,

where $y_i$ has a value of 1 or 0 for positive and negative samples, respectively, and $p_i$ represents the prediction probability of each record being a positive example. The loss function computes the prediction probability of the true value, so a lower loss value indicates a better performance [61]. Writing $\hat{p}_i$ for the prediction probability of the true value ($\hat{p}_i = p_i$ for a positive example and $\hat{p}_i = 1 - p_i$ for a negative example), the loss can equivalently be written as $-\frac{1}{N}\sum_{i=1}^{N} \log \hat{p}_i$. A positive example is given a large penalty when the probability of being a positive example is low; similarly, a negative example is given a large penalty when the probability of being a positive example is high. The accuracy was computed as

$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP}$,

where TP, FN, TN, and FP denote the numbers of true positives, false negatives, true negatives, and false positives, respectively. To test the independence of the dataset, McNemar's test was performed using a free algorithm on a website: https://www2.ccrb.cuhk.edu.hk/stat/confidence%20interval/McNemar%20Test.htm (accessed on 8 September 2021) [62].
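For illustration only, the loss and accuracy defined above can be computed as in the following minimal Python sketch; the variable names are ours and are not part of the DIGITS software used in this study:

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy: lower values indicate better predictions."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        # y = 1 (positive) contributes -log(p); y = 0 (negative) contributes -log(1 - p)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + FN + TN + FP)."""
    correct = sum(1 for y, yhat in zip(y_true, y_pred) if y == yhat)
    return correct / len(y_true)

# illustrative labels and predicted probabilities
y_true = [1, 1, 0, 0]
p_pred = [0.9, 0.6, 0.2, 0.4]
loss = binary_cross_entropy(y_true, p_pred)
acc = accuracy(y_true, [1 if p >= 0.5 else 0 for p in p_pred])
```

Note that a confident wrong prediction (e.g., $p_i$ near 0 for a positive example) dominates the loss, which is exactly the penalty behavior described above.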

Results
In the present study, we employed the recently reported novel 3D cell culture system with a low molecular weight agar, LA717, which promotes uniform suspension of cells on low-attachment culture plates [63]. In this culture environment, normal cells are unable to proliferate and, over time, die or aggregate with neighboring cells; in contrast, abnormal cells, such as cancer cells, form colonies through anchorage-independent growth (Figure 1a). We used MRC-5 (normal cells) as an hCTP model and HeLa-GFP (abnormal cells) as a cellular impurity model in the culture system. Positive areas (pArea) in which normal MRC-5 cells and cancer HeLa-GFP cells coexist in co-cultured wells and negative areas (nArea) in which MRC-5 cells alone exist in monoculture wells were excised from whole-well bright-field images over time in each well, as inputs for a DL algorithm (Figure 1b).
To construct the prediction models of contamination by other cell types in hCTP-cell culture systems, we trained the CNN on images of single and mixed cell types during continuous cell culture. We first analyzed the effects of the hyperparameters on the prediction performance of the constructed models. In DL with the stochastic gradient descent method, the dataset is divided into several subsets during training to reduce the effects of outliers. The BS specifies the number of samples in each training iteration [62,[64][65][66]. The gradients were averaged over each subset of the Tra (training) dataset, which was divided into groups of N samples. After the weights were updated according to the gradient, the next group was processed. When all groups had been processed (i.e., one epoch), the order of the Tra dataset was randomly changed for further learning. However, when a large BS is set for training, the DL shows a well-known generalization gap that remarkably degrades its generalization performance [67,68]. To optimize the hyperparameters of the DL for detecting HeLa-GFP contaminants co-cultured with normal human MRC-5 cells, all images containing pArea and nArea during the cell culture period were divided into three datasets: Total (Days 1-4), Early_stage (Days 1-2), and Late_stage (Days 3-4). The Total, Early_stage, and Late_stage models were evaluated with 26 BS settings (range 5-350), 11 settings (100-200), and 24 settings (5-200), respectively (see Figure 2 and Tables S1-S3). The mean loss(Val) and Acc(Val) denote the loss and accuracy values, respectively, with the Val datasets; they were 0.718 ± 0.068 and 53.7% ± 11.8%, respectively, for Total; 1.268 ± 0.220 and 35.3% ± 3.1%, respectively, for Early_stage; and 0.660 ± 0.067 and 66.3% ± 3.9%, respectively, for Late_stage (Figure 2, Tables S1-S3).
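The minibatch procedure described above (gradients averaged over each batch of BS samples, with the training data reshuffled after every epoch) can be sketched as follows; the one-parameter toy model and all names are illustrative assumptions, not the CNN used in this study:

```python
import random

def sgd_minibatch(data, batch_size, lr=0.1, epochs=50, seed=0):
    """Fit y = w * x by minibatch SGD on squared error (toy one-parameter model).
    The batch size (BS) sets how many samples are averaged per gradient step;
    the training set is reshuffled after every epoch, as described above."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # new random order for each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # gradient of the mean squared error, averaged over the batch
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w

# toy data generated from y = 3x; the fitted weight should approach 3
data = [(x / 10, 3 * x / 10) for x in range(1, 21)]
w = sgd_minibatch(list(data), batch_size=4)
```

Larger batch sizes average over more samples per step, which smooths the gradient but, as noted above, can widen the generalization gap in deep networks.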
The best loss(Val) and Acc(Val) performances were 0.649 at BS = 30 and 64.3% at BS = 7, respectively, for Total; 0.986 at BS = 180 and 40.0% at BS = 5, respectively, for Early_stage; and 0.623 at BS = 189 and 70.2% at BS = 189 or 190, respectively, for Late_stage (Figure 2, Table 2, Table 3 and Tables S1-S3). Figure 3 plots the prediction performances as a function of the other hyperparameter (LR). For fine-tuning, we evaluated 16 LR settings for Total (−log10[LR] = 1.40-7.00), 7 for Early_stage (−log10[LR] = 2.00-9.00), and 22 for Late_stage (−log10[LR] = 2.00-5.00). The mean loss(Val) and Acc(Val) were 0.690 ± 0.049 and 58.2% ± 11.2%, respectively, for Total; 1.099 ± 0.879 and 51.7% ± 15.8%, respectively, for Early_stage; and 0.634 ± 0.020 and 67.2% ± 4.4%, respectively, for Late_stage (Figure 3, Tables S1-S3). The best loss(Val) and Acc(Val) performances were 0. (Table 4, Table 5 and Tables S1-S3). The prediction performances of five DL solver types, namely Adaptive Delta (AdaDelta), Adaptive Gradient (AdaGrad), Adaptive Moment Estimation (Adam), Nesterov's accelerated gradient, and root mean square propagation (RMSprop), were then evaluated at the optimal hyperparameter values. With the Early_stage dataset, RMSprop achieved a slightly lower loss(Val) value than the other solvers, but all five solvers obtained very similar Acc(Val) values (Figure 4a,b, Tables S2 and S3). The prediction performances were also evaluated for five LR decay policies: step down, exponential decay, inverse decay, polynomial decay, and sigmoidal decay. With the Late_stage dataset, the loss(Val) value was slightly lower under the sigmoidal decay policy than under the other policies, but all five policies yielded very similar Acc(Val) values (Figure 4c,d, Tables S2 and S3). To test the independence of the Val dataset, McNemar's chi-square test was applied to compare the accuracies among the different solvers and policies.
The calculation results did not show significant differences in accuracy among the solvers or policies (p = 0.074). These results suggest that the choice of solver and policy may not affect the accuracy of the prediction model.
On the contrary, an association study of Acc(Val) and loss(Val) with BSs of 100-200 between the Early_stage and Late_stage data indicated significant differences (Pc < 3.77 × 10^−20 for Acc(Val) and Pc < 1.02 × 10^−5 for loss(Val) according to Student's t-test with the Bonferroni correction). According to these results, the prediction performance was improved by evaluating images from the Late_stage dataset. This suggests that easily identifiable contaminant cells appear later in the culture period and that the prediction performance can be improved by acquiring a dataset containing cells of clearly different morphologies.
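For reference, McNemar's chi-square test with continuity correction, as computed by the web tool cited in the Methods, can be reproduced with the Python standard library alone; the discordant counts below are invented illustrative values, not data from this study:

```python
import math

def mcnemar(b, c):
    """McNemar's chi-square test with continuity correction.
    b and c are the discordant counts: cases one classifier got right
    and the other got wrong, and vice versa. Returns (chi2, p_value);
    the statistic is compared to a chi-square with 1 degree of freedom."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# example: 15 vs. 5 discordant pairs between two classifiers' predictions
chi2, p = mcnemar(15, 5)
```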

Discussion
Generalizing the learning results for time series prediction with test data is an important issue for DL. However, conventional methods have problems such as poor generalization performance and suboptimal solutions. The generalization ability, which is the ability to predict future results from input data that have never been seen before, is determined by the generalization error, which is the difference between the prediction errors with the training and test datasets. This demonstrates the importance of developing a prediction method with a small error. Although various predictive methods have been reported for DL, current mainstream approaches have some limitations. DL can be used to extract features automatically, but overfitting is likely to occur with fluctuations in the training data, which increases the generalization error. Therefore, the dimensionality needs to be reduced to obtain information with good representativeness. The most widely used approach to dimensionality reduction and feature extraction is principal component analysis, which, however, may also extract a large amount of noise that is not related to prediction. Therefore, it is not always compatible with maximizing the generalization ability. A state-space model that infers the hidden state variables and parameters of the nonlinear generation process for the input signal is currently widely used for prediction. However, this model has drawbacks such as not being able to infer the generation process correctly, because it may fall into a non-optimal solution (i.e., a local minimum) when the underlying dynamics are unknown. Thus, traditional approaches are known to have large generalization errors or to fall into local minima, both of which increase the prediction error with new test data. Lin et al. [69] recently reported that, with sufficient training samples and high-dimensional inputs, the state variables, parameters, and dimensions of standard nonlinear generation processes can be identified with high accuracy.
Their results suggest that their approach has high robustness to observed noise because generalization and feature extraction can be maintained with only limited training data. Because this system is easy to configure with a neural network and has a low computational cost, it is expected to be applicable to highly reliable and explainable AI with a mathematically guaranteed optimal generalization strategy when implemented in parallel with our constructed prediction model.
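As a side note, the principal component analysis mentioned above as the most widely used dimensionality-reduction approach can be sketched with a singular value decomposition in a few lines of NumPy; the random data are illustrative:

```python
import numpy as np

def pca(X, n_components):
    """Project data onto its top principal components via SVD.
    X: (n_samples, n_features); returns the (n_samples, n_components) scores,
    ordered by decreasing explained variance."""
    Xc = X - X.mean(axis=0)               # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T       # scores in the reduced space

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))             # illustrative data matrix
scores = pca(X, 2)
```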
Applying the DL algorithms in our study to various fields of regenerative medicine requires addressing several problems, such as the reproduction of classification accuracy, implementation difficulty, and learning time. The reproduction of classification accuracy refers to the difficulty of knowing whether a performance level can be achieved until actual implementation. Implementation difficulty refers to the higher level of programming skills and mathematical knowledge required for DL than for other types of ML; DL is based on mathematical principles with varied backgrounds, which take a long time to understand. Finally, DL models take a long time to train. To reduce the learning time, GPUs have been applied to speed up the calculations [70][71][72][73].
In addition to implementation problems, some fundamental DL challenges remain. One goal for DL is to minimize the generalization error. However, this is not an easy task, because the issues handled by DL are considered ill-posed problems (i.e., some of the information required to obtain the solution is lacking). By selecting appropriate heuristics, information can be added to transform an ill-posed problem into a well-posed problem. Domain-specific heuristics are used for efficient solving, but if the algorithm is not clear, the correct answer cannot be obtained. For ill-posed problems where not all data can be collected, the criteria are unclear, and a formal and rigorous verification cannot be performed. Therefore, remembering that the results contain some uncertainties is important for DL implementation. Another issue with DL is the "no free lunch" theorem, which postulates that a DL model that is good for every problem is theoretically impossible; if one model outperforms another, it is specialized for a particular problem. In particular, if the problem changes, the algorithm should change, and it is important to devise a solution that is specific to each problem based on preconditions and prerequisite knowledge as much as possible. The performance of ML methods, including DL, can be improved for specific problems by applying an inductive bias, which is prior knowledge or a hypothesis beyond the training data.
In what kinds of problems, then, does DL specialize? Simple and realistically sized neural networks can sufficiently approximate the following problems: (1) low-degree polynomial models: any polynomial can be approximated by a neural network consisting of about four times as many neurons as the number of multiplications required for its calculation; (2) locality: a local Markov network can be approximated using a neural network consisting of a number of neurons proportional to the number of nodes; (3) symmetry: CNNs that explicitly incorporate translational and temporal invariance can significantly reduce the number of parameters required for learning, which also greatly reduces the apparent complexity [74].
According to the "ugly duckling" theorem, it is theoretically impossible to classify or judge similarity without some prior knowledge or inductive bias. This is conceptually similar to the no free lunch theorem, which argues that there is no universal, one-size-fits-all ML model or search or optimization algorithm that can solve any problem efficiently. Meanwhile, the ugly duckling theorem argues that, without assumptions or prior knowledge, there is no optimal feature representation or feature set that will result in better classification performance. This raises the issue that objective or general-purpose classification is difficult. Therefore, this theorem also indicates that algorithms need to be formulated according to the classification problem to be solved.
In general, DL also suffers from problems such as overfitting (overtraining): if the training period is too long or the training data are not representative, the model adapts to certain random features of the training data that are unrelated and unwanted. The resulting inability to adapt and generalize to unknown test data leads to a lack of generalization ability [75][76][77][78][79][80][81][82]. One cause is that the model is too complicated and has too many degrees of freedom relative to the number of training data, for example, too many parameters being fitted to the statistical model. The main strategies for suppressing overfitting are the following. (1) Increasing the number of training data. To ensure that training generalizes better, the variation of the training data is increased. For this purpose, the existing training data can also be extended by generating a variety of image data from one image by changing its saturation, brightness, and direction. These approaches make it possible to acquire abundant data and to build a flat learning model that does not overfit, by making good use of limited data. However, if only biased data are added in the process of increasing the training data, the construction of the model is adversely affected. If the bias, which represents the difference between the predicted results and the measured values, is small, the prediction accuracy is high; however, if the variance, which represents the variability of the prediction results, is too large, the versatility is low and overfitting may occur. To build an appropriate prediction model that does not overfit, the balance between bias and variance must be kept constant.
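Strategy (1) can be sketched as follows; the flip, rotation, and brightness operations below are generic illustrations of "changing the saturation, brightness, and direction of one image", not the augmentation pipeline of this study:

```python
import numpy as np

def augment(image, rng):
    """Generate simple variants of one grayscale image (values in [0, 1]):
    horizontal/vertical flips, a 90-degree rotation, and a random
    brightness scaling clipped back to the valid range."""
    variants = [image,
                np.fliplr(image),          # change of direction (left-right)
                np.flipud(image),          # change of direction (up-down)
                np.rot90(image)]           # 90-degree rotation
    factor = rng.uniform(0.8, 1.2)         # brightness change
    variants.append(np.clip(image * factor, 0.0, 1.0))
    return variants

rng = np.random.default_rng(0)
img = rng.random((256, 256))               # stand-in for a 256 x 256 input image
batch = augment(img, rng)
```

Each variant presents the same underlying content under a different appearance, which is exactly how limited data can be stretched into a richer training set.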
(2) Model simplification. Introducing complex and sophisticated algorithms allows for advanced analysis with the addition of various parameters but at the same time increases the risk of overfitting. Therefore, the simplest way to prevent overfitting is to reduce the size of the model, i.e., the number of learnable parameters, which is determined by the number of layers and the number of units per layer. By probabilistically selecting the units of each layer of a multi-layer network and invalidating the units other than the selected ones, a "temporary network with a small degree of freedom" can be created. Training with this process obtains the effect of simplifying the model.
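The unit-invalidation procedure described in strategy (2) corresponds to what is commonly called dropout; a minimal NumPy sketch (the inverted-dropout scaling is our own illustrative formulation) is:

```python
import numpy as np

def dropout(activations, drop_prob, rng, train=True):
    """Randomly zero out units with probability drop_prob during training
    (inverted dropout: surviving units are scaled by 1/(1 - drop_prob)
    so the expected activation is unchanged). At inference time the
    layer is returned untouched."""
    if not train or drop_prob == 0.0:
        return activations
    keep = rng.random(activations.shape) >= drop_prob  # boolean survivor mask
    return activations * keep / (1.0 - drop_prob)

rng = np.random.default_rng(42)
h = np.ones(1000)                 # stand-in for one layer's activations
out = dropout(h, 0.5, rng)
```

Because a different random sub-network is trained at every step, the effective model has fewer degrees of freedom, which is the simplification effect described above.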

(3) Regularization
There is a method of adding a regularization term to the error function to suppress the complexity and degrees of freedom of the model and prevent overfitting. The most common regularizations are L1 regularization and L2 regularization. With L1 regularization, some parameters can be set to zero, which acts like feature selection and results in a sparse model. Furthermore, if there are many zeros, the model can be expressed as a sparse matrix and computed at high speed. On the other hand, L2 regularization shrinks parameters toward zero according to the size of the data and is characterized by building a smooth model. Thus, while DL has grown into a program that can produce amazing output through long-term operation, proper tuning and verification by humans must be continued to avoid overfitting, because overfitting can be prevented depending on the approach, and even after it occurs, DL can be restored to a normal model by troubleshooting in the correct way. Another challenge facing machine learning (ML) is the selection of appropriate ML models for classifying domain-specific data. Some of the main ML techniques are the following. (1) Support vector machine (SVM). The SVM is a supervised learning model that can handle both classification and regression, mainly solving classification tasks; it can make predictions with high accuracy even for unknown data by maximizing the margin, i.e., drawing a boundary so that the distance to the data nearest the boundary is maximized.
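The contrast between L1 and L2 regularization described above (sparse versus smoothly shrunken weights) can be illustrated with a small least-squares problem; the proximal gradient formulation and all values below are our own illustrative choices:

```python
import numpy as np

def regularized_ls(X, y, lam, kind="l2", lr=0.01, steps=2000):
    """Least squares with an L1 or L2 penalty, fitted by (proximal) gradient
    descent. L2 shrinks all weights smoothly toward zero; L1 can set weights
    exactly to zero, yielding a sparse model."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        if kind == "l2":
            w -= lr * (grad + lam * w)        # ridge: add lam * w to the gradient
        else:
            w -= lr * grad                    # lasso: gradient step, then
            w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0])   # only two features matter
w_l1 = regularized_ls(X, y, lam=1.0, kind="l1")
w_l2 = regularized_ls(X, y, lam=1.0, kind="l2")
```

With the same penalty strength, the L1 fit drives the three irrelevant weights to (essentially) zero, while the L2 fit keeps all weights nonzero but shrunken.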
(2) K-nearest neighbor (kNN) algorithm. The kNN method plots the given training data in a vector space; when unknown data are obtained, it takes the k training points nearest to them in order of distance and estimates the class to which the data belong by majority vote.
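A minimal, illustrative implementation of the kNN procedure just described (standard library only; the toy "normal"/"abnormal" clusters are invented):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among the k training points
    nearest in Euclidean distance."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# toy 2-D data: two clusters standing in for "normal" and "abnormal" cells
train = [((0.0, 0.0), "normal"),   ((0.1, 0.2), "normal"),   ((0.2, 0.1), "normal"),
         ((1.0, 1.0), "abnormal"), ((0.9, 1.1), "abnormal"), ((1.1, 0.9), "abnormal")]
label = knn_predict(train, (0.95, 1.0), k=3)
```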
(3) Naïve Bayes classifier. This classifier determines the category to which given data belong, making predictions according to Bayesian decision rules based on a simple probabilistic model.
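A Gaussian variant of the naïve Bayes classifier described above can likewise be sketched in plain Python; the Gaussian feature model and toy data are our illustrative assumptions:

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples):
    """Estimate per-class log priors and per-feature mean/variance
    (the 'simple probabilistic model': features assumed independent
    given the class)."""
    by_class = defaultdict(list)
    for x, label in samples:
        by_class[label].append(x)
    model, n_total = {}, len(samples)
    for label, xs in by_class.items():
        n = len(xs)
        means = [sum(col) / n for col in zip(*xs)]
        var = [sum((v - m) ** 2 for v in col) / n + 1e-9  # small floor avoids /0
               for col, m in zip(zip(*xs), means)]
        model[label] = (math.log(n / n_total), means, var)
    return model

def predict_nb(model, x):
    """Pick the class maximizing log prior + sum of Gaussian log-likelihoods."""
    def score(label):
        log_prior, means, var = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, var))
    return max(model, key=score)

samples = [((0.1, 0.2), "normal"),   ((0.2, 0.1), "normal"),   ((0.0, 0.0), "normal"),
           ((1.0, 1.1), "abnormal"), ((1.1, 0.9), "abnormal"), ((0.9, 1.0), "abnormal")]
model = fit_gaussian_nb(samples)
pred = predict_nb(model, (0.95, 1.05))
```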
In addition, a comparison of DL and traditional ML models for classifying normal and abnormal cells has been reported using seven different ML algorithms: logistic regression, kNN, SVM, naïve Bayes, random forest, XGBoost, and a deep neural network. Of these algorithms, the deep neural network outperformed the others [83].
One of the major concerns with the manufacture of transplanted cells used in regenerative medicine is the risk of cross-contamination, i.e., the accidental mixing of cell samples. Control of the manufacturing process requires eliminating harmful impurities such as cancer cells and ensuring the quality of the transplanted cells. A soft agar colony formation (SACF) test is commonly used to evaluate anchorage-independent growth, which is one of the characteristics of malignantly transformed cells such as cancer cells. This test can detect the presence of abnormal cells among normal cells in a short time. Our group [9] developed a digital assay screening system that detects cancer cells among normal cells with better sensitivity and efficiency than the conventional SACF test. However, this approach requires a skillful technique to make multiple layers of viscous agar medium with different concentrations in many wells. Therefore, in the present study, we applied the novel 3D cell culture system with medium containing 0.03% LA717 and low-attachment culture plates as a culture method to evaluate the anchorage-independent growth of cells more easily and effectively than the conventional soft agar culture. The co-culture of MRC-5 and HeLa-GFP in this method showed efficient growth of HeLa-GFP only, suggesting that this culture system is useful as a model for evaluating the contamination of normal cells by cellular impurities such as cancer cells equivalent to HeLa cells. Through this culture system, a large number of image data over time were acquired by time-lapse imaging, which is necessary for developing a DL-based prediction model of cellular impurity contamination.

Conclusions
We constructed DL-based prediction models that detect the contamination of human embryonic lung fibroblasts (MRC-5 cells) with human cervical cancer (HeLa-GFP) cells. The cultures were incubated for 4 days. The image data collected from Days 1 to 4 were divided into two groups: Early_stage (Days 1 and 2) and Late_stage (Days 3 and 4). The prediction performance of the DL algorithm was higher after training on the Late_stage dataset than on the Early_stage dataset. The performance was further improved by optimizing the BS and LR hyperparameters in the DL. This is the first report of implementing DL for classifying cancer-cell contamination in hCTPs; the method is expected to provide high-throughput and low-cost alternatives to traditional methods, and in particular, the culture conditions can be adjusted during the generation process of hCTPs.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/app11209755/s1, Table S1: loss(Val) and Acc(Val) in Total dataset with BS and LR, Table S2: loss(Val) and Acc(Val) in Early_stage dataset with BS and LR, Table S3: loss(Val) and Acc(Val) in Late_stage dataset with BS and LR.