High Accuracy Classification of Developmental Toxicants by In Vitro Tests of Human Neuroepithelial and Cardiomyoblast Differentiation

Human-relevant tests to predict developmental toxicity are urgently needed. A currently intensively studied approach makes use of differentiating human stem cells to measure chemically-induced deviations of the normal developmental program, as in a recent study based on cardiac differentiation (UKK2). Here, we (i) tested the performance of an assay modeling neuroepithelial differentiation (UKN1), and (ii) explored the benefit of combining assays (UKN1 and UKK2) that model different germ layers. Substance-induced cytotoxicity and genome-wide expression profiles of 23 teratogens and 16 non-teratogens at human-relevant concentrations were generated and used for statistical classification, resulting in accuracies of the UKN1 assay of 87–90%. A comparison to the UKK2 assay (accuracies of 90–92%) showed, in general, a high congruence in compound classification that may be explained by the fact that there was a high overlap of signaling pathways. Finally, the combination of both assays improved the prediction compared to each test alone, and reached accuracies of 92–95%. Although some compounds were misclassified by the individual tests, we conclude that UKN1 and UKK2 can be used for a reliable detection of teratogens in vitro, and that a combined analysis of tests that differentiate hiPSCs into different germ layers and cell types can even further improve the prediction of developmental toxicants.


Introduction
Testing for developmental toxicity in vivo, for example, by two-generation reproduction studies, is cost-intensive and requires large numbers of experimental animals [1,2]. Therefore, a significant advancement in this field would be if developmental toxicity could reliably be identified in vitro. Recently, much effort has been invested into establishing

SBAD2 cells, a human induced pluripotent stem cell line that was originally produced for the StemBANCC project [23], were received from Prof. Marcel Leist (University of Konstanz). The Leibniz-Institute DSMZ (German Collection of Microorganisms and Cell Cultures) validated the cell identity by short tandem repeat profiling.
For the UKN1 test system, cells were cultured in Essential 8™ (E8) medium (Thermo Fisher Scientific Inc., Waltham, MA, USA) on Biolaminin 521 LN (BioLamina, Sweden) coated culture vessels and in the Cellartis® DEF-CS™ 500 Culture System (Takara Bio, Japan), according to the manufacturers' guidelines. The cells were cultured following a three- or four-day protocol, i.e., the cells were seeded at a density of 20,000 cells/cm² or 12,000 cells/cm², respectively, and cultured (5% CO₂, 37 °C) for three or four days until confluency. The medium was changed daily. For dissociation of the cells during each passaging, the dissociation reagent TrypLE™ Select (Thermo Fisher Scientific Inc., Waltham, MA, USA) was used. When cells were passaged in Essential 8™ medium, 10 µM Rho-kinase inhibitor Y-27632 (Cell Guidance Systems, Cambridge, UK) was added to the medium for the first 24 h.

Neuroepithelial Differentiation of hiPSCs and Compound Exposure
The differentiation of SBAD2 hiPSCs to neuroepithelial precursor cells was performed using the UKN1 protocol as published before [24], with minor changes. Briefly, hiPSCs were seeded in 1 mL pluripotent stem cell (PSC) medium (spiked with a Rho-kinase inhibitor (ROCKi)) per well on extracellular matrix protein-coated 12-well plates, at a density of 12,000–24,000 cells/cm² on day −3. On day −2 and day −1, the PSC medium was refreshed. On days 0, 1, and 2, the medium was changed to a differentiation medium, which was spiked with 21.6 µM SB431542, 0.64 µM dorsomorphin, 35 ng/mL noggin, and 0.1% DMSO to induce neural differentiation. At the same time, the cells were incubated (5% CO₂, 37 °C) with the test compounds at 1-fold Cmax and 1.67-, 10-, or 20-fold Cmax concentrations for a total of 96 h, as well as with the vehicle alone (0.1% DMSO). The compounds leflunomide and teriflunomide were tested at a DMSO concentration of 0.5% and compared to a 0.5% DMSO vehicle control. On day 4, the medium was changed to a mixed medium of 75% differentiation medium/25% N2-S and the same concentrations of SB431542, dorsomorphin, and noggin as given above. On day 6, cells were collected for RNA extraction. When no adherent cells were visible upon microscopic inspection, or if the harvested amount of RNA was below 2 µg per well of the 12-well plate, the respective test compound concentration was considered cytotoxic. A more detailed method description is given in the Supplemental Information.
For each non-cytotoxic compound and concentration (further named 'condition') in the UKN1 test, three independent biological replicates were generated. Exceptions to this were as follows: for all 1-fold and 20-fold Cmax samples of ampicillin, ascorbic acid, buspirone, chlorpheniramine, doxylamine, folic acid, magnesium chloride, methicillin, and valproic acid, as well as for all 1-fold Cmax samples of famotidine, isotretinoin, methotrexate, paroxetine, and thalidomide, four biological replicates were available. Further exceptions were levothyroxine and methylmercury at the 20-fold Cmax, where two biological replicates were generated. For UKK2, sample composition was as described in [14]. Briefly, for all non-cytotoxic conditions, three independent biological replicates were generated. Exceptions were as follows: 9-cis-retinoic acid at 20-fold Cmax, where two biological replicates were generated, as well as isotretinoin at 1-fold Cmax and thalidomide at 1-fold and 20-fold Cmax, where six replicates were available in each case.

Affymetrix Microarray Analysis
Total RNA was isolated from sonicated cell lysates with the ExtractMe Total RNA-Kit (Blirt, Gdansk, Poland) according to the manufacturer's instructions. A NanoDrop2000 instrument (Thermo Fisher Scientific Inc., Waltham, MA, USA) was used to assess the concentration and purity of the isolated RNA. Microarray gene expression studies were performed on Affymetrix Human Genome U133 Plus 2.0 arrays (Affymetrix, Santa Clara, CA, USA) as described previously in [14].
In order to avoid batch effects, the samples were normalized with respect to the control samples. For the majority of samples in UKN1, matched control samples were available, such that differences between the expression values for the non-control samples and the corresponding matched controls were calculated. For the samples where this was not possible, a batch-wise mean of the corresponding control samples was calculated and subtracted from the expression values of the non-control samples. This was the case for all samples of entinostat, lithium chloride, methylmercury, sucralose, and trichostatin A of UKN1, which were compared to the mean of three biological replicates of corresponding control samples, and for all samples of UKK2.
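The control-based normalization described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes log2-scale expression values stored as plain lists (one value per probe set), and the function name `normalize_to_controls` and its parameters are hypothetical.

```python
from statistics import mean

def normalize_to_controls(treated, matched_control=None, batch_controls=None):
    """Subtract the control from a treated sample, probe set by probe set.

    Assumption: values are on the log2 scale, so subtraction gives log2 ratios.
    """
    if matched_control is not None:
        reference = matched_control
    else:
        # No matched control: fall back to the batch-wise mean of the controls.
        reference = [mean(values) for values in zip(*batch_controls)]
    return [t - r for t, r in zip(treated, reference)]

treated = [8.0, 5.5, 7.0]
with_matched = normalize_to_controls(treated, matched_control=[7.0, 5.5, 7.5])
with_batch_mean = normalize_to_controls(
    treated,
    batch_controls=[[7.0, 5.0, 7.0], [7.5, 5.5, 7.5], [8.0, 6.0, 8.0]],
)
print(with_matched)
print(with_batch_mean)
```

The second call mirrors the case in which three control replicates of the same batch are averaged before subtraction.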

PCA Plots
The principal component analyses (PCA) were based on the normalized expression values, as described above. The replicates for each condition, i.e., for each compound and each concentration, were summarized per probe set (PS) by calculating the mean value.

Limma Analysis
The R-package limma [29] was used for the calculation of differential expression. This is an empirical Bayes method, where the complete set of all PS was considered for the adjustment of the variance estimates of single PS. The resulting moderated t-test is abbreviated here as 'limma t-test.' Resulting p-values were multiplicity adjusted to control the false discovery rate (FDR) by the Benjamini-Hochberg procedure [30]. The resulting gene list for each compound comprises estimates for fold-change (FC), log2 fold-change, and the p-values of the limma t-test (unadjusted and FDR-adjusted).
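To make the multiplicity adjustment concrete, the Benjamini-Hochberg step can be written out directly. The sketch below is a generic stdlib implementation of the procedure, not the limma code path; the function name `benjamini_hochberg` is chosen here for illustration.

```python
def benjamini_hochberg(p_values):
    """Return FDR-adjusted p-values in the original order of the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print(adjusted)
```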

Classification Based on the Number of Significant Probe Sets (SPS-Procedure)
One classification of the compounds was obtained by using the number of significant probe sets (SPS). A probe set was considered to be an SPS if both the FDR-adjusted p-value from the limma t-test was smaller than 0.05 and the absolute value of the FC was larger than 2. The number of SPS was determined for each condition. The highest number of SPS across all conditions was identified for each test system (UKN1 and UKK2). Cytotoxic conditions were assigned, separately for each test system, this highest number plus five.
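The SPS count and the handling of cytotoxic conditions can be illustrated as follows. This is a sketch under stated assumptions: fold-changes are supplied as log2 values, so "absolute FC larger than 2" is read as |log2 FC| > 1, and the condition names and the helper names `count_sps` and `sps_per_condition` are hypothetical.

```python
import math

def count_sps(adj_p_values, log2_fold_changes, p_cut=0.05, fc_cut=2.0):
    """Count probe sets with FDR-adjusted p < p_cut and more than fc_cut-fold change."""
    log2_cut = math.log2(fc_cut)
    return sum(
        1
        for p, lfc in zip(adj_p_values, log2_fold_changes)
        if p < p_cut and abs(lfc) > log2_cut
    )

def sps_per_condition(results, cytotoxic):
    """results maps condition -> (adjusted p-values, log2 fold-changes)."""
    counts = {cond: count_sps(p, lfc) for cond, (p, lfc) in results.items()}
    # Cytotoxic conditions receive the highest observed count plus five.
    ceiling = max(counts.values()) + 5
    for cond in cytotoxic:
        counts[cond] = ceiling
    return counts

results = {
    "compound_A_1x": ([0.01, 0.20, 0.03], [1.5, 3.0, -0.4]),
    "compound_B_1x": ([0.001, 0.01, 0.04], [2.2, -1.3, 1.1]),
}
counts = sps_per_condition(results, cytotoxic=["compound_C_1x"])
print(counts)
```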
Classification based on these SPS numbers was then conducted as follows: all conditions with the number of SPS higher than a defined threshold were considered to be test-positive, and all conditions with the number of SPS lower than this threshold were considered to be test-negative. This was done test system-wise and concentration-wise, meaning that SPS numbers of substances at the 1-fold Cmax were compared to 1-fold Cmax thresholds, and of substances at the 20-fold Cmax to 20-fold Cmax thresholds, for each test individually. Each threshold was defined as a number of SPS, where the highest accuracy (see below) was achieved for the compound classification. If two or more thresholds led to the same highest accuracy, a threshold was chosen where the highest accuracy and sensitivity were achieved. If two or more thresholds led to the same highest accuracy and sensitivity, a threshold was chosen where the highest accuracy, sensitivity, and specificity were achieved.
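The threshold rule with its two tie-breakers amounts to a lexicographic maximization over (accuracy, sensitivity, specificity). The following sketch illustrates this with invented SPS numbers; it assumes candidate thresholds at the midpoints between distinct observed values, which is a choice made here and not stated in the text.

```python
def confusion(scores, labels, threshold):
    """Return (tp, fn, tn, fp); a score above the threshold is test-positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y)
    tn = sum(1 for s, y in zip(scores, labels) if s <= threshold and not y)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and not y)
    return tp, fn, tn, fp

def candidate_thresholds(scores):
    # Assumption: midpoints between distinct scores, plus one value
    # below the minimum and one above the maximum.
    values = sorted(set(scores))
    mids = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return [values[0] - 1] + mids + [values[-1] + 1]

def best_threshold(scores, labels):
    def quality(threshold):
        tp, fn, tn, fp = confusion(scores, labels, threshold)
        accuracy = (tp + tn) / len(labels)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        # Tuples compare lexicographically: accuracy first, then
        # sensitivity, then specificity.
        return (accuracy, sensitivity, specificity)
    return max(candidate_thresholds(scores), key=quality)

sps = [0, 2, 15, 8, 300, 4318]                    # invented SPS numbers
teratogen = [False, False, False, True, True, True]
threshold = best_threshold(sps, teratogen)
print(threshold)
```

In this toy data set, two thresholds reach the same accuracy, and the one with the higher sensitivity is selected.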
The quality of the classification was assessed by calculating the following measures: sensitivity (true positive rate) was calculated as the number of true positives divided by the sum of true positives and false negatives; specificity (true negative rate) was calculated as the number of true negatives divided by the sum of true negatives and false positives; and accuracy, which was calculated as the proportion of correctly classified conditions. The area under the curve (AUC) was based on the receiver-operator characteristic curve (ROC-curve), where this ROC-curve was calculated as follows: for each possible threshold, the sensitivity and specificity were calculated. The ROC-curve was obtained by plotting all pairs of (1-specificity) and sensitivity against each other. The AUC was determined as the area under this ROC-curve.
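As an illustration of the ROC construction described above, the sketch below computes the ROC points for every possible threshold and integrates them with the trapezoidal rule. The SPS numbers are invented, and the helper names `roc_points` and `auc` are hypothetical.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs for all possible thresholds."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

sps = [0, 2, 15, 8, 300, 4318]                    # invented SPS numbers
teratogen = [False, False, False, True, True, True]
area = auc(roc_points(sps, teratogen))
print(round(area, 4))
```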
For all substances, SPS numbers were available at the 1-fold and 20-fold Cmax, except for leflunomide (LFL), phenytoin (PHE), teriflunomide (TER), and vismodegib (VIS). For these four substances, solubility limits were exceeded at the 20-fold Cmax so that only SPS numbers at the 1-fold Cmax were available. In order to integrate these compounds in the classification at the 20-fold Cmax, the SPS numbers from the 1-fold Cmax were used instead. Implications of this approach are addressed in the discussion.

Classification Based on Penalized Logistic Regression (Top-1000-Procedure)
Based on the normalized gene expression values, a second classification procedure making use of penalized logistic regression was performed. For this, a leave-one-out cross-validation approach was chosen, where in an iteration over the non-cytotoxic compounds, all samples (i.e., all replicates for the 1-fold and 20-fold Cmax) corresponding to one compound were left out of the dataset in the respective iteration. The 1000 PS with the highest variance of the normalized expression values across all samples of the remaining compounds were selected. An L1-regularized logistic regression-based classifier was trained on this dataset and evaluated on the compound that was left out. This yielded a probability for teratogenicity for each replicate at the 1-fold and 20-fold Cmax of the left-out compound. Over all replicates corresponding to the same concentration, the probabilities were summarized via the mean value, resulting in one average probability for the 1-fold Cmax and one for the 20-fold Cmax for each compound. The penalty parameter 'lambda' in the L1-regularized logistic regression was optimized via 10-fold cross-validation in order to minimize the mean cross-validated error. Cytotoxic conditions were assigned a probability of 1.
Using these predicted probabilities for teratogenicity, the classification was conducted as described above for the SPS-procedure, where the predicted probabilities were used instead of the SPS numbers. Quality assessment of the classification and integration of the compounds LFL, PHE, TER, and VIS for the 20-fold Cmax classification were performed as described above as well.
The R-package mlr [31] was used as the framework for the classification tasks, together with the package glmnet [32] for the calculation of the specific classifier.
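The structure of the leave-one-compound-out loop, with variance-based probe set selection repeated inside each iteration, can be sketched as below. This is not the mlr/glmnet implementation: the classifier is passed in as a hypothetical `fit_predict` hook, and `toy_fit_predict` is a deliberately simple stand-in (nearest class mean on the first selected probe set) rather than an L1-regularized logistic regression. All data values are invented.

```python
from statistics import mean, pvariance

def top_variance_features(samples, k):
    """Indices of the k probe sets with the highest variance across samples."""
    n_features = len(samples[0])
    variances = [pvariance([s[j] for s in samples]) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: variances[j], reverse=True)[:k]

def leave_one_compound_out(data, k, fit_predict):
    """data rows: (compound, concentration, is_teratogen, expression_vector).

    fit_predict(train_x, train_y, test_x) must return one probability per
    test sample (hypothetical interface for any probabilistic classifier).
    """
    results = {}
    for held_out in sorted({row[0] for row in data}):
        train = [r for r in data if r[0] != held_out]
        test = [r for r in data if r[0] == held_out]
        # Feature selection uses the training compounds only.
        keep = top_variance_features([r[3] for r in train], k)
        train_x = [[r[3][j] for j in keep] for r in train]
        test_x = [[r[3][j] for j in keep] for r in test]
        probs = fit_predict(train_x, [r[2] for r in train], test_x)
        # Average the replicate probabilities per concentration.
        for conc in sorted({r[1] for r in test}):
            results[(held_out, conc)] = mean(
                p for r, p in zip(test, probs) if r[1] == conc
            )
    return results

def toy_fit_predict(train_x, train_y, test_x):
    # Stand-in classifier, NOT the lasso model used in the study.
    pos = mean(x[0] for x, y in zip(train_x, train_y) if y)
    neg = mean(x[0] for x, y in zip(train_x, train_y) if not y)
    return [1.0 if abs(x[0] - pos) < abs(x[0] - neg) else 0.0 for x in test_x]

data = [
    ("A", "1x", False, [0.0, 0.1]), ("A", "20x", False, [0.1, 0.0]),
    ("B", "1x", False, [0.1, 0.1]), ("B", "20x", False, [0.2, 0.0]),
    ("C", "1x", True, [2.0, 0.1]), ("C", "20x", True, [3.0, 0.0]),
    ("D", "1x", True, [2.5, 0.0]), ("D", "20x", True, [3.5, 0.1]),
]
predictions = leave_one_compound_out(data, k=1, fit_predict=toy_fit_predict)
print(predictions)
```

Selecting the probe sets inside the loop, rather than once on the full data set, keeps the left-out compound from influencing the features used to classify it.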

Combination of the UKN1 and UKK2 Test Systems
The results obtained in the two test systems, UKN1 and UKK2, were combined to classify the conditions in a complementary approach. For both procedures (SPS-procedure and top-1000-procedure), the respective numbers of SPS or predicted probabilities for teratogenicity were compared condition-wise between the two test systems. Cytotoxic conditions were considered as described above for the SPS- and top-1000-procedure. For the combinations 'min' and 'max', the lower and the higher value were used, respectively. For the combination 'mean', the respective mean value of the two SPS numbers or of the two probabilities of the two test systems was calculated. Definition of the thresholds and calculation of the AUC, accuracy, sensitivity, and specificity were conducted as explained above for the SPS- and top-1000-procedure, except for the 'gene only' variant, where in 'mean' and 'max', all conditions were removed that were cytotoxic in at least one test system.
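The three combination rules are simple condition-wise operations, as the following sketch shows. The probabilities and condition names are invented, and the function name `combine` is hypothetical.

```python
def combine(ukn1, ukk2, mode):
    """Combine two condition -> value mappings with 'min', 'max', or 'mean'."""
    rules = {
        "min": min,
        "max": max,
        "mean": lambda a, b: (a + b) / 2,
    }
    rule = rules[mode]
    # Only conditions measured in both test systems can be combined.
    shared = ukn1.keys() & ukk2.keys()
    return {cond: rule(ukn1[cond], ukk2[cond]) for cond in shared}

ukn1 = {"compound_A_20x": 0.25, "compound_B_20x": 1.0}
ukk2 = {"compound_A_20x": 0.75, "compound_B_20x": 0.5}
print(combine(ukn1, ukk2, "min"))
print(combine(ukn1, ukk2, "max"))
print(combine(ukn1, ukk2, "mean"))
```

The combined values can then be passed through the same threshold search as the values of a single test system.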

Venn Diagrams, Top Genes, GO Group Overrepresentation and KEGG Pathway Enrichment
Significant probe sets in the UKN1 and UKK2 tests were used to identify top genes, overrepresented Gene Ontology (GO) groups, and enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. This was conducted a. within the UKN1 test system, and b. in a comparative manner between the UKN1 and UKK2 test systems. For each of a. and b., six analyses were performed, where either all SPS, only upregulated SPS, or only downregulated SPS were considered, at 1-fold or 20-fold Cmax.
At first, Venn diagrams were created to compare sets of SPS that were deregulated by non-teratogens and by teratogens. For all analyses, only such SPS were considered that were deregulated by teratogens; SPS that were exclusively deregulated by non-teratogens were not considered. For b, the SPS were further separated into three subgroups (further named 'gene sets'): the 'overlap' gene set, consisting of all SPS that were deregulated in both test systems, as well as the 'UKN1' and 'UKK2' gene sets, consisting of SPS that were exclusively deregulated in the UKN1 and UKK2 test system, respectively.
For each gene set, ranked top lists of the corresponding PS and genes were determined. As the first level for the ranking, the number of compounds that led to differential expression was determined for each PS. In the 'overlap' set, the compounds were separately counted for the UKN1 and the UKK2 test system and summed up. For the second level of the ranking, the arithmetic mean of the log2 fold-changes across the compounds that led to differential expression was calculated. In the 'overlap' set, the arithmetic mean was calculated on the basis of absolute log2 fold-changes. For the translation of the top PS into top genes, only the highest ranked PS for each gene was considered. Lower ranked PS representing the same gene were removed. Additionally, for the displayed top-10-lists, only PS with the suffixes _at, _a_at, and _s_at were considered.
Overrepresentation analyses were conducted as follows: for each gene set, SPS were assigned to GO groups according to their biological process. It was statistically tested whether more PS in the respective GO groups were differentially expressed than expected at random, using Fisher's exact test. This procedure was conducted in a bottom-up approach ('elim' approach) with respect to the GO group hierarchy. PS that were already contained in a more specific GO group were not considered again in more general groups [33]. For b, significant GO groups for each gene set were determined, where a GO group was called significant if the FDR-adjusted p-value of the 'elim' method was smaller than 0.05, and further analyzed with respect to their appearance in the UKN1 and UKK2 system using Venn diagrams.
Additionally, SPS were assigned to KEGG pathways. Fisher's exact test was used to statistically test whether more PS in the respective pathway were differentially expressed than expected at random.
GO group analyses were conducted using the R package topGO [34], and KEGG pathway analyses were conducted using the R package clusterProfiler [35].
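The overrepresentation test used for GO groups and KEGG pathways is a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The sketch below computes this tail directly; it does not reproduce the 'elim' handling of the GO hierarchy in topGO, and the function name and the numbers are illustrative only.

```python
from math import comb

def enrichment_p_value(hits_in_group, group_size, total_hits, universe):
    """One-sided Fisher's exact test for overrepresentation.

    Probability of seeing at least `hits_in_group` significant probe sets in
    a group of `group_size`, given `total_hits` significant probe sets among
    `universe` probe sets in total.
    """
    upper = min(group_size, total_hits)
    tail = sum(
        comb(group_size, k) * comb(universe - group_size, total_hits - k)
        for k in range(hits_in_group, upper + 1)
    )
    return tail / comb(universe, total_hits)

# Toy numbers: 20 probe sets, 4 of them significant, a group of 5 probe sets
# that contains 3 of the significant ones.
p_value = enrichment_p_value(hits_in_group=3, group_size=5, total_hits=4, universe=20)
print(p_value)
```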

Classification Based on Seven Significantly Deregulated Top Genes in RT-qPCR
For the measurement of gene expression changes in UKN1 with RT-qPCR, the same RNA samples as for the microarray studies were used. For this purpose, RNA was first transcribed into complementary deoxyribonucleic acid (cDNA) with the 'High-Capacity cDNA Reverse Transcription Kit' (Thermo Fisher Scientific Inc., Waltham, MA, USA) following the manufacturer's instructions. Then, the RT-qPCR measurements were performed using QuantiFast SYBR® Green PCR master mix and QuantiTect Primer Assays (Qiagen, Germany) for the genes CTHRC1, LMAN1, PNCK, RBM24, SEMA3C, SLIT2, and ZNF385B in an ABI 7500 real-time PCR system (Thermo Fisher Scientific Inc., Waltham, MA, USA). Expression values were normalized to the housekeeping gene TBP, for which self-designed primers were used (5′→3′; forward: GGGCACCACTCCACTGTATC; reverse: GCAGCAAACCGCTTGGGATTATATTCG; Eurofins, Luxembourg). Fold-changes were obtained by using the 2^(−ΔΔCT) method [36]. Gene expression changes with an absolute fold-change larger than 2 and a p-value smaller than 0.05 were considered to be significant. The level of significance was obtained by application of a two-sided one-sample t-test using the software Excel (Microsoft, USA). Cytotoxic conditions were integrated as follows: for upregulated genes (CTHRC1, SEMA3C, and SLIT2), the ΔΔCT mean value was set to 2 with a p-value of 0.01; for downregulated genes (LMAN1, PNCK, RBM24, and ZNF385B), the ΔΔCT mean value was set to −1.6 with a p-value of 0.01. In order to classify the results based on significant gene expression changes (further called 'SPS-like'), the number of genes where a substance caused a significant deregulation was counted for each condition. If that number was higher than 0, the condition was test-positive; otherwise it was test-negative.
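The fold-change calculation and the 'SPS-like' decision rule can be illustrated with a short sketch. It assumes the standard form of the 2^(−ΔΔCT) method, with TBP as the reference gene, and reads "absolute fold-change larger than 2" as more than 2-fold in either direction (|log2 FC| > 1). The CT values, p-values, and function names are invented for illustration.

```python
from math import log2

def fold_change(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Fold-change by the 2^(-ddCT) method, normalized to a reference gene."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_treated - delta_control)

def is_test_positive(gene_results, fc_cut=2.0, p_cut=0.05):
    """gene_results: list of (fold_change, p_value), one entry per measured gene.

    A condition is test-positive if at least one gene is significantly
    deregulated by more than fc_cut-fold in either direction.
    """
    significant = [
        1 for fc, p in gene_results if abs(log2(fc)) > log2(fc_cut) and p < p_cut
    ]
    return len(significant) > 0

fc = fold_change(ct_target_treated=24.0, ct_ref_treated=20.0,
                 ct_target_control=26.0, ct_ref_control=20.0)
print(fc)
print(is_test_positive([(fc, 0.01), (1.1, 0.40)]))
print(is_test_positive([(1.5, 0.01), (0.3, 0.20)]))
```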
Furthermore, a penalized logistic regression was performed as described above for the top-1000-procedure in order to classify the results, where in each iteration of the leave-one-out procedure, all seven measured genes were considered (further named 'Top-1000-like'). The identification of false and true negatives and positives, as well as the calculation of the AUC, accuracy, sensitivity, and specificity, was done as explained above for the SPS- and top-1000-procedure.
For all substances, 3–4 biological replicates were considered, except for acitretin, entinostat, lithium chloride, and trichostatin A at 1-fold Cmax, as well as dextromethorphan at 20-fold Cmax, where only two biological replicates were analyzed. Since no biological replicates were available for classification of levothyroxine at 20-fold Cmax, the same values as for 1-fold Cmax were considered instead.
a U.S. Food and Drug Administration (FDA) and Australian Therapeutic Goods Administration (TGA) pregnancy categories: A = compounds are safe to use during pregnancy, proven by well-controlled studies in humans or abundant data from pregnant women; B = compounds are considered to be safe, but they lack sufficient human data; C and D = compounds showed little or some evidence of teratogenicity in humans or animals; X = compounds with known teratogenic activity in humans or with a suspected high teratogenic potential based on animal experiments; n/a = not available.
b Maximal plasma or blood concentration after administration of therapeutic compound dose.
c Carbamazepine and VPA were tested at 10-fold and 1.67-fold Cmax, respectively, instead of 20-fold Cmax; leflunomide, phenytoin, teriflunomide, and vismodegib were only tested at 1-fold Cmax due to limited solubility.
d Retinol was considered as a non-teratogen at 1-fold Cmax and as a teratogen at 20-fold Cmax. Rationale is given in the supplemental information.

Gene Expression Profiling
Two concentrations (1-fold Cmax and 20-fold Cmax) of 23 teratogenic and 16 non-teratogenic compounds (Table 1) were analyzed using the hiPSC-based UKN1 test (Figure 1), where hiPSCs were differentiated to neuroepithelial precursor cells while being exposed to test compounds with the aim of detecting developmental toxicity. The selected concentrations corresponded to 1-fold and 20-fold Cmax reported in human blood after therapeutic doses (Table 1). Substance-induced gene expression changes were detected by microarrays after an incubation period of 4 days with the test compounds, followed by a washout period (without test compounds) of 2 days (Figure 1).

Figure 1. Schematic representation of the UKN1 test (modified from [20,37]). The overview scheme depicts the differentiation protocol, important experimental steps, and the principle of the toxicity assay. In the pluripotency phase (day −3 to 0), hiPSCs were cultured in a pluripotent stem cell (PSC) medium to maintain their pluripotent state. Factors that inhibited Rho-kinase (ROCKi) were additionally given on the day of seeding (day −3) to support the survival of hiPSCs seeded as single cells on extracellular matrix proteins. From day 0 onwards, the change to a differentiation medium spiked with SB431542, dorsomorphin, and noggin initiated neuroectodermal differentiation of the cells. Simultaneously, cells were exposed to test compounds for a total of 96 h. On day 4, substances were withdrawn and addition of 25% N2-S further enhanced the neural differentiation process. On day 6, compound-induced cytotoxicity was determined and the cells were harvested for gene array analysis. Media changes were conducted as indicated (double arrows) on days −2, −1, 0, 1, 2, and 4.
In a principal component analysis (PCA) considering all 54,675 analyzed probe sets (Figure 2A), as well as the 100 probe sets with the highest variance (Figure 2B), all non-teratogens except for ascorbic acid (ASC, abbreviations defined in Table 1) at the 20-fold Cmax formed a narrow cluster, which was intermixed with 11 of 25 non-cytotoxic teratogenic conditions, such as phenytoin (PHE) or methylmercury (MEM), which deregulated either none or only a small number of probe sets (Table 2). The high percentage of explained variance by PC1 and PC2 (82.24%) in the top-100-PCA suggests that only a small subset of genes is sufficient to identify teratogens that cause major gene expression changes.
Table 2. Cytotoxicity and number of significantly deregulated probe sets in compound-exposed cells in the UKN1 test.
Yes, if the compound was highly cytotoxic; no, if the compound showed no cytotoxicity; c Gene array probe sets that were deregulated with an FDR-adjusted p-value < 0.05 and an absolute fold-change > 2 compared to untreated control cells; d Carbamazepine and VPA were tested at 10-fold and 1.67-fold Cmax, respectively, instead of 20-fold Cmax; leflunomide, phenytoin, teriflunomide, and vismodegib were only tested at 1-fold Cmax due to limited solubility; e Retinol was considered as a non-teratogen at 1-fold Cmax and as a teratogen at 20-fold Cmax. Rationale is given in the supplemental information.

Genome-wide expression changes were illustrated in volcano plots for a representative set of non-teratogenic and teratogenic test compounds (Figure 3). Plots of all compounds and concentrations (further named 'conditions') are available in the Supplemental Information. In subsequent analyses, all probe sets that were at least 2-fold deregulated and statistically significant with a false discovery rate (FDR) adjusted p-value < 0.05 were identified. In general, a large number of significantly deregulated probe sets (SPS) was obtained for many teratogens, whereas none or only a few were observed for the non-teratogens (Table 2). Raw data are available in the Gene Expression Omnibus (GEO) database under the accession number GSE209962.
Figure 3. Volcano plots of deregulated probe sets of selected test compounds in the UKN1 test. Volcano plots show genome-wide gene expression changes in substance-exposed SBAD2 cells for a representative subset of known teratogens and non-teratogens at therapeutic 1-fold Cmax concentrations. Each dot represents one out of 54,675 probe sets from the Affymetrix gene chips. The fold-change of the differentially expressed probe sets in substance-exposed cells is given on the x-axis in log2 values, and the corresponding p-values of the limma analyses are given on the y-axis in negative log10 values. Red dots represent probe sets with a statistically significant, FDR-adjusted p-value < 0.05 and an absolute fold-change > 2. The numbers of up- and downregulated red-dot probe sets are indicated.

Gene Expression-Based Classification to Identify Teratogens and Non-Teratogens by the UKN1 Test
We used two techniques to classify the test compounds analyzed by the UKN1 test: (i) the 'SPS-procedure' that is exclusively based on the number of SPS, and (ii) the 'top-1000-procedure', a penalized logistic regression technique based on the 1000 probe sets with the highest variance and leave-one-out cross-validation (Figure 4A,D), as previously described [14]. A compound was classified as test-positive or test-negative if the test result, i.e., the SPS number in the SPS-procedure and the predicted probability for teratogenicity in the top-1000-procedure, was above or below a defined threshold, respectively. Moreover, cytotoxicity was included by considering the in vitro result as test-positive when the tested concentration was cytotoxic. Both procedures identified most teratogens as test-positive and most of the non-teratogens as test-negative, resulting in a high rate of true positives and true negatives (Supplementary Table S1).
The top-1000-procedure was consistently more sensitive than the SPS-procedure, i.e., it classified more teratogenic compounds as true positives, and it also obtained higher AUC values (Table 3). At the 20-fold Cmax concentration, it reached higher values for the AUC (0.95) and sensitivity (0.92) than the SPS-procedure (0.90 and 0.83, respectively). However, the non-teratogens ASC, diphenhydramine (DPH), and sucralose (SUC) were misclassified, as well as the teratogens thalidomide (THD) and vismodegib (VIS) (Table 4). In contrast, the SPS-procedure was consistently more specific than the top-1000-procedure (Table 3). At 20-fold Cmax, all non-teratogens were correctly classified, but four of the teratogens (MEM, PHE, THD, and VIS) were not identified as test-positives (Table 4). A comprehensive overview of the classification results (true/false positive; true/false negative) and the predicted probabilities of all compounds in all tests can be found in Supplementary Tables S1 and S2, respectively.
1-fold Cmax conditions are indicated with black dots, 20-fold Cmax conditions with black triangles. Grey dots represent the numbers of SPS at 1-fold Cmax of LFL, PHE, TER, and VIS, which were compared to 20-fold Cmax thresholds. For the UKK2 test, SPS numbers were adapted from Cherianidou et al. 2022, but retinol at 20-fold Cmax was considered as a teratogen. SPS numbers above or below the thresholds T1× and T20× (red dashed lines) for 1-fold and 20-fold Cmax conditions, respectively, were considered to be test-positive or test-negative. Cytotoxic conditions were considered to be test-positive and were assigned a high number of SPS (UKN1: 4318; UKK2: 4257).
Cytotoxicity = only cytotoxicity data were considered for the calculation of the metrics, i.e., cytotoxic conditions were considered as positive and non-cytotoxic conditions as negative test results. Gene expression = only gene expression data were considered for the calculation of the metrics. Cytotoxicity and gene expression = all data for cytotoxicity, as well as for gene expression, were considered for the calculation of the metrics. AUC (area under the curve) = for each possible cut-off used as threshold, predictions were made for each of the conditions, based on which sensitivity and specificity were calculated. The ROC curve (receiver operating characteristic) was obtained by plotting all pairs of (1 − specificity) and sensitivity against each other. The AUC was determined as the area under this ROC curve. Accuracy = ratio of correct predictions ((true negatives and positives)/(true and false negatives and positives)) (Supplementary Table S1). Sensitivity = ratio of detected teratogens (true positives/(false negatives + true positives)) (Supplementary Table S1). Specificity = ratio of detected non-teratogens (true negatives/(true negatives + false positives)) (Supplementary Table S1). a Including 10-fold Cmax carbamazepine, 1.67-fold Cmax VPA, and 1-fold Cmax samples of leflunomide, phenytoin, teriflunomide, and vismodegib. RT-qPCR metrics used 1-fold Cmax results instead of 20-fold Cmax results for levothyroxine. Bold = best metrics for each test for the SPS- and top-1000-procedure and RT-qPCR.
Table 4. [...]

Non-teratogens
Ampicillin            AMP  TN    TN    TN    TN    TN    TN    TN    TN
Ascorbic acid         ASC  TN    TN    TN    FP    TN    TN    TN    FP
Buspirone             BSP  TN    TN    TN    TN    TN    TN    TN    TN
Chlorpheniramine      CPA  TN    TN    TN    TN    TN    TN    TN    TN
Dextromethorphan      DEX  TN    TN    TN    TN    TN    TN    TN    TN
Diphenhydramine       DPH  TN    TN    TN    FP    FP    FP    TN    TN
Doxylamine            DOA  TN    TN    TN    TN    TN    TN    TN    TN
Famotidine            FAM  TN    TN    TN    TN    TN    TN    TN    TN
Folic acid            FOA  TN    TN    TN    TN    TN    TN    TN    TN
Levothyroxine         LEV  TN    TN    TN    TN    TN    TN    TN e  TN e
Liothyronine          LIO  TN    TN    TN    TN    TN    TN    TN    FP
Magnesium chloride    MAG  TN    TN    TN    TN    TN    TN    TN    TN
Methicillin           MET  TN    TN    TN    TN    TN    TN    TN    TN
Ranitidine            RAN  TN    TN    TN    TN    TN    TN    TN    TN
Retinol               [...]
Sucralose             SUC  TN    TN    TN    FP    FP    FP    TN    TN

Teratogens
9-cis-retinoic acid   9RA  TP    TP    TP    TP    TP    TP    TP    TP
Acitretin             ACI  TP    TP    TP    TP    TP    TP    TP    TP
Actinomycin D         ACD  TP    TP    TP    TP    TP    TP    TP    TP
Atorvastatin          ATO  TP    FN    TP    TP    FN    TP    TP    TP
Carbamazepine         CMZ  TP b  TP    TP b  TP b  TP    TP b  TP b  TP b
Doxorubicin           DXR  TP    TP    TP    TP    TP    TP    TP    TP
Entinostat            ENT  TP    TP    TP    TP    TP    TP    TP    TP
Favipiravir           FPV  TP    FN    TP    TP    TP    TP    TP    TP
Isotretinoin          ISO  TP    TP    TP    TP    TP    TP    TP    TP
Leflunomide           LFL  TP c  TP    TP c  TP c  TP    TP c  TP c  TP c
Lithium chloride      LTH  TP    TP    TP    TP    TP    TP    TP    TP
Methotrexate          MTX  TP    TP    TP    TP    TP    TP    TP    TP
Methylmercury         MEM  FN    TP    FN    TP    TP    TP    FN    FN
Panobinostat          PAN  TP    TP    TP    TP    TP    TP    TP    TP
Paroxetine            PAX  TP    TP    TP    TP    TP    TP    TP    [...]
Teriflunomide         TER  TP c  TP    TP c  TP c  TP    TP c  TP c  TP c
Thalidomide           THD  FN    TP    TP    FN    TP    TP    FN    TP
Trichostatin A        TSA  TP    TP    TP    TP    TP    TP    TP    TP
Valproic acid         VPA  TP b  TP    TP b  TP b  TP    TP b  TP b  TP b
Vinblastine           VIN  TP    TP    TP    TP    TP    TP    TP    TP
Vismodegib            VIS  FN c  FN    FN c  FN c  TP    TP c  FN c  FN c
Vorinostat            VST  TP    TP    TP    TP    TP    TP    TP    TP

[...] b Carbamazepine and VPA were tested at 10-fold and 1.67-fold C max , respectively, instead of 20-fold C max .
c Due to the limited solubility of LFL, PHE, TER, and VIS, the highest tested concentration was 1-fold C max ; here, SPS numbers and predicted probabilities for teratogenicity obtained at 1-fold C max were used to classify the compounds compared to the 20-fold C max threshold. See methods and discussion for further details.
d Retinol was considered as a non-teratogen at 1-fold C max and as a teratogen at 20-fold C max . Rationale is given in the supplemental information.
e RT-qPCR measurements were not available for levothyroxine at 20-fold C max ; instead, the RT-qPCR results of levothyroxine at 1-fold C max were used to classify levothyroxine at 20-fold C max .
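The definitions of accuracy, sensitivity, specificity, and AUC given above can be made concrete with a short Python sketch. The function names are hypothetical, and the ROC construction (a condition counts as positive when its score is at or above the cut-off, with trapezoidal integration) is one common convention rather than a confirmed detail of the study. The worked counts assume 24 teratogenic and 15 non-teratogenic conditions at 20-fold C max (retinol counted as a teratogen), which is our reading of the text.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


def roc_auc(scores, labels):
    """Area under the ROC curve, sweeping every observed score as a cut-off."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cut-off above the maximum score: nothing is positive
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and not y)
        points.append((fp / n_neg, tp / n_pos))  # (1 - specificity, sensitivity)
    # Trapezoidal rule over consecutive ROC points.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# SPS-procedure in UKN1 at 20-fold C max: four false negatives, no false positives.
acc, sens, spec = confusion_metrics(tp=20, fn=4, tn=15, fp=0)
print(round(acc, 2), round(sens, 2), round(spec, 2))

# Hypothetical scores to exercise the AUC function.
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

Under these assumed counts, the rounded values agree with the 0.90, 0.83, and 1.0 reported for the SPS-procedure.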

Biological Interpretation of Genes Differentially Expressed in the UKN1 Test
Analysis of the significantly altered probe sets at 20-fold C max showed that a total of 7647 (7552 + 95) different probe sets (PS) were significantly influenced by the 23 teratogens, while the 16 non-teratogenic compounds only altered the expression of 100 PS (Figure 5A). Among the genes altered by most individual teratogens (Figure 5B) were SEMA3C, a member of the class of semaphorins that function as axonal growth guidance molecules [38,39]; MIAT, a long non-coding RNA that is expressed in neurons and plays a role in retinal development [40,41]; and carboxypeptidase E (CPE), which is involved in the biosynthesis of neuropeptides [42]. KEGG pathway analysis and the analysis of GO groups resulted in 'axon guidance' and 'neuron migration' as the top overrepresented motifs (Figure 5C,D), which correspond to the intended neuroepithelial differentiation in UKN1. Moreover, genes involved in several pathways relevant in developmental processes were overrepresented, such as PI3K-Akt, MAPK, P53, and EGFR signaling (Figure 5C). Similar top genes, KEGG pathways, and GO groups were obtained when probe sets at the 1-fold C max , or only up- or downregulated probe sets at the 20-fold C max , were analyzed (Supplementary Figures S1-S5).
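As an illustration of how significant probe sets (SPS) can be counted, the following Python sketch applies the criterion stated in the Figure 5 legend (log2 fold change > 1; adjusted p-value < 0.05). Reading the fold-change cut-off as an absolute value, so that up- and downregulated probe sets both count, is an assumption; the function name and the input values are hypothetical.

```python
def count_sps(probe_sets, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Count probe sets with |log2 fold change| > cutoff and adjusted p < cutoff.

    probe_sets: iterable of (log2_fold_change, adjusted_p_value) pairs.
    """
    return sum(
        1
        for log2_fc, padj in probe_sets
        if abs(log2_fc) > lfc_cutoff and padj < padj_cutoff
    )


# Hypothetical probe-set results for one compound.
example = [
    (1.5, 0.01),    # upregulated and significant
    (-2.0, 0.001),  # downregulated and significant
    (0.5, 0.001),   # significant p-value, but fold change too small
    (3.0, 0.2),     # large fold change, but not significant
    (-1.0, 0.01),   # exactly at the cut-off, not counted
]
print(count_sps(example))
```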

Figure 5. Biological interpretation of genes differentially expressed after exposure of hiPSC to teratogens at 20-fold C max . (A) Number of significant probe sets (log2 fold change > 1; adjusted p-value < 0.05) induced by non-teratogens and teratogens at the 20-fold C max (including also 10-fold C max carbamazepine and 1.67-fold C max VPA). (B) Top-10 genes from the 7647 SPS deregulated by teratogens. The number in the bar indicates the number of compounds that deregulated the specific gene. The absolute mean log2 fold-change of each gene is given on the x-axis. A comprehensive gene list is given in the Supplementary Excel-file 1. (C) KEGG pathway enrichment analysis of the 7647 SPS deregulated by teratogens. The ten KEGG pathways with the lowest adj. p-values are given. Full names and complete KEGG pathway lists are given in the Supplementary Excel-file 2. 'Count:' number of significant genes from A linked to the KEGG pathway. 'Gene Ratio:' percentage of significant genes associated with the pathway compared to the number of all significant genes associated with any pathway. (D) The ten GO groups with the lowest adj. p-values from all significantly (adj. p-value < 0.05) overrepresented GO groups in the 7647 SPS deregulated by teratogens. The names of the GO groups have been shortened. Full names and complete GO group lists can be found in the Supplementary Excel-file 3. 'Count:' number of significant genes from A linked to the GO group. 'Hits:' percentage of significant genes compared to all genes assigned to the GO group.

Comparing the UKN1 Test with UKK2 for the Classification of Developmental Toxicity
We next compared the performance of the UKN1 test system described here with the previously published UKK2-based test [14], after reclassifying retinol as a teratogen at 20-fold C max (Figure 4B,E) based on the rationale given in the Supplemental Information. From this comparison, the following conclusions could be drawn: (i) the use of gene expression and cytotoxicity data together led to a higher test performance in both tests than the use of gene expression or cytotoxicity data alone (Table 3); (ii) the SPS-procedure was more specific than the top-1000-procedure, but the teratogens PHE and VIS were consistently misclassified as false-negatives; (iii) the top-1000-procedure was more sensitive than the SPS-procedure, but consistently misclassified the non-teratogens DPH and SUC as false-positives; (iv) the UKN1 test performed better at 20-fold C max ; and (v) the UKK2 test performed better at 1-fold C max .
Overall, UKN1 and UKK2 showed a high congruency in their classifications of the tested compounds. When the tests were compared to each other at their optimal 'working concentration,' that is, UKN1 at 20-fold C max and UKK2 at 1-fold C max , 34 of the 38 substances (89%, retinol not considered) were identically classified in the SPS-procedure, as well as in the top-1000-procedure (Table 4). Among the exceptions were atorvastatin (ATO), which was a true-positive in UKN1 but a false-negative in UKK2, and thalidomide, which was a true-positive in UKK2 but a false-negative in UKN1. Taking the SPS- and the top-1000-procedure together, different classifications by the two tests were observed for ASC, ATO, favipiravir (FPV), MEM, THD, and VIS.

Overlap of Teratogen-Induced Expression Patterns in UKN1 and UKK2
That similar classification results were obtained for the UKN1 and UKK2 tests may appear surprising, since the protocols recapitulate different biological processes: differentiation to NEPs in UKN1 versus myoblast development in UKK2. To gain more insight into the genes and pathways involved, we compared the gene expression changes in both cell systems. A relatively large overlap of teratogen-induced gene expression changes was obtained; nevertheless, the number of genes altered exclusively in either the UKN1 or the UKK2 test was higher than the number altered in both tests, for example, 4013 and 4885 genes, respectively, versus 3634 genes for all probe sets at 20-fold C max (Figure 6A). An analysis of the numbers of significantly overrepresented GO groups in the gene sets of the overlap, UKN1 (only), and UKK2 (only) demonstrated that 83% of all significantly overrepresented GO groups were obtained from the overlap (Figure 6B). The most significant KEGG pathways and GO groups of the UKN1 (only) gene set included 'axon guidance' and 'neuronal crest migration' (Figure 6C,D), in agreement with the above-reported motifs in the complete set of genes altered in UKN1 (Figure 5C,D). The UKK2 (only) gene set resulted in overrepresentation of 'myoblast fate commitment', in agreement with the purpose of this protocol, but unexpectedly also resulted in enriched cancer motifs, such as 'hepatocellular cancer,' 'breast cancer,' and 'pancreatic cancer' (Figure 6C,D). A conspicuous feature of the overlap gene set was the overrepresentation of signaling pathways known to be relevant in embryonic development and carcinogenesis, such as PI3K-Akt, P53, TGF-beta, EGFR, and Hippo (Figure 6C), similar to the pathways obtained for UKN1 (Figure 5C).
The overlap of the top probe sets of UKN1 and UKK2 included genes that play a role in both neural crest and cardiac development, such as MEIS2 and the helix-span-helix transcription factor TFAP2A (AP-2α) (Machon et al., 2015; Brewer et al., 2002) (Figure 6E). In conclusion, while UKN1 and UKK2 only overlapped by 27% at the level of significant probe sets, there was a more than 80% overlap when the biological motifs, such as GO groups, were considered. Similar results were obtained when probe sets at the 1-fold C max were investigated (Supplemental Figures S6-S8), or when only up- or downregulated probe sets at the 20-fold C max were analyzed (Supplemental Figures S9 and S10).

Combining UKN1 and UKK2 Improves the Classification Performance
Using the SPS numbers and the predicted probabilities of UKN1 and UKK2, we finally investigated whether combining both tests could further improve the outcome of the classification. To this end, three different combinations were created in which, for each condition, either the lower ('min') or the higher ('max') of the two test values was used, or the arithmetic mean of both tests was calculated ('mean'). Of these combinations, the 'mean' value (Figure 4C,F) improved the outcome of both the top-1000- and the SPS-procedure (Table 3) compared to each test alone. Furthermore, the top-1000-procedure at 20-fold C max classified all compounds except DPH and SUC correctly (Table 4), thus improving the AUC, accuracy, and specificity to 0.98, 0.95, and 0.87, respectively, while reaching a sensitivity of 1.0 (Table 3). In addition, the SPS-procedure led to an AUC, accuracy, sensitivity, and specificity of 0.91, 0.92, 0.88, and 1.0, respectively, at 20-fold C max , with only three misclassified compounds, namely MEM, PHE, and VIS, as false-negatives (Table 4). The combination 'max' (Supplementary Figure S11B,D) improved the classification with the SPS-procedure, but not with the top-1000-procedure (Supplementary Table S3).
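The three combination rules can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the compound labels, and the score values are hypothetical, and restricting the combination to conditions present in both tests is an assumption.

```python
from statistics import mean

# Map each combination name to the function applied to the two test values.
COMBINERS = {"min": min, "max": max, "mean": mean}


def combine_tests(ukn1_scores, ukk2_scores, how):
    """Combine per-condition scores of two tests with 'min', 'max', or 'mean'."""
    combiner = COMBINERS[how]
    shared = sorted(ukn1_scores.keys() & ukk2_scores.keys())
    return {cond: combiner([ukn1_scores[cond], ukk2_scores[cond]]) for cond in shared}


# Hypothetical predicted probabilities for two conditions.
ukn1 = {"cpd_A": 0.25, "cpd_B": 1.0}
ukk2 = {"cpd_A": 0.75, "cpd_B": 0.5}

for how in ("min", "max", "mean"):
    print(how, combine_tests(ukn1, ukk2, how))
```

The combined values can then be compared against a threshold in the same way as the values of a single test.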

Top Genes-Based Classification in UKN1 by RT-qPCR
Seven so-called 'top genes' were identified based on the definition that they were altered by the highest numbers of teratogens: CTHRC1, LMAN1, PNCK, RBM24, SEMA3C, SLIT2, and ZNF385B. We tested whether these genes could be used for compound classification in a simplified approach that utilized RT-qPCR instead of gene arrays. The RT-qPCR data correlated highly with the gene array analysis (r = 0.97; Figure S12). Significant changes (absolute fold-change > 2, p-value < 0.05) were only observed for the teratogenic substances. By considering each condition that showed at least one significantly deregulated top gene or that was cytotoxic as test-positive (SPS-like), an accuracy, sensitivity, and specificity of 0.90, 0.83, and 1.0, respectively, could be reached at 20-fold C max (Table 3). Interestingly, the result of the classification based on the seven top genes was identical to the classification obtained by the SPS-procedure (Table 4). The top-1000-like classification using logistic regression and leave-one-out cross-validation (Supplementary Figure S13) led to a lower accuracy and specificity of 0.87 each, but a higher sensitivity of 0.88 (Table 3). Further information is available as Supplemental Information: classification of all conditions (Supplementary Table S1), predicted probabilities for teratogenicity (Supplementary Table S2), gene expression diagrams (Supplemental Figures S14-S20), and data of all seven genes (Supplementary Excel-File 7).
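The SPS-like rule for the seven top genes can be sketched as follows. This is a hedged illustration, not the authors' code: the function names and measurement values are hypothetical, and 'absolute fold-change > 2' is interpreted here as a more than two-fold change in either direction, i.e., |log2(fold change)| > 1.

```python
import math

TOP_GENES = ("CTHRC1", "LMAN1", "PNCK", "RBM24", "SEMA3C", "SLIT2", "ZNF385B")


def is_deregulated(fold_change, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """True if the gene changed more than fc_cutoff-fold (up or down) with p < p_cutoff."""
    return abs(math.log2(fold_change)) > math.log2(fc_cutoff) and p_value < p_cutoff


def top_gene_call(measurements, cytotoxic):
    """Test-positive if cytotoxic or if at least one top gene is significantly deregulated.

    measurements: dict mapping gene name to (fold_change, p_value).
    """
    if cytotoxic:
        return True
    return any(
        is_deregulated(*measurements[gene])
        for gene in TOP_GENES
        if gene in measurements
    )


# Hypothetical RT-qPCR results (fold change relative to control, p-value).
unchanged = {gene: (1.1, 0.5) for gene in TOP_GENES}
one_hit = dict(unchanged, SEMA3C=(0.3, 0.01))       # strong, significant downregulation
not_significant = dict(unchanged, SLIT2=(2.5, 0.2)) # large change, but p >= 0.05

print(top_gene_call(unchanged, cytotoxic=False))
print(top_gene_call(one_hit, cytotoxic=False))
print(top_gene_call(not_significant, cytotoxic=False))
print(top_gene_call(unchanged, cytotoxic=True))
```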

Discussion
The identification of teratogenic substances that affect embryonic development and lead to congenital malformations in newborns remains an important task in toxicity testing. However, conventional in vivo tests are expensive, and the number of required experimental animals is high. As a result, alternative test strategies, such as stem cell-based in vitro tests, are urgently needed [1-3]. In the current study, we used the UKN1 test, an approach based on hiPSCs differentiating to NEPs, to identify developmental toxicants in vitro. Using transcriptomics, we analyzed the effect of 23 teratogenic and 16 non-teratogenic compounds at concentrations of 1-fold and 20-fold C max , and classified the results by using either the number of significantly deregulated probe sets (SPS-procedure) or a penalized logistic regression procedure based on the 1000 probe sets with the highest variance (top-1000-procedure). Together with cytotoxicity data, the SPS-procedure at the 20-fold C max was able to classify the teratogens with an AUC, accuracy, sensitivity, and specificity of 0.90, 0.90, 0.83, and 1.0, respectively. Alternatively, a higher sensitivity but lower specificity was obtained for the top-1000-procedure at the 20-fold C max with the AUC, accuracy, sensitivity, and specificity at 0.95, 0.87, 0.92, and 0.80, respectively.
Compared to the previously published UKK2 test, which used the same set of compounds and techniques to analyze and classify compound-induced effects on gene expression, but employed a cardiomyogenic rather than a neuronal differentiation process [14], the classification outcome was surprisingly similar and overlapped for approximately 90% of the analyzed compounds. In addition, the efficiency of UKK2 in detecting teratogens was very similar to that of UKN1, even though UKK2 performed best at 1-fold C max and not at 20-fold C max like UKN1. These findings led to the question of whether a combination of both tests could further improve the classification metrics. Indeed, when the results at 20-fold C max from both tests were combined by their arithmetic mean, the AUC, accuracy, sensitivity, and specificity of the SPS-procedure slightly improved to 0.91, 0.92, 0.88, and 1.0, respectively, and to 0.98, 0.95, 1.0, and 0.87 for the top-1000-procedure.
Although both tests classified most of the compounds correctly and could clearly determine if a compound and concentration influenced gene expression, some limitations should be considered. First, the information on cytotoxicity was required to obtain the best classification performance, as some teratogens were cytotoxic at the tested 1-fold or 20-fold C max , especially in UKN1. This observation was unexpected, since, when designing the present study, we did not expect cytotoxicity so close to the therapeutic C max . Thus, a cytotoxicity assay was not included, but the cytotoxicity information was derived from the observation of cell detachment from the culture dishes and the corresponding lack of a sufficient amount of RNA for gene array measurement. In the future, studies should integrate quantitative cytotoxicity analysis and consider this information for the classification. Moreover, a concentration-dependent gene expression analysis should be performed instead of restricting the analysis to the 1-fold and 20-fold C max chosen here. In combination with cell viability assays, such an approach would directly link gene expression and cell viability and enable the precise discrimination between teratogenicity- and cytotoxicity-related gene expression alterations.
The second limitation is the consistent misclassification of some compounds. The SPS-procedure was unable to identify PHE and VIS as teratogens, whereas the top-1000-procedure misclassified the non-teratogens DPH and SUC as teratogens. Although DPH-induced toxicity is documented [43], and some effects were also reported for SUC in mesenchymal stromal cells and mice [44-46], the positive test results point to a shortcoming of the top-1000-procedure rather than to an actual adverse effect. Nevertheless, the top-1000-procedure was cross-validated in contrast to the SPS-procedure, thus avoiding the problem of overfitting the data.
For the comparison of the accuracy (and further performance metrics) at the 1-fold and the 20-fold C max , the challenge had to be addressed that four compounds (LFL, PHE, TER, and VIS) exceeded their solubility limits at the 20-fold C max . To nevertheless allow a comparison, the SPS numbers and predicted probabilities obtained for the 1-fold C max were also used for the classification at the (not testable) 20-fold C max for these four compounds. This procedure of also using the 1-fold C max results for calculations (e.g., of accuracy) at the 20-fold C max was chosen in order to prevent these compounds from influencing the comparison, since their classifications were identical at the 1-fold and 20-fold C max in all cases (except for VIS in UKN1 for the top-1000-procedure). The difference in accuracy between the two concentrations must therefore be due to the other compounds, which could all be tested at both 1- and 20-fold C max . In principle, an alternative approach would have been to calculate the classifiers without the compounds LFL, PHE, TER, and VIS. This approach was not chosen because two of these four compounds, PHE and VIS, led to false-negative classifications in the SPS-procedure; classifier construction without PHE and VIS may, therefore, have resulted in overoptimistic performance metrics.
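The substitution rule described above amounts to a lookup with a fallback, as in the following minimal Python sketch. The function name and the score values are hypothetical; `None` is used here as an assumed marker for a concentration that could not be tested.

```python
def score_for_20x(compound, scores_1x, scores_20x):
    """Return the 20-fold C max score, falling back to the 1-fold C max score
    when the 20-fold C max could not be tested (marked as None or missing)."""
    value = scores_20x.get(compound)
    return scores_1x[compound] if value is None else value


# Hypothetical SPS numbers; VIS stands for a solubility-limited compound.
scores_1x = {"VIS": 3, "ISO": 120}
scores_20x = {"VIS": None, "ISO": 900}

print(score_for_20x("VIS", scores_1x, scores_20x))  # falls back to the 1-fold value
print(score_for_20x("ISO", scores_1x, scores_20x))  # uses the measured 20-fold value
```

The returned value is then compared to the 20-fold C max threshold, as described for these four compounds.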
Another limitation of the present study is that the influence of the different exposure schedules on gene expression and cytotoxicity has not yet been systematically evaluated. In the UKK2 test, a 24 h incubation period with test substances was used and gene expression was analyzed immediately afterwards. In contrast, a four-day incubation period followed by a two-day compound-free washout period was applied for the UKN1 test. The intention of the washout period was to allow the recovery of the cells from reversible compound-induced expression changes while only retaining irreversible expression changes, for example, due to the differentiation to aberrant cell types. These differences in the protocols could be the reason why, for example, the teratogens ATO and THD were differently classified by UKN1 and UKK2. ATO only had small effects on gene expression and was not cytotoxic in the UKK2 test, resulting in a false-negative classification, whereas, possibly due to the longer incubation period, ATO was cytotoxic in the UKN1 test, resulting in a true-positive classification. In contrast, UKN1 was not able to clearly identify the well-known teratogen THD, a limitation that requires further investigation.
Analysis of KEGG pathways and GO groups of individual probe sets that were significantly influenced by the teratogens in the UKN1 test demonstrated that genes involved in axon guidance, neuron migration, or anterior/posterior specification were overrepresented. These findings suggest that the differentiation process of stem cells to NEPs may be compromised by the test substances. The same set of test compounds analyzed in the present work was also previously analyzed in the UKK2 system [14] that recapitulates the differentiation of stem cells to myoblasts, thereby offering the possibility to compare both differentiation protocols. Both UKN1 and UKK2 showed a large overlap of overrepresented GO groups, e.g., 83% for all probe sets at 20-fold C max , although the overlap of significant probe sets was smaller (27%). Moreover, UKN1 and UKK2 overlapped for a substantial number of signaling pathways critical for developmental processes, such as PI3K-Akt, P53, TGF-beta, MAPK, EGFR, and Hippo, which are influenced by the teratogens in both tests. This overlap may explain why UKN1 and UKK2 led to a similar classification of most teratogens and non-teratogens, although the applied protocols and induced differentiation processes were quite different.
Finally, validation of the gene array data by RT-qPCR showed a high correlation of gene expression changes obtained for both methods for seven selected top-genes. Interestingly, the top-genes, which were selected because their expression was influenced by the largest numbers of teratogens, allowed classification with identical sensitivity and specificity as the SPS-procedure with genome-wide data. Thus, by selecting a small, well-chosen set of top genes, it may be possible to identify teratogens with targeted gene expression analysis in a manner that is cost-efficient, instead of using cost-intensive whole-transcriptome analysis.
In conclusion, both the UKN1 and UKK2 tests allow for the identification of teratogens at human-relevant concentrations. Despite recapitulating distinct differentiation processes, a high degree of overlap in the classification results was obtained by both tests, likely because similar pathways were affected. A combined analysis of tests that differentiate hiPSCs into different germ layers may even further improve the prediction of developmental toxicants.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cells11213404/s1, Teratogenicity of high-dosed retinol; SOP for the UKN1 protocol; Table S1: Classification of the in vitro test results; Table S2: Predicted probabilities for teratogenicity in the UKN1 test, the UKK2 test, the test combinations, and the RT-qPCR test; Table S3: Performance metrics of the test combinations 'min' and 'max'; Table S4: Cytotoxicity and number of significantly deregulated probe sets in the test combinations; Figure S1-S5: Biological interpretation of genes differentially expressed after exposure of hiPSC to teratogens in the UKN1 test; Figure S6-S10: Biological interpretation and comparison of genes differentially expressed in the UKN1 and UKK2 test after exposure of hiPSC to teratogens; Figure S11: Classification of the teratogenic and non-teratogenic compounds by (A and B) combinations of the SPS-procedures and (C and D) the top-1000-procedures of the UKN1 and UKK2 test; Figure S12: Correlation plot of substance-induced gene expression changes in UKN1 measured in gene arrays and RT-qPCR; Figure S13: Classification of the teratogenic and non-teratogenic compounds based on the expression of 7-top-genes of the UKN1 test measured in RT-qPCR; Figure S14-S20: Expression changes of selected top genes in UKN1 samples relative to controls obtained by gene array and RT-qPCR;