Involvement of Epithelial–Mesenchymal Transition Genes in Small Cell Lung Cancer Phenotypic Plasticity

Simple Summary Small cell lung cancer (SCLC) is an aggressive cancer that is difficult to treat. There are at least five subtypes of SCLC cells, defined by gene expression signatures. The transitions between these subtypes and cooperation between them contribute to the progression of SCLC. Particularly, transitions between neuroendocrine (NE) states, including A, A2, and N subtypes, and the non-NE states, including P and Y subtypes, are hallmarks of SCLC plasticity. In this study, the relationship between SCLC subtypes and epithelial to mesenchymal transition (EMT) was analyzed. EMT is a well-known form of cellular plasticity that contributes to cancer invasiveness and resistance. The results showed that the SCLC-A2 subtype is epithelial while SCLC-A and SCLC-N are mesenchymal but distinct from the non-NE mesenchymal states. This study provides a basis for understanding the gene regulatory mechanisms of SCLC tumor plasticity and its applicability to other cancer types. Abstract Small cell lung cancer (SCLC) is an aggressive cancer recalcitrant to treatment, arising predominantly from epithelial pulmonary neuroendocrine (NE) cells. Intratumor heterogeneity plays critical roles in SCLC disease progression, metastasis, and treatment resistance. At least five transcriptional SCLC NE and non-NE cell subtypes were recently defined by gene expression signatures. Transition from NE to non-NE cell states and cooperation between subtypes within a tumor likely contribute to SCLC progression by mechanisms of adaptation to perturbations. Therefore, gene regulatory programs distinguishing SCLC subtypes or promoting transitions are of great interest. Here, we systematically analyze the relationship between SCLC NE/non-NE transition and epithelial to mesenchymal transition (EMT)—a well-studied cellular process contributing to cancer invasiveness and resistance—using multiple transcriptome datasets from SCLC mouse tumor models, human cancer cell lines, and tumor samples. The NE SCLC-A2 subtype maps to the epithelial state. In contrast, SCLC-A and SCLC-N (NE) map to a partial mesenchymal state (M1) that is distinct from the non-NE, partial mesenchymal state (M2). The correspondence between SCLC subtypes and the EMT program paves the way for further work to understand gene regulatory mechanisms of SCLC tumor plasticity with applicability to other cancer types.

Synapse ID syn23630203. Human cell line and RPM mouse tumor datasets were preprocessed as described by Groves et al. [16], including filtering and normalization of total counts using the Python package Scanpy [36], log-transformation using the log1p function from the Numpy package, and scaling using Scanpy. Log-normalized expression levels were used for comparisons (Mann-Whitney U test) between subtypes.
Human CDX data were preprocessed as described by Gay et al. [9]. Briefly, cells were filtered as described [9] to remove non-tumor cells. SC53 tumors (before and after cisplatin treatment) were concatenated together, and SC68 tumors were concatenated together. For each dataset (SC53 and SC68), Scanpy was used: total counts were normalized by cell, then data were log transformed (log1p). Highly variable genes were determined with min_mean = 0.0125, max_mean = 5, and min_dsip = 0.8. A PCA, tSNE, and Leiden clustering were then calculated using Scanpy.

Subtyping of SCLC Cells
Single cell subtype labeling was done as described by Groves et al. [16]. Briefly, after archetype analysis was applied to the MAGIC-imputed single cell datasets and vertices were identified [37,38], we labeled the cells closest to each archetype as that subtype if the archetype score was >0.95 (each cell received weighted archetype scores that sum to 1). SCLC archetypal signatures were generated from bulk human cell line and tumor transcriptomics data, giving a matrix of 105 genes by 5 archetypes (SCLC-A, SCLC-A2, SCLC-N, SCLC-P, and SCLC-Y). In order to align the single-cell archetypes with our bulk archetype signatures, we consider the scores for each cell described in the STAR Methods section "Bulk gene signature scoring of single cells using archetype signature matrix" from the study by Groves et al. [16]. For each bulk signature x and for each single-cell archetype a, we ran the following significance test: (1) Find the mean bulk score x for a specialist, m.
(2) Choose a random sample of size n a , where n a is the number of specialists, with replacement from the remaining cells (i.e., cells that are not specialists, including generalists and other specialist cells). Find the mean bulk score for this sample. N.B. Because some time points have very few cells, we sample evenly from each time point to ensure adequate representation across the time points. (3) Repeat this random selection 1000 times. (4) Generate a p-value, which is equal to the percentage of means from this random distribution above m. (5) Using statsmodels.states.multitest, correct p-values for multiple tests. We used the Bon ferroni-Holm method to control the family-wise error rate. Consider q < 0.1 significant.
Therefore, each archetype was labeled with an SCLC subtype if enriched in that subtype's signature. For this work, we considered only the cells labeled with such a subtype (i.e., specialists). For the RPM tumor cells, six archetypes were originally found in the study by Groves et al. [16] (A/N, A2, P/Y, two Y groups, and one archetype not enriched in any signatures). Here, we consider those enriched in SCLC subtype signatures, combining the two Y groups resulting in A/N, A2, P/Y, and Y.

Single Sample Gene Set Enrichment Analysis (ssGSEA)
ssGSEA was performed to compute the enrichment scores for individual cells (for scRNA-seq data) or individual cell lines (for bulk RNA-seq data) [39,40]. A list of 232 epithelial signature genes and a list of 193 mesenchymal signature genes were used to compute the E enrichment score (E score) and M enrichment score (M score), respectively [41,42]. The combined list of 425 EMT genes has 20 genes that also appear in the 105 signature genes used for SCLC subtyping. To ensure that the inferred relationship between EMT scoring and SCLC subtyping was not simply due to the shared list of genes, we excluded the intersection of the two gene sets and performed additional ssGSEA scoring for each dataset. The patterns of the scoring were not altered by the exclusion.

Non-Negative Principal Component Analysis (nnPCA)
To quantify the divergent progression of EMT across the subtypes of SCLC cells, we used a second approach to compute the E and M scores. We performed nnPCA using the gene sets mentioned above [41,43]. nnPCA determines the approximate orthogonal axes with non-negative coefficients (loadings) for features (genes). Variances of projections of data points (cell or cell lines) onto these axes are maximized via an optimization method [44]. In brief, the vectors of weights, w, used to define the first principal component of PCA is defined such that it maximizes the variance of the first component, i.e., where C is the covariance matrix of the original data set X and w is unit vector (||w|| 2 = 1). In our case, X is an m by n matrix of expression values where m is the number of samples and n is the number of genes in the selected gene set. This method for determining w can be treated as an expectation maximization problem where the original data is projected using the current estimate of w (y = Xw t ) and this projection is used to re-estimate w using the following minimization step: where x n are the rows of the original data and y n are the rows of the projected data [44]. This expectation-maximization formulation allows additional constraints on w, including forcing the component values to be non-negative. Note that the non-negativity constraint applies only to the weight components such that negative scores can still exist if there are negative values in underlying data, such as those produced by centering expression data to zero which we did for all nnPCA inputs. Subsequent components are calculated in the same way, under the constraint that they are orthogonal to the preceding ones.
To select the principal components (PCs) that best represent the EMT programs across SCLC subtypes, we used two criteria to rank the PCs in a semi-supervised manner. We first selected the top five PCs that have the highest variances explained for individual samples (cells or cell lines). Among the five PCs, we re-rank them based on the variances of means of individual SCLC subtypes. Similar to ssGSEA, we excluded the EMT gene set and the SCLC gene set and performed additional nnPCA scoring for each dataset. The ranks of the PCs and the patterns of the nnPCA were not altered by the exclusion.

The SCLC-A2 Subtype Is Highly Enriched in Epithelial Gene Expression
We first analyzed a time course of single-cell RNA-sequencing (scRNA-seq) dataset from an SCLC genetically engineered mouse model (GEMM) with an overactive Myc oncogene (Rb1 fl/fl ; Trp53 fl/fl ; Lox-Stop-Lox [LSL]-Myc T58A , RPM) [10]. The 15,138 single cells were annotated with the SCLC subtypes using gene expression signatures and archetype analysis, as recently described [16]. An SCLC signature comprising 105 genes that distinguishes the five SCLC subtypes was used to classify individual cells relative to extreme expression patterns of NE and non-NE subtypes, with some subpopulations enriched in multiple subtype signatures, resulting in four classes (A/N, A2, P/Y, and Y) (see Section 2) [16]. In parallel, using normalized gene expression, we computed the epithelial (E) and mesenchymal (M) scores for this dataset. The scores were computed using single-sample gene set enrichment analysis (ssGSEA) with a list of 232 epithelial (E) associated genes and 193 mesenchymal (M) associated genes (see Section 2) [39,40,42]. In the EMT spectrum depicted by the E and M scores, we observed a striking difference between the position of the NE A2 subtype and other NE and non-NE subtypes ( Figure 1A). The A2 cells had a significantly higher E score and a significantly lower M score compared to other subtypes (p < 10 −5 for each comparison). Consistent with the E/M scores, we found that A2 cells had significantly higher levels of Cdh1 (coding E-cadherin) (p < 10 −5 ), a widely used E marker gene crucial for cell adhesion, compared to other cells ( Figure 1A inset). These results indicate that gene expression in the A2 subtype is most restricted to an epithelial phenotype, whereas, in all other SCLC subtypes (NE and non-NE), M gene expression appears to be permitted to various extents. method. In this dataset, A2 cells are more distinct from other subtypes than in the bulk RNA-seq data, and the separation is more similar to the RPM tumor cell data ( Figure 1C). Interestingly, in both bulk and single-cell RNA-seq datasets for SCLC cell lines, the NE subtype A has an intermediate EMT expression profile ( Figure 1B,C).
The strong association between SCLC subtypes and EMT progression raises the question of whether this connection is trivial due to a sharing of genes between the 105 SCLC subtype signature gene set and the 425 EMT signature gene set. However, there are only 20 genes appearing in both gene sets, indicating limited overlap (Jaccard index 0.039). Furthermore, excluding the 20 overlapping SCLC signature genes from the EMT gene set had minimal effect on the distributions of SCLC cells in the EMT spectrum in each of the three datasets ( Figure S1). Taken together, our analyses indicate a strong and nontrivial association between SCLC subtypes and EMT progression. In particular, a specific SCLC NE subtype, A2, exhibits phenotypes associated with highly restricted epithelial lineages in mammals.

Mesenchymal Scoring Is Diverse across Non-A2 Subtypes
The ssGSEA-based E/M scoring did not show any dramatic difference among A, N, P, and Y (i.e., non-A2) subtypes ( Figure 1). However, the expression of individual M We next asked whether the association between the A2 subtype and the epithelial lineage in RPM mouse SCLC tumor cells extends to human SCLC cells. We first analyzed bulk RNA-seq data from 120 human SCLC cell lines assigned with subtype identities (see Section 2). Several A2 cell lines (e.g., DMS53) were located in the extreme E position on the EMT spectrum ( Figure 1B), in agreement with A2 cells from RPM tumors ( Figure 1A). As a group, A2 cell lines had the highest mean E score and the lowest M score among SCLC subtypes, although these scores were closer to A and Y cells than in the RPM tumor cell data ( Figure 1B). This partial overlap could be due to cell-cell heterogeneity within cell lines that is not resolved in bulk RNA-seq data. Therefore, we visualized 13,945 singlecell transcriptomes from eight human SCLC cell lines [16] with the same E/M projection method. In this dataset, A2 cells are more distinct from other subtypes than in the bulk RNA-seq data, and the separation is more similar to the RPM tumor cell data ( Figure 1C). Interestingly, in both bulk and single-cell RNA-seq datasets for SCLC cell lines, the NE subtype A has an intermediate EMT expression profile ( Figure 1B,C).
The strong association between SCLC subtypes and EMT progression raises the question of whether this connection is trivial due to a sharing of genes between the 105 SCLC subtype signature gene set and the 425 EMT signature gene set. However, there are only 20 genes appearing in both gene sets, indicating limited overlap (Jaccard index 0.039). Furthermore, excluding the 20 overlapping SCLC signature genes from the EMT gene set had minimal effect on the distributions of SCLC cells in the EMT spectrum in each of the three datasets ( Figure S1). Taken together, our analyses indicate a strong and nontrivial association between SCLC subtypes and EMT progression. In particular, a specific SCLC NE subtype, A2, exhibits phenotypes associated with highly restricted epithelial lineages in mammals.

Mesenchymal Scoring Is Diverse across Non-A2 Subtypes
The ssGSEA-based E/M scoring did not show any dramatic difference among A, N, P, and Y (i.e., non-A2) subtypes ( Figure 1). However, the expression of individual M genes, such as Vim, differed significantly among non-A2 subtypes (Figure 2A,B). This suggests the possibility that M genes have divergent expression patterns across the SCLC subtypes, which may mask the ssGSEA-based scoring. Recent studies in tumors and cell lines also suggested diversity of EMT programs [25,26,45]. For example, activation of a subset of M genes depends on EMT transcription factor ZEB1, while that of other M genes are activated via ZEB1-independent pathways [25]. To determine potentially distinct M programs in SCLC cells, we used an alternative scoring method based on nonnegative principal component analysis (nnPCA), which generates leading PCs that describe the variance for each gene set (see Section 2). With the single-cell data from both the mouse RPM model and the SCLC cell lines, we noticed that the first PC for M scores explained less variance compared to the first PC for E scores ( Figure 2C,D top). Furthermore, the first two M PCs produced comparable variances explained. These results indicate that M gene expression is spread evenly over at least two orthogonal dimensions, suggesting diversity of M scoring and warranting further investigation. Interestingly, M scores obtained from the first and second PCs ranked the SCLC subtypes differently: A/N subtypes have higher scores than Y subtype cells based on the first M PC (nnPC1) but lower scores than Y subtype with the second M PC (nnPC2) ( Figure 2C,D scatter plots). In fact, two representative M genes-Vim (the gene encoding vimentin, an intermediate filament protein component of mesenchymal cell cytoskeletons) and Zeb1 (a widely studied EMT transcription factor)had an anticorrelated pattern between A/N and Y subtypes in the RPM dataset (compare Figure 2E to Figure 2A) [46,47]. Although this anticorrelation was less prominent in the cell line data (compare Figure 2F to Figure 2B), distinct Vim and Zeb1 expressions of SCLC subtypes were observed. Overall, like the case of Vim and Zeb1, the mean differences in expression of M genes between A/N subtypes and Y subtype were diverse, and these differences were positively correlated between the RPM data and the cell line data ( Figure 2G, Pearson correlation coefficient 0.29, p < 10 −5 ). Some M genes, such as Zeb1, had higher expression in the A/N subtypes than in the Y subtype, while some others, such as Vim, had the opposite pattern ( Figure 2G,H). We defined these two groups of M genes as M1 and M2, respectively (FDR < 0.05 for N-Y differences). We performed gene ontology analysis and found that, while 20 enriched biological processes were shared between the two groups, M1 genes were uniquely enriched in more than 200 processes including 'negative regulation of cell adhesion involved in substrate-bound cell migration' (fold enrichment > 100, FDR = 0.005), whereas M2 were uniquely enriched in 11 processes including 'extracellular matrix organization' (fold enrichment = 14.08, FDR = 0.02) (Tables S1 and S2). As these two groups of processes may both contribute to enhanced cell motility, this result suggests that NE and non-NE subtypes may use distinct strategies to achieve mesenchymal-like cellular functions such as cell migration. In summary, our result supports the functional divergence of the M genes differentially expressed in NE and non-NE SCLC subtypes.

The Highly Epithelial A2 Subtype and Diverse M Gene Expression Patterns Are Detectable in Human SCLC Tumor
The distinction between the A2 and A subtypes has generally not been considered in previous studies by other investigators. For example, Chan et al. profiled 54,523 individual SCLC cells from 19 human samples and classified them into A, N, and P subtypes (note the absence of the Y subtype) [35]. We therefore asked whether our analytical framework can be used to discover previously unknown, extremely epithelial-like cells in this human SCLC tumor dataset. We first performed E and M scoring for this dataset ( Figure 3A) and found that A cells (as subtyped in Chan et al. [35]) fell into two distinct groups, one of which was highly E-like ( Figure 3B, red arrow). We hypothesized that the cells labeled as A subtype by Chan et al. include both extreme E-like (A2) and EMT-intermediate-like (A) subtypes. We re-subtyped the SCLC cells from Chan et al. using our subtype gene signatures (see Section 2), and we found a significant fraction of the previously labeled A cells had high scores for the A2 subtype and low scores for the A subtype, which we labeled A2* (Figures 3C,D and S2). In particular, the distinct group of E-like cells corresponds to the group of cells that had high A2 scores and low A scores ( Figures 3C,D and S2). Interestingly, the cell cluster with high A2 scores corresponds to a unique tumor sample ( Figure 3A, pink) with a unique treatment type ( Figure S3). We found that the A2 scores of these 11,056 A2-enriched SCLC cells were strongly positively correlated with their E scores (nnPCA performed with RU1108 cells alone) ( Figure 3E) (Spearman correlation coefficient s r = 0.69). In contrast, A scores were negatively correlated with E scores in these cells ( Figure 3E) (s r = −0.23). In addition, A2 scores had a moderately negative correlation with M scores (nnPCA) (s r = −0.17) ( Figure 3F). absence of any Y subtype cells in the dataset. Nonetheless, heterogeneous expressions of M genes ZEB1 and VIM were observed within the N subtype ( Figure 3G). Furthermore, three M-like cell clusters with similar E and M score rankings ( Figure 3A, red, dark green, and brown) had dramatically different profiles of VIM and ZEB1 expression ( Figure 3G) even though all were classified as N ( Figure 3H). These results further support the association between SCLC and EMT programs and reveal a previously underappreciated connection between SCLC tumor cell heterogeneity and EMT.  [35] were projected onto indicated score axes. Linear regression lines and confidence intervals were obtained for all cells and ASCL1 + cells.
is Spearman correlation coefficient. (G) Normalized expression of ZEB1 and VIM in the same dataset as in A. Color code represents each of the 19 patients as in A. (H) The same data as in G but color labels are defined by SCLC subtype shown in B.

Intratumor Heterogeneity of SCLC Indicates Strong A2-Epithelial Association at Single-Cell Level
The A2-epithelial (A2-E) association in human tumors described in the previous section relies on gene set analyses across multiple tumors, where all cells within each tumor Systematic analysis for divergent M programs could not be performed due to the absence of any Y subtype cells in the dataset. Nonetheless, heterogeneous expressions of M genes ZEB1 and VIM were observed within the N subtype ( Figure 3G). Furthermore, three M-like cell clusters with similar E and M score rankings ( Figure 3A, red, dark green, and brown) had dramatically different profiles of VIM and ZEB1 expression ( Figure 3G) even though all were classified as N ( Figure 3H). These results further support the association between SCLC and EMT programs and reveal a previously underappreciated connection between SCLC tumor cell heterogeneity and EMT.

Intratumor Heterogeneity of SCLC Indicates Strong A2-Epithelial Association at Single-Cell Level
The A2-epithelial (A2-E) association in human tumors described in the previous section relies on gene set analyses across multiple tumors, where all cells within each tumor were classified into a single subtype [35] (Figure 3A-D). We next analyzed a scRNA-seq dataset of two human SCLC circulating tumor cell-derived xenografts (CDXs) [9], each of which underwent an EMT-like cell state transition upon cisplatin treatment such that multiple subtypes exist within individual tumors. We first performed subtyping analysis and computed scores of five SCLC subtypes for 5268 cells (untreated and cisplatin-treated) of tumor SC53. We visualized the SCLC scores in the E-M space obtained from nnPCA ( Figure 4A) and found that while there was a distinct, small population corresponding to an M-like state as previously reported [9], most tumor cells were located in a continuous region with relatively high A2 and A scores ( Figure 4A), consistent with their positivity for ASCL1 as previously shown. Furthermore, N and P subtypes seem to be absent from this tumor ( Figure 4A). Interestingly, although both A and A2 scores were positively correlated with E scores for all cells in the tumor (s r = 0.43 for A2; s r = 0.09 for A), only A2 scores had a strong correlation with E scores for cells that express ASCL1 ( Figure 4B,C) (s r = 0.35 for A2; s r = −0.08 for A). In addition, neither the SC53 tumor cells nor its ASCL1 + subpopulation showed a strong correlation between A2 scores and M scores ( Figure 4D).  [9] were projected onto the E and M score axes using nnPCA. Colors indicate SCLC subtype scores (see Section 2). (B-D) Linear regression lines and confidence intervals were obtained for all cells and ASCL1 + cells from data in D. (E) Datasets of three tumors (top labels) were used to compute the Spearman correlation coefficients between SCLC subtype scores (left labels) and E scores from nnPCA or CDH1 expression levels. All cells and ASCL1 + cells were analyzed separately. (F) Correlations between A2 scores and E scores (nnPCA) in untreated and cisplatin-treated cells of SC53 tumor. (G) Datasets of SC53 and SC68 tumors were used to compute the Spearman correlation coefficients between SCLC subtype scores (left labels) and E scores from nnPCA or CDH1 expression levels. All cells and ASCL1 + cells were analyzed separately. Untreated and cisplatin-treated cells were analyzed separately.

Discussion
Understanding the molecular basis for tumor heterogeneity is crucial for developing next-generation therapeutic strategies [48][49][50][51]. Recent advances in defining subtypes of SCLC tumors provide such potential [9], but cellular and molecular mechanisms contributing to phenotypic heterogeneity are still unclear. Interestingly, SCLC tumors have both NE and epithelial characteristics [52]. We therefore asked whether SCLC cells hijack EMT, canonically defined as a developmental program [53], to increase invasiveness and drug resistance, which has been shown to occur in other cancer types [22,23].
While it was shown that some genes promoting EMT are activated in NE/non-NE transitions of SCLC [10,16,54], connections between SCLC subtypes to the EMT spectrum have not been studied in depth. It is also unclear whether SCLC involves multiple (unique) partial EMT states. Through analysis of transcriptome datasets from multiple sources including a mouse tumor model, human cell lines, and human tumor samples, our work reveals a strong correspondence between the EMT spectrum, containing an epithelial state and several divergent mesenchymal states, and recently defined SCLC subtypes. Our study indicates the partial EMT status of the NE subtypes A and N, whereas the NE subtype A2 is fully epithelial-like. In addition, we show divergence of mesenchy-  [9] were projected onto the E and M score axes using nnPCA. Colors indicate SCLC subtype scores (see Section 2). (B-D) Linear regression lines and confidence intervals were obtained for all cells and ASCL1 + cells from data in D. (E) Datasets of three tumors (top labels) were used to compute the Spearman correlation coefficients between SCLC subtype scores (left labels) and E scores from nnPCA or CDH1 expression levels. All cells and ASCL1 + cells were analyzed separately. (F) Correlations between A2 scores and E scores (nnPCA) in untreated and cisplatin-treated cells of SC53 tumor. (G) Datasets of SC53 and SC68 tumors were used to compute the Spearman correlation coefficients between SCLC subtype scores (left labels) and E scores from nnPCA or CDH1 expression levels. All cells and ASCL1 + cells were analyzed separately. Untreated and cisplatin-treated cells were analyzed separately.
We next extended our analysis to SC68, another tumor that underwent an EMT-like transition upon cisplatin treatment [9]. Similar to SC53, there was a strong positive correlation between A2 and E scores both in all cells in the tumor and in ASCL1 + cells, whereas the correlations between other subtypes' scores and E scores were either significantly weaker or negative ( Figure 4E). We also found that the strong A2-E association was partly explained by a positive correlation between A2 scores and the expression of CDH1 ( Figure 4E). Again, the A2-E association was not due to shared genes between the two gene sets that we used for scoring ( Figure S4). Together, our results show that the intratumor association between A2 and E transcriptional programs is a consensus pattern across RU1108, SC53, and SC68 tumors.
We next asked whether the A2-E correlation in SC53 was driven by cisplatin treatment or intrinsic cell-to-cell heterogeneity. We found that cisplatin-treated ASCL1 + cells showed both greater variance in the E-M space and stronger correlation between A2 and E scores compared to untreated cells ( Figure 4F) (s r = 0.31 for untreated cells; s r = 0.42 for treated cells). Nonetheless, even in the untreated cells, a positive correlation between A2 and E scores was observed ( Figure 4G). Therefore, this A2-E association was found across all cells of both tumors before and after treatment ( Figure 4G). These results excluded the possibility that the observed A2-E association was due to Simpson's paradox in which the treatment condition may be a hidden variable. Instead, they suggest that both cisplatin treatment and intrinsic cell-to-cell heterogeneity contribute to the A2-E correlation and that treatment can induce a cell state transition towards an M-like state without turning off ASCL1 transcription.

Discussion
Understanding the molecular basis for tumor heterogeneity is crucial for developing next-generation therapeutic strategies [48][49][50][51]. Recent advances in defining subtypes of SCLC tumors provide such potential [9], but cellular and molecular mechanisms contributing to phenotypic heterogeneity are still unclear. Interestingly, SCLC tumors have both NE and epithelial characteristics [52]. We therefore asked whether SCLC cells hijack EMT, canonically defined as a developmental program [53], to increase invasiveness and drug resistance, which has been shown to occur in other cancer types [22,23].
While it was shown that some genes promoting EMT are activated in NE/non-NE transitions of SCLC [10,16,54], connections between SCLC subtypes to the EMT spectrum have not been studied in depth. It is also unclear whether SCLC involves multiple (unique) partial EMT states. Through analysis of transcriptome datasets from multiple sources including a mouse tumor model, human cell lines, and human tumor samples, our work reveals a strong correspondence between the EMT spectrum, containing an epithelial state and several divergent mesenchymal states, and recently defined SCLC subtypes. Our study indicates the partial EMT status of the NE subtypes A and N, whereas the NE subtype A2 is fully epithelial-like. In addition, we show divergence of mesenchymal gene expression in N (an NE) and Y (a non-NE) subtypes.
Our analysis shows diverse expressions of mesenchymal factors across SCLC subtypes. This is consistent with recent observations showing EMT is a context-specific dynamic process [26]. Interestingly, the EMT transcription factor ZEB1 and EMT effector gene VIM have an anti-correlated expression pattern between N and Y subtypes. Although the contribution of ZEB1 to EMT was demonstrated in vivo and in vitro [25,[55][56][57], its high expression may not be required for some partial EMT states. Our previous work showed that expressions of some mesenchymal genes do not require ZEB1 activation and that the high expression of a group of EMT genes positively controlled by ZEB1, but not TGF-β, is correlated with better prognosis of breast cancer patients [25]. It is possible that ZEB1 is transiently required for maintaining a group of NE cells during SCLC progression, and the transitions to more drug-resistant subtypes may require the down-regulation of ZEB1.
This work builds on our prior studies that suggested the partial EMT status of N, based on mixed morphological features and expression of ZEB1, SNAI1, and TWIST1 but not VIM [16]. Here, we expand this work by analyzing EMT-related gene signatures in this subtype across multiple datasets. While SCLC-N is canonically defined by expression of NEUROD1, characterizing this subtype as a partial-EMT state may lead to new insights regarding its functional role in SCLC tumor progression and metastasis. Further work is needed to understand how enrichment of a mesenchymal signature (M1) in this subtype is related to this partial-EMT characterization.
The heterogeneity of NE and non-NE SCLC subtypes has been observed previously. However, the factors driving transitions between NE and non-NE subtypes are not well characterized. It has long been known that EMT is reversible in both embryonic and postnatal development, and this reversible process can also be triggered in cancer cell lines using either dynamic extracellular signals (e.g., TGF-β) or forced expression of intracellular factors (e.g., TFs or microRNAs) [25,29,58,59]. In addition, oscillations arising from gene regulatory networks of EMT can facilitate reversible cell state transitions [60]. Our findings suggest that the machinery for reversible EMT may also be responsible for the transitions between NE and non-NE cells during the progression of SCLC. Future work is warranted to determine the temporal sequence of activation among EMT TFs, signaling molecules, and subtype-defining factors for SCLC, such that the intrinsic and extrinsic factors contributing to the NE/non-NE transition can be dissected. Overall, by revealing the relationships between NE/non-NE subtypes and EMT progressions, this study will help guide future work to improve our understanding of SCLC tumor heterogeneity and cell state transitions that may be driven by EMT programs.

Conclusions
Our analysis of multiple transcriptome datasets, including single-cell data from human and mouse SCLC tumors, consistently showed that an extreme epithelial state corresponds to the SCLC-A2 subtype, whereas two NE subtypes, SCLC-A and SCLC-N, have significant expression of a subset of mesenchymal genes, with surprisingly limited overlap with the mesenchymal genes highly expressed in non-NE cells. Our work reveals a previously unknown connection between an EMT program and a specific SCLC subtype and an underappreciated divergence of EMT trajectories contributing to the SCLC phenotypic heterogeneity. These results shed light into molecular systems' mechanisms underlying plasticity and progression of SCLC.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/cancers15051477/s1, Figure S1. Distributions of SCLC cells across the epithelial and mesenchymal spectrum using non-overlapping gene sets. Figure S2. Subtype scores for human tumor cells. Figure S3. Treatments of human tumor cells visualized in EMT space. Figure S4. Correlations between A2 and epithelial transcriptional programs in individual human tumor cells using non-overlapping gene sets. Table S1. List of Gene Ontology terms of M1 genes. Table S2. List of Gene Ontology terms of M2 genes. Data Availability Statement: Lead contact: Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Tian Hong. Materials availability: This study did not generate new unique reagents. Data and code availability: The code generated during this study is available at GitHub (https://github.com/smgroves/EMT_SCLC_ project (accessed on 10 February 2023)). No new data was generated for this study.