Functional States in Tumor-Initiating Cell Differentiation in Human Colorectal Cancer

Simple Summary Different types of cells with tumor-initiating cell (TIC) activity contribute to colorectal cancer (CRC) progression and resistance to anti-cancer treatment. In this study, we aimed to understand whether different cell types exist within a patient-derived tumor culture, distinguishable by different patterns of their gene expression. By mRNA sequencing of patient-derived CRC cultures at the single-cell level, we defined expression programs that closely resemble differentiated cell populations of the normal intestine. Here, cell type-associated subpopulations showed differences in functional properties such as cell growth and energy metabolism. Subsequent functional analyses in vitro and in vivo demonstrated that metabolic states are linked to TIC activity in primary CRC cultures. We also show that TIC activity is dependent on oxidative phosphorylation, which may therefore represent a target for novel therapies. Abstract Intra-tumor heterogeneity of tumor-initiating cell (TIC) activity drives colorectal cancer (CRC) progression and therapy resistance. Here, we used single-cell RNA-sequencing of patient-derived CRC models to decipher distinct cell subpopulations based on their transcriptional profiles. Cell type-specific expression modules of stem-like, transit amplifying-like, and differentiated CRC cells resemble differentiation states of normal intestinal epithelial cells. Strikingly, identified subpopulations differ in proliferative activity and metabolic state. In summary, we here show at single-cell resolution that transcriptional heterogeneity identifies functional states during TIC differentiation. Furthermore, identified expression signatures are linked to patient prognosis. Targeting transcriptional states associated to cancer cell differentiation might unravel novel vulnerabilities in human CRC.


Introduction
In many tumor entities, tumor formation and progression are driven by a cellular subfraction with tumor-initiating cell (TIC) activity [1][2][3]. In colorectal cancer (CRC), the TIC compartment is organized as a functional cellular hierarchy with extensively self-renewing long-term TICs driving serial tumor propagation in vivo. Long-term TICs generate highly proliferative, short-lived tumor transient-amplifying cells with limited or no self-renewal capacity giving rise to the bulk of post-mitotic tumor cells [4]. Remarkably, this functional heterogeneity within individual CRCs is not primarily driven by genetic events, suggesting that epigenetic or extrinsic factors contribute to functional cellular heterogeneity [5].
Lineage-tracing experiments demonstrate that in CRC the population of highly selfrenewing TICs expresses LGR5 and generates progeny differentiating towards mucosecretingand absorptive-like phenotypes [6]. Thus, CRCs harbor a subfraction of stem-like TICs and maintain a hierarchical organization reminiscent of the normal intestinal epithelium [7]. Moreover, a gene signature specific for intestinal stem cells has been suggested to predict disease relapse [8], indicating a potential clinical relevance of stem-like TICs for CRC patients. However, prospective validation in an independent cohort is still not available.
Recent evidence suggests that the epigenome of an individual CRC is already formed by the cell-of-origin. Methylation analyses demonstrate maintenance of the cell-of-origin differentiation state during tumor progression, and identified three CRC subclasses of intestinal crypt differentiation of the cell-of-origin. Importantly, patients with a stem-like methylation signature showed significantly reduced overall survival [9].
While the hierarchical organization of normal and malignant stem cell systems has previously been thought to be fixed and unidirectional, evidence for plasticity in these systems is accumulating [10][11][12]. Lineage-tracing experiments in CRC highlight that more differentiated cells can repopulate a free stem-like niche and acquire TIC activity upon ablation of the active stem-like population [6,13,14]. Similarly, pronounced plasticity drives pancreatic cancer by clonal succession of transient TIC activity [15].
Current understanding of TIC heterogeneity in CRC is mainly derived from serial syngeneic or xenogeneic transplantation models, where TICs have been retrospectively identified by interpreting the kinetics of genetically marked or pre-enriched bulk cells [4,8,[16][17][18]. While this allowed deep insights into functional heterogeneity within tumors, such retrospective experimental strategies from bulk samples hamper direct assignment of transcriptional states in individual cells. To characterize molecular underpinnings of functional CRC intra-tumor heterogeneity at the single-cell level, we here asked whether distinct functional programs within individual cells from patient-derived CRC models can be assigned to specific cellular subpopulations.

Transcriptional Heterogeneity of Patient-Derived CRC Spheroid Cultures
To assess whether heterogeneous transcriptional programs can be detected in CRC tumor spheroids at the single-cell level, we performed single-cell RNA-sequencing (scRNAseq) using a nanowell platform [19]. As patient-derived spheroid cultures contain purely tumor cells, thereby allowing to study tumor cell heterogeneity in high resolution, and recapitulate the histology of the original tumor after xenotransplantation into immunodeficient NOD.Cg-Prkdc scid Il2rg tm1Wjl /SzJ (NSG) mice [4], we sequenced 12 three-dimensional tumor spheroid cultures (P1-P12) derived from primary tumors (n = 6 patients) or metastases (n = 6 patients) of 12 different CRC patients. These patient tumors and derived spheroids cover known subtypes (microsatellite stable or microsatellite instable tumors) and driver mutations (loss of APC and/or TP53, activating mutations in KRAS; Table 1). On average, 389 cells (range: 141-736) were sequenced per patient, resulting in 4663 single-cell profiles with an average of more than 4000 detected genes per cell ( Table 2).  f  primary  Transverse colon  IV  MSS  ---P6  f  primary  Caecum  IV  MSS  X  --P7  m  liver met  Sigmoid  IV  MSS  -X  X  P8  m  liver met  Caecum  IV  MSS  X  --P9  m  primary  Rectum  IV  MSS  X  -X  P10  m  primary  Sigmoid  IIIB  Unsupervised clustering of single-cell profiles [20] revealed grouping of cells according to the patient-of-origin ( Figure 1a). Hierarchical clustering based on the top 10 differentially expressed genes per patient showed that cells primarily cluster, with one exception, by the tumor site they originate from, but not by microsatellite status (Figure 1a,b).
among cell type and cell state signatures in the spheroid scRNA-seq data ( Figure 1d) were detectable in patient whole transcriptome data. Clustering of the TCGA cohort based on signature expression resulted in six clusters of patients (cl1-cl6) with different combinations of low or high expression of individual signatures. Significantly different progression-free survival (p = 0.043) and numerically decreased overall survival (p = 0.059) were observed between groups of clusters, indicating a relevance of signature expression for patient prognosis (Figure 2a-c, Table S2).

Distinct Cell Types and Cell States in Individual CRC Spheroids
To identify heterogeneous gene expression programs shared across patients in single cells from individual tumor spheroid cultures, we corrected for inter-patient variability by calculating relative expression levels for each patient individually [21,22]. Principal com-ponent analysis (PCA) of the combined dataset revealed an anti-correlated transcriptional pattern independent of patient origin with genes either involved in cell growth, proliferation, and oxidative phosphorylation (OXPHOS), or hypoxia and glycolysis. Notably, the hypoxia/glycolysis signature contains intestinal differentiation markers (e.g., TFF3, FABP1, KRT19; Figure 1c), indicating an association of distinct metabolic states with tumor cell differentiation and proliferation, as recently described for the normal intestinal crypt [23].
As the activation of continuous gene expression programs may not be captured by discrete clustering, we adapted a previously described computational approach based on non-negative matrix factorization (NNMF) [24,25] to more precisely identify transcriptional programs heterogeneously expressed across patients (Figure 1d, Figure S1a-e). In order to focus on tumors that display preserved hierarchical organization, we focused on the eight cultures with detectable LGR5 transcript levels (LGR5 score = total LGR5 transcript counts/cell number > 1; Table 2), as LGR5 represents an established marker for intestinal stem cells and CRC TICs, whereas the phenotype and the role of potential LGR5-negative stem cells and TICs are much less defined [8,14,26,27]. Thus, four cultures with very low or non-detectable LGR5 transcript abundance were excluded for this analysis. We identified 13 heterogeneous gene expression programs that could be classified into two partially overlapping categories: one (A) linked to 'cell types' or lineages analogous to the normal intestinal epithelium, and the other (B) associated with 'cell states' (Figure S1f).
Next, each individual cell was scored for inferred expression programs using the averaged expression of the top genes per factor identified by NNMF. To reduce redundancy, signatures showing similar enrichment and clustering patterns were combined, resulting in eight meta-signatures (Figure S1a-f, Table S1). Clustering of meta-signature scores allowed identification of discrete and overlapping transcriptional programs ( Figure 1d). Similar to PCA, cell cycle, OXPHOS, and TA signatures showed a pronounced overlap, indicating a highly proliferative cell fraction-potentially corresponding to the TA-like compartment-driven by MYC and characterized by high OXPHOS. In contrast, stemlike, Paneth-like, and Tdiff-like cells did not show significant overlap with the cell cycle signature (Figure 1d), suggesting reduced or absent proliferative activity. This indicates that scRNA-seq and matrix factorization analysis are capable of distinguishing functionally distinct cell populations based on transcriptional profiles.
To analyze the cell type composition in all eight LGR5 + cultures individually, we used the NNMF-inferred signature scores (stem, TA, Paneth, Tdiff) to assign cells to one of the four cell types which allowed us to assess the extent of active cell type-specific transcriptional programs. Despite different cell type compositions, we observed presence of stem-like, TA-like, and Tdiff-like cells in all, and rare, but detectable Paneth-like cells in six out of the eight LGR5 + cultures ( Figure S1g). This indicates that individual CRC tumors display similar cellular diversity resembling normal intestinal cell types even with different clinico-pathological features (Table 1).
We next assessed whether the signatures identified in our patient-derived in vitro models can also be identified in patient tumors. We therefore applied our signatures (Table S1) on publicly available expression data of colon adenocarcinoma (COAD) patients (The Cancer Genome Atlas (TCGA) cohort; n = 328 patients) [28]. Correlations among cell type and cell state signatures in the spheroid scRNA-seq data ( Figure 1d) were detectable in patient whole transcriptome data. Clustering of the TCGA cohort based on signature expression resulted in six clusters of patients (cl1-cl6) with different combinations of low or high expression of individual signatures. Significantly different progression-free survival (p = 0.043) and numerically decreased overall survival (p = 0.059) were observed between groups of clusters, indicating a relevance of signature expression for patient prognosis (Figure 2a-c, Table S2).
In line with previously published data reporting an intestinal stem cell-specific gene signature linked to LGR5 and EPHB2 expression related to CRC relapse [8], high expression of our stem signature defined by 200 genes (Table S1) in the TCGA cohort displayed decreased progression-free survival (p = 0.068) compared to patients with low expression (Figure 2e).
Taken together, our six clusters exhibit a better prognostic value for progressionfree survival (p = 0.043) than previously reported subtypes linked to cancer-associated fibroblasts [31] (p = 0.15), CMSs [29,30] (p = 0.18), or our stem signature alone (p = 0.068). Indeed, when our clusters were added to multivariable clinico-molecular survival models, we still observed a significant discriminative contribution by our cluster combinations in predicting recurrence, but no significant contribution was appreciated when adding CMSs or cancer-associated fibroblasts to our model. On the other hand, incorporating stroma cells like cancer-associated fibroblasts can substantially improve the overall survival prediction (Table S2). These results underscore the relevance of combinations of cell type and cell state signature expression for COAD outcome, and demonstrate a prognostic value of cell type and cell state signatures inferred from spheroid single-cell transcriptomes.

Cell Cycle and Proliferative Activity of Human CRC Cells
scRNA-seq suggests the existence of cell types with different proliferative activity within individual spheroid cultures and stem-like, TA-like, and Tdiff-like subpopulations. We therefore asked whether subfractions of cells with different cell cycle and proliferative activity exist within CRC tumors in vivo and whether they are functionally relevant.
To assess proliferative heterogeneity of tumor cells, we utilized a genetic labelretaining strategy based on expression of tetracycline-regulated (Tet-off) histone 2B (H2B) green fluorescent protein (GFP) [32] ( Figure S2a). Upon doxycycline addition, nuclear H2B-GFP expression is suppressed and subsequently diluted with each cell division, allowing identification of subpopulations according to proliferative history.
To evaluate whether proliferatively inactive cells within established tumors possess TIC capacity, we transduced tumor spheroid cultures derived from seven different patients with an H2B-GFP-encoding lentiviral vector prior to xenotransplantation into NSG mice (n = 14; 1-4 mice per culture). After successful tumor formation, H2B-GFP expression was suppressed by doxycycline administration for two weeks. Analysis of H2B-GFP expression in established tumors by flow cytometry revealed presence of fast (GFP -), slow (GFP low ), and rare dividing (GFP high ) cell fractions, demonstrating proliferative heterogeneity of CRC cells in vivo. To assess whether heterogeneously proliferating cell fractions are associated with TIC activity, cells from 12 out of 14 primary xenografts were sorted into fast, slow, and rare dividing subfractions and serially transplanted into secondary mice (n = 33). Importantly, all subfractions contained cells with TIC activity irrespective of transplanted cells' proliferative history prior to re-transplantation (fast: 5/12; slow: 4/9; rare: 5/12 mice with tumors), showing that TIC activity is not strictly linked to proliferative active cell fractions but also present in proliferatively inactive populations within tumors ( Figure S2a-c). In summary, these data show that proliferatively inactive TICs exist within established tumors in vivo. We therefore conclude that within individual tumors, TIC activity can be present in cells with heterogeneous proliferative activity and is therefore not restricted to a specific proliferative state of individual cells.

Divergent Cell Type-Associated Energy Metabolic Preferences
Prominent heterogeneously expressed transcriptional programs in individual spheroid cells were related to energy metabolism. Whereas a glycolysis/hypoxia signature could be assigned to Tdiff-like cells ( Figure S1f), OXPHOS strongly overlapped with MYC-target and cell cycle signatures, both identifying cells belonging to the putative TA-like cell compartment (Figure 1c,d). Thus, we hypothesized that metabolic preferences distinguish functionally distinct cell subpopulations and focused on these for further validation.
Consistently, we observed clearly overlapping TA-like, OXPHOS, and cell cycle signatures ( Figure 1d), but no obvious association between stem-like and OXPHOS or cell cycle signatures. Of note, in the normal intestinal epithelium, intestinal stem cells actively cycle and constantly produce progeny, but their relative abundance compared to non-cycling Tdiff cells is very low [33]. Thus, we reasoned that differential metabolic trends in stem-like and Paneth-like cells could be masked by much higher or lower expression of individual metabolic signatures in highly cycling cells or the rare dividing Tdiff-like subpopulation. To overcome this, we performed pairwise comparisons of cell state signatures across CRC subpopulations that resemble normal intestinal cell types as identified by differential NNMF signature expression.
Cell cycle scores were strongly increased in TA-like cells compared to stem-like, Paneth-like, and Tdiff-like cells (p < 0.000001; respectively). The greatest differences in metabolic states existed between Tdiff-like and TA-like subpopulations, demonstrating that the majority of TA-like cells had high OXPHOS scores, whereas Tdiff-like cells showed high scores for hypoxia and glycolysis, but low scores for OXPHOS. Albeit less pronounced, similar and highly significant differences were detectable for stem-like and Paneth-like cells. In comparison to Paneth-like cells, stem-like cells showed increased OXPHOS scores and decreased glycolysis/hypoxia scores (p < 0.000001, respectively; Figure 3a).
In addition to the overall high OXPHOS scores, the stem-like signature was associated with enhanced expression of OXR1 and PON2. Being essential for protection against oxidative stress, these genes may counteract higher reactive oxygen species (ROS) levels resulting from enhanced OXPHOS rates [34,35]. Another gene included in the stem-like signature is MAP2K6-an essential p38 signaling component [36] known to be associated with high OXPHOS levels in intestinal stem cells [23] ( Figure S1f, Table S1).
Collectively, these results demonstrate an association between tumor cell differentiation and metabolic identities in this three-dimensional in vitro CRC model. primary samples were merged and analysis focused on epithelial cells only (total: 253 cells). As observed in spheroids, clustering of cells from LGR5 + PDO, PDX, and primary tumor cells revealed subpopulations of stem-like, TA-like, or Tdiff-like cells. Additionally, a prominent fraction of Paneth-like (deep crypt secretory-like, REG4 + ) cells [39] was detected in the PDX ( Figure S3a-c).
To distinguish functionally distinct subpopulations, LGR5 levels were determined and sufficient levels detected in O1, O3, and X1. Since the absolute cell numbers after quality control in the primary tumors were low (T1: 362; T2: 538; T3: 724 cells), the three primary samples were merged and analysis focused on epithelial cells only (total: 253 cells). As observed in spheroids, clustering of cells from LGR5 + PDO, PDX, and primary tumor cells revealed subpopulations of stem-like, TA-like, or Tdiff-like cells. Additionally, a prominent fraction of Paneth-like (deep crypt secretory-like, REG4 + ) cells [39] was detected in the PDX ( Figure S3a-c).
Importantly, applying the signatures identified by NNMF of spheroid scRNA-seq data (Table S1) revealed similar trends for heterogeneous metabolic states associated with distinct cell types, that is, OXPHOS in stem-like and TA-like, and glycolysis/hypoxia in Tdiff-like cells (Figure 3b-d). This shows that transcriptional states and cellular composition identified in spheroids are representative of further patient-derived CRC models as well as patient tumors.

Spatial Distribution of OXPHOS and Distinct Cell Types in CRC Spheroids
To analyze spatial organization of metabolic states, we stained spheroids with mitochondrial live-dyes for visualization of mitochondrial membrane potential (MMP) and OXPHOS activity. Histological examination of spheroids (n = 3 cultures) revealed crypt-like structures formed by partially polarized cells around lumina, morphologically showing some degree of differentiation. Cells within individual spheroids demonstrated highly heterogeneous MMP, with MMP high cells consistently localized at outer 'budding' regions of spheroids and around crypt-like structures ( Figure S4a).
Multiplexed RNA fluorescence in situ hybridization (FISH) for intestinal cell type markers LGR5 (stem-like), DEFA5 (Paneth-like), and FABP1 (Tdiff-like) resulted in discrete staining of individual cells by either a single or none of the markers, indicating existence of distinct intestinal cell types in all three patient cultures. Cellular subtypes also showed tendencies for spatial localization. DEFA5 + cells were primarily detectable in inner regions of spheroids.
LGR5 + cells preferably localized towards outer regions. Frequently, DEFA5 + cells were identified in proximity to LGR5 + cells (Figure 4a,b). In the intestinal crypt, LGR5 + cells reside at the crypt base [27,40], and-in line with our observations-imaging of intestinal organoids has shown localization of LGR5 + cells close to Paneth cells [41].

Heterogeneous Energy Metabolism in Patient-Derived Models
To assess whether cellular subfractions with distinct OXPHOS activities can be identified in viable cells, we used an MMP dye for flow cytometry allowing distinction of cells with different mitochondrial activity based on fluorescence intensity. Indeed, heterogeneous fluorescence intensities allowed separation of populations with different MMP (MMP low , MMP high ; Figure 5a). To understand whether heterogeneous metabolic activity is relevant in patient CRC tumors, we determined OXPHOS activity of two freshly purified patient tumors and a patient tumor expanded as PDX in vivo by flow cytometry-based MMP analysis. In all samples, two cell populations with distinct MMP were identified (Figure 5a), indicating that heterogeneous mitochondrial activity also exists in PDXs and patient tumors. To correlate MMP with specific cell types, we combined mitochondrial staining and multiplexed RNA-FISH, showing DEFA5 + and FABP1 + cells to be largely excluded from MMP high regions, whereas LGR5 + cells were primarily located in MMP high regions. Matching our scRNA-seq results, quantitative image analysis in thousands of single cells revealed that the fraction of LGR5 + cells located in MMP high regions is indeed much higher compared to DEFA5 + and FABP1 + cells in all examined cultures (n = 3; Figure 4c,d, Figure S4b,c).
Hence, in situ RNA fluorescence microscopy further confirmed cell type-specific metabolic preferences of putative stem-like, Paneth-like, and Tdiff-like cell subtypes in CRC. In addition, metabolic activities of cellular subtypes are associated with specific spatial localization within spheroids.

Heterogeneous Energy Metabolism in Patient Tumors
Identification of cell type-specific metabolic preferences in patient-derived CRC cultures raised the question whether heterogeneously expressed metabolic signatures can be identified directly in CRC patient tumors. To address this, we analyzed primary tumors (n = 25 patients) and liver metastases (n = 25 patients) by immunohistochemistry for expression of LDH-A and CA9 ( Figure S5a,b) as marker genes of hypoxia/glycolysis and Tdiff signatures ( Figure S1f, Table S1).
Within the majority of examined specimens, immunohistochemical analysis revealed that only subfractions of all cells express LDH-A and CA9, indicating existence of metabolic heterogeneity within individual patient tumors. Despite high expression of the proliferation marker MKI67, previously reported to preferentially mark TA-like cells [42], regions of CA9 expression were largely overlapping with MKI67 − areas in most patient tumors, suggesting that tumor cells with expression of the hypoxia/glycolysis signature were indeed less proliferative, and actively cycling TA-like cells might prefer OXPHOS to generate energy ( Figure S5c).

Heterogeneous Energy Metabolism in Patient-Derived Models
To assess whether cellular subfractions with distinct OXPHOS activities can be identified in viable cells, we used an MMP dye for flow cytometry allowing distinction of cells with different mitochondrial activity based on fluorescence intensity. Indeed, heterogeneous fluorescence intensities allowed separation of populations with different MMP (MMP low , MMP high ; Figure 5a).
To understand whether heterogeneous metabolic activity is relevant in patient CRC tumors, we determined OXPHOS activity of two freshly purified patient tumors and a patient tumor expanded as PDX in vivo by flow cytometry-based MMP analysis. In all samples, two cell populations with distinct MMP were identified (Figure 5a), indicating that heterogeneous mitochondrial activity also exists in PDXs and patient tumors.
This finding was further supported by proteomic analysis of MMP low and MMP high populations of LGR5 + (i.e., LGR5 score > 1) spheroid cultures (P1, P4, P7, P11) which revealed differentially abundant proteins between the two populations. Interestingly, three proteins contributing to the stem-like signature (PROX1, GRN, DEFA6; Table S1) were significantly higher abundant in MMP high compared to MMP low ( Figure S5d).

Increased Spheroid and Tumor Formation Capacity in OXPHOS High Cells
scRNA-seq data suggested that subfractions of MMP low and MMP high spheroid cells preferentially harbor Tdiff-like and Paneth-like (MMP low ) or stem-like and TA-like tumor cells (MMP high ). As spheroid and tumor forming capacity is supposed to be restricted to stem-like tumor cells [43], we calculated spheroid-forming cell (SFC) frequencies in vitro and TIC frequencies in vivo by limiting dilutions of sorted MMP low and MMP high cell fractions.
SFC frequencies were strongly increased in MMP high cell fractions compared to MMP low fractions or bulk spheroid cells in four out of five cultures (Figure 5b). Spheroid cells (P1) sorted according to JC-1 aggregation-a different MMP indicator-also demonstrated increased SFC frequency in the MMP high subpopulation (MMP high : 1/26; MMP low : 1/46; Figure S5e).
As increased mitochondrial OXPHOS is linked to enhanced ROS levels [23], we further assessed the association of SFC frequency and OXPHOS by staining spheroid cells (P4) with a live-dye fluorescent upon ROS oxidation. In vitro limiting dilutions revealed substantial enrichment of SFCs in the sorted ROS high compared to the ROS low subpopulation (ROS high : 1/9; ROS low : 1/117; Figure S5f).  We then asked whether MMP high cells exhibit a growth advantage in competition with MMP low cells. Spheroid cultures (P1, P4, P5) were transduced with a lentiviral vector encoding for enhanced green fluorescent protein (EGFP) under the control of the human phosphoglycerate kinase (PGK) promoter in order to allow follow up of sorted populations by assigning presence or absence of EGFP expression to the metabolic state at the time of sort. To achieve this,~40-50% EGFP + cultures were stained for MMP (t 0 ) and cells were sorted as co-cultures of MMP high EGFP + and MMP low EGFPspheroid cells (1:1 ratio; t 1 ). After three weeks (t 2 ), despite similar relative contributions of MMP low and MMP high fractions, co-cultures were nearly completely EGFP + , indicating a growth advantage of MMP high compared to MMP low cells (Figure 5c-e).
To quantify TIC frequency in MMP high and MMP low subpopulations in vivo, spheroid cultures (n = 2) were sorted according to MMP. Descending cell numbers of each population were subcutaneously injected into NSG mice.

Discussion
We here analyzed functional CRC intra-tumor heterogeneity at single-cell level and demonstrate that distinct functional programs within individual CRC cells can be assigned to specific cellular subpopulations.
In healthy tissues including normal intestine, functional cellular heterogeneity is established by differentiation processes of stem and progenitor cell populations, which control the tissues' functionality in a demand-dependent manner [27]. Similarly, in CRC and other solid tumors as well as in hematological malignancies, functional heterogeneity of tumor and non-tumor cells in the surrounding microenvironment exists and acts as driver of tumor progression [31,45]. Although the majority of tumor cells in CRC cycles actively, we identified proliferatively inactive cells in patient-derived cultures and within established xenograft tumors in vivo-in line with recent data on the healthy intestine [46]. Nevertheless, these cells eventually can re-enter the cell cycle and exhibit TIC activity, suggesting that cells escape from a quiescent state, possibly driven by cellular plasticity as described before for CRC [47]. Accordingly, slow or non-cycling cells were suggested to exhibit increased chemoresistance and drive relapse following initial successful therapy [17,48].
This has striking parallels to the normal intestine, where ablation of stem cells under pathological conditions (e.g., irradiation) can be compensated by a reserve pool of stem cells that are rare during homeostasis but can regenerate all different cell populations including stem, progenitor, and differentiated cell types upon activation, thereby maintaining a functional intestine after tissue injury [49].
The complex composition of different subpopulations within normal and malignant intestinal epithelium and their dynamic interactions are poorly understood. Their characterization has been hampered by the dependency of experimental approaches on purifying cell populations, which cannot fully distinguish between or comprehensively capture distinct cell types and intermediates and might fail to detect rare and poorly characterized cell populations. Recent studies shed light on this complexity by utilizing single-cell approaches to detect and characterize rare cell types in the normal intestine and CRC [42,46,[49][50][51][52][53]. We here demonstrate that scRNA-seq further allows the identification of cell type-specific expression modules in CRC and enables identification of functional states during TIC differentiation based on transcriptional heterogeneity.
In line with observations in other entities, transcriptional programs across multiple CRC patients were dominated by inter-patient heterogeneity, most likely due to individual genetic and epigenetic alterations [21,25,54]. Interestingly, most patient-derived spheroid cells clustered according to primary tumor or metastasis site, suggesting either a stable effect of tumor environment on transcriptional programs or selection of tumor cells with specific expression profiles.
Gene sets most heterogeneously expressed within individual spheroids and PDOs included genes specifically expressed in distinct cell types of the normal intestinal epithelium (e.g., a gene set including LGR5 for stem-like, a gene set including KRT20 for Tdiff-like cells) [55]. This further supports the notion that, in CRC, there exist functionally distinct cell types that phenotypically reflect those of the normal intestinal epithelium [6,8]. Still, in contrast to the normal intestinal epithelium where distinct cellular subpopulations can be discriminated in high resolution by scRNA-seq technologies [55], gene expression within identified subfractions of CRC was less distinct. However, individual subfractions shared transcriptional traits, potentially reflecting continuous cell type transitions after malignant transformation comparable to reports on hematopoietic stem cell differentiation [56]. In glioblastoma, bulk RNA-sequencing of individual tumors was used to analyze transcriptional heterogeneity and identified different tumor subtypes, while scRNA-seq revealed different proportions of cell types within individual tumors underlying transcriptional heterogeneity rather than distinct homogeneous tumor subtypes [54]. This is in line with our data showing cellular diversity of cell types and cell states within individual patient tumors. Of note, four out of 12 spheroid cultures did not meet inclusion criteria for NNMF analysis due to low LGR5 scores. Accordingly, previous findings show that, while LGR5 + tumor cells can be detected in tumors from all CRC subtypes independent of their cellular composition [53], up to a third of individual CRCs tumors may lack detectable LGR5 levels [14]. Furthermore, LGR5 plasticity has recently been shown to drive CRC metastasis [57]. In the presented study, we only focused on patient-derived cultures with high expression of LGR5. Future analyses of the hierarchical organization of LGR5cultures and existing cellular subpopulations in comparison to the cellular subpopulations and cellular states described in this study could further widen the understanding of cellular heterogeneity in CRC.
Our approach to decipher transcriptional programs heterogeneously expressed in functionally distinct CRC cell subfractions identified heterogeneous gene expression programs related to cell cycle, immune response, and metabolic states like OXPHOS and glycolysis. Given the considerable functional and proliferative differences between distinct cell populations, cell-to-cell variability in energy turnover and demand appears likely. A recent study has linked decreased biosynthetic capacities to differentiation [58]. As OXPHOS can be more efficient in energy production [59], highly proliferative TA-like tumor cells might prefer OXPHOS over glycolysis to generate energy. Even though such cell type-specific metabolic identities are known from the normal intestinal epithelium [23], distinct metabolic preferences within normal and malignant stem cell systems are not uniform across different tissue types and tumor entities, and are not necessarily correlated with proliferation activity in general. For example, TICs in hepatocellular carcinoma [60], breast cancer [61], osteosarcoma [62], and nasopharyngeal carcinoma [63] rely on glycolysis for tumor formation, while TICs in pancreatic ductal adenocarcinoma [64], glioma [65], and acute myeloid leukemia [66] prefer OXPHOS. Importantly, tumor cells can also alternate between glycolysis and OXPHOS, thereby adapting to metabolic challenges [67].
Here, we were able to assign the metabolic demand of OXPHOS to functionally relevant stem-like and TA-like cells and observed substantial enrichment of self-renewing and proliferating SFCs and TICs in OXPHOS high cell subfractions. As a consequence, inhibition of OXPHOS impaired spheroid formation in vitro identifying OXPHOS as a novel druggable target in CRC. Since high OXPHOS levels were detected in stem-like and TA-like cell compartments, targeting OXPHOS as treatment strategy might eliminate the most self-renewing and proliferating cell types simultaneously.
Interestingly, stem-like tumor cells demonstrated overexpression of OXR1 and PON2, both involved in protection against ROS accumulating as co-product of OXPHOS [34]. Further studies are needed to address whether expression of OXR1 and PON2 may be involved in a mechanism by which this long-lived and thus vulnerable population of stem-like tumor cells protects itself against ROS-mediated damage.
In our proteomic analysis, proteins significantly higher abundant in the MMP high subpopulation included PROX1, one of the top markers of the stem signature and usually expressed in the enteroendocrine lineage [51]. Interestingly, PROX1 has been reported to be positively correlated with LGR5 expression in CRC [43] and linked to stem cell maintenance and metastasis [68,69]. Another protein significantly more abundant in MMP high was DEFA6, a protein expressed in normal Paneth and Paneth-like tumor cells [70]. Its moderate expression in the stem-like cell population might reflect a continuous rather than a stepwise process underlying transition from stem-like to Paneth-like cell subsets (and potentially vice versa) in CRC. While Paneth cells constitute the niche for LGR5 + cells in the small intestinal epithelium, this function is performed by REG4-expressing deep crypt secretory cells in the colon [39,71,72]. REG4 was also included in the NNMF Paneth-like signature, suggesting that both cell types might contribute to this signature.
Of note, the expression signatures identified by scRNA-seq of patient-derived CRC spheroids have shown a prognostic relevance for CRC patients comparable to previously reported subtypes linked to cancer-associated fibroblasts [31] or CMSs [29], indicating that cell types and cell states might indeed be biologically distinct and of potential clinical relevance for CRC patients.
In summary, we here show that distinct functional cell states during TIC differentiation can be identified by single-cell transcriptomes. Targeting differentiation of cancer cells and associated transcriptional states might represent a novel therapeutic strategy for human CRC.

Primary CRC Spheroids and Organoids
Human CRC samples (male and female patients) were obtained from Heidelberg University Hospital in accordance with the Declaration of Helsinki. Informed consent on tissue collection was received from each patient, as approved by the University Ethics Review Board on 19 May 2009 (323/2004) and 7 June 2013 (S-649/2012). Tumor sample processing and purification procedures were described previously [4,73,74].
Spheroid and organoid cultures were authenticated using Multiplex Cell Authentication by Multiplexion (Heidelberg, Germany) as described [80]. The SNP profiles matched known profiles or were unique. The purity of spheroid and organoid cultures was validated using the multiplex cell contamination test by Multiplexion (Heidelberg, Germany) as described recently [81]. No mycoplasma, SMRV or interspecies contamination was detected. To assure pure epithelial cell content and exclude contaminations with murine or hematopoietic cells, established cultures were tested for EPCAM, H2kd, and CD45 expression by flow cytometry.

Laboratory Animals
Male and female immunodeficient NSG mice purchased from The Jackson Laboratory (Bar Harbor, ME, USA) were further expanded in the Centralized Laboratory Animal Facilities of the DKFZ, Heidelberg. Animals were group-housed in standard individually ventilated cages with wood chip embedding (LTE E-001, ABEDD, Vienna, Austria), nesting material, autoclaved tap water and ad libitum diet (autoclaved mouse/rat housing diet 3437, Provimi Kliba, Kaiseraugst, Switzerland). Room temperature and relative humidity were adjusted to 22.0 ± 2.0 • C and 55.0 ± 10.0%, respectively, in accordance with Appendix A of the European Convention for the Protection of Vertebrate Animals used for Experimental and Other Scientific Purposes from 19 March 1986. According to FELASA recommendations, all animals were housed under strict specific pathogen-free conditions. The light/dark cycle was adjusted to 14 h lights on and 10 h lights off with the beginning of the light and dark period set at 6 am and 8 pm, respectively. The age of transplanted mice was at least seven weeks. All animal experimentation performed in this study was conducted according to national guidelines and was reviewed and confirmed by an institutional review board/ethics committee headed by the responsible animal welfare officer. The Regional Authority of Karlsruhe, Germany finally approved the animal experiments as the responsible national authority (approval numbers G228/12 (29 January 2013), G49/14 (26 June 2014), G233/15 (17 November 2015)).

scRNA-seq of Spheroids
To generate single-cell suspensions, cells were trypsinized as described. Trypsinization was enhanced by applying shear forces with a pipette every 5 min. After stopping the reaction, cells were washed twice with PBS and filtered through a 15-20 µm cell strainer (PluriSelect, Leipzig, Germany). To count and test for cell viability using an automated cell counter (Countess, ThermoFisher, Waltham, MA, USA), single-cell suspensions were stained with Hoechst and propidium iodide (ReadyProbes Cell Viability Imaging Kit, ThermoFisher, Waltham, MA, USA) for 10 min at room temperature. Only samples with at least 85% viability were used for further processing. For isolation of single cells, reverse transcription, and cDNA amplification, the Rapid Development Kit (Wafergen, Fremont, CA, USA; compare: SMARTer iCELL8 3 DE Reagent Kit, TakaraBio, Kusatsu, Japan) for inchip reverse transcription-PCR amplification with the iCELL8 system (TakaraBio, Kusatsu, Japan) [19] was used. The cell suspension was diluted to 25 cells/µL. Cells were dispensed from a 384-well source plate into a nanowell chip (SmartChip v1/v2 kit, TakaraBio, Kusatsu, Japan; P7: v2; others: v1) containing uniquely barcoded oligo-dT primers for each well, resulting in up to 30% of wells containing single cells following Poisson distribution. Wells were imaged using an automated fluorescence microscope (BX43, Olympus, Shinjuku, Japan) and image processing was performed using CellSelect (TakaraBio, Kusatsu, Japan). Additional manual curation for multiplets and dead cells was performed.

scRNA-seq of Tumors, PDXs, PDOs
To generate single-cell suspensions, cells were trypsinized as described. After stopping the reaction, cells were washed with PBS and filtered through a 40 µm cell strainer. Cells were washed, resuspended in PBS supplemented with 0.05% bovine serum albumin, and filtered through a 20 µm cell strainer. Single-cell suspensions were loaded following the Chromium Single Cell 3 Library Kit v2 (10× Genomics, Pleasanton, CA, USA) protocol to generate cell and gel bead emulsions. Reverse transcription, cDNA amplification, and sequencing library generation were performed according to manufacturer's protocol. Each library was sequenced in one lane of the NextSeq500 (Illumina, San Diego, CA, USA; high-output mode, paired-end, 26 × 49 bp).

Preprocessing and Analysis of iCELL8 Data
scRNA-seq data were preprocessed using an automated in-house workflow (Roddy; https://github.com/TheRoddyWMS/Roddy). FastQC was used to evaluate read quality. Assignment of iCELL8 library barcodes to corresponding nanowells was performed with the Je demultiplexing suite [82]. Sequences were trimmed for primer sequences, poly-A/T tails, and low-quality ends using Cutadapt with the '-nextseq-trim' option. Mapping to the reference genome hs37d5 was performed (STAR aligner). Quantification of mapped BAM files was performed using featureCounts (reference annotation gencode v19). Only scRNA-seq libraries matching the following criteria were used: (i) >100,000 reads, (ii) >1000 detected genes, (iii) <15% mitochondrial reads. Strong PCA outliers as well as libraries with top 5% of reads for every patient independently were removed. As previously published [25], expression levels based on raw read counts were quantified as with CPM i,j as the counts-per-million for gene i in sample j. Aggregate expression of each gene across all cells was calculated as with genes with Ea < 3.5 being excluded to focus on highly or intermediately expressed genes. Combined filtered and normalized data of all patients were used for evaluation of inter-patient gene expression differences. The R package Seurat [38] was used for identification of highly variable genes, PCA, clustering, two-dimensional visualization, and differential expression analysis (Wilcoxon rank sum test: adjusted p-value < 0.05; log fold-change > 0.25).
Before combining the data of all patients, relative expression levels were calculated individually for each patient using a mean-centering approach to eliminate global inter-patient gene expression shifts. PCA was applied and-for visualization-the top 30 genes with low and high scores in the first principal component were clustered using average group linkage (UPGMA) by the 'aheatmap' function from R's 'NMF' package. Gene set enrichment analysis [83] was performed on the top 300 genes with highest and lowest PC scores.
Transcriptional signatures shared across patients were identified using NNMF [24] of mean-centered data of all patients defined as LGR5 + (n = 8 patients; Table 2). Analysis was performed in MATLAB (MathWorks, Natick, MA, USA; 'nnmf') with a factor number of k = 25 and negative events set to 0. To exclude patient-specific signatures, pairwise overlaps in frequency distributions of cell scores for individual factors were determined and factors with overlaps <50% in at least five patients were excluded. Biological relevance of factors and their associated genes was analyzed manually and by gene set enrichment analysis [83]. Factors potentially driven by technical artifacts were excluded. Signature scores were defined as averaged expression of the top 200 genes per factor. To reduce redundancy for visualization, signatures showing similar enrichment and clustering patterns were combined to meta-signatures (Figure S1a-e, Table S1).
Meta-signature scores (calculated based on the combined gene lists of the comprised signatures) were clustered using complete linkage of Euclidean distances. NNMF analysis was repeated with various numbers of factors resulting in identification of similar core signatures.
To test whether cell type-specific transcriptional programs (stem-like, TA-like, Panethlike, Tdiff-like) are active in individual cells or-in other words-to differentiate between cells that belong to the four cell type-specific subpopulations, we adapted the above described cell scoring approach based on the expression of inferred NNMF metasignatures [25] and used control random gene sets as background model to control for technical confounders as library complexity. Cell type-specific transcriptional programs were defined as active if their expression in individual cells was >1 standard deviation above the mean across all cells. Inferred cell state-specific signatures were scored for cells of a particular cell type to assess the degree to which cell states are active in specific cell types.
Subsequent downstream analysis was performed with standard Seurat workflow, including log-normalization and scaling as well as PCA and clustering using the top 2000 variable genes. Datasets were visualized using two-dimensional t-distributed stochastic neighbor embedding maps [84]. The three primary tumor samples were aligned using canonical correlation analysis implemented in Seurat [85]. In brief, this method identifies pairwise correspondences between single cells across different datasets belonging to specific biological states, termed 'anchors'. These anchors are the basis of harmonizing datasets. Differentially expressed genes between identified clusters were identified using Wilcoxon rank sum test. Identified clusters were scored for cell state signatures using the 'AddModuleScore' function (Seurat), using gene signatures from NNMF analysis (Table S1).

Patient Clustering and Survival Analysis
Cell type and cell state signatures obtained from spheroid scRNA-seq data (Table S1) were evaluated in a patient survival analysis. Bulk transcriptomic data for COAD patients with available survival data were collected from TCGA (level 3 RNA-seq, n = 328 patients) [28] and log-transformed. For each TCGA patient, the mean expression of gene signatures was calculated and used to cluster bulk transcriptomes by complete linkage of Euclidean distances. Patients were grouped according to different combinations of cell type and cell state signatures. In a new clustering process, the sample space was progressively subdivided using the main signatures defining each cluster of patients: First, OXPHOS_1, G1/S, G2/M, and stem signatures separate cl2 and cl3 (high) from the rest (low; Euclidean distances). Then, hypoxia/glycolysis_1 and TNFα_2 signatures distinguish cl2 (low) from cl3 (high; Euclidean distances). Next, fatty acid and TNFα_1 signatures separate cl1 (high) from cl4, cl5, and cl6 (Euclidean distances). Subsequently, stem and TA signatures separate cl6 (stem low ) from cl4 and cl5 (stem high ; correlation). Finally, G1/S, G2/M, and OXPHOS_1 signatures also distinguish between cl4 (medium) and cl5 (low; Euclidean distances). Complete linkage of Euclidean distances was used to cluster stem high and stem low patients. Kaplan-Meier survival curves were generated using 'survival' and 'survminer' libraries in R. We performed Cox proportional hazards modeling and multivariable models with and without cell type and cell state clusters were compared by performing analysis of variance (ANOVA). 'CMScaller' [29] was used to stratify the TCGA COAD cohort. To generate the contingency table, patients that could not be assigned to a CMS (n = 19 patients) were excluded.

Genetic Labelling of Spheroids
For tracking of cells within tumors, lentiviral vector particles encoding for tetracyclineregulated (Tet-off) H2B-GFP were produced in HEK293T cells, concentrated by ultracentrifugation, and titrated on HeLa cells as described [4,5]. Patient-derived spheroid cultures (n = 7) were transduced with a multiplicity of infection of 1-20 aiming at transduction efficiencies of~40% to avoid multiple vector integrations. Within 24 h after transduction, 4 × 10 5 -1.7 × 10 6 transduced cells were transplanted under the kidney capsule of NSG mice (n = 14, 1-4 mice per spheroid culture) anesthetized by 1.75% isoflurane (Abbott, Chicago, IL, USA) in the breathing air. Mice were checked daily for tumor growth, and starting two weeks prior to tumor harvesting, doxycycline (Genaxxon, Ulm, Germany) was added to the drinking water of tumor-bearing mice to shut down H2B-GFP expression.
Mice were sacrificed, xenograft tumors were digested as described [5,73] For quantitative analysis of RNA-FISH/Mitotracker imaging data, we developed a single-cell image analysis pipeline to relate metabolic activity (Mitotracker) to intestinal subtypes (RNA-FISH). To prepare spheroid images for further analysis, we performed maximum intensity projection on each channel separately. For automated nuclei instance detection and segmentation in spheroids, a deep learning object detection and instance segmentation workflow incorporating Mask R-CNN [86] was implemented. The neural network was initialized using pretrained models trained on the 'Microsoft COCO: Common Objects in Context' dataset [87] and fine-tuned using images of nuclei acquired from various unrelated sources. Maximum intensity projections of DAPI images were used as inputs for the neural network to produce segmentation for each individual nucleus as outputs. Nuclei sizes were calculated using these segmented DAPI masks, and objects smaller than 350 pixels were filtered out and excluded from subsequent analysis.
For quantification and analysis of transcript abundance of marker mRNAs specific for stem-like (LGR5), Paneth-like (DEFA5), and Tdiff-like (FABP1) cells, maximum intensity projections of RNA-FISH channels were binarized using 'Maximum Entropy' thresholding (FIJI/ImageJ; https://imagej.nih.gov/ij/). Transcript abundance was estimated by overlaying nuclei masks on maximum projected probe channels and calculating number of pixels lying within each mask. To account for cytoplasmic fluorescence signals localized outside of nuclei masks, we expanded nuclei before quantification by morphological dilation (two iterations) as implemented in scikit-image (Python). To quantify mitochondrial abundance per cell, Mitotracker signals were quantified similarly, but binarization of fluorescence signal was based on 'Moments' thresholding (FIJI/ImageJ). We then performed k-means clustering on frequency distributions of pixel counts per cell to identify and separate cells into two distinct positive 'ON' (high expression/abundance) and negative 'OFF' (low expression/abundance) states. k = 2 was used for mRNA probes, while k = 3 was used for Mitotracker signals to better capture gradual differences between cells. Finally, the fraction of stem-like, Paneth-like, and Tdiff-like cells that are Mitotracker high at the same time was calculated by dividing the number of Mitotracker high LGR5 + , DEFA5 + , or FABP1 + cells by the total number of LGR5 + , DEFA5 + , or FABP1 + cells.

Flow Cytometry and Sorting of Metabolic Subpopulations
Spheroid cultures were dissociated into single-cell suspensions as described above. Sorted cells were collected in culture medium supplemented with 100 µg/mL streptomycin and 100 U/mL penicillin. For

Assessment of SFC Frequency
For each sorted cell population (OXPHOS low , OXPHOS high ), 48 wells with 10 cells, 24 wells with 100 cells, and 16 wells with 1000 cells per well were sorted into 96-well ultra-low attachment plates (Corning, Corning, NY, USA) containing 100 µL of culture medium (50% fresh, 50% conditioned (filtered medium of the bulk culture harvested during collection of cells)) supplemented with 100 µg/mL streptomycin and 100 U/mL penicillin per well. Fresh cytokines and medium were added every four days. Spheroid formation was analyzed 5-7 days after sorting using conventional light microscopy (Axiovert 40C, Zeiss, Oberkochen, Germany). Based on the fraction of spheroid-containing wells for each dilution, SFC frequencies were calculated using Poisson statistics and maximum likelihood (L-Calc, StemCell Technologies, Vancouver, BC, Canada). In vitro limiting dilution assays upon Mitotracker staining were performed three times for MMP low and MMP high subpop-ulations of P1 and P4, twice for P5 as well as bulk (all living, i.e., TOTO3cells) populations of P1 and P4, and once for P2 and P10.

Assessment of TIC Frequency
Mitotracker stained cells were sorted as described above, pelleted, resuspended in medium, and counted. Different cell counts were pelleted, resuspended in medium, mixed with matrigel (Corning, Corning, NY, USA), and injected subcutaneously into the flanks of immunodeficient NSG mice. For MMP low as well as MMP high fractions of P1, four mice with 10 6 cells, five mice with 10 5 cells, six mice with 10 4 cells, and six mice with 10 3 cells were transplanted. For MMP low as well as MMP high fractions of P4, three mice with 3 × 10 5 cells, four mice with 3 × 10 4 cells, 5-6 mice with 3 × 10 3 cells (six mice for MMP low , five mice for MMP high ), 5-6 mice with 3 × 10 2 cells (five mice for MMP low , six mice for MMP high ), and six mice with 3 × 10 1 cells were transplanted. The experiments were performed blindly until observable tumor development.
Mice were monitored daily for tumor formation and sacrificed when tumors reached the maximum tolerable size or when experiments were ended (P1: seven weeks; P4: five weeks after transplantation). Based on the fraction of tumor formation for each dilution, TIC frequencies were calculated using Poisson statistics and maximum likelihood (L-Calc).

Co-Cultivation Experiments
Spheroid cultures (n = 3) were transduced with a lentiviral vector encoding for EGFP under control of the human PGK promoter at multiplicities of infection of 0.5 (P1, P4) or 1 (P5), yielding transduction efficiencies of~40-50%. After expansion, cells were stained with Mitotracker and prepared for flow cytometry as described above. In addition to Mitotracker and TOTO-3, EGFP fluorescence was detected (488 nm laser; 505 LP, 525/50 filter). Cells were gated for low and high Mitotracker signal (MMP low , MMP high ) as well as for negative or positive EGFP signal (EGFP -, EGFP + ). For each culture, a set of 5 × 10 4 MMP high EGFP + and 5 × 10 4 MMP low EGFPcells as well as a set of 5 × 10 4 MMP high EGFPand 5 × 10 4 MMP low EGFP + cells were sorted simultaneously. To assess sorting efficiency, sorted samples were reanalyzed by recording 1000 living cells. Sorted cells were cultivated in 24-well ultra-low attachment plates (Corning, Corning, NY, USA). Spheroid formation and EGFP signal for each sample set were observed by fluorescence microscopy (Axiovert 200, Zeiss, Oberkochen, Germany). After 21 days in culture, cells were dissociated, stained with Mitotracker, and reanalyzed by flow cytometry as described.

Inhibitor Treatments
To assess SFC frequencies upon pretreatment with OXPHOS inhibitors, 5 × 10 5 tumor spheroid cells (P1, P4, P5) were seeded into two wells of 6-well ultra-low attachment plates (Corning, Corning, NY, USA). After seven days, 25 µM CCCP or DMSO (Sigma-Aldrich, St. Louis, MO, USA) were added and cells were incubated for 4 h at 37 • C. Cells were dissociated, stained with 200 nM TOTO-3 in PBS, and prepared for cell sorting as described. Living (i.e., TOTO-3 -) cells were sorted into 96-well ultra-low attachment plates containing 100 µL of fresh culture medium supplemented with 100 µg/mL streptomycin and 100 U/mL penicillin per well. Limiting dilution and determination of SFC frequency were performed as described.

Immunohistochemistry
Formalin-fixed and paraffin-embedded tumor specimens of primary colorectal adenocarcinomas (n = 25 patients) and liver metastases (n = 25 patients) resected between 2013 and 2016 at the University Hospital Heidelberg were extracted from the archive of the Institute of Pathology, Heidelberg University, with the support of the tissue bank of the NCT (#2831). Tissues were used in accordance with the ethical regulations of the tissue bank of the NCT defined by the local ethics committee. Diagnoses were made according to the recommendations of the World Health Organization classification of tumors of the digestive system [88].
Immunohistochemical staining was performed as previously described [89]. In brief, tissue sections were cut, pretreated with an antigen retrieval buffer, and stained for Ki-67, CA9, and LDH-A using an automatic staining device (Ventana Benchmark Ultra, Roche, Basel, Switzerland; Table S3).
Raw files were processed using MaxQuant (https://www.maxquant.org; version 1.5.1.2) [91] against the human Uniprot database (20170801_Uniprot_homo-sapiens_ canon-ical_reviewed; 20,214 entries) using the Andromeda search engine with the default search criteria: enzyme was set to trypsin/P with up to two missed cleavages. Carbamidomethylation (C) and oxidation (M)/acetylation (protein N-term) were selected as fixed and variable modifications, respectively. Protein quantification was performed using the label-free quantification algorithm of MaxQuant. On top, intensity-based absolute quantification intensities were calculated with a log-fit enabled. Identification transfer between runs via the 'matching between runs' algorithm was allowed with a match time window of 0.3 min. Peptide and protein hits were filtered at a false discovery rate of 1% with a minimal peptide length of seven amino acids. The reversed sequences of the target database were used as a decoy database. Proteins only identified by a modification site, contaminants, as well as reversed sequences were removed from the dataset.
Differential expression analysis was performed using limma moderated t statistics (R package version 3.36.3; one-sample, two-sided) [92]. Here, data was first normalized based on median label-free quantification densities per sample. Next, ratios between MMP high and MMP low cells were calculated. Significantly differentially expressed proteins were defined to show a Benjamini-Hochberg adjusted p-value < 0.05 and an absolute log 2 -fold change > 1.

Quantification and Statistical Analysis
Statistical tests and sample size used for individual experiments are described in the corresponding figure legends or methods. The threshold for statistical significance was defined as p < 0.05. Significance levels were denoted by asterisks: * p < 0.05, ** p < 0.01, **** p < 0.0001. The threshold for statistical significance in univariable and multivariable survival analyses was defined as p < 0.1.

Data and Code Availability
scRNA-seq data have been deposited at the European Genome-phenome Archive (EGA) which is hosted at the EBI and the CRG, under accession number EGAS00001004064.
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE [93] partner repository with the dataset identifier PXD018230.

Conclusions
In this study, we show at single-cell resolution that transcriptional heterogeneity identifies functional states during tumor-initiating cell differentiation in colorectal cancer. Targeting specific transcriptional states associated with cancer cell differentiation unravels novel potential vulnerabilities in human colorectal cancer.
Supplementary Materials: The following are available online at https://www.mdpi.com/2072-669 4/13/5/1097/s1, Figure S1: Identification of heterogeneous gene expression programs, Figure S2: Cell cycle and proliferative activity of human colorectal cancer cells, Figure S3: Single-cell expression of representative stem and differentiation markers in patient-derived tumor models, Figure S4: Spatial localization in spheroid cultures in situ, Figure S5: Metabolic heterogeneity in colorectal tumors and patient-derived spheroids, Figure S6: Gating strategy for sorting according to mitochondrial membrane potential (MMP) using Mitotracker, Table S1: Signatures defined by non-negative matrix factorization (NNMF) of colorectal cancer spheroid single-cell RNA-sequencing data, Table S2: Univariable and multivariable survival models, Table S3: Antibodies for immunohistochemistry.  2013)). All animal experimentation performed in this study was conducted according to national guidelines and was reviewed and confirmed by an institutional review board/ethics committee headed by the responsible animal welfare officer. The Regional Authority of Karlsruhe, Germany finally approved the animal experiments as