Connectivity Map Analysis of a Single-Cell RNA-Sequencing -Derived Transcriptional Signature of mTOR Signaling

In the connectivity map (CMap) approach to drug repositioning and development, transcriptional signature of disease is constructed by differential gene expression analysis between the diseased tissue or cells and the control. The negative correlation between the transcriptional disease signature and the transcriptional signature of the drug, or a bioactive compound, is assumed to indicate its ability to “reverse” the disease process. A major limitation of traditional CMaP analysis is the use of signatures derived from bulk disease tissues. Since the key driver pathways are most likely dysregulated in only a subset of cells, the “averaged” transcriptional signatures resulting from bulk analysis lack the resolution to effectively identify effective therapeutic agents. The use of single-cell RNA-seq (scRNA-seq) transcriptomic assay facilitates construction of disease signatures that are specific to individual cell types, but methods for using scRNA-seq data in the context of CMaP analysis are lacking. Lymphangioleiomyomatosis (LAM) mutations in TSC1 or TSC2 genes result in the activation of the mTOR complex 1 (mTORC1). The mTORC1 inhibitor Sirolimus is the only FDA-approved drug to treat LAM. Novel therapies for LAM are urgently needed as the disease recurs with discontinuation of the treatment and some patients are insensitive to the drug. We developed methods for constructing disease transcriptional signatures and CMaP analysis using scRNA-seq profiling and applied them in the analysis of scRNA-seq data of lung tissue from naïve and sirolimus-treated LAM patients. New methods successfully implicated mTORC1 inhibitors, including Sirolimus, as capable of reverting the LAM transcriptional signatures. The CMaP analysis mimicking standard bulk-tissue approach failed to detect any connection between the LAM signature and mTORC1 signaling. This indicates that the precise signature derived from scRNA-seq data using our methods is the crucial difference between the success and the failure to identify effective therapeutic treatments in CMaP analysis.


Introduction
Currently, mTORC1 inhibitor sirolimus is the only drug approved by the Food and Drug Administration (FDA) that improves pulmonary dysfunction and decelerates LAM progression in most patients [1]. However, sirolimus treatment does not lead to progressionfree survival and has a cytostatic rather than a cytocidal effect. Lung function decline resumes following drug discontinuation and thus uninterrupted drug exposure is required for prolonged benefit [1,2]. The drug cannot completely eliminate LAM cells, potentially because chronic exposure to sirolimus induces refractoriness and resistant behavior of mTORC1-hyperactive LAM cells [3]. Therefore, it is urgent to identify remission-inducing and durably effective therapeutic agents for LAM.
As an alternative to de novo drug discovery, identifying new therapeutic uses of existing drugs by leveraging large compendia of biomedical data, also known as drug repositioning, has been used as a potential tool in drug discovery and development [4][5][6]. In the connectivity map (CMap) drug repositioning [7], the transcriptional signature of disease is constructed by differential gene expression analysis between the diseased tissue or cells and the control. The negative correlation between the transcriptional disease signature and the transcriptional signature of the drug treatment is used to identify drugs capable of "reversing" the disease process to be used as potential therapeutics. For example, histone deacetylase (HDAC) inhibitor vorinostat, which is known to treat cutaneous T-cell lymphoma, has been shown to be effective in treating gastric cancer [8], and drug topiramate has been identified as a potential candidate to treat inflammatory bowel disease (IBD) by comparing gene expression signatures of IBD against drug perturbational signatures [9]. The most recent edition of the connectivity map library, generated by the integrated network-based cellular signatures (LINCS) project, catalogues transcriptional signatures of more than 20,000 drugs and uncharacterized small chemicals across 77 cell lines facilitating drug repositioning and identification of new therapeutic agents [10,11].
A major limitation of traditional CMaP analysis is the use of signatures derived from bulk disease tissues. Since the key driver pathways are most likely dysregulated in only a subset of cells, the "averaged" transcriptional signatures resulting from bulk analysis lacks the resolution to effectively "connect" disease with effective therapeutic agents. With the recent progress of next-generation sequencing technologies, single-cell RNA-seq (scRNA-seq) has emerged as a powerful tool to investigate inter-cellular heterogeneity at single-cell level. The gene expression dynamics of individual cells provides means to study complex disease mechanisms at an unprecedented resolution. The use of single-cell RNA-seq (scRNA-seq) transcriptomic assay facilitates construction of disease signatures that are specific to individual cell types. Although considerable research has been devoted to using bulk transcriptional signatures for computational drug repositioning and methods for analysis of scRNA-seq data are being developed at breath-taking speed, methodologies for connecting diseases, genes, and drugs using scRNA-seq data are lacking.
In this paper, we present the complete protocol for performing connectivity analysis using scRNA-seq data, including signatures construction and connectivity analysis with individual drug signatures as well as the whole classes of drugs with the same mechanism of action. The methods are described in the context of CMaP of LAM scRNA-seq signatures. Our analyses successfully predict therapeutic effects of currently used drugs inhibiting mTORC1 signaling. Importantly, we demonstrate that these results are contingent on use of scRNA-seq data and our methods for constructing single-cell disease signature and would not be possible by connectivity analysis of standard bulk RNA-seq disease signatures.

Overview of scRNA-Seq Connectivity Analysis
Conventional transcriptome profiling methods such as bulk RNA-seq rely on averaging molecular signals across a large population of cells. The goal of our analysis methods is to construct a transcriptional signature of disease-critical cells, which may represent only a small fraction of profiled cells, by comparing them to the matched cell type in the control non-diseased tissue. Such a disease signature factors out the cell-type to cell-type variability and facilitates identification of effective therapeutics when the standard connectivity analysis of bulk disease signatures fails.
The analytical workflow of scRNA-seq signature construction and connectivity analysis proceeds as (1) cluster analysis of disease and controls samples; (2) construction of cluster annotating signature (CAS) for each cluster in the disease sample and identification of the disease-critical cell subpopulation using the panel of disease marker genes; (3) identification of matching control cell populations in the non-diseased sample; (4) construction of disease-characterizing signature (DCS) by comparing the disease-critical cells with the matched control cells; and (5) "connecting" DCS to LINCS-L1000 chemical perturbational signatures. Here, we illustrate the methodology in the analysis in the context of the LAM scRNA-seq data. Technical details of each step are provided in the Methods and outlined in Supplementary Figure S1.

Identification of Distinct Cell Populations
scRNA-seq data were generated using 10× Chromium platform on dissociated lungs from one naïve LAM patient (LAM1), one sirolimus treated LAM patient (LAM2), and one normal patient (WT), respectively, and have been previously described and analyzed [12]. In total, 19,384 cells (7244 cells from LAM1, 6545 cells from LAM2, and 5595 cells from WT) passed quality control filters, with an average number of detected genes (UMI > 0) of 2089, 2466, and 1564 per cell in LAM1, LAM2, and WT, respectively (Supplementary Figure S2). The analytical workflow outlined above was carried out for LAM1 and LAM2 samples separately. Cluster analysis identified 19 clusters in each of the samples.

Construction of Cluster Annotating Signatures
To construct cluster annotating signatures (CAS), pairwise comparisons for each cluster ( Figure 1A) were conducted and then combined into a single cluster-specific signature (Methods; Supplementary Figure S1). The top most significantly (FDR < 0.05) up-regulated genes, namely, cluster annotating signature (CAS), were then used to annotate cell clusters. This step was iterated for each cluster separately, and lists of all cluster annotating genes are provided in Supplemental Tables.

Construction of Disease Characterizing Signature
Disease characterizing signature of LAM1 was constructed by comparing LAM1cluster16 with the transcriptionally analogous WT clusters. The analysis of overlaps between the LAM1cluster16 CAS and CASes of all WT clusters identified cells in WT clusters 9 and 12 (Figure 2A,B) as being the most similar to the LAM cells in LAM1cluster16. Single- To identify disease-critical cell sub-population, we utilized a set of eight marker genes identified as the markers of LAM from the literature ( Figure 1B); Supplementary Table S1). All the markers were exclusively highly expressed in cluster 16 of LAM1 ( Figure 1B), and this cluster was the only one whose signature was enriched for expression of the marker genes ( Figure 1C; Supplementary Table S2), indicating that the cluster (herein denoted as LAM1 cluster16 ) consists of LAM cells.
To further characterize cells in different clusters, we performed enrichment analysis [13] of the top 200 most significantly up-regulated genes from each cluster for cell type marker from three databases: human cell landscape (HCL) [14], cellMarker (CM) [15], and PanglaoDB (PDB) [16], and the tissue markers derived from the gene atlas dataset [17]. Top three most significantly (FDR < 0.05) enriched tissue and cell-type categories with log odds ratio above 1.5 from each cluster were selected for each cluster. Associations between the clusters and cell and tissue type are summarized in the Supplementary Figure S3. The analysis identified clusters of epithelial, endothelial, and immune cells. The LAM1 cluster16 cells were also enriched for markers of mesenchymal and uterus cells signatures (Supplementary Figure S3). The list of all enriched pathways is provided in the Supplemental Tables.

Construction of Disease Characterizing Signature
Disease characterizing signature of LAM1 was constructed by comparing LAM1 cluster16 with the transcriptionally analogous WT clusters. The analysis of overlaps between the LAM1 cluster16 CAS and CASes of all WT clusters identified cells in WT clusters 9 and 12 ( Figure 2A,B) as being the most similar to the LAM cells in LAM1 cluster16 . Single-cell disease characterizing signature (DCS) of LAM was then constructed by differential gene expression analysis between cells in LAM1 cluster16 and cells in WT clusters 9 and 12. For comparison, we also constructed pseudo-bulk signature of LAM1 by differential expression between all LAM1 cells and all WT cells (Methods). This signature mimics the signature that would be obtained by the bulk RNA-seq analysis.  The pathway analysis of the LAM single-cell DCS against GO [18], KEGG [19], and MSigDB (Hallmark) [20] gene sets with clusterProfiler [21] implicated MTORC1 signaling The pathway analysis of the LAM single-cell DCS against GO [18], KEGG [19], and MSigDB (Hallmark) [20] gene sets with clusterProfiler [21] implicated MTORC1 signaling hallmark gene sets as being enriched in the DCS ( Figure 2C), along with gene sets and pathways associated with cell proliferation, invasion, and metastasis. Although most of these signaling pathways are known features of LAM, identifying their activity within the LAM cell populations based on a transcriptional signature is not a trivial task. The analysis of the pseudo-bulk LAM signature does not reveal increased MTOR signaling (Supplementary Figure S4A), demonstrating the critical increase in precision of our DCS in comparison to a typical signature constructed from bulk tissue profiling.

Connectivity Analysis
We developed methods for CMaP analysis of DCS's with the aim of identifying the Mechanism of Action (MOA) of bioactive compounds capable of reversing the LAM. The single-cell DCS is first correlated with correlated with 143,374 LINCS signatures in response of 15,349 chemical perturbagens (CP) ( Figure 3A). The enrichment of signature with high negative correlations among CPs with a specific MOA was then assessed using small-sample bias corrected logistic regression (technical details in Methods). This demonstrates the importance of carefully constructing single-cell DCS for the successful connectivity analysis. In terms of individual compounds, we found sirolimus, AZD-8055, OSI-027, and WYE-125132 showing consistently strong negative correlation across all the dosages with LAM1 DCS (Supplementary Figure S5A). Cyclin-dependent kinase inhibitors (CKI) [22,23] CGP-60474, PHA-793887, alvocidib (CDK1/2 inhibitors), and palbociclib (CDK4/6 inhibitor) also showed strong negative correlation with LAM1 single-cell DCS across different concentrations and cell lines (Supplementary Figure S5A).
Another class of compounds implicated by the connectivity and functional enrichment analysis of LAM1 DCS were MEK/MAP kinase/protein kinase inhibitors. Estrogeninduced activation of MAPK signaling was associated with enhanced cell proliferation [24] and survival of LAM cells [25]. Estrogen increased the expression of oncogene c-MYC, which plays a critical role in cell cycle progression by suppressing p21 Cip1 expression [26], in LAM cells ( Figure 3C) and might have induced MAPK signal transduction pathways In the CMaP analysis of LAM1 DCS, most enriched MOA categories included MTOR inhibitors, dual inhibition of PI3K/MTOR, and CDK inhibitors ( Figure 3B,C, and Supplementary Table S3). Given the known etiology of LAM, and the use of the sirolimus MTOR inhibitor in the treatment of LAM, ability of MTOR inhibitors to reverse the LAM was expected and also in line with the functional analysis results from the previous section. However, the same connectivity analysis repeated on the pseudo-bulk LAM signature failed to identify MTOR inhibitors as putative therapeutics (Supplementary Table S5). This demonstrates the importance of carefully constructing single-cell DCS for the successful connectivity analysis.
Another class of compounds implicated by the connectivity and functional enrichment analysis of LAM1 DCS were MEK/MAP kinase/protein kinase inhibitors. Estrogeninduced activation of MAPK signaling was associated with enhanced cell proliferation [24] and survival of LAM cells [25]. Estrogen increased the expression of oncogene c-MYC, which plays a critical role in cell cycle progression by suppressing p21 Cip1 expression [26], in LAM cells ( Figure 3C) and might have induced MAPK signal transduction pathways [24,27]. Moreover, inhibition of mTORC1 is known to activate MAPK signaling cascade [28]. The combination therapies of concurrent inhibition of mTORC1 and MAPK are currently being investigated [29].

Signature Construction and Connectivity Analysis of Sirolimus Treated LAM
Similar to naïve LAM, we repeated the analytical workflow for sirolimus-treated LAM sample (LAM2). The clustering algorithm identified 19 clusters in LAM2 ( Figure  4A), and we used LAM marker genes to identify LAM cells in LAM2. However, unlike LAM1, the expression of LAM markers was not localized in any particular cluster, and cells expression were dispersed in all clusters making it impossible to identify a single LAM cluster (Supplementary Table S2). We hypothesized that the frequency of the LAM cells in LAM2 sample may be too low for them to be identified in the unsupervised fashion. As an alternative strategy, we combined LAM1 and LAM2 cells and re-clustered them.  Table S2). We hypothesized that the frequency of the LAM cells in LAM2 sample may be too low for them to be identified in the unsupervised fashion. As an alternative strategy, we combined LAM1 and LAM2 cells and re-clustered them. A total of 13,789 cells from LAM1 and LAM2 were combined using Seurat's [13] im- A total of 13,789 cells from LAM1 and LAM2 were combined using Seurat's [13] implementation of multiple dataset integration and 18 clusters were detected ( Figure 4B). Briefly, the data were normalized and variable genes were identified separately for the two datasets. The "anchoring" pairs of cells in both datasets were identified by joint k-nearest neighbor analysis based on the cannonical correlation, and k = 5 and batch normalization was performed as described in the Seurat paper. Majority of the markers were highly expressed in both LAM1 and LAM2 part of cluster 16, which was further supported by the enrichment of LAM markers in the joint cluster (Supplementary Table S2). All the 57 cells from LAM1 cluster16 were also present in the joint cluster 16. The 12 LAM2 cells in the joint cluster 16 were assumed to represent LAM cells in the LAM2 samples and were denoted as LAM2 joint-cluster16 . Please note that the fact that LAM cells clusters, the LAM1 sample (Figure 1), and combined clustering ( Figure 4B) were both labeled as cluster 16 was purely coincidental.
Cluster annotating signatures (significant genes listed in the Supplemental Tables) of the joint clusters showed similar cell and tissue types as in LAM1 analysis (Supplementary Figure S6). Cluster annotating signatures were further used to find the WT clusters akin to LAM2 joint-cluster16 . Similar to LAM1 cluster16 , WT cluster 9 and 12 had maximum number of overlapping genes with LAM2 joint-cluster16 ( Figure 5A,B). The single-cell DCS of LAM2 cells was constructed by differential gene expression analysis between cells in LAM2 joint-cluster16 and the WT clusters 9 and 12. The pathway analysis of the LAM2 DCS identified pathways associated with the regulation of cell-cell adhesion, response to interferongamma, and tumor necrosis factor, but not MTOR signaling ( Figure 5C). The list of all enriched pathways is provided in the Supplemental Tables.  Connectivity analysis of LAM2 DCS ( Figure 6A) revealed several MOA categories, including single-agent proteasome inhibitors, dual inhibition of NF-κB pathway/proteasome inhibitors, and HSP inhibitors. (Figure 6B, Supplementary Table S4). Mutation of TSC2 and its leading activation of MTORC1 upregulates the proteasome [30], which may facilitate estrogen-enhanced survival of tumor cells [31,32]. MTOR also Connectivity analysis of LAM2 DCS ( Figure 6A) revealed several MOA categories, including single-agent proteasome inhibitors, dual inhibition of NF-κB pathway/proteasome inhibitors, and HSP inhibitors. (Figure 6B, Supplementary Table S4). Mutation of TSC2 and its leading activation of MTORC1 upregulates the proteasome [30], which may facilitate estrogen-enhanced survival of tumor cells [31,32]. MTOR also activates NF-κB [33], a major regulator of cell survival, pro-inflammatory cytokines such as TNF-α, and cell adhesion molecules which may allow LAM cells to survive [34,35]. We also found response to interferon gamma and cell adhesion molecules in the functional enrichment of LAM2 DCS ( Figure 6C), which might activate NF-κB and support the anti-apoptotic behavior of the LAM cells. Proteasome inhibitor, which inhibits NF-κB activation, has been found to reduce estrogen mediated survival of TSC2-null cells in LAM [32] and was one of the top hits in our connectivity analysis with LAM2 DCS. Signatures of tyrosine kinase and cyclooxygenase inhibitor drugs were also implicated ( Figure 6B,C). Interestingly, several drugs related to this MOA, such as multi-kinase inhibitor imatinib, Src inhibitor Saracatinib, and Cyclooxygenase inhibitor Celecoxib, are being currently tested in clinical trials as LAM therapeutics, confirming the relevance of the connectivity analysis results. We also found MTOR inhibitors to be one of the most enriched MOA categories although with relatively low strength of association (odds ratios) ( Figure 6B).

Discussion
The connectivity analysis leveraging large databases of transcriptional perturbation signatures such as LINCS-L1000 along with the open accessibility to processed transcriptomics data [36,37] and signatures [38,39] enables in silico discovery of novel therapeutics. However, disease-related biological processes and resulting transcriptional dysregulation are not uniform across all cell types within the diseased tissues. The differences in expression profiles between cells of different types usually dwarf the differences between diseased and non-diseased cells of the same type. Consequently, the cell-averaging in the traditional bulk assays can produce disease transcriptional signatures of no relevance for finding putative therapeutics via connectivity analysis. This has been clearly

Discussion
The connectivity analysis leveraging large databases of transcriptional perturbation signatures such as LINCS-L1000 along with the open accessibility to processed transcriptomics data [36,37] and signatures [38,39] enables in silico discovery of novel therapeutics. However, disease-related biological processes and resulting transcriptional dysregulation are not uniform across all cell types within the diseased tissues. The differences in expression profiles between cells of different types usually dwarf the differences between diseased and non-diseased cells of the same type. Consequently, the cell-averaging in the traditional bulk assays can produce disease transcriptional signatures of no relevance for finding putative therapeutics via connectivity analysis. This has been clearly demonstrated in our analysis of LAM data. Our methodology for constructing and CMaP analysis of scRNA-seq signatures effectively circumvented this fundamental limitation.
The functional and CMaP analysis of LAM scRNA-seq signature firmly establishes the dysregulation of mTORC1 signaling as the primary target for therapeutic intervention and recapitulates known disease mechanism of LAM. At the same time, the analysis of corresponding pseudo-bulk signatures completely fails to establish the connection. To the best of our knowledge, this is the first analysis that describes and clearly demonstrates the importance of single-cell transcriptional signature based CMaP analysis. It is quite possible that a bulk analysis of more samples with different disease stages would have also identified the signature of mTOR signaling. Furthermore, our results are based on relatively few cells in a single sample of naïve LAM lung tissue, and analyses of additional samples will be necessary to establish the robustness of the results across different patients. However, the results still illustrate the power of scRNA-seq in constructing cell-type and patient-specific signatures and using them to search for promising therapies via CMaP analysis. Each of the steps in our analysis can be accomplished using a wide range of statistical methods, and breath-taking pace of methods development makes it difficult to choose the optimal methods [40,41]. For example, for the critical step of identifying the very small clusters of LAM cells we used graph-based Louvain-Jaccard clustering algorithm [42,43]. The fact that we were able to detect the LAM cell populations validates this choice. At the same time, using a different cluster analysis strategy and the parameters that determine the number of clusters [44,45], or even an ensemble of clustering results using different methodologies [46], may yield better results in another context. Similar considerations can lead to different choice of methodology for differential gene expression analysis [47], connectivity metrics, and MOA enrichment analysis of negatively correlated perturbation signatures. The rigorous benchmarking of different choices at each step would be difficult, and it is beyond the scope of this paper. We demonstrate that our methodology is conceptually sound and that choices we made are reasonable because they lead to positive results in a context of a very difficult problem where standard CMaP analysis methodologies would fail.
The analysis of LAM data here serves as proof of concept for general approach for pleiotropic effect of dysregulation of the mTOR signaling pathway in various human disorders [48][49][50] and aging itself [48,51,52]. mTOR inhibitors are currently the only pharmacological treatment shown to extend lifespan in model organisms [53][54][55], and mTOR signaling has been directly implicated in age-associated disorders such as Alzheimer's disease [56]. Numerous inhibitors of mTOR signaling have been developed, and new drugs that modulate activity of mTOR signaling are under development [57]. Our results illustrate how CMaP analysis of disease scRNA-seq data can accurately detect mTOR dysregulation and predict mTOR signaling inhibitors as effective drugs when the classical CMaP of bulk tissues fails. The scRNA-seq data used in our analysis was previously described and analyzed by Guo et al. [12], and our pathway analysis results of naïve LAM signatures are consistent with results presented in that paper. Unlike Guo et al., we were also able to identify a small set of cells expressing known LAM markers in the sirolimus treated LAM sample. However, the most important contribution of our study is the CMaP of the LAM signatures.
In addition to mTORC1 inhibitors, our analysis also identified additional classes of drugs, as well as specific drugs, capable of reverting the LAM signature, such as antiproliferative CDK inhibitors, and MEK/MAPK inhibitors, which might induce cytotoxicity against the LAM cells. In this computational study, we make no attempts to experimentally validate any of these predictions and they remain speculative, although some of already been established and studies are under way to test effectiveness.

Single-Cell RNA-Seq and LINCS-L1000 Data
Single-cell RNA-seq (scRNA-seq) was performed on dissociated lung tissue samples that were collected from three different sources including an untreated LAM patient (LAM1); patient treated with sirolimus (LAM2); and a brain-dead, beating-heart, organ donor control patient (WT). Both LAM patients were undergoing lung transplantation. Single-cell suspensions of the two explanted LAM lungs and the normal lung were subjected to 10× Chromium scRNA-seq. CellRanger pipeline was used for read alignment and quantification. Raw gene counts data used in this analysis have been previously described and submitted to GEO [12] (GSE135851). LAM1 data correspond to the sample GSM4035465, LAM2 data correspond to sample GSM4035466, and WT sample corresponds to sample GSM4035472.
For connectivity analysis, we utilized LINCS-L1000 database, which is comprised of an extensive library of over a million gene expression profiles [11]. L1000 assay, a low-cost high-throughput technology developed by the Broad Institute, measures the expression of 978 landmark genes. The gene expression profiles were generated in response to a wide range of perturbing agents, including~20,000 small molecule compounds in more than 100 human cell lines and cell types for a total of 473,647 signatures [10]. We considered 143,374 chemical perturbation signatures available via iLINCS [38] that were constructed by merging level-4 L1000 signature replicates into level-5 moderated Z-scores and only the reproducible signatures were retained.

Single-Cell RNA-Seq Data Pre-Processing and Clustering
For scRNA-seq data, we filtered low-quality cells that were expressed (unique molecular identifies (UMI) > 0) in less than 500 genes and had more than 10% mitochondrial UMI counts. Initial data pre-processing, normalization, and clustering were performed using Seurat3 [13] for LAM1, LAM2, and WT samples individually. Data were normalized by the global-scaling normalization method ("LogNormalize"), and top 2000 genes with highest standardized variance (method = "vst") were selected for principal component (PC) analysis. For clustering, shared nearest-neighbor (SNN) graph was constructed with top 30 PCs with highest variances and Louvain algorithm for community detection [42] and resolution parameter of 0.8 was used for clustering of cells within each sample. For integrated clustering of LAM1 and LAM2, both samples were merged using "IntegrateData" based on the anchors from "FindIntegrationAnchors" object with default parameters in Seurat3. Resolution parameter was set to 0.4 for cell clustering in the integrated LAM.

Construction of Cluster Annotating and Disease Characterizing Signatures
We employed a two-step strategy to annotate cell clusters and construct disease characterizing signature. In step 1, pairwise differential expression (DE) of each cluster was computed using MAST [47] Bioconductor package, which generated n t − 1 DE for each cluster (Supplementary Figure S1A), where n t is the number of clusters in sample t. For each pairwise comparison, we calculated π-score [58] by multiplying log2 fold change (LFC) and negative logarithm of p-values (corrected for multiple testing using Benjamini-Hochberg (BH) method [59]). This can be written as π irc = ϕ irc ·(−log 10 P irc ), where ϕ irc, and P irc are LFC and p-values for i th gene, r th comparison, and c th cluster, respectively. A positive π score indicates an up-regulation of a gene, whereas a negative score means down-regulation. A one-sided one sample Student's t-test was carried out to combine the n t − 1 DEs into a cluster specific signature under the following hypotheses: H 0 : µ π ic = µ 0 vs.H 1 : µ π ic > µ 0 , where µ π ic is the mean π score for gene i and cluster c. The null value was considered as 2 based on the cutoff of a gene being called differentially upregulated with pre-specified LFC of 1 and p-value of 0.01. p-values from t-test were further corrected for multiple testing using Benjamini-Hochberg method [59]. Top 200 most significantly (FDR < 0.05) up-regulated genes were considered for cell-type/tissue enrichment via CLEAN [60]. The cluster of disease-critical LAM cells was identified as the one most enriched for 8 LAM marker genes.
In step 2, LAM-specific cell cluster (LAM cluster16 ) was matched with WT clusters in terms of top 200 differentially upregulated (DU) genes (Supplementary Figure S1A). Similarities between LAM and WT clusters based on the number of overlapping genes were determined using complete linkage-based hierarchical clustering with Euclidean distance measure. Significance of the overlaps among LAM and WT clusters was assessed via Fisher's exact test. Finally, disease characterizing signature of both LAM1 and LAM2 was constructed by comparing LAM1 cells and LAM2 cells from LAM cluster16 with the matched WT clusters separately. Pseudo-bulk signatures for LAM1 and LAM2 were constructed by comparing all the LAM1 cells with WT cells and LAM2 cells with the WT cells respectively using MAST [47] Bioconductor package.

Connectivity Analysis
LINCS-L1000 chemical perturbational (CP) signatures were considered for connectivity analysis. We selected 250 most significantly (FDR < 0.05) differentially expressed (125 up-regulated and 125 down-regulated) genes from the LAM characterizing signature and matched them with the 978 L1000 landmark genes. Let Q i be the LAM signature and L ij be the LINCS-CP signatures, where i is the set of matched genes and j is the set of LINCS CP signatures. Pearson correlation Cor j Q, L j was computed between LAM and each of the LINCS CP signatures (Supplementary Figure S1B) to assess the strength of relationship between the signatures. Negative correlation p-values were calculated for each signature correlation. A total of 86,538 LINCS CP signatures associated with 1005 unique mechanism of action (MOA) categories corresponding to the small molecules/drugs were considered for further MOA enrichment.
Let M be a binary variable, where m k = 1, f or the k th MOA category 0, f or all other categories (1) Here, k = 1, 2, . . . . . . , 1005. Inspired by the LRpath method [61], we then fitted a small sample bias corrected binary logistic regression model [62] for M: logit(Pr(M k = 1)) = X T k β, where negative logarithm of down-regulated p-values of correlation between LAM and LINCS-CP signatures is the predictor variable (Supplementary Figure S1B). β > 0 indicates that the signatures of the drugs for a specific MOA are "connected" with the disease signatures.  Data Availability Statement: All data used has been previously described and is publicly available.

Conflicts of Interest:
The authors declare no conflict of interest.