Pan-Cancer Analysis Identifies Tumor Cell Surface Targets for CAR-T Cell Therapies and Antibody Drug Conjugates

Simple Summary: Identification of tumor cell surface targets is vital for chimeric antigen receptor-T cell (CAR-T) therapies and antibody drug conjugates (ADCs). This study used the Cancer Genome Atlas (TCGA) database to perform a series of conditional screenings for tumor-specific surface proteins. We found a tumor tissue-specific gene set associated with the survival of cancer patients. Furthermore, these tumor-specific surface proteins may give tumor cells the ability to metastasize. Correlation analysis revealed that these overexpressed membrane proteins were positively correlated with each other, which suggests that they may be potential dual-drug targets. Our findings reveal the significance of tumor cell surface targets in CAR-T- and ADC-related drug development.
Abstract: Tumor cells can be recognized by immune cells and antibodies through tumor surface antigens, which can therefore be used as drug targets for chimeric antigen receptor-T (CAR-T) therapies and antibody drug conjugates (ADCs). In this study, we aimed to identify novel tumor-specific antigens as targets for more effective and safer CAR-T cell therapies and ADCs. We performed differential expression analysis of pan-cancer data obtained from the Cancer Genome Atlas (TCGA), and then performed a series of conditional screenings, including Cox regression analysis, Pearson correlation analysis, and risk-score calculation, to find tumor-specific cell membrane genes. A tumor tissue-specific and highly expressed gene set containing 3919 genes from 17 cancer types was obtained. Moreover, the prognostic roles of these genes and the functions of these highly expressed membrane proteins were assessed. Notably, 427, 584, 431 and 578 genes were identified as risk factors for LIHC, KIRC, UCEC, and KIRP, respectively. Functional enrichment analysis indicated that these tumor-specific surface proteins might give tumor cells the ability to invade and metastasize. Furthermore, correlation analysis showed that most overexpressed membrane proteins were positively correlated with each other. In addition, 371 target membrane protein-coding genes were selected by excluding proteins expressed in normal tissues. Apart from identifying genes that are well validated in the literature, such as GPC3, MSLN and EGFR, we further confirmed the differential protein expression of 23 proteins: ADD2, DEF6, DOK3, ENO2, FMNL1, MICALL2, PARVG, PSTPIP1, FERMT1, PLEK2, CD109, GNG4, MAPT, OSBPL3, PLXNA1, ROBO1, SLC16A3, SLC26A6, SRGAP2, and TMEM65 in four types of tumors. In summary, our findings reveal novel tumor-specific antigens, which could potentially be used for next-generation CAR-T cell therapies and ADC discovery.


Introduction
Cancer is the second leading cause of death worldwide, with 9.96 million deaths every year [1]. Although tumor cells can be recognized and eliminated by immune surveillance, the immune system may lose efficacy because of the immunosuppressive environment that tumor cells induce through altered gene expression, for example as a result of mutations and copy number variations [2,3]. The specific proteins encoded by these altered genes and expressed on the surface of tumor cells can help the tumor cells evade immune clearance [4,5]. On the other hand, these tumor-specific membrane proteins can be used as drug targets for cancer immunotherapy [6,7], which reactivates the anti-tumor immune response to eradicate tumors.
Cancer immunotherapies such as immune checkpoint blocking therapy (ICB) have now been introduced into multiple lines of cancer treatment with great success [8].
Recently, two targeted immune-related therapeutic approaches based on tumor surface antigens (TSAs) have emerged in the cancer immunotherapy research field. The first approach uses antibody drug conjugates (ADCs), a drug class in which 3-8 molecules of a potent cytotoxic agent are attached to a monoclonal antibody that targets specific TSAs [9]. The second approach is chimeric antigen receptor-T (CAR-T) cell therapy [10]. During the T cell engineering process, the expanded T lymphocytes are modified to recognize specific tumor-associated antigens and then transferred back into the cancer patients to eradicate the tumor cells [11]. Current CAR-T therapies mainly focus on CD19 [12], CD20 [13], BCMA [14], MUC1 [15], GD2 [16], CSPG4 [17], HER2 [18], EGFR [19], FAP [20], etc., and have achieved great success in pre-clinical assays or clinical applications. However, there are still many limitations in current TSA-based immunotherapies for tumors, particularly solid tumors [21]. These limitations are mostly due to the lack of specific TSAs (unlike hematologic malignancies, which have specific and well-validated TSAs) [22,23] or to heterogeneous TSA expression in solid tumors [24,25]. Another major challenge of TSA-based immunotherapies is "on-target, off-tumor toxicity" [26]. Thus, it is crucial to identify specific TSAs that are abundantly expressed in tumor cells and expressed at low levels or not at all in normal tissue cells, to limit potential toxic and adverse effects.
Here, we exploited the publicly available Cancer Genome Atlas (TCGA) databases to identify specific TSAs in various cancer types. Firstly, we identified highly expressed genes present on the tumor cell surface. Secondly, we performed a survival analysis to select genes that were significantly associated with survival outcomes in cancer patients. In addition, we analyzed the function and correlation of these genes and the prognostic value of the correlated genes with survival rates of cancer patients. To sum up, our work identified specific TSAs that might serve as useful targets for CAR-T cell therapies, ADCs, or co-targeting strategies for the treatment of solid tumors.
Table 1 for detailed sample numbers). Human membrane protein information was obtained from two databases: Membranome (https://membranome.org/ (accessed on 1 September 2020)) [27,28] and Uniprot (https://www.uniprot.org/ (accessed on 1 September 2020)) [29]. Human immune cell biomarkers were obtained from the CellMarker database, and the membrane proteins on the immune cells were excluded [30]. The RNA expression data of 54 human normal tissues were obtained from the GTEx (Genotype-Tissue Expression) project data set (V8 release) [31]. The protein expression abundance data of a total of 7 types of cancer tissues were obtained from the CPTAC database. The detailed cancer types and sample numbers are shown in Table 1.

TCGA
A rank-sum test was used to analyze the differential expression of membrane protein-coding genes between tumor and normal samples. The threshold for differential expression was a Log2FoldChange (Log2FC) greater than 1.00 and an adjusted p value less than 0.01.
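The screening rule above can be illustrated with a minimal Python sketch. Several details are assumptions that the text does not specify: the Benjamini-Hochberg procedure is used for p value adjustment, the fold change is computed from group means with a pseudocount of 1, and the function names `bh_adjust` and `differential_expression` are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values (assumed adjustment method)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p value downwards.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out


def differential_expression(tumor, normal, gene_ids,
                            lfc_cut=1.0, padj_cut=0.01, pseudo=1.0):
    """Return genes with Log2FC > lfc_cut and adjusted p < padj_cut.

    tumor, normal: arrays of shape (genes, samples).
    The pseudocount and mean-based fold change are illustrative choices.
    """
    tumor = np.asarray(tumor, dtype=float)
    normal = np.asarray(normal, dtype=float)
    # Wilcoxon rank-sum (Mann-Whitney U) test per gene.
    pvals = np.array([
        mannwhitneyu(t, n, alternative="two-sided").pvalue
        for t, n in zip(tumor, normal)
    ])
    log2fc = np.log2((tumor.mean(axis=1) + pseudo) /
                     (normal.mean(axis=1) + pseudo))
    padj = bh_adjust(pvals)
    return [g for g, l, p in zip(gene_ids, log2fc, padj)
            if l > lfc_cut and p < padj_cut]
```

Each row of `tumor` and `normal` holds one gene, so the function can be applied once per cancer type.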

CPTAC
To account for missing values of protein abundance in the CPTAC data, we first deleted genes whose expression was missing in more than 10% of the samples of a tumor (i.e., if there are 100 samples for tumor A, and the expression of gene 1 is missing in more than 10 samples, we deleted the gene), and then imputed the remaining missing values with the K-nearest neighbor method (k = 10). Finally, the relative normalized protein expression abundance profiles of 7 cancers were obtained. Since the protein abundance levels were normalized and log transformed, the difference in expression abundance was calculated as the abundance in the tumor tissue minus the abundance in the normal tissue, and its significance was assessed with the rank-sum test.
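A minimal sketch of the two preprocessing steps is given below. It assumes that neighbors are other genes, that distance is the root mean squared difference over samples observed in both genes, and that the imputed value is the unweighted mean of the nearest genes; the text does not state these details, and the function names are hypothetical.

```python
import numpy as np


def drop_sparse_genes(abundance, max_missing=0.10):
    """Remove genes (rows) missing in more than max_missing of samples."""
    x = np.asarray(abundance, dtype=float)
    keep = np.isnan(x).mean(axis=1) <= max_missing
    return x[keep], keep


def knn_impute(abundance, k=10):
    """Fill NaNs with the mean of the k nearest genes (assumed variant)."""
    x = np.asarray(abundance, dtype=float)
    filled = x.copy()
    for g in np.flatnonzero(np.isnan(x).any(axis=1)):
        # Samples observed in both gene g and each candidate gene.
        shared = ~np.isnan(x[g]) & ~np.isnan(x)
        diff = np.where(shared, x - x[g], 0.0)
        n_shared = shared.sum(axis=1)
        dist = np.full(x.shape[0], np.inf)
        ok = n_shared > 0
        dist[ok] = np.sqrt((diff[ok] ** 2).sum(axis=1) / n_shared[ok])
        dist[g] = np.inf  # a gene is never its own neighbor
        for s in np.flatnonzero(np.isnan(x[g])):
            cand = np.flatnonzero(~np.isnan(x[:, s]) & np.isfinite(dist))
            nearest = cand[np.argsort(dist[cand])[:k]]
            if nearest.size:
                filled[g, s] = x[nearest, s].mean()
    return filled
```

Because the abundances are already log transformed, the tumor-minus-normal difference of the imputed values corresponds to a log fold change.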

Enrichment Analysis
A hypergeometric test was used to analyze the enrichment relationship between high-expression and high-risk membrane protein-coding genes in the tumor tissues according to the ten cancer hallmarks [32].
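As an illustration, an upper-tail hypergeometric p value for one hallmark can be computed as follows. This is a sketch under the assumption of a one-sided over-representation test; the function name and argument names are hypothetical, and the choice of background gene set is not specified in the text.

```python
from scipy.stats import hypergeom


def hallmark_enrichment_p(n_background, n_hallmark, n_selected, n_overlap):
    """P(overlap >= n_overlap) when drawing n_selected genes at random.

    n_background: size of the gene universe (assumed choice)
    n_hallmark:   genes in the universe annotated to the hallmark
    n_selected:   high-expression, high-risk genes being tested
    n_overlap:    selected genes that carry the hallmark annotation
    """
    # sf(k - 1) gives the probability of observing k or more successes.
    return hypergeom.sf(n_overlap - 1, n_background, n_hallmark, n_selected)
```

For example, `hallmark_enrichment_p(10, 5, 5, 5)` equals 1/252, the probability of drawing all five annotated genes in five draws from ten.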

Log-Rank Test
The tumor patients were divided into a high-expression group and a low-expression group by the mean value of membrane protein expression. A log-rank test was used to compare the survival rates of these two groups of patients using the R package "Survfit" [33].
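The grouping and test described above can be sketched in Python as follows. This is an illustrative two-group log-rank implementation, not the R code used in the study; the function names are hypothetical, and samples exactly at the mean are assigned to the low-expression group by assumption.

```python
import numpy as np
from scipy.stats import chi2


def split_by_mean(expression):
    """True for patients whose expression is above the cohort mean."""
    expression = np.asarray(expression, dtype=float)
    return expression > expression.mean()


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p value).

    time:  follow-up times
    event: True if death was observed, False if censored
    group: True for the high-expression group
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    obs_minus_exp = 0.0
    var = 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & group).sum()
        died = event & (time == t)
        d = died.sum()
        d1 = (died & group).sum()
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of deaths in the high group.
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = obs_minus_exp ** 2 / var
    return stat, chi2.sf(stat, df=1)
```

A call such as `logrank_test(time, event, split_by_mean(expr))` then compares the two expression groups for one gene.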

Multivariate/Univariate Cox Regression
Multivariate Cox regression was used to analyze the impact of pairwise combinations of membrane proteins on the survival rates of the cancer patients. The hazard ratio (HR) of each membrane protein was estimated by univariate Cox regression from the expression level of the protein in each sample and the prognostic information of the patient. Genes with HR > 1.00 were considered poor prognostic factors for the cancer patients.
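To make the univariate step concrete, the following sketch fits a single-covariate Cox model by Newton-Raphson on the partial likelihood. It is a didactic implementation under stated assumptions (Breslow handling of tied event times, no regularization, no standard errors) and is not the software used in the study; `univariate_cox` is a hypothetical name.

```python
import numpy as np


def univariate_cox(time, event, x, n_iter=50, tol=1e-9):
    """Return (beta, hazard ratio) for one covariate.

    Maximizes the Cox partial likelihood with Newton-Raphson steps,
    using the Breslow approximation for tied event times.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    x = np.asarray(x, dtype=float)
    beta = 0.0
    for _ in range(n_iter):
        w = np.exp(beta * x)
        grad = 0.0
        info = 0.0
        for i in np.flatnonzero(event):
            at_risk = time >= time[i]
            s0 = w[at_risk].sum()
            s1 = (w[at_risk] * x[at_risk]).sum()
            s2 = (w[at_risk] * x[at_risk] ** 2).sum()
            mean = s1 / s0
            grad += x[i] - mean          # score contribution
            info += s2 / s0 - mean ** 2  # observed information
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta, float(np.exp(beta))
```

A gene whose fitted hazard ratio exceeds 1.00 would be flagged as a poor prognostic factor under the rule above.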

Risk-Score System Establishment
Each gene score was constructed as the selected gene expression level (exp) multiplied by its regression coefficient (β) obtained from the univariate Cox regression model. Each patient's prognostic risk-score was calculated as the sum of two gene scores; the formula is as follows [34,35]:
$$\text{Risk-score} = \beta_{1} \times \mathrm{exp}_{1} + \beta_{2} \times \mathrm{exp}_{2}$$
where $\mathrm{exp}_{i}$ and $\beta_{i}$ are the expression level and the univariate Cox regression coefficient of gene $i$ in the pair. Based on this formula, the risk-score of each sample was calculated (Table S2). According to the median risk-score, the patients were divided into a high-risk or low-risk group. The prognostic differences between these two groups were assessed by a log-rank test.
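The risk-score calculation and the median split can be sketched as follows. The function names are hypothetical, and assigning patients whose score equals the median to the low-risk group is an assumption, since the text does not say how such ties are handled.

```python
import numpy as np


def pair_risk_scores(exp_a, exp_b, beta_a, beta_b):
    """Risk-score per patient: sum of two gene scores (beta * expression)."""
    exp_a = np.asarray(exp_a, dtype=float)
    exp_b = np.asarray(exp_b, dtype=float)
    return beta_a * exp_a + beta_b * exp_b


def split_by_median(scores):
    """True for high-risk patients (score above the median)."""
    scores = np.asarray(scores, dtype=float)
    return scores > np.median(scores)
```

The resulting boolean vector can be passed to a log-rank test as the grouping variable.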

Correlation Analysis
We analyzed the correlation between every pair of membrane proteins by Pearson correlation. The results were visualized using the R package "corrplot".
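A minimal Python sketch of the pairwise calculation is shown below; it uses `scipy.stats.pearsonr` and does not reproduce the corrplot visualization. The function name `pairwise_pearson` and the dictionary output format are illustrative choices.

```python
from itertools import combinations

import numpy as np
from scipy.stats import pearsonr


def pairwise_pearson(expr, gene_ids):
    """Pearson r and p value for every pair of genes.

    expr: array of shape (genes, samples); gene_ids: one label per row.
    Returns {(gene_i, gene_j): (r, p)} for i < j.
    """
    expr = np.asarray(expr, dtype=float)
    results = {}
    for i, j in combinations(range(len(gene_ids)), 2):
        r, p = pearsonr(expr[i], expr[j])
        results[(gene_ids[i], gene_ids[j])] = (float(r), float(p))
    return results
```

Pairs with a positive `r` and a small p value correspond to the positively correlated membrane proteins discussed in the Results.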

Identification of Up-Regulated Tumor Cell Membrane Proteins
To analyze the TSAs in tumor tissues, we first identified the membrane protein-coding genes with up-regulated expression levels through a series of pan-cancer screenings. Firstly, the mRNA expression profiles were obtained from the TCGA database, excluding the data from cancer types with fewer than three normal controls (Figure 1A). Secondly, we selected the tumor cell membrane-coding genes by intersecting the filtered immune cell marker genes and the cell membrane protein-coding genes (Figure 1B). Finally, we examined the potential utility of these membrane genes as drug targets (Figure 1C).
In the first step of the analysis, we found that 3919 membrane proteins were differentially expressed in most tumor samples, compared with their corresponding adjacent normal tissues (Figures 2A and S1). Specifically, the number of up-regulated membrane proteins was larger than that of the down-regulated genes in the 17 cancer types. A heatmap illustrates the up-regulated membrane proteins in tumor tissues (Figure 2A). Furthermore, a volcano plot shows the differential expression of these target membrane protein-coding genes, which are listed in Table 2.

Most Highly Expressed Membrane Protein-Coding Genes Could Serve as Risk Factors for Cancer Patients
To further explore the impact of these highly expressed membrane proteins on the prognosis of tumor patients, Cox risk regression analysis of these genes was applied to the patients from TCGA (Table S1). We found that most of these membrane proteins were risk factors for tumors. Specifically, 427, 584, 431 and 578 genes were identified as risk factors for LIHC, KIRC, UCEC (uterine corpus endometrial carcinoma), and KIRP (kidney renal papillary cell carcinoma), respectively (Figure 2C). The detailed prognostic values of the membrane proteins are summarized for further verification in Table S1. To confirm the reliability of this analysis, we checked whether these target proteins are currently used in TSA-based therapy with solid evidence, and found that most membrane proteins we identified were consistent with the published data (Table 3). These tumor cell membrane proteins were either highly expressed in tumors or prognostic risk factors for tumor patients. Moreover, some of them have been proven to be drug targets in solid tumors, such as GPC3 in liver cancer [36], MSLN in gastric cancer, and EGFR in glioma (Figure 2D).

Paired Membrane Proteins Displayed More Precise Prognostic Value
To further investigate the combinatorial effects of these genes on the survival of the cancer patients, we established a risk-scoring system using a formula containing the gene expression levels and the regression coefficients from the univariate Cox regression model (Table S2). We found that the combined analysis of these genes predicted the prognostic outcomes of the cancer patients more accurately (Table S1 and Figure 4A,B). For example, prognostic risk stratification improved for gene pairs such as SEZ6 and ULBP1, ULBP2 and MAFA, and PCDHD1 and MAFA, compared with the results when the genes were analyzed individually.
Figure 3. Enrichment analysis of significant high-expression and high-risk genes using the ten cancer hallmarks. The horizontal axis represents the logarithm of the p value, and the vertical axis represents the terms of the ten cancer hallmarks. p values less than 0.05 were considered statistically significant, indicated by the red color bar.

High-expression and low-expression groups were divided by the mean value of the gene expression. p < 0.05 was considered significantly different.

The Highly Expressed Cell Surface Proteins of the Tumor Tissues were Highly Correlated
To investigate whether these membrane proteins were correlated, we calculated the association among the membrane proteins in the 17 tumor types with Pearson's correlation test. As shown in Figures 5 and S2, the membrane proteins identified previously were significantly positively correlated with most membrane proteins in every tumor type.

Table 2. The red bubbles represent positive correlations. The blue bubbles represent negative correlations. * means p < 0.05, ** means p < 0.01, and *** means p < 0.001.

Identification of TSAs That Are Expressed Less in Normal Tissues
To further investigate the potential "on-target, off-tumor toxicity" effect of these proteins, we obtained the expression levels of the 371 target membrane protein-coding genes (see Table 2 for details) in 54 normal human tissues from the GTEx database. The cumulative distribution analysis showed that the log-transformed TPM expression levels of most genes fell between −2 and 1.60 (Figure S3). We defined genes with expression levels greater than or equal to 1.60 as high-expression genes and genes with expression levels less than 1.60 as low-expression genes. According to this threshold, the genes were divided into two categories (Figure S3). Our results indicated that 184 genes were expressed at low levels in these tissues (Figure 6), whereas the other 187 genes were specifically and highly expressed in some normal tissues (Figure 7). In Figure 7, the genes in part one were relatively highly expressed in all brain tissues, while the genes in part two were widely expressed in all tissues. The genes in part three were mainly expressed in the human epidermis, mucous membranes, and glands. The genes in part four were highly expressed in blood cells, lymphocytes, and the spleen (Figure 7).
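The thresholding step can be sketched as below. The text does not say how expression across the 54 tissues is summarized for each gene, so this sketch assumes that a gene is classed as high-expression when its log-transformed TPM reaches the threshold in at least one tissue; the function name and this per-gene rule are hypothetical.

```python
import numpy as np


def split_by_normal_expression(log_tpm, gene_ids, threshold=1.60):
    """Split genes into (low, high) by peak expression in normal tissues.

    log_tpm: array of shape (genes, tissues) of log-transformed TPM.
    A gene is "high" if any tissue reaches the threshold (assumed rule).
    """
    log_tpm = np.asarray(log_tpm, dtype=float)
    peak = log_tpm.max(axis=1)
    low = [g for g, v in zip(gene_ids, peak) if v < threshold]
    high = [g for g, v in zip(gene_ids, peak) if v >= threshold]
    return low, high
```

Under this rule, the low list corresponds to candidates with limited expression in every normal tissue examined.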

Validation of Protein Expression of the Selected Genes
To further validate the above findings, we checked the protein expression levels of the obtained genes in the CPTAC (Clinical Proteomic Tumor Analysis Consortium) tumor protein database. We identified the expression levels of 23 proteins in four types of tumors (KIRC: 8 proteins, LUAD: 3 proteins, LIHC: 11 proteins, UCEC: 1 protein) (Figure 8). All eight proteins (ADD2, DEF6, DOK3, ENO2, FMNL1, MICALL2, PARVG, and PSTPIP1) in KIRC were more highly expressed in tumor tissues than in normal tissues (100%). The expression of two proteins (FERMT1 and PLEK2) in LUAD was higher in tumor tissues than in normal tissues (66.6%). The expression of nine proteins (CD109, GNG4, MAPT, OSBPL3, PLXNA1,

Discussion
The identification of antigens specifically expressed on the surface of tumor cells is vital for the design of adoptive T cell therapy and ADCs. Our study found various highly expressed membrane proteins in a variety of tumor types (17 cancer types), which were also significantly associated with the survival rate of patients. These findings extend a recent report that identified 200 genes as breast cancer subtype-specific targets by differential expression analysis of RNA-seq data from TCGA [60].
In our analysis, 184 genes were expressed at low levels in normal tissues, which further supports the advantage of our strategy in identifying potential targets. Some of them have proved to be successful targets with significant curative effects on some malignant hematological tumors, such as anti-CD19 CAR-T therapy for the treatment of chronic lymphocytic leukemia [61], and CTL019 CAR-T therapy for the treatment of relapsed and refractory B-cell acute lymphoblastic leukemia [62]. In addition, CD66c, CD318, TSPAN8, and CLA were identified as candidate targets for CAR-T therapy in a pancreatic tumor patient-derived xenograft model [63].
Our enrichment analysis related these highly expressed surface proteins to human immune status, which is an advantage over the findings of Schreiner et al. Although they could predict the surface antigens of several hematological tumors, which may be more applicable across cancer types, they did not exclude potential adverse effects on immune cells, which may reduce the usefulness of the discovered targets [64]. For example, although CAR-T therapy targeting CD276 has been applied for the treatment of tumors [65], it may cause the death of dendritic cells, since CD276 is also expressed in dendritic cells [66]. In addition, our screening methods excluded CD66c, which was selected by Schäfer et al., because of its expression in granulocytes [67]. Furthermore, we excluded CD70, a CAR-T target in gliomas [68], which can consistently activate T cells and lead to T cell dysfunction [69]. The expression of DLL3 on the tumor tissues of patients with invasive breast cancer can promote the infiltration of immune cells, including plasma cells, CD8 T cells, CD44 memory-activated cells, macrophages, and T regulatory cells [70]. Amir et al. demonstrated that MUC-1 was a target for MUC1-positive cancer cells [58]. Stephen et al. developed engineered CAR-T cells targeting HER2+ glioma cells, which also improved disease control in patients with glioma [48]. In sum, our target membrane proteins show uniformly low expression in normal tissues, which may reduce immune-associated adverse events during the application of CAR-T or ADCs in cancer treatment [71].
Correlation analysis further demonstrated that the surface proteins associated with poor prognosis were significantly correlated with each other, which suggests that the surface proteins may not affect tumor progression independently but may act in coordination. In general, the paired target membrane proteins could be good candidates for dual-target CAR-T therapy and ADCs with fewer side effects [72].
Since the selected surface protein-coding genes were closely associated with each other, we speculated that the selected surface proteins could share a similar transcription pattern. When choosing specific types of surface proteins as drug targets, off-target side effects may be prevented by controlling the activation of engineered T cells using integrated multi-input signals [3,73]. Bispecific CAR-T cells targeting PD-L1 and MUC16 have an enhanced killing effect on ovarian cancer cells and significantly prolong the survival time of tumor-bearing mice [74]. CAR-T cells with CEA and MSLN as dual targets can accurately locate the tumor site and have higher toxicity to pancreatic malignancy [75]. Consistently, through big data analysis we found surface proteins that can be used to design bispecific CAR-T cells or ADCs targeting common solid tumors.
Despite the limited data in the protein database, we validated the protein expression levels of some of the selected genes. These results strengthen the case that the identified tumor surface protein genes may be potential targets for CAR-T cell therapies and ADCs. Further integration of proteomic information may boost the discovery of TSAs, because post-translational modifications of proteins, such as glycosylation and lipidation, have also been identified as a source of tumor surface antigens by liquid chromatography-mass spectrometry [76,77].

Conclusions
In sum, our study revealed some potential tumor-specific surface proteins for the rational design of TSA-based immunotherapies. These findings might pave the way for a comprehensive and efficient approach to construct novel CAR-T cells and ADCs in preclinical animal studies and clinical practice by utilizing tumor-specific surface proteins as multi-target binding sites.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14225674/s1, Figure S1: Analysis of the differential expression of membrane protein-coding genes; Figure S2: Pearson correlation analysis of the significantly high-expression and high-risk membrane protein-coding genes in some tumor tissues in TCGA; Figure S3: Cumulative distribution of the expression of 371 candidate membrane proteins. The red dashed line was determined to separate the differential expression patterns of these genes; Table S1: The information on the results of HR, FC, Correlation and Prognosis of individual or paired genes; Table S2: Patients' risk scores calculated for gene pairs based on the risk score system.