A Pan-Cancer Approach to Predict Responsiveness to Immune Checkpoint Inhibitors by Machine Learning

Immunotherapy by using immune checkpoint inhibitors (ICI) has dramatically improved the treatment options in various cancers, increasing survival rates for treated patients. Nevertheless, there are heterogeneous response rates to ICI among different cancer types, and even in the context of patients affected by a specific cancer. Thus, it becomes crucial to identify factors that predict the response to immunotherapeutic approaches. A comprehensive investigation of the mutational and immunological aspects of the tumor can be useful to obtain a robust prediction. By performing a pan-cancer analysis on gene expression data from the Cancer Genome Atlas (TCGA, 8055 cases and 29 cancer types), we set up and validated a machine learning approach to predict the potential for positive response to ICI. Support vector machines (SVM) and extreme gradient boosting (XGboost) models were developed with a 10×5-fold cross-validation schema on 80% of TCGA cases to predict ICI responsiveness defined by a score combining tumor mutational burden and TGF-β signaling. On the remaining 20% validation subset, our SVM model scored 0.88 accuracy and 0.27 Matthews Correlation Coefficient. The proposed machine learning approach could be useful to predict the putative response to ICI treatment by expression data of primary tumors.


Introduction
In recent years, immunotherapy has dramatically improved the treatment options in various cancers increasing the survival rates for treated patients. Among the most promising immunotherapeutic approaches there is the pharmacological manipulation of the physiologic immune checkpoints [1][2][3][4]. Immune-checkpoint blockade is the basis for the clinical antitumor activity of the most promising currently approved antibodies targeting the checkpoint molecules CTLA4 (Cytotoxic T-Lymphocyte Antigen 4) , PD1 (Programmed Cell Death 1) and PD-L1 (Programmed cell death ligand 1).Nevertheless, there are heterogeneous response rates to immune checkpoint inhibitors (ICI) [4][5][6] among the different cancer types, and also in the context of patients affected by a specific cancer. Moreover, only a minority of patients with advanced/metastatic cancer respond to ICI, thus exposing the remaining patients to potentially ineffective, toxic and costly treatments. Thus, it becomes crucial to identify predictive factors determining the response to the immunotherapeutic approaches. Intra-tumoral PD-L1 expression, evaluated by immunohistochemistry, is among the first proposed predictive biomarkers but it is not frequently successful [3,7,8]. This lack of success could be explained by the fact that a complex scenario characterized by genomic features, immune systemic state, tumor microenvironment interactions and tumor immune cell interactions is heavily involved in the efficacy of ICI [9][10][11][12][13]. Thus, it has become clear that a more robust prediction needs to involve a comprehensive investigation of the mutational and immunological aspects of the tumor diseases. Evaluation of tumor mutational burden (TMB) by whole-exome sequencing has also been proposed but it has not been demonstrated to sufficiently predict long term clinical benefits [3,4]. On the other hand, three distinct immunological phenotypes, i.e., immune inflamed, immune excluded or immune desert were proposed to categorize the majority of solid tumors in an attempt to explain their different capability to respond to ICI [8,[14][15][16]. These three different immunological subtypes were associated with different transcriptomic profiles based on tumor/tumor microenvironment/immune system cell interactions. In particular: (i) immunogenomics analyses of over 10,000 tumors identified six immune subtypes, encompassing multiple cancer types, that were hypothesized to define different patterns of immune system response with predictive/prognostic relevance [17]; (ii) an immune infiltration score and a T cell infiltration score were proposed by analyzing gene expression signatures of different cancer types to define immunogenicity and potential capability to respond to ICI [18]; (iii) a tumor inflammation signature was proposed to measure pre-existing but suppressed adaptive immune response in different tumors [19]; (iv) a lack of response to ICI was associated with a signature related to transforming growth factor β (TGF-β) signaling in tumors which showed exclusion of CD8+ effector T cells from the tumor parenchyma with, on the other hand, these cells mainly located in fibroblast and collagen reach peritumoral stroma. This TGF-β signature was mainly driven by fibroblasts present in the tumor microenvironment [20]. Overall considered, this previous evidence suggested that pre-existing T cell immunity, TMB and TGF-β signaling could affect response to immunotherapy with immune checkpoint blockade. In the present study, by performing a pan-cancer analysis on gene expression data from the Cancer Genome Atlas (TCGA, 8055 cases belonging to 29 cancer types), we set up and validated a machine learning approach to predict the potential for positive response to ICI.

Results
The study included 8055 primary tumor cases for 29 cancer types from The Cancer Genome Atlas (TCGA) cohort. The number of primary tumor cases for each project is reported in Table 1.
The distribution of TMB of the primary cases across the cancer projects are shown in Figure A1. Previous studies showed that a high TMB is associated with positive response to ICI treatments [5,8]. On the other hand, active TGF-β signaling is associated with a lack of response to ICI treatments [17,20,21]. Following this line of reasoning, we chose to classify as potentially responsive to ICI (hereafter TMB/TGF-β score positive) those cases that simultaneously had a TMB above the third quartile and the TGF-β score under the median value (TGFB_score_21050467 as described in Thorsson et al. [17]). The distribution of cases classified as responsive is reported in Table 1. Of note the tumor type with the highest number of TMB/TGF-β score positive cases was HNSC and the cancer type with the lowest number was GBM (15.57% to 4.08%). By using this TMB/TGF-β score cut off, we evaluated the overall survival (OS), disease specific survival (DSS) intervals and progression free interval (PFI) of all the cases included in the study, simultaneously considering all the TCGA projects using the last revision of the TCGA clinical data ( Figure A2) [22]. Notably, as shown in Figure 1, TMB/TGF-β score positive cases showed significantly longer OS than TMB/TGF-β score negative cases ( Table 2). The strongest associations were found when DSS were considered (Table 2). Moreover, TMB/TGF-β score positive cases showed significantly longer PFI (Table 2). When cases belonging to each project were considered separately different trends were observed (Table A1). Liu et al. [22] presented a curated and filtered analysis for clinical and survival outcome data defining the assessment and recommended use of the endpoints. Noteworthy, TMB/TGF-β score positive cases showed significantly longer OS, DSS and PFI than TMB/TGF-β score negative cases when using a restricted subgroup from 29 cancer types as recommended by Liu et al. [22] ( Figure A3A-C).
To evaluate the immune-related features of gene expression signatures of TMB/TGF-β score positive cases, we classified the cases included in the study according to the six immune subtypes defined in Thorsson et al. [17], where a multi-omic analysis of TCGA datasets allowed the definition of subtypes ( C1 (wound healing), C2 (IFN-γ dominant), C3 (inflammatory), C4 (lymphocyte depleted), C5 (immunologically quiet), C6 (TGF-β dominant) ) useful to classify cancer cases across different cancer types according to distinct immune signatures.
To perform this classification we used an implemented version of the tool proposed in [23]. The number of cases found in each subtype by performing this analysis is reported in Table A2. TMB/TGF-β score positive cases were found enriched in the C2 subtype (IFN γ dominant) characterized by highly mutated tumors. Moreover, while constructing our classification score, we observed a very low number of cases of TMB/TGF-β score positive cases in the C6 (TGF-β dominant) subtype (Table A2) [17]. By considering the entire TCGA cohort, clinical outcomes were in line with those reported in [17] (Figure 2). Notably, within both the favorable prognosis group Cluster 2 and the unfavorable prognosis group Cluster 4, TMB/TGF-β score positive cases showed significantly longer OS intervals than the TMB/TGF-β score negative counterparts (Table 3). Moreover, in Cluster 2, again TMB/TGF-β score positive cases showed significantly longer OS intervals than the TMB/TGF-β score negative counterparts by considering only the subgroup of 20 cancer types according to the recommendations reported in [22] ( Figure A4).
To select the optimal classification model, two machine learning algorithms were used: Support Vector Machines (SVM) and optimized distributed gradient boosting (XGboost). Following the approach depicted in Figure A5, the TCGA transcriptomics data was split into training and test sets. The training set was used for model development, within a 10×5fold Stratified Cross Validation [24], and the test set was used for assessing the model performance. As evaluation metrics, accuracy (ACC) and the Matthews correlation coefficient (MCC) [25,26] were used.

Discussion
The use of ICI has changed the clinical management of tumor-affected patients, although heterogeneous response rates have been found for treated patients across different cancer types as well as for patients affected by a specific tumor type. In particular, ICI might also improve the treatment of urothelial cancer, gastric cancer, colorectal cancer, lung cancer and breast cancer considering the promising results achieved so far and the relatively low efficacy of currently available treatments [27][28][29][30][31]. Given this heterogeneous response, there is the clinical need for predictive biomarkers for the definition of responsiveness to ICI treatments. Currently employed biomarkers, such as PD-L1 expression levels and TMB, have shown an incomplete predictive performance [4]. An alternative point of view could be represented by the introduction of complex biomarkers simultaneously evaluating multiple tumor/tumor microenvironment/immune system features [12,13].
To this aim, starting from genomic, transcriptomic and proteomic data, machine learning approaches could be useful to obtain accurate prediction models for response to ICI treatments [21]. In particular, different approaches, sub-typing oriented and based mainly on gene expression patterns, have been recently proposed [18,21,[32][33][34]. In these studies, machine learning supervised algorithms have been generally trained to match a known phenotype (for example, established by microscopy or with clinical features) to genetic patterns. In the last years, comprehensive immunogenomic analyses of different cancer types, based on TCGA data, have been proposed to characterize tumor heterogeneity in terms of immune-related features, possibly influencing the capability to respond to ICI treatments [17].
Different studies suggested that TMB is associated with survival prognosis in many cancer types, given the association with the formation of neoantigens capable of stimulating anti-cancer T lymphocyte clones. Nevertheless, the mechanism underlying this association could lie in the marked differences in immune cell infiltration densities and immune activities depending on tumor microenvironment immunosuppressive cell populations, T cell exhaustion and tumor associated stromal tissue [5,19,[33][34][35]. Another important point for the different behaviors according to TMB reported in literature is that the method to calculate TMB is not univocal [36,37]. A combination of 2 biomarkers, one dependent from the tumor intrinsic mutational state and one related to the tumor microenvironment, could therefore identify patients that can potentially benefit from ICI. To this aim, to perform the pan-cancer analysis we chose to use a surrogate (i.e., TMB and TGF-β score) to define cases putatively responsive to ICI treatments. The choice to use this surrogate is due to the fact that the comprehensive TCGA case cohort is not homogeneous in terms of employed anti-cancer treatments, with only a minority of cases undergoing ICI treatments.
Thus, to derive a label to be used by a machine learning classifier, we defined as potentially responsive to ICI those cases that simultaneously had a TMB above the third quartile and the TGF-β score under the median value (TGFB_score_21050467 as described in [17]). The choice to use this phenotype to classify cases putatively responsive to ICI could be considered in keeping with the fact that when primary cases of all the 29 TCGA cancer types were simultaneously considered, TMB/TGF-βscore positive cases showed significantly longer OS, DSS and PFI intervals than TMB/TGF-β score negative cases, irrespective of the type of cancer, the clinical and molecular features and the treatment managements of the analyzed cases ( Figure 1). We developed a classification model using as predictors the 2387 genes associated with 160 immuno-related signatures reported in Thorsson et al. [17].
To evaluate classification of TMB/TGF-β score positive cases, we compared SVM and XGBoost algorithms. The best classification performance was obtained using SVM. These results could be explained by the fact that SVM is usually robust, even when the training sample cohort has some bias. The obtained MCC prompts to suggest a mild correlation of the TMB/TGF-β score used to identify responsiveness to ICI with the features used to create the model. Previous proposed methods used different algorithms and combinations of data obtained from different databases [18,21,[32][33][34]. In this context, we focused only on primary tumors and transcriptomics data choosing two surrogates of possible response to ICI. A limitation of our proposed method could be represented by the high number of genes used to classify the putative responsiveness to ICI. However, similar approaches using high number of genes or multi-omic combinations with high numbers of data have been previously published [17,18,21,[32][33][34]. On the other hand, comparison among our proposed model and previously published models seems to be not feasible given the different starting data and different employed approaches. Nevertheless, the proposed model could be naturally extended with multi-modal inputs by adding appropriate embeddings, in particular clinical variables and image data. On the other hand, it is noteworthy that the proposed machine learning classifier could be useful to stratify patients according to the putative responsiveness to ICI treatment, also considering cancer patients comprehensively characterized by immune-related features associated with a favorable prognosis such as those belonging to immune subtype C2.

Datasets
The Cancer Genome Atlas (TCGA) RNA sequencing (RNA-seq) count data (FPKM-UQ) was downloaded (February 2019) from the GDC data portal (portal.gdc.cancer.gov) using the GenomicDataCommons Bioconductor package [38]. We downloaded RNA-Seq data of 29 primary tumors described by Table 1 The tumor mutational burden (TMB) was calculated from the MC3 Public MAF [39] file as described by Alexandrov and colleagues [36,40]. To characterize intratumoral immune states, we scored the 160 immune expression signatures as described by Thorsson and colleagues [17]. We used the signature published on the "Immune-Subtype-Clustering" GitHub repository [41] and then we tested the improved version of the tool [23].
For each cancer cohort, cases were labeled as responsive if they simultaneously had TMB above the third quartile and TGF-β score under the median value (TGFB_score_21050467 as decribed in Thorsson et al. [17]).

Machine learning methods
For the selection of an initial classification model, we evaluated the performance of two supervised learning methods, namely support vector machines (SVM) and extreme gradient boosting (XGBoost). The optimal hyperparameters were selected with a grid search across a space of model-specific parameters. The data were split beforehand into 80% training and 20% test partitions. All models were developed in a 10× 5-fold cross validation (CV) schema on the training partition using the 2387 genes reported by Thorsson et al [17]. Performance was assessed in terms of accuracy (ACC) and Matthews Correlation Coefficient (MCC) [25,26], the performance metric that effectively summarizes in a single value the confusion matrix of a classification task, even when the classes are imbalanced. MCC values are in the [−1, 1] range, where 1 means perfect classification, −1 perfect misclassification, and 0 random prediction or classification of every sample to the largest class. The overall performance in cross-validation is evaluated across all CV iterations as average MCC and ACC with 95% Studentized bootstrap confidence intervals (CI), and on the test partition as MCC and ACC. The classification pipeline was also run with randomized labels as a sanity check for unwanted selection bias effects: in a procedure unaffected by systematic bias, the average MCC should be close to 0. Data were log2-transformed and standardized to zero mean and unit variance before classification; in order to avoid potential information leakage, the standardization parameters from the training set were used for rescaling both training and test subsets.

Computational Details
The classification pipeline was built on top of the Scikit Learn library 0.20.3 [42] using Python 3.6. All the experiments were run on a 32-core Intel Core i7 workstation with 128GB of RAM running CentOS 7.5. Cox regression and Kaplan-Meier survival curves were computed using R (version 3.6.1 ) with survival and survminer packages. Survival curves were compared with the log-rank test. Survival analysis were performed in cases for which all census data were available according to Liu et al. [22]

Conclusions
Balancing between immunostimulative and immunosuppressive factors exerting a role in the tumor/tumor microenvironment/immune system crosstalk can influence the capability to respond to ICI treatment of cancer-affected patients. This results in heterogeneous response rates among different cancer types but also in the context of a specific cancer. In this complex scenario, there is the need to efficiently predict the capability of patients to respond to these immunotherapeutic approaches.
Here, we proposed a machine learning approach to comprehensively investigate mutational and immunological aspects of tumor diseases. This could be useful to efficiently predict the putative response to ICI treatment by expression data of primary tumors. Acknowledgments: All the results here showed are based on data generated by the TCGA Research Network: https://www.cancer.gov/tcga. For the processing of the data, tools provided by the Garr consortium were used as part of the agreement with the Ministry of Health for IRCCS, through the Garr Cloud Platform, a GDPR compliant private-cloud system certified ISO 27001, ISO 27017 and ISO 27018 for information protection.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: