Identification of Immune Function-Related Subtypes in Cutaneous Melanoma

Tumour immunotherapy combined with molecular typing is a new therapy to help select patients. However, molecular typing algorithms related to tumour immune function have not been thoroughly explored. We herein proposed a single sample immune signature network (SING) method to identify new immune function-related subtypes of cutaneous melanoma of the skin. A sample-specific network and tumour microenvironment were constructed based on the immune annotation of cutaneous melanoma samples. Then, the differences and heterogeneity of immune function among different subtypes were analysed and verified. A total of 327 cases of cutaneous melanoma were divided into normal and immune classes; the immune class had more immune enrichment characteristics. After further subdividing the 327 cases into three immune-related subtypes, the degree of immune enrichment in the “high immune subtype” was greater than that in other subtypes. Similar results were validated in both tumour samples and cell lines. Sample-specific networks and the tumour microenvironment based on immune annotation contribute to the mining of cutaneous melanoma immune function-related subtypes. Mutations in B2M and PTEN are considered potential therapeutic targets that can improve the immune response. Patients with a high immune subtype can generally obtain a better immune prognosis effect, and the prognosis may be improved when combined with TGF-β inhibitors.


Introduction
Cancer is not a single disease but a collection of multiple biological entities, and each has its own molecular characteristics and clinical significance [1]. Traditionally, cancer classification was based on histopathology and clinical characteristics, most of which depend on clinicians' judgement. However, studies show that prognosis and treatment responses vary between cancer subtypes and within subtypes. Therefore, it is necessary to develop methods that can accurately predict the key results of individual patients [2,3]. Although some progress has been made in the field, there are still unexplained intertumoural heterogeneities leading to varying survival outcomes and differences in treatment responses [4]. Gene expression features are easy to measure from tissue samples, and may be used to identify personalized "driving mutations" [5], differentially expressed genes and pathways [6], and personalized gene networks [7]. Although genes have been successfully used as biomarkers for cancer diagnosis [8], it is not clear whether gene biomarkers are the most suitable method for treatment indicator classification. Conversely, it may be more meaningful to describe diseases by using system-specific dysfunction rather than the dysfunction [9,10] of individual molecules, which is based on the gene regulatory network.

Sample-Specific Networks Were Constructed with the SING Method
We selected the C7 data set in the MSigDB database as the annotated database and the modified PAN method as SING (single sample immune signature network), which followed the analytic process below (Figure 1). Feature selection (1) The high-dimensional nature of gene expression data makes it necessary to reduce dimensionality before analysis. For each gene data set, the top 500 genes were selected according to the order of their total variance and included in subsequent analysis and calculation. (2) Construction of gene and immune annotation correlation matrix (3) , is the correlation matrix between the gene and immune annotation, in which corresponds to the sample, p corresponds to the first gene, and corresponds to the first annotation information in the immune annotation database. The corresponding element in the matrix (line , column ) is the expression value of; ; otherwise, it is recorded as 0. (4) Development of sample-specific network (5) The symmetric correlation matrix , is further generated by the matrix , . Each element in the matrix corresponds to the Euclidean distance between two annotation pieces of information and * . Afterward, to transform into a discrete network, this study further selected the edge of the top 10% weight in the symmetric correlation matrix , to construct the adjacency matrix. (6) Computing network topology information Once the discrete network construction of each sample was completed, the topological property "degree" of the network was calculated and obtained for subsequent clustering analysis. For a given sample , its network is represented as ( , ), and the corresponding topological property is denoted as h s , which is a vector, ℎ = [ℎ 1 , ⋯ , ℎ | | ] Feature selection (1) The high-dimensional nature of gene expression data makes it necessary to reduce dimensionality before analysis. For each gene data set, the top 500 genes were selected according to the order of their total variance and included in subsequent analysis and calculation. (2) Construction of gene and immune annotation correlation matrix (3) M j p,q is the correlation matrix between the gene and immune annotation, in which j corresponds to the j sample, p corresponds to the first gene, and q corresponds to the first annotation information in the immune annotation database. The corresponding element in the matrix (line p, column q) is the expression value of p; otherwise, it is recorded as 0. (4) Development of sample-specific network (5) The symmetric correlation matrix A j p,q is further generated by the matrix M j p,q . Each element in the matrix corresponds to the Euclidean distance between two annotation pieces of information q and q * . Afterward, to transform into a discrete network, this study further selected the edge of the top 10% weight in the symmetric correlation matrix A j p,q to construct the adjacency matrix. (6) Computing network topology information.
Once the discrete network construction of each sample was completed, the topological property "degree" of the network was calculated and obtained for subsequent clustering analysis. For a given sample S, its network is represented as G S V S , E S , and the corresponding topological property is denoted as h s , which is a vector, h s = h s 1 , · · · , h s

Clustering Algorithm
Clustering is an effective algorithm for molecular typing and has been successfully applied in the molecular typing of many kinds of cancer [1,27]. We used a model-based clustering algorithm based on the Gaussian finite mixture model (R package "mclust" v5.4.7) to find the best cluster numbers of the SKCM subtypes. For cancer subtype classification, we used hierarchical clustering and K-means algorithms. In the validation cohort, supervised consensus clustering was used to verify the existence of immune subtypes. The gene expression profile was resampled 500 times. In each resampling, 80% of the samples and 80% of the genes in the original matrix were taken. During each clustering process, the distance was measured by the 1-Pearson correlation coefficient, and the linkage was set as error sum of square (ESS) between classes.

Assessment of Immune and Stromal Cell Infiltration
As the degree of infiltration is related to the prognosis of immunotherapy in immune and stromal cells, we used the R package "MCPcounter" (v1.2.0) [28] to calculate the abundance fraction of eight immune cells (T cells, CD8 + T cells, NK cells, cytotoxic lymphocytes, B cell lineages, single-cell lineage cells, myeloid dendritic cells, and neutrophils) in the tumour microenvironment and two stromal cells (endothelial cells and fibroblasts). Then, "xCell" (v1.1) [29] was used to calculate 64 immune and nonimmune cell types, and "ESTIMATE" (v1.0.13) [30] was used to calculate the immune cell and stromal cell infiltration score.

Cytolytic Activity
Cell dissolution activity, also known as cell killing activity, can be used to characterize the immunogenic nature [31] of tumours. We obtained the cytolytic activity by calculating the geometric mean of granzyme A (GZMA) and perforin 1 (PRF1) gene expression in each sample.

Gene Differential Expression Analysis
The standard analysis process of "DESeq" (v1.30.1) [32] was used to analyse the differential gene expression of the original count data obtained by RNA sequencing. Nominal p values were adjusted by the false discovery rate (FDR). Genes with fold change ≥ 2 and FDR < 0.05 were considered differentially upregulated genes, while those with fold change ≤ 1/2 and FDR < 0.05 were considered differentially downregulated genes.

Differential Gene Enrichment Analysis
The gene enrichment analysis included GO functional annotation and KEGG pathway analysis. The GO functional annotation was completed by the DAVID 6.8 [33,34] online database, and KEGG pathway analysis was performed by the R package "clusterProfiler" (v3.18.1) [35]. Single-sample gene set enrichment analysis (ssGSEA) was performed based on the R package "GSVA" (v1.38.2) [36] to obtain the enrichment fraction of the gene set in a single sample. The scores of 24 microenvironment cell types were calculated, and the final microenvironment cell abundance was further adjusted by tumour purity through the formula score/(1 − tumor purity).

Recognition of Cancer Mutation-Driven Genes and Calculation of Tumour Mutation Load (TMB)
Based on cancer somatic mutation data, we used the R package "MutSigCV" (v2.0) [37] to identify the significantly driving mutated genes of the subtypes (q < 0.05). Then, "maftools" (v2.6.05) were used to calculate the TMB in each subtype [38] to reflect mutations in tumour cells.

DNA Methylation Differential Analysis
We analysed two methylation statuses of the promoter: (1) hypermethylation status and (2) demethylation status. First, low-quality probes were filtered and differentiated CpG sites were identified. This result suggested that when log2FoldChange > 0 and the adjusted p value < 0.01, the probe showed a hypermethylation level; when log2FoldChange < 0 and the adjusted p value < 0.01, the probe showed a low methylation level (R package "ChAMP" v2.20.1) [39].

Survival Analyses
To confirm whether there was a significant difference in survival between subtypes, a Kaplan-Meier curve [40] was used for survival analysis and the log-rank test for detection of significant difference (p < 0.05). Univariate Cox proportional hazards regression was used to calculate the predictors associated with cancer survival [41]. After integrating all predictors, the nomogram [42] was used to reflect the correlation between the variables in the prediction model. A multivariate Cox regression model according to the published literatures [43,44] was used to calculate the risk score, which showed prognostic value.

Determination Aand Verification of Cluster Number
Based on the degree topological property matrix of immune annotation samplespecific as input matrix, the optimal number of clusters were tested from one to six. The sample data were divided into two categories ( Figure 2A). Then, the hierarchical clustering algorithm where the distance was measured by 1-Pearson's coefficient and the clustering method was set as ward. Finally, 146 samples fell into the first category, called the C1 class and 181 samples fell into the second category, called the C2 class.
To verify the validity of the clustering results, the two clustered categories were compared with the TCGA classification results. A high correlation was found between them, and the immune subclass mainly overlapped with the C2 class ( Figure 2B). Furthermore, the two categories were compared with another published dataset ( Figure 2C) that contained 47 patients with SKCM [45] who responded to immunotherapy. C2 may have a high likelihood of responding to anti-PD-1 therapy (Bonferroni corrected p = 0.008). Then, univariate Cox regression was used to analyse those factors that may affect the survival of SKCM patients (Supplementary Table S2). Age at diagnosis, body weight, race, tumour stage, UV markers, UV rate, metastatic status, and ploidy were significantly associated with patient prognostic survival (p < 0.05). The nomogram ( Figure 2D) can predict the survival rate of patients at different factor levels.

Definition and Extraction of Immune Subtypes
To explore the immune characteristics in SKCM, we defined subtypes of C1 and C2 based on the immune and matrix score, infiltration abundance of immune and stromal cells, and cytopathic activity. As the C2 class was more immune-infiltrated, we defined the C1 class as the "normal class" and C2 as the "immune class".
The optimal clustering number of the immune class was three according to the BIC value ( Figure 3A). Based on the K-means clustering method, the number of samples for the three classes were 38, 133, and 10. Combining immune clustering with normal class (Supplementary Figure S1), "immune2 class" can be significantly separated from other classes. "Immune3 class" and "normal class" had high similarities in the tumour microenvironment. In addition, there was no significant difference in the degree of immune infiltration between the "normal class" and "immune3 class" (Supplementary Figure S2). The immune infiltration of the "immune1 class" and "immune2 class" were significantly different. Therefore, the "immune3 class" and "normal class" were combined as the "newnormal class" in this study. Comparing the tumour microenvironment and immune score of the three types ( Figure 3B,C), we found significant differences among them. "xCell" was used to calculate the invasion abundance of tumour immune and stromal cells. Tumour immune infiltration in the "immune 2 class" was significantly higher than that in the other two subtypes ( Figure 3D). A significant prognostic difference among them (p = 0.0039) was Life 2021, 11, 925 6 of 15 also observed: the "new-normal class" was the worst, followed by "immune1 class", and "immune2 class" was the best ( Figure 3E).

Definition and Extraction of Immune Subtypes
To explore the immune characteristics in SKCM, we defined subtypes of C1 and C2 based on the immune and matrix score, infiltration abundance of immune and stromal cells, and cytopathic activity. As the C2 class was more immune-infiltrated, we defined the C1 class as the "normal class" and C2 as the "immune class".
The optimal clustering number of the immune class was three according to the BIC value ( Figure 3A). Based on the K-means clustering method, the number of samples for   Combined with the immune score and survival prognosis of each subtype, we renamed the "new-normal class" the "immune inactivation subtype", the "immune 1 class" the "low immune subtype" and the "immune 2 class" the "high immune subtype".

Heterogeneity Analysis between High and Low Immune Subtypes
Based on differential gene expression analysis, the expression of 1030 genes in the "high immune subtype" were significantly upregulated relative to that in the "low immune subtype", and 87 genes were significantly downregulated ( Figure 4A). Compared with the immune cell gene set, 240 genes were found to be immune-related among the significantly upregulated genes in the "high immune subtype", while only one immune-related gene was significantly downregulated. These results indicate that the "high immune subtype" had more immune-enriched characteristics.

Immunity Subtype Validation
In this study, four independent cohorts were integrated as a verification set for supervised analysis (206 tumour samples). First, the existence of an immune class (Cluster 1) and a normal class (Cluster 2) were verified, with the immune class having higher immune signatures (Supplementary Figure S6). To further validate immune subtypes based on the tumour microenvironment, three subtypes of the TCGA cohort were analysed for pairwise differential expression, and the corresponding differentially expressed genes were selected as the specific signatures of the three SKCM immune subtypes. Among them, a 737-gene signature was identified for "immune inactivation subtypes", a 525-gene signature for "low immune subtype", and an 839-gene signature for "high immune sub- Based on 1117 differentially expressed genes between the "high immune subtype" and "low immune subtype", GO functional annotation and KEGG pathway enrichment were further analysed. These differentially expressed genes were mainly related to the immune response, inflammatory response, T cell proliferation, leukocyte proliferation, and chemokinemediated signalling pathways ( Figure 4B). Using KEGG pathway analysis, 16 signalling pathways were identified (Supplementary Table S3). Among those, 13 also appeared in Life 2021, 11, 925 9 of 15 the normal and immune classes. The PI3K-Akt, MAPK, and TGF-β signalling pathways appeared in the enrichment pathway of differentially expressed genes between the "low immune subtype" and "high immune subtype". Several KEGG pathways were also found to be related to Th1 cell differentiation, Th2 cell differentiation, Th17 cell differentiation, and PD-L1 expression. They showed an increasing trend among the "immune inactivation subtype", "low immune subtype" and "high immune subtype" (Supplementary Figure S3). There were significant differences (Th1: p < 2.2 × 10 −16 , Th2: p = 3.5 × 10 −11 , Th17: p = 4.0 × 10 −12 ), and the abundance of MDSCs was the highest in the "immune inactivation subtypes" and lowest in the "high immune subtype" (p = 9.1 × 10 −15 ).
The influence and interaction between various factors in the tumour were complex, and the methylation status and gene mutations in the tumour had a strong relationship with gene expression, which would affect tumour occurrence and development. Therefore, we continued to explore the differences between the "high immune subtype" and "low immune subtype" based on tumour methylation and gene mutation. In this study, 21,981 probes were screened for methylation differences between high and low immune subtypes; 982 probes were hypermethylated, and 4620 probes were demethylated. After mapping these probes to genes and annotating with immune gene sets, a total of 597 genes showed hypermethylation but were epigenetically silenced in the "high immune subtype", among which 10 genes were immune system related. In addition, 1490 genes were demethylated, of which 60 were related to immunity. Then, six significantly mutated genes were identified with a p-value < 0.01, including NRAS, BRAF, B2M, PTEN, TP53, and CDKN2A (Supplementary Figure S4). NRAS appeared in both the "high immune subtype" and the "low immune subtype". In addition, BRAF and CDKN2A, which are star mutations in SKCM [46], appeared in the "high immune subtype", and the mutation frequency of BRAF in the "high immune subtype" was greatest. BRAF is commonly seen in the MAPK pathway (RAS/RAF/MEK/ERK pathway), and mutation of this pathway is an important factor that causes growth and development of melanoma. These results indicate that NRAS and BRAF mutations were specific characteristics of the two subtypes. Additionally, B2M and PTEN mutations occurred only in the "immune inactivation subtype".
There were no significant differences in TMB levels between the "high immune subtype" and "low immune subtype". Because the TMB level of the "immune class" was significantly higher than that of the "normal class" (p = 0.00058; Supplementary Figure S5), we suggested that the immune class was more likely to benefit from immunotherapy. Moreover, the "high immune subtype" was more promising for PD-1 treatment [45] (Bonferroni correction p = 0.03; Figure 4D). Cell cytotoxicity and cytolysis are also markers of tumour immunogenicity. The cytotoxicity and cytolysis of the "high immune subtype" was significantly higher than that of the "low immune subtype" (p = 3.2 × 10 −9 ; Figure 4C).

Immunity Subtype Validation
In this study, four independent cohorts were integrated as a verification set for supervised analysis (206 tumour samples). First, the existence of an immune class (Cluster 1) and a normal class (Cluster 2) were verified, with the immune class having higher immune signatures (Supplementary Figure S6). To further validate immune subtypes based on the tumour microenvironment, three subtypes of the TCGA cohort were analysed for pairwise differential expression, and the corresponding differentially expressed genes were selected as the specific signatures of the three SKCM immune subtypes. Among them, a 737-gene signature was identified for "immune inactivation subtypes", a 525-gene signature for "low immune subtype", and an 839-gene signature for "high immune subtype". Using the above characterization gene signatures for supervised consensus clustering ( Figure 5A), three subtypes of the validation cohort were reproduced: "high immune subtype" (98 samples), "low immune subtype" (58 samples), and "immune inactivation subtypes" (50 samples). Among the three subtypes, significant differences were observed in cell cytotoxicity and cytolysis, immune score, and stromal score ( Figure 5B-D, p < 0.05). ronment, stromal score, and immune score of the three clusters ( Figure 5E-G, p < 0.05), we found significant differences among them. Tumour immune infiltration in the "C1" was significantly higher than that in the other two clusters, which was highly similar to the "high immune subtype". However, cytolytic activity score among three clusters were not significant as the expressions of the cell lines were too low (Supplementary Figure S7). Above all, these results suggested that specific immune subtypes were present in cutaneous melanoma and that their immune responses varied according to subtype characteristics in tumour samples and cell lines. Figure 5. Identifications of "high immune subtype", "low immune subtype", and "immune inactivation subtype". (A) Consensus clustering by using three gene signatures on validation sets. (B-D) Significant differences among the three subtypes were found in cell cytotoxicity and cytolysis, immune score, and stromal score (all, p < 0.05). (E-G) Significant differences among the three clusters in cell lines were found in tumour microenvironment, immune score, and stromal score (all, p < 0.05).
Then, we validated the results on cell lines of cutaneous melanoma. We applied the SING method on the cell lines data (n = 49), the optimal clustering number was three according to the BIC value (Supplementary Figure S7). Comparing the tumour microenvironment, stromal score, and immune score of the three clusters ( Figure 5E-G, p < 0.05), we found significant differences among them. Tumour immune infiltration in the "C1" was significantly higher than that in the other two clusters, which was highly similar to the "high immune subtype". However, cytolytic activity score among three clusters were not significant as the expressions of the cell lines were too low (Supplementary Figure S7). Above all, these results suggested that specific immune subtypes were present in cutaneous melanoma and that their immune responses varied according to subtype characteristics in tumour samples and cell lines.

Discussion
In recent years, research on tumour immunotherapy has made rapid progress, and immunotherapy led by SKCM has received extensive attention [18]. At present, molecular typing based on immune function is considered an effective method to improve the efficacy of tumour immunotherapy [47] because it can identify people with specific biological characteristics for valuable treatment. The construction of a sample gene network [48] can help to characterize complex and heterogeneous multigroup data sets and improve the accuracy of subtype identification and subsequent analysis. LIONESS and PAN [14][15][16] have already been proven to identify subgroups of different disease types and to solve clinically related classification problems.
We herein started by considering the tumour microenvironment and tumour cells and built the SING method by immune annotation information as background to explore the immune function of cutaneous melanoma-related subtypes of skin. The SING algorithm is helpful to identify immune functions related to subtypes. The "immune inactivation subtype" had two unique genetic mutations, B2M and PTEN, which meant that they were molecular features of the normal class. Studies have found that B2M is associated with antigen presentation in the tumour immune cycle [49]. B2M gene-encoded β-20 microglobulin (MHC-1) is an important adjunct to major histocompatibility complex class 1, and its absence leads to a decline in MHC-1 expression, affecting antigen presentation and resulting in PD-1 drug resistance [50]. A meta-analysis to systematically review [51] showed that PD-1 inhibitors significantly improved the progression-free survival (PFS), overall survival (OS) and overall response rate (ORR) in patients with advanced melanoma. Anti-PD-1 drugs such as pembrolizumab and nivolumab can achieve long-term survival for patients with metastatic melanoma [52]. PTEN, as a tumour suppressor gene [53], was found only in the normal class, suggesting that PTEN mutations may be a factor affecting immune properties. Therefore, we suggest that they are potential therapeutic targets to improve the immune response. Compared with the "low immune subtype", "high immune subtype" showed higher immune infiltration and immune activation characteristics and was significantly enriched in TGF-β signalling pathways [54,55]. TGF-β signalling pathways have been repeatedly reported [54,56,57] to be associated with immune regulation, and the use of TGF-β inhibitors may improve the immune response to immunotherapy. In recent years, inhibitors targeting TGF-β pathway have been discovered and investigated by pharmaceutical companies for cancer therapy, and some of them are in clinical trial now, such as Phase I study of GC1008 (fresolimumab), Ph II study of NCT01453361 (Gemogenovatucel-T) and so on [58,59]. In 2019, Kaczorowski et al. [60] suggested that overexpression of SMAD7 may be a new hallmark inhibitor of TGF-β in melanoma. In 2021, Liu et al. [61] proposed a triple combination therapy with PD-1/PD-L1, BRAF, and MEK inhibitor in stage III-IV melanoma as significantly improving PFS of patients. Based on the risk score between the three immune subtypes, "high immune subtype" showed better prognostic value. Therefore, we speculated that patients with "high immune subtype" biological characteristics can generally obtain a better immune prognosis and that the immune prognosis is improved if combined with TGF-β inhibitors. In the "low immune subtype", only one significantly mutated gene, NRAS, was found, which was considered a molecular feature of this subtype.
Although we divided the immune subtypes of cutaneous melanoma from the effects of tumour cells and the tumour microenvironment, there were still some inextricable links among various factors that we did not consider. We also have not considered the interaction between factors from comprehensive perspectives. Not with-standing its limitation, this study does suggest that the research of cancer typing based on immune function can be enhanced by using a sample-specific network. Further study is needed on how to combine the key factors that influence the response to cancer and treatment to conduct a complete analysis of cancer.

Conclusions
In summary, this comprehensive study classified cutaneous melanoma skin samples based on immune function and proposed three subtypes according to SING and the tumour microenvironment. Mutations in B2M and PTEN are considered potential therapeutic targets that can improve the immune response. "High immune subtype" patients can generally obtain a better immune prognosis effect if TGF-β inhibitors are combined.