Novel Genetic Variants of ALG6 and GALNTL4 of the Glycosylation Pathway Predict Cutaneous Melanoma-Specific Survival

Because aberrant glycosylation is known to play a role in the progression of melanoma, we hypothesize that genetic variants of glycosylation pathway genes are associated with the survival of cutaneous melanoma (CM) patients. To test this hypothesis, we used a Cox proportional hazards regression model in a single-locus analysis to evaluate associations between 34,096 genetic variants of 227 glycosylation pathway genes and CM disease-specific survival (CMSS) using genotyping data from two previously published genome-wide association studies. The discovery dataset included 858 CM patients with 95 deaths from The University of Texas MD Anderson Cancer Center, and the replication dataset included 409 CM patients with 48 deaths from Harvard University nurse/physician cohorts. In the multivariable Cox regression analysis, we found that two novel single-nucleotide polymorphisms (SNPs) (ALG6 rs10889417 G>A and GALNTL4 rs12270446 G>C) predicted CMSS, with an adjusted hazards ratios of 0.60 (95% confidence interval = 0.44–0.83 and p = 0.002) and 0.66 (0.52–0.84 and 0.004), respectively. Subsequent expression quantitative trait loci (eQTL) analysis revealed that ALG6 rs10889417 was associated with mRNA expression levels in the cultured skin fibroblasts and whole blood cells and that GALNTL4 rs12270446 was associated with mRNA expression levels in the skin tissues (all p < 0.05). Our findings suggest that, once validated by other large patient cohorts, these two novel SNPs in the glycosylation pathway genes may be useful prognostic biomarkers for CMSS, likely through modulating their gene expression.


Introduction
Cutaneous melanoma (CM) is one of the most lethal malignancies of the skin [1]. It is estimated that 96,480 new CM cases will be diagnosed in 2019 in the United States, accounting for about 5.5% of all new cancer cases, and 7230 patients will die of the disease in 2019 [2]. Age, sex, distant metastasis, ulceration, mitosis, and Breslow thickness are known to affect the prognosis of CM patients [3]. In addition, genetic variants in some genes of critical biological pathways may also play an important role in the prognosis [4].
Single-nucleotide polymorphisms (SNPs) are the common form of genetic variants that may affect gene expression and functions, likely leading to the development and progression of CM [5]. However, in genome-wide association studies (GWASs), few functional SNPs have been found to be associated with the prognosis of CM patients [6,7]. This is because a hypothesis-free GWAS always focuses on the most important SNP/genes with a strict p value after multiple tests correction for a large number of SNPs in some nascence comparisons. In the post-GWAS era, one can use available genotyping data from multiple previously published GWAS datasets in a hypothesis-driven approach to perform a biological pathway gene-set analysis with a narrow focus on the SNPs for more relevant comparisons. This approach gives investigators an improved statistical power to more likely identify novel functional loci with minor but detectable effects; therefore, investigators are able to further examine the functional relevance of these loci to unravel potential mechanisms underlying the observed associations [8].
Glycosylation refers to the formation of glycoside bonds between carbohydrates and amino acid residues in proteins catalyzed by glycosyltransferases, which eventually leads to changes in the protein products in cells [9]. As a post-translational modification, glycosylation plays an important role in the regulation of cellular protein functions during cell growth and differentiation, likely affecting the progression of tumor cells [10]. Thus far, hundreds of glycosyl groups have been identified as capable of binding to proteins or lipids and glycosylating to form glycoproteins, glycolipids, and glycans outside the cell membrane [11].
Depending upon a particular glycan, glycosylation is divided into N-linked glycosylation, O-linked glycosylation, C-mannosylation, glypiation, and phospho-glycosylation [12]. For example, the ALG6 alpha-1,3-glucosyltransferase (ALG6) gene, located on chromosome 1p31.3, encodes a protein that adds the first glucose residue to a lipid-linked oligosaccharide precursor, which is essential for N-linked glycosylation [13]. Another polypeptide N-acetylgalactosaminyltransferase 18 (GALNTL4) gene, also known as GALNT18, is located on chromosome 11p15.4, encoding a protein catalyzing the initial reaction in O-linked oligosaccharide biosynthesis [14]. In general, glycosylation of proteins can affect the spatial structure and stability of peptide chains and participate in cell signal transduction, recognition and adhesion, receptor activation, and other biological behaviors. Thus, aberrant glycosylation and modification may affect the proliferation, apoptosis, invasion, metastasis, drug resistance, and immune escape of tumor cells [15]. For example, using a systems biology approach to assess glycosylation in matched samples of primary and metastatic melanoma, one study found an increased core fucosylation that was mediated by fucosyltransferase 8 in metastatic melanoma [16]; another study determined that fucosyltransferase 8 could facilitate invasion and tumor dissemination, in part due to a reduced cleavage of the cell adhesion molecule L1 [17]; and others identified aberrant glycosylation and related molecules as potential therapeutic targets in cancer treatment, because a disaccharide-based inhibitor of glycosylation could attenuate metastatic melanoma cell dissemination [18].
Since glycosylation may play an important role in the progression and metastasis of melanoma, we hypothesize that genetic variants of glycosylation pathway genes are associated with survival in CM patients. Therefore, we tested this hypothesis by using genotyping data from publicly available melanoma GWAS datasets.

Patient Features
The discovery dataset included 858 CM patients from The University of Texas MD Anderson Cancer Center (MDACC), and the replication dataset included 409 CM patients from the Nurses' Health Study (NHS) and Health Professionals Follow-up Study (HPFS); and the baseline characteristics of these patients have been described elsewhere [19,20]. In the MDACC discovery dataset, CM patients were between 17 and 94 years old at diagnosis (52.4 ± 14.4 years old), with a median follow-up time of 81.1 months. There were more males (496, 57.8%) than females (362, 42.2%) and many more diagnosed with stages I/II (709, 82.6%) than with stages III/IV (only one stage IV case) (149, 17.4%). In the NHS/HPFS replication dataset, patients' age ranged between 34 and 87 years at diagnosis (61.1 ± 10.8 years old), and 66.3% (271) were females. These patients experienced a comparatively longer median follow-up time (179.0 months). The death rates, however, were similar between the MDACC (95/858, 11.1%) and NHS/HPFS (48/409, 11.5%) datasets (Table S1). As none of the principal components were significantly associated with CM survival, no noticeable population stratification in either the MDACC or NHS/HPFS datasets was found; therefore, we did not adjust for these principal components in either the discovery or replication analyses. Figure 1 shows the flowchart of the present study design. We first performed a single-locus analysis for associations of 4,770 genotyped and 29,326 imputed SNPs in the 227 glycosylation pathway genes with CMSS. We found that 1,564 SNPs were associated with CMSS (p < 0.05) in an additive genetic model. After multiple test correction by Bayesian false discovery probability (BFDP) < 0.8, 1362 SNPs remained noteworthy; after subsequent replication in the NHS/HPFS dataset, 11 SNPs in five genes remained significantly associated with CMSS. These 11 newly identified SNPs remained statistically significant in the meta-analysis of the two datasets without obvious heterogeneity (Table 1).

Independent SNPs to be Associated with CMSS
We included the 11 significant SNPs, together with other covariates, in a stepwise multivariable Cox regression analysis for the MDACC dataset.
We found that five SNPs (rs10889417, rs672748, rs13297246, rs12270446 and rs7287710) in five genes (ALG6; GALNT10 polypeptide N-acetylgalactosaminyltransferase 10; B4GALT1 beta-1,4-galactosyltransferase 1; GALNTL4; and LARGE LARGE xylosyl-and glucuronyltransferase 1) remained significantly associated with CMSS (p < 0.05). Then, we expanded the model by further including 40 previously reported significant survival-associated SNPs from the MDACC GWAS dataset; two of the newly identified SNPs (ALG6 rs10889417 and GALNTL4 rs12270446) remained independent and significantly associated with CMSS (Table 2). Specifically, we observed a significant protective effect of the ALG6 rs10889417 A allele (P trend = 0.023) and the GALNTL4 rs12270446 C allele (P trend = 0.015) on CMSS. These effects were also replicated in the NHS/HPFS dataset (P trend = 0.032 and 0.020, respectively) and in the combined MDACC and NHS/HPFS datasets (P trend = 0.024 and 0.001, respectively) ( Table 3). Kaplan-Meier survival curves were plotted to visually show the associations between CMSS and the genotypes of ALG6 rs10889417 and GALNTL4 rs12270446, respectively ( Figure 2 a-f). The results of all the selected SNPs are also summarized in a Manhattan plot ( Figure S1). Figure S2 provides the quantile-quantile plot of all the SNPs we used. The regional association plot for each of the two independent and statistically significant SNPs is shown in Figure S3. By using the versatile gene-based association study (VEGAS) method, we performed the gene-based test and identified 7 of 221 genes as having an empirical p value <0.05. Although no significant gene-based test statistic was found for ALG6 and GALNTL4, the top SNPs in both of these two genes had a significant p value <0.05 (Table S2).

Survival of CM Patients with Combined Protective Genotypes
To substantiate the associations between the genotypes of the two independent SNPs and CMSS, we combined the protective genotypes of ALG6 rs10889417 GA+AA and GALNTL4 rs12270446 GC+CC into one variable. Patients were divided into three groups according to the number of protective genotypes (NPGs), and the trend test for a dose-response effect of NPG was statistically significant. Specifically, after adjustment for covariates, wherever appropriate, the effect on CMSS was statistical associated with an increased NPG in the MDACC dataset (P trend = 0.001), the NHS/HPFS dataset (P trend = 0.005), and the MDACC and NHS/HPFS combined dataset (P trend = 0.0003) ( Table 3).

Stratified Analysis for Combined Protective Genotypes' Effect on CMSS
Next, the stratified analysis was performed to test whether the combined effect of the protective genotypes on CMSS was modified by clinicopathological covariables, including age, sex, Breslow thickness, tumor stage, mitotic rate, and ulceration in the MDACC dataset, but only age and sex in the NHS/HPFS dataset. Compared with those with 0 NPG, patients with 1-2 NPG had a significantly better survival, particularly evident in the subgroups aged >60 years, female, regional/distant metastasis, Breslow thickness >1 mm, no ulceration, and mitotic rate >1 in the MDACC dataset and the subgroup aged >60 years in the NHS/HPFS dataset. However, there were no significant interactions between these strata (p > 0.05 for all strata, Table S3).

Receiver Operating Characteristic (ROC) Curve and Internal Replication
To further evaluate predictive effects of the two independent SNPs, the time-dependent area was generated using the area under receiver curve (AUC) function of the ROC curve for CM patients in the MDACC and NHS/HPFS datasets in the presence of other covariables. In the MDACC dataset, the predictive performance of 5-year CMSS ROC was modified by the risk genotypes that were added to the model, with the AUC increasing from 65.88% to 72.01% with other covariables (age and sex) as classifiers; however, the change was not statistically significant (p = 0.821) ( Figure S4b). In the NHS/HPFS dataset, the predictive performance of 5-year CMSS ROC was modified by protective genotypes added to the model, with the AUC increasing from 54.05% to 67.26%, with other covariates (age and sex) as classifiers, and the change was statistically significant (p = 0.031) ( Figure S4c,d).

In Silico Functional Validation
With the online tools for predicting putative functions of genetic variants, we found that the rs10889417 A allele was significantly associated with increased expression levels of ALG6 mRNA in cultured skin fibroblasts (p = 0.015, Figure 2j) and whole blood cells (p = 0.001, Figure 2j). Besides, the GALNTL4 rs12270446 C allele was correlated with lower mRNA expression levels in normal tissue samples from 517 donors' suprapubic skin and 605 donors' sun-exposed lower leg skin from the genotype-tissue expression (GTEx) project (all p < 0.05, Figure 2k). Additionally, we also performed expression quantitative trait loci (eQTL) for the correlations between corresponding mRNA expression levels and genotypes of SNPs of the same gene in 373 normal lymphoblastoid cell lines from the 1000 Genomes Project database; however, we did not find any evidence for such a correlation ( Figure S5).
The functional prediction was then performed for the two independent SNPs by the online bioinformatics tools SNPinfo [21], RegulomeDB [22], and Haploreg [23] in order to predict their functions, the results of which are summarized in Table S4. Based on experimental data from the ENCODE project available for the 227 glycosylation pathway genes (Table S5), the rs10889417 is located on the Pitx2 motif ( Figure S6). By using the PROMO online program, we also found that ALG6 was a potential target gene of the transcription factor Pitx2 (Table S6). To clarify the effect of gene expression on the survival of cancer patients, we used an online tool based on the TCGA dataset to compare the survival of patients by the cuff-values at the 30th upper percentile and 30th lower percentile of the corresponding gene. We found that the expression levels of ALG6 seemed not to be substantially associated with melanoma survival (p = 0.671) ( Figure S7a). On the other hand, the higher expression levels of GALNTL4 seemed to be associated with a poorer melanoma survival, but the correlation was also not statistically significant (p = 0.084) ( Figure S7b). However, the higher expression levels of ALG6 were associated with a better survival in colon adenocarcinoma (p = 0.028) and lung squamous cell carcinoma patients (p = 0.038), respectively ( Figure S7c,e). Meanwhile, the higher expression levels of GALNTL4 seem to be associated with a poor survival probability in colon adenocarcinoma (p = 0.385) and stomach adenocarcinoma patients (p = 0.088), respectively; however, the correlation was also not statistically significant (Figure S7d,f).

Discussion
In the present study, we investigated the associations between 34,096 genetic variants of 227 glycosylation pathway genes and CMSS by using a two-phase analysis of genotyping data from two previously published GWAS datasets: A hospital case-control dataset for the discovery and a cohort follow-up dataset for the replication. We found that two SNPs (i.e., ALG6 rs10889417G>A and GALNTL4 rs12270446G>C) were independently associated with the survival of CM patients. In subsequent genotype-mRNA expression correlation analysis, we found that the low death-risk-associated rs10889417 A allele was associated with increase in ALG6 mRNA expression levels in cultured skin fibroblasts and whole blood cells and that the rs12270446 G allele was associated with decrease in GALNTL4 mRNA expression levels in skin tissues.
Since about half of human proteins are glycoproteins, and glycosylation is one of the most important post-transcriptional modifications, it is undeniable that glycosylation of different proteins plays a key role in multiple cellular activities, including tumor proliferation, invasion, metastasis, tumor-induced immune regulation, and drug resistance [10]. For example, the biological function of integrin is related to the adhesion of the extracellular matrix, and integrin is also found to be one of the abundant proteins in metastatic melanoma cells [30]. One study has identified that integrin-reduced cell adhesion is the result of modification of N-acetylglucosaminyltransferase III, which is involved in the N-glycosylation process [31].
Although the present study suggested that the expression of ALG6 seemed not to have a significant effect on CM survival, a high ALG6 expression was associated with a better overall survival (OS) in colon adenocarcinoma and lung squamous cell carcinoma patients. In a previous animal study, however, we noted that knockdown of the mouse Alg6 gene increased melanoma lung metastasis without affecting primary tumor growth, which may lead to a shorter survival [32]. However, the gene expression profiles in tumor tissues are more likely affected by mutations in the driver genes commonly seen in tumor tissues. Therefore, further epidemiological investigation and tissue-based functional experiments are needed to clarify these discrepancies in the future.
Mutations in ALG6 may lead to congenital disorders of glycosylation (CDG), called ALG6-CDG, of which missense mutations P. A333V and P. I299Del are the most common mutations [33]. Some patients with CDG have severe immune deficiencies, likely leading to tumorigenesis [34]. For example, in acquired immunodeficiency patients with CM, such as those with human immunodeficiency virus infection, tumors may have explosive progress [35] while in the melanoma immune response, the inflammation and tumor immune response have shown significant abnormalities, leading to the progress of the disease [36,37]. These suggest that a possible effective treatment for CM may be immunotherapy, which has recently become important in improving the prognosis of melanoma patients [38]. Although the ALG6 gene malfunction may play a role in the poor prognosis of melanoma, we also showed that the mutation rate of ALG6 in melanoma tissues was less than 2.56%, and thus the overall expression levels of ALG6 in melanoma tissues are more likely to be affected by SNPs. Further supporting evidence is that ALG6 rs10889417 is located on the Pitx2 motif, indicating its potential regulatory roles in ALG6 mRNA expression. Because of the close relationship established between ALG6 and N-linked glycosylation and melanoma cell activity [39], genetic variation in ALG6 is likely to play a role in CM progression and prognosis.
No published study has investigated the role of GALNTL4 in CM tumor progression and survival. However, previous GWAS studies have shown an association between genetic variants of GALNTL4 and protective effects against differentiated thyroid cancer [40], and GALNTL4 SNPs are reportedly associated with the cisplatin sensitivity of urothelial cancer [41] and the efficacy of gemcitabine combined with platinum in the chemo treatment of bladder urothelial carcinoma [42]. In the present study, we showed that the rs12270446 G>A located in the intron of GALNTL4 was associated with CMSS. Given the fact that the mutation rate of GALNTL4 in melanoma tumors is low, it is likely that genetic variant-associated GALNTL4 gene expression may be the mechanism underlying the observed association, which deserves further investigation.
The present study is subject to several limitations. First, the two GWAS datasets we employed were both from Caucasian populations, which may not be generalizable to the general population. Second, all the participants in the two GWAS studies may have been treated with distinctive therapies that were not made available for our analysis. However, we believe these therapy regimens, if any, might not have been selective to the genetic variation of the patients. Third, we obtained the glycosylation-related genes from the GSEA/MSigDB website, a major publicly recognized dataset, but we might have missed some other unknown and important genes involved in this metabolic pathway. Fourth, the literature has well documented that the prognosis of stage III/IV melanoma is significantly different. Unfortunately, we had only one case of stage IV CM cases in the MDACC dataset. This made it impossible for us to perform stratified analysis by stages III and IV separately. Lastly, we were not able to investigate the biological mechanisms to understand how ALG6 rs10889417 G>A and GALNTL4 rs12270446 G>C influence CMSS, which should be further investigated in the future.

Study Populations
The discovery analysis used genotyping data from the MDACC melanoma GWAS study, and the replication analysis used genotyping data of the GWAS dataset from NHS/HPFS. In the MDACC study, all patients from a hospital-based case-control CM study were recruited among non-Hispanic white patients. In the NHS/HPFS study, patients were those who developed CM during the course of follow-up. Detailed descriptions of the subject selection and data collection for these two GWASs have been published elsewhere [19,20]. According to the protocol approved by the Institutional Review Boards of the MDACC, Brigham and Women's Hospital, and Harvard T.H. Chan School of Public Health, all subjects provided a written informed consent and the registration form for participation as required.

Gene Selection and SNP Genotyping
We selected 227 glycosylation pathway genes located on the autosomes by inquiring the Molecular Signatures Database of the GSEA website [43,44] (Table S5). Genomic DNA was extracted from whole blood cells in the MDACC GWAS dataset and used for genotyping by the Illumina HumanOmni-Quad_v1_0_B array. The National Center for Biotechnology Information Database of Genotypes and Phenotypes (dbGaP Study Accession: phs000187.v1.p1) provided the genotyping data. Based on the 1000 Genomes Project, phase I v2 CEU (March 2010 release), utilizing the MACH software, we performed the genome-wide imputation. We included both typed (having a genotyping success rate of 95% with a Hardy-Weinberg equilibrium p value of 10 -5 ) and imputed (r 2 > 0.8) common SNPs (a minor allele frequency of 0.05) within ±2 kilobase flanking regions of these glycosylation pathway genes. In the NHS/HPFS GWAS dataset, whole blood DNA samples were used for genotyping with the HumanHap610 array, Affymetrix 6.0 array, and Illumina HumanHap550 array, and imputation was based on the haplotype information and genotyped SNPs from phase II HapMap CEU data (March 2012 release), applying the program (MACH March 2012 release) using a similar quality control to that for the MDACC GWAS dataset. For each glycosylation-related SNP, quantile-quantile and Manhattan plots were generated to summarize the genome-wide meta-analysis.

Statistical Methods
In the MDACC discovery analysis using a multivariable Cox proportional hazards regression model, we first assessed in a single-locus analysis the associations between selected SNPs in 227 glycosylation pathway genes and CMSS by calculating HR and its 95% CI using R software (GenABEL package) [45]. After being adjusted for other covariaales, including age, sex, Breslow thickness, ulceration, tumor stage, and mitotic rate, the multivariable analysis was performed. In the NHS/HPFS replication analysis, however, the only covariaales available for adjustments were age and sex. Survival time was defined as the time between the dates of diagnosis of CM to the date of death. Patients known to be alive were censored at the time of the last contact.
Since most SNPs were estimated with a high level of linkage disequilibrium (LD), we utilized BFDP with a threshold of r 2 =0.8 for LD in multiple test corrections, as recommended [40], rather than the false discovery rate. We also used a prior probability of 0.1 to detect an HR of 2.0, which was related to the variant genotype or minor allele of SNP (p < 0.05). Next, in performing stepwise multivariable Cox regression analysis with those validated SNPs for selecting independent tagging SNPs, we used the MDACC dataset that had more detailed covariate information. We then conducted a meta-analysis to combine the estimates from the MDACC dataset with those in the NHS/HPFS dataset using PLINK 1.90 with the Cochran's Q statistics and I 2 . A fixed-effects model was used, because no significant heterogeneity was found between the two datasets (Q test p > 0.1 and I 2 < 25.0%). We also performed gene-based tests by using the VEGAS approach that was integrated in the VEGAS2 program [46,47]. In brief, for a given gene with n low LD SNPs, the correlation p value was first transformed into a Chi-squared statistic with one degree of freedom. Then, the gene-based test statistics were calculated by adding all Chi-squared statistics within the gene. A large number of simulations were carried out with multivariable normal distribution, and the proportion of simulated test statistics based on the empirical p-value of gene is more than that of observed test statistics based on the gene. Kaplan-Meier estimation of the survival functions and log-rank test was carried out, and the comprehensive effect of the visual protective genotype was taken as a genetic score for predicting CMSS.
ROC curves were further constructed to illustrate the ability of the time-dependent AUC to predict CMSS. The ROC plots were generated by using the R packages "survival" and "timeROC" [48]. For the stratified analyses by subgroup, we calculated the inter-study heterogeneity and assessed the interaction between strata. For the significant and independent SNPs identified from the multivariable analysis, functional prediction was performed by using bioinformatics online tools: RegulomeDB [22,49], SNPinfo [21,50], and HaploReg [23,51]. The transcription factors in the promoter regions were predicted with PROMO online tools [52].
By using R (version 3.5.0) software for linear regression analysis, the eQTL were further analyzed to evaluate the correlation between SNPs and the mRNA expression levels of the genes. The mRNA expression data of these genes were obtained from 373 lymphoblastic cell lines of European descent included in the 1000 genome project [43] and GTEx project version 8 [44]. Next, we evaluated the associations of mRNA expression with the OS of additional cancers through Kaplan-Meier analysis in colon adenocarcinoma, lung squamous cell carcinoma, and stomach adenocarcinoma patients [53]. Unless otherwise specified, all statistical analyses were carried out using SAS software (version 9.4; SAS Institute, Cary, NC).

Conclusions
Two independent SNPs (i.e., ALG6 rs10889417G>A and GALNTL4 rs12270446G>C) were found to be significantly associated with CMSS in the MDACC discovery and NHS/HPFS replication datasets. The combined analysis showed that these two SNPs were significantly associated with survival, and patients' better prognosis may be achieved through more protective genotypes influencing gene expression. Our findings provide some new insights for further functional studies to identify potential molecular mechanisms underlying the observed CM survival.
Supplementary Materials: The following materials are available online at http://www.mdpi.com/2072-6694/12/2/ 288/s1, Table S1: Distributions of characteristics of CM patients in MDACC and NHS/HPFS; Table S2: Significant results of glycosylation related genes in the gene-based test with the VEGAS method; Table S3: Stratified analysis of protective genotypes of the identified SNPs in the MDACC and NHS/HPFS datasets; Table S4: Function prediction of two independent SNPs in GALNTL4 and ALG6; Table S5: List of 227 selected glycosylation pathway genes. Figure S1: Manhattan plots of associations between SNPs and CMSS in the MDACC dataset (a) and the NHS/HPFS dataset (b). Figure S2: Quantile-quantile plot of all SNPs in the MDACC GWAS dataset. Figure S3: Regional association plots contained 100-kb up and downstream of the gene regions in (a) ALG6 and (b) GALNTL4. Figure S4: Time-dependent AUC and ROC curves for five-year CMSS prediction. Figure S5: Associations between genotypes and their corresponding mRNA expression levels. Figure S6: Functional prediction of SNPs in the ENCODE project. Figure S7