Control of Glycosylation-Related Genes by DNA Methylation: the Intriguing Case of the B3GALT5 Gene and Its Distinct Promoters

Glycosylation is a metabolic pathway consisting of the enzymatic modification of proteins and lipids through the stepwise addition of sugars that gives rise to glycoconjugates. To determine the full complement of glycoconjugates that cells produce (the glycome), a variety of genes are involved, many of which are regulated by DNA methylation. The aim of the present review is to briefly describe some relevant examples of glycosylation-related genes whose DNA methylation has been implicated in their regulation and to focus on the intriguing case of a glycosyltransferase gene (B3GALT5). Aberrant promoter methylation is frequently at the basis of their modulation in cancer, but in the case of B3GALT5, at least two promoters are involved in regulation, and a complex interplay is reported to occur between transcription factors, chromatin remodelling and DNA methylation of typical CpG islands or even of other CpG dinucleotides. Transcription of the B3GALT5 gene underwent a particular evolutionary fate, so that promoter hypermethylation, acting on one transcript, and hypomethylation of other sequences, acting on the other, cooperate on one gene to obtain full cancer-associated silencing. The findings may also help in unravelling the complex origin of serum CA19.9 antigen circulating in some patients.


Introduction
The extreme complexity of multicellular organisms requires that their molecular and cellular interactions are tuned with absolute precision in terms of temporal and spatial organization. Compared with that of a simple multicellular organism of only a few hundred cells, such as C. elegans, the organization of a mammal of several billion cells is more complex by many orders of magnitude. This level of complexity requires that the information contained in the genetic code is translated into molecular and cellular interactions by mechanisms ensuring both fidelity and flexibility. While fidelity is easily guaranteed by the classical deterministic mechanisms at the basis of the template-driven biosynthesis of nucleic acids and proteins, flexibility requires highly tunable and promptly reversible mechanisms. Evolution has developed levels of control of gene expression and of protein function that fulfill these requirements: the first is represented by the epigenetic mechanisms regulating transcription; the second by posttranslational modifications of proteins, among which glycosylation plays a pivotal role.
Glycosylation is a metabolic pathway consisting of the enzymatic modification of proteins and lipids through the stepwise addition of sugars that gives rise to the glycoconjugates (or glycans): glycoproteins, glycolipids and proteoglycans. Unlike the biosynthesis of nucleic acids or proteins, which is a deterministic process, glycosylation is a stochastic (or probabilistic) event in which the final sugar chain product results from the cooperative and competitive interaction of multiple players. A pivotal role in glycosylation is played by glycosyltransferases, the enzymes responsible for the stepwise addition of individual sugar residues, but other factors, such as the availability of the sugar donors, of the protein or lipid substrates, or the action of the glycohydrolases, contribute to determine the full complement of glycans that cells produce (the glycome). Historically, protein- or lipid-linked sugar chains have been considered to play multiple biological roles [1], and in general, it can be said that they mediate the "fine tuning" of cellular and molecular interactions. This means that, besides contributing to biological processes through carbohydrate-protein interactions, sugar chains can also regulate protein-protein interactions. An example of a carbohydrate-protein interaction is provided by the cell adhesion phenomena mediated through members of the selectin family and their carbohydrate ligands, such as the sialyl Lewis antigens [2]. An example of protein-protein interaction finely tuned by the action of carbohydrates is provided by immunoglobulins G, where the sugar chain N-linked to asparagine 297 of their Fc portion determines the pro- or anti-inflammatory effects of the antibody [3].
Epigenetic chromatin modifications and protein or lipid glycosylation, although profoundly different from the mechanistic point of view, share important key features. First of all, they are both the product of non-deterministic processes. DNA methylation and histone methylation/acetylation depend on the relative abundance of the enzymes adding or removing methyl and acetyl groups, and glycosylation depends on that of enzymes adding or removing sugar residues. Second, both processes are reversible, although usually very stable. Third, although both modifications can have a huge functional impact on DNA and proteins, the basic information carried by these macromolecules resides at the level of their primary sequence.
Epigenetic mechanisms deeply impact the glycosylation machinery through either DNA methylation or histone methylation/acetylation, or both (many of them are reviewed in [4] and [5]). Relevant examples include the biosynthesis of bioactive molecules, such as the histo-blood group ABO, Lewis and Sda antigens, the T antigen, the mucins [6] and the GlcNAcylation of proteins, which, in turn, affects the epigenome per se [7,8].
Through epigenetic mechanisms of regulation, the genome can cope with environmental challenges without the need to select advantageous random mutations, a process that would require many generations. Analogously, an antibody with a given antigen specificity can change its downstream effects simply by adding or removing a specific monosaccharide on its N-linked chain.
It has been proposed that the epigenetic control of networks of genes, like those involved in glycosylation, is a crucial resource used by higher organisms to compete or collaborate with microorganisms [5] in response to environmental stimuli. Even more interesting, there is increasing evidence of the transgenerational transmission of epigenetic changes [9,10]; however, the molecular mechanisms responsible for the transmission of these changes to gametes remain unknown.
Despite such enormous potential for future research, the complex interplay among genetic and epigenetic mechanisms regulating the expression of glycosylation-related genes (glycogenes) is far from being elucidated.
In the last few decades, innumerable research articles have reported the silencing of genes determined by DNA methylation in the context of CpG islands located near promoters and the consequent possibility of reactivating them through the action of demethylating agents, with particular focus on cancer-associated gene silencing. This overwhelming stream of data points to the equation: DNA hypermethylation equals cancer equals gene silencing. However, it has been reported since the 1980s [11,12] that cancer transformation is indeed associated with a global DNA hypomethylation, since the majority of CpG dinucleotides, scattered in the genome at a relatively low density and far from gene promoters, are indeed hypomethylated in cancer [13,14]. In other words, only CG dinucleotides within DNA regions of high CpG density, i.e., CpG islands frequently located near gene promoters, are hypermethylated in cancer and hypomethylated in normal cells, while the opposite occurs for all other CpGs. While the strong association between CpG island hypermethylation and gene silencing in cancer has been widely documented and characterized for a long time, the role of the hypermethylation of the other CpG dinucleotides in normal cells versus hypomethylation in cancer has not been investigated until very recently. An increasing amount of recent data suggests that the methylation of CpG dinucleotides outside typical CpG islands plays a relevant role in alternative promoter usage, regulation of short and non-coding RNAs, alternative RNA processing and enhancer activity, even in the context of neoplastic transformation [15].
The aim of the present review is to briefly describe some relevant examples of glycogenes controlled through DNA methylation and to focus on the intriguing case of a glycosyltransferase gene, the regulation of which involves at least two promoters and a complex interplay between transcription factors, chromatin remodelling and DNA methylation of typical CpG islands or even of other CpG dinucleotides. This interplay appears responsible for efficient cancer-associated silencing and probably fine tissue-specific expression of a galactosyltransferase enzyme isoform (B3GALT5), which also underwent a particular evolutionary fate.

Methylation Control of Glycogenes
The global effect of methylation on the glycome was studied by high-throughput techniques in cells treated with 5-aza-2'-deoxycytidine (5AZA), a DNA methyltransferase inhibitor that has shown substantial potency in reactivating epigenetically silenced tumor suppressor genes. This analysis revealed a strong impact on sialylation, core fucosylation and N-linked branching [16]. Other relevant examples are the following.
Glycosyltransferases: A variety of glycosyltransferases are regulated by epigenetic mechanisms, mainly promoter methylation. Aberrant promoter methylation is often at the basis of glycosyltransferase modulation in cancer. Examples are provided by enzymes directly involved in selectin ligand biosynthesis [17], such as sialyltransferase ST3GAL6 [18] and fucosyltransferase FUT3 [19], as well as enzymes indirectly affecting selectin ligand biosynthesis, because of their involvement in the biosynthesis of alternative structures, such as ST6GALNAC6 [20], the sulfate transporter, DTDST [21], or B4GALNT2 [22-24]. The latter provides an example of a gene downregulated in cancer, in which promoter demethylation is a condition necessary, but not sufficient, to restore a physiological level of enzyme expression [25]. Furthermore, glycosyltransferases involved in the high branching of the N-linked chains of glycoproteins, such as GlcNAcT-IV [26,27] and GlcNAcT-V [28], display methylation-dependent modulation in cancer. Finally, the downregulation of the glycosyltransferase responsible for the biosynthesis of A antigen (of the ABO blood group system), frequently observed in cancer, is at least partially dependent on the hypermethylation of its promoter [29-31].
Enzymes of sugar nucleotides biosynthesis: The methylation-dependent inhibition of UDP-GlcNAc 2-epimerase/ManNAc kinase, the key enzyme of the biosynthesis of the sugar donor, CMP-sialic acid, may be responsible for decreased sialylation, even in the presence of unaltered sialyltransferase levels [32,33]. This modification can be induced by latent HIV infection of T-cells [32].
Galectins: Galectins are galactose-specific mammalian lectins that mediate a variety of biological phenomena, including cell proliferation and apoptosis, either through binding to galactose-containing glycoconjugates or through carbohydrate-independent intracellular mechanisms [34]. Promoter methylation appears to be the major mechanism regulating galectin expression in physiological and pathological conditions. Galectin-1, a product of the LGALS1 locus, induces apoptosis of T-lymphocytes and cancer cells. In colorectal cancer, galectin-1 expression is silenced by promoter hypermethylation, resulting in reduced apoptosis [35], while in mixed lineage leukemia (MLL)-rearranged B-lymphoblastic leukemias, galectin-1 is overexpressed because of histone methylation at the LGALS1 promoter [36]. Galectin-3 expression is frequently altered in cancers with divergent effects on tumor growth [37]. In normal prostate tissues, the galectin-3 promoter is unmethylated, while it becomes strongly methylated in the early stages of prostate cancer, but less methylated in high-grade cancers [38,39]. Galectin-3 modulation is controlled by promoter methylation also in thyroid cancer [40], in colon cancer of the mucinous type [41] and in pituitary tumors [42]. Galectin-7 acts as a tumor suppressor in gastric cancer, and its downregulation is due to promoter hypermethylation [43], while in lymphoma progression, it is upregulated because of promoter hypomethylation [44].

The Intriguing Case of the B3GALT5 Gene
β1,3 galactosyltransferase B3GALT5 is responsible for type 1 chain oligosaccharide synthesis, including the selectin ligand, sialyl-Lewis a, epitope of tumor marker CA19.9 and other Lewis antigens, such as Lewis a and Lewis b (Figure 1) [45,46]. In mammary gland, thymus and trachea, as well as in some human cancer cell lines, transcription is mainly driven by a promoter that was found to be sensitive to nuclear factor NF-Y also in mice [47]. Due to the conservation among distant mammalian species, it was referred to as the native promoter. In the organs of the gastrointestinal tract (such as the colon, stomach and pancreas), another promoter is active, which is stronger than the native promoter [47-49]. This alternative promoter has a retroviral origin (named LTR), was probably acquired about 25-30 million years ago [50] and is regulated through the hepatocyte nuclear factor HNF1 [48,49]. The B3GALT5 transcript, as a whole, is differentially expressed in various cell lines of different tissue origin, and even among those derived from the same tissue [45,51]. Moreover, it is strongly downregulated in colon cancer with respect to the normal mucosa [48,52]. Surprisingly, the amounts of transcription factors involved in the regulation of either transcript were not correlated with the expression levels of the cognate transcript. In particular, NF-Y is rather ubiquitous [53] and well detected in several cell lines, where the native B3GALT5 transcript is completely absent, or in colon cancer biopsies, where it is faintly detected [54]. Similarly, HNF1α or β, or both, are also easily detected in cell lines or tissues, including colon cancers, where the levels of the B3GALT5 LTR transcript are minimal or undetectable, raising the question of which mechanisms are responsible for tissue-specific expression and cancer downregulation of both transcripts [55].
Interestingly, the native promoter is located between two CpG islands, while the 650 bp-long LTR transposon, which includes a shorter LTR first exon and the cognate promoter, contains only seven dispersed CG pairs, and no CpG island is present in the proximal sequences (Figure 2).

Regulation of B3GALT5 Native Promoter
Methylation analysis of CpG islands 1 and 2 surrounding the native promoter, performed by both quantitative pyrosequencing and direct bisulfite sequencing of 12 and 66 CG dinucleotides of CpG islands 1 and 2, respectively, indicated a strong inverse correlation between transcript expression and DNA methylation status [54]. In particular, in HuCC-T1 cells, which express the highest level of the transcript, CpG island 2 appeared almost unmethylated, and CpG island 1 was only scarcely methylated (mean methylation levels ~20%). In MKN-45 and MCF-7 cell lines, where the expression levels of the transcript range from low to moderate, CpG island 1 was more methylated (mean methylation values ~80% in both lines), while CpG island 2 was almost unmethylated in MCF-7 and mildly methylated in MKN-45 (mean methylation levels ~30%). In HCT-15 and MDA-MB-231 cells, where the transcript is undetectable, both CpG islands were hypermethylated (from 70% to 90%). In matched normal and tumor colon samples, the methylation levels of both islands were increased in cancer with respect to the corresponding normal mucosa. In the latter, where the expression of the transcript is detectable at different levels, mean methylation values of CpG island 1 were below 50%, and those of CpG island 2 below 20%. In cancer samples, these values increased up to 60%-70% in CpG island 1 and up to 40% in CpG island 2. Similarly, the degree of methylation was higher in breast cancer biopsies than in normal counterparts, suggesting that DNA methylation of the B3GALT5 native promoter probably accounts for transcript silencing in cancer.
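To make explicit what a "mean methylation level" of an island summarizes, the following is a minimal Python sketch of how such a value can be derived from per-clone bisulfite sequencing calls. The 'M'/'U' encoding, the function names and the toy clones are assumptions introduced here for illustration only; they are not data or methods taken from the cited study [54].

```python
from statistics import mean

def per_site_methylation(clones):
    """Fraction of clones methylated at each CpG position.

    `clones` is a list of equal-length strings, one per sequenced clone,
    where 'M' marks a methylated CpG and 'U' an unmethylated one
    (a hypothetical encoding chosen for this sketch).
    """
    n_sites = len(clones[0])
    if any(len(c) != n_sites for c in clones):
        raise ValueError("all clones must cover the same CpG sites")
    return [sum(c[i] == "M" for c in clones) / len(clones)
            for i in range(n_sites)]

def mean_island_methylation(clones):
    """Mean methylation (percent) across all CpG sites of one island."""
    return 100 * mean(per_site_methylation(clones))

# Made-up calls for a 12-CpG region (not data from the cited study).
toy_clones = [
    "MMUUUUUUUUUU",
    "MUUUUUUUUUUU",
    "UUUUUUMUUUUU",
    "MUUUUUUUUUUM",
]
print(round(mean_island_methylation(toy_clones), 1))
```

Averaging per-site fractions, rather than pooling all calls, keeps the per-site profile available, which is useful when individual CG dinucleotides rather than whole islands are of interest.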
The quantitative ChIP assay from the above-mentioned cell lines expressing different levels of B3GALT5 native transcript confirmed that the transcriptional activity of the native promoter is associated with chromatin status. In fact, high expression of the transcript (HuCC-T1 cells) was found together with high levels of modifications associated with transcriptionally competent chromatin (H3K4me3, H3K79me2, H3K9Ac and H3K9-14Ac) and low levels of those related to silenced chromatin (H3K27me2 and H4K20me3). Moderate to low expression of the transcript (MCF-7 and MKN-45 cells) was associated with a similar pattern, with quantitative differences especially in MKN-45 cells. The absence of the transcript (HCT-15 and MDA-MB-231 cells) was related to the opposite histone code. These cell lines were, in fact, negative for H3K4me3, H3K79me2 and H3K9-14Ac and positive for H3K27me2, H4K20me3 and H3K9Ac.
Treatment of the non-expressing cells (HCT-15 and MDA-MB-231) for different times with different concentrations of the DNA methyltransferase inhibitor 5AZA or the histone deacetylase inhibitor Trichostatin A (TSA), or with a combination of both, was unable to restore detectable expression. However, the treatment was able to reduce methylation from ~80% to ~60% in MDA-MB-231 and from ~70% to ~40% in HCT-15. In contrast, both agents were able to increase the expression of the transcript in MKN-45 cells, which express a low amount of the transcript at baseline. In particular, either TSA or 5AZA treatment increased the expression by ~80%. A combination of both drugs failed to provide any further improvement. In line with the expression findings, pyrosequencing analysis of MKN-45 cells treated with 5AZA showed demethylation of both islands. Altogether, the results suggest that complex epigenetic modulation underlies the regulation of the native B3GALT5 promoter (Table 1).
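Because methylation levels are themselves percentages, a drop "from ~80% to ~60%" can be read either in percentage points or as a relative reduction. The short Python sketch below works through both readings for the approximate values quoted above; the function name is hypothetical, and the inputs are the rounded figures from the text rather than exact measurements.

```python
def methylation_change(before_pct, after_pct):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before_pct - after_pct
    relative = 100 * absolute / before_pct
    return absolute, relative

# Approximate values quoted in the text for 5AZA/TSA-treated cells.
for cell_line, before, after in [("MDA-MB-231", 80, 60), ("HCT-15", 70, 40)]:
    abs_drop, rel_drop = methylation_change(before, after)
    print(f"{cell_line}: -{abs_drop} points, {rel_drop:.0f}% relative reduction")
```

On these rounded figures, the residual methylation after treatment remains substantial in both lines, which is consistent with the lack of detectable re-expression reported above.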

Regulation of the B3GALT5 LTR Promoter
In MKN-45 cells, the LTR transcript of B3GALT5 is also expressed, and its amount under steady-state conditions is about eight times higher than that of the native transcript. Surprisingly, the sensitivity to the same drugs was totally different: TSA treatment had no effect on the LTR transcript, while 5AZA strongly impaired expression. Treatment with 5AZA of COLO-205 cells, which express the LTR transcript at the highest levels found, provided similar results: strong silencing of the B3GALT5 LTR transcript and no effect on HNF1. Interestingly, LTR promoter analysis in vitro indicated that the HNF1 binding site is the only functional part of the LTR promoter, and that no other binding sites, for stimulatory or inhibitory factors, are physiologically relevant [55]. Conversely, DNA demethylation obtained through 5AZA treatment reproduced in vitro the downregulation of the transcript observed among cell lines and cancer biopsies in vivo. In fact, in treated cells, the levels of the B3GALT5 LTR transcript decreased from 3-10 to less than 0.2 fg/pg β-actin, while the amounts of HNF1 remained unchanged, as found in colon cancer biopsies. Since LTR and proximal sequences do not contain CpG islands, the methylation-sensitive DNA sequences probably represent element(s) involved in transcriptional regulation residing outside the LTR sequence and distant from the promoter. Alignment of the LTR sequence and the whole B3GALT5 gene in the context of chromosome 21 (Figure 2) revealed that the CpG islands regulating transcription of the native B3GALT5 mRNA are the only typical promoter-associated CpG islands present. However, shorter stretches of CpG dinucleotides, referred to as CpG short islands, were detected using the EMBOSS Cpgplot software, as reported [55]. One of them was found in an intron, and the others in the intergenic regions [55].
Unfortunately, due to the extremely high homology of this human sequence with that of the other primates sharing the LTR transposon [50], no prediction can be made in silico about the relevance of any such islands.
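The distinction between typical CpG islands and shorter CpG-rich stretches rests on simple sequence statistics: the GC content of a window and the ratio of observed to expected CG dinucleotides. The Python sketch below illustrates this kind of sliding-window scan. It is not a reimplementation of EMBOSS Cpgplot, whose algorithm differs in detail; the function names are hypothetical, and the default thresholds (100 bp window, GC content above 50%, observed/expected ratio above 0.6, minimum length 200 bp) are assumed here as commonly used island criteria, not settings confirmed by the source.

```python
def cpg_stats(seq):
    """Return (GC percent, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # 'CG' cannot overlap itself, so count() is exact
    gc_pct = 100 * (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_pct, obs_exp

def find_cpg_rich_regions(seq, window=100, min_len=200,
                          min_gc=50.0, min_oe=0.6):
    """Merge consecutive passing windows and keep regions >= min_len.

    Returns a list of (start, end) pairs, 0-based with exclusive end.
    Lowering min_len lets shorter CpG-rich stretches through.
    """
    seq = seq.upper()
    regions, start = [], None
    last = len(seq) - window
    for i in range(last + 1):
        gc_pct, obs_exp = cpg_stats(seq[i:i + window])
        passing = gc_pct > min_gc and obs_exp > min_oe
        if passing and start is None:
            start = i
        elif not passing and start is not None:
            end = i - 1 + window  # end of the last passing window
            if end - start >= min_len:
                regions.append((start, end))
            start = None
    if start is not None and len(seq) - start >= min_len:
        regions.append((start, len(seq)))
    return regions

# Synthetic test sequence (not a real genomic region).
toy = "AT" * 100 + "CG" * 150 + "AT" * 100
print(find_cpg_rich_regions(toy))
```

In a scan of this kind, relaxing the minimum length is what allows "short islands" to be reported alongside canonical ones, so any such call depends directly on the thresholds chosen.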
The methylation of stretches of CpG dinucleotides shorter than CpG islands associated with promoters is emerging as a relevant aspect of transcriptional control [15], being responsible for the recruitment of alternative promoters, regulation of non-coding RNA synthesis or modulation of enhancer activity. In particular, hypomethylation of enhancer sequences is reported to negatively regulate transcription in cancer and during tissue differentiation [15]. The occurrence of distal regulatory elements binding transcription factors in a methylation-dependent manner was recently reported even in breast cancer [56]. B3GALT5 transcription thus represents a promising model to address such novel issues, since hypomethylation of distant sequences, acting on the LTR transcript, and promoter hypermethylation, acting on the native transcript [54,55], cooperate on one gene to obtain full cancer-associated silencing.
Since the two transcripts differ in their 5'UTRs only, the involvement of a common non-coding RNA in the regulation of both can be hypothesized. However, since the effects of demethylating agents on the two transcripts are almost opposite, the existence of a regulatory 3' CpG island sequence appears unlikely. On the other hand, the involvement of transcriptional activators requiring the methylation of specific CG dinucleotides for efficient binding can be postulated. In this regard, we have started performing bisulfite sequencing of stretches of CG dinucleotides located at various distances from the LTR promoter. Preliminary results in cancer cell lines suggest an association between the methylation of specific CG dinucleotides and the expression levels of the LTR transcript, thus supporting such a working hypothesis.
The fine methylation-dependent silencing of B3GALT5 reported in colon cancer has the potential to represent a wider cancer-associated phenomenon, suggesting that many cancers, including those arising in the pancreas or stomach, may lack the expression of B3GALT5 transcripts and cognate Lewis antigens. In light of very recent findings demonstrating that the CA19.9 detected by immunohistochemistry in cancer specimens seems to be a technical artifact [57], the problem of the origin of CA19.9 circulating in cancer patient sera appears very relevant due to the clinical implications. In fact, the assessment of the actual CA19.9 status of a colon cancer may be very important for the prognosis, since the expression of the true antigen promotes angiogenesis and, in turn, tumor growth [57]. On the other hand, recent findings indicate that the value of serum CA19.9 is not able to predict the actual expression of the antigen by colon cancer cells, as has been assumed so far [58]. Our working hypothesis involves the possibility that mainly normal epithelial cells of gastrointestinal origin express B3GALT5 and cognate type 1 chain Lewis antigens, due to the strong epigenetic constraints. Metabolic pathways of such cells may be deranged as a consequence of the invading tumor mass, resulting in an increased reabsorption of the CA19.9 antigen into the blood stream. Interestingly, in cells of different histological origin, such as those of the prostate, it was recently reported that a different β1,3 galactosyltransferase isoenzyme, B3GALT1, is specifically involved in the biosynthesis of a defined CA19.9 molecule, where the sLea antigen is carried by MUC1 mucin, and that this biosynthesis appears upregulated by the enhancement of acetylated histone-3 and histone-4 induced by suberoylanilide hydroxamic acid [59].

Conclusions
The role of DNA methylation has been restricted for several years to CpGs belonging to islands located near gene promoters, with particular emphasis on their hypermethylated status associated with gene silencing in cancer and on the possibility of restoring expression using demethylating agents. The B3GALT5 gene offers an intriguing example of the role of distant DNA sequences in controlling a promoter of retroviral origin, where CpG dinucleotides are probably dispersed or assembled in non-canonical islands. In this case, demethylation determines gene silencing and is associated with cancer. Concurrently, another promoter of the same gene is instead affected by the methylation status of classic CpG islands, but its activity is poorly restored by demethylating agents. Notably, the transcript levels correlate with enzyme activity and, in turn, substrate glycosylation in the case of B3GALT5 [51], while this is not yet known for some other epigenetically regulated glycogenes. Future studies are necessary to directly show the link between epigenetic regulation of transcription and the actual effect on the cell glycosylation pattern.