N-Glycomic and Transcriptomic Changes Associated with CDX1 mRNA Expression in Colorectal Cancer Cell Lines

The caudal-related homeobox protein 1 (CDX1) is a transcription factor, which is important in the development, differentiation, and homeostasis of the gut. Although the involvement of CDX genes in the regulation of the expression levels of a few glycosyltransferases has been shown, associations between glycosylation phenotypes and CDX1 mRNA expression have hitherto not been well studied. Triggered by our previous study, we here characterized the N-glycomic phenotype of 16 colon cancer cell lines, selected for their differential CDX1 mRNA expression levels. We found that high CDX1 mRNA expression associated with a higher degree of multi-fucosylation on N-glycans, which is in line with our previous results and was supported by up-regulated gene expression of fucosyltransferases involved in antenna fucosylation. Interestingly, hepatocyte nuclear factors (HNF)4A and HNF1A were, among others, positively associated with high CDX1 mRNA expression and have been previously proven to regulate antenna fucosylation. Besides fucosylation, we found that high CDX1 mRNA expression in cancer cell lines also associated with low levels of sialylation and galactosylation and high levels of bisection on N-glycans. Altogether, our data highlight a possible role of CDX1 in altering the N-glycosylation of colorectal cancer cells, which is a hallmark of tumor development.


Introduction
Glycans form an important part of the outer layer of the cell and are involved in major biological processes, including cell differentiation, adhesion, and interactions with other cells, pathogens, or the extracellular matrix, as well as cellular transformations such as cancer [1][2][3]. Glycans occur in many structural variants as modifications on proteins and lipids. One type of protein glycosylation is N-glycosylation, in which oligosaccharides are attached via an N-glycosidic linkage to an asparagine (N) of the consensus sequence -N-X-S/T- (X = any amino acid except proline, S = serine, T = threonine) [4]. These so-called N-glycans share a common pentasaccharide core structure and form, depending on the elongation, high-mannose type, complex-type, or hybrid-type N-glycans (Figure 1A) [4]. Changes of specific N-glycans and other glycans have been associated with several […]
Figure 1. […] (Fuc), and N-acetylneuraminic acid (NeuAc). N-glycans share a common core structure, consisting of two GlcNAc and three Man. Depending on the elongation, three N-glycan types are differentiated, as follows: (i) high-mannose type N-glycans, (ii) complex-type N-glycans, and (iii) hybrid-type N-glycans. The illustrated glycans represent examples. The number of monosaccharides added can vary and complex-type glycans can exhibit more than two antennae. A detailed description of N-glycosylation is given by Stanley [4]. (B) Depiction of different Lewis antigens and the glycosyltransferase genes involved. Fucosyltransferase genes FUT3, 4, 5, 6, 7, and 9 are involved in α1,3- and α1,4-fucosylation of antenna GlcNAc, while FUT1 and FUT2 attach α1,2-linked fucose to Gal residues. Several α2,3-sialyltransferases (ST3GAL genes) attach NeuAc residues to galactoses to form sialyl Lewis antigens.
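The consensus sequence described above can be located in a protein sequence with a short pattern search. The following is a minimal sketch in Python, not part of the study's workflow; the function name and the example sequence are hypothetical and serve only to illustrate the N-X-S/T rule (X being any residue except proline).

```python
import re

# N-X-S/T sequon: asparagine, any residue except proline, then serine or threonine.
# The lookahead lets overlapping sequons (e.g. in "NNTS") be reported separately.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return the 1-based positions of asparagines that sit in an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Hypothetical sequence, for illustration only.
example = "MNGSANPTQNNTSK"
print(find_sequons(example))  # the N at position 6 is skipped because X is proline
```

A sequon is a necessary but not a sufficient condition for N-glycosylation, so positions found this way are candidate sites only.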
By screening different colorectal cancer cell lines, we previously showed an association of caudal-related homeobox protein 1 (CDX1) mRNA expression with increased multi-fucosylation (more than one fucose), which is indicative of antenna fucosylation [9]. Antenna fucosylation is the result of the activity of fucosyltransferases encoded by the genes FUT1, 2, 3, 4, 5, 6, 7, and 9 and leads to the expression of blood group-related Lewis antigens (Figure 1B), which are glycan epitopes on various glycoproteins and glycolipids [10,11]. While FUT3, 4, 5, 6, 7, and 9 catalyze the addition of a fucose to the N-acetylglucosamine (GlcNAc) of the antenna in α1,3- and/or α1,4-linkage, FUT1 and FUT2 are responsible for the addition of a fucose to the galactose (Gal) in α1,2-linkage, forming the H-type epitope. FUT2 is also called the secretor gene, and polymorphisms leading to an inactive FUT2 result in the absence of blood type epitopes in saliva and various epithelial cell types, the so-called non-secretor phenotype [12,13]. FUT8, on the other hand, is the gene encoding the enzyme that mediates the attachment of a fucose residue to the first GlcNAc of the N-glycan core, the so-called core fucosylation. More detailed information on the biological role of fucosylation has been given elsewhere [14].
Fucosyltransferases are expressed in a cell type- and organ-dependent manner, and altered expression of the enzymes, as well as of the fucose-containing glycan epitopes they produce, has been associated with various pathological conditions [15], including colorectal cancer [16,17]. However, the literature remains controversial concerning the stage-dependent differences in expression of fucosylated glyco-epitopes in colorectal cancer. Conflicting data arise from various experimental conditions (lectin- and antibody-based vs. mass spectrometric detection, sample preparation, and culture conditions) and sample sources (cell lines, mouse models, tissues vs. serum or plasma, biopsy location), as well as from combining results from studies on various glycan classes (N-, O-, and glycosphingolipid glycans) vs. individual glycan classes. Moreover, core- and antenna-fucosylation are not always distinguished and only a few studies report on stage-dependent changes. While many groups reported a general increase of fucosylation in colon and other cancers [15], other recent studies have shown a stage-dependent decrease of fucosylation, especially Lewis-type antenna-fucosylation, with colon cancer metastasis [18,19]. Notably, it is still largely unclear to what extent differences in the glycosylation machinery, on the one hand, and differences in the expression levels of highly glycosylated proteins such as mucins (e.g., MUC2), on the other hand, contribute to these colorectal cancer glycosylation phenotypes.
The group around Miyoshi and Moriwaki hypothesize that deficiency of fucosylation via mutations of GDP-mannose-4,6-dehydratase (GMDS), an important player in the fucose biosynthesis pathway, helps cancer cells to escape from natural killer (NK) cell-mediated tumor immune surveillance [20,21]. Interestingly, mutations in the GMDS gene have been identified in metastatic lesions of some colon cancers (10%). Also the colon cancer cell line HCT116 bears a GMDS mutation, resulting in almost complete absence of fucosylation. Strikingly, the parental HCT116 cells with GMDS mutation revealed a more aggressive phenotype in tumor formation and metastasis in mice, as compared to the GMDS-rescued HCT116 cells [17]. The group speculated that the loss of fucosylation leads to an escape from NK cell-mediated tumor immune surveillance, while the GMDS-rescued HCT116 were more susceptible to TRAIL-induced apoptosis [21].
For the sialylated variants of the Lewis antigens (Figure 1B), sialyl Lewis X and sialyl Lewis A, the literature is clearer. The selectin ligands sialyl Lewis X and A have been associated with advanced stages and poor prognosis in colon cancer [22,23], which was attributed to selectin-mediated adhesion and extravasation leading to metastasis. Moreover, sialyl Lewis antigens are used as clinical markers for various cancers (e.g., sialyl Lewis A = CA19-9) [16].
CDX1 is a transcription factor that is involved in the modulation of a variety of processes including proliferation, apoptosis, cell adhesion, and columnar morphology [24]. CDX1 is a primary controller of enterocyte differentiation and its expression is necessary for the transcriptional regulation of a large number of intestine-specific genes [25] required for intestinal development, differentiation, and maintenance of the intestinal phenotype [24,26]. Several markers of differentiation, including villin and cytokeratin 20, have been shown to be directly transcriptionally regulated by CDX1 [27,28]. Loss of differentiation is known to occur during cancer progression and, in colon cancer, the loss of villin has been associated with poor survival [29].
There is evidence of the loss or down-regulation of CDX1 expression in colon cancer tumors [30,31] and cell lines [32]. With respect to the underlying mechanism, Wong et al. have shown that the loss or reduction of CDX1 is often induced by promoter methylation [33]. Together, these observations indicate a potential role of CDX1 loss in tumor development.
Although glycosylation as well as CDX1 expression have both been shown to play a role in cell differentiation and cancer (suppression), hitherto only very little has been described with regard to CDX1-associated glycosylation profiles. Triggered by our previous findings [9], which suggested a relationship between increased multi-fucosylation (Lewis type glycan epitopes) and high CDX1 mRNA expression, here we characterized the N-glycosylation phenotype of a different set of colon cancer cell lines. These cells were specifically chosen for their high and low expression of CDX1 mRNA, with the aim of replicating our previously found association of fucosylation and CDX1 expression [9] with an independent set of cell lines. The glyco-phenotypic data were further correlated with differently expressed glyco-genes.
We confirmed a higher degree of multi-fucosylation in cell lines with high CDX1 gene expression, which was supported by higher mRNA expression of fucosyltransferases FUT3 and FUT6, both involved in antenna fucosylation. Strikingly, other genes associated with fucosylation were likewise differentially expressed between the investigated cell lines with high vs. low CDX1 expression. GMDS gene expression was positively associated with high CDX1 mRNA expression in the tested colorectal cancer cell lines. The transcription factors hepatocyte nuclear factor (HNF)1A and HNF4A, which have been previously shown to regulate antenna fucosylation of human plasma proteins [34] and are associated with differentiation [35,36], likewise showed higher expression in cells with high CDX1 expression. Furthermore, the N-glycomic characterization revealed a decrease in galactosylation and sialylation in cell lines with high CDX1 expression, compared to cells with low CDX1 gene expression.

Cells and Cell Culture
Details of colorectal cancer cell line origin and characteristics can be found in Table 1 and, in more detail, in Supplementary Table S1. CC20, COLO678, GP2D, HCT116, ISRECO1, LIM1863, LS174T, PCJW, RCM1, and SW403 cell lines were cultured in Dulbecco's Modified Eagle Medium (DMEM). CAR1, HCA46, HDC8, OXCO1, and VACO429 were cultured in Iscove's Modified Dulbecco's Medium (IMDM). HCC56 cells were cultured in RPMI-1640 medium. All of the above media were supplemented with 10% FBS and 1% penicillin-streptomycin. Cell lines were grown in a 10% CO2 incubator to 50% to 80% confluence before the next passage or cell pellet preparation. Cell pellets were washed once with PBS, snap-frozen on dry ice, and stored at −80 °C until further analysis. Cell pellets were dissolved in water and cell counts were estimated using the Countess™ Automated Cell Counter (Invitrogen), based on trypan blue staining.

N-glycan Release, Derivatization, and Purification
N-glycans were released in duplicate from three biological replicates per cell line using a PVDF-membrane-based release protocol followed by linkage-specific sialic acid derivatization. Purification by cotton-HILIC-SPE and MALDI-TOF-MS analysis were performed as described previously [9]. Briefly, cell pellets were resuspended in water and disrupted by sonication, and proteins were immobilized on 96-well PVDF-filter plates (~0.5 × 10^6 cells/well; 5 µL human plasma as control; water as blanks) in the presence of chaotropic agents. After washing away unbound material, N-glycans were released overnight at 37 °C. Released glycans were derivatized using ethyl esterification, which allows for discrimination of N-acetylneuraminic acid linkages (α2,3 vs. α2,6) [37], in a ratio of 20:100 (sample:derivatization reagent), and purified by cotton-thread HILIC-SPE [37].

MALDI-TOF-MS Analysis
Released, derivatized, and purified glycans (5 µL sample) were spotted on an anchor chip MALDI target plate (Bruker Daltonics) and co-crystallized with 0.5 µL of 5 mg/mL superDHB in 50% acetonitrile (ACN), supplemented with 1 mM NaOH. Spectra were recorded in positive-ion reflector mode, after calibration with a Bruker peptide calibration kit, using a Bruker UltrafleXtreme™ mass spectrometer controlled by FlexControl 3.4 software Build 119 (Bruker Daltonics). Mass spectra were obtained over an m/z range of 1000 to 5000 for a total of 10,000 shots (1000 Hz laser frequency, 200 shots per raster spot during complete random walk). Tandem mass spectrometry (MALDI-TOF-MS/MS) was performed for structural elucidation via fragmentation in gas-off TOF/TOF mode.

Data Processing and Analysis of MALDI-TOF-MS Spectra
Spectra were smoothed (Savitzky-Golay algorithm, peak width: m/z 0.06, 4 cycles), baseline corrected (Tophat algorithm), and exported to xy-files using FlexAnalysis 3.4 (Stable Build 76). Mean average spectra were generated per technical replicate, which were summed to one spectrum using the open-source software mMass (http://www.mmass.org) [38] […] (…, H7N6L4F1 at m/z 3632.243) as calibrants (minimum five used), followed by peak picking in mMass with a signal-to-noise (S/N) cut-off of 3. The peak list was manually revised and analyzed in GlycoWorkbench 2.1 stable build 146 (http://www.eurocarbdb.org/) using the Glyco-Peakfinder tool (http://www.eurocarbdb.org/ms-tools/) to generate a list of glycan compositions. Our in-house software for automated data processing, MassyTools version 0.1.8.0 [39], was used for calibration using a 3rd-degree function and for targeted extraction of the area under the curve for each individual mass spectrum. To prevent over-estimation of overlapping glycan species, only the first three isotopes were extracted and the area under the curve was corrected based on the theoretical isotopic pattern. The quality of the data was assessed using several quality parameters calculated within the software. Only good-quality spectra (total intensity > 1 × 10^5 and more than 50% of the analyte area with S/N > 9) as well as analytes (S/N > 6, ppm < 20, quality score > 0.10) were included for analyses. Raw data after pre-processing are provided in the Supplementary Tables. Finally, the corrected area-under-the-curve values were rescaled to a total relative intensity of 100% for each spectrum. Selected glycan compositions were confirmed by MS/MS, and a final peak list as well as MS/MS data are given in Supplementary Table S2. MS/MS spectra were manually interpreted and fragment ions annotated using GlycoWorkbench 2.1 according to the nomenclature of Domon and Costello [40].
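The analyte filtering and the rescaling to a total relative intensity of 100% can be illustrated as follows. This is a minimal sketch in Python and not MassyTools code: the `Analyte` class, its field names, and the example values are hypothetical, and the thresholds are simply copied from the criteria stated above. Applying the criteria to a single spectrum is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Analyte:
    composition: str   # e.g. "H5N4F1" (hypothetical example)
    area: float        # isotope-corrected area under the curve
    sn: float          # signal-to-noise ratio
    ppm_error: float   # mass error in ppm
    qc_score: float    # quality score as reported by the processing software


def passes_curation(a: Analyte) -> bool:
    """Analyte criteria as stated in the text: S/N > 6, ppm < 20, quality score > 0.10."""
    return a.sn > 6 and abs(a.ppm_error) < 20 and a.qc_score > 0.10


def relative_abundances(analytes: list[Analyte]) -> dict[str, float]:
    """Keep curated analytes and rescale their areas so that they sum to 100%."""
    kept = [a for a in analytes if passes_curation(a)]
    total = sum(a.area for a in kept)
    if total <= 0:
        raise ValueError("no analyte signal left after curation")
    return {a.composition: 100.0 * a.area / total for a in kept}


# Hypothetical spectrum: the third analyte fails the S/N criterion and is dropped.
spectrum = [
    Analyte("H5N2", 600.0, 50.0, 3.0, 0.5),
    Analyte("H5N4F1", 300.0, 20.0, -5.0, 0.4),
    Analyte("H6N5F2", 100.0, 4.0, 8.0, 0.3),
]
print(relative_abundances(spectrum))
```

Because the normalization is applied after curation, the relative abundance of each glycan depends on which other analytes pass the filter, which is one reason to fix the analyte list before comparing cell lines.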
Averages of direct traits per cell line were used to build a principal component analysis model in SIMCA Version 13.0 (Umetrics AB, Umeå, Sweden), with seven random cross-validation (CV) groups.
For increased robustness, derived glycan traits such as galactosylation, fucosylation, sialylation, and others were calculated in SPSS Version 23 (IBM Corp, Armonk, NY). The formulas for calculation are given in Supplementary Table S4 and the average relative abundances are given in Supplementary Table S3. Because the data were not normally distributed, a two-tailed Mann-Whitney test was performed in the RStudio statistical software environment (Version 0.99.892, Kent, OH, USA, http://www.r-project.org/) with the significance level α = 0.05 to assess differences in N-glycosylation between CDX1 high and low expressing cells. Bonferroni correction was applied to p-values to adjust for multiple testing (Supplementary Table S3). Boxplots for visualization were generated in RStudio and show the median with the interquartile range.
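For readers who want to retrace the statistics, the two-group comparison can be sketched in a few lines. The code below is an illustrative Python implementation of a two-sided exact (permutation) Mann-Whitney rank-sum test followed by Bonferroni adjustment; it is not the R code used in the study, and the function names are hypothetical. Enumerating all group assignments is feasible here because 8 vs. 8 cell lines give only 12,870 combinations.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Two-sided exact p-value of the rank-sum statistic over all group assignments."""
    ranks = midranks(list(x) + list(y))
    n1, n = len(x), len(x) + len(y)
    expected = n1 * (n + 1) / 2                  # rank sum of group x under H0
    observed = abs(sum(ranks[:n1]) - expected)
    hits = total = 0
    for combo in combinations(range(n), n1):
        total += 1
        if abs(sum(ranks[i] for i in combo) - expected) >= observed - 1e-9:
            hits += 1
    return hits / total


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy example with complete separation of two groups of three values.
print(mann_whitney_exact([1, 2, 3], [4, 5, 6]))
```

With complete separation of two groups of eight, the smallest attainable two-sided p-value is 2/12,870, so the number of derived traits tested directly limits what can remain significant after Bonferroni correction.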

Gene Expression Microarrays and Data Analysis
Total RNA was extracted by using the RNeasy mini kit according to the manufacturer's instructions. Twenty micrograms of RNA of each sample were sent to the Molecular Biology Core Facility of the Paterson Institute for Cancer Research, Manchester, UK, for gene expression microarray analysis using the Human genome U133+2 chips, following the manufacturer's instructions (Affymetrix, High Wycombe, UK). Microarray data were analyzed using Partek Genomics Suite software. The data were log2-transformed and RMA-normalized (with GC correction) using quantile normalization with Median Polish for Probeset summarization as optional settings in the software.
Glycosyltransferase genes (Supplementary Table S5A) as well as glycan-related genes (Supplementary Table S5B) were selected for analysis, and differentially expressed genes were identified with a t-test comparing mean expression levels of the 8 CDX1 high vs. 8 CDX1 low cell lines. The significance level was adjusted for multiple testing. Fold changes were calculated between CDX1 high and CDX1 low cell lines and, for significantly differentially expressed glycosyltransferases, data were visualized as boxplots in RStudio showing the median and interquartile range. To evaluate the correlation between relative abundances of N-glycan traits based on mass spectrometry data and glycosyltransferase gene expression in the 16 investigated cell lines, linear regression analysis was performed in GraphPad Prism Version 6 (GraphPad Software, Inc., La Jolla, CA).
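The two quantities used to relate expression and glycan data, fold change and linear regression, can be sketched as follows. This is an illustrative Python example under stated assumptions rather than the Partek or GraphPad procedure: the function names and input values are hypothetical, the fold change is taken as 2 raised to the difference of group means of log2 values (that is, a ratio of geometric means), and significance testing is left out.

```python
from statistics import fmean


def fold_change_from_log2(high, low):
    """Fold change (high over low) from log2 expression values of two groups."""
    return 2.0 ** (fmean(high) - fmean(low))


def linear_fit(x, y):
    """Ordinary least squares fit y = intercept + slope * x, with r squared."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical log2 expression values for one gene in two groups of cell lines.
print(fold_change_from_log2([9.0, 10.0, 11.0], [8.0, 8.0, 8.0]))

# Hypothetical gene expression (x) vs. relative abundance of a glycan trait (y).
print(linear_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))
```

Working on the log2 scale keeps a doubling and a halving symmetric around zero, which is why the group means are subtracted before exponentiation instead of dividing means of raw intensities.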

CDX1high and CDX1low CRC Cells Exhibit Different N-Glycan Profiles
Our previous data suggested a positive association between the level of fucosylation on protein N-glycans and CDX1 mRNA expression in a set of colorectal cancer cell lines (15 cell lines CDX1/villin positive, 7 cell lines CDX1/villin low or negative) [9]. To validate our previous results and to further explore expression profiles of associated fucosyltransferases, we characterized the N-glycosylation of an independent set of colorectal cancer cell lines with high (8 cell lines; CDX1 high) vs. low (8 cell lines; CDX1 low) expression of CDX1 mRNA. The CDX1 high cell lines investigated here have, on average, 65-fold higher CDX1 mRNA expression as compared to CDX1 low cell lines (Table 1, Supplementary Table S1). Two exemplary mass spectra of the CDX1 low cell line COLO678 and the CDX1 high cell line HCA46 are shown in Figure 2. As observed in other cell line profiling studies, the N-glycomic profiles of all analyzed cell lines were dominated by high-mannose type N-glycans, but differed in the ratios of these glycans. With regard to complex-type N-glycans, those derived from COLO678 exhibited more sialic acid residues (N-acetylneuraminic acid, NeuAc, purple diamond, angle indicates different linkages), especially in α2,3-linkage, while glycans from HCA46 were characterized by the presence of many fucoses (Fuc, red triangle), a low sialylation level, and additional N-acetylhexosamines (HexNAc, white square). In total, 221 individual glycan species were identified across all cell lines, of which 81 could be characterized by tandem mass spectrometry (Supplementary Table S2). In order to evaluate whether CDX1 high and CDX1 low expressing cells can be distinguished based on their N-glycomic signature, a principal component analysis (PCA) was performed on the relative abundances of all individual N-glycans. This resulted in a model with four principal components (PCs) explaining 68.9% of the variation in the data. The score plot of PC1 vs. PC2 showed clear separation of CDX1 high and CDX1 low cell lines along PC1 (Figure 3A) in this unsupervised model, demonstrating a different N-glycan profile between the two groups.
Figure 2. […] On the y-axis, the relative intensity is given with 100% corresponding to the highest peak in each spectrum. The spectrum range of m/z 2350 to m/z 4600 is enlarged in the inset. Main peaks are annotated with glycan cartoons, representing compositions, and the presence of additional structural isomers cannot be excluded. To simplify the cartoons, repeating epitopes are indicated as "nx". Green circle = mannose, Man; yellow circle = galactose, Gal; blue square = N-acetylglucosamine, GlcNAc; white square = N-acetylhexosamine, HexNAc; red triangle = fucose, Fuc; purple diamond = sialic acid, N-acetylneuraminic acid, NeuAc. Differences in N-acetylneuraminic acid linkages are indicated using different angles.

Higher Antenna Fucosylation on N-Glycans Characterized CDX1-high Expressing Colorectal Cancer Cell Lines
To assess which glycans drive the principal component separation of CDX1 high and CDX1 low colorectal cancer cells, derived glycan traits were calculated by grouping glycan species (direct traits) into classes according to glycosylation characteristics, such as sialylation and fucosylation. The relative abundances of derived traits as well as the calculations are given in Supplementary Tables S3 and S4. Derived traits gave insight into general structural differences and showed increased analytical robustness as compared to individual glycans. Observations from the example spectra were confirmed by comparing derived glycan traits of CDX1 high and CDX1 low expressing cell lines and evaluated using a Mann-Whitney test. The p-values were adjusted for multiple testing (Bonferroni). Labelling of the PC1 vs. PC2 score plots using glycan-derived traits showed that multi-fucosylated glycans (Figure 3B, colored in green) associated with the location of CDX1 high cell lines in the score plot, in line with observations from the two exemplary mass spectra (Figure 2). Accordingly, CDX1 high expressing cell lines exhibited significantly higher levels of multi-fucosylation (presence of more than one fucose on a glycan), indicative of antenna fucosylation (Lewis X/A, Y/B), as compared to CDX1 low expressing cell lines (∅ 54% vs. ∅ 33%; p-value = 0.011; Figure 4A, Supplementary Table S3), which confirmed the association found in our previous study [9]. In order to map the N-glycomic phenotype to transcriptomic data, a gene microarray was performed and glycosyltransferase gene expression data were extracted for the 16 investigated cell lines (Supplementary Table S5A). Further, a linear regression analysis was used to evaluate the correlation between MS-based N-glycan traits and glycosyltransferase gene expression data (Supplementary Table S6). The trait CFa, representing multi-fucosylation in complex-type N-glycans, showed significant correlation with fucosyltransferase genes FUT2, 3, 4, 6, and 7, which are involved in antenna fucosylation (Supplementary Table S6), thereby indicating that the trait CFa is a good representation of antenna fucosylation in these data. The glycosyltransferase gene expression was also tested for differential expression between CDX1 high and CDX1 low cell lines and, in accordance with the mass spectrometry data, all fucosyltransferases involved in antenna fucosylation showed higher expression in CDX1 high cell lines, though after correction for multiple testing only FUT3 and 6, involved in Lewis X/A biosynthesis, showed significantly increased expression (2.7- to 5.7-fold) in the eight CDX1 high cell lines compared to the eight CDX1 low cell lines (Figure 5A,B, Supplementary Table S5A). In contrast, mono-fucosylation, indicative of core fucosylation (CFc), was lower in CDX1 high cells as compared to CDX1 low cells, and FUT8 gene expression also showed a trend towards lower expression in CDX1 high cells (Supplementary Tables S3 and S5A).

Furthermore, gene expression of GDP-mannose 4,6-dehydratase (GMDS), an enzyme in the biosynthesis of GDP-L-fucose, was 3.7-fold elevated (p-value = 1.4 × 10^-3) with high CDX1 expression (Figure 5C; Supplementary Table S5B). This indicates that various genes involved in fucosylation are differentially regulated in CDX1 high versus CDX1 low cells. As the transcription factors HNF1A and HNF4A have previously been shown to regulate antenna fucosylation in plasma [34] and have previously been associated with CDX1 expression as well as intestinal development [41][42][43], we tested their expression levels in the investigated CDX1 high and CDX1 low cell lines. Interestingly, gene microarray data revealed significantly increased expression of HNF1A and HNF4A with high CDX1 mRNA levels (p-value (HNF1A) = 0.002; p-value (HNF4A) = 0.003; Figure 5J-K, Supplementary Table S5B). Moreover, soluble galectin 4 (LGALS4), a target gene of HNF4A and a glycan-binding protein, was 23-fold up-regulated in CDX1 high cells (p-value = 2.4 × 10^-6; Figure 5L, Supplementary Table S5B). Galectin 4 is highly expressed in the alimentary tract during development and is associated with differentiation [44].

CDX1high Expressing CRC Cell Lines Exhibit a Lower Level of Sialylated N-Glycans as Compared to CDX1-Low Expressing Cells
In our previous study, we observed a trend towards a negative association between overall N-glycan sialylation and CDX1 mRNA expression, though with no significant difference after correction for multiple testing. To further explore this association, we labelled the score plot of PC1 vs. PC2 with derived sialylation traits and found that sialylated glycans largely marked CDX1 low cell lines (Figure 3C, colored in blue and red). In agreement, overall sialylation was significantly lower in CDX1 high cell lines compared to CDX1 low expressing cell lines (∅ 36% vs. ∅ 21%; p-value = 0.046; Figure 4B, Supplementary Table S3). Results from the current study showed pronounced sialylation differences for α2,3-sialylated N-glycans with ∅ 11% in CDX1 high vs. ∅ 23% in CDX1 low cell lines (p-value = 0.023; Figure 4C; Supplementary Table S3). Different sialyltransferases are involved in this sialylation, and mRNA levels of sialyltransferases ST3GAL3 and 6 correlated with relative abundances of overall sialylation in complex-type N-glycans (Supplementary Table S6). In line with results from MS data, ST3GAL3, 4 and, especially, ST3GAL6, all three involved in sialyl Lewis antigen biosynthesis, were decreased in CDX1 high cell lines as compared to CDX1 low cell lines, though not significantly (Figure 5I, Supplementary Table S5A). Notably, sialylation per galactose in the MS data was likewise decreased from ∅ 48% (CDX1 low) to ∅ 31% in CDX1 high cell lines (p-value = 0.023; Supplementary Table S3), suggesting that the decrease in sialylation is an independent event and not a mere result of substrate limitation through decreased galactosylation (see below).

CDX1 Expression in CRC Cell Lines Associated with N-Glycans Carrying Additional N-Acetylhexosamine
While differences in levels of fucosylation and sialylation mainly explain the separation of CDX1 high and CDX1 low CRC cells along PC1, we sought to determine whether other glycan-derived traits could account for the separation in other PC dimensions. Glycan structures with the number of HexNAc equaling or exceeding the number of hexoses (Hex; HexNAc≥Hex) represent glycans with bisecting N-acetylglucosamine (GlcNAc), non-galactosylated antennae, or the addition of N-acetylgalactosamine (GalNAc) and showed a significantly increased expression in CDX1-positive cells in our previous study [9]. This could be validated in the new set of cells, in which CDX1 high cell lines showed a more than 2-fold increase in the relative abundance of glycans with the feature HexNAc≥Hex, compared to CDX1 low cells (p-value = 0.023; Figure 4D; Supplementary Table S3). Gene expression of the glycosyltransferase involved in the formation of bisecting GlcNAc (MGAT3) was significantly correlated with MS data (Supplementary Table S6), suggesting the presence of bisection and refining the data obtained by mass spectrometry. However, MGAT3 gene expression showed only a trend towards increased expression in CDX1 high cells compared to CDX1 low cells (Figure 5D, Supplementary Table S5A).
Accordingly, B4GALNT3, one of the glycosyltransferases adding a GalNAc residue to a GlcNAc to form LacdiNAc structures on N-glycans, was significantly correlated with the MS-based trait HexNAc≥Hex (Supplementary Table S6) and was, additionally, expressed at significantly higher levels in CDX1 high versus CDX1 low cells (Figure 5E, Supplementary Table S5A). Other genes involved in the expression of these epitopes were not significantly different (Supplementary Table S5A).

CDX1 Expression in CRC Cell Lines Associated with Higher Branched N-Glycan-Derived Traits
Finally, N-glycan structures with seven or more HexNAcs, indicative of branched structures or (poly-)LacNAc repeats (-Galβ1-4GlcNAcβ1-3Galβ1-3/4-GlcNAc-), showed a trend towards higher expression with high CDX1 expression in our previous data and were more abundant in CDX1 high cell lines of the new set of cells as compared to CDX1 low expressing cell lines, with ∅ 27% vs. ∅ 20% (Figure 4F, Supplementary Table S3), though not significantly after multiple testing correction. Since our MS data could not sufficiently differentiate between LacNAc repeats and additional antennae, we next analyzed the glycosyltransferase expression data to see if this would give a more detailed insight. Genes encoding beta-1,3-N-acetylglucosaminyltransferase 3 (B3GNT3) and B3GNT8, both involved in the synthesis of type-1 chains and poly-LacNAc repeats, were around 2-fold up-regulated with high CDX1 expression (Figure 5F,G, Supplementary Table S5A), and B3GNT3 gene expression showed significant correlation with the MS trait HexNAc≥7 (Supplementary Table S6), suggesting the presence of LacNAc-repeat structures, to a certain degree. Additionally, the glycosyltransferase encoded by the gene MGAT4A, which is involved in the branching on the 1,3-arm of N-glycans to form tri- and tetra-antennary N-glycans, was correlated with the relative abundance of the MS N-glycan trait HexNAc≥7 (Supplementary Table S6). MGAT4A was further expressed at significantly higher levels in CDX1 high cells as compared to CDX1 low cells (Figure 5H, Supplementary Table S5A). Of note, N-glycans featuring HexNAc≥Hex also contribute to this trait of N-glycan structures with seven or more HexNAcs, which is therefore not solely indicative of branching and poly-LacNAc repeats. In line with this, B3GNT3 and B3GNT8 also showed correlation with the MS trait HexNAc≥Hex (Supplementary Table S6).
Overall, we could validate our previous results and identify specific N-glycan features associated with high CDX1 mRNA expression, which were characterized by high multi-fucosylation, elevated levels of N-glycans containing additional HexNAcs as well as low galactosylation and low sialylation levels, with particularly decreased levels of α2,3-sialylation.

Discussion
In two independent sets of colorectal cancer cell lines we observed increased multi-fucosylation in CDX1 high expressing cell lines, next to increased terminal HexNAc epitopes as well as decreased galactosylation and sialylation. Changes in glycosylation have mainly been attributed to alterations in the corresponding glycosyltransferases, as also described in literature [45,46]. Although several glycosyltransferase gene expressions correlated well in the study presented here, predictions on the glycan phenotype using gene expression data remain challenging since several biological effectors can influence not only the expression of glycan-initiating, -elongating, and -degrading enzymes, but also the enzyme activity [47].
Guo and Pierce, for example, recently reviewed the involvement of transcription factors in glycan expression and reported that they regulate the expression of glycosyltransferase genes in a tissue- and cell-specific manner [47]. Interestingly, we found CDX1, a colon-specific transcription factor involved in cell differentiation, associated with different glycosylation features. Cell lines expressing high mRNA levels of CDX1 showed an N-glycan phenotype with increased multi-fucosylation, indicative of antenna fucosylation. Supporting this association, the fucosyltransferase genes FUT3 and FUT6, and GMDS, which is involved in GDP-L-fucose synthesis, were positively associated with high CDX1 expression. Differences were more pronounced for FUT3, suggesting an enhanced expression of the type-1 chain epitope Lewis A in CDX1 high cells [48].
Our data suggest antenna fucosylation on N-glycans as a potential marker for epithelial differentiation in colorectal cancer cell lines. In line with our results here, a previous study using HCT116, which is characterized by a poor differentiation status and has a mutation in the GMDS gene, leading to low levels of fucosylation, showed that restoration of GMDS wild type expression enhanced fucosylation, suppressed tumor formation, and reduced the metastatic potential when the cells were injected into mice [20]. Additionally, HCT116 cells have shown higher levels of cancer stem cells (CSC) and loss of CDX1 expression, whereas induced expression of CDX1 led to reduced clonogenicity and restored the potential of cells for differentiation and lumen formation [49]. Supporting the hypothesis of fucosylation as a characteristic of epithelial cells, Breiman et al. identified fucosylated antigens in a mammary cancer cell line as markers of the epithelial state, which can contribute to cell adhesion through CLEC17A (Prolectin) [50]. During the epithelial-to-mesenchymal transition (EMT) of these cells, the expression of fucosylated antigens such as Lewis Y was decreased as a result of decreased expression of fucosyltransferases encoded by FUT1 and FUT3 genes [50].
The transcription factors HNF1A and HNF4A were also positively associated with high CDX1 mRNA expression in the examined cell lines. Lauc et al. identified in a genome-wide association study (GWAS) that fucosyltransferases FUT3, 5, and 6, all involved in antenna fucosylation, are positively regulated by HNF1A and its downstream factor HNF4A, while FUT8, initiating core-fucosylation, is inhibited [34]. The study showed that knock-down of HNF1A and HNF4A resulted in reduced expression of several fucosyltransferase genes involved in antenna fucosylation in HepG2 liver cells and, partly, in pancreatic Panc1 cells, while FUT8, responsible for N-glycan core-fucosylation, was upregulated. Furthermore, GMDS and L-fucokinase, which act in two different pathways of fucose synthesis, were drastically down-regulated upon HNF1A or HNF4A knock-down. Publicly available ChIP-seq data of ENCODE show that HNF4A and HNF4G transcription factors bind to FUT2, 3, and 6 genes in cell lines of non-intestinal origin, and Lauc et al. also confirmed the binding of HNF1A and HNF4A to multiple FUT genes and/or their promoters by ChIP analysis [34]. At the same time, several groups reported on the interaction between CDX1, as well as the highly related and partially redundant CDX2, with other transcription factors, including HNF1A and HNF4A. Boyd et al. identified CDX2 binding sites on CDX1 and HNF1B, both being positively regulated by CDX2, while HNF1B is needed for activation of HNF4A, HNF1A, and HNF3G [43]. Direct binding of CDX2 to HNF4A was also shown to positively influence transcriptional activity [43]. HNF4A was identified as a transcriptional activator for intestinal differentiation, and gene expression of HNF4A and its target genes was upregulated in colorectal cancer cell lines with a more epithelial phenotype, as compared to cells with a mesenchymal phenotype [51].
One target gene of HNF4A is galectin 4, which was described as tumor-suppressor in colon cancer [52,53], but also pancreatic duct adenocarcinoma [54], and was accordingly positively associated with high CDX1 mRNA expression in the current study.
Taking our findings and the reports from literature, one may speculate that antenna fucosylation of N-glycans is a marker for the epithelial state and that CDX1 is involved in the transcriptional regulation of fucosyltransferases involved in (antenna-)fucosylation, leading to a more differentiated and less invasive phenotype in colorectal cancer cell lines.
Interestingly, in our investigated cell lines, CDX2 mRNA expression correlated with CDX1 mRNA expression (both high or both low), with the exception of the cell line RCM1. Consequently, similar observations as for CDX1-associated glycosylation were made for CDX2, but correlations with the N-glycan phenotype and glycosyltransferase expression were less pronounced for CDX2 than for CDX1 (data not shown). In contrast, a direct involvement of CDX2 in the regulation of FUT2, which revealed potential binding sites for CDX1 and CDX2 and is involved in the generation of LeY/B antigens, was shown in the colon cancer cell lines HT29 and DLD-1 [55]. Expression of CDX2, and thereby FUT2, could be reduced through treatment with epidermal growth factor (EGF)/bFGF [55]. On the other hand, the sialyl Lewis antigen promoting glycosyltransferases ST3GAL1/3/4 and FUT3 were transcriptionally up-regulated by c-Myc [55]. While sialyl Lewis type antigens are commonly associated with (colorectal) cancers [8], the CDX1 high cell lines show a very low expression of α2,3-sialylation, thereby limiting the substrate for specific fucosyltransferases involved in the expression of sialyl Lewis epitopes (FUT3,5,6,7). The combined expression of α2,3-sialylation and antenna fucosylation, as reflected in the trait CLFa, may be considered a proxy for sialyl Lewis epitope expression. Both groups of cell lines in the present study only showed low levels of CLFa (~4%), indicating low levels of sialyl Lewis epitopes. In the case of CDX1 low expressing cell lines, the combination of α2,3-sialylation with overall fucosylation increased (minimum of one fucose which may be core or antenna attached; ∅ 16%), which may likewise reflect increased sialyl Lewis epitope levels. Strikingly, in CDX1 high expressing cell lines the expression of multi-fucosylation (indicative of antenna fucosylation) and α2,3-sialylation of N-glycans even showed opposite trends.
Though FUT6 seems to have a preference for the sialylated substrate, the results point towards the expression of mainly non-sialylated Lewis antigens via the action of the fucosyltransferases FUT3 (Lewis A) and FUT4 or FUT6 (Lewis X) [48] in the CDX1 high colorectal cancer cell lines analyzed here.
Especially in the context of sialyl Lewis epitopes, the importance of distinguishing α2,3- and α2,6-sialylation becomes evident. The derivatization applied here differentially modifies the sialic acids in different linkages, resulting in a detectable mass shift that allows the distinction between α2,3- and α2,6-sialylation [37]. Similar to α2,3-linked sialic acid, α2,8-linked sialic acid will be lactonized under the conditions of the derivatization step [56]. We therefore cannot differentiate between α2,3-linked sialic acid and α2,8-linked sialic acid on the basis of the observed masses alone. However, we performed in-depth MS/MS characterization of N-glycans from different colorectal cancer cells and could not find evidence for a fragment corresponding to an antenna with two or more sialic acids [9]. Although N-glycans with terminal α2,8-linked sialic acids are expressed in cancer cells, these structures might be expressed at low levels as compared to N-glycans with terminal α2,6- and α2,3-linked sialic acids. Moreover, gene expression of ST8SIA1-5 was not significantly different between the CDX1 high and CDX1 low expressing colorectal cell lines studied here. Sialic acids can further be modified by O-acetyl groups, which modulate the ligand function. O-acetylated glycans have been reported to be mainly present in the lower part of the intestinal tract [57]. Furthermore, a decrease of O-acetylation has been observed with colon cancer progression [58]. In line with this, we previously detected O-acetylated sialic acids on glycosphingolipid glycans, which were decreased in colorectal cancer tissues as compared to control tissues [59], but not on N-glycans of the same tissues [1]. The derivatization method applied in this study to characterize the N-glycans of colorectal cancer cell lines has been shown to preserve O-acetylation [60]. However, we could not find evidence of O-acetylated N-glycans in the investigated cell lines.
We further observed a decrease in galactosylation with high CDX1 mRNA expression. Previous reports showed that CDX1, CDX2, HNF1A, and HNF1B are involved in the transcriptional regulation of B3GALT5, the gene encoding the enzyme responsible for type-1 chain (-Galβ1-3GlcNAcβ-) expression on glycolipids and glycoproteins, with preferences for O-glycans and glycosphingolipid-glycans [61]. B3GALT5 was found to be down-regulated in colon cancer cell lines, but up-regulated upon CaCo2 cell differentiation [61]. Our mass spectrometric approach does not allow distinction between type-1 and type-2 chains (-Galβ1-4GlcNAc-), but enzyme levels showed an up-regulation of the type-1 chain glycosyltransferases B3GALTs as well as B3GNTs, in accordance with the observations by Isshiki et al. [61], while type-2 chain enzymes were decreased with high CDX1 expression. Furthermore, many glycosyltransferases involved in glycan elongation act on N-glycans, O-glycans, glycosphingolipid-glycans, and/or others, and changes on different glycan classes can be differently regulated. Therefore, it is striking that the N-glycomic data obtained by mass spectrometry are well in accordance with the transcriptomic data, suggesting that observed changes in, for example, fucosylation and sialylation are mainly attributed to N-glycans or are globally altered, but further investigations on O-glycans and glycosphingolipid-glycans are needed.
Our results further revealed the presence of terminal HexNAc residues to be increased in CDX1 high expressing cells. Glycan motifs that may contribute to this increase include LacdiNAc structures as well as the Sda antigen; the latter was shown to be expressed in normal colon tissues and decreased during colon cancer progression [62]. LacdiNAc, on the other hand, was associated with differentiation of mammary epithelial cells and tumor suppression in neuroblastoma, whereas its expression was increased in human prostate, ovarian, and pancreatic cancers [63]. Furthermore, bisecting GlcNAc containing glycans contribute to this group, and corresponding MGAT3 gene expression was up-regulated in CDX1 high cells. Several reports describe reduced bisection in several tumors [64], and it was shown that bisection suppresses N-glycan branching by MGAT5 [65], as well as α2,3-sialylation [66], the latter being reduced in CDX1 high cells.
Strikingly, observed glycan phenotypes for CDX1 low cells largely matched those described for cells undergoing EMT, which involves the loss of epithelial markers such as E-cadherin and gain of mesenchymal markers such as vimentin. EMT-associated glycan changes have previously been described and include the aforementioned loss of antenna fucosylation, but also a decrease in antennarity of N-glycans and bisection, whereas enhanced levels of high-mannose type glycans, core-fucosylation, and corresponding FUT8 expression, as well as increased α2,6-sialylation and ST6GAL1 gene expression, were observed [67,68]. Inhibition of FUT3 and FUT6 has been shown to affect TGF-β receptor glycosylation, resulting in decreased fucosylation as well as FUT3/6-associated sialyl Lewis antigens and altered TGF-β-mediated EMT and invasion in colorectal cancer cells [69]. Furthermore, loss of CDX1 and/or CDX2 was shown to impact TGF-β signaling and tumor invasion in murine APC mutant colon cancer models. Upon loss of CDX1/2, cells were poorly differentiated, invasive, and developed a villous morphology, which was accompanied by the loss of the epithelial marker E-cadherin, whereas expression of vimentin, Twist1, Zeb1, and Zeb2 was induced [70]. Finally, HNF1A and HNF4A have been described to prevent EMT in liver cancer [71][72][73]. The role of glycosylation in EMT and a potential involvement of CDX1 clearly need further investigation.
While the involvement of transcription factors was shown for fucosyltransferases, reports on sialyltransferases and galactosyltransferases involved in the elongation of N-glycans are still lacking and more research on the integrated regulation, as well as competition of glycosyltransferases, is needed. Also, the role of LacdiNAc structures and other terminal HexNAc epitopes in differentiation and colorectal cancer needs further investigation.

Conclusions
Our data, in combination with reports from literature, suggest that CDX1 (and CDX2) are involved in the regulation of multiple glycosyltransferases, especially fucosyltransferases, likely via interactions with other transcription factors, such as HNF4A and HNF1A. However, CDX genes may (additionally) influence the expression of fucose-carrying glycoproteins themselves, thereby leading to an N-glycan phenotype with enhanced multi-fucosylation. Taking into account the interaction of the two CDX-proteins with HNF1A, HNF4A, and HNF1B and together with the proven role of HNF1A and HNF4A in the regulation of fucosylation, we hypothesize a cooperation of these transcription factors being involved in the expression of FUT genes and thereby increasing fucosylation of glycoproteins with high CDX1/CDX2 expression (summarized in Figure 6). Certainly, more mechanistic studies are needed to elucidate the role of CDX1 in glycosyltransferase regulations.