Structural Insights in Mammalian Sialyltransferases and Fucosyltransferases: We Have Come a Long Way, but It Is Still a Long Way Down

Mammalian cell surfaces are modified with complex arrays of glycans that play major roles in health and disease. Abnormal glycosylation is a hallmark of cancer; terminal sialic acid and fucose in particular are present at high levels in tumor cells and are positively associated with malignancy. Increased sialylation and fucosylation are due to the upregulation of a set of sialyltransferases (STs) and fucosyltransferases (FUTs), which are potential drug targets in cancer. In the past, several advances in glycostructural biology have been made with the determination of crystal structures of several important STs and FUTs in mammals. Additionally, how the independent evolution of STs and FUTs occurred with a limited set of global folds and the diverse modular ability of catalytic domains toward substrates has been elucidated. This review highlights advances in the understanding of the structural architecture, substrate binding interactions, and catalysis of STs and FUTs in mammals. While this general understanding is emerging, use of this information to design inhibitors of STs and FUTs will be helpful in providing further insights into their role in the manifestation of cancer and developing targeted therapeutics in cancer.


Introduction
The glycome, the complex glycan repertoire of the cell, is involved in a myriad of cellular events in health and disease [1][2][3][4][5]. Unlike the genome, transcriptome, and proteome, glycan biosynthesis is not template-driven but is determined by the location and coordinated activities of the glycan processing enzymes, glycosyltransferases (GTs) and glycoside hydrolases (GHs), and the availability of their substrates. Structural analysis of these two glycan-processing enzyme families highlighted that GHs exhibit a vast diversity of three-dimensional (3D) scaffolds, despite common features in their active sites, indicating an independent convergence during evolution [6]. As for GTs, this trend seems

Sequence Analysis and Conserved Patterns in STs
The catalytic domain of all STs is characterized by four conserved peptide sequences, termed sialylmotifs: large (L), small (S), 3rd (III), and very small (VS) [27][28][29]. Human ST sequences have low sequence similarity but share 10 invariant residues: five in motif L, two each in motifs S and VS, and one in motif III [23,27]. Motif L is mainly engaged in binding donor substrates, while sialylmotifs S, III, and VS are involved in binding acceptor substrates or both substrates [23,27,30,31]. Both L and S contain an invariant cysteine residue and participate in the formation of an intramolecular disulfide linkage essential for the active conformation of the enzyme [32,33]. Mutational analyses with motifs III and VS highlighted the involvement of motif VS in catalysis. The sequence consensuses of motifs III and VS in human STs are (H/y)-Y-(Y/W/F/h)-(D/E/q/g) and H-x4-E, respectively (where lowercase/capital letters imply low/strong occurrence of the amino acid) [28,34]. Multiple sequence alignment of STs in vertebrates revealed the presence of family motifs containing four to twenty amino acids specific to each ST family, indicating another level of amino acid conservation among STs [29,31,35]. Except for the ST6GALNAC family, all STs contain a four-amino-acid family motif, termed motif "a", located eight amino acids downstream of the 3′-end of sialylmotif L. In ST6GALNAC, motif "a" contains seven amino acids and is located four amino acids closer to sialylmotif L. Another family motif, "b", lies 20 amino acids downstream from sialylmotif L and is present between sialylmotifs L and S. Interestingly, motif "b" is highly variable in length among ST families. Motif "c" is another family motif, which overlaps the 3′-end of sialylmotif S by two amino acids. The family motif "d" is located downstream from sialylmotif III in ST6GALs, while motif "e" is found downstream from sialylmotif VS in the ST8SIA and ST6GALNAC families.
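The motif III and VS consensuses quoted above can be read as simple sequence patterns. The following is a minimal Python sketch, under two assumptions that are not stated in the text: lowercase (rarely occurring) residues are treated as permitted alternatives, and "x4" is read as any four residues. The function name `find_motifs` and the toy peptide are hypothetical and do not correspond to any real ST sequence.

```python
import re

# Motif III consensus (H/y)-Y-(Y/W/F/h)-(D/E/q/g).
# Assumption: lowercase (low-occurrence) residues are accepted as alternatives.
MOTIF_III = re.compile(r"[HY]Y[YWFH][DEQG]")

# Motif VS consensus H-x4-E: His, any four residues, then Glu.
MOTIF_VS = re.compile(r"H.{4}E")


def find_motifs(sequence):
    """Return 0-based start positions of non-overlapping matches per motif."""
    seq = sequence.upper()
    return {
        "III": [m.start() for m in MOTIF_III.finditer(seq)],
        "VS": [m.start() for m in MOTIF_VS.finditer(seq)],
    }


# Toy peptide invented purely for illustration.
toy = "AAHYWEGGGHKLMNEAA"
print(find_motifs(toy))
```

A pattern hit of this kind only flags candidate positions; short consensuses such as H-x4-E match by chance in many proteins, so real annotation would also rely on alignment context and the order of the sialylmotifs.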

Cellular Localization of STs and FUTs
As of 30 January 2021, the CAZy database annotated 242 GT sequences in the human genome organized into 47 GT families (http://www.cazy.org/, accessed on 24 April 2021). Both STs and FUTs present a complex tissue-, cell-type-, and stage-specific expression pattern, and are expressed as both membrane-bound and soluble proteins [64]. Analogous to the other Golgi-resident GTs, all human STs and FUTs cloned to date typically share a type II architecture, containing an N-terminal transmembrane domain anchored at the Golgi membrane and a C-terminal catalytic region exposed to the Golgi lumen present in the late cisternae of the Golgi [40,65]. O-FUTs, on the other hand, are ER-localized soluble proteins which fucosylate Notch and TSR domains of proteins [66] (Figure 2). POFUT1 first fucosylates EGF domains in the ER and acts as a chaperone to aid protein secretion to the cell surface. Proteins with TSR domains are fucosylated by POFUT2, but whether this occurs in the ER requires further investigation [40]. Mollicone et al. [43] cloned three active isoforms of the human FUT10 gene and investigated their subcellular distributions. The FUT10-319 isoform encodes a soluble protein expressed in human embryos. FUT10-419 and FUT10-479 are reported to be co-localized with calnexin (Figure 2), be retained in the ER, and be expressed in the human embryo and brain, respectively.

Global Fold Architecture
Emerging structural information on GTs in the past two decades has been consolidated into three catalytic-domain folds, designated GT-A, GT-B, and GT-C, while unresolved folds are characterized as orphans. GT-A and GT-B folds primarily consist of α-β-α sandwiches, analogous to the Rossmann fold. However, the third fold, GT-C, is the characteristic lipid-phosphate-dependent GT fold, containing multiple transmembrane α-helices [12,67,68,69]. Both GT-A and GT-B folds employ analogous approaches to interact with nucleotide sugar donor substrates, a similarity that is attributed to the constraints of the interacting loops that extend from the Rossmann fold. However, they vary considerably in terms of their interactions with acceptor substrates.
The architecture of GT-A is reminiscent of two tightly associated β/α/β Rossmann domains, the sizes of which vary, leading to the formation of a continuous central β-sheet to create the N-terminal donor and C-terminal acceptor binding regions [8]. Most GT-A enzymes display a DxD motif signature (where "x" represents any amino acid), which coordinates divalent cations (typically Mn2+ or Mg2+) to the phosphate group of the nucleotide. It is noteworthy that the DxD motif is not a universally conserved feature of the GT-A fold, since there are examples of enzymes containing this fold that lack this motif [8].
The GT-B fold, on the other hand, consists of two separate β/α/β Rossmann domains, i.e., an N- and a C-terminal domain separated by a large cleft where the active site is located and stabilized by two long C-terminal α-helices. The GT-B fold lacks the DxD motif and generally does not require metal ions for catalysis. Donor and acceptor substrates bind to the C- and N-terminal regions of GT-B, respectively [8].

STs Display Variants of the GT-A Fold
Although mammalian STs belong to the GT29 family, intriguingly, they have been predicted to be similar to the CstII fold, a GT-A variant (i.e., variant 1) from Campylobacter jejuni, belonging to the GT42 CAZy family [27]. CstII comprises two closely associated domains. One domain has a mixed α/β fold with a central, parallel, seven-stranded, twisted β-sheet, flanked by helices on either side. The other domain is composed of a long coil and two helices forming a lid-like structure folded over the catalytic site, to shield the donor substrate from hydrolysis and create an acceptor binding site [67] (Figure 3). The N-terminal domain of CstII possesses some sequence similarity with sialylmotif L of the eukaryotic STs [27]. This prediction was later reinforced by the elucidation of the 3D structure of the first mammalian ST, i.e., porcine ST3GAL1 [33] (also named SsST3GalI), whose catalytic domain displays a mixed α/β fold with a seven-stranded parallel β-sheet flanked by 12 α-helices. Porcine ST3GAL1 exhibits a modest 10% sequence identity with CstII but contains a β-sheet core and a lid-like structure analogous to the CstII fold. Porcine ST3GAL1 is speculated to be a second distinct GT-A variant (i.e., variant 2) (Figure 3), which displays a disulfide bond linking two conserved Cys residues of sialylmotifs L and S and mirrors the signature structure of the eukaryotic ST family [36,37,40]. Crystal structures of human STs have revealed that HsST3GAL1 [70], HsST6GAL1 [71], HsST6GALNAC2 [14], and HsST8SIA3 [72] adopt a GT-A variant 2 topology (Figure 3) and broadly resemble the fold of porcine ST3GAL1 [36,37,40] and rat ST6GAL1 [73], i.e., a seven-stranded β-sheet flanked by multiple helices. Intriguingly, STs of the GT29 family have a histidine residue as a catalytic base, e.g., His-319 in porcine ST3GAL1 [33], His-367 in rat ST6GAL1 [73], His-316 in HsST3GAL1

FUTs Display the GT-B Fold and Variations of It
Based on crystallographic data available for FUTs in mammals, FUTs appear to adopt variations of the GT-B fold. However, there are examples of FUTs that utilize residues from both domains to interact with acceptor substrates [59][60][61]. Strikingly, the geometry of the cleft has also been found to be modulated in order to accommodate extended branched-glycan structures.

Crystal structures of FUTs in Caenorhabditis elegans and Homo sapiens illustrated that POFUT1 and POFUT2 display the GT-B fold and variations of it [61,74] (see Figure 5). In CePOFUT1, the residues from the active site that interact with GDP-Fuc are mainly in the C-terminal domain, together with some from the N-terminal domain [74]. The overall structure of HsPOFUT1 closely resembles that of CePOFUT1; however, HsPOFUT2 has a variant of the GT-B fold in which the N- and C-terminal domains interact closely with each other to form an extended protein unit [59][60][61]. Li et al. reported the crystal structure of the mouse POFUT1 in a complex with both donor and acceptor substrates, i.e., GDP/GDP-Fuc and EGF-like domains (EGF-LDs), respectively [60]. EGF-LDs lie in the wide groove between the N- and C-terminal domains of the canonical GT-B fold in the ternary complex of MmPOFUT1:GDP:EGF-LD. Similarly, in another ternary complex between CePOFUT2, GDP, and HsTSR1 (the first TSR of human thrombospondin 1), GDP is found in a shallow cavity of the C-terminal domain. In contrast, half of HsTSR1 is embraced by a cleft formed between both domains [61]. Intriguingly, the apo and complex structures of HsFUT8 revealed that the GT-B fold contains only one Rossmann fold (Figure 5), which contains a series of loops and an α-helix that contribute toward forming the ligand binding region [75].

Binding Interactions of STs with Natural Donor Substrate CMP-Neu5Ac
Meng et al. [73] solved the crystal structure of rat ST6GAL1 and described several conserved features shared by rat ST6GAL1 with CjCstII [76], porcine ST3GAL1 [36,37,40], and HsST6GAL1, including the sialylmotif region involved in binding the donor substrate, i.e., CMP-Neu5Ac. Although mammalian STs were predicted to be similar to the CstII fold, their binding regions display considerable variability relative to the C. jejuni CstII structure, with minimal conservation in the residues that directly interact with CMP-Neu5Ac. CjST contains an NH2-terminal end that starts at the equivalent of sialylmotif L with an extended COOH-terminal sequence beyond the final β-strand, contributing to the catalytic domain and membrane tethering [76]. In contrast, the COOH-termini of mammalian STs terminate almost immediately after the final β-strand but display extended sequences on the NH2-terminal side of the sialylmotif, which contribute to both the catalytic domain and membrane tethering [36,37,40]. However, the sialylmotif sequences, which comprise the underlying scaffold of the Rossmann fold and adjoining loop regions, have been found to be conserved and are engaged in stabilizing the residues of the donor binding region within the Cj and mammalian STs [73].
It has been observed that CMP and CMP-Neu5Ac form multiple noncovalent interactions with the active site residues comprising the GT-A-defining nucleotide-binding Rossmann fold in HsST6GALNAC2 and HsST8SIA3, respectively. These interactions are similar to those observed for donor binding in bacterial CstII and other human STs, affirming that the sialylmotif scaffold underlying the CMP-NeuAc binding site is conserved [72,76]. Structural superimposition of HsST3GAL1, HsST6GAL1, and HsST8SIA3 clearly indicates that the donor substrate displays a similar orientation within the binding cleft and that the residues interacting with CMP or CDP are variable, while highly conserved amino acid residues are involved in recognizing the CMP or CDP of the donor substrate (Figure 6A).
A newer liganded structure of HsST6GAL1 has also been reported [32]. The apo structure contains Tyr354, which interacts with the CMP-Neu5Ac at both the phosphate and Neu5Ac moieties, implying that the unliganded enzyme has an inherent interaction with the donor substrate. However, the bound state displays an alternate conformation, which prepares HsST6GAL1 to perform the hydrolysis step. Comparison of this new liganded structure [32] with the previously reported HsST6GAL1 [71] revealed the following differences: (a) the region 366-372, corresponding to motif "d" and sialylmotif VS, is unstable; however, binding to either (i) the acceptor substrate or (ii) α-helix 100-121, irrespective of the acceptor interaction, is speculated to stabilize this region; (b) the disulfide bond C353-C364 exists in a different orientation in the new structure, implying a movement in this region upon binding the acceptor; and (c) binding of CMP-Neu5Ac involves the side chain at C-5 of the sugar residue, which is directed toward empty space at the surface of HsST6GAL1. Interestingly, the exact binding mode of Neu5Ac directly involves the sialylmotifs L, S, and III, and brings the sialylmotif VS into the immediate vicinity. Hydrophobic interactions, π-alkyl and amide π-stacking, and a multitude of hydrogen bonds stabilize the overall structure of HsST6GAL1 [32].

Binding Interactions between FUT and Its Natural Donor Substrate GDP-Fucose
Despite significant differences in sequence and domain architecture, the interaction of FUT with its donor substrate, GDP-Fuc, is analogous among the different human FUTs with structures solved to date. Interestingly, the residues interacting with the fucose moiety are variable, while highly conserved residues are involved in recognizing the nucleotide moiety of the donor substrate. For instance, in both HsPOFUT1 and HsPOFUT2, most residues interacting with the GDP of the donor fucose are conserved [74]. The β-phosphate of GDP-Fuc interacts with Arg240 and Arg294 through hydrogen bonding and electrostatic interactions in both HsPOFUT1 and HsPOFUT2. Furthermore, the residues Asn46/Asn57, His238/His292, Asp340/Asp371, Ser356/Ser387, Ser357/Thr388, and Phe358/Phe389 interact with GDP in HsPOFUT1/HsPOFUT2, contributing to the tethering of the donor substrate (Figure 6B). However, the residues responsible for recognizing and stabilizing the fucose moiety, namely Arg43/Asp244 of HsPOFUT1 and Pro53/Gly55 of HsPOFUT2, are variable [74].
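The conservation pattern described above can be made explicit by tabulating the GDP-contacting residue pairs. The following is a minimal Python sketch that parses the HsPOFUT1/HsPOFUT2 pairs exactly as they are listed in the text and labels each pair by whether the residue type is the same in both enzymes. The names `GDP_CONTACTS`, `parse_pair`, and `classify` are hypothetical, and the comparison is a simple string check on the listed residues, not a structural or sequence alignment.

```python
import re

# GDP-contacting residue pairs as quoted in the text (HsPOFUT1/HsPOFUT2).
GDP_CONTACTS = [
    "Asn46/Asn57", "His238/His292", "Asp340/Asp371",
    "Ser356/Ser387", "Ser357/Thr388", "Phe358/Phe389",
]

# Three-letter residue code followed by its position number.
RESIDUE = re.compile(r"([A-Z][a-z]{2})(\d+)")


def parse_pair(pair):
    """Split 'Asn46/Asn57' into (('Asn', 46), ('Asn', 57))."""
    left, right = pair.split("/")
    (aa1, n1), (aa2, n2) = (RESIDUE.fullmatch(r).groups() for r in (left, right))
    return (aa1, int(n1)), (aa2, int(n2))


def classify(pairs):
    """Label each pair 'identical' or 'substituted' by residue type only."""
    labels = {}
    for pair in pairs:
        (aa1, _), (aa2, _) = parse_pair(pair)
        labels[pair] = "identical" if aa1 == aa2 else "substituted"
    return labels


for pair, label in classify(GDP_CONTACTS).items():
    print(f"{pair}: {label}")
```

On the pairs listed in the text, only Ser357/Thr388 differs in residue type, and that difference is a substitution between two hydroxyl-bearing side chains, which is consistent with the statement that the nucleotide-recognizing residues are largely conserved.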
The binding interaction of HsFUT8 with GDP-fucose was studied using computational techniques based on the binary complex of CePOFUT1-GDP-fucose [77]. Since the binding sites of the donor molecule in CePOFUT1 and HsFUT8 are structurally similar, the investigators placed the donor molecule into HsFUT8 using the same positioning as seen in the structural relative CePOFUT1. Analogous to the corresponding arginine residues in HsPOFUT1 and HsPOFUT2, Arg365 interacts with the β-phosphate of GDP. Strikingly, Arg365 also interacts with the fucose moiety in HsFUT8, which has not been observed in human POFUTs. Further, Arg365 is speculated to assist the release of GDP and confer proper orientation of the fucose residue for the nucleophilic attack of the acceptor [77]. Jarva et al. solved the ternary complex of GDP:HsFUT8:GlcNAc2Man3GlcNAc2-Asn (A2-Asn) and revealed another unique property of FUT8 in mammals, which is that it undergoes a conformational change upon binding to GDP [51]. Loop A (Arg365-Ala375) and loop B (Asp429-Asn446) are disordered in the unliganded HsFUT8 structure but become ordered upon binding GDP. This transformation leads to the formation of new interactions between loops A and B; in particular, the electrostatic interactions between Asp368 and Arg365 of loop A and Arg441 of loop B are involved (see Figure 7). Arg365 forms a salt bridge with the β-phosphate of GDP. These findings imply that the binding of the GDP moiety with FUT8 reorganizes the encapsulating loops around the nucleotide. Additionally, the interactions involving the Asp453/His363-guanine base and the Tyr250-ribose hydroxyl groups are reported to contribute toward reorganizing both loops [51]. Later, Boruah et al. reinforced these findings, showing that the loop regions are extended away from the donor binding site in the absence of GDP, whereas the loops are flipped in to enclose the donor analog in the GDP:FUT8 complex [78]. It is notable that, despite this novel feature, once the substrate is bound to FUT8 its spatial orientation and interactions with the enzyme are nearly identical to those in other FUTs reported in mammals. These findings suggest that a common scaffold is a promising approach to target human α1,6 FUT and O-FUTs [62].

Binding Interactions of STs and FUTs with Their Acceptor Substrates
Based on the nature of the acceptor substrate (i.e., glycan or protein), STs/FUTs can be classified as glycan- or protein-modifying GTs. Glycan-modifying GTs include all ST and FUT subfamilies except the O-FUTs, which are protein-modifying GTs in humans.

Glycan-Modifying STs and FUTs
Glycan-modifying GTs include mammalian GT29 sialyltransferases that employ analogous conserved sugar donors but recognize diverse acceptor substrates. The acceptor binding regions of STs have shown striking differences in their sequence, secondary structure, and position among the members of the GT29 family [73]. The crystal structure of HsST6GAL1 bound to the Gal2GlcNAc2Man3GlcNAc2-Asn acceptor illustrates that the C6-OH group of the terminal Gal residue from the Gal-β-1,4-GlcNAc moiety of the acceptor is adjacent to the His catalytic base [72,73] (Figure 8). Interestingly, the HsST3GAL1-Gal-β-1,3-GalNAc-oNP binary complex revealed that the plane of the terminal Gal acceptor residue is rotated by 180°, which ultimately positions its C3-OH group adjacent to the catalytic His residue for glycan transfer [73] (Figure 8). This flipped geometry alters the nature of the interaction with the acceptor in HsST3GAL1. In fact, extensive hydrogen bonding stabilizes the complex while taking advantage of the down-facing axial hydroxyl groups of the disaccharide acceptor. These interactions are quite different from the mode of hydrophobic stacking and hydrogen bonding that is common among Gal-specific binding proteins, including HsST6GAL1 [73]. The ternary complex of CMP-3F-NeuAc-HsST8SIA3-NeuAc-α-2,3-Gal-β-1,4-GlcNAc-6-SO4 showed that the acceptor NeuAc of ST8SIA3 primarily forms hydrogen-bonding interactions by positioning the nucleophilic C8-OH group of NeuAc adjacent to the His catalytic base [72] (Figure 8). The crystal structure of the CMP-HsST6GALNAC2 binary complex, solved without an acceptor, revealed minimal sequence and structural similarity in the loop regions and secondary structural elements involved in acceptor substrate recognition compared to HsST6GAL1, HsST3GAL1, and HsST8SIA3, implying the presence of significant diversity in the acceptor binding region among members of the GT29 family [14].
minimal sequence and structural similarity in the primary sequence of the loop regions and secondary structural elements involved in acceptor substrate recognition compared to HsST6GAL1, HsST3GAL1, and HsST8SIA3, implying the presence of significant diversity in the acceptor binding region among members of the GT29 family [14]. The crystallization of the GDP:FUT8:GlcNAc2Man3GlcNAc2-Asn ternary complex was a milestone toward understanding the interaction of FUT8 with its substrates and The crystallization of the GDP:FUT8:GlcNAc2Man3GlcNAc2-Asn ternary complex was a milestone toward understanding the interaction of FUT8 with its substrates and catalysis in humans. FUT8, a GT involved in the core fucosylation of mammalian branched N-glycans, has been extensively explored to identify conserved and divergent structural features for acceptor recognition employing the ternary complex. FUT8 features an N-terminal coiled-coil domain, a catalytic domain, and a C-terminal SH3 domain, a unique characteristic among GT proteins [62]. The ternary complex of GDP:HsFUT8:GlcNAc2Man3GlcNAc2-Asn (A2-Asn),investigated by Kotzler et al., displays that a hexasaccharide is required as a minimal acceptor structure [79]. The 6-OH of the GlcNAc-1 of the hexasaccharide must be nearby the anomeric position of the Fuc residue of the bound donor for its transfer to the 6-OH of the GlcNAc-1 of the acceptor. The hydrogen bonds and hydrophobic interactions of the acceptor with GDP-Fuc contribute significantly to the binding of the acceptor. The branch at the 3-position mannose is bound through multiple hydrogen bonds between the flexible loop and the C-terminal β-sheets. Despite being distant from the site of the fucosyl transfer, GlcNAc-5 at the 3-mannose branch is essential for substrate specificity and exhibits transient interactions with the Lys541 side chain and with the flexible loop. 
The carboxyl group of Glu373 is essential for catalytic activity and engages in interactions with the OH-3 and OH-4 of GlcNAc-5 [79]. Recently, novel insights have been gained regarding HsFUT8 glycan acceptor recognition. Gracia et al. captured a ternary complex of GDP:HsFUT8:A2-Asn and showed that the catalytic domain is connected to the N-terminal coiled-coil domain by interdomain α3, whereas the C-terminal SH3 domain is in contact with the catalytic domain through the β10-β11 loop [75]. GDP is partly buried and confined within the catalytic domain. GlcNAc-1 and -2 of the core region of A2-Asn are also present in the catalytic domain. However, the α3/α6-branches of A2-Asn are located in the exosite formed by the β10-β11 loop and the SH3 domain. Interestingly, the β6-α8 loop (residues 365-378) is partly disordered in the apo form, but undergoes a conformational change in the presence of substrates; thus, the key residues Arg365, Lys369, and Glu373 not only recognize GDP and A2-Asn, but also contribute to catalysis since Glu373 acts as the catalytic base for the 6-OH group of the GlcNAc-1 of A2-Asn [75]. When the GDP:HsFUT8:A2-Asn ternary complex is aligned with the GDP-Fuc:HsPOFUT2 binary complex, the fucose residue from the donor substrate occupies the appropriate position within the active site of FUT8 such that C1 of the Fuc residue is directly aligned for nucleophilic attack by the OH-6 of the GlcNAc-1 of A2-Asn (Figure 9A), as recently demonstrated [62].
To further understand the preferences of the acceptor site, Jarva et al. solved the ternary structure of FUT8 in both mice and humans. The SH3 domain emerged as contributing toward the evolution of FUT8 as a dimer, which restricts the movement of the SH3 domain and stabilizes the acceptor binding site [51]. Glu373 displays close hydrogen bonding with the 6-OH group of the GlcNAc residue of A2-Asn, which interacts with Lys369 and in turn is in close contact with the β-phosphate of GDP. Notably, the activity of FUT8 depends on the terminal GlcNAc of the α3-branch since the intimate hydrogen bonding between His353 and the 6-OH group of this GlcNAc contributes toward acceptor binding. In another study, Boruah et al. performed kinetic studies on HsFUT8 with the acceptor A2-Asn and its structural analogs to investigate the restricted substrate recognition of the enzyme [78]. The structural superimposition of HsFUT8 bound to the donor substrate analog with four distinct glycans (i.e., A1-Asn, A2-Asn, A3-Asn, and NM5N2-Asn) corroborates their observation that the trisaccharide GlcNAc-β1,2-Man-α1,3-Man moiety is the key determinant for the acceptor recognition of FUT8 [78] (Figure 9B,C). It is notable that FUT8 displays a highly rigid active site that allows access to only a few potential structures, despite having a common Man3GlcNAc2-Asn structure in the core region of these acceptor substrates.

Protein-Modifying FUTs
Protein-modifying GTs target the hydroxyl group of Ser or Thr in proteins. They must first bind to the acceptor protein to orient the respective Ser/Thr hydroxyl nucleophile correctly for the transfer of a sugar moiety from the donor substrate. These include POFUTs, which modify folded cysteine-rich domains [14]. POFUT1 transfers fucose to the Ser or Thr residue of EGF repeats containing the consensus sequence C2-X-X-X-X-(S/T)-C3 [80]. POFUT2, in contrast, glycosylates Ser or Thr residues in the consensus sequence C1-X-X-(S/T)-C2 or C2-X-X-(S/T)-C3 of TSRs of groups 1 and 2, respectively (where X is any amino acid) [80]. While disulfide bridges of group 1 TSRs follow the pattern C1-C5, C2-C6, and C3-C4, those of group 2 TSRs are arranged as C1-C4, C2-C5, and C3-C6. Interestingly, the binding region of each of these O-FUTs is complementary to the face of the domain of the protein, which interacts with the cleft through multiple hydrogen bonds, especially within the loop with the consensus sequence. This orients the domain so that the OH group of the Ser/Thr acceptor is placed exactly in the correct position to perform the nucleophilic attack on the anomeric C-1 of the Fuc residue of the donor substrate [60,61].
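
The consensus sequences quoted above can be expressed as simple patterns for scanning a protein sequence. The following is a minimal sketch, assuming that X may be any residue and that only the local motif is checked; it does not verify which cysteine of the domain is matched, the disulfide pattern, or the fold, so hits are candidate sites only. The function name and the toy sequences are hypothetical and are not taken from any real protein.

```python
import re

# Consensus motifs as quoted in the text (X = any amino acid).
# A lookahead is used so that overlapping candidate sites are all reported.
POFUT1_MOTIF = re.compile(r"(?=C.{4}([ST])C)")  # C2-X-X-X-X-(S/T)-C3 in EGF repeats
POFUT2_MOTIF = re.compile(r"(?=C.{2}([ST])C)")  # C-X-X-(S/T)-C in TSRs


def candidate_sites(sequence, motif):
    """Return 1-based positions of the Ser/Thr inside each motif match."""
    return [m.start(1) + 1 for m in motif.finditer(sequence.upper())]


# Hypothetical toy sequences, used only to exercise the patterns.
toy_egf = "DGCPNGGSCKDG"
toy_tsr = "WSPCSVTCGAG"

print(candidate_sites(toy_egf, POFUT1_MOTIF))
print(candidate_sites(toy_tsr, POFUT2_MOTIF))
```

A motif hit alone does not imply fucosylation, since, as described above, the enzymes recognize the folded domain rather than the linear sequence.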

Experimental structures of HsPOFUTs in complexes with their acceptor substrates are not yet solved; however, the ternary complexes of O-FUTs have been crystallized in mice and C. elegans. The ternary complex of GDP/GDP-Fuc:MmPOFUT1:EGF-LDs revealed multiple points of interaction between POFUT1 and EGF-LDs. However, the core of the interaction involves a conserved preformed cleft on POFUT1 and conserved, sequence-independent structural elements on the fucosylation motif, which is common to all EGF-LDs [60]. The interactions outside the core region display POFUT1 residues, which are flexible, and EGF-LD residues that are highly variable among POFUT1 substrates. Thus, the observed plasticity of MmPOFUT1 is an important feature, which enables the enzyme to accommodate the sequence diversity of its EGF-LD substrates.
The structure of CePOFUT2-GDP-HsTSR1 highlighted that the CePOFUT2 binding domain contains large cavities that are filled by an intricate network of water molecules [61]. The complex is stabilized by a limited number of direct hydrogen bonds and stacking interactions between CePOFUT2 and HsTSR1 that are complemented by many water-mediated interactions. The active role of these water molecules is speculated to confer promiscuity on CePOFUT2 toward dissimilar TSRs, a feature that may also apply to other GTs that modify a wide variety of peptide sequences.

Mechanism of Catalysis
The catalytic mechanism is independent of the overall fold of GTs since both inverting and retaining enzymes are common among the GT-A and GT-B superfamilies. Nucleotide-dependent GTs catalyze a glycosyl transfer reaction either by retention or inversion of stereochemistry at the anomeric reaction center of the donor substrate to generate diverse biological glycans with distinct anomeric configurations [8] (Figure 10A). The orientation of the acceptor hydroxyl group relative to the donor anomeric carbon is critical in establishing the catalytic mechanism for GTs.

STs Display SN2 Catalysis
STs are classified as metal-ion-independent inverting enzymes that employ an SN2 single-displacement reaction mechanism (Figure 10B), in which the nucleophilic hydroxyl group of the acceptor attacks the anomeric carbon of the donor sugar, i.e., sialic acid, and a catalytic base assists in the deprotonation of the nucleophile; the nucleotide moiety leaves from the opposite face, resulting in the inversion of the anomeric configuration of the product [16]. As mentioned above, histidines appear to serve as catalytic bases for the sialyltransfer reaction of GT29 STs, thus emphasizing the role of histidine as an important base in their catalytic mechanism (Figure 4).
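
The in-line geometry implied by a single-displacement mechanism can be checked on any modeled or experimental complex by measuring the distance from the acceptor oxygen to the donor anomeric carbon and the angle formed by the nucleophile, the anomeric carbon, and the leaving-group oxygen. The sketch below illustrates that measurement only; the coordinates are hypothetical values chosen for illustration, and the function names are assumptions rather than part of any published workflow.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cosine = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def inline_attack_geometry(nucleophile_o, anomeric_c, leaving_o):
    """Return (O...C distance, O...C-O angle) for a candidate in-line attack."""
    return (math.dist(nucleophile_o, anomeric_c),
            angle_deg(nucleophile_o, anomeric_c, leaving_o))


# Hypothetical coordinates in angstroms, not taken from any deposited structure.
acceptor_oh = (-3.0, 0.0, 0.0)
donor_anomeric_c = (0.0, 0.0, 0.0)
phosphate_o = (1.4, 0.2, 0.0)

d, theta = inline_attack_geometry(acceptor_oh, donor_anomeric_c, phosphate_o)
print(f"distance = {d:.2f} A, attack angle = {theta:.1f} deg")
```

An angle approaching 180° indicates that the nucleophile and the leaving group lie on opposite faces of the anomeric carbon, which is the arrangement that leads to inversion of configuration.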

FUTs Display SN1 or SN2 Catalysis
Analogous to the ST family, the SN2 single-displacement reaction mechanism is characteristic of fucosyltransferases such as CePOFUT2 [61] and HsPOFUT2 [59] (Figure 10B). Despite the absence of ligands, FUTs utilizing an SN2 inverting mechanism usually contain catalytic residues located in their binding pocket, such as Glu52 in CePOFUT2 [61] and Glu54 in HsPOFUT2 [59]. Contrary to POFUT2, an SN1 mechanism involving the formation of a close ion pair is postulated for POFUT1, in which the glycosidic bond is cleaved before the nucleophilic attack [81] (Figure 10C). In CePOFUT1, Asn43 may be positioned at the hydroxyl group of the acceptor, close to the β-phosphate. However, Arg240 is the key catalytic residue, whose hydrogen bonding with the glycosidic oxygen may facilitate the cleavage of the glycosidic bond. Additionally, in MmPOFUT1, Asn51 and Arg245 are engaged in hydrogen bonding with acceptor molecules and the β-phosphate group, respectively [60]. It is notable that no basic residues are present in the active sites for CePOFUT1 and MmPOFUT1 that could act as assisting bases.
Based on in silico modeling of GDP-Fuc-bound HsFUT8 with A2-Asn, an SN2-like reaction is predicted for the enzyme, in which the β-phosphate is speculated to play the role of the catalytic base [77,79]. Subsequently, the GDP:HsFUT8:A2-Asn crystal structure revealed that the motion of the β6-α8 loop brings the essential catalytic base Glu373 to the binding site in the presence of ligands [75], information which was not evident from previous computational studies [77,79], thus emphasizing the need to directly capture donor-enzyme-acceptor ternary complexes to delineate the molecular basis of catalysis for GTs.

Figure 10. [...] (B) SN2 inverting mechanism for STs (e.g., HsST3GAL1, HsST6GAL1, HsST8SIA2, HsST8SIA3, and HsST8SIA4) and FUTs (e.g., CePOFUT2, HsPOFUT2, and HsFUT8). (C) SN1 inverting mechanism for CePOFUT1 and MmPOFUT1. The cyan E displays the enzyme (ST/FUT); the purple B represents the catalytic base, which deprotonates the acceptor nucleophile; and the red O indicates the nucleophilic oxygen of the acceptor substrate. CMP, GMP, and GDP (blue) are cytidine monophosphate, guanosine monophosphate, and guanosine diphosphate, respectively.

Molecular Interactions of Porcine ST3GAL1 with a Potent Inhibitor That Mitigates Tumor Cell Metastasis
Cells treated with a lithocholic acid analog (Lith-O-Asp) showed reduced activities of ST3GAL1, ST3GAL3, and ST6GAL1 in in vitro and cell-based activity analyses [81]. Lith-O-Asp has been proposed to abrogate tumor cell metastasis, partly by inhibiting ST activity to attenuate the expression of cell-surface sialylated antigens such as integrin-β1 and inhibit FAK/paxillin/Rho signaling activity in vivo [81]. To understand the nature of the interaction between the mammalian ST and Lith-O-Asp, we performed a molecular docking analysis between porcine ST3GAL1 (PDB ID: 2WNB) and Lith-O-Asp, which indicated favorable binding of Lith-O-Asp to active-site amino acid residues, with an interaction energy of −8.75 kcal/mol (Figure 11). Our computational analysis revealed major interactions stabilizing the enzyme-inhibitor complex, which include: (a) a close interaction of the acidic end of Lith-O-Asp with His319, Thr272, and Tyr233 of ST3GAL1, and (b) an interaction of the amine end of Lith-O-Asp with Glu324. Recently, Ortiz-Soto et al. [70] generated a model of HsST3GAL1 based on the ternary porcine ST3GAL1 complex and investigated the molecular interactions between the enzyme and its substrates to further understand the correlations among the structure, activity, and stability of ST3GAL1 in humans. The removal of hydrogen bonds and/or stacking interactions between the donor and acceptor substrates and residues such as Tyr191, Tyr230, Asn147, Ser148, and Asn170 influences the activity of ST3GAL1 to different extents. Intriguingly, the removal of the disulfide Cys59-Cys64 reduces the activity toward donor and acceptor substrates in vitro. Here, computational techniques could be employed to gain insight into the interactions of ST3GAL1 with its substrates to provide a theoretical model to further evaluate the interaction of Lith-O-Asp and similar metabolic inhibitors with ST3GAL1 and other sialyltransferases in humans. 
Thus, a computational biology approach toward developing selective therapeutic targets may act as a catalyst for drug discovery in cancer.
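
To give a sense of scale, a docking energy such as the −8.75 kcal/mol reported above can be converted into an apparent dissociation constant through ΔG = RT ln Kd. This holds only under the strong assumption that the docking score approximates a binding free energy, which docking scoring functions generally do not guarantee, so the result is an order-of-magnitude sketch rather than a measured affinity. The function name and the temperature of 298.15 K are assumptions introduced for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def apparent_kd(delta_g_kcal, temperature_k=298.15):
    """Apparent Kd (mol/L) from a binding free energy, via dG = RT ln Kd.

    Assumes delta_g_kcal is a true binding free energy; a docking score
    is only a rough stand-in for that quantity.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


kd = apparent_kd(-8.75)
print(f"apparent Kd = {kd:.2e} M")
```

Under these assumptions the value falls in the sub-micromolar range, which is the kind of estimate that would need to be confirmed by binding or inhibition assays.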

Structural Modeling of STs and FUTs
Molecular modeling of STs and FUTs is challenging since the number of available crystal structures remains limited for humans. Previously, in the absence of crystal structures, homology modeling along with fold recognition or threading techniques was used to predict structures. The first structural model for the mammalian FUT family, FUT4 in mice, was developed using this technique [82]. Furthermore, homology models have been proposed for HsFUT3 and HsFUT7, as described in detail by de Vries et al. [41]. Homology models for HsST8SIA1 and HsST8SIA4 have also been developed with RosettaCM using a template alignment generated with Modeller [72]. Additionally, the interaction of HsST8SIA4 with its acceptor substrate has been explored with RosettaDock [72]. The SWISS-MODEL server has been used to generate a homology model of HsST3GAL1 using the crystal structure of porcine ST3GAL1 [70]. Strecker et al. used Schrödinger's Maestro software to recreate a model of HsFUT8 for donor substrate binding using its crystal structure [83]. In addition, a molecular dynamics (MD) simulation has been performed with Desmond v3 to explore the flexibility of HsFUT8 [83]. Since both STs and FUTs are inverting glycosyltransferases that involve large conformational movements, we propose that MD simulation techniques could be useful to study the flexible loops that are proposed to participate in their catalysis.
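
One common way to locate flexible loops in an MD trajectory is the per-atom root-mean-square fluctuation (RMSF), in which mobile regions appear as high values. The sketch below computes RMSF in plain Python for a trajectory supplied as nested lists of coordinates; it assumes the frames have already been superimposed on a reference, and the input layout, function name, and toy data are hypothetical rather than the output format of Desmond or any other package.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF for pre-aligned frames.

    trajectory: list of frames; each frame is a list of (x, y, z)
    tuples, one per atom, in the same atom order in every frame.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(math.dist(frame[i], mean) ** 2
                  for frame in trajectory) / n_frames
        values.append(math.sqrt(msd))
    return values


# Toy data: atom 0 is static, atom 1 oscillates by 1 angstrom along x.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
]
print(rmsf(toy_trajectory))
```

In a real analysis the values would typically be computed over Cα atoms and plotted against residue number, so that loops such as the β6-α8 loop of FUT8 could be compared between apo and ligand-bound simulations.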

Conclusions
STs and FUTs display conformational plasticity upon substrate binding and catalysis. Notable progress has been made in the last decade in the successful crystallization and molecular modeling of several STs and FUTs, which has opened new avenues to fill gaps in our understanding of their structural architecture, interactions with substrates, and catalytic mechanisms. This progress holds the promise to impact medical and biotechnological development. Despite the remarkable breakthrough in the determination of high-resolution crystal structures of mammalian STs and FUTs in both the apo form and binary complexes, very few are available as ternary complexes. It is therefore essential to capture donor-enzyme-acceptor ternary complexes to delineate the molecular basis of the catalytic mechanisms of these glycosyltransferases, which are sought-after therapeutic targets in diseases such as cancer.