GAG-DB, the New Interface of the Three-Dimensional Landscape of Glycosaminoglycans

Glycosaminoglycans (GAGs) are complex linear polysaccharides. GAG-DB is a curated database that classifies the three-dimensional features of the six mammalian GAGs (chondroitin sulfate, dermatan sulfate, heparin, heparan sulfate, hyaluronan, and keratan sulfate) and their oligosaccharides complexed with proteins. The entries are structures of GAG and GAG-protein complexes determined by X-ray single-crystal diffraction methods, X-ray fiber diffractometry, solution NMR spectroscopy, and scattering data often associated with molecular modeling. We designed the database architecture and the navigation tools to query the database with the Protein Data Bank (PDB), UniProtKB, and GlyTouCan (universal glycan repository) identifiers. Special attention was devoted to the description of the bound glycan ligands using simple graphical representation and numerical format for cross-referencing to other databases in glycoscience and functional data. GAG-DB provides detailed information on GAGs, their bound protein ligands, and features their interactions using several open access applications. Binding covers interactions between monosaccharides and protein monosaccharide units and the evaluation of quaternary structure. GAG-DB is freely available.


Introduction
Proteoglycans (PGs) constitute a diverse family of proteins that occur in the extracellular matrix (ECM) and pericellular matrix (PCM) and on the surface of mammalian cells. They consist of a core protein and one or more covalently attached glycosaminoglycan (GAG) chains. PGs play critical roles in numerous biological processes, which are mediated by both their protein part and their GAG chains [1,2].
GAGs refer to six major polysaccharides in mammals: chondroitin sulfate (CS) [3], dermatan sulfate (DS), heparin (HP), heparan sulfate (HS) [4,5], hyaluronan (HA) [6], and keratan sulfate [7,8]. Their molecular mass ranges from a few kDa to several million Da for hyaluronan. Despite significant compositional differences, GAGs also share common features. They are linear polysaccharides made of disaccharide repeats. The disaccharides are composed of a uronic acid and a hexosamine or, alternatively, of galactose (Galp) and N-acetylglucosamine (GlcpNAc) [7]. In contrast to the five other GAGs, hyaluronan is not sulfated and does not bind covalently to proteins to form proteoglycans. Variations in the pattern of GAG sulfation at various positions create an impressive structural diversity. Two hundred and two unique disaccharides of mammalian GAGs have been identified so far, including 48 theoretical disaccharides in HS [9].
In addition to their contribution to the physicochemical properties of PGs, GAGs play an essential role in the organization and assembly of the extracellular matrix. They also regulate numerous biological processes by interacting with proteins in the extracellular milieu and at the cell surface. The six mammalian GAGs were shown to interact with 827 proteins in the recently published GAG interactome [10].
Many of these GAG interactions have been investigated and characterized in health and disease. According to [10], they take place in various locations (intracellular, cell surface, secreted, and blood proteins), and the protein partners range from individual growth factors (e.g., fibroblast growth factor-2) to large multidomain extracellular proteins such as collagens I and V and fibronectin, with different affinities and half-lives [11,12]. These proteins are involved in a variety of biological processes such as extracellular matrix assembly, cell signaling, development, and angiogenesis [10,13,14]. In addition, glycosaminoglycans play a role in host-pathogen interactions by binding to bacterial, viral, and parasite proteins [15][16][17][18][19][20]. The importance of understanding and mastering the molecular features underlying the interaction of GAGs with proteins was masterfully demonstrated by the development of antithrombotic drugs, as reviewed in [21].
[Figure caption fragment, citing [22]: The abbreviations are as follows: Glcp for glucose, Idop for idose, Galp for galactose, N for amine, S for sulfate, A for acid, and NAc for N-acetyl.]
The length, sequence, substitution pattern, charge, and shape of GAGs control both their physicochemical properties and their biological functions. Understanding the functions of GAGs first requires methods to accurately assess their molecular weight, their composition, and their sequences. This is made possible through ongoing progress in mass spectrometry, and heparan sulfate has been sequenced by liquid chromatography-tandem mass spectrometry (LC-MS/MS) [23][24][25][26][27]. Furthermore, the structural and conformational complexity of GAGs challenges the characterization of their three-dimensional features using either experimental or theoretical methods. In a sense, GAGs concentrate most of the difficulties faced in structural glycoscience. They combine the challenges associated with both glycans and polyelectrolytes. Several experimental techniques have been used to solve GAG structures, including fiber X-ray crystallography, nuclear magnetic resonance (NMR) [28,29], electron microscopy, small-angle X-ray scattering (SAXS) [30], and neutron scattering (elastic incoherent neutron scattering, EINS [31], and small-angle neutron scattering, SANS [32]). No single technique can cope with such complexity, but computational methods offer valuable tools to integrate partial information collected experimentally. The experimental data, in turn, are useful to validate and improve simulation strategies. However, these approaches remain limited due to the intrinsic properties of GAGs. Like any other complex glycans, they are highly flexible, create many solvent-mediated interactions, and have a polyanionic character. Nevertheless, progress in this field is underway, as detailed in [33], which investigates structures from monosaccharides to polysaccharides.
GAG-protein complex structures available in the PDB have been compiled by Samsonov and coworkers [34]. They concluded that this dataset does not represent the diversity of natural GAG sequences. It implies that computational approaches will be critical in understanding GAG structural biology and their mechanisms of interaction with their protein partners [35][36][37]. Significant progress has been made to investigate GAG structures, isolated and complexed with proteins, both at all-atom and coarse-grained levels [33][38][39][40][41]. However, appropriate tools for data mining of GAG-protein interactions are still missing [12,14].
MatrixDB (http://matrixdb.univ-lyon1.fr/) is a biological database focused on molecular interactions between extracellular proteins and polysaccharides [42]. It offers a first step to investigate the molecular mechanisms of GAG-protein interactions. In this resource, building and displaying the three-dimensional structural models of GAGs was rationalized through an effort to standardize the format of GAG sequences and to group GAG disaccharides into a limited number of families [9]. However, the relative spatial orientations of key GAG chemical groups interacting with (potential) "hot spots" on the proteins were not characterized. The conformational features displayed by the long-chain GAG polysaccharides were not considered either. To move forward, we collected further evidence of experimental GAG and GAG-protein interaction data from databases and from the relevant literature.
Experimental details of protein or protein complex three-dimensional structures are comprehensively recorded in the Protein Data Bank [43]. While the PDB is an essential repository, the glycan-related data it stores is not easily accessible to non-glycoscientists. This difficulty was identified in the glycoscience community and gave rise to several initiatives. Tools were designed to correct inconsistencies in the data [44][45][46]. Data was organized in publicly available databases, cross-referenced and interoperable with the glycomic and other omic databases to ease data access and analysis, such as Glyco3D [47], UniLectin3D [48], and MatrixDB [42] for GAG-extracellular protein complexes. We now report the development of GAG-DB, a database containing three-dimensional data on GAGs and GAG-protein complexes retrieved from the PDB. It includes protein sequences and standard nomenclature of GAG composition, sequence, and topology. It provides a family-based classification of GAGs, cross-referenced with glyco-databases, with links to UniProtKB via accession numbers [49]. The 3D visualization of contacts between GAGs and their protein ligands is implemented via the protein-ligand interaction profiler (PLIP) [50], and the nature of the structure that GAG polysaccharides can adopt, either in the solid state or in solution, is also reported. Finally, characterized quaternary structures of the complexes improve the understanding of whether and how GAGs participate in long-range, multivalent binding, with potential synergy when several chains are involved in interactions.

Database Construction
GAG-DB is available at https://www.gagdb.glycopedia.eu. The database is populated with information extracted from the PDB [51]. It includes the three-dimensional structural information on GAG and GAG oligosaccharides in interaction with proteins. We propose a classification based on the nature of GAGs, e.g., hyaluronan, heparin/heparan sulfate, chondroitin sulfate/dermatan sulfate, and keratan sulfate. GAG mimetics are included, as long as they appear in the PDB. The content of GAG-DB is focused on three-dimensional data, with an appropriate curation of the nomenclature, and extended related information. The entries are structures of GAGs and GAG protein complexes obtained by a wide range of methods.
To avoid any confusion, we note that an unrelated resource named the GAG database, which gathers genomic annotation cross-references, was developed and published in 2013 (The GAG database: a new resource to gather genomic annotation cross-references, T Obadia, O Sallou, M Ouedraogo, G Guernec, ...). The annotation data available in that resource includes all transcripts and their identifiers, functional description of genes, chromosomal localisation, gene symbols, gene homologs for model species (human, chicken, mouse), and several identifiers to link those genes to external databases (UniProt, HGNC).
The GAG-DB database contains 15 entries of long-chain GAGs established from fiber X-ray diffraction. It also contains 125 manually curated entries extracted from PDBe [52,53] (September 2020 release). These three-dimensional structures have been experimentally determined with methods involving either X-ray single-crystal diffraction, or X-ray fiber diffraction and solution NMR, in conjunction with molecular modeling. The number of GAG-protein complexes amounts to 105. The value of the resolution index reflects the accuracy of the experimental data: high values (e.g., 4 Å) indicate a poor resolution and low values (e.g., 1.5 Å) a good resolution. The median resolution for X-ray crystallographic data in the Protein Data Bank is 2.05 Å. A value of 3.0 Å is assigned to the structural models that have been proposed from X-ray fiber diffraction, and a value of 0 to those established by solution NMR or X-ray scattering (the structures are not filtered). Proteins of the database can be roughly separated into enzymes and skeletal proteins. Interestingly, the size of the oligosaccharides complexed with proteins varies from disaccharides (34) to a single chain with a degree of polymerization (DP) of 10; the intermediate sizes are DP 3 (1), DP 4 (18), DP 5 (13), DP 6 (15), DP 7 (7), DP 8 (8), and DP 9 (1). More than 80% of the GAGs involved in the complexes are heparin and hyaluronic acid oligosaccharides. However, these figures tend to reflect the interest of the community in investigating those GAGs most obviously involved in biological and biomedical applications.
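The resolution convention described above can be illustrated with a short Python sketch. It is only an illustration of the stated rule (3.0 Å for fiber-diffraction models, 0 for solution NMR or scattering models, the reported value otherwise); the function name, the method strings, and the keyword matching are assumptions made for this example and are not taken from the GAG-DB code base.

```python
from typing import Optional

FIBER_INDEX = 3.0     # value quoted in the text for X-ray fiber diffraction models
NO_RESOLUTION = 0.0   # value quoted in the text for solution NMR / scattering models


def resolution_index(method: str, reported: Optional[float] = None) -> float:
    """Return the resolution value (in Å) stored for an entry.

    `method` is a free-text description of the experimental method
    (hypothetical strings; the real database vocabulary may differ).
    """
    m = method.lower()
    if "fiber" in m:
        return FIBER_INDEX
    if "nmr" in m or "scattering" in m:
        return NO_RESOLUTION
    if reported is None:
        raise ValueError("a reported resolution is needed for this method")
    return reported


def is_well_resolved(value: float, threshold: float = 2.05) -> bool:
    """Lower values mean better resolution; 0 means 'not applicable'.

    The default threshold is the PDB median quoted in the text.
    """
    return 0.0 < value <= threshold
```

Because a value of 0 marks entries for which resolution does not apply, any filter on resolution has to treat it separately from genuinely low (good) values, as `is_well_resolved` does here.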
Our collection is far from covering the molecular diversity of GAGs. This lack of data echoes the limitations of carbohydrate synthesis that fails to provide sufficiently long sequences needed to properly investigate the molecular features driving interactions with proteins. Nonetheless, progress is in sight, as recently described in [54,55].
At present, information associated with each entity of the database is added manually. This allows for proper curation and annotation, at the expense of a time lag between the date of deposition and the date of release in the database. Technically, the database was developed with PHP version 7, Bootstrap version 3 and MySQL database version 7. The interface is compatible with all devices and browsers. The pages are dynamically generated to match user-selected search criteria in the query window. Interactive graphics are developed in JavaScript with the D3.js library version 3. A tutorial is available on the first page.

Description of the Search Interface
The database can be searched and explored with an advanced search tool handling a range of criteria. Figure 2 shows the different fields that can be searched. Possible inputs are:
• The name of the polysaccharide, gag_name, or of its protein ligand, macromolecule_name.
• Cross-entries with external databases, such as pdb, UniProt, and the GlyTouCan repository.
• The biological role, such as function, process, or cellular compartment (compliant with Gene Ontology terms).
• The origin, such as organism.
• The experimental condition(s) used to solve the structure: method and resolution.
• Characteristics of the GAG, such as its nature (is_gag differentiates GAGs and mimetics) and size (gag_max, gag-length, and gag-mass).
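To make the combination of search criteria concrete, the following Python sketch filters a small in-memory list of records on a few of the fields named above (gag_name, organism, resolution, is_gag). It is a minimal illustration only: the `Entry` class, the `search` function, and the sample records (including their placeholder PDB codes and protein names) are hypothetical and do not reflect the actual PHP/MySQL implementation or real database content.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    pdb: str                  # placeholder identifiers below, not real PDB codes
    gag_name: str
    macromolecule_name: str
    organism: str
    method: str
    resolution: float         # in Å; 0 means "not applicable"
    is_gag: bool              # False for GAG mimetics


def search(entries: Iterable[Entry],
           gag_name: Optional[str] = None,
           organism: Optional[str] = None,
           max_resolution: Optional[float] = None,
           is_gag: Optional[bool] = None) -> List[Entry]:
    """Keep the entries that satisfy every criterion that was supplied."""
    hits = []
    for e in entries:
        if gag_name is not None and e.gag_name.lower() != gag_name.lower():
            continue
        if organism is not None and e.organism.lower() != organism.lower():
            continue
        if max_resolution is not None and e.resolution > max_resolution:
            continue
        if is_gag is not None and e.is_gag != is_gag:
            continue
        hits.append(e)
    return hits


# Hypothetical records, for illustration only.
EXAMPLE = [
    Entry("0AAA", "heparin", "hypothetical protein A", "Homo sapiens",
          "X-ray single-crystal diffraction", 1.8, True),
    Entry("0BBB", "heparin", "hypothetical protein B", "Mus musculus",
          "X-ray single-crystal diffraction", 3.1, True),
    Entry("0CCC", "hyaluronan", "hypothetical protein C", "Homo sapiens",
          "solution NMR", 0.0, True),
    Entry("0DDD", "heparin", "hypothetical protein D", "Homo sapiens",
          "X-ray single-crystal diffraction", 2.0, False),
]

hits = search(EXAMPLE, gag_name="heparin", max_resolution=2.5, is_gag=True)
print([e.pdb for e in hits])
```

Criteria left as `None` are simply ignored, which mirrors the behavior of an advanced search form in which empty fields do not restrict the result set.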


Curated Information for Each GAG Entry
For each entry, a detailed page is available, with 3D visualization, interactions, conformations, nomenclature, and links to external databases (Figure 3). The PDB code assigned to each entry is used to list alternative structures and to display additional information. Each structure is related to a protein with a UniProt accession number [49]. Each oligosaccharide is given a GlyTouCan identifier [60]. The 3D structures of the protein and the interacting GAG are visualized directly and interactively with LiteMol [61] and NGL Viewers [62]. High-resolution images of both the protein-GAG complex and the GAG are available for download. The atomic coordinates of the GAG, isolated from its interaction with the proteins, can also be downloaded for further use.

GAG-DB cross-references to several other databases that rely on a variety of strategies for visualizing the interaction between the GAG ligand and its protein environment (Figure 4). Several applications are available through the four different PDB sites: RCSB.org, PDBe [51,53], PDBj [63], and PDBsum [64].
Additional information on the interactions formed between the GAG and the protein can also be obtained using the protein-ligand interaction profiler (PLIP) server [50]. PLIP calculates the atomic-level interactions (hydrogen bonds, hydrophobic contacts, water bridges, etc.) occurring between GAGs and proteins, and the NGL viewer [62] adapted to SwissModel [65] displays them. The specific features of the glycans interacting with the surrounding amino acid residues and possible metal ions are shown in 3D. The SwissModel application [65] provides direct access to PDBsum, deployed by the EMBL-EBI [64], CATH [66], and PLIP [50].
A cross-link to the PISA application [67,68] enables the exploration of quaternary structure formation and stability. The potential contribution of GAGs to the formation of quaternary macromolecular complexes requires the evaluation of energetic stability. The structural information relates to the interfaces between the macromolecular entities, the individual monomers, and the resulting assemblies, from which complex stability can be assessed or predicted. Supplementary Figures S1 and S2 provide examples of the interaction features offered by several visualization applications.

Monosaccharides
Repeated disaccharide units of glycosamine and uronic acids, with a non-uniform distribution of sulfated and acetylated groups along the chain, constitute the main structural features of sulfated GAGs. Despite the high diversity of potential structures, only 28 unique monosaccharide structures occur in GAGs. Three of them correspond to 4,5-unsaturated uronic acids. These result from the eliminative cleavage of GAG oligo- or polysaccharides containing (1->4)-linked D-glucuronate or L-iduronate residues and (1->4)-alpha-linked 2-sulfoamino-2-deoxy-6-sulfo-D-glucose residues, which gives oligosaccharides with terminal 4-deoxy-alpha-D-gluco-4-enuronosyl groups at their non-reducing ends.
The cartoon representation of monosaccharides was extended [42] in compliance with the SNFG representation of glycans [22] to link this description with the GlycoCT [58] and condensed IUPAC [56] codes of the monosaccharides.
While these nomenclatures have become widely popular in the field of glycoscience, they are not used to identify and describe monosaccharides in the PDB, which has its own carbohydrate nomenclature in its ligand dictionary [69]. Therefore, we established the cross-references between some of these nomenclatures (Figure 5).
Except for L-idopyranosides and the 4,5-unsaturated uronic acids, all monosaccharides exist as hexopyranosides, the predominant conformation being ⁴C₁. As for L-idopyranosides, the ¹C₄, ⁴C₁, and ²S₀ conformations may be found. Figure 6 depicts the three-dimensional representations of these unusual conformations, along with the corresponding SNFG extensions.

Disaccharides
The PDB dataset consisting of 105 protein-GAG complexes contains 270 disaccharides [9]. Table 1 displays the major disaccharides as extracted from GAG-DB.

Table 1. Major disaccharides found in the GAG-protein complexes extracted from the PDB (column: Number; table body not recovered).
Such a rich set of experimental data provides useful information to validate and improve computational strategies to build GAG models. The conformational preferences of GAG disaccharides can be assessed by computing potential energy surfaces as a function of their glycosidic torsion angles Φ and Ψ, as implemented in the CAT application [70]. As an example, Figure 7 displays two such potential energy surfaces (alternatives are not shown). In all cases, the experimentally observed Φ and Ψ values are plotted on the corresponding potential energy surfaces. Although somewhat scattered, they are all located in the lowest energy basins. Similar features are observed for all disaccharides (or disaccharide units) irrespective of the presence and the positions of sulfate groups on the monosaccharides. The agreement between the repertoire of the experimentally determined conformations and those predicted by computational methods provided the basis to develop a pipeline to translate glycosaminoglycan sequences into 3D models (http://glycan-builder.cermav.cnrs.fr/gag/) [47].
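Each glycosidic torsion angle discussed above is a dihedral angle defined by four consecutive atoms. As a minimal sketch, the Python function below computes a signed dihedral angle from four Cartesian coordinates using the standard atan2 formulation. Which four atoms define Φ and Ψ depends on the convention adopted (several are in use for carbohydrates), and the text does not state the one used by the CAT application, so the choice of atoms is left to the caller; the function name and the sample coordinates are illustrative assumptions.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> float:
    """Signed dihedral angle p1-p2-p3-p4 in degrees, in the range (-180, 180]."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal to the plane p1-p2-p3
    n2 = _cross(b2, b3)          # normal to the plane p2-p3-p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates (not taken from any structure).
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(round(angle, 1))
```

Applied to the four atoms that define Φ and then to those that define Ψ for each glycosidic linkage of a deposited structure, such a function yields the (Φ, Ψ) pairs that can be overlaid on a computed potential energy surface.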

GAG Structures in the Solid-State
The solid-state features of chondroitin sulfate, dermatan sulfate, hyaluronan, and keratan sulfate have been established by X-ray fiber diffraction [71] (Table 2) and are available in GAG-DB. They encompass several allomorphs that occur in different experimental conditions, including the nature of the counterions (Na⁺, Ca²⁺, and K⁺). Further structural features, such as the polarity of the polysaccharide chains, their interactions with the counterions, and their packing, can be deduced.
Table 2. Characterization of the helix symmetry of GAG polysaccharides in the solid state. Columns: glycosaminoglycan; structure of the main repeating disaccharide; helix symmetry; reference. Only a fragment of one row was recovered: Hyaluronan, -4)-β-D-GlcpA-(1-3)-β-D-
The organization of all these polysaccharide chains in the form of helices seems recurrent. Two parameters, n and h, characterize helical structures, where n is the number of repeat units (disaccharide units) per turn of the helix and h is the projection of one repeat unit on the helical axis. The sign attributed to n indicates the chirality of the helix: a positive value of n corresponds to a right-handed helix and a negative value to a left-handed helix. Such helical descriptors provide a simple way to classify the secondary structures and their potential allomorphs.
As with the disaccharide segments of GAGs, the values of the Φ and Ψ torsional angles found in all the conformations of GAGs fall in the low energy regions of the corresponding potential energy surfaces. It is therefore relevant to question whether secondary structures other than those derived from crystallographic characterization do occur. The sets of (Φ, Ψ) values corresponding to the low energy conformations can be propagated regularly to generate structures, which can be further optimized to form integral helices. When applied to hyaluronan structures, the analysis indicates that this polysaccharide displays a wide range of energetically stable helices (Figure 8). They span the left-handed four-fold symmetry to the right-handed five-fold symmetry, with a rise per disaccharide between 9.51 and 10.13 Å [81]. The results indicate that small variations in the glycosidic torsion angles might have a significant influence on the symmetry and pitch of the resulting helices without any noticeable energetic cost. This illustrates the capacity of hyaluronic acid to display different sites available for interactions with proteins; such changes would occur at no cost in energy and without altering the directionality of the polysaccharide chain.
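The helical descriptors n and h defined in this section lend themselves to a small worked example. The Python sketch below derives the handedness from the sign of n, the pitch as |n| × h, and the rotation per repeat unit as 360°/n. The function names are assumptions made for this illustration, and the numerical values in the example are illustrative rather than taken from Table 2.

```python
def handedness(n: float) -> str:
    """Positive n: right-handed helix; negative n: left-handed helix."""
    if n == 0:
        raise ValueError("n must be non-zero")
    return "right-handed" if n > 0 else "left-handed"


def helix_pitch(n: float, h: float) -> float:
    """Pitch in Å: advance along the helix axis for one full turn.

    n is the (signed) number of repeat units per turn and h the rise
    per repeat unit along the helix axis, in Å.
    """
    return abs(n) * h


def rotation_per_repeat(n: float) -> float:
    """Signed rotation about the helix axis per repeat unit, in degrees."""
    return 360.0 / n


# Illustrative values: a left-handed four-fold helix with a 9.5 Å rise.
n, h = -4, 9.5
print(handedness(n), helix_pitch(n, h), rotation_per_repeat(n))
```

With these definitions, two helices of the same polysaccharide can be compared directly: a change in n at nearly constant h alters the symmetry and pitch while leaving the overall chain direction unchanged.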

GAG Structures in Solution
The database contains the structure of heparin as established by NMR in solution (PDB entries 1HPN and 1XT3) and that of an analogue (2ERM). Solution structures have also been reported for four different heparin oligosaccharides, determined by a combination of analytical ultracentrifugation and synchrotron X-ray solution scattering, which gave the radii of gyration and maximum length extensions [30,83] (PDB codes 3IRI, 3IRJ, 3IRK, and 3IRL). Constrained molecular modeling of randomized heparin conformers resulted in 9-15 best-fit structures for each degree of polymerization (DP 18, DP 24, DP 30, and DP 36), which indicated flexibility and the presence of short linear segments in mildly bent structures. The experimental conformations are somewhat scattered, but they are all located in the lowest energy region of the corresponding Φ and Ψ maps (see Figure 9). The idopyranose residues adopted either the ¹C₄ or the ²S₀ conformation, without any influence on the Φ and Ψ maps. This establishes a model of heparin in solution as a semi-rigid object.
Such a computational protocol was used to model the disordered features of hyaluronic acid [81] and chondroitin sulfate [84]. As with heparin, the semi-rigid behavior and the stiffness of these GAG polysaccharides could be established.
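The two scattering-derived quantities mentioned above, the radius of gyration and the maximum length extension, can also be evaluated on a set of atomic coordinates such as a modeled conformer. The Python sketch below is a simplified illustration: it uses an unweighted radius of gyration (all atoms count equally) and the largest interatomic distance, and it ignores hydration and scattering contrast, so the values are not directly equivalent to experimental ones. The function names and the test coordinates are assumptions made for this example.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Point]) -> float:
    """Unweighted radius of gyration (same units as the coordinates)."""
    if not coords:
        raise ValueError("at least one coordinate is required")
    count = len(coords)
    cx = sum(p[0] for p in coords) / count
    cy = sum(p[1] for p in coords) / count
    cz = sum(p[2] for p in coords) / count
    mean_sq = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2
                  for p in coords) / count
    return math.sqrt(mean_sq)


def max_extension(coords: Sequence[Point]) -> float:
    """Largest distance between any two points (O(N^2) pairwise scan)."""
    return max((math.dist(a, b) for a, b in combinations(coords, 2)), default=0.0)


# Illustrative coordinates: a straight rod of three points, 5 Å apart.
rod = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(radius_of_gyration(rod), max_extension(rod))
```

Comparing these two descriptors across a family of conformers gives a quick indication of how extended or bent each model is, which is the kind of criterion used to select best-fit structures against scattering data.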

Conclusions
The aim of the article was to integrate three-dimensional data of GAGs and of GAG oligosaccharides complexed with proteins. The sources of data are multiple: X-ray fiber diffraction, solution NMR, and small-angle X-ray scattering for GAGs, and X-ray biomolecular crystallography for protein-GAG and protein-GAG mimetic complexes. A series of descriptors were selected to guide the search. They include cross-references to PDB, UniProtKB, MatrixDB, and GlyTouCan. GAG-DB opens the possibility of deciphering the full potential of GAGs as bioactive fragments or as structurally important multivalent scaffolds that provide interaction synergy when assembling proteins within quaternary structures. The inspection of the many features of the database supports the reporting of robust facts and knowledge, and the determination of what remains to be investigated or discovered. The amount and the quality of the 3D structures of GAG-protein complexes are amenable to comparison between the observed and the calculated 3D descriptors. Such a rich set of experimental information provides a solid basis for validating and improving computational strategies. We could confirm previously described features such as the lack of counterion effect in the interaction between GAGs; the definition of the preferred amino acids bringing the electrostatic neutrality of the interaction; and the lack of influence of sulfate groups on the glycosidic torsion angles. All the observed conformations fell within the low energy basins, thereby supporting the suitability of the computational protocol to model GAG conformations in a disordered state. An emerging picture is the description of these polysaccharide chains as propagating linearly in a preferred direction, with extended fragments separated by kinks. The semi-rigid character of the chains involves microarchitectural domains, which contain preformed conformations for optimal binding to protein targets. The separation of such domains, at a long enough distance, offers the possibility of multivalent binding to create further spatial arrangements that can induce the formation of functional assemblies of proteins.

Conflicts of Interest:
The authors declare no conflict of interest.