Three-Dimensional Structures of Carbohydrates and Where to Find Them

Analysis and systematization of accumulated data on carbohydrate structural diversity is a subject of great interest for structural glycobiology. Although it is a challenging task, the development of computational methods for efficient treatment and management of spatial (3D) structural features of carbohydrates breaks new ground in modern glycoscience. This review is dedicated to chemo- and glycoinformatics approaches to the generation, deposition and processing of 3D structural data on carbohydrates and their derivatives. Databases, molecular modeling and experimental data validation services, and structure visualization facilities developed over the last five years are reviewed.


Introduction
Knowledge of carbohydrate spatial (3D) structure is crucial for investigating the biological activity of glycoconjugates [1,2], vaccine development [3,4], estimating ligand-receptor interaction energies [5][6][7], studying the conformational mobility of macromolecules [8], drug design [9], studying aspects of cell wall construction [10] and glycosylation processes [11], and many other areas of carbohydrate chemistry and biology. Therefore, information support for carbohydrate 3D structure is vital for the development of modern glycomics and glycoproteomics.
Supplementing structural repositories with 3D structural data opens the way for computational glycobiology and modeling of carbohydrate structures at atomic resolution. The design of novel workflows and techniques that connect carbohydrate spatial structure models and experimental data with verification, processing, analysis and deposition of the associated data has gained popularity in the glycoscience community [27]. The Carbohydrate Structure Database (CSDB, [28]) module for carbohydrate 3D structure modeling is a demonstrative example of 3D structural data integration facilities (as a database) combined with a dedicated interface (as a glycoinformatics project). Further details on CSDB 3D facilities are discussed below. Herein we focus on the important aspects of carbohydrate 3D structure availability to researchers: structural repositories; glycoinformatics tools and workflows that assist structure building, modeling, and detection and remediation of erroneous molecular geometry data; and carbohydrate 3D structure presentation and visualization methods.

Structural Databases
Structural databases make a significant contribution to bringing information technologies to glycoscience [29]. Glycan databases and online tools, without a focus on spatial structure, have recently been reviewed [30][31][32]. Storing a large number of carbohydrate structures with detailed data for each entry, databases are valuable sources of structural information, biological assignments, references and external links. Structural data are often accompanied by original, and sometimes assigned, experimental observables: NMR spectra, HPLC and MS profiles, etc. Services built on top of the databases can include 3D structure simulation, validation, and storage. The authors' view of the ideal integration of data resources and services in glycoinformatics is summarized in Figure 2. The subject of this review is databases that provide theoretical or empirical 3D structures of carbohydrates, and related data-mining tools.

Figure 2.
Networking between glycoinformatics projects and related services that promotes achievement of data integration in glycomics. Reproduced with permission from [29], © 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
The majority of existing repositories for carbohydrate 3D structures offer open-access data via a web interface. Deposited datasets can include glycoproteins, protein-carbohydrate complexes, and poly- and oligosaccharides whose 3D structures were experimentally resolved or characterized by means of NMR, X-ray crystallography, cryo-EM, small-angle X-ray scattering, etc. [27]. Several databases, such as GLYCAM-Web, EK3D, 3DSDSCAR and GlycoMapsDB, contain data from molecular dynamics simulations. We have also included databases featuring information on protein structures that involve a carbohydrate moiety in terms of glycosylation (as a post-translational modification, dbPTM), carbohydrate-active enzymes (CAZy), and homology modeling (SWISS-MODEL). Table 1 lists currently active structural databases that maintain three-dimensional data on carbohydrates.
For Table 1, we selected carbohydrate and related databases using the following criteria:
• the database can be freely accessed through a web user interface;
• the database must contain experimentally confirmed and/or predicted 3D structures (preprocessed and/or generated on the fly from a primary structure input) of glycans, glycoproteins, or protein-carbohydrate complexes;
• stored 3D structures must be deposited as atomic coordinates in PDB, MOL, or another format, and the structures must contain a saccharide moiety;
• databases with records linked to other large 3D data collections (e.g., RCSB PDB, PDBe, PDBj, PDBsum, UniProtKB) are included in Table 1, as long as database entries contain a carbohydrate moiety (e.g., as part of a lectin or an antibody);
• databases with derived carbohydrate 3D structural data (conformational maps, conformer energy minima, etc.) are included in Table 1 even if they provide no atomic coordinates (e.g., GlycoMapsDB and GFDB).

Table 1. Currently active structural databases maintaining three-dimensional data on carbohydrates. [...] Last row (fragment): 2013-present; mammalian glycans; pre-built libraries of predicted 3D structures of common bioglycans; 3D structure models *; 3D atomic coordinate generation (http://glycam.org/Pre-builtLibraries.jsp).
a Where unknown, the year of the first publication is given. b A database is marked as curated if manual verification of data was reported in the original publication or at the database website. c Published coverage data can be outdated; the database interface provides no statistics on current coverage. * The database provides no search facilities for the indicated carbohydrate 3D structural data.

Carbohydrate 3D Structure Modeling
Methods to probe the 3D structure of carbohydrate-containing biomolecules have been developed for decades. NMR techniques (interatomic distances derived from NOEs and torsion angles derived from coupling constants), X-ray crystallography, and electron cryo-microscopy (the latter two yielding atomic models built on the basis of electron density maps) are among the most widely used methods for 3D structural elucidation. These methods have been reviewed [93][94][95][96] and are beyond the scope of this review, which focuses on information technologies. For the use of instrumental methods to validate a simulated structure, please refer to Section 5 "Experimental data validation".
Structural investigation of large biological systems involving protein-glycan interactions requires more resources and more complex experimental techniques than studies of oligo- and polysaccharides alone. Advances in NMR methods hold great potential for direct spatial structure determination of carbohydrate-protein complexes in solution based on intermolecular NOEs, which allow estimation of atomic contacts between a protein and a carbohydrate ligand [97,98]. In turn, NOE-derived distance restraints for a saccharide molecule can be used to generate representative conformational ensembles [99][100][101].
To date, a variety of theoretical models and methods are applied to the in silico design of carbohydrate three-dimensional structures [112][113][114][115][116]. Based on Scopus [135] article counts, we estimated the application rate of quantum mechanics (QM; 10759 publications) and molecular mechanics (MM; 14871 publications) methods in carbohydrate structure modeling over the recent five years (2015-2020). Search queries included abundant carbohydrate terms, typical glycan moieties, and common modeling approaches (query details are given in Supplementary Table S1). In spite of the growing interest in QM approaches to carbohydrate structure simulation, these resource-intensive calculations are mostly applied to relatively simple model compounds. For complex bioglycans in solution, the predominance of MM methods is even more pronounced [6,8].
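The two Scopus counts above can be put on a common scale by simple arithmetic. N_QM and N_MM are our shorthand for the reported counts; publications matched by both queries, if any, would be counted twice, so the resulting shares are indicative only.

```latex
\frac{N_\mathrm{MM}}{N_\mathrm{QM}+N_\mathrm{MM}}
  = \frac{14871}{10759+14871} = \frac{14871}{25630} \approx 0.580,
\qquad
\frac{N_\mathrm{MM}}{N_\mathrm{QM}} = \frac{14871}{10759} \approx 1.38
```

In other words, MM-based modeling accounts for roughly 58% of the counted publications, about 1.4 MM papers per QM paper.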

Molecular Mechanics and Dynamics
Molecular dynamics methods have achieved a broad scope of application owing to their reasonable computational resource consumption. They offer an advantageous compromise between accuracy and performance when applied to glycan molecules, given their structural complexity (variety of known monomeric elements, presence of ionogenic groups), the high flexibility of glycosidic bridges, and stereoelectronic effects [112,113,136,137].
In molecular mechanics simulations, the potential energy of a system is calculated from classical (Newtonian) mechanics principles using a parameter set specific to the class of compounds under study (a force field). Particular features of carbohydrate moieties, e.g., ring puckering, rotational barriers and hydrogen bonds, must be taken into account to perform a precise analysis of molecular behavior in vacuo or in solution [138]. Molecular dynamics simulations integrate Newton's equations of motion to follow the evolution of a system over a certain timespan. A conformational ensemble is generated by calculating the molecular trajectory at a given temperature. The accuracy of the calculation depends on the employed force field and on sufficient conformational sampling. MD simulations are commonly used for interpretation and analysis of NMR and X-ray observables in the context of carbohydrate 3D structure [139]. Enhanced sampling techniques, such as replica-exchange MD (REMD) [140,141], Hamiltonian replica-exchange MD (HREX) [142][143][144], multidimensional swarm-enhanced sampling MD (msesMD) [145,146], and Gaussian accelerated MD (GaMD) [147,148], have been reported. Density maps or energy maps built for a set of glycosidic torsion angles (φ, ψ, ω) are a typical way to report the conformational preferences of a glycan obtained by population analysis of its MD trajectory. As a representative example, the conformational characteristics of the highly flexible branched oligosaccharide Glc1Man9GlcNAc2 (GM9) were investigated in an explicit-water REMD study and validated using paramagnetism-assisted NMR spectroscopy [149] (Figure 3a,b). Due to the structural complexity of GM9, adequate exploration of its conformational space requires long-timescale simulations; conventional MD simulations of similar manno-oligosaccharides were reported to fail to reproduce experimental data [150].
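A glycosidic torsion density map of the kind described above can be sketched as follows: compute the signed dihedral angle from four atomic positions per frame, then bin the (φ, ψ) pairs into a normalized 2D histogram. This is a minimal illustration, not the analysis code of [149]. The atom quadruples used in `linkage_map` (O5-C1-O4'-C4' for φ and C1-O4'-C4'-C3' for ψ of a 1→4 linkage) follow one common heavy-atom convention and, like the atom-name keys and function names, are assumptions; H1-based definitions are also in use.

```python
import math
from collections import Counter

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, range [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def torsion_density(angle_pairs, bin_width=10.0):
    """Normalized 2D histogram of (phi, psi) pairs.

    Keys are (i, j) bin indices; bin i covers [-180 + i*w, -180 + (i+1)*w).
    """
    counts = Counter()
    for phi, psi in angle_pairs:
        i = int(((phi + 180.0) % 360.0) // bin_width)
        j = int(((psi + 180.0) % 360.0) // bin_width)
        counts[(i, j)] += 1
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

def linkage_map(frames, bin_width=10.0):
    """Hypothetical helper: frames are dicts atom name -> (x, y, z) for one 1->4 linkage."""
    pairs = [(dihedral(f["O5"], f["C1"], f["O4'"], f["C4'"]),
              dihedral(f["C1"], f["O4'"], f["C4'"], f["C3'"])) for f in frames]
    return torsion_density(pairs, bin_width)
```

The bin populations can be converted into a relative free-energy map via -RT ln(p); the bin width trades resolution against statistical noise for a given trajectory length.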
The replica-exchange approach runs parallel replicas of the system at different temperatures and periodically attempts to exchange configurations between them. The ensemble of GM9 conformers sampled by this method was consistent with the NMR observables. Populated areas of density maps built for the glycosidic linkages of the Glc1Man3 branch of GM9 (Figure 3c) were close to the crystallographic conformations of a linear Glc1Man3 tetrasaccharide (a GM9 determinant recognized by lectins) from the PDB.
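In temperature REMD, an exchange between replicas at temperatures T_i and T_j holding configurations with potential energies E_i and E_j is commonly accepted with the Metropolis probability min{1, exp[(β_i - β_j)(E_i - E_j)]}, where β = 1/(k_B T); this choice preserves detailed balance. The sketch below shows only this exchange bookkeeping under that assumption; the MD propagation between attempts (done by an MD engine) is omitted, and the function names are hypothetical. Hamiltonian REMD uses a different acceptance expression.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); energies below in kcal/mol

def swap_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance probability for exchanging the configurations
    held by replicas at temperatures t_i and t_j (potential energies e_i, e_j)."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_neighbor_swaps(assignment, energies, temps, rng, offset=0):
    """One round of exchange attempts between neighboring temperatures.

    assignment[k] -> index of the configuration currently simulated at temps[k]
    energies[c]   -> potential energy of configuration c
    offset        -> 0 or 1, alternated between rounds so that all pairs are tried
    """
    new = list(assignment)
    for k in range(offset, len(temps) - 1, 2):
        a, b = new[k], new[k + 1]
        p = swap_probability(energies[a], energies[b], temps[k], temps[k + 1])
        if rng.random() < p:
            new[k], new[k + 1] = b, a
    return new
```

A high-energy configuration at the lower temperature is always moved up the ladder (probability 1), while the reverse move is accepted only stochastically; this is what lets conformations escape local minima at high temperature and return to the temperature of interest.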
A force field (or potential energy function) is an atomistic parameter set obtained for a particular class of compounds. The potential energy is calculated as a sum of interaction potentials for bonded (covalent bond stretching, angle bending, proper torsions) and non-bonded (electrostatic and van der Waals interactions) terms, and can include other terms, e.g., improper torsions, solvation, hydrogen bonds [151], nonconventional hydrogen bonds [101], CH-π stacking interactions for protein-carbohydrate complexes [152][153][154][155], and the Carbohydrate Intrinsic (CHI) energy contribution [156,157].
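For concreteness, a generic additive functional form containing the bonded and non-bonded terms listed above is shown below. The symbols are ours (τ is a generic dihedral angle, distinct from the glycosidic ω), and this is not the exact expression of any particular force field: CHARMM, for instance, adds Urey-Bradley terms, conventions for the factor 1/2 in harmonic terms differ, the pair sum runs over non-excluded pairs with force-field-specific 1-4 scaling, and the additional terms mentioned above (improper torsions, CH-π, CHI) are omitted.

```latex
E = \sum_{\text{bonds}} k_b (r - r_0)^2
  + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2
  + \sum_{\text{torsions}} \sum_{n} \frac{V_n}{2}\left[1 + \cos(n\tau - \gamma_n)\right]
  + \sum_{i<j} \left[ \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}
  + 4\varepsilon_{ij}\left( \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
  - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right) \right]
```

Carbohydrate-specific parameterization largely concerns the torsional terms (V_n, γ_n) around glycosidic and exocyclic bonds and the partial charges q_i, which govern the anomeric and exo-anomeric preferences and hydrogen bonding.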
Several force fields developed for general representation of a wide range of organic compounds (e.g., Allinger's MM2, MM3, MM4) can be applied to carbohydrate 3D modeling [151,158,159]. Of them, despite being a universal force field, MM3 [160,161] still exhibits good performance on glycans (reviews [162][163][164]; exemplary articles [165,166]). However, a number of force fields specially tuned for carbohydrates have been developed (Figure 4). In Supplementary Table S2, we provide citation metrics of articles reporting carbohydrate-dedicated and selected general force fields that could be applied to carbohydrate structure modeling. Unfortunately, the usage of general force fields could not be adequately estimated from citation counts, and the automated full-text analysis needed to confirm the use of force fields for carbohydrate molecules is beyond the scope of this review. Nevertheless, citation statistics for general force fields supported in popular MD software packages (e.g., AMBER, CHARMM, GROMACS, Tinker) show that Allinger's force fields, MM3 in particular, are being superseded by modern ones (see more detailed data, references to original publications and absolute values in Supplementary Table S2).
Detailed comparisons of general-purpose and carbohydrate-dedicated force fields in the context of glycan modeling have been published [114,139,151,167]. CHARMM36, GLYCAM06, GROMOS and OPLS-AA-SEI were reported as commonly used force fields for handling carbohydrate or glycoconjugate molecules. More details are provided in Figure 5.
The CHARMM36 force field with its modern carbohydrate parameter set (C36 [168]) was derived from the CHARMM all-atom biomolecular force field [169,170]. Currently, CHARMM36 parameterization covers monosaccharides in furanose [171] and pyranose [172] forms, glycosidic linkages between monosaccharides [171,173], complex carbohydrates and glycoproteins.
The GLYCAM06 force field is compatible with carbohydrates of all ring sizes and conformations, for both mono- and oligosaccharides built of residues common in mammalian glycans, such as widespread aldoses, N-acetylated amino sugars, and sialic, glucuronic and galacturonic acids [177]. The parameter set was extended to non-carbohydrate moieties such as lipids [178], glycolipids [179,180], lipopolysaccharides [181], proteins and nucleic acids. Parameterization of GLYCAM06 for glycosaminoglycans was reported [182].
GROMOS represents a broad family of carbohydrate force fields. A classic parameter set since 2005, GROMOS 45A4 [183] is used for explicit-solvent simulation of hexopyranose-based saccharides. In the recent decade, several parameters of 45A4 were optimized in GROMOS 56A_CARBO [184], also covering lipopolysaccharides [185]. GROMOS 53A6_GLYC was improved for explicit-solvent simulations [186] and extended to glycoproteins [187]. GROMOS 56A_CARBO_R [188] was designed to improve the description of ring conformational equilibria in hexopyranose-based saccharide chains compared to the previous 56A_CARBO version. Another modification of 56A_CARBO, named 56A_CARBO_CHT [189], was developed for chitosan and its derivatives. Recently, the GROMOS 56A_CARBO/56A_CARBO_R parameter sets were extended to charged, protonated and esterified uronates [190] and to furanose-based carbohydrates [191]. GROMOS96 43A1 was reported to perform well in glycan structure simulations of glycoproteins [192,193].
The OPLS-AA force field with scaling of electrostatic interactions (OPLS-AA-SEI) [194] comprises improved parameters for conformational changes associated with the φ/ψ dihedrals, combined with enhanced accuracy of QM relative energy calculations for carbohydrate molecules, refined for the OPLS-AA biomolecular force field [195,196]. Additionally, the OPLS force field was improved for explicit-water simulations [197].
The rapidly developing CHARMM Drude polarizable force field for carbohydrates, based on the classical Drude oscillator, should also be mentioned. Parameter sets have been obtained for hexopyranoses [198] and their aqueous solutions [199], and for aldopentofuranoses and methyl aldopentofuranosides [200].
The MARTINI coarse-grained (CG) force field [204] can be used as an alternative to all-atom (AA) simulations, with the advantage of modeling large carbohydrate systems (solutions of oligo- and polysaccharides, glycolipids [205][206][207]) on long time scales at reasonable computational cost. Blocked ring puckering (only the 4C1 conformation is allowed) and restrictions on the anomeric effect and glycosidic bond flexibility together reduce the number of available degrees of freedom. Another CG model for carbohydrate simulations, PITOMBA [208], was developed based on the GROMOS 53A6_GLYC force field.
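The core idea of coarse-graining, replacing groups of atoms with single interaction sites, can be sketched as a mass-weighted mapping from atoms to beads. The three-bead split of a hexopyranose below is hypothetical and is NOT the published MARTINI mapping, which should be taken from the force field files; some CG models also place beads at geometric rather than mass-weighted centers. Function and constant names are ours.

```python
def map_to_beads(atoms, mapping):
    """Center-of-mass coarse-graining.

    atoms:   atom name -> (mass, (x, y, z))
    mapping: bead name -> list of atom names assigned to that bead
    Returns: bead name -> (x, y, z), the mass-weighted center of its atoms.
    """
    beads = {}
    for bead, names in mapping.items():
        total_mass = sum(atoms[n][0] for n in names)
        beads[bead] = tuple(
            sum(atoms[n][0] * atoms[n][1][i] for n in names) / total_mass
            for i in range(3)
        )
    return beads

# Hypothetical three-bead partition of the 12 heavy atoms of a hexopyranose
# (illustration only; not the MARTINI carbohydrate mapping).
HEXOPYRANOSE_3BEAD = {
    "B1": ["C1", "O1", "C2", "O2"],
    "B2": ["C3", "O3", "C4", "O4"],
    "B3": ["C5", "O5", "C6", "O6"],
}
```

With such a mapping, a residue with 12 heavy atoms (24 with hydrogens) is represented by 3 sites, i.e., 9 instead of 72 Cartesian degrees of freedom, which is the source of the speed-up and also of the lost detail (ring puckering, explicit hydrogen bonds) noted above.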

Model Building and Analysis Tools
A number of web-based tools and standalone software packages have been developed to facilitate work with carbohydrate 3D structures. Versatile online services for in silico molecular modeling let users start from a user-friendly structure input and automate the subsequent procedures (see Table 2 for references). GLYCAM-Web provides tools for glycan structure prediction, glycosylated protein 3D model generation, grafting and docking. The CHARMM-GUI modeler offers options for 3D structure generation and modeling of glycans, including N-/O-glycoproteins and glycolipids [226,227]. Biological membranes can be simulated with the assistance of CHARMM-GUI Membrane Builder (by combining features of the LPS and glycolipid CHARMM-GUI Modelers) and GNOMM (a tool for building lipopolysaccharide-rich membranes). Noteworthy standalone programming frameworks for structure modeling are Glycosylator (modeling of glycans, glycoproteins and glycosylation) and Rosetta Carbohydrate (loop modeling [228], glycan-to-protein docking, and glycosylation modeling).
To build diverse saccharide 3D models online, one can use tools such as REStLESS and SWEET-II. The standalone doGlycans framework can be used to prepare atomistic models of glycopolymers, glycolipids and glycoproteins. Complex polysaccharide 3D models can be generated with POLYS and CarbBuilder. Another special class of polysaccharide builders is dedicated to glycosaminoglycans (GAGs): POLYS GAG-builder and GLYCAM-Web GAG-builder. Recently, another approach to building GAG molecules was reported [229] (exemplary data pipeline only). Unfortunately, the application scope of most existing structure building and modeling services is limited to a rigidly defined set of supported sugar residues, and they lack support for non-carbohydrate moieties.
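Regular polysaccharide chains are often described by helical parameters: n, the number of residues per turn (negative for left-handed helices by a common convention), and h, the rise per residue along the helix axis. The sketch below only illustrates this helical-symmetry idea by replicating one residue with a screw operation; it is not how POLYS or CarbBuilder are implemented (they assemble chains from monomer geometries and glycosidic torsions and must also handle linkage geometry and steric clashes). The function name and the assumption that the template is already expressed in a frame whose z axis is the helix axis are ours.

```python
import math

def build_helical_chain(template, n_residues, residues_per_turn, rise):
    """Replicate one residue's coordinates along a regular helix.

    template:          list of (x, y, z) for a single residue, in a frame
                       whose z axis is the helix axis (assumed)
    residues_per_turn: helical parameter n (negative -> left-handed)
    rise:              helical parameter h, translation per residue along z
    Returns one coordinate list per residue.
    """
    step = 2.0 * math.pi / residues_per_turn
    chain = []
    for k in range(n_residues):
        c, s = math.cos(k * step), math.sin(k * step)
        # Screw operation: rotate by k*step about z, then translate by k*rise.
        chain.append([(x * c - y * s, x * s + y * c, z + k * rise)
                      for x, y, z in template])
    return chain
```

For example, a two-fold helix (n = 2) places consecutive residues 180° apart around the axis, each shifted by h, which is the idealized geometry of extended, ribbon-like chains.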
Tools for locating and identifying carbohydrate moieties (e.g., pdb2linucs, GlyFinder, Glycan Reader) are useful for analyzing and extracting the atomic coordinates of glycoproteins and protein-carbohydrate complexes deposited in the Protein Data Bank (PDB). Automated molecular geometry processing is available in glycoinformatics tools designed for conformational data analysis (CAT, BFMP), nuclear Overhauser effect (NOE) calculation (MD2NOE, Distance Mapping), and analysis of 3D structural data on glycan moieties from the PDB (GlyTorsion, GlyVicinity, GS-align).
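The first step of such locating tools, picking carbohydrate residues out of a PDB-format coordinate file, can be sketched with the fixed-column layout of ATOM/HETATM records. This is a minimal sketch under stated assumptions: it relies on a small, incomplete set of Chemical Component Dictionary codes, reads only HETATM records, and ignores alternate locations and insertion codes. Unlike pdb2linucs or Glycan Reader, it does not reconstruct glycosidic linkages or handle non-standard residues. Names are hypothetical.

```python
from collections import defaultdict

# Small, incomplete subset of wwPDB Chemical Component Dictionary codes for
# common monosaccharides (illustrative only; real tools use far larger lists).
SUGAR_CODES = {"NAG", "NDG", "MAN", "BMA", "GAL", "GLA", "GLC", "BGC",
               "FUC", "FUL", "SIA", "XYP"}

def extract_sugar_residues(pdb_lines, codes=SUGAR_CODES):
    """Group HETATM coordinates of recognized sugar residues.

    Fixed PDB columns (1-based): atom name 13-16, residue name 18-20,
    chain 22, residue number 23-26, x/y/z 31-38/39-46/47-54.
    Returns {(chain, resSeq, resName): [(atom_name, (x, y, z)), ...]}.
    """
    residues = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()
        if res_name not in codes:
            continue
        key = (line[21], int(line[22:26]), res_name)
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues[key].append((line[12:16].strip(), xyz))
    return dict(residues)
```

Usage (file name hypothetical): `with open("model.pdb") as fh: sugars = extract_sugar_residues(fh)`. Linkage assignment would then follow from inter-residue C-O distances, which is where dedicated tools add most of their value.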
In Table 2, we summarize freely available tools for the generation and processing of carbohydrate 3D structural data, divided into eight categories of application.
a A web-service denotes an automated pipeline for running specific software (e.g., molecular modeling, structure building, carbohydrate coordinate extraction, format conversion); it produces 3D structural data output from a primary structure input or an uploaded atomic coordinate file. A web-tool is used for 3D structural data processing and analysis without 3D structural data output; it is a simpler application designed primarily for statistics and visualization. Other types are self-explanatory.

Experimental Data Validation
A wide variety of methods provide information about the 3D structure of individual glycans and of glycan moieties of glycoproteins and protein-carbohydrate complexes (Figure 6) [285,286]. The following approaches are most often used for 3D structural data validation [287][288][289]:
• combination of simulated carbohydrate geometry data with X-ray crystallographic data analysis [225,290];
• analysis of inter-glycosidic NMR spin couplings, which depend on glycosidic bond torsions [114,291,292];
• deriving nuclear Overhauser effects (NOEs) from relative populations of interatomic distances, with subsequent comparison to the experimental NOEs in solution [99,293,294];
• purely informatic detection of errors, such as incompatible atomic coordinates originating from incorrect processing or simulation [295][296][297][298];
• simulation by other computational methods at higher levels of theory [102,103,105,108].
Unfortunately, most of the data obtained from crystallographic experiments can differ dramatically from glycan conformations in solution or have poor resolution that requires further adjustment [299,300]. Moreover, not all objects of interest can be obtained as single crystals.
Electron cryo-microscopy (cryo-EM) is gaining popularity in carbohydrate 3D structural research [301]; however, this method requires additional refinement procedures due to the resolution restrictions of the obtained density maps [302][303][304]. Recently, cryo-EM data were used for the refinement of the SARS-CoV-2 spike glycoprotein structure using the Privateer software (see Table 3 for references) [305,306].
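Ring-conformation checks in carbohydrate validation software (Privateer, for instance, reports Cremer-Pople puckering parameters) rest on a compact calculation: define the ring mean plane, take the out-of-plane displacement z_j of each ring atom, and decompose these into an amplitude Q and angles θ, φ. The sketch below implements the standard Cremer-Pople formulas for a six-membered ring; it is our illustration, not Privateer code, and the function name is hypothetical. Which chair (4C1 or 1C4) appears near θ = 0° depends on the chosen starting atom and direction of the ring order.

```python
import math

def cremer_pople_6(ring):
    """Cremer-Pople puckering parameters of a six-membered ring.

    ring: six (x, y, z) tuples in ring order, e.g. O5, C1, C2, C3, C4, C5.
    Returns (Q, theta, phi): total puckering amplitude (input length unit)
    and theta, phi in degrees. Chairs lie near theta = 0 or 180 deg;
    boats and skew-boats near theta = 90 deg.
    """
    n = 6
    centroid = [sum(p[i] for p in ring) / n for i in range(3)]
    r = [[p[i] - centroid[i] for i in range(3)] for p in ring]
    # Mean-plane normal from the two Fourier-weighted position sums.
    r1 = [sum(r[j][i] * math.sin(2 * math.pi * j / n) for j in range(n)) for i in range(3)]
    r2 = [sum(r[j][i] * math.cos(2 * math.pi * j / n) for j in range(n)) for i in range(3)]
    normal = [r1[1] * r2[2] - r1[2] * r2[1],
              r1[2] * r2[0] - r1[0] * r2[2],
              r1[0] * r2[1] - r1[1] * r2[0]]
    length = math.sqrt(sum(v * v for v in normal))
    normal = [v / length for v in normal]
    # Out-of-plane displacement of each ring atom.
    z = [sum(rj[i] * normal[i] for i in range(3)) for rj in r]
    q2_cos = math.sqrt(2 / n) * sum(z[j] * math.cos(4 * math.pi * j / n) for j in range(n))
    q2_sin = -math.sqrt(2 / n) * sum(z[j] * math.sin(4 * math.pi * j / n) for j in range(n))
    q2 = math.hypot(q2_cos, q2_sin)
    q3 = sum((-1) ** j * z[j] for j in range(n)) / math.sqrt(n)
    big_q = math.sqrt(sum(zj * zj for zj in z))
    theta = math.degrees(math.atan2(q2, q3))
    phi = math.degrees(math.atan2(q2_sin, q2_cos)) % 360.0
    return big_q, theta, phi
```

A residue whose θ deviates strongly from the expected chair of its sugar type (e.g., a pyranose modeled in a high-energy boat without supporting density) is a typical flag for manual inspection.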
Van Beusekom et al. [295] illustrated how the PDB-REDO (Figure 7b) and CARP (Figure 7d) tools (see Table 3 for references) improve the quality of a PDB glycan structure model that had an incorrect (1-6)-linked fucose annotation, a poor fit to the electron density, and a missing (1-3)-linked fucose (Figure 7a). The model obtained after PDB-REDO treatment was further inspected manually (Figure 7c): the geometry of the acetamido group and the distorted ring conformation of the (1-6)-linked fucose were corrected, and the (1-3)-linked fucose residue was added. Although the residue annotation problem and the poor fit to the electron density were resolved automatically, a complete revision could not be achieved without manual intervention.
NMR techniques are a powerful approach to investigating the conformational and dynamic behavior of carbohydrate moieties in biomolecules [307][308][309][310]. However, the nature of the NOE enhancement factor has hampered obtaining a sufficient number of distance restraints [99]. For saccharides, with their multiple rotatable bonds, a stable 3D structure was difficult to define, which made molecular modeling essential for this class of compounds. Adjustment of experimental conditions helped to overcome this limitation and to reproduce the crystal structures of oligosaccharides by modeling with NOE-derived distance restraints [100,101].
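One common way to turn NOE intensities into distance restraints is the isolated spin-pair approximation (ISPA), which calibrates against a proton pair of known separation: r_ij = r_ref (η_ref / η_ij)^(1/6). The minimal Python sketch below assumes ISPA and an illustrative ±10% restraint window; neither is necessarily what was used in [100,101], and the reference distance in the example is hypothetical. It also shows the steep r^-6 dependence of the NOE: doubling a distance weakens the NOE 64-fold, so long-range contacts quickly fall below detection.

```python
from typing import Tuple


def ispa_distance(noe: float, noe_ref: float, r_ref: float) -> float:
    """Estimate an inter-proton distance (angstrom) with the isolated spin-pair approximation.

    Assumes NOE intensity is proportional to r**-6 and that a reference pair with a
    known distance r_ref gave intensity noe_ref under the same experimental conditions.
    """
    if noe <= 0 or noe_ref <= 0:
        raise ValueError("ISPA needs positive NOE intensities")
    return r_ref * (noe_ref / noe) ** (1.0 / 6.0)


def restraint_bounds(distance: float, tolerance: float = 0.10) -> Tuple[float, float]:
    """Turn a calibrated distance into (lower, upper) bounds; the 10% window is an assumption."""
    return distance * (1.0 - tolerance), distance * (1.0 + tolerance)


if __name__ == "__main__":
    # Hypothetical reference pair at 2.5 A and a cross-peak ten times weaker.
    d = ispa_distance(noe=0.1, noe_ref=1.0, r_ref=2.5)
    print(f"estimated distance: {d:.2f} A, bounds: {restraint_bounds(d)}")
    # r**-6 scaling: a pair twice as far apart gives a 64-fold weaker NOE.
    print(f"intensity ratio for a doubled distance: {2.0 ** 6:.0f}")
```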
Since there is no direct way to derive a detailed three-dimensional representation from the observed NOE intensities, additional molecular modeling protocols are required to establish a comprehensive view of the conformational space at the atomic level [311][312][313]. Frank et al. demonstrated filtering of conformations obtained by molecular dynamics in explicit solvent on the basis of the observed NOEs [314]. As a representative example, Figure 8 depicts the ¹H-¹H spatial contacts and conformation selection criteria for the heptasaccharide of the Moraxella catarrhalis lgt2Δ strain, which adopts an unusual conformation.
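The general idea of NOE-based filtering of an MD ensemble can be sketched as follows. This is not the protocol of Frank et al. [314]: it assumes that per-frame inter-proton distances have already been extracted from the trajectory, that restraints are given as upper bounds, and that a 0.3 Å tolerance is acceptable; the proton-pair labels are hypothetical. Two checks are shown: selecting individual frames that satisfy every restraint, and testing the whole ensemble with <r^-6>^(-1/6) averaging, which reflects how a time-averaged NOE is dominated by the shortest distances sampled.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]      # hypothetical proton-pair label, e.g. ("H1'", "H3")
Frame = Dict[Pair, float]   # inter-proton distances (angstrom) in one MD frame


def r6_average(distances: Sequence[float]) -> float:
    """NOE-weighted effective distance <r^-6>^(-1/6) over an ensemble."""
    return (sum(d ** -6 for d in distances) / len(distances)) ** (-1.0 / 6.0)


def select_frames(frames: List[Frame], upper: Dict[Pair, float], tol: float = 0.3) -> List[int]:
    """Indices of frames in which every restrained pair lies within upper bound + tol."""
    return [
        i for i, frame in enumerate(frames)
        if all(frame[pair] <= bound + tol for pair, bound in upper.items())
    ]


def ensemble_violations(frames: List[Frame], upper: Dict[Pair, float],
                        tol: float = 0.3) -> Dict[Pair, float]:
    """Pairs whose r^-6-averaged distance exceeds upper bound + tol, mapped to the excess."""
    result: Dict[Pair, float] = {}
    for pair, bound in upper.items():
        r_eff = r6_average([frame[pair] for frame in frames])
        if r_eff > bound + tol:
            result[pair] = r_eff - bound
    return result


if __name__ == "__main__":
    pair = ("H1'", "H3")  # hypothetical inter-residue contact
    frames = [{pair: 2.3}, {pair: 4.5}, {pair: 2.6}]
    upper = {pair: 3.0}
    print(select_frames(frames, upper))        # frames 0 and 2 satisfy the bound
    print(ensemble_violations(frames, upper))  # the ensemble average is consistent
```

Frame-wise selection is the stricter test; the r^-6 average can tolerate a minority of frames far from the restraint, which is why flexible glycans are often judged at the ensemble level.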

Protein Data Bank and Its Validation
Protein Data Bank (PDB) [315] and Cambridge Structural Database (CSD) [316] have historically been considered the main repositories of experimentally determined carbohydrate three-dimensional structures. The CSD reportedly contains over 4000 crystal structures of oligosaccharides [93]. Unlike the CSD, the PDB provides open access to its entire structural archive. Carbohydrate moieties deposited in the PDB are usually either covalently bound to a protein or part of non-covalent protein-carbohydrate complexes [302]. According to recent reports, as of November 18, 2019, the PDB contained ~13,500 carbohydrate-containing structures, representing ~9.4% of all database records [317].
Another concern related to the PDB is the large proportion of errors in deposited coordinates, which calls for thorough checking and for the development of data remediation services [319]. Commonly occurring problems with nomenclature, poor glycan geometry, linkage errors, and missing or surplus atoms can seriously degrade the quality of the resulting 3D structures [300,320,321]. Using the Privateer software, it was found [299,301] that the PDB contains a significant number of erroneous N-glycosylated structures with distorted pyranose rings, given that D-sugars preferentially adopt the ⁴C₁ conformation and L-sugars the ¹C₄ conformation (Figure 9). In most cases, poor electron density of the carbohydrate moiety results in anomalous high-energy pyranose ring conformations (envelopes, half-chairs, boats, skew-boats, etc.). To obtain a reasonable structural model, refinement of the experimental data should be supported by geometric restraints derived for sugar monomers. Notably, despite the resolution limitations of cryo-EM, the observed results indicate a larger proportion of atypical conformations in structures solved by X-ray crystallography than in those derived from cryo-EM data.
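Pyranose ring conformations such as ⁴C₁, ¹C₄, boats and skew-boats are conventionally quantified with Cremer-Pople puckering parameters (Q, θ, φ). The Python sketch below computes them for a single ring, assuming the atoms are supplied in the order O5, C1, C2, C3, C4, C5; with this ordering an ideal ⁴C₁ chair has θ ≈ 0° and a ¹C₄ chair has θ ≈ 180°. The classification thresholds (θ within 30° of a pole, Q ≥ 0.1 Å) are illustrative assumptions and do not reproduce Privateer's own validation criteria.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def cremer_pople(ring: Sequence[Vec]) -> Tuple[float, float, float]:
    """Return (Q in angstrom, theta in degrees, phi in degrees) for a six-membered ring.

    Atoms are assumed to be ordered O5, C1, C2, C3, C4, C5.
    """
    if len(ring) != 6:
        raise ValueError("expected six ring atoms")
    n = 6
    centre = tuple(sum(p[k] for p in ring) / n for k in range(3))
    r: List[Vec] = [tuple(p[k] - centre[k] for k in range(3)) for p in ring]
    # Mean-plane normal from the Cremer-Pople vectors R' (sine-weighted) and R'' (cosine-weighted).
    r1 = tuple(sum(r[j][k] * math.sin(2 * math.pi * j / n) for j in range(n)) for k in range(3))
    r2 = tuple(sum(r[j][k] * math.cos(2 * math.pi * j / n) for j in range(n)) for k in range(3))
    normal = _cross(r1, r2)
    length = math.sqrt(_dot(normal, normal))
    normal = tuple(c / length for c in normal)
    z = [_dot(rj, normal) for rj in r]  # out-of-plane displacements
    q2_cos = math.sqrt(2 / n) * sum(z[j] * math.cos(4 * math.pi * j / n) for j in range(n))
    q2_sin = -math.sqrt(2 / n) * sum(z[j] * math.sin(4 * math.pi * j / n) for j in range(n))
    q2 = math.hypot(q2_cos, q2_sin)
    q3 = math.sqrt(1 / n) * sum((-1) ** j * z[j] for j in range(n))
    big_q = math.hypot(q2, q3)
    theta = math.degrees(math.atan2(q2, q3))
    phi = math.degrees(math.atan2(q2_sin, q2_cos)) % 360.0
    return big_q, theta, phi


def classify(big_q: float, theta: float) -> str:
    """Coarse conformation label; the thresholds are illustrative assumptions."""
    if big_q < 0.1:
        return "flat"
    if theta <= 30.0:
        return "4C1"
    if theta >= 150.0:
        return "1C4"
    return "non-chair (envelope/half-chair/boat/skew-boat)"


if __name__ == "__main__":
    # Idealised chair on a unit circle; heights chosen so that C4 lies on the side from
    # which O5 -> C1 -> C2 ... appears clockwise, i.e. the 4C1 convention.
    chair = [(math.cos(math.pi * j / 3), math.sin(math.pi * j / 3), -0.25 * (-1) ** j)
             for j in range(6)]
    q, t, _ = cremer_pople(chair)
    print(f"Q = {q:.3f} A, theta = {t:.1f} deg -> {classify(q, t)}")
```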
Exceptions to the relevance of high-energy conformations were found in complexes with carbohydrate-active enzymes, which force pyranose ring distortion to enable catalytic transformation of a carbohydrate substrate via transition states (e.g., glycosidic bond hydrolysis) [322]. Fushinobu performed a glycosidic torsion analysis for a set of PDB entries of crystal structures of complexes with ligands bearing the lacto-N-biose I (LNB, both α- and β-anomers) disaccharide unit found in type-1 antigens. The study was supported by GlycoMaps DB (see Table 1 for references) [323]. The φ-ψ data obtained for LNB bound to various proteins were plotted against the corresponding free energy maps. Distortion of the energetically favored ring conformation strongly depended on the catalytic and recognition mechanisms involved.
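Glycosidic torsions such as the φ-ψ pairs analyzed for LNB are ordinary dihedral angles computed from four atom positions. The sketch below assumes one common heavy-atom convention for a (1-3) linkage such as Galβ1-3GlcNAc in LNB: φ = O5′-C1′-O3-C3 and ψ = C1′-O3-C3-C2, where primes denote the non-reducing Gal residue. Other conventions (e.g., hydrogen-based φH/ψH) give shifted values, so the definition has to match the energy maps being compared against. The atom-label dictionary is a hypothetical input format; signs follow the IUPAC rule (positive when the front bond rotates clockwise to eclipse the back bond).

```python
import math
from typing import Dict, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees (-180 to 180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi_1_3(atoms: Dict[str, Vec]) -> Tuple[float, float]:
    """phi/psi of a (1-3) linkage under the assumed heavy-atom convention.

    Hypothetical keys: "O5'" and "C1'" for the non-reducing residue,
    "O3", "C3" and "C2" for the reducing residue.
    """
    phi = dihedral(atoms["O5'"], atoms["C1'"], atoms["O3"], atoms["C3"])
    psi = dihedral(atoms["C1'"], atoms["O3"], atoms["C3"], atoms["C2"])
    return phi, psi
```

Collecting (φ, ψ) pairs over many PDB entries in this way yields the scatter that is then overlaid on a precomputed energy map.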
To date, existing tools for detecting and correcting carbohydrate structural errors in PDB files (Table 3) cannot be used directly as an integral part of the Protein Data Bank. Nevertheless, an initiative aimed at improving the quality of carbohydrate data at the wwPDB was carried out in collaboration with the glycoscience community in July 2020 [324] (https://www.wwpdb.org/documentation/carbohydrate-remediation). It includes annotation and validation of carbohydrate-containing records.
The proportion of carbohydrate-containing structures in the PDB has recently been reported in [302]. Figure 10 presents our analysis of data published within the Protein Data Bank carbohydrate remediation project. A total of 14,117 PDB entries from the carbohydrate remediation list (https://cdn.rcsb.org/wwpdb/docs/documentation/carbohydrateRemediation/PDB_carbohydrate_list.list) were sorted by release year and plotted against the growth of PDB structures released annually (https://www.rcsb.org/stats/growth/growth-released-structures) (as of August 10, 2020, when 167,327 PDB entries were available). The results indicated that ~8.4% of PDB records contained a carbohydrate moiety. Additionally, each PDBx/mmCIF file corresponding to a PDB ID from the carbohydrate remediation list was parsed for N- or O-glycosylation site annotations, which were found in ~4.2% (7076 N-glycosylated entries) and 0.2% (362 O-glycosylated entries) of all database records. A few S- and C-glycans (24 entries in total) were neglected.
Statistics on glycans in the Protein Data Bank have been reported [259,302,317,325], as have tools that facilitate the collection of statistical data (Glycan Reader [70,260,261], GlyFinder [258], pdb2linucs and pdb-care [326]).
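The annotation count described above can be sketched in Python as follows. The actual parsing procedure behind Figure 10 is not described here; this sketch assumes that glycosylation sites are read from the `_struct_conn.pdbx_role` item of remediated PDBx/mmCIF files (which carries values such as `N-Glycosylation` and `O-Glycosylation`), and it uses a deliberately minimal tokenizer that ignores semicolon-delimited multi-line values. All helper names are hypothetical. The percentage helper reproduces the arithmetic behind the figures quoted above, e.g., 14,117 of 167,327 entries ≈ 8.4%.

```python
import re
from typing import Iterable, List, Set

ROLE_ITEM = "_struct_conn.pdbx_role"
# mmCIF value tokens: 'quoted' or "quoted" (closing quote followed by whitespace), or bare.
_TOKEN = re.compile(r"'(.*?)'(?=\s|$)|\"(.*?)\"(?=\s|$)|(\S+)")


def _tokens(line: str) -> List[str]:
    return [next(g for g in m.groups() if g is not None) for m in _TOKEN.finditer(line)]


def glycosylation_roles(cif_text: str) -> Set[str]:
    """Collect non-empty _struct_conn.pdbx_role values from one PDBx/mmCIF text."""
    lines = [line.strip() for line in cif_text.splitlines()]
    roles: Set[str] = set()
    i = 0
    while i < len(lines):
        toks = _tokens(lines[i])
        if len(toks) > 1 and toks[0] == ROLE_ITEM:
            # Single-row category written as "item value" pairs (no loop_).
            roles.add(toks[1])
            i += 1
        elif lines[i] == "loop_" and i + 1 < len(lines) and lines[i + 1].startswith("_struct_conn."):
            columns: List[str] = []
            i += 1
            while i < len(lines) and lines[i].startswith("_struct_conn."):
                columns.append(lines[i])
                i += 1
            values: List[str] = []
            while i < len(lines) and lines[i] and not lines[i].startswith(("#", "_", "loop_", "data_")):
                values.extend(_tokens(lines[i]))
                i += 1
            if ROLE_ITEM in columns:
                # Every len(columns)-th token, starting at the role column, is a role value.
                roles.update(values[columns.index(ROLE_ITEM)::len(columns)])
        else:
            i += 1
    return {r for r in roles if r not in ("?", ".")}


def count_entries(per_entry_roles: Iterable[Set[str]], role: str) -> int:
    """Number of entries whose annotation set contains the given role."""
    return sum(1 for roles in per_entry_roles if role in roles)


def percent(part: int, total: int) -> float:
    return 100.0 * part / total


if __name__ == "__main__":
    total = 167_327  # PDB entries as of August 10, 2020 (figure quoted in the text)
    for label, n in [("carbohydrate", 14_117), ("N-glycosylated", 7_076), ("O-glycosylated", 362)]:
        print(f"{label}: {percent(n, total):.1f}%")  # 8.4%, 4.2% and 0.2%
```

For production use, a full mmCIF parser is preferable to this line-based tokenizer, which only handles the simple layouts shown.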

3D Structure Input and Visualization
Carbohydrate structure visualization in publications and computer interfaces is extremely important for universal perception, unambiguity, and machine readability. Hence, carbohydrate input [335][336][337] and visualization [338,339] tools are under active development. A feature comparison of glycan sketchers, builders and viewers (occasionally including 3D ones) was reported in a recently published review [340]. In this review, we place more emphasis on 3D visualization approaches.
Although they are informative for representing glycan primary structure, most graphical input tools, such as GlycanBuilder [341], DrawRINGS [342], SugarSketcher [343], DrawGlycan-SNFG [344,345] and GlycoGlyph [337], are unsuitable for obtaining and visualizing 3D structural models because they lack underlying modeling and sufficient data conversion functionality.
At present, glycan 3D molecular models can be built in user-friendly software that constructs glycans from individual saccharide components. Free web tools such as GLYCAM-Web, CHARMM-GUI, POLYS glycan builder, GAG-builder and SWEET-II should be noted (more references are listed in Table 2). Some commercial molecular modeling packages are equipped with special plugins for building glycan 3D structures from a list of predefined monosaccharide templates, e.g., the Sugar Builder tool in HyperChem (http://www.hyper.com/?tabid=360) [346] or the Azahar [235] plugin for the PyMol package (Schrödinger) (https://pymol.org/2/) [347].
NGL viewer was developed mainly for convenient processing of protein macromolecular structures. It allows only a ball-and-stick representation for small molecules or non-peptide fragments, such as saccharide residues. The LiteMol viewer (and its successor, Mol*) can be applied to visualize an arbitrary glycan, with the ability to highlight carbohydrate fragments or display specific interactions in protein-carbohydrate complexes. Owing to these features, it has been implemented in multiple carbohydrate structure databases (e.g., CSDB, Glyco3D, MatrixDB, and EPS-DB).
Even in the absence of experimental 3D structural data, a number of carbohydrate databases can simulate 3D atomic coordinates from the primary structure of deposited or user-input compounds, owing to tools developed by the glycoinformatics community. The CSDB (REStLESS API [265]), GLYCOSCIENCES.de (SWEET-II [264,350]) and GLYCAM-Web (http://glycam.org/) portals can generate 3D atomic coordinates in PDB (all three) and MOL (CSDB) file formats. POLYS, developed by the Glyco3D project, enables the construction of polysaccharides in PDB format; it has been introduced in the MatrixDB and EPS-DB databases. More details are provided in Table 2.
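The PDB coordinate format mentioned above is a fixed-column text format, which is part of why it is so widely exchanged between builders and viewers. As a minimal sketch (not how CSDB, SWEET-II or GLYCAM-Web actually write their files), the Python function below formats one HETATM record following the column layout of the PDB format specification; the function name and the example residue are illustrative.

```python
def hetatm_record(serial: int, name: str, res_name: str, chain: str, res_seq: int,
                  x: float, y: float, z: float, element: str,
                  occupancy: float = 1.0, b_factor: float = 0.0) -> str:
    """Format one PDB HETATM line (columns 1-78 of the fixed-width PDB layout)."""
    # By convention, atom names shorter than four characters start in column 14
    # (valid for one-letter elements such as C, O, N and H).
    atom_field = name if len(name) == 4 else f" {name:<3}"
    return (
        # cols 1-6 record, 7-11 serial, 13-16 name, 17 altLoc, 18-20 resName,
        # 22 chain, 23-26 resSeq, 27 iCode, 31-54 x/y/z, 55-60 occupancy,
        # 61-66 B-factor, 77-78 element
        f"HETATM{serial:5d} {atom_field:<4} {res_name:>3} {chain:1}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}          {element:>2}"
    )


if __name__ == "__main__":
    # Hypothetical anomeric carbon of a beta-D-glucose residue (CCD code BGC).
    print(hetatm_record(1, "C1", "BGC", "A", 1, 12.345, -6.789, 0.5, "C"))
```

Carbohydrate connectivity is not implied by these records and is usually supplied separately (e.g., via CONECT or LINK records).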
Atomic coordinates and all-atom molecular models have not been popular in publications because they are hard for humans to read. The first attempts [358,359] by Prof. Kuttel and co-workers to visualize carbohydrate molecules in an efficient and simple way led to the PaperChain and Twister graphic algorithms, implemented in the CarboHydra [360] and Visual Molecular Dynamics [361] software packages. Later, Prof. Pérez's group suggested restricting the visualized molecule to skeletal atoms and coloring the ring planes according to the color code adopted in the SNFG [338] visualization scheme (SweetUnityMol software [362], Figure 11a). Another UnityMol visualization approach, called Umbrella Visualization [363,364], was tailored for N-glycan structures. The Azahar plugin for PyMol [235] produces cartoon models with polygons and rods.
Several solutions for convenient visualization emerged with the development of the SNFG notation [339]. Thus, Prof. Woods's group proposed combining molecular structure elements with 3D SNFG icons (Figure 12a). This visualization technique was integrated into LiteMol (Figure 12b) [365] and Mol* (Figure 12c) [324,356]. 3D SNFG visualization plugins are available for the Visual Molecular Dynamics platform [366] (http://glycam.org/docs/othertoolsservice/2016/06/03/3d-symbol-nomenclature-for-glycans-3d-snfg/) and as the Tangram plugin for the UCSF Chimera [367] visualization software (https://github.com/insilichem/tangram_snfg). Designed as part of the CCP4mg [368] molecular-graphics software, the Glycoblocks [369] representation of monosaccharides uses shapes and colors identical to those in SNFG (Figure 12d). Finally, the 3D-CFG representation [370], based on the CFG notation [371] (often referred to as a predecessor of SNFG) and available as a PyMol plugin developed by the Widmalm group (http://www.organ.su.se/gw/doku.php?id=3dcfg), should be noted as an earlier approach to depicting carbohydrate 3D structures with a symbol library.
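The core idea behind 3D SNFG-style depictions, placing a colored SNFG shape at each monosaccharide ring, can be sketched as follows. This is not the algorithm of any tool named above and is only a sketch under stated assumptions: the residue-code table covers just a few PDB Chemical Component Dictionary codes, the hexadecimal colors are our transcription of the SNFG palette (to be checked against the official SNFG table before use), and each icon is simply centred at the ring centroid and oriented along the ring normal computed with Newell's method.

```python
from typing import Dict, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]

# Assumed subset: PDB CCD residue code -> (SNFG shape, SNFG colour as hex).
# Colours transcribed from the SNFG palette; verify against the official table.
SNFG_ICONS: Dict[str, Tuple[str, str]] = {
    "BGC": ("circle", "#0090BC"),    # beta-D-Glc: blue circle
    "GLC": ("circle", "#0090BC"),    # alpha-D-Glc: blue circle
    "GAL": ("circle", "#FFD400"),    # beta-D-Gal: yellow circle
    "MAN": ("circle", "#00A651"),    # alpha-D-Man: green circle
    "BMA": ("circle", "#00A651"),    # beta-D-Man: green circle
    "NAG": ("square", "#0090BC"),    # beta-D-GlcNAc: blue square
    "FUC": ("triangle", "#ED1C24"),  # alpha-L-Fuc: red triangle
    "SIA": ("diamond", "#A54399"),   # alpha-Neu5Ac: purple diamond
}


def ring_centroid_and_normal(ring: Sequence[Vec]) -> Tuple[Vec, Vec]:
    """Centroid and unit normal (Newell's method) of a closed ring polygon."""
    n = len(ring)
    centroid = tuple(sum(p[k] for p in ring) / n for k in range(3))
    nx = ny = nz = 0.0
    for i in range(n):
        (x1, y1, z1), (x2, y2, z2) = ring[i], ring[(i + 1) % n]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = (nx * nx + ny * ny + nz * nz) ** 0.5
    return centroid, (nx / length, ny / length, nz / length)


def snfg_icon(res_name: str, ring: Sequence[Vec]) -> Optional[dict]:
    """Icon description for one residue, or None if the residue code is not mapped."""
    if res_name not in SNFG_ICONS:
        return None
    shape, colour = SNFG_ICONS[res_name]
    centre, normal = ring_centroid_and_normal(ring)
    return {"shape": shape, "colour": colour, "centre": centre, "normal": normal}
```

A renderer would then draw each shape in the plane perpendicular to `normal`, which keeps the icon aligned with the ring as the molecule is rotated.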

Conclusions
The development of glycoinformatics resources has had a great impact on handling the enormous data sets produced by glyco-related research. Tools for retrieving carbohydrate 3D structural information provide a framework for validating the quality of experimental and computational data. Data sources based on conformational ensemble generation and analysis assist in predicting structure-function and structure-activity relationships of biologically relevant glycans and glycoconjugates. In this review, we have summarized existing facilities for working with glycan spatial features that can form a harmonious network of structural databases, web services, tools and standalone software for modeling and processing structural data. Further advances in this field will help build a better understanding of glycan participation in biological processes and provide the glycoscience community with user-friendly access to voluminous data collections.
Funding: The work with carbohydrate molecular modeling and PDB data was funded by Russian Foundation for Basic Research grant 18-04-00094. The work with structural databases, glycoinformatics tools and visualization was funded by Russian Science Foundation grant 18-14-00098.

Conflicts of Interest:
The authors declare no conflict of interest.