1. Introduction
The human asialoglycoprotein receptor (ASGP-R), also called hepatic lectin, is a C-type (calcium dependent) lectin of hepatocytes that recognizes desialylated glycoproteins for endocytosis and lysosomal degradation. It has been largely studied in recent years [
1–
10] due to its possible role in a wide range of practical applications in human health.
The ASGP-R is an integral membrane protein constituted by four functional domains: a cytosolic N-terminus domain of 40 residues, a single transmembrane domain, a stalk segment of 80 residues and a C-terminal carbohydrate recognition domain (CRD) of 150 amino acid residues. It is able to bind terminal non-reducing galactose and
N-acetyl-galactosamine residues of desialated tri or tetra-antennary N-linked glycans [
1]. The functional form of the human receptor is a noncovalent hetero-oligomer composed of two homologous subunits, generically called H1 and H2 [
2].
Simultaneous expression of both subunits was found to be necessary to generate high affinity binding sites. Optimal ligands are triantennary N-linked glycans which bind with a KD in the nanomolar range. Specificity and affinity of ligand binding are accomplished by the simultaneous interaction of at least three terminal ligand residues with three carbohydrate recognition domains (CRDs) [
1].
Ligand-receptor cross-linking has revealed that two of the galactose moieties belonging to the ligand specifically interact with H1, and the third one with H2. This finding emphasizes the importance of specific geometric requirements for ligand binding in the spatial arrangement of the CRDs within the receptor complex. Yet, the underlying hetero-oligomeric organization of the subunits is still poorly understood [
1].
Simple geometric considerations suggested precise locations of the gal-binding sites of the H1 and H2 subunits; Lodish [
9] suggested that they could be arranged so as to generate a triangle of sides 1.5 nm, 2.2 nm and 2.5 nm, at whose vertices are three gal-binding sites for the triantennary oligosaccharide with the highest affinity; two of them should be located on H1 subunits and one on H2.
Moreover, concerning the interactions with the H1 subunits, experimental assays suggested that the two groups of the ligand interact with two different galactose binding sites on two different H1 subunits [
11].
These findings infer a few relatively different hypotheses about the 3D arrangement of subunits in the ASGP-R oligomers that give rise to the receptor functional form. In this work we propose a step-wise procedure for building a minimum assessed oligomeric structure, i.e., a H1-H1-H2 trimer. The model obtained proved capable of explaining experimental observations reported in literature and could be used for predictive purposes.
Indeed, the ultimate target of the study is to contribute to increasing the knowledge, at a molecular level, of interactions between the human ASGP-R and carbohydrate ligands with regard to applications in the field of hepatic tissue engineering. In this perspective, hepatic cells seeded on a natural biodegradable carbohydrate scaffold will grow more successfully if the characteristics of the carbohydrate scaffold itself are optimal for scaffold-cells interactions.
2. Results and Discussion
2.1. Starting Structures
2.1.1. H1 Subunit
In the initial structure for the H1 CRD, taken from PDB (ID: 1DV8, 2.30 Å of resolution), the crystallographic water molecules were removed, except those forming coordination bonds with the Ca
2+ ions in the binding sites. These water molecules are important since they are replaced by the oxygen atoms of particular hydroxyl groups of the sugar molecule upon binding. The 3D model was subjected to a check of the overall structure and hydrogen atoms were added on the web server MolProbity [
12].
The program added H atoms to the initial model only containing heavy atoms, and detected some residues to be flipped (HIS202, ASN208, ASN217, GLN269). The structure was then submitted to geometric analysis. Some warnings were evidenced, the most severe (highlighted in bold in
Table 1) refers to rotamer outliers, others, less severe (highlighted in italic), refer to Ramachandran favored and to MolProbity scores. In order to heed these warnings and to remove bad contacts due to the added hydrogen atoms, the structure was submitted to energy minimization by applying the cff91 forcefield implemented in the Discover program (within the InsightII package), with 100 iterations of steepest descent and conjugate gradient until a drms value of 0.001 kcal/Å
2 was reached.
The minimized structure was submitted to a further geometry check by Molprobity revealing an almost complete resolution of the initial troubles (see
Table 1), the warning about rotamers being solved and other indices generally improved. The minimized 1dv8 structure (1dv8_min) was used in the subsequent step of molecular modeling.
2.1.2. H2 CRD Modeling
3D structure of the CRD belonging to the H2 subunit was obtained (within swisspdbviewer) thanks to its homology with the H1 CRD previously optimized (1dv8_min), and taken as template structure. The homology based modeling protocol relied on a sequence alignment, obtained from the web server Clustalw (as shown in
Figure 1), where only the CRD regions were considered, taken respectively from P07307 (human H2 ASGP-R) and P07306 (human H1 ASGP-R) sequences. The alignment score revealed a sequence similarity of about 65%, which ensured that highly reliable models for H2 CRD could be obtained.
The row sequence of CRD of H2 subunit was loaded in swisspdbviewer together the 3D structure of the optimized CRD of H1 (1dv8_min). The alignment, shown in
Figure 1, enabled automatic building of a model for H2 CRD, which was then refined on the swissmodel server.
Thanks to the high similarity between the H1 and H2 CRD sequences, the three binding sites (
Table 2) identified in the H1 CRD [
10] are reasonably conserved in the H2 CRD.
In particular, inspection of the alignment reported in
Figure 1, shows that the residues involved in the coordination bonds of the three H1 CRD binding sites are perfectly conserved in H2 CRD with the exception of Asp 242 (belonging to sugar binding site 1) that in H2 CRD is substituted by an Asn residue, which actually possesses very similar chemical properties.
Moreover, the obtained H2 CRD structure was submitted to the web server Q-site Finder [
13], a new method for ligand binding site prediction. It works by binding hydrophobic (CH3) probes to the protein, and finding clusters of probes with the most favorable binding energy. Such clusters are ranked by their likelihood of being a binding site, in accordance with the sum total binding energies for each cluster. The results showed that three of the sites predicted by q-site finder (A, B and C in
Table 3) include the same residues (highlighted in bold in
Table 3), which belong to the three Ca
2+ binding sites in the H1 CRD.
After ensuring that, in the predicted binding sites of the H2 CRD model, residues corresponding to the ones involved in coordination bonds with Ca2+ ions (in the H1 CRD) were included, three Ca2+ ions were added in the H2 CRD model, so that the classical coordination geometry of the Ca2+ ion with the O atoms of the relevant residues was retained. Moreover, two and three waters molecules were added in sugar binding site 2 and sugar binding sites 1 and 3 respectively, in order to make the coordination geometry of the Ca2+ ions to be complete.
The resulting model was then checked on the MolProbity server. Some warnings were found in regard to Ramachandran checks, bad angles and several clashscores. The new model was then submitted to energy minimization, in order to reduce such structural bugs, following a protocol analogous to the ones previously mentioned (100 iteration of steepest descent and then conjugate gradient until a drms value of 0.001 kcal/Å
2 was reached). In this last case, atoms of the protein backbone and Ca
2+ ions were allowed to move during energy minimization, while only the distances between each Ca
2+ ions and their coordinated O atoms were allowed to change in the 3.10–2.10 Å range, in order to retain the coordination bonds. The minimized structure was further submitted to a geometry check on the MolProbity server, revealing a significant improvement compared to the model initially built. The most severe warnings are highlighted in bold (in
Table 4) while other less severe ones are highlighted in italic.
The minimized H2 CRD structure (H2_min) obtained by homology from H1 CRD (1dv8_min) was used in the following step of molecular modeling.
2.1.3. Ligands
As previously mentioned, the ligands analyzed in this work were taken from an article by Lodish
et al. [
9]. They are reported in
Figure 2 with new names, used here for brevity purposes.
A photoaffinity labeling study carried out in rat hepatocytes [
11] had shown that a highly ordered binding mode occurs between all three lectin subunits and the three branches of tri-antennary ligands. In particular, for the Lod1a ligand, it was found that the Gal 1 and Gal 2 units bind to the H1 CRD, while the Gal 3 unit binds to the H2 CRD (see
Figure 2). Furthermore it has been observed that Gal 1 and Gal 2 bind to different binding site on the two H1 CRDs; Gal 1 should bind to the highest affinity site (site 2) while Gal 2 should bind to another site close to the first one (site 1).
Energy minimization, ligand docking and induced fit studies, which involved the Lod1a ligand, were performed with the aim of reliably assessing the relative locations of the three binding sites (see
Table 5).
The three protein subunits were so named on the basis of a trimeric model in complex with the Lod1a ligand. The remaining ligands were similarly allocated in the three binding sites where possible (ligands 1, 3, 4 and 6). Ligands 2 and 5 could give rise to a certain ambiguity. In order to solve this, all plausible starting orientations were sampled and only the best results were considered.
2.2. Step-Wise Construction of the Model for a H1-H1-H2 Trimer
The model for the H1s1-H1s2-H2 trimer (where only CRDs were included) was built by following a step-wise procedure. At first, H1s1-H1s2 dimers were built by using three different rigid-body docking programs. Among the H1s1-H1s2 models obtained, the best one was chosen on the basis of different selection criteria, taking into account which one of them fitted to the best experimental binding data involving the ASGP-R and bi-antennary ligands. A further step of rigid-body docking was then performed, so that a H2 CRD unit was added to the selected H1s1-H1s2 dimeric model. The different hypothetical trimeric models obtained were subjected to validation, by applying criteria analogous to the ones mentioned above. The model capable of fitting the best to the affinity trend, shown by bi- and triantennary ligands toward the ASGP-R, was retained as the most plausible model for the H1s1-H1s2-H2 trimer.
2.2.1. H1s1-H1s2 Dimers
The optimized model for H1 CRD (1DV8_min) was subjected to the three rigid-body docking programs Rosetta, HEX, GRAMM in order to obtain different hypotheses for interactions between two H1 CRD monomers. For each program, at least two or three runs were performed changing the initial relative orientation of the two monomers. For each run, the dimeric models were subjected to visual inspection. Only the ones showing orientation between monomers, which appeared to be compatible with available experimental evidence (distances between binding sites, etc., as described in more detail later), were selected and retained for subsequent steps of the study.
After the above preliminary selection, other aspects were considered in order to select valid models, for example properties of the monomer contact surfaces were considered. In particular for each proposed dimer, surface number connections, H bonds (true or potential) and salt bridges were counted. Structure 12 and 28 of the HEX run n. 2 had the most interesting results with regard to the contact between single monomers (see
Table 6).
2.2.1.1. Energy Minimization
In order to remove bad contacts, the poses found for the H1s1-H1s2 dimer (dimer12 and dimer28) were submitted to a two-step energy minimization protocol carried out by using the cff91 forcefield of Discover, with 100 iteration of steepest descent and conjugate gradient until a drms value of 0.001 kcal/Å2 was reached. Distances between each Ca2+ ion and its coordinated O atoms in the protein monomers or ligands were allowed to only take values in a range of 3.10–2.10 Å, in order to retain the coordination bonds.
2.2.1.2. Validation of H1s1-H1s2 Dimers
The identification of the most plausible dimeric model from the two previously selected (dimer structures 12 and 28 from the HEX run n. 2), was accomplished by estimating which one of them fitted best the experimental binding data involving ASGP-R and three known bi-antennary ligands (Lod4, Lod5, Lod6).
The AutoDock3 program was used for docking the above ligands into the binding areas of dimer12 and dimer28. The three ligands were not properly allocated by the program when dimer12 was considered. It means that no favorable conformations enabling interactions between each ligand and the two Ca
2+ ion of interest were found. Optimal ligand conformations interacting with the two Ca
2+ ions were found, instead, in the case of dimer28. Moreover, docking energy scores obtained for such conformations are in optimal agreement with the experimental affinity data (
Figure 3). Based on the above validation check, the model that most realistically fits the experimental affinity trends is dimer28.
2.2.2. H1s1-H1s2-H2 Trimers
After its selection, dimer28 was subjected to the three rigid-body docking programs Rosetta, HEX, GRAMM in order to obtain different hypotheses for the binding surface between the H1s1-H1s2 dimer itself and one subunit of the previously optimized model for H2 CRD. For each program, at least two or three runs were performed changing the initial relative orientation of the partners. For each run, only the trimers showing orientation between monomers compatible with the available experimental evidence (distances between binding sites, etc.) were selected by visual inspection and retained. Further checks were performed on the models retained by considering properties of the monomer contact surfaces. In particular for each proposed trimer, surface number connections, H bonds (true or potential) and salt bridges were counted (data not shown).
Among the many different hypotheses generated by the above approach, one (suggested by different HEX and GRAAM runs) was selected as the one closest to what was hypothesized by Lodish [
9]. It refers to an oligomeric model where two H1 CRDs (H1s1 and H1s2) and one H2 CRD unit are arranged according to a triangular shape. At its vertice,s three binding sites for galactose moieties of triantennary high affinity oligosaccharides take place. In the selected model the triangular shape hypothesized by Lodish [
9] is constituted by the two H1 CRDs (H1s1 and H1s2) and by the sugar site of H2 CRD, usually referred to as site2, which will be labeled thereafter as H2s2.
2.2.2.1. Refinement of the Starting Hypothetical Trimeric Models
For the energy minimization of the starting trimeric model the protocol described in Section 2.2.1.2. was used. Further conditions were applied during the simulation. The distances between Ca
2+ ions reasonably involved in the binding with carbohydrate ligands for each protein monomer were allowed to change according to values found in the Lodish model [
9] as reported below:
The model for the H1s1-H1s2-H2s2 trimer, after energy minimization, is shown in
Figure 4.
The distances (in Å) of Ca
2+ ions after the minimization are reported in
Table 7.
In such a model, a distance of 1.5 nm between H2 and H1 suggested by Lodish could correspond to the distance between the H2 CRD sugar binding site 2 and the H1 CRD sugar binding site 1; moreover the distance of 2.5 nm between H2 and H1 suggested by Lodish could correspond to the distance between the H2 CRD binding site 2 and the H1 CRD binding site 2 (belonging to the H1 subunit of the trimer).
2.2.2.2. Induced Fit
Possible “induced fit” phenomena were analyzed with the aim of estimating the relevance of conformational transitions of the oligomeric receptor upon binding of carbohydratic ligands. The complexes involving the optimized trimeric model and the seven ligands reported by Lodish [
9] were further subjected to energy minimization following the protocol described before.
Other than applying the already mentioned restrains on Ca
2+ atom, the distances between the Ca
2+ atom of the binding sites (involved in the interaction with the ligands) and the appropriate O atom (belonging to hydroxylic group 3-OH or 4-OH) of the sugars were allowed to change in the same range (3.10–2.10 Å) in order to permit the formation of coordination bonds (displacing the Ca
2+ coordinate water molecules). The picture of one of the optimized complexes is reported in
Figure 5. The structural analysis showed that significant conformational transitions occur in the trimeric model upon ligand binding.
3. Methods
3.1. Starting Structures
The starting structure for H1 was taken from PDB [
16] where the structure of the carbohydrate recognition domain (CRD) of this subunit of the human ASGP-R is available. It was determined via X-ray diffraction at a resolution of 2.30 Å (ID: 1DV8, year: 2000). The corresponding region of the H2 subunit (H2 CRD) was obtained by molecular modeling techniques, based on homology between the two subunits as described in the following sections.
The web server Q-site finder [
13] was used to find protein-ligand binding sites. It is an energy-based method and uses the interaction energy between the protein and a simple van der Waals probe to locate energetically favorable binding sites. Energetically favorable probe sites are clustered according to their spatial proximity and clusters are then ranked according to the sum of interaction energies for sites within each cluster.
Ligands and relevant binding data considered in this work are the ones described by Lodish [
9] (
Figure 2). Their structures were drawn into the Sweet2 web server [
17] where a program, enabling the construction of 3D models of saccharides from their sequences expressed through standard nomenclature, is available.
3.2. Energy Minimizations
All the energy minimizations were carried out in two steps by means of the Discover program (within the InsightII package [
18]), using the cff91 force field with 100 iterations of steepest descent and conjugate gradient until a drms value of 0.001 kcal/Å
2 was reached. The Ca
2+ ions were fixed; moreover, distances between each Ca
2+ ion and its coordinated O atoms in the protein monomers or ligands were allowed to only take values in a range of 3.10–2.10 Å, in order to retain the coordination bonds. Other constrains were imposed for particular cases; they are described in the Results section.
3.3. Homology Modeling
The row sequence of the H2 CRD subunit was loaded in the swisspdbviewer [
19] together with the 3D structure of the optimized CRD of H1. The alignment was adjusted based on the ones obtained from the ClustalW web server [
20]. Then the model of H2 CRD was automatically built and refined by the swiss-model server [
19].
3.4. Rigid-Body Docking
In order to built plausible H1-H1 dimers, two optimized models of H1 CRD were subjected to the three different “rigid-body” docking programs Rosetta [
21], HEX [
22] and GRAMM [
23], so that different hypotheses for the binding interaction between two H1 CRD monomers could be obtained. For each program, at least two or three runs were performed changing the initial relative orientation of the two monomers. For each run, only the dimers showing orientation between monomers compatible with the available experimental evidence [
9] were selected by visual inspection and retained for further steps of the study. After validation (carried out according to what is reported in Section 2.2.1.), a H1-H1 dimer was selected and submitted to other rigid-body docking steps in order to add a CRD model coming from the H2 subunit so that a model for the H1-H1-H2 trimer could be obtained.
3.4.1. Rosetta
The Rosetta package/program was developed at the Baker laboratory of the University of Washington. Its use is free through the web. It works by simultaneous optimization of side-chain conformation and rigid body position of the two docking partners. The former task is performed by a “packing” algorithm, while the latter one is performed by a rigid-body Monte Carlo Minimization (MCM) strategy.
Prior to docking, the sidechains of the native protein are removed and replaced using the Rosetta sidechain packing algorithm to prevent errors in docking due to irregularities (e.g., crystal contacts) in the native protein.
The full atom run can take two forms, depending on one’s confidence in the native structure. Sometimes biochemical and genetic information can be used to localize the binding site to a small region on one or both partners. In this case, one performs a perturbation run, exploring only a small region of space around the suspected binding site. For predictions where there is no biological information about the interface, one usually performs a global search, exploring all the conformational space of both partners [
21].
3.4.2. HEX
The HEX program was developed at the Department of Computing Science, University of Aberdeen (UK). Its use is free through the web. In HEX’s docking calculations, each molecule is modeled by using 3D parametric functions which are exploited to encode surface shape, electrostatic charge and potential distributions. The parametric functions are based on expansions of real orthogonal spherical polar basis functions. Essentially, this approach allows each property to be represented by a vector of coefficients. HEX’s surface shape representation uses a novel 3D surface skin model of protein topology, whereas the electrostatic model is derived from classical electrostatic theory. By writing an expression for the overlap of pairs of parametric functions, it is possible to derive a corresponding expression for docking scores as a function of the six degrees of freedom in a rigid body docking search (three translational and three rotational freedom degrees).
With suitable scaling factors, the docking score so obtained can be interpreted as an interaction energy, which may be subjected to minimization. Due to the peculiar orthogonality property of the basis functions, the correlation between a pair of 3D functions (
i.e., the overlap expressed as a function of translation/rotation operations) can be computed by means of expressions which only involve the original expansion coefficients. In many respects, this approach is similar to conventional fast Fourier transform (FFT) docking methods based on the use of a Cartesian grid to perform the Fourier transforms. However, the FFT approach only accelerates a docking search in three (translational) degrees of freedom whereas with a spherical polar approach, it is possible to both translate (with some effort) and rotate (relatively easily) the coefficient vectors. Candidate docking orientations are so generated and interaction energies may be estimated in what is effectively a six dimensional Fourier correlation [
22].
3.4.3. GRAMM
This program works thanks to a geometry-based algorithm for predicting the structure of a possible complex between molecules of known structures, by performing an exhaustive six-dimensional search through the relative translations and rotations of the molecules. It can provide quantitative data related to the quality of the contact between the molecules. The intermolecular energy calculation relies on the well established correlation and Fourier transformation techniques exploited in the field of pattern recognition. The docking calculations performed by GRAMM enable predicting the structure of the complex formed between the two constituent molecules by using their atomic coordinates, without any prior information as to their binding sites [
23].
3.5. AutoDock
The obtained plausible 3D model for the H1-H1 dimer was subjected to a preliminary validation step by estimating its interaction energies with bi-antennary ligands, for which binding data toward the ASGP-R are known. That was achieved by performing flexibile ligand docking studies, by means of the AutoDock program [
24].
The starting location of each ligand was manually arranged by approaching the galactose ending moieties of the ligands to the Ca
2+ ions supposed to interact with the Ca2 site of a H1 subunit and the Ca1 site of the adjacent subunit in the H1-H1 dimer. The two Ca
2+ ions were named according to Meyer
et al. [
10]. In particular, the 1–6 branches are close to Ca1 and the 1–3 branches are close to Ca2, based on what was suggested by Lodish [
9] and Rice
et al. [
11]. The locations of the ligands were subsequently subjected to energy minimization by means of the cff91 forcefield implemented in the Discover program. During the simulations, all atoms of the H1-H1 dimer (whic included Ca
2+ ions) were fixed, while ligands were allowed to be completely free to move. Spatial restraints were added so that 3-OH and 4-OH atoms of the galactose ending moieties were restrained with respect to the two Ca
2+ ions within a range of 2–3 Å.
Auto Dock Tools, an accessory program that allows the user to interact with AutoDock from a GUI (Graphic User Interface), was used for preparation of the AutoDock input files. The polar hydrogens and united atom Kollman charges were assigned for the H1-H1 dimer during the preparation of the protein input file, containing fragmental volume and solvation parameters. For the preliminarly optimized ligands, partial atomic charges were determined by a modified Gasteiger method which ensures unit charge on each residue. Moreover, rotatable bonds in the ligands were assigned. Prior to the AutoDock, AutoGrid was carried out for the preparation of the grid map using a grid box with a number of points in xyz (npts) of 40-62-40 Å, which defines the simulation space. The box spacing was 0.375 Å and the grid was set in order to cover the entire space of binding site. A distance-dependent function of the dielectric constant was used for the calculation of the energetic maps.
A scoring grid was calculated from a reference ligand (the one labeled as Lod4 in
Figure 1), to minimize the computation time. Finally AutoDock was run using maximum number of energy evaluations retries and generations of 10000 and 27000, respectively. The Lamarckian genetic algorithm (LGA) with the pseudo-Solis and Wets modification (LGA/pSW) method was used with default parameters for calculation of the docking possibilities.
4. Conclusions
In the work presented here a homology model of the CRD of the H2 subunit of human ASGP-R was built based its high sequence similarity with the H1 subunit. Then, two H1 and one H2 subunits were added in a step-wise articulated protocol to build the minimal plausible oligomeric form of ASGP-R needed to bind tri-antennary and bi-antennary carbohydratic ligands, that are the most affine (in particular tri-antennary) for such a receptor.
In the first step, dimers H1-H1 were generated by using rigid-body docking programs. Some relevant hypotheses were investigated by comparison with experimental binding data reported in the literature. Furthermore, a docking study was performed using three bi-antennary ligands. It was shown that the docking energies of the ligands in the dimer were in optimal agreement with the experimental affinity data in the case of one of the hypothesized H1-H1 dimers.
Starting from such a validated hypothesis for the H1-H1 dimer, a second step of rigid-body docking was performed in order to add the H2 subunit and build the H1-H1-H2 trimer. Even in this second case, the generated hypotheses were investigated by comparison with experimental binding data reported in the literature. Moreover, potential “induced fit” phenomena were investigated on the best performing H1-H1-H2 trimer. Each of the seven ligands considered in this work were allowed to energy minimize in the starting hypothesis of the trimer.
Significant conformational changes induced on the trimeric 3D theoretical model by the ligands were revealed as we can expect for a proteic receptor responsible for endocitosis.
The conformational changes were evaluated in terms of changes on the monomer interfaces upon ligand binding. Among the superficial contributions evaluated after ligand binding, a good agreement between the Gap volume index and experimental data was observed (R2 = 0.73).
In conclusion, this work gives:
The knowledge at the molecular level of interactions between the human ASGP-R and carbohydrate ligands is expected to contribute to the progress in the field of hepatic tissue engineering. In this perspective, the selection of optimal scaffolds, made up of natural biodegradable carbohydrates, will enable successful growth of hepatic cells that are expected to positively interact with the scaffold through the ASGP-Rs located of cell surface.