Analysis of Procollagen C-Proteinase Enhancer-1/Glycosaminoglycan Binding Sites and of the Potential Role of Calcium Ions in the Interaction

In this study, we characterize the interactions between the extracellular matrix protein, procollagen C-proteinase enhancer-1 (PCPE-1), and glycosaminoglycans (GAGs), which are linear anionic periodic polysaccharides. We applied molecular modeling approaches to build a structural model of full-length PCPE-1, which is not experimentally available, to predict GAG binding poses for various GAG lengths, types and sulfation patterns, and to determine the effect of calcium ions on the binding. The computational data are analyzed and discussed in the context of the experimental results previously obtained using surface plasmon resonance binding assays. We also provide experimental data on PCPE-1/GAG interactions obtained using inhibition assays with GAG oligosaccharides ranging from disaccharides to octadecasaccharides. Our results predict the localization of GAG-binding sites at the amino acid residue level onto PCPE-1 and is the first attempt to describe the effects of ions on protein-GAG binding using modeling approaches. In addition, this study allows us to get deeper insights into the in silico methodology challenges and limitations when applied to GAG-protein interactions.


Introduction
Glycosaminoglycans (GAGs) are anionic periodic linear polysaccharides, which are composed of periodic disaccharide units [1] and play a key role in many biologically relevant processes by interacting with their numerous and diverse protein targets such as cytokines and growth factors in the extracellular matrix [2][3][4][5]. However, the molecular mechanisms underlying GAG-mediated interactions are not fully understood, and experimental techniques alone are not sufficient for gaining Then we applied the in silico approaches we have previously developed to analyze the binding of PCPE-1 to GAGs at the atomic level and to determine if these interactions were exclusively electrostatic-driven or if other factors modulate the binding strength.

Modeling the Full Structure of PCPE-1
We created two ensembles of full-length PCPE-1 structures using the UNRES (from UNited RESidue) coarse-grained (CG) approach to determine the structure of the linker located between the CUB1-CUB2 and the NTR domains. In the first one, the structures of the linkers were optimized, and the domain structures were restrained, while in the second one, SAXS derived restraints were used additionally in order to reproduce the experimental data [26] (see Section 3.4 for more details). Five most probable structural models were obtained for both ensembles. For HP binding analysis we used the first three models obtained without SAXS restraints and one model obtained with SAXS restraints (SAXS Model) ( Table 1). The radii of gyration of the models obtained without SAXS restraints were significantly lower than those of the elongated structures restrained using SAXS data. As expected, the SAXS Model had a radius of gyration in agreement with the experimental value calculated using SAXS (41 ± 3 Å versus 43 ± 1 Å [26]. The obtained model was also consistent with the length of the protein determined experimentally (150 Å). Poisson-Boltzmann surface area (PBSA) calculations applied to these 4 models suggest that potential binding regions for Then we applied the in silico approaches we have previously developed to analyze the binding of PCPE-1 to GAGs at the atomic level and to determine if these interactions were exclusively electrostatic-driven or if other factors modulate the binding strength.

Modeling the Full Structure of PCPE-1
We created two ensembles of full-length PCPE-1 structures using the UNRES (from UNited RESidue) coarse-grained (CG) approach to determine the structure of the linker located between the CUB1-CUB2 and the NTR domains. In the first one, the structures of the linkers were optimized, and the domain structures were restrained, while in the second one, SAXS derived restraints were used additionally in order to reproduce the experimental data [26] (see Section 3.4 for more details). Five most probable structural models were obtained for both ensembles. For HP binding analysis we used the first three models obtained without SAXS restraints and one model obtained with SAXS restraints (SAXS Model) ( Table 1). The radii of gyration of the models obtained without SAXS restraints were significantly lower than those of the elongated structures restrained using SAXS data. As expected, the SAXS Model had a radius of gyration in agreement with the experimental value calculated using SAXS (41 ± 3 Å versus 43 ± 1 Å [26]. The obtained model was also consistent with the length of the protein determined experimentally (150 Å). Poisson-Boltzmann surface area (PBSA) calculations applied to these 4 models suggest that potential binding regions for HP were located in the NTR domain for Model

PCPE-1 Interactions with Glycosaminoglycans
We modeled and analyzed the binding of PCPE-1 and its domains, NTR and CUB1-CUB2, with the following GAGs: chondroitin sulfate-6 (CS6) made of two GalNAc6S-GlcA or three disaccharide units (dp4 and dp6, respectively), dermatan sulfate comprised of three GalNAc6S-IdoA disaccharide units (dp6), and heparin (HP) made of one, two and three GlcNS6S-IdoA2S disaccharide units (dp2, dp4, and dp6 respectively). These GAGs were selected for the following reasons: to compare the in-silico data with the experimental ones previously obtained with these GAGs [28] and to investigate the effects of epimerization, length and

PCPE-1 Interactions with Glycosaminoglycans
We modeled and analyzed the binding of PCPE-1 and its domains, NTR and CUB1-CUB2, with the following GAGs: chondroitin sulfate-6 (CS6) made of two GalNAc6S-GlcA or three disaccharide units (dp4 and dp6, respectively), dermatan sulfate comprised of three GalNAc6S-IdoA disaccharide units (dp6), and heparin (HP) made of one, two and three GlcNS6S-IdoA2S disaccharide units (dp2, dp4, and dp6 respectively). These GAGs were selected for the following reasons: to compare the in-silico data with the experimental ones previously obtained with these GAGs [28] and to investigate the effects of epimerization, length and sulfation pattern of GAGs on binding. Conventional docking approaches are severely limited in terms of the size of GAGs and can be effectively used only for the GAGs with a length up to dp6 [31]. Therefore, we used HP oligosaccharides of different lengths, from dp2 to dp6, to determine the effect of the GAG length on the binding to PCPE-1. Since HP is the strongest binder, the results obtained with HP oligosaccharides of different lengths should be the most representative. Furthermore, the GAGs studied here were selected in order to systematically evaluate the changes in binding to PCPE-1 according to the GAG length (dp4-dp6 for CS6 and dp2-dp6 for HP), the epimerization of glucuronic acid (CS6 dp6 and DS dp6), the increase in the number and position of sulfated groups (i.e., the sulfation pattern) and the net charge of the oligosaccharides (CS6, DS and HP).
Several clusters of docking solutions were obtained for each GAG tested. The polarity of the binding poses was analyzed because the orientation of the GAG chain was shown to be non-random for the IL-8 chemokine [7] and determinant for the binding specificity of the C-X-C motif chemokine ligand 14 [8], suggesting an important functional role of GAG polarity in their interactions with proteins. Then, for the most diverse binding poses within these clusters, molecular dynamics (MD) simulations were performed with binding free energy post-processing calculations and per residue binding free energy decomposition. We would like to emphasize that choosing a proper procedure of pose selection for such analysis is very challenging, since it is unclear how many clusters and solutions within each cluster should be representative, which part of the trajectory should be analyzed in terms of the free energy, if only the best scored pose from a cluster or all the poses should be taken into account for the further calculations, and how to weight their contributions in the latter case. The answers to these questions are dependent on the molecular systems and on the particular goal of the modeling study. These methodology-oriented aspects of protein-GAG modeling will be further discussed below.

The NTR-Domain
Among the found clusters of docking solutions, for CS6 dp4 and HP dp2, dp4 and dp6, one major cluster was observed, while there was more uniform distribution of the solutions between several clusters for CS6 dp6 and DS dp6 ( Table 2). This suggests that for those molecules, especially for CS6 where the clusters are especially diverse, multipose binding might be quite probable. GAG multipose binding was previously identified both experimentally and computationally for TIMP-3, which is homologous with the NTR domain [11]. Most solutions were localized near the C-terminal α-helix of the NTR domain except for CS6 dp6 (Figures 3 and 4). The size of the clusters obtained by molecular docking was not correlated with their corresponding free binding energies calculated from the MD simulation. This means that molecular docking alone was not able to properly score the solutions, although the Autodock 3 (AD3) scoring function is one of the most successful scoring schemes when applied to GAG complexes [10]. Similarities of the binding regions for the docking solutions post-processed by MD-based binding free energy decomposition per residue are reflected in Tables 3 and 4 for the obtained clusters and for each GAG ligand respectively. According to the binding free energy values obtained for the NTR domain bound to GAGs compared to the experimental complexes from the PDB [31] and given that no dissociation of these complexes was observed, we assume that the binding of the analyzed GAGs to NTR is stable. The binding strength, evaluated by the calculation of free binding energy, of CS6 dp4 and CS6 dp6 did not significantly differ, but the cluster location of CS6 dp6 differed from those of CS dp4, DS dp6 and HP dp2, dp4 and dp6. Only the third biggest cluster for CS6 dp6 was located in the region overlapping with those of other analyzed GAGs. CS6 dp6 and longer CS6 oligosaccharides might thus bind NTR differently from CS6 dp4 and other GAGs. Therefore, although the binding strength was similar for CS6 and DS, their preferred binding sites were distinct for these two GAGs, which differ only in the epimerization of glucuronic acid. This could potentially explain the results from surface plasmon resonance binding assays, which showed that CS6 did not inhibit PCPE-1 binding to HP whereas DS did [28]. Whereas DS competes with HP for the same binding site on PCPE-1, CS6 binds to a different region, which would allow HP oligosaccharides to remain bound to the NTR domain. Similar computational approaches were successfully applied to demonstrate the experimentally proven differences in binding strength between DS and CS6 interacting via the same binding pose to IL-8 [7,32] In contrast, the binding differences for those GAGs were related to certain differences in the binding pose for CXCL14 [8]. This suggests that for protein-GAG complexes, the predictive power of the computational methods is dramatically dependent on the protein involved and the distribution of the clusters on its surface, which is, in turn, also sensitive to a particular clustering procedure. HP binds the NTR domain stronger than CS6 and DS, while its increase in length stabilizes the interaction suggesting a key role of electrostatic interactions, although few hydrophobic amino acid residues (leucine and valine) and polar, uncharged, amino acid residues (asparagine and glutamine) were predicted to interact with the analyzed GAGs ( Figure 4). Most clusters revealed a bias towards specific polarity of GAG binding poses, although this trend was less pronounced for HP dp6 and DS dp6 ( Table 2). This suggests that our docking approach is able to distinguish GAG polarity, which is an important methodological finding and will allow us to investigate one of the potential parameters underlying the specificity of protein-GAG interactions [8].  shown in blue, red, yellow and green (from the most to the less populated cluster); the top 10 residues binding to GAGs according to MM-GBSA calculations averaged per GAG are highlighted in red surface. Note that the clusters for CS6 dp6 are shown for a different protein spatial orientation to allow for a better visualization. In addition, averaging the per-residue energy for very different clusters could be misleading as shown for CS6 dp6: the residues shown in red do not overlap with the surface patches where the most representative clusters of solutions are located. Note that the clusters for CS6 dp6 are shown for a different protein spatial orientation to allow for a better visualization. In addition, averaging the per-residue energy for very different clusters could be misleading as shown for CS6 dp6: the residues shown in red do not overlap with the surface patches where the most representative clusters of solutions are located.

CUB1-CUB2 Domains
Although there is no experimental evidence suggesting that CUB1-CUB2 domains of PCPE-1 directly interact with GAGs, the differences in binding of NTR and full-length PCPE-1 to HP and HS [28] indicate that CUB1-CUB2 domains could affect GAG binding to the full-length PCPE-1 protein. Therefore, we analyzed the potential binding of these domains to GAGs using the same procedure as above.
All the predicted binding poses were either located in the cleft region between the CUB domains or bridged both CUB domains (Supplementary Figures S2 and S3). In both cases, such potential binding would lead to restricted movements of the CUB domains relative to each other, which, in turn, would affect the overall flexibility of PCPE-1 and its ability to recognize and to bind its partners. Calculated GAG free binding energies were essentially less favorable than those calculated for the NTR domain (Supplementary Table S1), which is consistent with NTR being responsible for GAG binding in PCPE-1. No binding poses of the analyzed GAGs or the structures that can be obtained from them by GAG chain elongation were found to be in close proximity to the Ca 2+ binding sites or at the interface with procollagen peptides [23]. In a number of cases, the binding poses predicted by molecular docking were unstable (∆G higher than −15 kcal/mol), and the GAG dissociated from the protein. Such behavior was typically observed for HP oligosaccharides and is explained by the repulsion of these highly charged molecules by the negatively charged residues of the CUB1-CUB2 domain. Poses corresponding to the binding of CS6 and DS, which are less negatively charged than HP, were globally more stable. However, some binding poses were very stable and comparable with those found in the NTR domain (e.g., cluster 2, solution 2 for CS6 dp4). In such cases, bound GAGs protruded deeply into the cleft between the CUB1 and CUB2 domains forming strong van der Waals interactions in addition to the electrostatic interactions, which are believed to be the driving force in the formation of protein-GAG complexes [31,34]. As reported for the NTR domain, highly significant differences in free energy were found for GAGs within and beyond the same clusters. One major cluster was found for CS6 of various length in contrast to what was observed for other GAG analyzed. No correlation was found between the size of clusters and their free binding energies. The comparison between the observed clusters and the data averaged for different GAGs in terms of the most important protein binding residues showed high similarities for all GAGs, suggesting weak and rather unspecific binding to CUB1-CUB2 domains (Supplementary Figure S2, Supplementary Tables S2 and S3). Interestingly, for CS6 dp4 the differences between the clusters were more prominent than the differences of these clusters with those obtained for other GAGs. The increase in length of HP from dp2 to dp6 did not modify the potential interaction pattern with the CUB1-CUB2 domains. All clusters revealed strong polarity preferences except for the DS clusters.

Full PCPE-1
GAG binding was characterized with full-length PCPE-1 models obtained using UNRES CG simulations and HP dp6 as a ligand. Binding to Model 1, which was the most probable model among the ones obtained without the SAXS-based restraints, was significantly stronger than to Models 2, 3 and the SAXS Model (Table 5, Supplementary Table S4), as well as to the NTR domain (t-test, p-value < 0.05). Binding to Model 3 was also significantly stronger than to Model 2 and to the NTR domain. All clusters of HP dp6 solutions obtained for Model 1 were located in the region formed by the same residues of the NTR domain, the linker and the CUB1-CUB2 domains. For Model 2, the first cluster was located differently from the second and the third clusters. All the clusters correspond to the residues belonging predominantly to the NTR domain and the linker, but also partially to the CUB1-CUB2 domains. For Model 3, only NTR residues contributed to the binding of HP dp6. Cluster 1 was the most representative for Model 3 according to the molecular docking results, although not the most favorable according to MM-GBSA calculations, which again points to the essential differences in molecular docking and MD-based scoring. In the SAXS Model, GAG binding occurred at the NTR/linker interface. In all cases, the clusters were located in PCPE-1 patches corresponding to the positive electrostatic potential shown in Figure 2 and Figure S1.  4 free energy of binding obtained by MM-GBSA; 5 residues identified in the top 10 for binding according to MM-GBSA calculations per cluster ordered by the impact (starting from the most favorable one). 6 The polarity of a GAG binding pose was defined as its preferred orientation in relation to the reducing and non-reducing end.

Prediction of Ca 2+ Binding Sites
According to the experimental data, the interactions between both full-length PCPE-1 and the NTR domain with HP and HS are cation-dependent [28]. Therefore, we attempted to analyze the impact of Ca 2+ ions on HP binding in silico, which allowed us to evaluate the available computational tools in terms of sensitivity and prediction power to account for divalent ions in such calculations. As a first step, we applied three different approaches (see Section 3.7 for details) to annexin V protein, which has 9 experimentally identified occupied Ca 2+ binding sites, some of which are occupied upon HP binding [35]. The IonCom server predicted correctly eight out of nine experimentally known binding sites, while FoldX and MD approaches correctly predicted six binding sites (Table 6). Furthermore, we performed MM-GBSA calculations to estimate if the strength of the Ca 2+ binding in these experimentally known binding sites correlated with the predictions ( Table 7). As shown in the table, the total energies of interactions were positive despite the fact that all the ions were stable during the entire MD simulation performed in explicit solvent. This reflects the fact that the implicit continuous solvent model in MM-GBSA fails to properly account for the strength of binding for these divalent ions in terms of the full binding free energy. At the same time, in vacuo electrostatic energy was highly negative and could be meaningful for comparing binding sites since the studied interactions were electrostatically driven. A t-test performed for the in vacuo electrostatic energy values did not point out any statistical differences between the sites, which were properly predicted and the ones which the MD-based approach failed to predict. We applied three ion-binding site prediction methods to the NTR and the CUB1-CUB2 domains of PCPE-1 (Table 6). Neither FoldX nor IonCom found any Ca 2+ binding site for the NTR domain, while the MD approach identified from one to three binding sites, one of which being consistent through all five repetitions of MD simulations. The fact that these methods did not agree with MD simulations could be due to conformational changes of negatively charged amino acid side chains during the MD simulation, allowing them to come close to each other and to coordinate calcium ions. FoldX and IonCom used static structures, which prevents the dynamics required for the coordination of Ca 2+ . For CUB1-CUB2 domains, all methods were consistent and predicted two Ca 2+ binding sites identical to those found in the CUB1-CUB2 domain complexed with procollagen (PDB ID: 6FZV). This means that CUB1-CUB2 domains in PCPE-1 could be already prebound to Ca 2+ ions when the interaction with the procollagen is established. Furthermore, we compared the predicted Ca 2+ binding sites for PCPE-1 domains in terms of electrostatic energies obtained from MM-GBSA calculations with the corresponding energies for annexin V in order to estimate their strength (Table 8). CUB1-CUB2 binding sites were energetically comparable with those of annexin V. The Ca 2+ binding site in CUB2 was stronger than in CUB1 as suggested by more favorable electrostatic energies and by the fact that the Ca 2+ binding site in CUB2 was identified by MD in all five MD replicas, while the Ca 2+ binding site of CUB1 was correctly identified in three MD simulations. The Ca 2+ binding sites predicted for the NTR domain were significantly weaker, and only one of them was found in all MD replicas. This suggests that this site may be unoccupied when the NTR domain is in solution and not bound to a GAG. The experimental evidence that the NTR domain binding to HP is dependent on divalent cations [28] leads to the hypothesis that Ca 2+ ions could potentially bind within the interface of the NTR-HP complex. GAG binding. For this, we used two Ca 2+ binding sites corresponding to the X-ray structure (PDB ID: 6FZV) of the CUB1-CUB2 domain, and the weak Ca 2+ binding site predicted in the NTR domain ( Figure 5). Major differences in both positive and negative electrostatic potential shape were observed for the NTR domain. This is not only explained by the direct effect of Ca 2+ positive charge but also by the fact that E405, E406 and N407 were moved closer to each other to coordinate the cation, which also affects the topology of the isosurface. For CUB1-CUB2 domains, the largest positively charged patch of the potential isosurface was not noticeably affected by the presence Ca 2+ ions. To sum up, the predicted 3 Ca 2+ binding sites for both PCPE-1 domains, when occupied, could potentially affect GAG binding. This potential effect is analyzed below.

PCPE-1 Interactions with Glycosaminoglycans in the Presence of Ca 2+ Ions
In order to analyze the potential impact of Ca 2+ ions on PCPE-1-GAG interactions, HP dp2, dp4 and dp6 were docked onto NTR, CUB1-CUB2 and the full-length protein in the presence of Ca 2+ ions, followed by MD-based analysis. Two Ca 2+ ions were prebound to CUB1-CUB2 and one to the NTR domain. Despite the differences in electrostatic properties of these domains described above, no significant changes in docking results or binding free energies were observed when compared with the domains without prebound Ca 2+ (Supplementary Figures S4, S5 and Supplementary Tables S5, S6). In both cases, the binding occurred in the regions distant from the Ca 2+ ion binding sites. Only one cluster (HP dp6, cluster 3

PCPE-1 Interactions with Glycosaminoglycans in the Presence of Ca 2+ Ions
In order to analyze the potential impact of Ca 2+ ions on PCPE-1-GAG interactions, HP dp2, dp4 and dp6 were docked onto NTR, CUB1-CUB2 and the full-length protein in the presence of Ca 2+ ions, followed by MD-based analysis. Two Ca 2+ ions were prebound to CUB1-CUB2 and one to the NTR domain. Despite the differences in electrostatic properties of these domains described above, no significant changes in docking results or binding free energies were observed when compared with the domains without prebound Ca 2+ (Supplementary Figures S4 and S5 and Supplementary Tables S5  and S6). In both cases, the binding occurred in the regions distant from the Ca 2+ ion binding sites. Only one cluster (HP dp6, cluster 3 in the NTR domain) was found to be close to the Ca 2+ ion. In the corresponding binding poses the ion was coordinated by a sulfate group from the terminal GlcNS(6S) residue. However, the binding poses from this cluster were significantly less favorable than those located distantly from the Ca 2+ ion.
We also analyzed the impact of Ca 2+ ions on the GAG binding to the full-length PCPE-1 (Table 9, Supplementary Table S7, Figure 6, Supplementary Figure S6). For Models 1 and 2 and the SAXS Model, we did not observe any effect of Ca 2+ ions on the location of the structural clusters nor the direct participation of the Ca 2+ ions in binding HP dp6. For Model 3, there was a significant difference between the clusters observed in the absence and in the presence of Ca 2+ ions (e.g., cluster 2) related to the essential changes of the electrostatic potential on the protein surface in the presence of Ca 2+ . In terms of the free energies of binding, the presence of Ca 2+ ions did not significantly affect binding to Models 1 or 2 or the SAXS Model. For cluster 2 of Model 3, which was relocated in the presence of Ca 2+ ions, the free energy of binding became less favorable than in the absence of Ca 2+ ions.  4 free energy of binding obtained by MM-GBSA; 5 residues identified in the top 10 for binding according to MM-GBSA calculations per cluster ordered by the impact (starting from the most favorable one). 6 The polarity of a GAG binding pose was defined as its preferred orientation in relation to the reducing and non-reducing end. We calculated the binding poses of long (dp11) HP chains on full-length PCPE-1 in the absence and the presence of three Ca 2+ ions, two bound to CUB1-CUB2 and one to the NTR domain (Table 10, Table S8). HP dp11 was the longest GAG that we To summarize our attempts to determine the impact of Ca 2+ on GAG binding to PCPE-1, our approach to consider Ca 2+ ions as a part of the protein did not detect any favorable effect of these divalent ions on GAG binding in contrast to the experimental data [28]. Therefore, we hypothesize that Ca 2+ ions might bind to GAGs rather than to PCPE-1 prior to the complex formation. The binding of divalent ions to GAGs has been experimentally reported for Zn 2+ , Mn 2+ , Cu 2+ , Ca 2+ , Co 2+ , Na + , Mg 2+ , Fe 3+ , Ni 2+ , Al 3+ and Sr 2+ ions [36,37]. The crucial role of cations in protein-GAG interactions was experimentally shown for amyloid precursor protein [38], HP cofactor II [39], endostatin [40,41], FGF1 and IL-7 [42]. Experimental data obtained using mass spectrometry [43], NMR [44], gel-filtration chromatography [45,46] and infrared spectroscopy [47] indicate that GAGs interact with divalent ions, and that these interactions affect GAG structure and conformational properties. Divalent ions will be integrated in GAG structure in our future work on protein-GAG complexes. Another potential role of Ca 2+ ions could be to stabilize PCPE-1 structure, which would affect its interactions with GAGs.

Predicting Longer GAG Binding Poses Using the Fragment-Based Approach
We calculated the binding poses of long (dp11) HP chains on full-length PCPE-1 in the absence and the presence of three Ca 2+ ions, two bound to CUB1-CUB2 and one to the NTR domain (Table 10,  Table S8). HP dp11 was the longest GAG that we managed to assemble in these docking experiments, and it was used to model the scenario when the GAGs longer than dp6 are bound to the protein.
The increase in length of the HP chain from dp6 to dp11 significantly stabilized (p-value < 0.05) the interactions with PCPE-1 Models 1 and 2. The addition of Ca 2+ strengthened the interactions of HP dp11 with PCPE-1 for Models 1, 2 and the SAXS Model but not for Model 3. Similarly to what was observed for HP dp6, Model 1 was the strongest in terms of HP dp11 binding. Ca 2+ ions did not affect the binding sites of HP dp6 on the surface of Models 1, 2 or the SAXS Model but changed binding to Model 3, as described for HP dp6 (Figure 7, Supplementary Figure S7). In all the cases, docked HP dp6 structures overlapped very well with those of HP dp11. It could be concluded that docking HP dp6 defines the core binding unit of a HP chain. The increase in binding affinity with the increase of HP length agrees with the experimental trend observed in this study (Figure 1).

Surface Plasmon Resonance (SPR) Binding Assays
The SPR measurements were performed on a BIAcore 3000 instruments (GE Healthcare, Uppsala, Sweden), and the data were analtzed with the BIAevaluation 3.1 Software (GE Healthcare, Uppsala, Sweden) as previously described [28]. Inhibition assays of PCPE-1 binding to HP and HS by HP oligosaccharides (from dp2 up to dp8, generous gift of Rabia Sadir and Hugues Lortat-Jacob, Institut de Biologie Structurale, Grenoble, France) were carried out as previously described [28]. Briefly, HP (Sigma, St Quentin Fallavier, France) and HS (Celsus Laboratories Inc, Cincinnati, OH, USA) from porcine intestinal mucosa were biotinylated and captured on streptavidin previously immobilized on a CM4 sensor chip (GE Healthcare, Uppsala, Sweden). Human recombinant PCPE-1 (1 µM) [28] was incubated with HP oligosaccharides (5 µg/mL) in 10 mM Hepes pH 7.5 + NaCl 0.15 M + P20 0.005% (HBS) + 5 mM CaCl2 for one hour before injection over immobilized HP (39 RUs) and HS (113 RUs) at a flow rate of 30 µL/min for 4 min. The percentages of inhibition were calculated relative to PCPE-1 binding level incubated in the same conditions with HBS + 5 mM CaCl2. The running buffer was HBS and the temperature was set at 25 °C.

Surface Plasmon Resonance (SPR) Binding Assays
The SPR measurements were performed on a BIAcore 3000 instruments (GE Healthcare, Uppsala, Sweden), and the data were analtzed with the BIAevaluation 3.1 Software (GE Healthcare, Uppsala, Sweden) as previously described [28]. Inhibition assays of PCPE-1 binding to HP and HS by HP oligosaccharides (from dp2 up to dp8, generous gift of Rabia Sadir and Hugues Lortat-Jacob, Institut de Biologie Structurale, Grenoble, France) were carried out as previously described [28]. Briefly, HP (Sigma, St Quentin Fallavier, France) and HS (Celsus Laboratories Inc, Cincinnati, OH, USA) from porcine intestinal mucosa were biotinylated and captured on streptavidin previously immobilized on a CM4 sensor chip (GE Healthcare, Uppsala, Sweden). Human recombinant PCPE-1 (1 µM) [28] was incubated with HP oligosaccharides (5 µg/mL) in 10 mM Hepes pH 7.5 + NaCl 0.15 M + P20 0.005% (HBS) + 5 mM CaCl 2 for one hour before injection over immobilized HP (39 RUs) and HS (113 RUs) at a flow rate of 30 µL/min for 4 min. The percentages of inhibition were calculated relative to PCPE-1 binding level incubated in the same conditions with HBS + 5 mM CaCl 2 . The running buffer was HBS and the temperature was set at 25 • C.

Protein Structures
The structure of the N-terminal CUB1-CUB2 domains was obtained from the X-ray structure of CUB1-CUB2 fragment of PCPE-1 bound to the C-propeptide trimer of procollagen III (PDB ID: 6FVZ, 2.7 Å). In this structure, the residues 33-275 are resolved (here and further, the numeration of the sequence corresponds to the UniProtKB ID: Q15113). However, the structure of the linker (151-157) between two CUB domains was not determined due to its flexibility [22]. The structure of the NTR domain (313-442) was also obtained from the PDB (PDB ID: 1UAP, 1st NMR model) [23]. The structures of the linker between two CUB domains (151-157) as well as the interdomain linker (276-312) were built in xLeap module of AMBER16, refined using an MD approach and then used for modeling the full-length structure of PCPE-1 with a CG MD approach as described in the Section 3.4.

Electrostatic Potential Calculations
The PBSA approach as implemented in AmberTools within AMBER16 package [50] was used to calculate electrostatic potential isosurfaces corresponding to the analyzed proteins. This method proved to be successful for GAG binding site prediction on an extensive protein-GAG dataset from the PDB [31]. The default value for the grid spacing of 1.0 Å and ff99SB force field parameters were used. The electrostatic potential isosurfaces were analyzed in VMD [51]. For visualization, such values of positive and negative electrostatic potential were chosen for each molecular system so that the data were as informative as possible for GAG binding site propensities.

Coarse-Grained MD Simulations
In order to calculate the conformations of the PCPE-1 linkers (151-157 and 276-312 sequences) to obtain a model of the full-length protein, we applied a CG multiplexed replica exchange molecular dynamics (MREMD) [52,53] approach as implemented in UNRES (United Residues) [54,55], as previously described. The protocol was similar to that used in our previous work [12]. Distance restraints were imposed on the domains during the MREMD simulations. Additionally, in one of the simulations, SAXS-derived restraints [22] were imposed [56]. Each MREMD simulation consisted of 20 trajectories run at temperatures from 265 K to 370 K. Each trajectory consisted of 3.5 × 10 7 MD steps with 4.89 fs length for simulations with the restraints on the domains only and 7.1 × 10 4 steps for simulations with additional SAXS-derived restraints. The lower number of steps for simulations with information from the SAXS experiment was used because the radius of gyration of maintained structures was obtained already after that time. Only conformations from the last quarter of the simulation were taken into further analysis with the use of the weighted histogram analysis method (WHAM) [56]. The next step was minimum variance cluster analysis [57] of the conformational ensemble at T = 300 K, which enabled us to obtain five clusters, ranked according to summary probabilities of the ensembles and containing the most probable structures to the cluster with the least probable structures. For each cluster, one representative structure, closest to the cluster centroid, was selected as the representative conformation. The last step was the conversion of CG structures into all-atom ones using the PULCHRA [58] and SCWRL [59] algorithms.

Autodock 3
The docking simulations of GAG ligands to the PCPE-1 were performed with Autodock 3 (AD3) [60], which was previously shown to yield the best performance among docking programs for GAG ligands [10,31]. The blind docking procedure was used: the whole protein surface was available for the ligand when sampling a potential binding site. For all proteins, we used 127 × 127 × 127 grid points for AD3 runs. However, because of the differences in protein size, the grid step was different for different proteins to contain the whole protein molecule within a single grid box: the default value of 0.375 Å was used for the NTR domain, while 0.5 Å was used for CUB1-CUB2 domains, and the grid step of 0.6-1.0 Å was used for different models of the full-length PCPE-1 protein due to their essentially bigger sizes. All GAG ligands were docked to the protein with the use of the Lamarckian genetic algorithm. The initial population size was 300, 10 5 generations, 9995·10 5 energy evaluations and 1000 independent runs with up to 33 torsional angle degrees of freedom were carried out. 1000 docked structures for each molecular system (protein-GAG pair) were obtained and further analyzed; the 50 top-scored ones (according to AD3 scoring function) were chosen for further clustering with the DBSCAN algorithm [33]. The clustering parameters, neighborhood search radii and minimal numbers of cluster members were manually selected for each system individually in order to yield 2-4 representative clusters. The distance metric used for clustering, which is defined as the root-mean-square of atomic distances for the nearest atoms of the same type, takes into account the periodic nature of GAGs, which is more appropriate for those ligands than the classical root-mean-square deviation (RMSD) [61]. Within each cluster, those poses which were different from one another to be categorized into subgroups were selected for further analysis. Such a procedure was used to account for the multiple pose binding previously observed for GAG ligands [11,62]. The GAG glycosidic linkages from the obtained docking poses were visually filtered in order to avoid incorrect geometries that could be produced by AD3 [63].

Fragment-Based Approach
In order to dock longer HP to the full-length PCPE-1 protein, which is unfeasible for the AD3 protocol described above due to the limitation of the available number of the degrees of freedom and, in general, because of the computationally very expensive conformational sampling required for simulations with such ligands, a fragment-based docking approach we recently developed was applied [64]. In brief, first, HP dp3 of both types, IdoA(2S)-GlcNS(6S)-IdoA(2S) and GlcNS(6S)-IdoA(2S)-GlcNS(6S), were docked using the AD3 docking procedure described in Section 3.5.1. Then, all 1000 solutions for each HP dp3 fragments were used to assemble~10 5 dp11 GAG chains applying the approach standard parameters [64]. This was followed by the refinement and all-atom conversion of the chains with a slight modification in the original scripts to avoid the RMSD-based selection of the best fitting structures compared to the experimental ones due to the lack of proper atomistic experimental PCPE-1/GAG complex structures. In particular, instead of the previously described way of selecting structures, a simple RMSD-based clustering (cutoff 4 Å) was performed to filter out the duplicates and find the most relevant structures. From the resulted~5-40 atomistic structures the ones with significantly different docking poses were selected and refined together with the full protein using MD simulations applying the same procedure as described in Section 3.6, except for the minimization procedure where at the first step of 10 4 -10 5 steepest descent minimization cycles were applied before the conjugate gradient minimization step.

All-Atom MD Simulations and MM-GBSA Free Energy Calculations
All-atom molecular dynamics (MD) simulations of the PCPE-1, PCPE-1/Ca 2+ and PCPE-1/Ca 2+ /GAG complexes obtained by molecular docking were performed with the use of the AMBER16 MD package [50]. Periodic boundary conditions in a truncated octahedron TIP3P water box with at least 8 Å distance from the solute to the periodic box border were used. Na + and Cl − monovalent counterions were used to neutralize the system. ff99SB force field parameters for protein [65] and the GLYCAM06 [66] for GAGs were used, respectively. Prior to MD production runs, two energy-minimization steps were performed: first, 500 steepest descent cycles and 1000 conjugate-gradient cycles with harmonic force restraints on solute (10 kcal/mol/Å 2 ), then, 3000 steepest-descent cycles and 3000 conjugate-gradient cycles without restraints. After the minimization, the system was heated up to 300 K for 10 ps with harmonic force restraints on solute (10 kcal/mol/Å 2 ), equilibrated for 100 ps at 300 K and 10 5 Pa in isothermal isobaric ensemble (NTP). This was followed by a 100 ns MD production run in the same NTP ensemble. The SHAKE algorithm, 2 fs time integration, 8 Å cutoff for non-bonded interactions, and the particle mesh Ewald method were used. The trajectories were analyzed using the cpptraj module of AMBER Tools [46]. Free-energy calculations and per-residue energy decomposition were done using molecular mechanics-generalized born surface area (MM-GBSA) model igb = 2 [67] for protein-GAG and protein-Ca 2+ complexes for the parts of the trajectory where convergence in terms of RMSD was obtained. The obtained energy values account explicitly for the enthalpy and implicitly for the solvent entropy. For this reason, the reported energies should not be strictly interpreted as full free energy of binding: the entropic contribution to binding was not taken into account explicitly. Entropy calculations were shown to dramatically increase the overall noise in the free binding energies when used within MM-GBSA free energy calculation schemes in general [68] and for GAG containing systems particularly [69].

Ca 2+ Ion Position Prediction
We applied and compared several approaches to predict the Ca 2+ binding sites on the surface of PCPE-1 and its domains. Prior to applying these approaches to PCPE-1, we analyzed their performance for the complex between an annexin V and HP, where Ca 2+ ions are known to be stable and to contribute to GAG binding (PDB ID: 1G5N, 2.7 Å) [35]. In this structure, 9 Ca 2+ were resolved.

Molecular Dynamics Approach
We used the MD approach with the protocols described in Section 3.6 to predict the binding sites of Ca 2+ on the surface of protein. The length of each MD simulation run was 100 ns. In these calculations, Ca 2+ ions were placed randomly in the periodic box. For annexin V, 9 Ca 2+ were used corresponding to the number of Ca 2+ ions observed in the experimental structure. For CUB1-CUB2, NTR domains, three ions were used. The number of ions was chosen in order to sample effectively the protein surface within a reasonable simulation time. For the CUB1-CUB2 and the NTR domains, the simulations were repeated 5 times. The trajectories were analyzed, and the for the frames where the Ca 2+ ions were stably bound in terms of RMSD convergence for the coordination complex, MM-GBSA free energy calculations were performed. The obtained values for the predicted Ca 2+ binding sites were compared with the corresponding energies obtained from the simulations of annexin V-Ca 2+ and CUB1-CUB2-Ca 2+ crystal structures. The latter was extracted from the structure of the complex of CUB1-CUB2 with procollagen peptide (PDB ID: 6FVZ, 2.7 Å).

FoldX and IonCom
We used the scripts of FoldX available at http://foldx.embl.de and online ion ligand binding site prediction tool IonCom at https://zhanglab.ccmb.med.umich.edu/IonCom for Ca 2+ binding site predictions. FoldX represents a tool with an implemented empirical force field developed for effective evaluation of the contribution of mutations on the stability, folding and dynamics of proteins and nucleic acids [70,71]. As an output FoldX yields the positions of predicted ions on the surface of the protein. In contrast, IonCom utilizes ab initio training and template-based information to output a list of protein residues potentially involved in ion binding [72]. Both programs were used with the default parameters.

Visualization and Data Analysis
VMD [51], Chimera [73] and Pymol [74] were used for structural analysis visualization, MD trajectory analysis, as well as for the graphics production. R package was used for data analysis [75].

Conclusions
We report here the computational analysis of the interactions of full-length PCPE-1 and its domains with GAGs of various lengths and sequences. This model of full-length PCPE-1 based on SAXS restraints is in agreement with the experimental values of its radius of gyration and length. The full-length protein binds GAGs through the NTR domain and the interdomain linker, while the binding to CUB1-CUB2 domains is weaker, likely non-specific, and less energetically favorable. GAG preferential binding to the NTR domain is mostly electrostatically-driven. CS6 is predicted to bind to a different site of the NTR domain than the other GAGs, which may account for the experimental differences previously observed between CS6 and DS/HP [28]. Fragment-based docking of longer GAG oligosaccharides results in overlap with the docking poses obtained for shorter GAGs and corresponds to more favorable interactions than those established by shorter oligosaccharides in agreement with the strongest inhibition of PCPE-1-HP interactions by longer HP oligosaccharides reported in this study. Our results suggest that calcium ions may bind to GAGs before they interact with PCPE-1 or may stabilize the structure and conformation of full-length PCPE-1. Although we have used several computational approaches to predict Ca 2+ binding sites on the protein surface, considering calcium ions as a part of the protein receptor for docking is not an approach applicable to all systems.
From a methodological point of view, we have shown that the size of the clusters identified by molecular docking is not correlated with their free binding energies obtained in the MD simulation. We have successfully applied, for the first time, fragment-based docking to dp11 oligosaccharides, which will be useful for the computational characterization of protein interactions with long GAGs, which are challenging to study using conventional docking approaches.