In Silico Peptide Ligation: Iterative Residue Docking and Linking as a New Approach to Predict Protein-Peptide Interactions

Peptide–protein interactions are corner-stones of living functions involved in essential mechanisms, such as cell signaling. Given the difficulty of obtaining direct experimental structural biology data, prediction of those interactions is of crucial interest for the rational development of new drugs, notably to fight diseases, such as cancer or Alzheimer’s disease. Because of the high flexibility of natural unconstrained linear peptides, prediction of their binding mode in a protein cavity remains challenging. Several theoretical approaches have been developed in the last decade to address this issue. Nevertheless, improvements are needed, such as the conformation prediction of peptide side-chains, which are dependent on peptide length and flexibility. Here, we present a novel in silico method, Iterative Residue Docking and Linking (IRDL), to efficiently predict peptide–protein interactions. In order to reduce the conformational space, this innovative method splits peptides into several short segments. Then, it uses the performance of intramolecular covalent docking to rebuild, sequentially, the complete peptide in the active site of its protein target. Once the peptide is constructed, a rescoring step is applied in order to correctly rank all IRDL solutions. Applied on a set of 11 crystallized peptide–protein complexes, the IRDL method shows promising results, since it is able to retrieve experimental binding conformations with a Root Mean Square Deviation (RMSD) below 2 Å in the top five ranked solutions. For some complexes, IRDL method outperforms two other docking protocols evaluated in this study. Hence, IRDL is a new tool that could be used in drug design projects to predict peptide–protein interactions.


Introduction
In biological systems, non-covalent interactions between two molecular entities, mostly receptor-ligand and protein-protein, are the corner-stones of cellular functions and communication. Their importance has been known for several decades and the study of the underpinning mechanisms is a very popular research field in biology and biochemistry. A better knowledge in this field would be of paramount importance in the design of new drugs targeting proteins involved in pathological conditions. To assist these research studies there has been a rapid growth of in silico approaches to predict molecular interactions. In particular, protein-protein interactions (PPIs) are an inherent phenomenon involved in key biological processes [1,2]. Amongst PPIs, the description of peptideprotein interactions is particularly important, since these interactions have a predominant role in many regulatory processes (e.g., neuropeptide-receptor and enzyme-substrate interactions), and may constitute between 15 and 40% of all interactions present in cells [3]. Characterization at the atomic level of the molecular interactions may provide key information to develop novel therapeutic treatments for cancer [4], neurodegenerative diseases [5], autoimmune diseases [6], and numerous other pathologies [7][8][9]. In addition, the design of modulating peptides, targeting mainly PPIs, becomes one attractive field of development in drug discovery [10][11][12][13][14][15]. By targeting the surface interaction of two bound proteins instead of their respective active site, the biological (or catalytic) activity of each protein could be maintained, while the downstream signaling processes could be modified [16]. However, elucidation by experimental methods of the structure of protein-peptide complexes still remains a challenge, due to the high flexibility of linear peptides [17,18] and their transient interactions. In addition, peptides may have a variable size, from a few residues to dozens of amino acids [19]. Consequently, the use of bioinformatics and in silico tools is a complementary approach that is increasingly applied to predict the structure of peptide-protein complexes [20].
Several molecular modeling methods and bioinformatics tools dedicated to the prediction of the structure of peptide-protein complexes have been developed in the past decade. [18] Methods such as PepATTRACT [21], PEP-FOLD3 [22], Rosetta FlexPepDock [23], HADDOCK [24], SP-peptide module of Glide [25], and others [26,27] have proven their capacity to predict peptide-protein complexes. Each tool has its own method to predict protein-peptide interactions, such as de novo prediction, coarse-grained resolution, restraint-guided prediction, etc. However, all these tools would benefit from refinement of the prediction of the sidechain position, especially when the considered peptide-protein interaction triggers a conformational change (e.g., receptor activation). Indeed, small modifications in the predicted binding of the backbone or the sidechain of the peptide could totally change the mode of interaction and may not trigger the activation of the biological target as observed experimentally. Another major difficulty to address is the strong induced fit of the peptide. After binding, the peptide could fold on the receptor surface, while it is unfolded in its unbound state [28]. Fragmentation of peptides is one of the methods to address the problem of peptide flexibility. Many in silico tools have already been developed to design ligands from fragments, such as LUDI [29], Dynamic Ligand Design (DLD) [30], or AutoGrow [31]. For the latter, the program is now able to predict drug-like properties and to optimize ligands with click chemistry rules [32]. However, those tools, which are used in several docking programs, such as Hammerhead [33] or FlexX [34], are not specifically developed for peptides, for which few tools currently exist. In the 1990's, GROW was developed for the prediction of peptide-protein interactions, using single amino acids to construct peptides of interest within cavities [35]. Even if the approach brought, at that time, very useful insights into the design of peptidomimetics for the inhibition of HIV protease [36][37][38], it suffers from several drawbacks, such as the conformational sampling of the molecules in order to propose different solutions, or the prediction of the position of the starting seed.
In this current work, starting from the initial idea to consider peptide fragments instead of the whole peptides, we have designed a novel approach to predict peptide-protein interactions with the use of a covalent docking approach. The covalent docking protocol was initially developed to predict the covalent bond between an inhibitor and its biological target. We applied this concept to the prediction of protein-peptide interactions, but without creating a covalent bond between the peptide and the protein. Covalent docking is used here to iteratively build the peptide from several segments inside the binding site of a given receptor, which consists of a novel methodology employing this method. Our new approach has been compared to two docking modules of Glide [39], from the Schrödinger Suite, Standard Precision (SP), and SP-Peptide. Results show that in all cases "Iterative Covalent Docking and Linking" (IRDL) is able to retrieve the crystallographic pose of peptides. Furthermore, in 10 out of 11 cases, in the top 5 poses, IRDL is able to retrieve a Root-Mean Square Deviation (RMSD) below 2 Å between the docking pose and the crystal structure.

Results and Discussion
Protein peptide interaction is a crucial phenomenon in the living world and is involved in many biological processes. Here, we propose a docking approach to predict the correct molecular interactions and conformations of small unconstrained peptides bound to protein targets. To achieve this goal, exploration of the conformational space of the peptides should be exhaustive. However, classical searching algorithms show their limits when the number of rotatable bonds increases. Thus, the prediction of protein-peptide interaction, at atomic scale precision, becomes challenging, due to the high flexibility of these molecules. Specific conformational search tools have been developed to assess the binding mode of molecules with large numbers of rotatable bonds. We present here the development of a novel method that is able to retrieve the binding mode of peptides bound to proteins. Furthermore, we compare the results obtained with this model to those obtained by two classical docking protocols present in the same Glide software: standard precision (SP) protocol and SP-peptide protocol, developed specifically for protein-peptide interactions. All docking poses were rescored with the Extra Precision (XP) scoring function.
After rescoring with the XP scoring function, we selected all docking poses from the highest scoring pose to the pose with the lowest RMSD to the X-ray structure. We also took into account the position of reasonable docking poses, meaning poses with a deviation lower than 2 Å and 3 Å, considering the backbone or the whole peptide, respectively. For analysis, RMSD calculation was performed without hydrogens. In general, the crystal structure of the peptides with a RMSD lower than 2 Å, considering the whole peptide compared to the docking pose, was found in at least one protocol for all case studies, but did not match with the top scoring position. This is observed with many docking tools, where the top solution is often not the one with the lowest RMSD to crystal structure.

Highest Scoring Pose
The top scoring solution of each methodology was obtained by ranking all poses with Glide XP scoring function. In addition, we also analyzed the rank of the first pose having a root-mean square deviation (RMSD) below 2 Å and 3 Å for the backbone and the whole peptide, respectively. Figure 1 represents the RMSD of the top scoring pose, after rescoring all poses with Glide XP scoring function, compared to the X-ray structure of each peptide using the three protocols. Details of all calculated RMSD values of the top scoring position are given in Table S3. Taking into account only the top scoring solution, the results show good RMSD values for some peptide-protein complexes but are unreasonable for some other examples, regardless of the applied protocols. SP protocol fails to retrieve a correct solution for four complexes (PDB ID 1OSZ, 3CH8, 3MMG, and 4GRV) with a RMSD below 2 or 3 Å when considering the backbone or the whole peptide, respectively, but it can predict a solution very close to the X-ray structure, such as for PDB ID 4NNM. This classical docking protocol developed for protein-ligand complexes has shown high success in many previous protein-ligand studies [40][41][42]. For half of the systems, IRDL can retrieve the top scoring solution with a RMSD below 2 or 3 Å when considering the backbone or the whole peptide, respectively. Results associated with the second half of the system are sometimes slightly above the limits of 2 and 3 Å, except for PDB ID 4RXH and 1OSZ systems, where the RMSD is high. Similar results are obtained with the SP-Peptide protocol.
Considering not only the top scoring docking solutions but also the rank of the first structure fulfilling the criteria for a correct docking solution, i.e., with a RMSD below 2 Å or 3 Å when considering the peptide backbone or the whole peptide, respectively, the results are slightly different (Table 1). Table 1. Rank of the first structure having a RMSD below 2 Å or 3 Å for the backbone or for the whole peptide considering the three protocols. BB: RMSD values considering the backbone; WP: RMSD values considering the whole peptide; "-": no structure fulfilling the RMSD criteria; a ratio of the actual conserved hydrogen bonds between the crystal structure and the docked solution of the whole peptide (WP) obtained by IRDL. --It appears that IRDL protocol is able to retrieve a correct docking solution for each proteinpeptide complex at the first or second positions for eight and ten protein-peptide complexes when considering the backbone or the whole peptide, respectively. However, for two complexes (PDB ID 4RXH and 3T6B) the best positions were at rank 36 and 67, respectively, considering the backbone only, but a correct solution is obtained at the first rank for PDB ID 3T6B when considering the whole peptide. Furthermore, those poses conserve most of the hydrogen bond interactions present in the crystal structure. As often observed in docking experiments, the missed hydrogen bonds are compensated by other hydrogen bonds present with close residues [41]. These observations assess the good efficiency of the IRDL protocol to retrieve the correct docking poses of the peptides and the conserved interactions observed in crystal structures. While SP protocol is able to correctly retrieve the binding mode of the opioid peptide Tynorphin (PDB ID 3T6B) at the first position, SP-peptide is, surprisingly, not able to retrieve any correct docking solution for this complex. For SP and SP-peptide, several complexes lead to docking issues. Four peptides are not docked correctly using the SP protocol (PDB ID 1OSZ, 3MMG, 4GRV, and 3CH8), and likewise for two peptides using the SPpeptide protocol (PDB ID 3T6B and 4RXH). For these six complexes, SP and SP-peptide protocols are not able to retrieve any docking solution with a RMSD below 2 Å or 3 Å, considering the peptide backbone or the whole peptide, respectively. However, one has to keep in mind that the SP protocol was not developed specifically for protein-peptide complexes. Furthermore, considering the results presented in Table 1, the IRDL method outperforms the two other protocols regarding the ranking of a correct docking solution. While the SP-Peptide can provide a correct solution for nine systems, the IRDL protocol always suggests a better rank of a correct solution regarding our RMSD criteria, considering either the backbone or the whole peptide, except for PDB ID 4Q6H, where IRDL retrieves the correct solution for the second docking solutions. It is important to note that both SP and SP-Peptide protocols fail in retrieving any docking solutions with low RMSDs, but a full minimization after docking could improve the results.

PDB
Despite the fact that we succeeded in demonstrating the efficiency and the originality of the IRDL method, in its current state, some limitations still exist. First, the position of the first fragment is a crucial step for building the peptide. Currently, the C-terminus is initially docked in the binding site, allowing building from the C-terminus to N-terminus only. Second, the first fragment must interact strongly with the receptor, since its position in the active site is critical for the success of peptide building. One improvement would be to dock all fragments and then grow them from the Cto N-terminus and N-to C-terminus sides. This approach is currently unfeasible because of long calculation times. Efforts are currently under way to improve IRDL.

Poses with the Lowest RMSD to the Crystal Structure
In a second analysis, we checked if the IRDL protocol was able to retrieve the crystal structures of the eleven considered peptides, regardless of their ranking. Results, presented in Figure 2, show that in each case, IRDL was able to retrieve the crystal structures of the peptides bound to proteins with excellent precision. The docking pose with the lowest RMSD to the crystal structure was below 2 Å or 3 Å for each system, considering the backbone only or all atoms of the peptide, respectively. Remarkably, for some systems (PDB ID 1OSZ, 2D5W, 3CH8, 4GYO, and 4NNM), IRDL protocol was able to retrieve the top scoring pose with sub-angstrom deviation compared to the crystal structure. In comparison, considering the peptide backbone only, SP method was not able to retrieve a correct solution for four complexes, namely PDB ID 1OSZ, 3CH8, 3MMG, and 4GRV. Moreover, except for PDB ID 4NNM complex, SP protocol was not able to retrieve any docking solutions with subangstrom precision compared to the crystal structure, considering the entire peptide. Details of the best RMSD values are given in Table S2.
Interestingly, SP-peptide protocol, which was developed specifically for the docking of peptideprotein complexes, did not outperform the two other methods in the prediction of protein-peptide interactions. For example, SP-Peptide was not able to find a satisfying docking pose for PDB ID 4RXH, the best pose with the lowest RMSD value, which presents a deviation equal to 4.7 Å and 7.5 Å compared to the crystal structure, considering the backbone and the whole peptide, respectively. In comparison, IRDL approach and SP protocol are able to retrieve excellent binding modes for the peptide for this complex with a similar RMSD of 1.5 and 2.0 Å for the backbone and whole peptide, respectively. Considering the PDB ID 1OSZ, 3MMG, and 3T6B, SP-peptide predicts the best poses of the peptides with a RMSD above 3 Å deviation from the crystal structure, considering the whole peptide ( Figure 2). Regarding the IRDL approach, it appears that the results are always better than SP-peptide for those three cases (RMSD below 3 Å). However, results are better for SP-peptide when considering only the backbone (RMSD between 2.0 and 3.5 Å), but the values are still too high for accurate protein-peptide studies in drug design projects. Indeed, such deviations can bring about large errors in the orientation of the side chains, as shown in Figure 3 for PDB ID 3MMG. The RMSD of 2.0 Å obtained for PDB ID 3MMG is mainly due to the side chain of the last residues of the peptide. In the case of IRDL, the RMSD is only 0.8 Å and the last residues are better superimposed on the crystal structure. Surprisingly, for both complexes of PDB ID 4RXH and 3T6B, SP protocol gives better results for the pose with the lowest RMSD of the X-ray structure than SP-peptide, suggesting that these protocols are also protein-peptide complex dependent. For the other systems, we observe that IRDL protocol provides equivalent or better results (PDB ID 1OSZ) than those obtained with SP and SP-peptide protocols. The same observation could be made considering the whole peptide.  Poses with the lowest RMSD to the X-ray structure obtained with the three different protocols for PDB ID 3T6B, 1OSZ, and 3MMG complexes, respectively. X-ray positions of the peptides are represented in brown. Docking poses obtained with IRDL, SP-Peptide, and SP protocols are represented in dark blue, green, and pink, respectively. Figure 3 represents the superimposition of the crystallographic structures, with the docking poses having the lowest RMSD to the X-ray structure obtained for PDB ID 3T6B, 1OSZ, and 3MMG for the three different docking protocols. For PDB ID 3T6B complex, IRDL is able to predict the crystallographic binding mode of the opioid peptide Tynorphin with excellent precision (RMSD 1.1 Å) when considering the backbone. Regarding the whole peptide, it appears that the side-chain of the tyrosine residue is misplaced compared to the crystallographic position, where it interacts with F443 of human DPPIII. In this case, the phenol interacts with H450 in a buried part of the cavity, and makes a hydrogen bond with I390 (see Figure S1). The positions of the others residues are well superimposed with the residues of the crystal structure. SP method also suggests a correct solution (RMSD 1.4 Å), but it provides the wrong placement for the last tryptophan residue. The position of its side chain and the C-terminus are significantly shifted, pointing toward I672 (see Figure S1). Moreover, this residue seems to be critical for the interaction, since it forms two cation-π interactions with R669 and K670 of the receptor. SP-Peptide provides an alternative placement for the side chain, which interacts with R669 in a cation-π interaction, and a hydrophobic interaction with the phenyl ring of F109 of the protein. This alternative orientation of the tryptophan does not seem in the most favorable position, taking into account the interactions of residues in the crystal structure. On the contrary, SP protocol and particularly IRDL method correctly retrieve these specific interactions with R669 and K670.
Importantly, this tryptophan residue is included in the first fragment used by IRDL, meaning that this residue is in the first segment, docked with SP-peptide protocol. We hypothesize that the low number of rotatable bonds in the segment, compared to the whole peptide, allows SP-peptide to better dock this residue in the cavity and to find a conformation more similar to the crystal structure. Thus, it seems that in this particular case, splitting the whole peptide into segments induces a better exploration of the conformational space, and allows the docking algorithm to find the correct binding mode of the fragment more easily.
In the case of PDB ID 1OSZ, RMSD values for the pose with the lowest RMSD to the X-ray structure show that our method suggests an excellent solution that is very close to the X-ray structure of the 8-residue mutated vesicular stomatitis virus (VSV8 V4L) peptide. The deviation shows subangstrom accuracy regarding the backbone (RMSD of 0.6 Å) or the whole peptide (RMSD of 0.9 Å). For comparison, SP-peptide provides a reasonable docking solution of 1.9 and 3.1 Å for the backbone and the whole peptide, respectively, while SP method is not able to retrieve a solution below 6 Å. Regarding the structural details, IRDL is able to retrieve all major protein-peptide interactions present in the crystal structure ( Figure 3). SP-Peptide protocol retrieves the atomic positions of the backbone with a good accuracy, but the orientation of the side chain of the docked tyrosine at the fifth position (Y5) is poorly predicted, since this residue interacts with E152 of class I MHC H-2 Kb instead of with hydrophobic residues V9 and V97. Consequently, Q6 residue of the peptide, which interacts with E152 in the crystal structure, is also mispredicted and points toward the bulk solvent. Regarding the docked solution obtained by SP, the peptide position is strongly deviated compared to the crystal structure.
The same observation can be made for the case of the PDB ID 3MMG complex, where the mutated Tobacco vein mottling virus (TVMV) is bound to canonical peptide substrates of 8 residues. IRDL protocol proposes the closest solution to the X-ray structure, with an RMSD of 0.8 Å considering the backbone. Only the C-terminus residue of the peptide is misplaced (E2), in which the position of the glutamate moiety is shifted, interacting with H171 of the protein instead of pointing toward the bulk solvent. SP-peptide also provides a correct docking solution, even if glutamine Q7 takes the position of arginine R5 in the crystallographic structure. The remaining part of the peptide is slightly shifted but the overall conformation is in good agreement with the crystallographic structure, with an RMSD of 2.6 Å. Concerning the SP method, despite a correct prediction of the position of the first residues, it appears that the residues F6 to D9 are located outside the protein cavity.
To summarize, this novel IRDL method is always able to retrieve a correct docking pose with RMSD below 2 Å or 3 Å compared to the crystal structure, considering the backbone or the whole peptide, respectively. Even if the top scoring pose is not always accurate (as also observed for SP and SP-Peptide), our protocol is able to retrieve a correct pose for all systems, which is not systematically the case for the two other algorithms. These results show the good efficiency of IRDL method in retrieving the most similar binding mode compared to crystal structures.

Datasets
Peptide-protein systems were downloaded from the PDB database (www.rcsb.org) [43]. Several criteria have been considered for system selection: only complexes containing peptides interacting with proteins are kept, resolved by X-ray crystallography method, and with a resolution below 3 Å. Structures have only natural residues in their sequence. Complexes in which the peptides are highly solvent-exposed and make numerous interactions with crystallographic water molecules were discarded in order to avoid additional complexity in evaluating IRDL. Visual inspection was performed to check the mode of interaction of the peptides. Finally, 11 structures were selected for this study (PDB ID: 1OSZ, 2D5W, 3CH8, 3DRF, 3MMG, 3T6B, 4GRV, 4GY0, 4NNM, 4Q6H, 4RXH). Details of these structures are presented in Table 2. As mentioned in Table 2, the 11 peptide sequences contain between 5 and 8 amino acids, and between 15 (PDB ID 3T6B) and 43 (PDB ID 4RXH) torsion angles. Peptide sequences are detailed in the supplementary materials (Table S1). One can observe that different types of peptides were used, covering a diversity in flexibility and peptide types (charged, uncharged, hydrophobic).

Receptor Preparation
When multiple subunits were present, monomer A was selected in all cases. The 11 receptors were prepared with the Maestro protein preparation wizard using default parameters [44,45]. No side-chain or residue is missing on the protein in the direct neighborhood of all the studied peptides at 5 Å of the protein. OPLS-2005 force field was used in this study. Grids were generated with Glide by selecting the bound peptide to define the cavity, encompassing the entire binding sites in all the studied peptide-protein complexes. Physiological pH was used for residue protonation.

SP and SP-Peptide Docking
All peptides were prepared with the same protocol: a conformational search was performed using MacroModel [46] in order to propose different starting points for docking calculation. Default parameters for the conformational search (Mixed Torsional or Low-mode sampling algorithm) were considered. All conformations generated by MacroModel (up to 200 conformations) were used as starting points for docking. This first step was used to avoid the bias during docking introduced by using the crystallographic coordinates as the starting point. SP and SP-Peptide modules from Glide 6.5 [39] were used to perform the docking of the 11 ligands with default parameters. During docking, the receptor is considered fixed, while the peptide is kept fully flexible.

File Preparation
We have developed a new methodology to dock peptides on proteins, using an intramolecular covalent docking approach. The objective is to split the whole peptides into several segments, to dock them individually and sequentially, and finally to link the segments together to form the original peptide using a covalent amide bond formation. Consequently, IRDL works in sequential steps (one step per segment), and this approach differs from classical docking, which treats the whole peptide at the same time. However, IRDL aims at solving the docking problem caused by the high degree of peptide flexibility. By fragmenting the peptides into a few segments, this approach reduces the number of rotatable bonds per molecule to dock. Therefore, the exploration of the conformational space for finding relevant conformers that can interact with the binding site is reduced. To reconstruct the peptide, segments were linked together by intramolecular covalent docking. In the covalent docking module of Glide [47], the calculation must be associated with a chemical reaction. A set of reactions is already included in the module, but the user can add new synthesis reactions in the database. Consequently, there is currently no limit on the reaction needed to perform the fragment growth. Here, we decided to choose a simple nucleophile addition/electrophile reaction, as shown in Figure 4, involving an acid chloride moiety on one of the fragments. This fragment (fragment n+1) will react with the N-terminal amine of fragment n in order to create the peptide bond. This means that the reconstruction process of the peptide within the binding site goes from C-terminus to Nterminus. Regarding the fragmentation of the peptides into short segments, we set up several rules to optimize the procedure: (a) Segments must have 10 or less rotatable bonds. This criterion allows simplification of each calculation by reducing flexibility, therefore improving the conformational search during docking.
(b) To avoid unfeasible covalent docking solutions, we considered that the N-terminal residue should not contain any nucleophilic sidechains. Thus, residues such as Tyr, Cys, Ser, Thr, Asn, and Gln were excluded as N-terminal residues in a fragment, since the covalent docking module of Glide considers those amino acids as nucleophilic moieties. Otherwise, when the molecular topology of the fragment is not conserved after intramolecular covalent docking, the solutions are removed during the post-process analysis (see below).
Once the peptides were fragmented, IRDL methodology was applied as follows: (1) After a conformational search with MacroModel of the first fragment, as performed for SP and SP-Peptide docking, all conformers were docked in the binding site using the SP-Peptide module of Glide with default parameters. Then, we kept all docking poses up to the pose with the lowest RMSD to crystal structure. In a more realistic project, one would keep all docking poses, which will increase computational time but could improve the results.
(2) The second segment is docked using the covalent docking module of Glide (version 1.3) on each of the selected scoring poses of the first fragment. To allow the chemical reaction to occur, an acid chloride moiety is used on the C-terminal residue of the second fragment. During docking, this acid chloride moiety is automatically placed nearby the primary amine backbone of the first docked fragment. Glide will then optimize the interactions of the molecule in the binding site, keeping the distance between the acid chloride and the primary amine backbone. Then, in order to create the peptide bond, the chemical reaction between the N-terminal part of the first segment and the acid chloride moiety is made. Finally, both residues at the vicinity of the newly formed bond are optimized to remove strains and the receptor stays rigid. Then, the resulting poses are clustered into 15 clusters and a representative pose is proposed for each cluster. All cluster centers are ranked by the scoring function of the Glide covalent docking module. This scoring function averages the Glide docking score values of the two individual noncovalent poses and the resulting covalent poses involved in the peptide bond. Finally, all solutions are rescored with SP scoring function. At this step, we also evaluated the extra precision (XP) scoring function for those intermediate rescoring steps, but the results were less satisfactory (data not shown). As in step one, we decided to keep the highest scoring poses up to the pose with the lowest RMSD to the X-ray structure.
(3) Step 2 is repeated until the whole peptide is entirely created.  3) Step (2) is repeated with all successive fragments until the whole peptide is created.

Post-process Analysis
All poses obtained from either classical docking or IRDL approach were compared to the crystallographic structure of the peptides. Analysis was performed by RMSD calculation between crystal structure and docked positions, considering only heavy atoms. For each linking step between two fragments, we kept all poses between the highest scoring pose ("best-scoring pose") (top 1) and the pose with the lowest RMSD to the X-ray structure ("best-RMSD pose"), in order to increase the number of fragment poses to process. The ranking of all fragments for each peptide is analyzed to obtain the combination of the fragments associated with the best-RSMD pose and the best-scoring pose. It is important to note that the knowledge of the best-RMSD pose against the conformation of the crystal structure of the peptide is not compulsory in our approach. The main idea here is to validate the proof of concept of our methodology by considering a large number of poses with reasonable calculation times. A cut-off based on a scoring value could be applied at each step; as an example, it is possible to obtain a satisfying pose by considering the top 10 poses at each step (see Supplementary Materials, Table S3). The rescoring procedure is performed with the "score in place" method, in which the interaction energy is estimated by the Glide scoring function for the same conformation without modification of the coordinates of the peptide. Finally, all solutions obtained for all protocols were rescored with the XP scoring function, in order to improve the interaction score between the receptor and the peptide. Final scoring values were then compared between the three docking protocols.

Conclusions
Peptide-protein interactions represent more than a quarter of global interactions in cells. They are involved in numerous important processes, such as signaling, cell development, and migration. In this study, we proposed a novel approach to predict the binding mode of peptides to proteins. The innovative methodology, following specific rules, is based on splitting the peptides into several segments in order to reduce the conformational search space during docking, an important issue in docking algorithms. The generated fragments are then docked and linked from the C-terminal to the N-terminal. The first residue is docked using the SP-peptide protocol of Glide. Then, the covalent module of Glide is used to dock the next segment, and following defined atomic distances between the two fragments, an intramolecular covalent bond is created to form a peptide bond. A nucleophilic addition/elimination reaction between acyl chloride from the C-terminal of one fragment to the amine of the N-terminal of the adjacent fragment is performed, and the chemical reaction will create a peptide bond. This step is repeated until the peptide is reconstructed. The Iterative Fragment Docking and Linking (IRDL) approach has been validated on 11 different protein-peptide complexes. Even if the poses with lowest RMSD are not always scored as the top scoring position, we observe that the protocol is always able to retrieve a correct pose for all complexes with a RMSD below 2 or 3 Å compared to the crystal structures, considering only the backbone or the whole peptide, respectively. Regarding the performance of docking, IRDL provides similar results to the two docking protocols implemented in Glide (SP and SP-peptide). Each tool provides correct results for some specific complexes. In addition, errors on the position of either the backbone or the sidechain are more acceptable in the case of IRDL protocol, in contrast to the SP, and to a lesser extent, SP-peptide protocols that, in some cases, have a deviation higher than 5 Å, considering the pose with the lowest RMSD to the X-ray structure. We show that IRDL provides a reasonable accuracy in predicting the binding mode of a peptide on the protein surface.
The IRDL approach can be applied on any peptide-protein complexes harboring various binding sites, including a deep cavity or a solvent exposed surface. In addition, we provide here a proof of concept of the efficiency of the protocol to retrieve with good precision the correct structure of peptide-protein complexes. This new tool can already be used to design synthetic flexible peptide binding to a protein target. Nevertheless, some improvements will be implemented to the IRDL protocol in the future to optimize its performance and to make it fully automatic. Currently, the protocol provides a peptide construction from the C-to the N-terminus of the peptide using a classical chemical reaction. This is an actual limitation, especially in the case of peptides interacting in cavities by their N-terminus. To avoid this restriction, we envisage evaluation of further IRDL protocols by constructing the peptides from the N-to the C-terminus. Furthermore, a scoring function designed for protein-peptide interactions [48,49] could be implemented in order to obtain a better ranking of correct solutions. This new scoring function could take into account, for example, the particular behavior of peptides regarding flexibility, hydrophobic properties, cation-π interactions, etc. One can also envisage the use of Molecular Mechanics/Generalized-Born Surface Area (MM/GBSA) or Molecular Dynamics (MD)/GBSA for scoring the whole peptide. In addition, the induced fit mechanism could improve the performance of the docking tools, but this approach is often time consuming.
We developed a novel and efficient approach to predict, with excellent accuracy compared to experimental data, the structure of protein-peptide complexes. This method could be useful to help in the design of new peptide therapeutics or peptidomimetics.

Supplementary Materials:
The following are available online. Table S1: Peptide and corresponding fragment sequences used in the study. Table S2: Top RMSD pose obtained from the three approaches. Table S3: Top scoring pose obtained for the three protocols after rescoring with XP. Figure S1: Detail of the top RMSD pose obtained with SP docking method and IRDL approach for PDB ID 3T6B.