This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

We present an investigation of the use of normal modes in protein-protein docking, both in theory and in practice. Upper limits on the ability of normal modes to capture the unbound-to-bound conformational change are calculated on a large test set, with particular focus on the binding interface, the subset of residues from which the binding energy is calculated. Further, the SwarmDock algorithm is presented to demonstrate that modelling conformational change as a linear combination of normal modes is an effective way of modelling flexibility in protein-protein docking.

Protein-protein interactions are fundamental to almost all biological processes. Disruption of molecular recognition is integral to many diseases, including cancer. Docking attempts to predict the structure of complexes from their monomeric constituents. This computational approach has the potential to confirm or dismiss putative interactions, and to provide structural knowledge that can be exploited in the design of therapeutic interventions for a range of diseases, should a sufficiently accurate model be produced.

The docking problem presents two main challenges: the generation of structures, and the discrimination between structures with a scoring function. In this study, the focus is on the first of these challenges, for which a number of approaches have been used. The simplest method of docking two structures is to treat them as rigid bodies, usually using the Fast Fourier Transform (FFT) technique [

Linear, harmonic vibrational motions around a single minimum can be calculated using normal mode analysis. These motions may then be ordered by vibrational frequency. The lowest frequency eigenvectors correspond to the motions that can be excited with the least energy and hence are the most accessible thermally. The elastic network model simplifies the analysis by taking account only of the shape of the molecule under analysis, an approximation which has been shown to be capable of reproducing thermal B-factors at an atomistic [

Despite the approximation only being valid for small linear motions and ignoring multiple optima and solvent dampening, many protein motions resemble a single low energy normal mode [_{1} ATPase molecular motor, using RMSD as a guide [

Due to the reduced number of degrees of freedom, a linear combination of low frequency eigenvectors has been discussed as a method of modelling flexibility in docking algorithms [

Here, we investigate the ability of fine-grained, all-atom elastic network normal modes to analytically map various components of the unbound structures onto their equivalents in the bound structures using least squares linear regression and a large data set of 236 conformational changes upon binding [_{α}

During docking, concerted changes in conformation, position and orientation are desirable. SwarmDock, a memetic docking algorithm in which the translational, orientational and conformational degrees of freedom are simultaneously optimised using the Particle Swarm Optimisation (PSO) metaheuristic [

In the SwarmDock protocol, the PSO algorithm is used to optimise the coefficients of a linear combination of Hessian eigenvectors, together with position and orientation. Local and global, flexible, unbound-unbound docking is performed on a number of complexes. Whilst SwarmDock has only been implemented with a simplistic energy function, it can successfully dock flexible structures which undergo significant conformational changes upon binding. The success of the algorithm is reported as a function of the number of soft modes included for flexible deformation of the binding partners. A brief overview of SwarmDock has been given elsewhere [

The intermolecular potential is determined by interacting residues, defined here as residues which have a non-hydrogen atom within 6 Å of a non-hydrogen atom on the binding partner. It is the ability of normal coordinates to capture the transition of these atoms that is most important for the purpose of docking. _{α}
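The interface definition above can be sketched in a few lines. This is a minimal illustration only: the record layout `(residue_id, element, (x, y, z))` and the function name `interface_residues` are assumptions made for the example, not part of the original implementation.

```python
import math

def interface_residues(receptor, ligand, cutoff=6.0):
    """Return (receptor_ids, ligand_ids) of residues in contact.

    Each protein is a list of (residue_id, element, (x, y, z)) tuples;
    this layout is a hypothetical convenience for the example.
    A residue is at the interface if any of its non-hydrogen atoms lies
    within `cutoff` angstroms of a non-hydrogen atom on the partner.
    """
    rec_heavy = [(rid, xyz) for rid, el, xyz in receptor if el != "H"]
    lig_heavy = [(rid, xyz) for rid, el, xyz in ligand if el != "H"]
    rec_ids, lig_ids = set(), set()
    for r_id, r_xyz in rec_heavy:
        for l_id, l_xyz in lig_heavy:
            if math.dist(r_xyz, l_xyz) <= cutoff:
                rec_ids.add(r_id)
                lig_ids.add(l_id)
    return rec_ids, lig_ids
```

The double loop is quadratic in the number of atoms, which is acceptable for a one-off analysis; a spatial grid would be the usual choice for repeated evaluation.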

However, when we compare the mean maximum overlap of the whole protein to that of the interface, we observe a decrease at all levels of resolution. Furthermore, while the global motion is best represented by one of the 5 lowest frequency modes (57–62% of the modes of maximum overlap), conformational change at the binding interface is not (26–29% of the modes of maximum overlap). A Kolmogorov-Smirnov test shows that there is no evidence against uniformity in the distribution of modes of maximum overlap for the interface C_{α}_{0.01} = 0.107). The same test for the global C_{α}_{0.01} = 0.107).

This analysis was extended to the modes of maximum overlap for the lowest 500 normal modes, and a difference was seen between the motion of interface C_{α}

It has been shown that different crystal structures of the same protein can vary in their hinge bending angle, such as in the cubic and rhombic space group structures of the ligand free ribose binding protein [

The above analysis was repeated for modes calculated using RTB diagonalisation, and a comparison with modes derived from exact diagonalisation of the all-atom Hessian is shown in _{α}_{α}

While overlap gives a good measure of how close a single normal mode is to the unbound-bound transition, it is not possible to know which mode has the greatest overlap without already knowing the bound structure. Hence, the inclusion of a number of modes is preferable for use in a docking algorithm. Furthermore, using these modes in linear combination may significantly enhance the ability to recapture the important conformational changes that occur upon binding, beyond that achievable by considering single modes, while still significantly reducing the search space compared to Cartesian or internal co-ordinates. In order to investigate the contribution of multiple modes, the unbound-bound transitions were decomposed into linear combinations of normal modes. It should be noted that, when representing dynamics with a linear combination of normal modes, phase angles and amplitudes of motion must be considered. However, the purpose of flexible docking is not to model protein dynamics, but to model the conformational change that occurs upon binding. Hence, least-squares fitting is an appropriate method of combining modes in this context, and it is also probable that the coefficients generated by least-squares fitting correspond to a structure accessible by a combination of harmonic motions.
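A minimal sketch of such a decomposition is given here, assuming the two structures are already superposed and share a one-to-one atom correspondence. The function names (`solve_linear`, `fit_modes`, `rmsd`) and the flat coordinate layout are illustrative assumptions. Because modes restricted to a subset of atoms are generally no longer orthogonal, the sketch solves the normal equations rather than taking simple projections.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_modes(unbound, bound, modes):
    """Coefficients minimising |unbound + sum_j c_j * mode_j - bound|^2.

    unbound, bound: flat coordinate lists of length 3N.
    modes: list of displacement vectors, each of length 3N.
    """
    d = [b - u for u, b in zip(unbound, bound)]
    gram = [[sum(p * q for p, q in zip(ej, ek)) for ek in modes] for ej in modes]
    rhs = [sum(p * q for p, q in zip(ej, d)) for ej in modes]
    return solve_linear(gram, rhs)

def rmsd(x, y):
    """Root mean square deviation between two flat coordinate lists."""
    n_atoms = len(x) // 3
    return (sum((p - q) ** 2 for p, q in zip(x, y)) / n_atoms) ** 0.5
```

Comparing `rmsd(unbound, bound)` with the RMSD of the fitted structure gives the improvement attributable to the chosen set of modes.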

Linear least-squares regression of unbound structures onto bound structures, using a set of low frequency normal modes as a basis, was performed. Of the infinitely many structures which can be generated with those normal modes as a basis, this yields the coefficients for the structure that minimises the RMSD against the bound form. Just as a certain subset of atoms, such as C_{α}

Generally, significant improvements in RMSD are observed in a subset of the modes considered, with other modes offering only a smaller improvement. For each protein, the mode which gives the greatest decrease in RMSD is the mode of greatest influence. Using twenty modes, the mean mode of greatest improvement for the whole fold is 6.8 and 5.0 when fitting the C_{α}

_{α}

A more detailed breakdown of mode contributions for 30 complexes of high flexibility, as calculated using 20 modes, is shown in

It is interesting to note, however, that despite the differences between the global and local change, the predicted flexibility measure found by Dobbins _{α}

In solution, the unbound proteins and the complex adopt an ensemble of conformational states. Much debate has focussed on the degree of overlap between the conformational ensembles of the binding partners unbound and in complex. At one end of the spectrum lies the ‘conformational sorting and population shift’ mechanism, in which members of the bound state ensemble are accessible by the unbound protein. At the other end lies the ‘induced fit’ hypothesis, in which bound conformations are essentially inaccessible by the unbound protein, but stabilised by the presence of the binding partner [

Global docking of four small complexes was done using up to 40 normal modes in both the receptor and the ligand, as calculated using the RTB method. For each complex, the SwarmDock algorithm was run 480 times at points spaced evenly around the receptor (4 times at each of 120 points surrounding the receptor). The rank and I RMSD are reported in

Local docking was performed for four high-flexibility models which undergo significant conformational change upon binding. In these docking runs, the search is focussed on the binding site region by performing more runs from the starting positions near the binding site and removing the starting positions away from the binding site. Of 120 positions evenly spaced around the receptor, the algorithm was run 8 times from the 10 positions nearest the binding site. For these complexes, 1EER, 1GRN, 1K5D and 1KKL, interface C_{α}

For many, but not all, of the false positives, the cluster size is small and they are not found consistently. However, the correct binding site is usually found multiple times during the docking runs. Consideration of the mechanism of the algorithm and the nature of biological interfaces offers a potential explanation. Structures near to the native complex also have low energies and, as a rule of thumb, poses further away from the binding site tend to have higher energies, resulting in an energy funnel surrounding the native complex. This is essential for the evolution of the interface, as it overcomes the binding equivalent of Levinthal’s paradox. SwarmDock differs from other docking methods, which rely upon finding low-energy structures by combining the results of many single independent trajectories through search space, or by filtering a list of putative structures using FFT correlations, geometric hashing or other scoring functions. Instead, communication between members of the swarm can give rise, as an emergent property of the system, to switching between exploration of diffuse regions of search space and exploitation of narrower regions containing numerous positions which correspond to lower energy structures, depending on the nature of the energy landscape. SwarmDock takes advantage of a correlated energy landscape surrounding the true binding site, as low energy positions found by members of the swarm act as an attractor for a subset of the swarm. Further, the equation governing the velocities of the members of the swarm has a distance-dependent repulsion term which acts against the contraction of a diffuse swarm, but has less effect on the contraction of a swarm which is focussing in on a particular region containing numerous low energy structures, such as the true biological interface.
Indeed, previous work has shown that the mean Euclidean distance between the translational components of the particles (corresponding to the centres of mass of the ligands) decreases significantly earlier for SwarmDock runs which find the binding site than for those that do not (unpublished data). This indicates that fewer iterations are required to form a consensus in the population, and that the binding funnel is wider than for false positive sites. This information may be used in the future as part of a post-docking re-scoring step. It can now be seen that the inconsistent discovery of low energy false poses may not be because the number of starting positions is insufficient to find all low-energy poses, but because these poses are not surrounded by a wide binding funnel and hence are not scrutinised by the swarm. Further information can be gleaned from the cluster size of the true positive cluster.

As well as the examples given here, the SwarmDock algorithm has also been used in rounds 16–19 of the CAPRI blind docking trials. In the CAPRI experiment, the protein docking community test their algorithms blind. Target predictions are collected, the experimental bound structure revealed, and the predictions are ranked based on their quality. SwarmDock was able to successfully produce the bound structure of three target complexes: targets 37, 40 and 41. For these targets, semi-local docking was performed by eliminating certain regions around the receptor protein as potential binding sites, as described previously [

Conformational change upon complexation has been studied with the construction of docking algorithms in mind. The capability of fine-grained elastic network models to capture unbound to bound transitions was quantified by assessing the ability of normal modes to move all atoms, backbone (C, O, N, C_{α}_{α}

The effect of the RTB method was also investigated. While this technique is shown to be less accurate for predicting conformational change across the whole fold, it has a greater ability to model conformational change at the interface. For this reason, and due to its computational efficiency, this method is a better choice for generating pre-calculated modes for docking. Fine-grained RTB normal modes were calculated and used in a novel docking algorithm, SwarmDock. SwarmDock is the first protein-protein docking algorithm based on particle swarm optimisation, and was developed for flexible docking using normal modes as a method of modelling conformational change. The algorithm works well on the test cases presented here and in the blind CAPRI experiment. Further, the RMSD and rank of docked structures can be demonstrably improved by the inclusion of normal modes, even for difficult cases, and there is evidence to suggest that the true binding funnel is both widened and deepened, relative to false positive binding funnels, as higher modes are included. This demonstrates that the algorithm can effectively exploit the conformational freedom given by a set of low frequency modes. Further work, focussing on an improved scoring function and on post-docking refinement and filtering, should improve the ability of the SwarmDock protocol to generate potential bound structures and to discriminate biological interfaces from artifacts.

In normal mode analysis, the potential energy basin is approximated as a harmonic well, allowing an analytical solution of the equations of motion. Using a typical molecular mechanics energy function requires lengthy minimisation to ensure that the minimum is found. To avoid this difficulty, we use the elastic network model (ENM) [

Where the force constant, _{AB}_{e,AB}
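For orientation, a generic cutoff-based elastic network Hessian can be assembled as in the sketch below. This is not the parameterisation used in this study: the uniform force constant `k`, the 10 Å `cutoff` and the function name `enm_hessian` are all assumptions chosen for illustration, and the force constant described in the text may depend on the atom pair.

```python
def enm_hessian(coords, cutoff=10.0, k=1.0):
    """Build the 3N x 3N Hessian of a simple harmonic spring network.

    coords: list of (x, y, z) equilibrium positions.
    Every pair closer than `cutoff` is joined by a spring with energy
    (k / 2) * (r - r_eq)^2, taking the input structure as the minimum.
    Both `cutoff` and the uniform `k` are illustrative assumptions.
    """
    n = len(coords)
    h = [[0.0] * (3 * n) for _ in range(3 * n)]
    for a in range(n):
        for b in range(a + 1, n):
            d = [coords[b][i] - coords[a][i] for i in range(3)]
            r2 = sum(x * x for x in d)
            if r2 > cutoff * cutoff:
                continue
            for i in range(3):
                for j in range(3):
                    v = k * d[i] * d[j] / r2
                    h[3 * a + i][3 * b + j] -= v   # off-diagonal block
                    h[3 * b + j][3 * a + i] -= v   # its transpose
                    h[3 * a + i][3 * a + j] += v   # diagonal blocks
                    h[3 * b + i][3 * b + j] += v
    return h
```

Because the input structure is by construction the energy minimum, no prior minimisation is needed, which is the practical advantage noted above.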

In the rotation-translation block (RTB) technique for diagonalising the Hessian matrix, atoms are grouped into blocks, usually of one residue or more, which can translate and rotate as rigid units. The all-atom Hessian is projected into a block translation and rotation subspace by the use of a projection matrix. The projected Hessian is then diagonalised, yielding the vibrational frequencies and eigenvectors [
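In matrix terms, the projection described above is commonly written as follows. The symbols are assumed here for illustration: H is the 3N × 3N all-atom Hessian, P is the 3N × 6n_b projection matrix whose columns span the rigid translations and rotations of the n_b blocks, and a_k is the all-atom displacement recovered from the k-th eigenvector of the projected Hessian.

```latex
H_b = P^{\mathsf{T}} H P, \qquad
H_b \, \tilde{a}_k = \lambda_k \, \tilde{a}_k, \qquad
a_k = P \, \tilde{a}_k
```

The matrix that must be diagonalised shrinks from 3N × 3N to 6n_b × 6n_b, which is the source of the method's computational efficiency.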

The overlap value, _{j}^{th}

Where _{i}^{b}_{i}^{u}^{th}_{ij}^{th}^{th}_{max}
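The overlap measure can be sketched as below, using the form in which it is usually stated in the normal mode literature: the absolute normalised projection of the unbound-to-bound displacement onto a mode. The function names and the flat coordinate layout are assumptions for the example, and the structures are taken to be superposed beforehand.

```python
import math

def overlap(mode, unbound, bound):
    """Absolute cosine between a mode and the unbound-to-bound displacement.

    mode, unbound, bound: flat lists of length 3N over the same atoms.
    Returns a value in [0, 1]; 1 means the mode points exactly along the
    observed conformational change.
    """
    d = [b - u for u, b in zip(unbound, bound)]
    num = abs(sum(a * x for a, x in zip(mode, d)))
    den = math.sqrt(sum(a * a for a in mode) * sum(x * x for x in d))
    return num / den

def max_overlap(modes, unbound, bound):
    """Return (1-based index of the best mode, its overlap)."""
    values = [overlap(m, unbound, bound) for m in modes]
    best = max(range(len(values)), key=values.__getitem__)
    return best + 1, values[best]
```

Restricting all three vectors to interface atoms, backbone atoms or C_{α} atoms before calling `overlap` gives the subset-specific values discussed in the results.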

The 124 complexes in the Protein-Protein docking benchmark v3.0 were used [

This section describes the method used to determine the extent of deformation along each normal mode in order to generate the closest structure to the bound state from the unbound state. When fitting unbound structures to the bound structures using normal coordinates as a basis, a one-to-one correspondence of atoms is necessary. For some complexes, the bound and unbound structures have this correspondence. For the remainder, sequences of the two structures were aligned and non-matching residues were ignored in the mapping, but not in the construction of the Hessian matrix. When considering subsets of the structure, such as interface or backbone only, all atoms not in that subset are also ignored, leaving the _{j}_{=1}^{m}_{j}_{ij}

As the sum of squared residuals is quadratic with respect to the coefficients, its derivative with respect to the coefficients is linear, with one unique solution which, as the sum of squared residuals is convex, corresponds to a minimum.

Setting the gradient to zero,

which can be arranged to a series of

Written in matrix form and inversion shows how the analytically fit coefficients,
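For reference, the standard least-squares derivation that these steps outline can be written as follows. The notation is assumed for illustration: x_i^u and x_i^b are the i-th coordinates of the unbound and bound structures, a_{ij} is the i-th component of the j-th mode, c_j are the coefficients, A is the matrix with elements a_{ij}, and d is the displacement vector with elements x_i^b − x_i^u.

```latex
\begin{align}
S(c) &= \sum_{i} \Bigl( x_i^{b} - x_i^{u} - \sum_{j=1}^{m} c_j a_{ij} \Bigr)^{2} \\
\frac{\partial S}{\partial c_k} &= -2 \sum_{i} a_{ik} \Bigl( x_i^{b} - x_i^{u} - \sum_{j=1}^{m} c_j a_{ij} \Bigr) = 0 \\
\sum_{j=1}^{m} \Bigl( \sum_{i} a_{ik} a_{ij} \Bigr) c_j &= \sum_{i} a_{ik} \bigl( x_i^{b} - x_i^{u} \bigr), \qquad k = 1, \dots, m \\
A^{\mathsf{T}} A \, c &= A^{\mathsf{T}} d
\quad\Longrightarrow\quad
c = \bigl( A^{\mathsf{T}} A \bigr)^{-1} A^{\mathsf{T}} d
\end{align}
```

When the modes are restricted to a subset of atoms they are in general no longer orthonormal, so A^T A is not the identity and the inversion is required.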

Fitting was performed for all 236 unbound-bound transitions in the data set, an example from which is shown in

The SwarmDock algorithm is a novel, iterative, population-based memetic algorithm. Each member of the population, _{i}_{i}

The search space is composed of the ligand centre of mass (3 translational dimensions) and a quaternion representation of ligand orientation (4 orientational dimensions). The receptor is kept fixed at the origin. _{r}_{l}_{r}_{l}_{i}_{r}_{l}
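To make the search space concrete, the sketch below maps one particle position onto ligand coordinates: the ligand is first deformed along its modes, then rotated about its centroid by the quaternion, then translated. The vector layout `[tx, ty, tz, qw, qx, qy, qz, c_1, ..., c_m]`, the use of an unweighted centroid in place of the centre of mass, and the function names are assumptions for illustration; receptor mode coefficients would be handled analogously.

```python
import math

def rotate(q, v):
    """Rotate vector v by quaternion q = (w, x, y, z); q is normalised first."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    vx, vy, vz = v
    return (
        (1 - 2 * (y * y + z * z)) * vx + 2 * (x * y - w * z) * vy + 2 * (x * z + w * y) * vz,
        2 * (x * y + w * z) * vx + (1 - 2 * (x * x + z * z)) * vy + 2 * (y * z - w * x) * vz,
        2 * (x * z - w * y) * vx + 2 * (y * z + w * x) * vy + (1 - 2 * (x * x + y * y)) * vz,
    )

def place_ligand(coords, modes, particle):
    """Apply a particle position to ligand coordinates.

    coords: list of (x, y, z); modes: list of flat 3N displacement vectors.
    particle: [tx, ty, tz, qw, qx, qy, qz, c_1, ..., c_m] (assumed layout).
    """
    t, q, c = particle[:3], particle[3:7], particle[7:]
    flat = [x for atom in coords for x in atom]
    for cj, mode in zip(c, modes):               # linear combination of modes
        flat = [x + cj * e for x, e in zip(flat, mode)]
    atoms = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    n = len(atoms)
    centre = tuple(sum(a[k] for a in atoms) / n for k in range(3))
    placed = []
    for a in atoms:
        r = rotate(q, tuple(a[k] - centre[k] for k in range(3)))
        placed.append(tuple(r[k] + t[k] for k in range(3)))
    return placed
```

Normalising the quaternion inside `rotate` means the optimiser can move freely in the four orientational dimensions without leaving the set of valid rotations.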

SwarmDock is run at approximately evenly spaced positions surrounding the receptor, generated using a point distribution algorithm explained elsewhere [
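The cited point distribution algorithm is not reproduced here. As a stand-in, a golden-spiral (Fibonacci) lattice is one common way to place approximately evenly spaced points on a sphere; the function name `sphere_points` and the `radius` parameter are assumptions for this sketch.

```python
import math

def sphere_points(n=120, radius=1.0):
    """Approximately even points on a sphere via a golden-spiral lattice.

    This is a substitute technique, not the algorithm cited in the text.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n          # evenly spaced heights
        r = math.sqrt(1.0 - z * z)             # ring radius at that height
        phi = golden_angle * i
        points.append((radius * r * math.cos(phi),
                       radius * r * math.sin(phi),
                       radius * z))
    return points
```

With 120 points and 4 runs started from each, this gives the 480 runs per complex used for global docking.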

The energy function is evaluated for all members of the initial population, then the velocities and positions are updated using the following transition functions, a variation on the well-studied PSO equations [

where _{1}_{,i}_{2}_{,i}_{3}_{,i}_{1} and _{2} are respectively known as the cognitive and social aspect, both set to 2.05. The cognitive aspect is the propensity for the particle to move toward the lowest energy position it has previously experienced. The social aspect is the degree to which the particle moves toward the lowest energy position found by any particle in the neighbourhood of the particle being updated. _{i}_{n,i}_{rand} is the position of a randomly selected particle.
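For readers unfamiliar with PSO, the sketch below shows the textbook constricted update on which such variations are based. It is not the SwarmDock transition function: the repulsion term involving a randomly selected particle is omitted, and the constriction factor `CHI` is the standard Clerc-Kennedy value for coefficients summing to 4.1, assumed here rather than taken from the text. Only the values of the cognitive and social coefficients come from the description above.

```python
import random

C1 = C2 = 2.05   # cognitive and social coefficients, as stated in the text
CHI = 0.7298     # Clerc-Kennedy constriction for C1 + C2 = 4.1 (assumed)

def pso_step(x, v, personal_best, neighbourhood_best, rng=random):
    """One standard constricted PSO update for a single particle.

    x, v: current position and velocity (lists of equal length).
    personal_best: lowest energy position this particle has visited.
    neighbourhood_best: lowest energy position found in its neighbourhood.
    Returns (new_position, new_velocity).
    """
    new_x, new_v = [], []
    for xj, vj, pj, nj in zip(x, v, personal_best, neighbourhood_best):
        r1, r2 = rng.random(), rng.random()
        vel = CHI * (vj + C1 * r1 * (pj - xj) + C2 * r2 * (nj - xj))
        new_v.append(vel)
        new_x.append(xj + vel)
    return new_x, new_v
```

Drawing fresh random numbers for every dimension lets the particle overshoot either attractor, which is what keeps the swarm exploring rather than collapsing immediately.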

Note that the use of the word neighbourhood has a specific, non-colloquial meaning, as it appears in the PSO literature. A particle is deemed to be in the neighbourhood of particle

To avoid explosions and control the exploration/exploitation trade-off, velocity clamping is used to impose a limit on the distance the particles can move in any one iteration:

where _{ij}_{max}
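Velocity clamping of this kind can be sketched as a per-dimension limit. Treating the limit as a separate maximum for each dimension is an assumption of this example, as is the function name.

```python
def clamp_velocity(v, v_max):
    """Limit each velocity component to the range [-v_max[j], +v_max[j]]."""
    return [max(-m, min(m, vj)) for vj, m in zip(v, v_max)]
```

Separate limits per dimension are convenient because translations, quaternion components and mode coefficients live on very different scales.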

At each iteration, the lowest energy particle in the swarm undergoes a local search step which is based on that of Solis and Wets [

The algorithm works as follows. For each dimension in search space, _{j}_{j}

If the success or fail counters reach five, then the step size is expanded or contracted, by doubling or halving the standard deviations, _{j}
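The adaptive step-size logic can be illustrated with a simplified random search in the style of Solis and Wets. The sketch is an assumption-laden reduction: it omits the bias vector of the original method, tries the mirrored step when the forward step fails, and counts consecutive successes and failures; the names and the iteration count are hypothetical.

```python
import random

def local_search(f, x, sigma, iterations=50, limit=5, rng=None):
    """Simplified Solis-Wets style local search (no bias vector).

    f: energy function of a position list; x: start position.
    sigma: per-dimension standard deviations of the Gaussian step.
    After `limit` consecutive successes the step sizes double; after
    `limit` consecutive failures they halve.
    Returns (best_position, best_energy).
    """
    rng = rng or random.Random(0)
    x, sigma = list(x), list(sigma)
    fx = f(x)
    successes = failures = 0
    for _ in range(iterations):
        step = [rng.gauss(0.0, s) for s in sigma]
        forward = [a + d for a, d in zip(x, step)]
        backward = [a - d for a, d in zip(x, step)]
        for candidate in (forward, backward):
            fc = f(candidate)
            if fc < fx:                      # accept only improvements
                x, fx = candidate, fc
                successes, failures = successes + 1, 0
                break
        else:                                # neither direction improved
            failures, successes = failures + 1, 0
        if successes >= limit:
            sigma = [2.0 * s for s in sigma]
            successes = 0
        elif failures >= limit:
            sigma = [0.5 * s for s in sigma]
            failures = 0
    return x, fx
```

Since only improving moves are accepted, the returned energy can never exceed that of the starting position, which makes the step safe to apply to the best particle at every iteration.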

After a set number of iterations, the lowest energy complex found is returned. The results for all runs are clustered. The lowest energy structure forms the first cluster. In ascending order of energy, all subsequent structures are added to a cluster if they are within 2.5 Å RMSD of the first structure in any existing cluster, otherwise they form a new cluster. A ranked list of structures is returned, with each structure corresponding to the lowest energy member of the cluster.
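The clustering procedure described above can be sketched as a greedy pass in ascending order of energy. The function name, the `(energy, structure)` pairing and the user-supplied `rmsd` callable are assumptions for the example; where a structure is within the cutoff of more than one cluster, this sketch assigns it to the lowest-energy one.

```python
def cluster_poses(poses, rmsd, cutoff=2.5):
    """Greedy energy-ordered clustering of docked poses.

    poses: list of (energy, structure) pairs.
    rmsd: callable returning the RMSD between two structures.
    Returns a list of clusters; the first member of each cluster is its
    lowest-energy structure and acts as the cluster representative.
    """
    clusters = []
    for pose in sorted(poses, key=lambda p: p[0]):
        for cluster in clusters:
            if rmsd(pose[1], cluster[0][1]) <= cutoff:
                cluster.append(pose)
                break
        else:
            clusters.append([pose])          # starts a new cluster
    return clusters

def ranked_representatives(clusters):
    """Lowest-energy member of each cluster, in ascending energy order."""
    return [cluster[0] for cluster in clusters]
```

Because clusters are created in energy order, the list of representatives is already ranked, and `len(cluster)` gives the cluster size used in the analysis of true and false positives.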

The analysis of the quality of docked poses uses interface RMSD, a metric that is used in the community-wide CAPRI experiment [

A simple energy function is employed in the docking algorithm, composed of a van der Waals term and a Coulombic term between _{on}_{off}
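A pairwise energy of this general kind can be sketched as a Lennard-Jones term plus a Coulomb term, smoothly switched off between two distances. Everything specific in the sketch is an assumption: the CHARMM-style switching polynomial, the default `r_on` and `r_off` values, the unit dielectric, and the function names. The constant 332.0716 converts charge products in elementary units and distances in Å to kcal/mol.

```python
def switch(r, r_on, r_off):
    """CHARMM-style switching function: 1 below r_on, 0 above r_off."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    a, b, x = r_on * r_on, r_off * r_off, r * r
    return (b - x) ** 2 * (b + 2.0 * x - 3.0 * a) / (b - a) ** 3

def pair_energy(r, q1, q2, eps, r_min, r_on=7.0, r_off=9.0, dielectric=1.0):
    """Switched van der Waals plus Coulomb energy for one atom pair.

    r: separation in angstroms; q1, q2: partial charges.
    eps, r_min: Lennard-Jones well depth and minimum-energy distance.
    r_on, r_off and dielectric are hypothetical defaults.
    """
    ratio6 = (r_min / r) ** 6
    vdw = eps * (ratio6 * ratio6 - 2.0 * ratio6)
    coulomb = 332.0716 * q1 * q2 / (dielectric * r)
    return switch(r, r_on, r_off) * (vdw + coulomb)
```

Summing `pair_energy` over all receptor-ligand atom pairs gives the intermolecular energy; the switching function keeps that sum continuous as atoms cross the cutoff.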

All parameters are taken from the CHARMM19 force field [

The authors would like to thank Marcin Krol, Ozge Kurkcuoglu and the members of the biomolecular modelling laboratory for their helpful comments. This work was funded by Cancer Research UK.

^{2+}-ATPase

Mean Maximum overlap across the whole fold and the interface, for all atoms, backbone atoms and C_{α}

Maximum overlap and respective mode for (A) interface residues and (B) the whole fold.

Complex 2VIS, of influenza hemagglutinin (purple) and murine IgG1,

Mean percentage reduction in RMSD as a function of the number of modes used.

Percent reduction in RMSD upon analytical fitting of C_{α}_{α}

Percentage recovery of initial RMSD for 30 structures of greatest initial fold RMSD, ordered from left to right in descending order of initial RMSD, using all atoms in the fine-grained elastic network model. Effect of inclusion of 1 to 20 modes is shown for both the interface and the whole fold. Structures included are (1) 1IRA_r; (2) 1H1V_l; (3) 1Y64_r; (4) 1FAK_r; (5) 2VIS_r; (6) 1R8S_r; (7) 1IBR_r; (8) 1EER_r; (9) 1FC2_r; (10) 2FD6_l; (11) 2C0L_l; (12) 1FQ1_l; (13) 1JMO_r; (14) 1BKD_l; (15) 1GPW_r; (16) 1I2M_r; (17) 1YVB_l; (18) 2AJF_l; (19) 1NW9_r; (20) 1E4K_r; (21) 1IBR_l; (22) 2CFH_l; (23) 1HIA_l; (24) 2OT3_l; (25) 1KKL_r; (26) 1PXV_r; (27) 1KTZ_r; (28) 1BKD_r; (29) 1EER_l and (30) 1IB1_r.

Docking results and linear regression for global docking (A–D) and local docking (E–H). I RMSD is the lowest value found in the docking run.

Energy of lowest energy true positive structure and mean energy of the lowest energy member of the 5 lowest energy false positive clusters.

The complex, 1KXP, between actin (purple) and vitamin D binding protein (VDBP) in the bound (green) and unbound (red) conformations. A fitted structure (blue), obtained by linear regression to the VDBP unbound to bound transition, using 20 normal modes as a basis, has a C_{α}

An overview of the SwarmDock algorithm.

The mean cluster size for the correct binding site (I RMSD

Complex | Modes 1–10 | Modes 11–20 | Modes 21–30 | Modes 31–40 |
---|---|---|---|---|
1AY7 | 11.2 | 12.3 | 14.3 | |
1GCQ | 1.2 | 1.9 | 2.6 | |
1E6J | 10.0 | 9.9 | 11.0 | |
1TMQ | 2.1 | 2.9 | 2.9 | |
1EER | 3.6 | 3.1 | 5.3 | |
1K5D | 5.9 | 6.9 | 6.6 | |
1GRN | 26.6 | 28.7 | 27.3 | |
1KKL | 2.8 | 2.9 | 3.9 | |