Exact and Effective Pair-Wise Potential for Protein-Ligand Interactions Obtained from a Semiempirical Energy Partition

In this work, the partition method introduced by Carvalho and Melo was used to study the complex between Cucurbita maxima trypsin inhibitor (CMTI-I) and glycerol at the AM1 level. An effective potential, combining non-bonding and polarization plus charge transfer (PLCT) terms, was introduced to evaluate the magnitude of the interaction between each amino acid and the ligand. In this case study, the nonbonding–PLCT non-compensation characterizes the stabilization energy of the association process in study. The main residues (Gly29, Cys3 and Arg5) with net attractive effects and Arg1 (with a net repulsive effect), responsible by the stability of protein-ligand complex, are associated with large nonbonding energies non-compensated by PLCT effects. The results obtained enable us to conclude that the present decomposition scheme can be used for understanding the cohesive phenomena in proteins.


Introduction
Computational methods are of great interest to evaluate binding affinities between proteins and ligands, with many applications in structure-based drug design (SBDD) [1]. A complete description of the correspondent molecular interactions, including the short-range polarization plus charge transfer (PLCT) effects, can only be carried out at a quantum mechanics (QM) level. However, the more accurate QM methods require very large computational resources and this has limited high-level theoretical studies in this field [2]. Current methods using classical potentials can only represent a good approximation for evaluating the nonbonding protein-ligand interactions, because they are usually designed to treat the QM effects in an average manner. For this purpose, QM methods have been currently used to parameterize force fields [3][4][5][6][7] and scoring functions [8]. This has enabled computationally intensive SBDD studies at lower theoretical levels. The hybrid quantum mechanics/molecular mechanics (QM/MM) methods are an interesting alternative approach for this type of problems, because they conjugate an appropriate description of the molecular system with moderate computational resources [9,10]. Good QM/MM methods require a rigorous partition of the molecular systems into two regions: a strongly perturbed region that should be described at a quantum level and a bulk region whose interactions can be reproduced by classical potentials. In this context, the energy partition schemes [11][12][13] can provide rational criterions to define these regions. In vitro methods and mathematical models for enzymatic reactions [14][15][16] are the prototype for evaluate kinetics and binding affinities between proteins and ligands.
In this work, the application of the SemiEmpirical Energy Based (SEEB) partition method introduced by Carvalho and Melo [17] to study protein-ligand association processes was analyzed. This method enables the stabilization energy decomposition both into physically meaningful and spatial components. As this formalism was developed at a semiempirical quantum level, it enables also the complete separability of these components. Here, the SEEB formalism was extended to describe protein-ligand interactions using a pair-wise potential. The SEEB method was then used to study the association between the Curcubita maxima trypsin inhibitor (CMTI-I) and glycerol. CMTI-I is well known by its biological importance [18,19]. Glycerol is a cryoprotectant [20], which should be washed away with solvent in the crystallization process. However, it forms a stable complex with CMTI-I that is detected in the crystallized structure. In this context, glycerol should be considered to have a large affinity to CMTI-I and this study can provide a further insight for a rational modeling of high-specific ligands for this protein.

Methods
A non-covalent (no) association between a protein (P) with n amino acid residues and a ligand (L) can be represented by equation (1): A general association process can be described by a hypothetical mechanism involving two steps (see Figure 1). In the first step, the molecular monomers (P and L) are rearranged assuming the geometries adopted in the dimer.
In the second step, the rearranged species (P(rearr) and L(rearr)) associate each other preserving their geometries and originating the dimer (P:L). According to the SEEB formalism, the stabilization energy can be partitioned as: (2) Figure 1. Hypothetical mechanism for a non-covalent association between a protein (P) and a ligand (L), which enables the application of SEEB formalism to partition the associated stabilization energy into physically meaningful components. (1) In equation (2) In equations (2) to (4), the superscript (no) indicates that the P:L association is of non covalent nature. The nomenclature used for binding states is presented elsewhere [17]. The exact definition of the terms occurring in equation (3) and (4) is presented in appendix A.
An effective pair-wise potential, combining non-bonding and PLCT contributions, is proposed in this work: In this context, the equation (4) can be rewritten as: The transformation of the exact interaction energy decomposition (equations (2) to (3)) into the effective pair-additive expression (5) is developed and justified in Appendix B.
In consistency with the SEEB formalism, the stabilization energy associated with step 2 can be also partitioned into strongly perturbed ) ( For this purpose, the amino acid residues have to be divided between these two regions. where q X is the charge of atom X, q Y is the charge of atom Y, C X,Y and D X,Y are de van der Waals parameters associated with atoms X and Y, and r XY is the distance between the same atoms. In equation (14), N A and M A are respectively the first and the last atoms of amino acid A. In the same equation, N L and M L have the same meaning for the ligand. In this work, the AMBER99 force field [21] was used to parameterize the bulk terms (14) and the atomic point charges (q X and q Y ) were calculated using both Mulliken [22] and Merz-Kollman [23] schemes. On the other hand, the effective interaction energy between a residue (C) included in the strongly perturbed region and the ligand has to be calculated at a quantum level.
In this work, the association of CMTI-I and glycerol was studied at an semiempirical level [26], AM1, using the SEEB modified formalism described above. The initial structure of CMTI-I-glycerol complex was obtained from X-ray crystallography with 1.03 Å resolution [24] and can be found in the Protein Data Bank (PDB, 2004) with the reference 1LU0. The geometries of all species (both monomers and complex) were optimized using the MOPAC2002 package [25] in a Pentium 4 computer.

Discussion
The physically meaningful components of the complex CMTI-I glycerol stabilization energy, obtained using the modified SEEB formalism, are presented in Table 1. The nonbonding term is the dominant component for the stabilization energy. However, the PLCT and the conformational rearrangements components have an important corrective effect.  The nonbonding interaction energies between the amino acids and the ligand are presented in Figure 2.
Five residues (Arg1, Cys3, Arg5, Cys28 and Glu29) have the most relevant contributions, which correspond to absolute values larger than 10 kJ mol -1 . Three residues (Cys3, Arg5 and Cys28) are involved in specific hydrogen bonds with the hydroxyl groups of glycerol (see Figure 3). Arg5 is of particular importance, because this residue establishes two hydrogen bonds of this type and is responsible for approximately 57 % of the nonbonding energy. The two terminal residues (Arg1 and Glu29) are connected by an internal hydrogen bond involving the guanidinium (Arg1) and carboxylate (Glu29) groups.
In the complex in study, the most electropositive groups of glycerol are oriented in the opposite direction of this guanidinium-carboxylate salt-bridge (see Figures 3 and 4). Therefore, the attractive (Glu29) and repulsive (Arg1) contributions of these residues can be explained by non-specific electrostatic interactions with the ligand. The PLCT energy is represented in Figure 5, as a sum of intra and inter-fragment terms (see equation A9).    Residue addictive effective PLCT terms, defined according to equations (B1) and (B2), are presented in Figure 6. To obtain a bulk region in consistency with the requirements presented in the previous section, all the residues having effective PLCT contributions larger in absolute value than 1.0 kJ mol -1 and/or a nonbonding interaction with ligand larger in absolute value larger than 3.0 kJ mol -1 were included in the strongly perturbed region. Twelve residues (Arg1, Val2, Cys3, Pro4, Arg5, Ile6, Asp13, Glu19, Cys20, Tyr27, Cys28 and Glu29) satisfied this requirement.  The global contributions of the amino acid residues for the stability of the CMTI-I:glycerol complex was evaluated using the effective pair-wise potential introduced in the previous section. The results obtained are presented in Figure 7. Including these corrective effects, only four residues (Arg1, Cys3, Arg5 and Glu29) were verified to have significant effective contributions (larger than 10 kJ mol -1 ) for the stabilization energy. A fifth residue (Cys28) had been previously identified as relevant, because its nonbonding interaction energy with ligand is markedly negative (-15.2 kJ mol -1 ). However, this residue has an opposite effective PLCT energy of 15.8 kJ mol -1 and its overall contribution is near null.
The partition into the strongly perturbed and bulk components is presented in Table 2. These components were evaluated using quantum and classical formalisms. The results obtained enable us to conclude that the interaction energy of bulk region with ligand is well-reproduced by classical potentials, when Merz-Kollman charges are used.

Conclusions
Using the SEEB formalism, the association of the CMTI-I with glycerol was analysed. The stability of the correspondent complex can be mostly associated with four residues (Arg1, Cys3, Arg5 and Glu29). The residues are also divided between a strongly perturbed and a bulk region. This spatial partition was carried out according to the appropriate requirements previously discussed. In fact, the correspondent short-range bulk component ) ( can be calculated at a molecular mechanics level. On the other hand, the strongly perturbed region should be described at an appropriate quantum level. The modified SEEB method, introduced in this work, enables the description of a protein-ligand association process in terms of a pair-wise interaction potential. This effective potential includes the nonbonding interaction between each pair and the correspondent PLCT correction associated with electronic rearrangement effects. The associated (amino acid-ligand) pair-wise energies can be assumed as physically meaningful components, which can provide an important contribution to better understanding of the protein-ligand association processes.

Detailed Description of the Physically Meaningful Decomposition
The physically meaningful decomposition, presented in equation (3) to (4) where the superscription ) (∞ represents a non-binding states where the fragments P and L are at infinite distance. These components are associated with the conformational energy rearrangements occurring in step 1 for the protein )) ( ( P E no Δ and the ligand )) ( ( L E no Δ respectively. From a physical point of view, each of these terms represents the energy cost for the correspondent specie (P or L) associated with the transition from the free optimized structure to the free rearranged structure which is the most appropriate for the P:L docking.
Consequently, the total conformational energy component is calculated as the sum of (A1) and (A2) terms: The PLCT and nonbonding components are associated with step 2, of the hypothetical mechanism presented in Figure 1. In this step, the rearranged fragments are docked, preserving these internal geometries P(rearr) and L (rearr), and originating the final optimized complex (P:L). In this sense, these components have a pure electronic nature, occurring without any modification of the intrafragment nuclear positions.
The PLCT components are then defined as: (A7) Each of these terms represents the electronic energy cost for the correspondent fragment, associated with transition of (P(rearr) or L(rearr)) species from the free rearranged state ) (∞ to the binding state (no) where the dimeric P:L specie is formed. These terms are originated by polarization plus charge transfer effects, associated with electronic density redistribution within each fragment (P(rear)) or L(rear)) and electronic charge transfer between these species. These components involve energy terms that exist both in free rearranged species (P(rearr)) or L(rearr)) and in the dimer (P:L). However, the associated values are not conserved between these two states due to the mentioned electronic effects. Each of these terms includes the intra-fragment kinetic electronic energy, the attraction energies between the electronic density of the fragment and its nuclei, and the repulsion energy associated with its electronic density. Considering that the protein (P) is constituted by n amino acids residues, its PLCT component can be partitioned into intra and inter residues terms: is the interaction energy between the amino acids A and the ligand (L) that only exists in the dimer. This term includes the attraction energies between the electronic density of A and the nuclei of L, the attraction energies between the electronic density of L and the nuclei of A, the repulsion energy between the electronic densities of A and L, and the repulsion energies between the nuclei of the two species.

Detailed description and justification of the pair-wise decomposition
The pair-wise decomposition, presented in equation (5)  For the non-covalent interactions, the simplification adopted in equations (B1) and (B2) is a natural criterion to divide the inter-residue interactions energies. For covalent interactions, this corresponds to a Mulliken-like approach that involves some degree of arbitrariness. However, in general, the methodology proposed here seems to be a reasonable approach to obtain effective PLCT terms.
An effective interaction potential between the ligand L and an amino acid (A) The mutual PLCT energy cost of two residues (A and L) is clearly correlated with the magnitude of the correspondent interaction. In fact, if the A-L interaction is extremely weak their mutual polarization effects can be neglected. On the other hand, if the amino acid A strongly interacts with L these polarization effects should be very important. Almost all PLCT effect induced on amino acid A is originated, directly or indirectly, by the interaction with the ligand. On the other hand, the PLCT effects in the ligand are induced by its interactions with the amino acids of the protein (P). In this context, the coefficient ) ( AL α can be considered as a good weighting factor to evaluate the degree of contribution of the A-L interaction to the PLCT ligand energy cost.