Energy Transfer as A Driving Force in Nucleic Acid–Protein Interactions

Many nucleic acid–protein structures have been resolved, though quantitative structure-activity relationships remain unclear in many cases. Thrombin complexes with G-quadruplex aptamers are striking examples of a lack of any correlation between affinity, interface organization, and other common parameters. Here, we tested the hypothesis that the affinity of the aptamer–protein complex is determined by the capacity of the interface to dissipate the energy of binding. Description and detailed analysis of 63 nucleic acid–protein structures revealed the distinguishing features of high-affinity nucleic acid–protein complexes. The size of the amino acid sidechain in the interface was demonstrated to be the most significant parameter that correlates with the affinity of aptamers. This observation could be explained by the need for efficient energy transfer from interacting residues. Application of energy dissipation theory provided an illustrative tool for estimating the efficiency of aptamer–protein complexes. These results are of great importance for the design of efficient aptamers.


Introduction
Proteins are macromolecules with a variety of functions in living cells. Protein functioning is supported by an ability to specifically bind target substances. Moreover, some proteins and enzymes catalyze further chemical conversion of bound substances. Several decades of intensive study of proteins have yielded large databases of structures of proteins and their complexes, thermodynamics of binding and catalysis, as well as kinetic data [1][2][3][4][5]. Many successful attempts to explain how structural features of the protein affect thermodynamic and kinetic parameters of interactions with a target have been reported [6][7][8][9][10][11]. However, to date, there is no general concept that allows prediction of affinity to the target or rate of enzymatic catalysis. Empirical and semi-empirical algorithms with parametrization of each interaction are widely used in docking [12][13][14], but the empirical component inevitably limits the range of ligands that can be described with good predictive power. These limitations evidently reflect overestimation of selected interactions and underestimation of other significant aspects of protein function. Here, we describe a further attempt to find a clear and intelligible explanation of the affinity of proteins to their ligands that would be not only descriptive but also predictive.
A striking example of how poorly the high affinity of nucleic acid-protein complexes is understood has been reported recently [15]. We analyzed a set of complexes of thrombin with its artificial nucleic acid ligands, DNA aptamers. Different aptamers bind the same site of thrombin with affinities that differ 100-fold. Moreover, there was no general correlation between aptamer affinity and parameters of the interface, such as interface area, number of atoms in the interface, and number of polar contacts. Furthermore, a detailed analysis of the polar contacts of the best ligand and the worst one revealed no significant differences. Similarly, there was no correlation between thermodynamic parameters of aptamer structure and its affinity to thrombin.
Expanding this specific dataset to all known nucleic acid aptamer-protein complexes, we also did not find a clear correlation between interface structure and aptamer affinity to the protein [16]. These results stimulated us to search for a reason for high affinity in kinetic processes during complex formation instead of just comparing the initial and final states of the molecules. We speculated that it is the energy released during the first steps of ligand binding that could unfold the interacting molecules destroying the intermediate complex before the final complex formation [15]. If this is the case, a dissipation of this binding energy from the interacting residues will enhance the rate of complex formation, and therefore increase the affinity.
We made a further attempt to find a structure-affinity relationship for all nucleic acid-protein complexes available in the databases. The key processes that mediate energy transfer can be the following: (1) changes in H-bonds with water near the interface; (2) conformational rearrangements of the protein and the aptamer; and (3) redistribution of binding energy via residues with large sidechains that are located near the interface. In this work, amino acids from the interfaces have been thoroughly annotated and analyzed.

Complexes of Proteins with Nucleic Acid Aptamers
The dataset contained nucleic acid aptamer-protein complexes extracted from the Protein Data Bank [1] with the following criteria: (1) X-ray structures have a resolution less than 3 Å; (2) Only binary complexes (1 protein, 1 aptamer) were chosen to minimize allosteric effects; (3) Apparent equilibrium constants for the complex formation are known.
Thirty-five complexes were analyzed, of which thirteen were with the same protein, human thrombin (Table 1).
The set included DNA, RNA, modified DNA, and modified RNA aptamers. This set almost coincides with that of our previous work, where an explicit analysis of the correlation between aptamer nature and affinity was described [16]. Kinetic constants for complex association and dissociation were reported for several complexes and are annotated in Table 1. Apparent equilibrium dissociation constants were recalculated into the changes in Gibbs free energy during binding using the equation $\Delta G_b = RT \ln K_d$, where $R$ is the gas constant and $T$ is the temperature of the binding assay.
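The conversion from a dissociation constant to a binding free energy can be sketched as follows. This is a minimal illustration rather than the procedure used for Table 1: the function name, the default temperature, and the 1 nM example value are assumptions chosen for the example.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

def delta_g_binding(kd_molar, temperature_k=298.15):
    """Return dG_b = R*T*ln(K_d) in kJ/mol; negative when K_d < 1 M."""
    if kd_molar <= 0:
        raise ValueError("K_d must be a positive concentration in mol/L")
    return R_KJ * temperature_k * math.log(kd_molar)

# Hypothetical complex with K_d = 1 nM, assay temperature assumed to be 25 C.
dg_1nm = delta_g_binding(1e-9)

# A 100-fold difference in K_d shifts dG_b by R*T*ln(100).
shift_100_fold = delta_g_binding(1e-7) - delta_g_binding(1e-9)

print(f"dG_b(1 nM) = {dg_1nm:.1f} kJ/mol")
print(f"100-fold K_d difference = {shift_100_fold:.1f} kJ/mol")
```

Under these assumptions a 1 nM complex corresponds to roughly −51 kJ/mol, and a 100-fold spread in affinity corresponds to about 11 kJ/mol.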
Several approaches were applied to analyze the interfaces of aptamer-protein complexes, focusing on differences in the amino acids involved. These include: (1) Annotation of the amino acids that participate in polar contacts; (2) Annotation of the amino acids located within 4 Å of atoms that participate in polar contacts; (3) Annotation of the amino acids located within 4 Å of nucleotides that form 3 or more polar contacts (putative "hot spots").
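The distance-based annotation in approaches (2) and (3) can be illustrated with a short sketch. It is not the PyMOL workflow used in this work; the function name, the residue labels, and the coordinates below are hypothetical and serve only to show the 4 Å criterion applied residue by residue.

```python
import math

def residues_within(protein_atoms, probe_coords, cutoff=4.0):
    """Return sorted ids of residues having any atom within `cutoff` A of a probe atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples.
    probe_coords:  iterable of (x, y, z) tuples, e.g. atoms forming polar contacts.
    """
    probes = list(probe_coords)
    hits = set()
    for res_id, xyz in protein_atoms:
        if any(math.dist(xyz, p) <= cutoff for p in probes):
            hits.add(res_id)
    return sorted(hits)

# Hypothetical coordinates (in angstroms), for illustration only.
atoms = [
    ("ARG75", (0.0, 0.0, 0.0)),
    ("ARG75", (1.5, 0.0, 0.0)),
    ("TYR76", (3.0, 3.0, 0.0)),
    ("GLU80", (10.0, 0.0, 0.0)),
]
polar_contact_atoms = [(0.0, 3.5, 0.0)]

selected = residues_within(atoms, polar_contact_atoms)
print(selected)
```

A residue is kept as a whole as soon as one of its atoms satisfies the cutoff, which mirrors a by-residue selection around the contact atoms.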
All annotations are summarized in Table S1; derived values are summarized in Tables S2 and S3. As in our previous work [16], there was no correlation between changes in Gibbs free energy and other parameters for the whole dataset. However, when we split the dataset into a subset of G-quadruplex aptamers to thrombin and a subset with all others, the correlation became obvious (Table 2, Figure 1).
Table 1. A list of aptamer-protein complexes and their parameters, including the host organism of the protein, accession number in the Protein Data Bank (PDB Id), resolution of the structure, numbers of nucleotide (#N) and amino acid (#AA) residues, change in Gibbs free energy during binding ($\Delta G_b$), and kinetic constants of association ($k_a$) and dissociation ($k_d$). G-quadruplex aptamers to thrombin are shown in grey. The values in brackets are references for kinetic constants.
Considering the total interface, i.e., the amino acids located within 4 Å of atoms that participate in polar contacts, G-quadruplex aptamers to thrombin formed a specific group that is located outside of the diagonal distributions for other aptamers (Figure 1A). Analysis of all other aptamers revealed that the least dispersion of the heterogeneous dataset was for the mean length of sidechain (Table 2, Figure 1A). Mean length of sidechain is a characteristic of amino acid size and volume.
It was calculated as the mean number of atoms in the Cα substituents of amino acids, excluding hydrogen atoms (i.e., the number of C, N, O, and S atoms). The positive correlation between this parameter and $-\Delta G_b$ means that large sidechains are much more common in complexes with high affinity. This effect cannot be attributed solely to a high content of aromatic amino acids ("length" parameters in the range 7-10) or positively charged amino acids ("length" parameters in the range 5-7), as is seen from correlation coefficients < 0.2 (Table 2).
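The "mean length of sidechain" parameter defined above can be computed from a simple lookup of heavy-atom counts. The sketch below uses the standard sidechain compositions of the 20 proteinogenic amino acids; the function name and the example residue list are illustrative assumptions, not data from Table S1.

```python
# Number of non-hydrogen atoms (C, N, O, S) in the C-alpha substituent of each amino acid.
SIDECHAIN_HEAVY_ATOMS = {
    "GLY": 0, "ALA": 1, "SER": 2, "CYS": 2, "VAL": 3, "THR": 3, "PRO": 3,
    "LEU": 4, "ILE": 4, "ASN": 4, "ASP": 4, "MET": 4, "GLN": 5, "GLU": 5,
    "LYS": 5, "HIS": 6, "PHE": 7, "ARG": 7, "TYR": 8, "TRP": 10,
}

def mean_sidechain_length(residue_names):
    """Mean number of sidechain heavy atoms over a list of three-letter residue codes."""
    names = [name.upper() for name in residue_names]
    if not names:
        raise ValueError("at least one residue is required")
    return sum(SIDECHAIN_HEAVY_ATOMS[name] for name in names) / len(names)

# Hypothetical interface made of four residues.
example_interface = ["ARG", "TYR", "LYS", "GLU"]
example_mean = mean_sidechain_length(example_interface)
print(example_mean)  # (7 + 8 + 5 + 5) / 4 = 6.25
```

With this table, the aromatic residues Phe, Tyr, and Trp fall in the 7-10 range and Lys and Arg in the 5-7 range quoted in the text.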

A diagonal distribution was characteristic of the mean length parameters for other amino acid sets, namely, for amino acids making polar contacts (Figure 1E), amino acids within 4 Å of "hot spots" (Figure 1F), and the mean number of aromatic or aliphatic carbons in amino acids within 4 Å of polar contacts (Figure 1C). The recurring correlation between $-\Delta G_b$ and the mean length parameter supports the speculation that affinity increases when the binding energy from the reactive residues is dissipated. The "mean length" parameter is proportional to the mean volume of residues; the more residues that participate in energy distribution, the tighter the complex that can be formed. The number of polar contacts and the total number of atoms had no correlation with $-\Delta G_b$ (Figure 1B,D), in agreement with previous observations for aptamer-protein complexes [15,16], indicating that nucleic acid-protein complexes are more than an interaction between two complementary surfaces.
Kinetic constants of association and dissociation were described for 8 of 21 complexes from this subset. We analyzed this small dataset in more detail (Table 3, Figure 2). The kinetic constant of association is correlated with the number of polar contacts only, whereas the kinetic constant of dissociation is correlated with the mean length of sidechain of amino acids making polar contacts and of amino acids within 4 Å of the putative "hot spot". Thus, a large pattern of polar contacts provides fast complex formation, whereas the possibility to dissipate the binding energy from residues involved in these contacts supports the high stability of the complex. This suggestion was tested using the extended dataset of aptamers (Figure 3), where the complexes with the highest values of the above parameters were chosen (the parameters are the number of polar contacts, the mean length of sidechain of amino acids making polar contacts, the mean length of sidechain of amino acids within 4 Å of the putative "hot spot", and the total number of atoms in amino acids within 4 Å of "hot spots"). As a result, 8 of 11 aptamers with $-\Delta G_b \geq 50$ kJ/mol met these criteria. Thus, for the first time, the exact parameters of the interface that are critical for aptamer affinity were identified and confirmed for the whole dataset.
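The kind of correlation screening described above can be sketched with a plain Pearson coefficient. The implementation below is a generic illustration, not the Origin workflow used for Tables 2 and 3, and the numbers are invented placeholder values rather than entries from the dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Placeholder values (NOT from the paper): an interface parameter per complex
# and the corresponding -dG_b in kJ/mol.
mean_lengths = [4.1, 4.8, 5.2, 5.9, 6.3]
minus_dg = [38.0, 42.5, 47.0, 51.0, 55.5]

r = pearson_r(mean_lengths, minus_dg)
print(f"r = {r:.2f}")
```

Applying the same function to each interface parameter in turn against −ΔG_b, k_a, or k_d gives one coefficient per parameter, which is how a table of correlation coefficients can be filled in.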

Complexes of HTH-type Proteins with DNA Double Helixes
Complexes of HTH-type proteins with DNA double helices are an interesting and well-studied system. HTH-type proteins have a specific DNA-binding motif, the helix-turn-helix (HTH), and are a classical model for studying DNA binding and recognition. Therefore, comparing the interfaces of complexes of proteins with artificial nucleic acid aptamers and with natural DNA double helices is of great value. Criteria for this dataset were the following: (1) X-ray structures have a resolution less than 3 Å; (2) The X-ray structure is for the whole protein, not a protein domain; (3) DNA has unmodified nucleotides only; (4) Apparent equilibrium constants for the complex are known.
The selected set consists of bacterial proteins, from mesophiles and one thermophile (G. stearothermophilus) (Table 4). The size of the proteins varied from 62 to 246 residues, and the typical size of the DNA duplex was about 25 base pairs. Kinetic constants for complex association and dissociation were reported for 2 complexes only. Apparent equilibrium dissociation constants were recalculated into changes in Gibbs free energy using the equation $\Delta G_b = RT \ln K_d$, where $R$ is the gas constant and $T$ is the temperature of the binding assay. For one of the proteins, fis, 18 complexes with different DNA duplexes were described, including optimal and non-optimal ones. All these complexes were analyzed in the same way as the aptamer-protein complexes described above (Tables S4 and S5).
Table 4. A list of complexes of HTH-type proteins with DNA duplexes and their parameters, including protein name, host organism of the protein, DNA sequence, accession number in the Protein Data Bank (PDB Id), resolution of the structure, numbers of nucleotide (#N) and amino acid (#AA) residues, apparent dissociation constant ($K_d$), change in Gibbs free energy during binding ($\Delta G_b$), and kinetic constants of association ($k_a$) and dissociation ($k_d$), if known.

The datasets of aptamer-protein and DNA helix-protein complexes have similar distributions, e.g., mean length of sidechain of amino acids within 4 Å of polar contacts versus $-\Delta G_b$ (Figure 4A). This similarity reflects a similar organization of the interfaces, but HTH-type proteins had no obvious diagonal distribution per se (Figure 4A). Interesting results were obtained from analyzing optimal and non-optimal complexes of fis protein with different DNA duplexes (Figure 4B). These complexes have very similar interfaces, but drastically different apparent dissociation constants in the range from 0.2 nM to 140 nM. The tightest complexes are located on the diagonal distribution of aptamer-protein complexes, whereas non-optimal complexes were on the left side of the distribution. Thus, the diagonal distribution could be used as a measure of the efficiency of complex formation.

Discussion
In the current view, water as a solvent is the most efficient receiver of the excess energy during dissipation. The water arrangement around the protein is dynamic. It fluctuates due to thermal excitation of low-frequency modes, and hydrogen bonds are broken and reformed within roughly 1 ps [14]. In protein complexes, however, a considerable part of the interface has no direct interactions with the solvent. Thus, the protein or its counterpart must participate in the dissipation of energy from the polar contact-forming residues to the solvent.
In extreme cases, such as in plant and algal photosystems, there are special proteins that mediate efficient dissipation of energy from the light-harvesting complexes [41][42][43]. This additional help becomes critical under high light exposure. In this case, the typical time for dissipation of energy is around 20 ps ($\tau_{1/2}$) [41]. As for non-assisted dissipation of energy, in silico calculations gave typical time scales in the range from 10 ps to 10 ns for single proteins [44,45], and the time for energy transfer to a nearby residue is about 0.5 ps [46]. For comparison, experimental techniques revealed that conformational rearrangement of a DNA oligonucleotide takes 8 µs and fast steps of protein folding take 90 µs [47].
Direct experimental study of energy dissipation in proteins without unusual prosthetic groups is complicated by the inability to trace specific residues only. However, bioinformatic analysis has provided clues to the role of energy dissipation in protein functioning. A bridge between affinity and capacity for information transmission was built for DNA-protein complexes [48,49]. The energy dissipated per bit of transmitted information is defined by the equation:
$$\varepsilon = \frac{P_y}{C_y} \tag{1}$$
where $P_y$ is the dissipated energy and $C_y$ is the transmitted information. Shannon's channel capacity equation relates the transmitted information ($C_y$) to the bandwidth of the channel ($d_{space}$), the dissipated energy ($P_y$), and the thermal noise ($N_y$):
$$C_y = d_{space} \log_2\left(\frac{P_y}{N_y} + 1\right) \tag{2}$$
An ideally efficient molecular machine dissipates the minimal quantity of energy per bit of information, which is determined by the following equation:
$$\varepsilon_{min} = k_B T \ln 2 \tag{3}$$
where $k_B$ is the Boltzmann constant and $T$ is temperature. Thus, the efficiency of the molecular machine is as follows:
$$\varepsilon_t = \frac{\varepsilon_{min}}{\varepsilon} \tag{4}$$
For DNA-protein complexes, the maximal efficiencies of protein binding were calculated to be no more than 70% [48,49]. We applied this theoretical background to our results. Equation (2) was transformed to:
$$C_y' = L_{SC} N_{AA} \log_2\left(\frac{N_{PC} E_{PC}}{RT} + 1\right) \tag{5}$$
where $L_{SC}$ and $N_{AA}$ are the mean length of sidechain and the number of amino acids within 4 Å of polar contacts, respectively; $L_{SC} N_{AA}$ is an analogue of the bandwidth of the channel from Equation (2). $N_{PC}$ is the number of polar contacts; $E_{PC}$ is the energy of one polar contact; $N_{PC} E_{PC}$ represents a rough estimate of the energy that is to be dissipated ($P_y$). $RT$ is a rough estimate of the thermal noise ($N_y$), with $R$ the gas constant and $T$ the temperature. We used the following parameters: $E_{PC}$ = 6 kJ/mol (1/2 of an "ideal" H-bond in a protein) and $T$ = 298 K. The parameter $C_y'$ reflects energy transfer by the protein part of the interface; this parameter was calculated for aptamer-protein and DNA helix-protein complexes. $C_y'$ values are listed in Tables S3 and S5.
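The transformed capacity parameter can be evaluated numerically with a few lines of code. The sketch below assumes the Shannon-type form described in the text, with the bandwidth replaced by the product of mean sidechain length and number of amino acids, the dissipated energy by the number of polar contacts times the energy per contact, and the noise by RT. The function name and the example interface values are hypothetical; only the 6 kJ/mol contact energy and T = 298 K come from the text.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

def capacity_analogue(l_sc, n_aa, n_pc, e_pc=6.0, temperature_k=298.0):
    """Shannon-type capacity analogue: L_SC * N_AA * log2(N_PC * E_PC / (R*T) + 1).

    l_sc: mean sidechain length (heavy atoms); n_aa: number of interface amino acids;
    n_pc: number of polar contacts; e_pc: energy per polar contact in kJ/mol.
    """
    if min(l_sc, n_aa, n_pc) < 0:
        raise ValueError("interface parameters must be non-negative")
    p_y = n_pc * e_pc             # rough estimate of the energy to be dissipated
    n_y = R_KJ * temperature_k    # rough estimate of the thermal noise
    return l_sc * n_aa * math.log2(p_y / n_y + 1.0)

# Hypothetical interface: mean length 5, 20 amino acids, 10 polar contacts.
c_example = capacity_analogue(l_sc=5.0, n_aa=20, n_pc=10)
print(f"capacity analogue = {c_example:.1f}")
```

Because the contact count enters only through the logarithm while the sidechain term enters linearly, doubling the mean sidechain length doubles the result, whereas doubling the number of polar contacts raises it far less.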
A question remains of how the parameter $C_y'$ is connected with $-\Delta G_b$. Changes in Gibbs free energy during binding can be represented as a sum of the energy of polar contacts ($P_y = N_{PC} E_{PC}$) and a summand $W$, which includes the energy of dehydration, conformational rearrangement, and energy from other types of interactions:
$$-\Delta G_b = P_y + W \tag{6}$$
From Equations (1) and (6), it follows that:
$$-\Delta G_b = \varepsilon C_y + W \tag{7}$$
Supposing some of the complexes to be the most efficient ($\varepsilon_t$ = 70% according to [49]), for those complexes the value $\varepsilon$ can be replaced with a constant value according to Equation (4):
$$-\Delta G_b = k \frac{RT \ln 2}{\varepsilon_t} C_y' + W \tag{8}$$
Here, the Boltzmann constant was replaced with the gas constant, as Gibbs energy values are given in kJ per mole; $k$ is a coefficient of proportionality. From Equation (8), it follows that changes in Gibbs free energy depend linearly on the capacity for energy dissipation ($C_y'$) if the summand $W$ is equal for different complexes. The summand $W$ includes all changes in the energetic state of the molecule other than polar contacts, and this summand varies significantly. The example with different complexes of fis protein (Figure 5B) clearly shows that all non-optimal complexes are located to the left of the line for optimal complexes, reflecting the high impact of energy-consuming processes. Using this observation, we chose the aptamer-protein complexes located at the right side of the diagonal distribution (Figure 5A). Seven dots can be approximated with a straight line ($R^2$ = 0.97) for the most efficient complexes, and all other dots are located to the left of the line, with a single outlier. The outlier is the complex of aptamer SL5 with its protein target (the dot in the bottom right part of Figure 5A), which is a perfect example of energy dissipation by the nucleic acid component and is discussed further in the text.
Using the representation $C_y'$ versus $-\Delta G_b$, it is easier to compare different types of complexes, as in this case the data for G-quadruplex aptamers with thrombin, other aptamer-protein complexes, and HTH-type proteins with DNA duplexes are in the same range of values. Here the most efficient complexes are assumed to dissipate the energy from polar contacts without energy-consuming conformational changes. In the examples with high affinity, nucleic acids provide a complementary surface (large numbers of polar contacts) to an appropriate protein site (with large amino acids in the interface).
The efficiency of a sub-optimal complex can be enhanced via modification of the aptamer. An excellent example is the optimization of aptamer AF113-1 into AF113-18, which led to a 15 kJ/mol increase in the $-\Delta G_b$ value (see the upper arrow in Figure 5A). Another clear example of efficiency improvement is aptamer HD1, which must break 2 hydrogen bonds in a thymine pair during complex formation (roughly 12 kJ/mol); the dot for its complex is located 10.5 kJ/mol to the left of the linear dependence (see the bottom arrow in Figure 5A). The $-\Delta G_b$ value was improved through manipulation of the aptamer structure only; the protein part of the interface was the same. The most efficient complexes have an additional duplex module tightly stacked onto the G-quadruplex (RE31 and NU172) or just a long substituent in a thymine from the thymine pair (T4K) that has no contact with the protein but is exposed to the solvent (Figure S1). Impairment of the stacking between the duplex and G-quadruplex modules, or replacement of the long substituent with an aromatic anchor, led to a decrease in the $-\Delta G_b$ value [15]. These subtle effects reveal that the nucleic acid component plays a significant role in energy dissipation, along with the protein component.
One more excellent example is the single outlier with the highest efficiency of the complex, aptamer SL5. The $-\Delta G_b$ value for SL5 is 10 kJ/mol greater than for its counterpart SL4. The only difference between these two modified aptamers is the residue in the 5′-position of dU8: isobutyl (4 carbon atoms) in SL4 and benzyl (7 carbon atoms) in SL5. The protein parts of the interfaces are the same, and the aptamer conformations and thermal stability are the same [50]. The only difference is in the residues within 4 Å of the "hot spot" residue, dU17. In the case of SL4, dU8 is not in contact with dU17; but in the case of SL5, the benzyl substituent of dU8 does have contact with the "hot spot" residue (Figure S2). This example clearly indicates the value of hydrophobic modifications of nucleic acids for affinity improvement, with a possible role in energy transfer.
Besides graphical representation, numerical estimation of the efficiency of the complex can be used. Referring to 70% as the limit of efficiency of DNA-binding proteins [48,49], we assume the parameter $\varepsilon_t$ to be 0.7 for those 7 complexes that are located on the right edge of the $C_y'$ vs. $-\Delta G_b$ distribution. Using the parameter $C_y'$ (Equation (5)) and its relation to $-\Delta G_b$ (Equation (8)), the theoretically achievable changes in Gibbs free energy can be calculated for all DNA-protein and aptamer-protein complexes: where a and b are parameters from linearization of the most efficient complexes from Figure 5 where the coefficient 0.7 reflects the 70% limit of efficiency. The data are shown in Tables S3 and S5. It is clear from both numerical analysis (Tables S3 and S5) and graphical representation (Figure 5) that many aptamer complexes and almost all natural complexes of HTH-type proteins are suboptimal and can be improved to achieve the $\Delta G_b^{lim}$ values. Moreover, these values can be exceeded if the aptamer "hot spots" are modified to enhance energy dissipation, as for SL5 or thrombin aptamers.

Materials and Methods
The structures were downloaded from RCSB PDB [1] and processed with PyMOL software (v.1.74) (Schrödinger, Cambridge, MA, USA) [13]. The details of amino acid selection are provided in the appropriate sections. Lengths of sidechain were calculated as the numbers of atoms in the Cα substituents of amino acids, excluding hydrogen atoms (i.e., the number of C, N, O, and S atoms). Numbers of carbon atoms were calculated as the numbers of carbon atoms in the Cα substituents of aromatic or aliphatic amino acids. Hydrogen atoms were not counted. Data treatment and figure construction were performed in Origin 2015 (OriginLab, Northampton, MA, USA).

Conclusions
Detailed analysis and description of 63 protein structures revealed the distinguishing features of high-affinity nucleic acid-protein complexes. The volume of the amino acid sidechain within the interface was demonstrated to be the most significant parameter that correlates with the affinity of aptamers to proteins. This correlation could be explained by the need for efficient energy transfer. A parameter for estimating the efficiency of nucleic acid-protein complexes was proposed. These results are of great interest both for understanding the fundamental principles of protein functioning and for the design and improvement of efficient ligands, particularly nucleic acid aptamers.
Supplementary Materials: The following are available online. Table S1: Interfaces of aptamer-protein complexes. Table S2: Parameters for the set of aptamer-protein complexes that were used to find correlations. Table S3: Parameters for the set of aptamer-protein complexes that were used to find correlations (continued). Table S4: Interfaces of HTH-type proteins complexed with DNA duplexes. Table S5: Parameters for HTH-type proteins complexed with DNA duplexes that were used to find correlations. Figure S1: Thrombin complexes with DNA aptamers. Figure S2: PDGFB complexes with modified DNA aptamers SL4 and SL5.
Funding: This research was funded by the Russian Science Foundation, grant number 18-74-10019.