Fine-Tuning of Sequence Speciﬁcity by Near Attack Conformations in Enzyme-Catalyzed Peptide Hydrolysis

: The catalytic role of near attack conformations (NACs), molecular states that lie on the pathway between the ground state (GS) and transition state (TS) of a chemical reaction, is not understood completely. Using a computational approach that combines Bürgi–Dunitz theory with all-atom molecular dynamics simulations, the role of NACs in catalyzing the ﬁrst stages of HIV-1 protease peptide hydrolysis was previously investigated using a substrate that represents the recognized SP1-NC cleavage site of the HIV-1 Gag polyprotein. NACs were found to confer no catalytic effect over the uncatalyzed reaction there ( ∆∆ G ‡ N ∼ 0 kcal/mol). Here, using the same approach, the role of NACs across multiple substrates that each represent a further recognized cleavage site is investigated. Overall rate enhancement varies by | ∆∆ G ‡ | ∼ 12–15 kcal/mol across this set, and although NACs contribute a small and approximately constant barrier to the uncatalyzed reaction (< ∆ G ‡ uN > = 4.3 ± 0.3 kcal/mol), they are found to contribute little signiﬁcant catalytic effect ( | Furthermore, no correlation is exhibited between NAC contributions and the overall energy barrier ( R 2 = 0.01). However, these small differences in catalyzed NAC contributions enable rates to match those required for the kinetic order of processing. Therefore, NACs may offer an alternative and subtle mode compared to non-NAC contributions for ﬁne-tuning reaction rates during complex evolutionary sequence selection processes—in this case across cleavable polyproteins whose constituents exhibit multiple functions during the virus life-cycle.


Introduction
According to Pauling [1], rate enhancement due to enzyme catalysis stems from the tighter binding of the transition state (TS) as compared to the ground state (GS) of a substrate undergoing chemical reaction. This is thermodynamically equivalent to a negative activation free energy difference (∆∆G ‡ ) between the catalyzed (∆G ‡c ) and the uncatalyzed (∆G ‡u ) reaction where: Activation rates are describable within a generalized transition state theory (TST) framework [2] which extends the original formulation of TST [3] such that the rate of a chemical reaction is defined as: Figure 1. Representation of a near attack conformation (NAC) in HIV-1 protease by a nucleophilic water molecule, as described previously [31]. NACs are characterized in terms of Bürgi-Dunitz criteria corresponding to a nucleophile attack angle (WAT nuc :O nuc -p1:C-p1:O) range (100 • ≤ α ≤ 110 • ) and distance (WAT nuc :O nuc -p1:C) threshold (d a ≤ 3.2 Å). The red dot represents the catalytic center (geometric center of p1:C, p1 :N, D25:CG and D25 :CG), d n represents the nearest water distance to the catalytic center. The ground state corresponds to a nucleophilic water molecule within the bound perimeter (dashed blue circle) in the catalytic site (d n ≤ d p n ).
The catalytic contribution of any component (X) of the activation barrier is then: Multiple sequence specificity, high selectivity, and finely-tuned rate variation are significant phenomena across biochemical regulation processes and peptide hydrolysis by HIV-1 protease is an excellent example of this. HIV-1 Gag and GagPol precursor polyproteins consist of multiple protein domains sequentially connected via peptide bonds at their inter-protein junctions. Gag consists of matrix (MA), capsid (CA), spacer peptide 1 (SP1), nucleocapsid (NC), spacer peptide 2 (SP2), and protein p6. Additionally, GagPol is synthesized via a -1 frameshift with a ratio of 1:20 compared to Gag [38], which allows continued sequential translation after NC into a trans-frame-region (TFR) and then precursor Pol that contains the viral protease (PR), reverse-transcriptase (RT and RH), and integrase (IN). The viral protease recognizes and cleaves multiple sequences within Gag and GagPol [39], notably at junctions: MA-CA, CA-SP1, SP1-NC, NC-SP2, SP2-p6, TFR-PR, PR-RT, RT-RH, and RH-IN.
Turnover number, k cat , has been experimentally studied widely for HIV-1 protease for various substrates that constitute these sequences-and shows a well-defined rate ordering that ranges over 100-fold from slowest to fastest [39]. Nonetheless, the molecular origin of such differences remains not completely understood.
It has been found that for the enzyme reaction catalyzed by HIV-1 protease bound to the SP1-NC (alternatively, termed p2-NC) octapeptide cleavage substrate and a catalytic water molecule [31] that NACs do not contribute to rate enhancement (∆∆G ‡c N ∼ 0 kcal/mol), whilst non-NAC contributions account for (∆∆G ‡c nN ∼ -15 kcal/mol). As transmission effects are minor (up to γ ∼10 3 ), electrostatic preorganization of the active site is likely the major component of rate enhancement in HIV-1 protease at this cleavage junction.
Nonetheless, the role of NAC thermodynamics in affecting catalytic rate changes amongst differentially recognized substrates has not hitherto been examined. NACs do pose a small but significant barrier in both the catalyzed and uncatalyzed reactions for the SP1-NC cleavage junction [31]. Therefore, it is hypothesized that by exhibiting variation across different recognized substrates NACs may play some role in controlling enzyme specificity through changes in k cat . This hypothesis is evaluated here by comparing experimentally determined and estimated ∆G ‡ with computed ∆G ‡ N and derived ∆G ‡ nN values across the range of nine above-mentioned cleavage junctions with varying k cat , through a previously established computational approach that makes use of explicit solvent molecular dynamics simulations to compute mole fractions of NAC-formation during peptide hydrolysis in both the enzyme-bound and enzyme-free substrate systems [31].

Differential Nucleophilic Water Binding
Computation of the free energy of NAC-formation from a ground state (GS) first requires a suitable definition of the GS for the nucleophilic water molecule. In order to define this, the distance (d n ) of the nearest water molecule to the catalytic center ( Figure 1) was calculated across the ensemble of MD simulations for each of the enzyme-bound substrate systems.
The density distributions of d n reveal mostly two distinct bell-shaped regions of non-zero density for each system, corresponding to a state where a nucleophilic water molecule is bound (NWB) in the catalytic site and to when a nucleophilic water molecule is unbound (NWU). The latter density peak is due to the presence of a previously characterized [31] structural non-nucleophilic water molecule that becomes the nearest water molecule to the catalytic center when the nucleophilic water molecule leaves the catalytic site ( Figure 2). These two regions are sharply partitioned by a region of near-zero density (dashed black line) for all systems. Thus, a nearest water density description within such a partition threshold (d p n ) is a suitable definition of the GS for a nucleophilic water molecule in each enzyme-substrate complex (Table 1)  However, the profiles of the distributions for different systems do vary. The MA-CA, CA-SP1, SP1-NC, NC-SP2, and RH-IN distributions share a similar profile: the NWB and NWU states are characterized by a single major peak between ∼1.5-3.5 Å and a single minor peak between ∼5-6.5 Å, respectively. The TFR-PR and RT-RH distributions exhibit a single broad peak at 2.7 Å and 3.1 Å respectively corresponding to the NWB state and one of similar density at 6.1 Å and 5.8 Å respectively, corresponding to NWU. The SP2-p6 system exhibits a single large NWB-peak at 2.4 Å but only trace densities of the NWU state-nonetheless, these constitute two very small peaks at 5.8 Å and 7.4 Å. Thus, for the SP2-p6 system, the nucleophilic water molecule almost never escapes the catalytic site, whilst there are also rare instances when there is neither a water molecule in the GS nor a structural water molecule between the ligand and the protease. Finally, the PR-RT system is the only one for which the NWU state has a significantly higher density than the NWB state. A minor NWB-peak is exhibited at 3.4 Å with an additional trace density peak at 1.8 Å. A much larger peak corresponding to the NWU state is exhibited at 6.3 Å.
However, care has to be taken when interpreting these binding free energies quantitatively, especially those with significant favorable or unfavorable values. This is because the frequency of reversible transitions between bound and unbound states varies across the system, implying that an approximate equilibrium ensemble may not be exhibited for every system. Therefore, a simple mole fraction of each state may not be a sufficient descriptor of the free energy of water binding. For example, the favorable ∆G nwb of CA-SP1, SP2-p6 and RH-IN systems are likely to be overestimated as unbinding of the nucleophilic water molecule is rare even with the 10 µs of aggregate sampling performed here. This is indicative of a significantly higher kinetic barrier for water dissociation, but unlikely to the extent that the thermodynamics would be unfavorable. Similarly, the unfavorable ∆G nwb of the PR-RT system is also likely to be overestimated; rapid dissociation of the water molecule is exhibited during equilibration, thus production simulations are initiated in a water unbound state and only rare events of subsequent water binding are observed.
Nonetheless, despite these limitations, the partitioning of the nucleophilic water states into unbound and bound remains well-defined, thus the bound sub-ensemble of each system provides an accurate representation of the GS from which subsequent mole fractions of desired states, including NACs, can be computed.
Similarly, the GS of the enzyme-free substrate systems could also be characterized in terms of a distance metric (d o )-in this case, the distance of the nearest water molecule to the midpoint of the lytic peptide bond. The density distributions of d o expectedly reveal a single-peaked bell-shaped distribution between 3.6-4.2 Å for all systems decaying rapidly and with near-zero density (ρ(d o ) ≤ 0.001) beyond 6 Å ( Figure 3). A threshold of d p o = 6 Å was therefore set to partition bound from unbound regions-therefore, in the enzyme-free substrate systems, it can be considered that there is always a nucleophilic water molecule proximal to the lytic peptide bond and thus effectively 'bound' in the GS.

Analysis of the Hydrogen Bond Network
Four distinct hydrogen bonds (hb 1 -hb 4 ) have previously been characterized in the interaction between the catalytic aspartic acid dyad, the nucleophilic water molecule, and the peptide substrate [31] ( Figure 4A). However, this gives rise to a hydrogen bond network of potentially 16 states, when combined, it was previously shown for the SP1-NC system that only 12 of these were ever populated ( Figure 4B)-due to the mutual exclusivity of hb 1 and hb 2 . In order to explore the role of hydrogen bonding across the various peptide systems, the population density distribution of each putative state within the hydrogen bond network was computed for the NAC state and the ground state (GS) in general and compared with the network in the unbound state ( Figure 4C). Population densities are reported in terms of the potential of mean force (G in kcal/mol) derived from the relative mole fractions of each hydrogen bond state, normalized with respect to the density of the given conformation in each system-thus the row vectors in each system sum to unity in terms of population density. Only the same 12 hydrogen bond states as reported before [31] are ever populated across all nine systems-therefore, mutual exclusivity of hb 1 and hb 2 is preserved across all systems. For all systems, the most densely populated hydrogen bond states when the nucleophilic water molecule is unbound are states 1 (no hydrogen bond) and 9 (only the inter-aspartyl hydrogen bond-hb 2 ). By comparison, hydrogen bond state 5, involving hb 1 , between protonated D25 and the peptide bond carbonyl, is less densely populated in the absence of a bound water molecule. As expected, states involving hydrogen bonding with a bound water molecule are not exhibited except rare events involving hb 2 -or hb 3 -type bonding when the water molecule is at the periphery of the binding site.
The principal change in the hydrogen bond distribution when going from the unbound to the GS is a substantial shift in density towards state 2 (between water and D25 -only hb 3 ) and state 10 (hb 2 and hb 3 )-therefore, the additional presence of hb 3 compared to the unbound state. However, other hydrogen bond states are also populated, albeit less densely, in the GS. In particular, there is a notable shift towards states containing hb 1 -for example, states 5-8. There is qualitatively little difference in hydrogen bond distribution between the NAC and GS-the predominant shift with respect to the unbound state is still towards states 2 and 10 as well as a range of other less densely populated states.
There is a lack of hydrogen bond state density for the PR-RT system compared to the other systems. This may be due to the fact that the overall population density of the bound state is very small and therefore due to improper sampling of the hydrogen bond distribution.
Overall, hydrogen bond analysis reveals that NACs do not exhibit a dissimilar hydrogen bond distribution to the GS but the GS in general does promote additional adoption of states involved in the general acid/general base (GA/GB) mechanism for catalysis (specifically hb 1 and hb 3 ) as compared to the unbound state. Thus, nucleophilic water binding into the catalytic site-by increasing the probability of forming hb 3 in turn promotes formation of hb 1 , priming the water for successful nucleophilic attack.

Thermodynamic Decomposition of Activation Free Energy Contributions
The free energy of NAC formation was calculated for the enzyme-bound (∆G ‡c N ) and enzyme-free (∆G ‡u N ) substrate systems by computing the mole fraction of NACs compared to the GS in each case. All systems exhibit a relatively small enzyme-bound NAC free energy (∆G ‡c N ) but one which is significant over thermal noise. A range of 2.7 kcal/mol is exhibited, from SP1-NC, the most unfavorable (∆G ‡c N = 4.6 ± 0.2 kcal/mol-as reported previously [31]) to TFR-PR, the least unfavorable (∆G ‡c N = 1.9 ± 0.2 kcal/mol). By comparison, the enzyme-free NAC free energies (∆G ‡u N ) are of similar overall magnitude but exhibit a smaller range of 1.0 kcal/mol with SP2-p6 the most unfavorable (∆G ‡u N = 4.8 ± 0.3 kcal/mol) and MA-CA the least unfavorable (∆G ‡u N = 3.8 ± 0.1 kcal/mol). In order to determine the sensitivity of these results to the distance threshold that partitions the bound from unbound nucleophilic water molecule states, ∆G ‡c  The difference in NAC free energy (∆∆G ‡ N ) between the catalyzed and uncatalyzed systems ( Table 2) determines the contribution of NACs to the catalytic effect. These results suggest that NACs contribute only marginally to catalysis in three systems: MA-CA (-1.7 ± 0.2 kcal/mol), SP2-p6 (-1.7 ± 0.4 kcal/mol), and TFR-PR (-2.2 ± 0.3 kcal/mol). In the other systems, the NAC contribution is negligible and/or indistinguishable from thermal noise.

Discussion
The overall activation energy barrier for uncatalyzed peptide hydrolysis-based on an experimentally determined half-life of ∼500 years [42]-is estimated at ∆G ‡u ∼30 kcal/mol [31]. Furthermore, as this is likely to be independent of the sequence of flanking amino acids, this value was assigned to all uncatalyzed cleavage substrates in the current study. HIV-1 protease thus exhibits enormous catalytic power, lowering this barrier to ∆G ‡c ∼15-18 kcal/mol (a significant reduction of |∆∆G ‡ | ∼ 12-15 kcal/mol) across the range of substrates it cleaves ( Table 2). The results reported here support previous studies [31] and further reinforce the idea that NACs do not contribute significantly to the overall catalytic effect (|∆∆G ‡ N | ∼ 0-2 kcal/mol). Rather, non-NAC effects-such as electrostatic preorganization, which is known to play a powerful role in enzyme catalysis [16][17][18][19][20][21][22], are thus likely the main source of the catalytic power in HIV-1 protease, contributing a reduction of |∆∆G ‡ nN | ∼ 10-15 kcal/mol (Table 2). This range is likely to originate from the differing electrostatic properties of the various cleaved peptide sequences and their subsequent differences on the preorganization of the bound enzyme-substrate complex.
The NAC free energy in the uncatalyzed reaction (∆G ‡u N ) varies by only 1.0 kcal/mol in the calculations reported here. Therefore, when taking into account thermal noise (∼0.6 kcal/mol) all systems can be interpreted to have near-identical uncatalyzed NAC free energies, characterized by a mean value of <∆G ‡u N > = 4.3 ± 0.3 kcal/mol. Then, assuming the overall uncatalyzed reaction barrier is similar across different sequences, the uncatalyzed non-NAC contribution (∆G ‡u nN ) is also nearly identical across the different systems, with a mean value of <∆G ‡u nN > = 25.7 ± 0.3 kcal/mol. However, this is not the case in the catalyzed systems. Experimentally, variation of up to three orders of magnitude occurs across k cat [39] for the substrates of HIV-1 protease-from which the above-mentioned variation in catalyzed activation energy barrier is estimated (∆G ‡c (max)-∆G ‡c (min) ∼3 kcal/mol). This is similar to the range of variation observed in the computed ∆G ‡c N . If non-NAC contributions were similar to each other in the catalyzed systems, then variation in the overall barrier would also be correlated to variation in ∆G ‡c N . This would be consistent with NACs exhibiting a 'correlative' mode of fine-tuning the catalyzed reaction. However, decomposition into NAC (∆G ‡c N ) and non-NAC (∆G ‡c nN ) components shows that both components exhibit a similar range of variation to the overall barrier and ∆G ‡c does not correlate with ∆G ‡c N , exhibiting a correlation coefficient of only R 2 = 0.01 ( Figure 6A). This implies sequence-dependent variation of both the NAC and non-NAC contributions across the various substrates in the catalyzed reaction and suggests that differences in activation barrier from cleavage sequence changes are due to the combined and significant variations of both components.
Nonetheless, a subtle role for NACs may still emerge when considering the variation in both components. In the context of the comparably large reductions in energy barrier that non-NAC effects are capable of inducing, their ability to fine-tune such reductions to match the optimally required catalyzed rates across the various substrates may be limited. This is highlighted by comparing non-NAC contributions to the activation barrier with each other ( Figure 6B). All the peptide sequences studied here may be grouped into three distinct non-NAC activation energy bands with means at 15.3, 13.3 and 11.2 kcal/mol-where significant gaps (∼2 kcal/mol) exist between the different bands-therefore, no overlap exists when taking into account thermal noise. These bands appear to be too coarsely-tuned to achieve the required differences between the various substrates, whilst the ordering of substrates by non-NAC contributions is not consistent with that of the overall catalyzed energy barriers. Given these non-NAC contributions, invariant NAC contributions in the catalyzed systems, as estimated for the uncatalyzed systems, would resolve neither of these two issues.
Thus, minor variations in the catalyzed NAC free energies (∆G ‡c N ) exhibit a 'complementary' mode of fine-tuning both the differences between and the absolute values of the overall energy barriers to match the optimal requirements for correctly ordered maturation. Furthermore, the existence of a small thermodynamic barrier by NACs enables the catalyzed barrier to reach the virologically relevant region ( Figure 6B-yellow band) with the given non-NAC contributions-and thus to meet the overall required timescale for virion maturation. nN , free energy contributions. All systems lie within one of three distinct ∆G ‡c nN bands separated by significantly more than thermal noise (black lines with blue bands). Only one non-NAC band is immediately within the virologically relevant region (yellow) for differential enzyme specificity. NAC contributions are small (< 5 kcal/mol) and within ∼3 kcal/mol of each other, but serve to fine-tune ∆G ‡c for all substrates to both match the optimal barrier differences and reach the relevant barrier region required by the virus-neither of which is achieved, in general, by non-NAC contributions alone.
Why the HIV-1 system has evolved multiple cleavage sequences that result in sometimes very similar k cat is not fully understood. One plausible explanation could be that because sequence change can have an effect on both k cat and K M and the reaction kinetics of polyprotein cleavage [43,44] is directed by the combination of both, different sequences could control each of these parameters differentially. However, several cleavage sequences recognized by HIV-1 protease both have similar k cat and K M to each other and are therefore not consistent with the above-mentioned explanation. For example, this applies to the MA-CA (SGNY-PIVQ), SP1-NC (ATIM-MQRG), TFR-PR (SFNF-PQIT), and PR-RT (TLNF-PISP) junctions [39], yet these junctions consist of significantly varying amino acid sequences.
Another possible explanation could be because amino acid sequences can face selection pressure for multiple biomolecular functions not restricted to the given enzymatic reaction. In the case of polyprotein cleavage by HIV-1 protease, each cleavage site corresponds to the C-and N-termini of its adjunct proteins-several of which are involved in structural and functional interactions beyond cleavage alone. This may impose restrictions on which amino acids are selected in the cleavage sequence-emergent sequences being able to fulfill both the required enzymatic rate as well as the other biomolecular functions. In particular, electrostatic preorganization effects would be highly susceptible to the electrostatic profile of the substrate. Sequence changes selected by other functions may thus correspond to abrupt changes in the enzymatic barrier that are modulated by smaller NAC changes. These large reductions due to non-NAC contributions combined with the fine tuning conferred by NACs may help to fulfill the required enzymatic rate using a different sequence.
Comparing the SP1-NC and TFR-PR substrates serves as a good example to illustrate this. The SP1-NC cleavage site is one of the fastest cleaved sites by HIV-1 protease, sharing the same specificity rate constant as the TFR-PR system [39]. However, it exhibits a non-NAC free energy, which is 2.7 kcal/mol lower than the TFR-PR system ( Table 2). The nucleocapsid (NC) protein is essential for condensation of viral RNA [45]-the positively charged arginine in its N-terminus region (MQRG-) is likely to aid non-specific binding to the negatively charged RNA. As part of the SP1-NC cleavage sequence, this residue may also alter the electrostatic balance in the HIV-1 protease active site, contributing to the computed low non-NAC barrier exhibited for this junction. This in turn would lead to mistimed fast cleavage without the additional barrier afforded by the NAC component (4.6 kcal/mol) that restores cleavage rate to the required value. Conversely, the TFR-PR substrate achieves a similar k cat with a different sequence, selected in part due to the structural requirements of the HIV-1 protease N-terminus in forming an interdigitated dimer interface [46] as well as a self-associated precursor protease during intramolecular autocatalysis [47]. This less positive sequence may contribute to the larger non-NAC energy barrier with respect to SP1-NC. However, the same overall energy barrier can still be achieved by this sequence by allowing for a significantly reduced NAC contribution with respect to the enzyme-free system (∆∆G ‡ N ∼ -2 kcal/mol). The degree of decoupling between NAC and non-NAC contributions upon single amino-acid mutations has not been investigated here. It is unknown whether a given amino acid mutation in the cleavage region would affect both NAC and non-NAC contributions simultaneously. If so, this would give rise to a more complex picture of fine-tuning where multiple mutations whose sum of NAC and non-NAC contributions kept the desired rate constant. However, single mutations may exist that alter predominantly either NAC or non-NAC contributions. For example, some hydrophobic-hydrophobic mutations may affect the activation barrier more through the NAC rather than the non-NAC mode, whilst hydrophilic-charged changes may alter the non-NAC barrier but not significantly affect catalytic water entry or NAC formation. Both scenarios could easily be accommodated by the fact that k cat depends not only on the immediate residues that juxtapose the lytic bond, but on at least the P4-P4 subsites that constitute the octapeptide cleavage sequence [41].
Future studies that elucidate this might therefore link the role of NAC and non-NAC contributions to the step-wise evolution of cleavage junctions, where sequences that were not initially lytic junctions became so by mutations that first contributed large discrete non-NAC reductions in the energy barrier and whose rate constants were then fine-tuned by subsequent mutations that altered the NAC contribution. This decoupling might also in part account for the role of compensatory mutations [48,49] that restore viral fitness when antiretroviral therapy causes drug resistance mutations deleterious to fitness [50] to emerge.

Materials and Methods
The role of NAC formation was investigated by performing and comparing ensembles of all-atom explicit solvent molecular dynamics simulations of HIV-1 protease bound to each of a set of eight recognized octapeptide substrates (enzyme-bound) representing inter-protein cleavage junctions (Table 1) in addition to one octapeptide substrate (SP1-NC) that was already previously simulated and reported [31]. These nine enzyme-bound systems were further compared to corresponding simulations of the apo-ligand (enzyme-free) systems in explicit solvent.

Initial Preparation
Initial structures were taken from the 1KJ4, 1F7A, 1TSU, 1KJF, 1KJ4, 1KJ4, 1KJG, and 1KJH crystal structures of peptide-bound HIV-1 protease complexes for MA-CA, CA-SP1, NC-SP2, SP2-p6, TFR-PR, PR-RT, RT-RH, and RH-IN junctions, respectively [51][52][53] and, when required, mutated to match the corresponding peptide sequences. Crystallographic water molecules were preserved. An additional water molecule was inserted between the lytic peptide bond and the catalytic dyad as is expected in the general acid/general base (GA/GB) cleavage mechanism [32]. The inactive catalytic dyad D25N was converted into catalytically active D25 form with a monoprotonated state [54,55]. Hydrogen atoms were added, the systems were electrically neutralized (0.15 M NaCl) and explicitly solvated with TIP3P water [56] and topologies were generated using the Leap module of AMBER 14 [57]. The standard AMBER force field for bioorganic systems (ff03) [58] was used to describe all protein parameters. All equilibration and production simulations were performed using ACEMD [59].

Molecular Dynamics Equilibration and Production Simulation Protocol
The molecular dynamics equilibration and simulation protocol for all respective enzyme-bound and enzyme-free systems were identical to those previously reported for the SP1-NC system [31] and are fully described therein. Production ensembles of 100×100 ns and 10×1 µs were generated for each enzyme-bound and corresponding enzyme-free apo-ligand system, respectively, in the NVT ensemble with temperature maintained at 300 K. Coordinate snapshots were generated every 100 ps and 10 ps, respectively. Experimental accuracy of the molecular simulation protocol for the HIV-1 protease has been previously validated using NMR S 2 order parameters [60]. The effects of forcefield variation are small and have been accounted for in a previous study [31].

Analysis
For the enzyme-bound and enzyme-free systems, the nearest water distance (d n and d o respectively) was defined as the distance of the oxygen atom WAT:O of the nearest water molecule from the center of the catalytic site (defined as the geometric center between the four atoms: p1:C, p1 :N, D25:CG and D25 :CG) and the center of the lytic peptide bond (p1:C-p1 :N), respectively ( Figure 1). The ground state (GS) was characterized by the nearest water molecule being within an appropriate distance cutoff (d p n and d p o ) from the respective centers in the enzyme-bound and enzyme-free systems. These thresholds were chosen after analysis of the density distributions for nucleophilic water binding (see Results). The nucelophile attack distance (d a ) was defined as the distance between the carbon atom of the lytic peptide bond and the oxygen atom of the nearest water molecule (WAT nuc :O nuc -p1:C). The nucelophile attack angle (α) was defined as the angle between the d a vector and the vector corresponding to the carbonyl bond adjacent to the lytic peptide bond (WAT nuc :O nuc -p1:C-p1:O). NACs were characterized in terms of Bürgi-Dunitz criteria corresponding to an angle range of 100 • ≤ α ≤ 110 • and distance threshold of d a ≤ 3.2 Å. A hydrogen bond network was analyzed in the catalytic site of the enzyme-bound systems-characterized by cooperative combinations of a set of four distinct hydrogen bonds hb= { hb 1 , hb 2 , hb 3 , hb 4 } and resulting in a set of 12 non-zero density hydrogen bond states s={1,...,12 } (Figure 4). The threshold for a hydrogen bond was a donor-acceptor distance ≤ 3.5 Å and donor-hydrogen-acceptor angle of ≥ 150 • .
Probability densities (ρ(d n ) and ρ(d o )) were calculated by binning ensemble data along the corresponding reaction coordinate space, respectively (d n and d o ) using kernel density estimation with an Epanechnikov kernel and bandwidth parameter h = 0.75. The mole fraction (Γ M ) of a given macrostate (M) was computed as Γ M = m/N, where m is the number of datapoints within the above-mentioned boundary partitions of the macrostate in the corresponding reaction coordinate space, and N, the total number of datapoints in the given ensemble. The potential of mean force for a given state (G M ) was calculated as G M = −k B Tln(Γ M ). Free energy differences (∆ G) between various macrostates were calculated from the ratios of the corresponding mole fractions according to ∆G = −k B Tln(Γ M2 /Γ M1 ), where Γ M1 and Γ M2 are mole fractions of any given states M1 and M2, respectively. Hydrogen bond state mole fractions (Γ s ) within a given macrostate were calculated as Γ s = n s /m, where n s is the number of datapoints within the criteria for each hydrogen bond state s within the subset of datapoints (m) comprising state M. Care was taken to integrate the possible degenerate configurations pertaining to each distinct hydrogen bond that arise from structural symmetries imposed by molecular rotation-as previously reported [31].
The complete ensemble for each system was used to perform the hydrogen bond ( Figure 4) and free energy variation ( Figure 5) analyses. Reported free energy values, as well as their errors (Tables 1 and 2), were calculated by partitioning each ensemble into five subsets in both the enzyme-bound and enzyme-free systems; mole fractions were calculated independently in each subset and averaged to yield means and standard deviations. The only exception to this was the calculation of the NAC free energy for the enzyme-bound PR-RT system for which very few absolute counts were exhibited-thus the ensemble was divided into only two subsets to compute means and errors here.
Estimates for the overall catalyzed energy barrier (∆G ‡c ) for all substrates except NC-SP2, were made based on converting experimental k cat values [39] using Equation (2) and assuming γ = 1 at 300 K. NC-SP2 was not measured in [39], but was measured in a similar study at 310 K [41], as were several other cleavage junctions at this temperature [40]. Therefore, k cat for NC-SP2 at 300 K was estimated by multiplying the k cat ratio of NC-SP2/CA-SP1 from [40,41] to the k cat value of CA-SP1 in the main set [39] from which ∆G ‡c was subsequently calculated.

Conclusions
Computational studies have previously revealed the existence of a small but significant thermodynamic barrier contributed by the formation of near attack conformations (NACs) that lie on the transition path of the peptide hydrolysis reaction catalyzed by HIV-1 protease [31]. However, NACs were also found there to confer no catalytic effect because the thermodynamic barrier to their formation was equivalent in the uncatalyzed reaction. In the current study, the role of NACs has been further explored across a range of substrates cleaved by the HIV-1 protease, using the same all-atom ensemble molecular dynamics simulation approach coupled to Bürgi-Dunitz theory for characterizing nucleophilic attack of a water molecule on the lytic peptide bond.
This study supports the previous findings that NACs play little or no role in the catalytic effect induced by HIV-1 protease-catalytic barrier reduction is thus entirely dominated by non-NAC contributions. Nonetheless, the functional role of NACs may emerge when considering them together with non-NAC contributions as well as the virological requirements for the ordering of cleavage rates across the different substrates. For HIV-1, the kinetic order of cleavage is tightly regulated [61] to achieve correct architectural reorganization into a mature virion [62] within the physiologically required timescale for infection [43,44].
The findings reported here suggest that NAC contributions, whilst small, are also largely invariant across multiple substrates in the uncatalyzed reaction. Similarly, non-NAC contributions, even though large, are also relatively invariant because the uncatalyzed barrier may be independent of peptide sequence. The catalyzed reactions, however, present a different picture because a range of variation (∼3 kcal/mol) exists in the overall energy barrier and both non-NAC and NAC contributions are shown to vary by ∼3 kcal/mol across the range of substrates studied. Although non-NAC contributions dominate the reduction in the overall barrier, they appear to be grouped in discrete energy bands that are too coarsely separated to account for either the small variations or the correct ordering of the overall barriers across the substrates. The small variation in NAC contributions thus may constitute a complementary fine-tuning mechanism to rectify both of these issues.
Biological systems face evolutionary selection pressure on multiple fronts and viruses, in particular, are well-known for their parsimony in achieving macromolecular functionality. Together, such a combination of coarse-reduction by non-NAC and complementary fine-tuning by NAC contributions, respectively, may thus provide a way for controlling required enzymatic rates, whilst under sequence selection pressure for other biomolecular functions. There is likely a trade-off in selecting amino-acid sequences for these functions and those required to directly optimize the given enzymatic reaction. This selection pressure may make it difficult to fine-tune the necessary differences in enzymatic rate by solely using changes in non-NAC contributions. NACs offer an alternative mode to accommodate such differences whilst modulating the rate to meet that required biologically for optimal processing at the corresponding site.
Establishing NAC effects have proven computationally challenging because rigorous characterization at the limit of a classical forcefield requires sufficient sampling to observe ground state (GS) motions towards the transition state. This has challenged the exploration of relative NAC effects in similar but non-identical systems. Here, sufficient sampling of the GS is made possible for most systems by reversible water entry to the active-site with around 10 µs of aggregate sampling per enzyme-bound system. Other systems of interest for exploring NAC contributions may therefore be accessible with current computational power.
Furthermore, future studies using quantum mechanical/molecular mechanics (QM/MM) approaches [22] may test the accuracy of the classical observations exhibited here in more detail and also establish a quantum analog of the nucleophilic attack criteria given by Bürgi-Dunitz theory, from which NAC contributions could be better decoupled from subsequent steps along the reaction pathway.
Nonetheless, the observation here that, for the HIV-1 protease system, relative enzyme specificity is in part directed by fine-tuned nucleophilic water NAC contributions to the catalyzed reaction barrier, implies that similar NAC-specificity effects may exist in other enzyme-substrate reactions and suggests a novel mechanism for fine-tuning and controlling such reactions.