Large Multidomain Protein NMR: HIV-1 Reverse Transcriptase Precursor in Solution

NMR studies of large proteins (over 100 kDa) in solution are technically challenging and, therefore, of considerable interest in the biophysics field. The challenge arises because molecular tumbling in solution slows considerably as molecular mass increases, reducing the ability to detect resonances. In fact, the typical 1H-13C or 1H-15N correlation spectrum of a large, uniformly 13C- or 15N-labeled protein shows severe line-broadening and signal overlap. Selective isotope labeling of methyl groups is a useful strategy to reduce these issues; however, the reduction in the number of signals that goes hand-in-hand with such a strategy is, in turn, a disadvantage when characterizing the overall features of the protein. When domain motion exists in a large protein, it affects backbone amide signals and methyl groups differently. Thus, the use of multiple NMR probes, such as 1H, 19F, 13C, and 15N, is ideal for gaining overall structural or dynamical information on large proteins. We discuss the utility of observing different NMR nuclei when characterizing a large protein, namely, the 66 kDa multi-domain HIV-1 reverse transcriptase, which forms a homodimer in solution. Importantly, we present a biophysical approach, complemented by biochemical assays, to understand not only the homodimer, p66/p66, but also the conformational changes that contribute to its maturation to a heterodimer, p66/p51, upon HIV-1 protease cleavage.


Introduction
Many retroviruses translate their proteins as large precursor polyproteins, from which individual proteins are cleaved to yield their mature, functional forms [1]. Polyproteins are multi-domain proteins and are known to be mobile and adhesive to other proteins in solution, with complex folding and thermodynamic characteristics [2][3][4][5][6][7]. Thus, multiple orthogonal methods are essential to obtain a reliable characterization of such proteins in solution. In this review article, we focus on reverse transcriptase (RT) from human immunodeficiency virus-1 (HIV-1), which is initially expressed as a 66 kDa protein (p66) within the Gag-Pol polyprotein and itself contains five domains. RT proteins play an essential role in the replication of all retroviruses, including HIV-1, and of related retrotransposons [8][9][10][11][12].
During viral maturation, p66 is cleaved by the HIV-1 protease (PR) to form a functional heterodimer comprising p66 and p51 subunits (p66/p51) [13,14]. The p51 subunit is generated upon PR-mediated removal of the majority of the ribonuclease H (RNH) domain from p66 [15][16][17]. Over the past ~25 years, a wealth of published structural data has elucidated the molecular details of the mature p66/p51 and its interactions with deoxyribonucleotide triphosphates and DNA substrates, as well as polymerase

Function and Structure of the Mature HIV-1 RT
RT catalyzes all steps in the reverse transcription of the HIV-1 (+) single-stranded RNA into double-stranded DNA and is, therefore, essential for virus replication [14,41,42,43]. It has been a primary target for antiviral drug development since the discovery of HIV-1 in 1983, and 12 agents that directly target this enzyme have been FDA-approved as HIV-1 antivirals (plus several more in clinical trials) [44][45][46]. These antivirals fall into two therapeutic classes: nucleoside/nucleotide RT inhibitors (NRTIs) and nonnucleoside RT inhibitors (NNRTIs). NRTIs bind in, and NNRTIs bind near, the DNA polymerase domain of RT, and both primarily impact this activity [47,48]. However, RT is a multifunctional enzyme and also contains an RNH domain that is responsible for cleaving the RNA strand of the intermediate RNA/DNA duplex formed during reverse transcription [13,14]. To date, despite significant effort [49][50][51][52][53][54][55][56][57], no drug that targets this function has been clinically developed.
The p66 subunit in mature RT, p66/p51, has two domains: a polymerase domain (residues 1 to 426) and an RNH domain (residues 427 to 560). The polymerase domain contains finger-palm (residues 1 to 236), thumb (residues 237 to 318), and connection (residues 319 to 426) subdomains, while the RNH domain is a single domain fold (residues 427 to 560) (blue, green, yellow, and orange ribbons, respectively, in Figure 1a) [20]. Among published reports, the term "subdomain" is not always used [58][59][60][61], and the starting or ending residue numbers may differ slightly, based on differences in the allocation of a β-strand or a loop region [18,19,62,63]. Although the finger-palm subdomain has two structurally distinct regions, i.e., finger (residues 1 to 85 and 120 to 150) and palm (residues 85 to 119 and 151 to 243), these regions are interleaved in the sequence rather than contiguous, and thus they are combined as finger-palm in this review article. In RT, the polymerase active site (D110, D185, and D186) is located in the finger-palm domain [64], while the ribonuclease active site (D443, E478, D498, and D549) is located in the RNH domain.
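As a small illustration of the residue numbering above, the sketch below encodes the p66 domain boundaries quoted in this section (finger-palm 1 to 236, thumb 237 to 318, connection 319 to 426, RNH 427 to 560) as a lookup table and assigns the active-site residues and the F440-Y441 processing site to domains. The names `P66_DOMAINS` and `domain_of` are hypothetical, and, as noted above, other reports shift these boundaries slightly, so the table reflects this review's convention only.

```python
# Hypothetical lookup table of p66 domain boundaries, using the residue ranges
# given in this review (other reports differ by a few residues).
P66_DOMAINS = [
    ("finger-palm", 1, 236),
    ("thumb", 237, 318),
    ("connection", 319, 426),
    ("RNH", 427, 560),
]

# Active-site residues listed in the text.
POLYMERASE_SITE = [110, 185, 186]     # D110, D185, D186
RNH_SITE = [443, 478, 498, 549]       # D443, E478, D498, D549
PROCESSING_SITE = [440, 441]          # F440-Y441 (p51-RNH processing site)


def domain_of(residue: int) -> str:
    """Return the p66 domain/subdomain that contains a given residue number."""
    for name, start, end in P66_DOMAINS:
        if start <= residue <= end:
            return name
    raise ValueError(f"residue {residue} is outside p66 (1-560)")


if __name__ == "__main__":
    for r in POLYMERASE_SITE + RNH_SITE + PROCESSING_SITE:
        print(r, domain_of(r))
```

Running it places D110, D185, and D186 in finger-palm and both F440 and Y441 in RNH, i.e., the p51-RNH processing site lies inside the RNH domain rather than in an inter-domain linker.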
The p51 subunit in p66/p51 lacks the RNH domain, and the relative orientation of the finger-palm, thumb, and connection domains differs in the p51 subunit compared with the p66 subunit (Figure 1b). Although the finger-palm and connection domains in the p66 subunit interact with those in the p51 subunit, the domains in the p66 subunit are not arranged symmetrically with those in the p51 subunit (Figure 1c,d). For example, the residues that form the subunit interface in the finger-palm domain of p66 correspond to residues buried in the inner core of the finger-palm domain in the p51 subunit (Figure 1c). Similarly, some of the subunit interface residues in the connection domain of the p66 subunit interact with the finger-palm domain in the p51 subunit (Figure 1d). The conformational rearrangements leading to this p66/p51 conformation are structurally of interest and are discussed further below.
Figure 1. p66/p51 reverse transcriptase (RT) structure, highlighting (a) the domain orientation in the p66 subunit, (b) the domain orientation in the p51 subunit, and the relative orientation of (c) the two finger-palm domains and (d) the two connection domains in the p66 and p51 subunits. In panels (a,b), the bars below the structures indicate which domains are highlighted: finger-palm (blue), thumb (green), connection (yellow), and ribonuclease H (RNH) (orange). In panel (c), residues 10-16 and 86-95, which are at the subunit interface in the p66 subunit, are highlighted in red in both subunits. Similarly, in panel (d), residues 405-412, which are at the subunit interface in the p66 subunit, are highlighted in both subunits. The graphic presentation was made using VMD software [65] and the RT structure (PDB 1DLO [66]).
For RT maturation, HIV-1 PR cleaves one of the p66 subunits in p66/p66 between residues 440 and 441; the location of this cleavage site, often referred to as the "p51-RNH processing site", is significant since the RNH domain in p66 begins at residue 427 (Figure 2a). As such, the processing site, F440-Y441, is actually within the RNH fold and not in the domain linker. This is clearly evident in the RNH domain structure of the full-length heterodimer and in the structure of the isolated RNH protein: the terminal regions are exposed to solution, while the p51-RNH cleavage site of p66 is sequestered in a β-sheet within the RNH domain and is inaccessible to PR (Figure 2a) [18,19,44,67]. Previous backbone dynamics studies of an isolated RNH domain showed that residues F440-Y441 are rigidly folded in the protein core in solution [68,69]. In the mature protein, p66/p51, this cleavage site is also protected by the side chain of Y427; although the stretch of amino acids from Y427 to T439 appears to have no secondary structure elements, it interacts tightly with the RNH core, with the side chain of Y427 capping a pocket of the RNH core (Figure 2b) [70]. Evidence for a stabilizing effect of this region on the core comes from studies showing that an E438N or F440A mutation unfolds the RNH core (Figure 2c) [71]. Further, molecular dynamics (MD) simulations of wild-type and mutant RNH indicate the importance of the charge network at the processing site (Figure 2c) [71]. Although the F440 side chain must be exposed to solution for PR access, it is also needed to maintain the RNH core fold. Thus, in p66/p51, the p51-RNH processing site is protected, and understanding the structural basis of RT maturation requires an understanding of the structure of the precursor, p66/p66.
Figure 2. RNH domain structure, highlighting (a) the p51-RNH processing site (i.e., F440-Y441, red ribbon) and the active site (purple sticks), (b) the Y427 side chain (thick stick), and (c) the F440 cleavage site. In panels (b,c), the side chains of the surrounding residues are shown. The graphic presentation was made using VMD software [65] and the RT structures (PDB 3KK2 [72] or 1DLO [66]).
Gag-Pol proteins are processed into mature proteins in both the cell and the virus; however, the proteins found in the virus are matured within the virus [90][91][92][93][94]. Evidence for the latter comes from studies showing the accumulation of full-length Gag-Pol in the virus when the protein is packaged into the virus in the presence of a PR inhibitor or with an inactive PR mutant [90][91][92][93]. Further, when RT and/or IN are deleted from Gag-Pol, a delivery vehicle (i.e., a Gag-derived packaging signal or sequence) is required for their packaging into the virus (the enzymes alone are not packaged) [95].
The first event of Gag-Pol precursor processing is cleavage at the p2/NC site in Gag by a cis (intramolecular) PR reaction, while subsequent processing steps in the Pol region involve a trans (intermolecular) reaction [77,81,82,83,96]. In experiments on cells or viruses, p66 is produced before, or at about the same time as, p51 [37,92,97]. (Note that the processing order observed in in vitro transcription/translation experiments [81,82,96] differs from that observed in in vivo systems, presumably because of differences in protein dimerization and oligomerization steps, as well as molecular interactions with cellular factors.) Importantly, mutations in the RT region are known to affect Gag-Pol polyprotein maturation in the virion [92,98], suggesting that the p66 region is important for the maturation of the entire Pol protein. Although the order of PR processing at sites within the Pol region differs among reports, mutation studies and protein concentration studies consistently indicate that p66 homodimer formation is a prerequisite for p66/p51 formation [37,40,99].
These observations indicate that the immediate upstream steps to p66/p51 RT formation are the formation of p66/p66 and the cleavage of this homodimer to p66/p51 by PR (Figure 3a). The dissociation constant (KD) of p66/p66, ~4 µM, is ten times larger than that of p66/p51 [34,35,36,100], indicating that tight dimerization occurs after p51-RNH cleavage; this dimerization and the associated conformational change likely prevent cleavage of the remaining RNH domain in p66/p51. However, the structure of p66/p66 and the solution exposure of its uncleaved p51-RNH sites are unknown. We and others have tried to characterize the p66/p66 structure, yielding three structural models: an asymmetric homodimer with one unfolded RNH domain, based on solution NMR experiments of p66/p66, p66/p51, and p51 [70,101,102]; a symmetric homodimer with both RNH domains folded, based on solution NMR experiments of p66/p66 and p51 [100]; and an asymmetric homodimer with both RNH domains folded, based on electron spin resonance (ESR) experiments [103] (Figure 3b).
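Because binding free energy scales with the logarithm of KD, the ten-fold KD difference between p66/p66 and p66/p51 can be restated as a free-energy difference. The minimal sketch below assumes a 1 M standard state and 25 °C (both assumptions, not stated in the text) and takes the p66/p51 KD as 0.4 µM only because that value is implied by the quoted ~4 µM and the ten-fold ratio; the function names are hypothetical.

```python
import math

R_J_PER_MOL_K = 8.314462618  # molar gas constant, J/(mol*K)


def binding_free_energy_kj(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy dG = RT ln(KD / 1 M), in kJ/mol (negative = favorable)."""
    return R_J_PER_MOL_K * temp_k * math.log(kd_molar) / 1000.0


def ddg_from_kd_ratio_kj(kd_weak: float, kd_tight: float, temp_k: float = 298.15) -> float:
    """Free-energy penalty of the weaker complex relative to the tighter one, in kJ/mol."""
    return R_J_PER_MOL_K * temp_k * math.log(kd_weak / kd_tight) / 1000.0


kd_p66_p66 = 4e-6               # ~4 uM, quoted in the text
kd_p66_p51 = kd_p66_p66 / 10.0  # implied by the stated ten-fold ratio (assumption)

if __name__ == "__main__":
    print(f"p66/p66: {binding_free_energy_kj(kd_p66_p66):.1f} kJ/mol")
    print(f"p66/p51: {binding_free_energy_kj(kd_p66_p51):.1f} kJ/mol")
    print(f"ddG:     {ddg_from_kd_ratio_kj(kd_p66_p66, kd_p66_p51):.2f} kJ/mol")
```

A ten-fold change in KD corresponds to RT ln 10, about 5.7 kJ/mol (about 1.4 kcal/mol) at 25 °C, independent of the absolute KD values.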
While NMR data of a protein in solution reflect the ensemble average of its conformers on a certain time scale, ESR data obtained after flash freezing provide the ensemble of conformers as a set of snapshots, regardless of the time scale. Thus, the two folded-RNH models, whether or not conformational symmetry is present, may be consistent with each other. On the other hand, RNH unfolding requires an energy barrier to be crossed, because the folded form is protected, as discussed in the previous subsection (Figure 2). We will discuss the model differences in the last section of this review. Figure 3. (a) RT maturation process, i.e., the formation of p66/p66 and its processing to p66/p51, and (b) three homodimer structural models: an asymmetric homodimer with one unfolded RNH domain, proposed based on solution NMR experiments of p66/p66, p66/p51, and p51 [70,101,102]; a symmetric homodimer with both RNH domains folded, proposed based on solution NMR experiments of p66/p66 and p51 [100]; and an asymmetric homodimer with both RNH domains folded, proposed based on electron spin resonance (ESR) experiments [103].

NMR Strategy to Observe p66/p66
Solution NMR studies of large proteins, over 100 kDa, are technically challenging and of considerable interest in the NMR field [104][105][106]. The slowing of molecular tumbling in solution as molecular mass increases accounts for this challenge; severe line-broadening and signal overlap occur in the typical 1H-13C or 1H-15N correlation spectra obtained from large proteins uniformly labeled with 13C or 15N. Selective isotope labeling, often combined with deuteration, is a well-known strategy to reduce this signal overlap and line-broadening [107]. In addition, transverse relaxation optimized spectroscopy (TROSY)-based experiments increase the upper limit of molecular size that can be studied in solution by NMR [108][109][110][111]. In particular, methyl groups are highly sensitive NMR probes and are often used to detect NMR signals of large proteins in solution [105,112,113,114,115,116]. This is because the fast three-site jump of methyl groups significantly reduces the line-width of methyl NMR signals, making them easier to detect than signals with broad line-widths [117][118][119][120], and renders the three C-H vectors equivalent; the dipolar cross-correlation among these vectors naturally gives rise to the TROSY effect in HMQC experiments [110,111].
Taking advantage of methyl 1H and 13C signals, the London group pioneered efforts to assess the conformational differences among p51, p66, and p66/p51 RT by observing methionine methyl signals in HMQC experiments [121,122]. They initially identified the effect of NNRTI interaction on the p66 and p51 subunits of p66/p51 RT. Proteins for Met-ε methyl NMR can be prepared by expressing the protein in a minimal medium or a defined amino-acid medium containing methionine 13C-labeled at the ε position. This Met-ε methyl NMR approach is sensitive and has also been applied by other groups to investigate various NNRTI-bound forms of p66/p51 [123,124].
The London group next applied methyl NMR to RTs that were perdeuterated but 1H/13C-labeled at the isoleucine δ1 positions, mainly to understand the p66/p66 structure and maturation. Based on differences in their data at the start versus the end of a many-hour experiment, they derived a model in which p66/p66 forms a symmetric conformation but undergoes a slow conformational change, with a 6.5 h time constant, to an asymmetric form in which one RNH domain is unfolded ("asymmetric dimer & unfolded RNH" model in Figure 3b) [63,70,101,102,125]. In both the methionine methyl and isoleucine methyl studies, they integrated NMR results with their small-angle X-ray scattering (SAXS) data and/or MD simulation results [63,70,101,102,125].
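To put the reported 6.5 h time constant in perspective, the sketch below assumes, for illustration only, that the symmetric-to-asymmetric change is a single-exponential (first-order) process, so that the converted fraction is 1 - exp(-t/τ). The text does not state the kinetic model behind the time constant, and the function name is hypothetical.

```python
import math


def fraction_converted(t_hours: float, tau_hours: float = 6.5) -> float:
    """Fraction converted after t hours, assuming a single exponential with time constant tau."""
    return 1.0 - math.exp(-t_hours / tau_hours)


TAU_H = 6.5                        # time constant quoted for the symmetric -> asymmetric change
HALF_TIME_H = TAU_H * math.log(2)  # time at which half of the molecules have converted

if __name__ == "__main__":
    for t in (1.0, 6.5, 12.0, 24.0, 40.0):
        print(f"{t:5.1f} h: {fraction_converted(t, TAU_H):.3f} converted")
    print(f"half-time: {HALF_TIME_H:.2f} h")
```

Under this assumption, the half-time is about 4.5 h and roughly 99.8% of the molecules would have converted within 40 h, the period over which the p66/p66 Ile-δ1 spectrum described below remained unchanged in the absence of PR. Sample conditions differ between laboratories, so this is a comparison of time scales rather than a direct test of either model.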
Our efforts to understand the p66/p66 homodimer conformation in solution began with a comparison of 1H-15N NMR spectra of the p66 protein (residues 1 to 560) with those of the isolated thumb domain (residues 237 to 318), the RNH domain (residues 427 to 560), and p51 (residues 1 to 427) (Figure 4a) [100]. Generally, comparing NMR spectra of isolated domains with those of an entire protein is a useful and practical strategy for studying large proteins that undergo domain motion [105,126,127]. Although it was impossible to detect all signals of the 132 kDa homodimer by 1H-15N TROSY-HSQC, we were surprised by how many signals were observed in the p66 spectrum [100]. The p66 spectral pattern was similar to that of p51, which homodimerizes with a KD of ~0.3 mM, far more weakly than p66/p66 (KD ~4 µM, which is itself approximately ten-fold weaker than that of p66/p51) [34,35,36,100]. Superimposing the p66 and p51 spectra with those of the isolated thumb and RNH domains indicated a high degree of similarity among the spectra [100]. Dimerization of p66 (>80%) was confirmed by size-exclusion chromatography with multi-angle light scattering (SEC-MALS). Given that only one set of resonances appeared to be present in the p66 spectrum, we assumed that the p66 monomer-dimer equilibrium is in fast exchange on the chemical shift time scale. On this assumption, the similarity of the spectral patterns among the isolated domains, p51, and p66 indicated independent domain motions of the thumb and RNH domains in p66/p66. Based on these observations, we proposed a "symmetric dimer & folded RNHs" model of the p66/p66 conformation (Figure 3b).
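The fast-exchange assumption above implies that each residue gives a single peak at the population-weighted average of its monomer and dimer chemical shifts. The sketch below combines that average with the standard monomer-dimer mass balance (2M ⇌ D, KD = [M]²/[D]) to show how the dimer fraction depends on KD. The 100 µM subunit concentration and the 8.00/8.20 ppm shifts are hypothetical values chosen for illustration, not data from the text, and the function names are likewise hypothetical.

```python
import math


def dimer_fraction(total_subunit_m: float, kd_m: float) -> float:
    """Fraction of subunits in the dimer for 2M <-> D, with KD = [M]^2 / [D].

    total_subunit_m is the total subunit concentration Ct = [M] + 2[D], in molar.
    """
    # Mass balance gives 2[M]^2 + KD*[M] - KD*Ct = 0; take the positive root for [M].
    free_m = (-kd_m + math.sqrt(kd_m**2 + 8.0 * kd_m * total_subunit_m)) / 4.0
    return 1.0 - free_m / total_subunit_m


def fast_exchange_shift(p_dimer: float, shift_monomer: float, shift_dimer: float) -> float:
    """Single, population-weighted peak position expected in the fast-exchange limit."""
    return (1.0 - p_dimer) * shift_monomer + p_dimer * shift_dimer


if __name__ == "__main__":
    c_total = 100e-6  # hypothetical 100 uM subunit concentration (not from the text)
    for name, kd in (("p66/p66", 4e-6), ("p51/p51", 0.3e-3)):
        p = dimer_fraction(c_total, kd)
        # Hypothetical monomer and dimer 1H shifts of 8.00 and 8.20 ppm for one amide.
        shift = fast_exchange_shift(p, 8.00, 8.20)
        print(f"{name}: dimer fraction {p:.2f}, observed shift {shift:.3f} ppm")
```

With these assumed inputs, about 87% of p66 subunits but only about 31% of p51 subunits would be dimeric, so, in fast exchange, the p51 peaks would lie much closer to their monomer positions.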
The limited sensitivity of the 1H-15N NMR experiments prompted us to follow up using the strategy of the London group, i.e., detection of Ile-δ1 methyl signals of p66/p66 (Figure 4b) [128]. We recorded 1H-13C HMQC spectra of Ile-δ1 methyl signals of p66/p66 in the inhibitor-free and NNRTI-bound forms, with and without tRNALys3 [128], a known primer for reverse transcription that interacts with RT together with the viral RNA [129][130][131]. In the absence of PR, we observed a stable p66/p66 spectrum that did not change over a period of 40 h [128]. Particularly noteworthy was the absence of a time-dependent increase in the random-coil signal of the Ile-δ1 methyl spectrum. Even upon titration of an NNRTI that enhances homodimer formation (described below), the random-coil signal did not increase, and only slight shifts of methyl signals in the RNH domain were observed, suggesting that the RNH domains were folded in both the apo and NNRTI-bound forms [128]. In the presence of tRNALys3, either with or without NNRTI, one RNH domain exhibited an altered conformation or experienced a different chemical environment.
RT maturation experiments in vitro (i.e., monitoring the proteolysis of p66/p66 to p66/p51 by PR) showed that p66/p66 in the presence of tRNALys3, with or without NNRTI, is matured more efficiently [128], correlating the states observed by NMR with their processing efficiency. We also utilized information from our 19F NMR data at position 181 in p66/p66 and p66/p51 in the absence or presence of NNRTI (Figure 4c) [132,133]. The data suggested a 1:1 NNRTI:p66/p66 binding stoichiometry and conformational similarity between p66/p66 and p66/p51 at residue 181.
Finally, we compared 1H-13C HMQC spectra of Ile-δ1 methyl signals of p66/p66 in the various ligand-bound states [128] and correlated these with the efficiency of p66/p51 production in the presence of active PR under similar conditions [40], with the aim of identifying which conformation enhances RT maturation. The production of p66/p51 by PR was enhanced in the presence of tRNALys3, regardless of whether NNRTI was present. As mentioned above, significant chemical shift changes of one RNH domain were also observed in the presence of tRNALys3, regardless of NNRTI binding. Conversely, the slow PR-catalyzed production of p66/p51 in the absence of tRNALys3 is consistent with the conclusion that both RNH domains are folded in the unbound homodimer. Altogether, these data consistently support a "symmetric dimer & folded RNHs" model (Figure 3b). The significance of the p66/p66 structure models is discussed below.

Characteristics of Methyl vs Amide NMR When Domain Motion Exists
The Ile-δ1 and Met-ε methyl HMQC experiments, as discussed thus far, are highly sensitive and, therefore, informative when characterizing large proteins in solution [105,112,113,114,115,116,121,122], owing to the fast three-site jump of methyl groups [117][118][119][120] and the TROSY effect, which is a consequence of methyl C-H dipolar cross-correlations [110,111]. Ile-δ1 methyl HMQC signals have been observed in both rigid and mobile regions of large proteins. In contrast, our initial 1H-15N HSQC experiments, in which many 1H-15N cross peaks were observed, suggested more domain motion [70,100] than our Ile-δ1 methyl HMQC spectra did. Although the intrinsic sensitivities of 1H-13C methyl-TROSY and 1H-15N TROSY experiments differ, the effect of domain motion on effective correlation times is expected to be similar for the two experiments.
Here, we discuss three possibilities that could cause differences in the "apparent sensitivity" to domain motion between 1H-13C methyl-TROSY and 1H-15N TROSY experiments. First, Ile-δ1 methyl groups undergo more internal rotational dynamics than backbone amides [134] and are thus expected to be less sensitive to domain motion or overall molecular tumbling. Second, since the 1H spin-flip rate increases as the rotational correlation time increases, domain motion may significantly decrease 1H spin flips among the residual 1Hs in a perdeuterated protein and thus enhance 1H-15N TROSY effects [135,136]. Third, magnetization recovery during the pulse repetition time may also affect the overall sensitivity of the spectra. To address this third point, we show plots of the proton longitudinal relaxation time (T1) with fast methyl rotation (Figure 5a) and without fast methyl rotation (Figure 5b). The latter mimics the case of the amide proton T1; however, we did not directly compare this result with calculated amide proton relaxation rates in deuterated proteins, because those T1 values vary greatly depending on the level of deuteration and on the individual site [137,138].
In the calculations, a model-free spectral density function with an extended fast motion was assumed [140,141], with a generalized order parameter (Ss²) of 0.8 and a correlation time for internal motion of 50 ps for a rigid molecule (solid lines) or 4 ns for one that undergoes domain motion (dashed lines). In the calculation in panel (a), an order parameter for fast methyl motion (Sf²) of 0.25 was assumed for methyl protons [117,139]. In the calculation in panel (b), an Sf² of 0.9 was assumed to provide a relaxation sink.
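The calculations described above rely on an extended model-free spectral density. As a generic sketch, and not the calculation used for Figure 5 (whose relaxation expression, magnetic field, proton geometry, and fast-motion correlation time are not given here), the function below evaluates the textbook extended Lipari-Szabo form, in which overall tumbling (τm) is combined with a fast internal motion (order parameter Sf², time τf) and a slower internal motion (Ss², τs). The 600 MHz field and the 10 ps value of τf are assumptions; Ss² = 0.8, τs = 50 ps or 4 ns, and Sf² = 0.25 or 0.9 follow the parameters quoted above; the function name is hypothetical.

```python
import math


def extended_model_free_j(omega, tau_m, s2_f, s2_s, tau_s, tau_f):
    """Extended model-free spectral density J(omega), in seconds per radian.

    Assumed correlation function (isotropic tumbling, independent motions):
      C(t) = (1/5) exp(-t/tau_m) [S2 + (1 - S2_f) exp(-t/tau_f) + (S2_f - S2) exp(-t/tau_s)]
    with S2 = S2_f * S2_s. Fourier transformation gives a sum of three Lorentzians.
    """
    def lorentzian(tau):
        return tau / (1.0 + (omega * tau) ** 2)

    def effective(tau_internal):
        # Internal decay combined with overall tumbling: 1/tau' = 1/tau_i + 1/tau_m.
        return tau_internal * tau_m / (tau_internal + tau_m)

    s2 = s2_f * s2_s
    return 0.4 * (
        s2 * lorentzian(tau_m)
        + (1.0 - s2_f) * lorentzian(effective(tau_f))
        + (s2_f - s2) * lorentzian(effective(tau_s))
    )


if __name__ == "__main__":
    omega_h = 2.0 * math.pi * 600e6  # assumed 600 MHz 1H Larmor frequency, rad/s
    tau_f = 10e-12                   # hypothetical fast-motion time (not given in the text)
    tau_m = 100e-9                   # ~100 ns overall correlation time for p66/p66
    for s2_f in (0.25, 0.9):         # Sf2 values quoted for panels (a) and (b)
        for tau_s in (50e-12, 4e-9): # internal times quoted for "rigid" vs domain motion
            j1 = extended_model_free_j(omega_h, tau_m, s2_f, 0.8, tau_s, tau_f)
            j2 = extended_model_free_j(2.0 * omega_h, tau_m, s2_f, 0.8, tau_s, tau_f)
            print(f"Sf2={s2_f}, tau_s={tau_s:.0e} s: J(wH)={j1:.2e}, J(2wH)={j2:.2e}")
```

Once ωHτm >> 1, as for a ~100 ns correlation time, the overall-tumbling term at ωH falls off roughly as 1/(ωH²τm), so J(ωH) and J(2ωH) are governed mainly by the internal-motion terms; this is why internal motions strongly influence the spectral densities relevant to proton T1 in large proteins.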

Integration of Multiple NMR Strategies
In addition to 1H-15N TROSY-HSQC and Ile-δ1 methyl HMQC, we also utilized 19F NMR data of a protein labeled with 19F at a single site to gain site-specific information. In these 19F NMR experiments, a 19F NMR probe, 4-trifluoromethyl-phenylalanine (tfmF), was site-specifically introduced using an orthogonal aminoacyl-tRNA synthetase/tRNA pair [132,142,143]. Generally, the combination of a single-site label, which prevents resonance overlap, and the excellent sensitivity of 19F NMR (83% of that of 1H) enables the detection of changes in the chemical or structural environment at a specific site in a large protein [132,133]. Such an application of 19F NMR is selective, in that one can choose the location of the label, and is orthogonal to other NMR methods. The latter is important for validating observations from 1H-15N TROSY-HSQC and Ile-δ1 methyl HMQC NMR. Similarly, the London group's pioneering RT NMR work used multiple methods to validate observations, including methionine methyl NMR [121,122], Ile-δ1 methyl NMR [70,101,102], and 1H-15N NMR [125]. Together, these studies demonstrate the importance of multinuclear NMR approaches, as well as other methodologies, which we describe further below.
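The 83% figure for 19F relative to 1H follows from the cube of the ratio of their gyromagnetic ratios at a fixed magnetic field, for equal numbers of spin-1/2 nuclei under otherwise identical conditions. The sketch below checks this with rounded tabulated gyromagnetic ratios and also computes Larmor frequencies at an assumed 14.1 T field; the function names are hypothetical.

```python
import math

# Gyromagnetic ratios in rad s^-1 T^-1 (standard tabulated values, rounded).
GAMMA_1H = 267.522e6
GAMMA_19F = 251.815e6


def relative_sensitivity(gamma: float, gamma_ref: float = GAMMA_1H) -> float:
    """Intrinsic sensitivity relative to a reference spin-1/2 nucleus at the same field.

    Assumes equal spin quantum number and an equal number of nuclei, so the
    ratio reduces to (gamma / gamma_ref) ** 3.
    """
    return (gamma / gamma_ref) ** 3


def larmor_mhz(gamma: float, b0_tesla: float) -> float:
    """Larmor frequency nu = gamma * B0 / (2 pi), in MHz."""
    return gamma * b0_tesla / (2.0 * math.pi) / 1e6


if __name__ == "__main__":
    print(f"19F sensitivity relative to 1H: {relative_sensitivity(GAMMA_19F):.3f}")
    b0 = 14.1  # tesla; an assumed example field (roughly a "600 MHz" magnet)
    print(f"1H: {larmor_mhz(GAMMA_1H, b0):.1f} MHz, 19F: {larmor_mhz(GAMMA_19F, b0):.1f} MHz")
```

The ratio evaluates to about 0.834, and at 14.1 T the 19F frequency is roughly 94% of the 1H frequency (about 565 MHz versus about 600 MHz).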

NMR and Other Methodologies
As initially mentioned, the "asymmetric dimer & unfolded RNH" model of p66/p66 was proposed by the London group [63,70,101,102,125], while we proposed the "symmetric dimer & folded RNHs" model [100,128] (Figure 3b). Since both models are based on NMR observations, a consideration of other model-validating methodologies is important to understand what was actually observed.
In our case, we compared conformational states observed by NMR [128] with the efficiency of RT maturation by PR [40]. Production of p66/p51 by PR was slow in both the presence and absence of NNRTI, compared to those in the presence of tRNA Lys3 [40]. Such a difference in RT maturation rate is consistent with the structural model in which the RNH domains are folded in the absence of nucleic acid (Figure 6a) and even in the presence of NNRTI (Figure 6b). Upon tRNA Lys3 interaction, the p66/p66 NMR chemical shifts broadened or changed position for one of the two RNH domains [128]. Since our processing assay indicated that tRNA Lys3 enhances RT maturation [40], we concluded that conformation of p66/p66 in complex with tRNA Lys3 is a form in which one RNH domain is efficiently cleaved by PR (Figure 6c). In addition, we made sure that our p66 did not contain nucleic protons [117,139]. In the calculation in panel (b), S f 2 at 0.9 was assumed to provide a relaxation sink.
The rotational correlation time is estimated at ~100 ns for the p66/p66 homodimer (the right edge of the graph in Figure 5). Our calculation shows that the proton magnetization recovery time is significantly shorter for methyl protons that undergo fast methyl rotation than for non-methyl protons (Figure 5), whose rotational correlation time is that of the protein. For proteins with a rotational correlation time larger than ~ns, the methyl 1H T1 is almost independent of the rotational correlation time of the protein because of the fast methyl three-site jump [119,120,139] (Figure 5a). The calculation also shows that the T1 of non-methyl protons is large at a ~100 ns rotational correlation time but is significantly reduced when there is domain motion (Figure 5b). Taken together, multiple factors affect the apparent sensitivity changes caused by domain motion in 1H-13C methyl-TROSY and 1H-15N TROSY experiments.
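The contrast between methyl and non-methyl protons can be illustrated with a deliberately simplified model. It treats an isolated, like-spin 1H-1H pair relaxing by the dipolar interaction, R1 = (3/10)d²[J(ω) + 4J(2ω)], using the simple (not extended) Lipari-Szabo spectral density. This is a sketch under stated assumptions, not the calculation behind Figure 5. It omits the slow internal motion and the relaxation sink described for that figure, as well as spin diffusion. The 1.78 Å H-H distance, the 800 MHz 1H frequency, and the 10 ps methyl rotation time are illustrative assumptions. Only S² = 0.25 for the methyl H-H vector is taken from the text.

```python
import math

MU0_OVER_4PI = 1e-7           # mu0 / (4 pi), T^2 m^3 J^-1
HBAR = 1.054571817e-34        # reduced Planck constant, J s
GAMMA_H = 2.6752218744e8      # 1H gyromagnetic ratio, rad s^-1 T^-1


def lipari_szabo_j(omega: float, tau_c: float, s2: float, tau_e: float) -> float:
    """Simple Lipari-Szabo spectral density (without the 2/5 prefactor)."""
    tau_eff = tau_e * tau_c / (tau_e + tau_c)   # 1/tau' = 1/tau_c + 1/tau_e
    slow = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    fast = (1.0 - s2) * tau_eff / (1.0 + (omega * tau_eff) ** 2)
    return slow + fast


def proton_t1(tau_c: float, r_m: float, larmor_hz: float,
              s2: float = 1.0, tau_e: float = 0.0) -> float:
    """Non-selective T1 (s) of an isolated, like-spin 1H-1H pair.

    R1 = (3/10) d^2 [J(w) + 4 J(2w)], with d = (mu0/4pi) hbar gamma_H^2 / r^3.
    s2 = 1 (the default) describes a rigid pair with no internal motion.
    """
    d = MU0_OVER_4PI * HBAR * GAMMA_H**2 / r_m**3
    w = 2.0 * math.pi * larmor_hz

    def j(om: float) -> float:
        return lipari_szabo_j(om, tau_c, s2, tau_e)

    r1 = 0.3 * d**2 * (j(w) + 4.0 * j(2.0 * w))
    return 1.0 / r1


R_HH = 1.78e-10     # assumed geminal 1H-1H distance (m)
LARMOR = 800e6      # assumed 1H Larmor frequency (Hz)
TAU_METHYL = 10e-12 # hypothetical methyl rotation time (s)
for tau_c_ns in (25, 50, 100):
    tau_c = tau_c_ns * 1e-9
    methyl = proton_t1(tau_c, R_HH, LARMOR, s2=0.25, tau_e=TAU_METHYL)
    rigid = proton_t1(tau_c, R_HH, LARMOR)
    print(f"tau_c = {tau_c_ns:3d} ns: methyl T1 = {methyl:6.2f} s, rigid T1 = {rigid:7.1f} s")
```

Once ωτc ≫ 1, the overall-tumbling term falls off roughly as 1/(ω²τc). The rigid-pair T1 therefore grows about linearly with τc, whereas the fast-rotation term sets a nearly τc-independent floor for the methyl pair. The absolute rigid-pair values come out unrealistically long because an isolated pair has no other relaxation pathways.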

Integration of Multiple NMR Strategies
In addition to 1H-15N TROSY-HSQC and Ile-δ1 methyl HMQC, we also utilized 19F NMR data of a 19F single-site-labeled protein to gain site-specific information. In these 19F NMR experiments, a 19F NMR probe, 4-trifluoromethyl-phenylalanine (tfmF), was site-specifically introduced using an orthogonal tRNA/tRNA synthetase pair [132,142,143]. Generally, a single-site label prevents resonance overlap, and 19F NMR has excellent sensitivity (83% of that of 1H). Together, these properties enable the detection of changes in the chemical or structural environment at a specific site in a large protein [132,133]. Such an application of 19F NMR is selective, in that one can choose the specific location of the label. It is also orthogonal to other NMR methods, which is important for validating observations from 1H-15N TROSY-HSQC and Ile-δ1 methyl HMQC NMR. Similarly, the London group's pioneering RT NMR work used multiple methods to validate observations. These included methionine methyl NMR [121,122] and Ile-δ1 methyl NMR [70,101,102], as well as 1H-15N NMR [125]. Together, these studies demonstrate the importance of multi-nuclear NMR approaches, as well as other methodologies, which we describe further below.
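The 83% figure follows from the standard scaling of intrinsic NMR sensitivity with the cube of the gyromagnetic ratio. This holds for spin-1/2 nuclei at the same field and with the same number of nuclei. The following is a minimal sketch that assumes full isotopic enrichment, as for a 19F label or uniform 13C/15N labeling, and ignores relaxation and probe hardware. The γ values are standard tabulated constants.

```python
# Gyromagnetic ratios in units of 10^6 rad s^-1 T^-1 (standard tabulated values).
GAMMA = {
    "1H": 267.522,
    "19F": 251.815,
    "13C": 67.283,
    "15N": -27.116,
}


def relative_sensitivity(nucleus: str, reference: str = "1H") -> float:
    """Sensitivity of `nucleus` relative to `reference` for spin-1/2 nuclei.

    Assumes the same magnetic field and the same number of fully enriched nuclei,
    so the ratio scales as |gamma / gamma_ref|^3; relaxation and hardware are ignored.
    """
    return abs(GAMMA[nucleus] / GAMMA[reference]) ** 3


for nucleus in ("19F", "13C", "15N"):
    print(f"{nucleus}: {100 * relative_sensitivity(nucleus):.1f}% of 1H")
```

This gives about 83% for 19F. By comparison, 13C and 15N give roughly 1.6% and 0.1% of 1H, respectively.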

NMR and Other Methodologies
As mentioned initially, the "asymmetric dimer & unfolded RNH" model of p66/p66 was proposed by the London group [63,70,101,102,125], whereas we proposed the "symmetric dimer & folded RNHs" model [100,128] (Figure 3b). Since both models are based on NMR observations, validation by other methodologies is important to understand what was actually observed.
In our case, we compared the conformational states observed by NMR [128] with the efficiency of RT maturation by PR [40]. Production of p66/p51 by PR was slow both in the presence and in the absence of NNRTI, compared with that in the presence of tRNALys3 [40]. This difference in RT maturation rate is consistent with the structural model in which the RNH domains are folded in the absence of nucleic acid (Figure 6a) and even in the presence of NNRTI (Figure 6b). Upon tRNALys3 interaction, the p66/p66 NMR signals of one of the two RNH domains broadened or shifted [128]. Our processing assay indicated that tRNALys3 enhances RT maturation [40]. We therefore concluded that, in the p66/p66-tRNALys3 complex, p66/p66 adopts a conformation in which one RNH domain is efficiently cleaved by PR (Figure 6c). In addition, we confirmed that our p66 samples were free of nucleic acid contamination, based on the 254/280 absorbance ratio [40]. We also confirmed the absence of protease contamination by running SDS-PAGE on the NMR samples before and after the experiments to validate the species observed [128].
The London group's model was strongly supported by their NMR data and by other structural biology data, such as SAXS and MD simulations. Their data explained the structural changes by which the p66/p66 conformation became p66/p51-like, likely crossing a high-energy barrier from a folded to an unfolded RNH [63,70,101,102,125]. Their studies did not report an assay of actual p66/p51 production or a biochemical characterization of the samples.
Our group and the London group used almost identical p66 amino acid sequences. The different observations may, therefore, be explained by a difference in the relative domain orientations of the p66/p66 proteins. Such a difference could have been introduced at the initial folding step during protein production or by minor cofactors present (or absent) in the samples. In particular, we wonder whether their p66/p66 would mature to p66/p51 or to p66/p5Xs, where p5X denotes a species with a molecular mass between 51 kDa and 66 kDa. In addition, if the RNH were fully unfolded, as suggested by their model, then PR-mediated processing would produce p51 together with intermediate products with lengths between those of p51 and p66. Indeed, under high-salt conditions without tRNALys3, we observed PR-proteolyzed products migrating between the p51 and p66 bands on SDS-PAGE [40]. However, such intermediate species are not detected in the virus [37,90,98,144]. Thus, we favor a model in which the RNH domains in p66/p66 are folded and one RNH is destabilized to enable specific cleavage at the p51-RNH site. Structural studies of the Pol region, in addition to isolated p66, may further clarify the mechanism of RT maturation.

Conclusions
This article reviewed recent work on the structural characterization of the large, multi-domain HIV-1 RT precursor, p66/p66, using multiple NMR probes and other biochemical and biophysical methodologies. We argue that inconsistent observations can be resolved using orthogonal approaches. Given that multiple conformers may co-exist in multi-domain proteins, we conclude that such integrated methodologies are critical for assessing the validity of the models generated.
Funding: This research was funded by National Institutes of Health, USA, grant number P50AI150481.
