A New Approach to 3D Modeling of Inhomogeneous Populations of Viral Regulatory RNA

Tertiary structure (3D) is the physical context of RNA regulatory activity. Retroviruses are RNA viruses that replicate through the proviral DNA intermediate transcribed by hosts. Proviral transcripts form inhomogeneous populations due to variable structural ensembles of overlapping regulatory RNA motifs in the 5′-untranslated region (UTR), which drive RNAs to be spliced or translated, and/or dimerized and packaged into virions. Genetic studies and structural techniques have provided fundamental input constraints to begin predicting HIV 3D conformations in silico. Using SimRNA and sets of experimentally-determined input constraints of HIVNL4-3 trans-activation responsive sequence (TAR) and pairings of unique-5′ (U5) with dimerization (DIS) or AUG motifs, we calculated a series of 3D models that differ in proximity of 5′-Cap and the junction of TAR and PolyA helices; configuration of primer binding site (PBS)-segment; and two host cofactors binding sites. Input constraints on U5-AUG pairings were most compatible with intramolecular folding of 5′-UTR motifs in energetic minima. Introducing theoretical constraints predicted metastable PolyA region drives orientation of 5′-Cap with TAR, U5 and PBS-segment helices. SimRNA and the workflow developed herein provides viable options to predict 3D conformations of inhomogeneous populations of large RNAs that have been intractable to conventional ensemble methods.

In-solution studies have established the dimer-prone secondary structure (U5-AUG model) exists in thermodynamic equilibrium with U5 pairing DIS (U5-DIS model) [4]. U5-DIS pairings reorient metastable AUG and CES sequences into branched multiple hairpins that characterize the monomer 5 -UTR (non-dimer prone) [4,8]. Nucleotide pairings favoring the dimer-prone 5 -UTR conformation or monomer conformation have been examined in-solution. Substitutions in DIS or destabilizing U5-AUG pairings diminish formation of dimers in synthetic RNAs and or RNA preparations from cells and virions [21].
Select base substitutions that destabilize U5-AUG and unpair adjacent PolyA nts significantly upregulate the HIV RNA translation rate in infected lymphocytes, demonstrating monomer structure significantly affects activity [22]. Unpaired PolyA nts have been shown to influence the structural topology of the 5 -cap site and modulate engagement by host cap-binding proteins [6,23]. The phylogenetic conservation of PolyA indicates that experimentally changing these nt-nt pairings may promote maximum viral translation efficiency, but does not necessarily maintain optimal efficiency of other structural motifs within the viral 5 -UTR [22].
3D models can provide a testing ground to interrogate base substitutions, including those that destabilized U5-AUG and PolyA stem to significantly upregulate HIV translation rate [28]. Current computational methods have the ability to predict 3D RNA conformations within small parameter space (<160 nt) or within larger molecules given biologically-determined input constraints [3,29,30]. Tertiary structure prediction tools, such as SimRNA have been compared and validated in the RNA Puzzles experiments [31][32][33]. SimRNA begins with known structural constraints in PDB format that have been derived from chemical and enzymatic mapping, crystallography, NMR, small angle X-ray scattering (SAXS), or Förster resonance energy transfer (FRET) and optimally, are validated by mutagenesis and biological assays. Structural information validated by genetic studies has been published of progressively larger fragments of HIV 5 -UTRs, providing a solid foundation of 2D input constraints for 3D modeling [4,6,9,[12][13][14][15][16][17][18][19][20]34,35]. 3D models have the potential to begin to guide experiments characterizing contiguous HIV 5 -UTRs in large 3D ensembles.
Herein we used SimRNA to perform simulations of the contiguous HIV 5 -UTR starting from published 2D pairings of TAR, PolyA, CES and U5-DIS (monomer) or dimer-prone U5-AUG (herein designated dimer). SimRNA evaluated the positive predictive value (PPV) of the input constraints and the sensitivity of output models. Input constraints were varied based on experimentally determined metastable U5 pairings and theoretical variables. Perspective on the capability of SimRNA to simulate HIV RNA interactions with centroid secondary structure, or no secondary structure constraint was also developed.

Materials and Methods
The HIV-1 NL4-3 5 -UTR primary sequence was downloaded from NCBI https://www.ncbi.nlm. nih.gov/nuccore/AF324493 4 September 2015 and the 5 -UTR starting with two 5 guanosine residues (Cap + G-356 nt) was studied using the SimRNA program [32,33]. The input to SimRNA was primary sequence and experimentally-determined 2D structural constraints provided in the dot-bracket language [36]. Variables in the input constraints were designated Monomer (U5-DIS model), Dimer (dimer-prone, U5-AUG model) and centroid WT with/out PolyA unpaired [4,37]. The specific parameters for the SimRNA calculations were 30 million iterations with 10 replica exchanges, for a total of 300 million iterations. Each SimRNA run required on order 60 h of clock time on the Pitzer Cluster at The Ohio State Supercomputer Center. SimRNA provided outputs in dot-bracket language, 3D images and PDB files. Agreement between input restraints and output base pairings were benchmarked for sensitivity and positive-predictive values. Input and Output nt-nt pairings were compared in the dot-bracket language, converted to Circle plots by CircleCompare (https://rna.urmc.rochester.edu/ RNAstructureWeb/Servers/CircleCompare.html, 1 December 2018) and traditional 2D sequence models were drawn in Adobe Photoshop 6. Visual Molecular Dynamics software package (VMD 1.9.3) [38] produced static views and movies of the Output models.

SimRNA Evaluated 3D Properties of HIV NL4-3 5 -UTR Beginning with 5 -Capped-Guanosine
All simulations used the same HIV NL4-3 primary sequence of 356 nt commencing with guanosine residues (5 -Cap + G). The workflow relied upon experimentally determined 5 -UTR conformers ( Figure 1A). Perspective on the capability of SimRNA to simulate HIV RNA was developed by comparing in silico constraints of centroid 2D or no secondary structure (noSS) inputs. The dimer-prone (herein designated Dimer) and Monomer 2D models ( Figure 1B,C) were previously determined by analysis of HIV RNA fragments [4,6,9,37]. The color scheme for each 5 -UTR conformation visualized TAR, light blue; PolyA stem, navy; U5 stem, cyan; PBS, green; DIS stem, orange ( Figure 1B). Input constraints were provided to SimRNA in Vienna dot-bracket language showing unpaired bases (dot) in relation to paired bases (brackets) ( Figure 1D). SimRNA output the medoid of the three most populated clusters of nt-nt pairings having the lowest 2% of free energy (Cluster 1, 2 or 3). Medoid of the largest cluster is the most representative tertiary conformation engendered by the input restraints (Cluster 1) and was displayed using VMD software and dot-bracket language according to the workflow ( Figure 1A).
SimRNA Output from the Dimer and Monomer Input restraints generated similar numbers of objects in the top 3 clusters, as expected for identical primary sequences (Table S1). The dot-bracket language readily visualized the position of each difference between the input and output ( Figure 1D). Dimer exceeded Monomer in the number of nt-nt pairings in common between the Input restraints and Output medoid (95% and 74%, respectively) (sensitivity, Table 1). Dimer Output also exceeded Monomer in the number of nt-nt pairings retained in 3D Output model that are in the Input experimentally-determined constraints (PPV, Table 1). These measurements indicated the dimer-prone U5-AUG pairings were energetically compatible with intramolecular folding of metastable regions between helices trapped in energetic minima.

Dimer Output 3D Model Predicted TAR-PolyA-U5 Converge near the 5 -Cap
The Dimer 3D tertiary prediction placed TAR (light blue) and PolyA (navy) helices stacking on one another ( Figure 2A; Movie 1). These results agree with published results of chemical mapping [8,10,39,40] and SAXS [34,35]. The 5 -Cap (magenta space-filling atom) lay at the junction of TAR and PolyA helices and the base of the U5 stem near G104-U105. In Figure 2B, the 3D Output focused on the U5 nt (cyan atoms) pairing with AUG (red atoms), matching the input constraint. 5 -Cap was stacked on G104-U105 (white and yellow atoms) at the junction of TAR and PolyA, and AUG was in close proximity, albeit paired with U5.
In Figure 2C, the 3D Output focused on the green PBS-segment (nt 180-225) and that DIS was surface-exposed (see yellow atoms). The 5 -splice site (ss) lay between helices formed by nts 300-330 of CES (Movie 1) [37]. To emphasize the differences in nt-nt pairings between input and output, Dimer Output pairings were displayed in a Circle plot and transcribed into the traditional secondary structure format ( Figure 3). Since 96% of the Input constraints maintained (Table 1), the ensemble exhibited minimal change from the Input nt-nt pairings provided for Dimer ( Figure 2B). The 3D model predicted stacking of G104-U105 that had not been apparent in 2D Input constraints. The model predicted the PBS-segment (nt 134-224) containing the double-stranded primer activation signal (PAS) [41] and the tRNA-like element (TLE) at the apex of helices formed of nt 134-179 [42] and, at the base of the helices, the double-stranded RNA binding site of DHX9/RNA helicase A (RHA) [28].

Monomer Output 3D Model Altered Accessibility of 5 -Cap and 5 -ss and Reoriented U5 Stem
The Monomer 3D model identified TAR and PolyA helices, but in a different orientation than Dimer ( Figure 4A; Movie 2). Whereas Dimer modeled coaxial arrangement of the TAR-PolyA helices, Monomer modeled V-shape due to differences in stacking interactions. Figure 4B focused on the close proximity of 5 -Cap, G104, U105 and DIS (orange atoms). Helices formed of U5-DIS pairs and downstream orange nt (e.g., 256-260) showed the 5 -Cap intercalated between these helices (compare Figure 4A,B; Movie 2). Figure 4C focused on the 5 ss at the apex of a stem loop. Gag AUG was paired close to the 3 -terminus of the 5 -UTR. The models depict TAR-PolyA-U5 stems intersect around the 5 -Cap ( Figure 4A). Notably, only 74% of the Input nt-nt pairings maintained in Monomer Output ( Table 1). The lower positive predictive value of Monomer versus Dimer (74% versus 87%, Table 1) suggested U5-DIS helices significantly changed intramolecular folding of 5-UTR relative to Dimer.
Circle plots compared the different nt pairings between Monomer Input restraints and Monomer Output (Supplementary Figure S1) and were used to generate the traditional secondary structure drawing ( Figure 5). The nt-nt pairings of TAR and PolyA maintained in the 3D Output model, but new pairings encompassed the U5 stem (cyan) through the PBS-segment (green) (compare Figure 5 with Figure 1C), consistent with the long-distance interaction model of Huthoff and Berkhout [8]. The 3D Monomer model eliminated the nt-nt pairings of PAS [41], TLE [42] and the binding site of DHX9/RHA [28]. We concluded tertiary interactions drove rearrangement of metastable Monomer Input restraints in silico. The Output model predicted three elements important to reverse transcription were incompatible with U5-DIS pairings.

The Unpairing of PolyA Nts Reduced Local Energy Minima in the Thermodynamic Equilibrium to Monomer Tertiary Structure
Unpaired residues modulate local energy minima in the thermodynamic equilibrium between structural intermediates [43,44]. Biological evidence has unequivocally demonstrated the nt-nt pairings observed in TAR are essential for HIV transcriptional trans-activation by Tat [45][46][47][48], whereas the nt-nt pairings of the PolyA stem are less stringently required [49]. To test Input constraints without PolyA pairing, we provided SimRNA the identical Monomer Input restraints, except residues 59-103 were unpaired (Mono PolyA unpaired).
SimRNA calculated 300 million iterations of the input restraints comparing Monomer and Mono PolyA unpaired. Output benchmarks improved for Mono PolyA unpaired, indicating metastable PolyA region favored intramolecular folding of the nt-nt pairings in energetic minima ( Table 1).
The percentage of pairings maintained between the Output and Input restraints was 87%, compared to 74% for Monomer (PPV , Table 1). Notably, PolyA pairings (n = 17) were completely restored in the top clusters of SimRNA models (Clusters 1 and 2, Figure S2).
Inspection of the 3D Output identified the expected helical structure of TAR ( Figure 6A,B) and U5 residues (105-110) paired with DIS residues (256-259) ( Figure 6A, see cyan and orange space-filling atoms) (Movie 3). The results suggest metastable PolyA nts tethered TAR and U5-DIS helices trapped in energetic minima. The metastable PolyA region changed proximity of 5 -Cap to G104 and U105 at the base of the U5 stem ( Figure 6A,B; Movie 3) and reoriented AUG near to U5, rather than near to PBS as observed in Monomer ( Figure 6A,B compared with Figure 4B). The 5 ss remained at the apex of a stem loop, yet was not intercalated between the PBS and residues of the 3-way junction, as observed in Monomer ( Figure 6A,B compared with Figure 4C).
Approximately 80% of the nt-nt pairings were in common with Monomer and all of the differences were downstream from PolyA stem ( Figure S2). We concluded the ensemble space of 5 -UTR conformation was reduced by PolyA nt pairings. These observations agree with the prior finding that unpaired residues play an important, passive role in HIV 5 -UTR secondary structure [7,50]. In sum, stacking of 5 -Cap with G104-U105 favored by input restraints pairing the PolyA nts.

TAR Input 2D Restraints Significantly Influence SimRNA Models
To add perspective on the power of laboratory versus computationally-derived constraints useful for SimRNA modeling, we ran two additional models, the first using the secondary structure of the centroid of the HIV ensemble as input and the second using no 2D constraint, only the primary sequence as input data (noSS). The centroid is an ab initio thermodynamic average of HIV NL4-3 2D ensemble (herein designated Centroid WT) [36,51,52]. Whereas the thermodynamic average of 2D RNA structures integrates the minimum free energy (MFE) of canonical base pairings (G-C, A-U, G-U) and folding constraints of nearest neighbor bases within short RNA strands [51], advances in dynamic programming algorithms take this approach a step further by monitoring ensembles of metastable structures [52]. This dynamic approach samples nt-nt pairings, calculating probabilities of concurrent sub-structures from the ensemble of metastable structures and structures trapped in energetic minima. Convergence on probabilistic structures is required to produce the predominant stems, bulges, internal and multi-helical loops and provide the most representative model, or centroid.
Centroid WT Input constraints did not demonstrate input constraints additional to the experimentally-determined Monomer and Dimer Input 2D constraints ( Figure S3A,B). Centroid WT secondary structure prediction mimicked the U5-AUG pairings observed for Dimer. Next, the computationally-derived Centroid WT input was processed 300 million iterations by SimRNA and the workflow processed results of the theoretical input constraints (Figure 7). Centroid WT Input pairings maintained and the top output cluster had 12 additional pairings ( Figure S2C).

Figure 7.
Approach to generate tertiary structure of HIV NL4-3 5 -UTR using theoretical Input constraints. Workflow presents two sources of theoretical Input restraints provided to SimRNA to predict 3D models of the HIV 5-UTR in the course of 300 million iterations on the Pitzer Supercomputer Cluster. The most representative 3D models were backed into Vienna dot-bracket language and 2D features were analyzed using CircleCompare and used to render 2D comparison with the input 2D model. The 3D model and Circleplot present the SimRNA simulation of HIV 5 -UTR with no secondary structure restraint (no SS). Label colors designate: TAR, light blue; PolyA stem, navy; U5 stem, cyan; PBS stem, green; DIS stem, orange; AUG stem, red. Atoms: 5 -Cap and 3 nt, magenta; U5, cyan; DIS, orange; 5 -splice site (5 ss), black; gag start codon, red.
Centroid WT predicted TAR and PolyA arranged coaxial helices and base stacking interaction between Cap, G104-U105 ( Figure 8A,B compared with Figure 4A,B). Unlike Dimer, the 5 ss was positioned at the apex of the stem loop formed by nt 280-300, as observed in Monomer ( Figure 8A,C compared with Figure 4A,C). The U5 stem was an extended helix (nt 141-177 and 225-280) topped with PBS (green) (Figures 8A,C and 9), similar to structural intermediates of HIV 5 -UTR dimerization [43,44,46,50].  The Output from Centroid WT recapitulated TAR and PolyA stems and pairing of U5 nts 134-140 (Figures 8 and 9). As summarized in Table 2, 60% of the identified pairings were maintained in Dimer and Monomer (Table 2), indicating overlap in the parameter spaces. The 3D simulations of Centroid WT Input restraints predicted thermodynamic stability of U5-AUG pairings, similar to Dimer, and apical position of the 5 ss that is similar to Monomer. Output ensembles of centroid WT and noSS had similar numbers of total objects as Dimer and Monomer (Table S1). However, the Output from noSS Input constraints recapitulated only 3% of the pairings within centroid WT (Table 2). These results indicated the energetic cost of TAR was excessive within the parameter space of 300 million iterations. Given noSS Input constraints, SimRNA sampled metastable structures without constraint on TAR, a condition comparable to basal HIV transcription that is Tat/TAR-independent independent. The results suggested the TAR helix significantly influences the thermodynamic equilibrium of the first~100 nt of the 5 -UTR.

Discussion
In the course of 300 million iterations of dynamic assemblies, SimRNA evaluated the 3D parameter space of the 5 -UTR beginning with experimentally-determined 2D constraints, theoretical constraints on PolyA nt pairing, and the secondary structure centroid prediction of its whole Boltzmann ensemble. SimRNA predictions suggest TAR helical stem and metastable PolyA region are fundamental input constraints determining the proximity of 5 -Cap to the junction of TAR and PolyA helices and the base of the U5 stem near G104-U105. The SimRNA 3D modeling agreed in principle with recent NMR results that PolyA nt pairing/unpairing propagates structural rearrangements throughout the 5 -UTR, perturbing the orientation of 5 -Cap, TAR and U5 residues, [6,53].
The metastable nt pairings in U5 and PBS-segment likely contribute to the alternate functionalities of the 5 -UTR in cellulo. Destabilized PBS-segment reduces affinity for DHX9/RHA and diminishes processivity of reverse transcriptase and virus infectivity [28,54,55]. Destabilized A59-U103 pairing significantly increase Gag translation rate in cells [22] and increase biochemical affinity of the HIVMAL 5 -Cap for host eIF4E [6]. The HIV late RNAs have been shown to specifically retain nuclear cap-binding complex (CBC) in polysomes and virions, instead of exchanging to eIF4E [56]. Recently, polysomal CBC RNPs were shown to activate eIF4E-independent translation of stress response proteins, overcoming AP-1 protein translation attenuation by mTOR inhibition [57]. Structural malleability near the HIV 5 -Cap may be beneficial to regulate cap-exchange and antagonize antiviral mTOR inhibition.
Tertiary modeling of polypeptide conformations has been integral to elucidating the functional mechanisms of enzymes, chaperones, many structural proteins and receptor-ligand interactions and for in silico modeling of small molecule therapeutics. Tertiary modeling of RNA structures has robust potential to guide the rational design of antiviral therapeutics [58][59][60][61]. Supportive evidence that alternative nt-nt pairings propagate structural changes throughout the HIV-1 leader RNA includes TAR-binding compounds exert structural effects outside TAR [62,63] and small molecule binding within the CES reduce virus titer [64]. Our results suggest programs like SimRNA can help to begin to predict RNA structure-function relationships testable in structural studies and cell-based studies [3].

Conclusions
By incorporating experimental and theoretical constraints, and performing 300 million iterations of dynamic ensembles, the SimRNA software package was used to simulate tertiary conformations of the HIV NL4-3 5 -UTR. The Output models predicted tertiary interactions in context of U5-AUG pairings placed G104-U105 stacking on 5 -Cap and maintained helices of the PBS-segment previously identified as PAS, TLE [41,42] and the double-stranded RNA binding site for DHX9/RHA [28]. Alternative U5 pairing with DIS reduced SimRNA agreement of input constraints with tertiary models. In conclusion, secondary structures that support early and late events in HIV replication divergently effect the 5 -UTR 3D models identified by the SimRNA algorithm. The simulation workflow developed herein provides a viable approach to model inhomogeneous large RNA populations.