Folding Kinetics of Riboswitch Transcriptional Terminators and Sequesterers

To function as gene regulatory elements in response to environmental signals, riboswitches must adopt specific secondary structures on appropriate time scales. We employ kinetic Monte Carlo simulation to model the time-dependent folding during transcription of TPP riboswitch expression platforms. According to our simulations, riboswitch transcriptional terminators, which must adopt a specific hairpin configuration by the time they have been transcribed, fold with higher efficiency than Shine-Dalgarno sequesterers, whose proper structure is required only at the time of ribosomal binding. Our findings suggest both that riboswitch transcriptional terminator sequences have been naturally selected for high folding efficiency, and that sequesterers can maintain their function even in the presence of significant misfolding.


Introduction
The riboswitch is a mechanism of self-regulation in messenger RNA that is found primarily in metabolic genes of bacteria. Riboswitches possess an aptamer that is capable of binding a specific ligand, and an expression platform that regulates the gene's expression according to the binding state of the aptamer [1]. The expression platform can regulate expression through formation of an intrinsic (rho-independent) terminator hairpin, by sequestering a Shine-Dalgarno ribosomal binding site, or by cleaving the messenger. The terminator hairpin operates by halting transcription while the Shine-Dalgarno sequesterer, also a hairpin, operates by preventing translation.
Because riboswitches function through conformational changes resulting from the ligand-bound or unbound state of the aptamer, they rely on both RNA thermodynamics and structural kinetics. Of particular importance is the secondary structure, the pattern of pairing among complementary bases (see Fig. 1). This secondary structure forms both during and after mRNA transcription [2,3], leading to a time-dependent free energy landscape for RNA folding. The importance of kinetics for the operation of some terminator-type riboswitches is supported by the presence of transcriptional pause sites following the aptamer and antiterminator [4].
Here we concentrate on the dynamical folding of the expression platform as it grows during transcription. A minimum free energy (MFE) structure adopted by an incomplete sequence may become metastable once the sequence is complete. The lifetime [5] of such a metastable structure may exceed the time allowed for the switch to function, leading to a failure of gene regulation. By comparing the folding efficiencies of transcriptional terminator-type riboswitch terminator hairpins with that of with translational sequesterers, we suggest that riboswitch transcriptional terminators have been naturally selected to fold reliably under the time constraint imposed by the mRNA transcription rate. By inspection of specific poorly folding sequesterers, we propose that even misfolded sequesterers may retain some function provided their Shine-Dalgarno sequence remains bound.

Sequences
Riboswitch aptamers are highly conserved and well annotated in the RFam database [6]. Unfortunately, the expression platform sequences are poorly conserved and are not generally annotated. Their hairpin topology is their main conserved feature, along with either a trailing poly-U pause site in a terminator (see Fig. 1) or a Shine-Dalgarno ribosomal binding site in a sequesterer. We choose to study a set of TPP (thiamine pyrophosphate, vitamin B 1 ) riboswitches whose expression platforms have been independently annotated [7]. Out of the 135 annotated riboswitches, 73 are classified as sequester-type, 52 as terminator-type, 9 as both terminator and sequesterer, and one as neither. We choose to examine only those expression platforms with a definite classification, and we include an additional four nucleotides on each end to provide genomic context for our folding studies. All sequences studied fold to a hairpin as their MFE structure, as annotated. Some statistical properties of the resulting sequences are shown in table 1. Notice that on average the sequesterers are longer than terminators by 10 nucleotides, and also that their lengths are highly variable. To explore the dependence of folding efficiency on length, we constructed an artificial family of extended terminators by adding 5 nucleotide pairs randomly to each terminator, drawing the additional pairs from the pairs already present in the original terminators. We then randomly shuffle the pairs while preserving the topology of bulges and loop in the minimum free energy structure.

Folding
For Minimum Free Energy calculations of secondary structures, RNAfold [8] is used with the ViennaRNA 1.4 energy model [9] at temperature T = 37 • C. Note that we will not be able to capture the influence of tertiary contacts or of pseudoknots within the confines of this model. The default energy Figure 1. Secondary structure of the aptamer and terminator of the Bacillus subtilis ykoF riboswitch. (top) Bound state, aptamer formed, transcription off. The P1 stem of the aptamer (pink) conflicts with the antiterminator (green), allowing formation of the terminator (blue), and thus halting transcription via the poly-U pause site (orange). (bottom) Unbound state, aptamer unformed, transcription on. Destabilizing the aptamer allows formation of the antiterminator, which conflicts with the terminator, and hence allows transcription to proceed. parameters for 1M NaCl are used despite cellular conditions being 150 − 250 mM Na + and 5 − 10 mM Mg 2+ , since the energetics of the secondary structures in these conditions are similar [10]. That is, 1M NaCl has approximately equivalent ionic strength to real cellular conditions, since the doubly-charged Mg 2+ is far more effective at compensating the phosphate backbone of nucleic acids than the singly charged Na + . A suitable validated energy model for true cellular conditions is not available [11]. Folding is simulated at the level of secondary structure by kinetic Monte Carlo using the ViennaRNA program kinfold [12]. The rate for transitions in kinfold is given in arbitrary units that require calibration to real time. As an estimate for the kinfold timescale, τ K is taken to be about 5µs/step, from the calibration of Liu and Ou-Yang [11]. To simulate folding during transcriptional growth, additional nucleotides are added to the 3 ′ end of the chain at regular time intervals. Typical bacterial transcription rates, R t , range from 20-80 nt/s, with 50 nt/s taken as standard. The possibility that a significant transcriptional pause might take place inside the antiterminator or the terminator is neglected though this could be important in some specific cases [4]. Simulating at the level of secondary structure is more efficient, though less realistic, than applying molecular dynamics to coarse-grained continuum models [13,14].

Terminator
Because we focus on the competition between folding rates and transcription rates, the chief parameter governing the simulation is the product ρ = 1/(τ K R t ), representing Monte Carlo step performed between each nucleotide addition. Our standard value is ρ = 4000 MC steps/nt transcribed. Because of the range of transcription rates R t , as well as the uncertainty concerning the timescale calibration τ K , we carry out simulations over a range of values of ρ. Our primary result, the high efficiency of terminator folding relative to sequesterers, holds over several orders of magnitude in ρ.

Statistical analysis of distributions
Results of this study will be presented in the form of distributions over repeated kinetic folding attempts for many individual sequences. For example, Fig. 2a   frequency P (f ) with which terminators reach a given fraction f of their MFE structures under our standard growth conditions. According to this figure, in our complete population of transcriptional terminators, each folded 100 times, 100% of the expected nt pairs are obtained in the majority of trials, and more than 60% of the expected pairs are obtained in all the trials. However, 0% of the expected pairs are obtained in a non-negligible subset of trials for Shine-Dalgarno sequesterers. It is not known what fraction of the MFE hairpin structure is required for successful termination, but one can set a threshold t anywhere between 1% and 80% with almost no impact on the fractions f of terminator folds that lie above and below this threshold, because the distribution nearly vanishes over this range. We use the fraction of expected pairs as a metric for termination, rather than the free energy of the folded structure, because deep metastable traps are precisely what is to be avoided for efficient folding and termination.
For a given sequence s, its folding efficiency e s is defined as the fraction of attempted folds that form a viable hairpin. Define the viability v of a fold as a function of the MFE structure fraction f This is precisely the fraction of attempted folds whose folding fraction f lies above the threshold t. As discussed above, considerable freedom exists in the choice of the threshold t, but t = 0.7 is taken as a reasonably conservative limit because a high fraction of the MFE structure is presumably required for actual functionality of the terminator. Hence we define a structure with a fraction f of its MFE base pairs below t = 0.7 as "misfolded".

Terminators vs. sequesterers
Riboswitch intrinsic terminator hairpins can be expected to fold with greater efficiencies than sequesterers because terminators act at the time of transcription. The constraint that terminators must perform within the transcription time means that terminators must fold quickly. Meanwhile, sequesterers act at the time of translation, effectively relaxing this constraint. Here the folding efficiencies of the sequesterers are compared to transcriptional terminator hairpins across a family of riboswitches. TPP-binding riboswitches are chosen, because of the availability of annotated terminator and sequesterer riboswitches [7].
According to Fig. 2, the terminator hairpins do indeed fold quite efficiently, with all but one of Rodionov's annotated terminators having a folding efficiency greater than 80% under our standard growth conditions. However, the sequesterers fold with substantially lower efficiency. Table 2 enumerates the numbers of hairpins of each type folding efficiently (e ≥ 80%) and inefficiently (e < 80%). The p-value for the null hypothesis (i.e. the assertion that the proportion of efficient sequesterers equals the proportion of efficient terminators) is p = 3 × 10 −7 (Fisher exact test), providing strong support for the claim that terminator-type riboswitch hairpins fold with higher efficiency during transcription than do sequesterer-types. Figure 3 shows the proportion of efficiently folding terminators and sequesterers for a range of timescales ρ = 1/(τ K R t ), allowing for τ K and R t to vary over orders of magnitude without affecting the conclusion that terminators fold with higher efficiency than sequesterers. What explains the relative folding efficiencies of terminators and sequesterers? As outlined in Table 1, some gross features of rho-independent terminator sequences differ from Shine-Dalgarno sequesterers. Perhaps the primary difference between them lies in their length distributions. TPP rho-independent terminators are 38 nucleotides long on average, while the Shine-Dalgarno sequesterers average 48 nucleotides in length. Indeed, longer sequences will tend to possess more and deeper metastable states  [15] for TPP rho-independent transcriptional terminators (left) and Shine-Dalgarno sequesterers (right). Regions 1-5 correspond, respectively, to the first half of the 5 ′ side of the stem, the second half of the same, the loop, the first half of the 3 ′ side of the stem, the second half of the same.   Fig. 2c, indicate that while there is some effect detrimental to efficient folding in the longer hairpins, this length difference alone does not account for the difference in folding efficiencies. Furthermore, while we note a weak correlation of decreasing efficiency with increasing terminator sequence length, neither the extended terminators nor the sequesterers exhibit any significant correlation between efficiency and length.
A second difference lies in the nucleotides frequencies (Table 1) and their distribution among five regions of the hairpins as illustrated in Fig. 4. Here region 3 represents the hairpin loop, with regions 1 and 2 lying along the 5 ′ side of the hairpin and regions 4 and 5 along the 3 ′ side. Terminators exhibit an excess of U in region 5 associated with the beginning of the poly-U pause site, and a weak corresponding enhancement of complementary A nucleotides in region 1. Sequesterers, in contrast, exhibit an enhancement of A and G in regions 4 and 5 corresponding to the Shine-Dalgarno consensus sequence of AGGAGG, and a corresponding enhancement of complementary C and U in regions 1 and 2. Another difference is the excess U in the loop region 3 of terminators that can be attributed to an internal pause site allowing time for aptamer and antiterminator folding [4] prior to completion of terminator transcription. The enhancement of the specifically-binding C and its non-complementary U in regions 1 and 2 of the sequesterer might have been expected to aid in folding efficiency, yet still the terminators, dominated in most regions by the promiscuously-binding U and G, manage to fold with relatively high efficiency. However, the weak enhancement of specifically-binding A in region 1 of the terminator, complementary to the poly-U pause site in region 5, may play some small role in terminator folding efficiency.
Overall, neither the differences in sequence length nor in nucleotide content appear capable of explaining the difference in folding efficiency between terminators and sequesterers. The most likely explanation available is simply that the folding efficiencies differ as a result of natural selection. Selection pressure apparently favors relatively short hairpins and disfavors sequences containing metastable traps in terminators that must fold under the constraint of short transcription time. This selection pressure is reduced or absent in the case of Shine-Dalgarno sequesterers. Indeed, as evidenced in Fig. 3, many sequesterers fail to fold efficiently even on very long time scales. Perhaps sequesters Figure 5. Alternate folds of low efficiency terminators and sequesterers. (a,b) Most common specific fold and MFE structure of the Tte-ThiD terminator sequence. Nucleotides forming stem of terminator are highlighted in blue, while poly-U pause site is in orange. (c,d) Most common specific fold and MFE structure of Sm-ThiC. Shine-Dalgarno sequence is highlighted in blue, while the translation start site is highlighted in orange.
function in an ensemble of metastable structures, provided the Shine-Dalgarno sequence remains bound, while in contrast transcriptional terminators require very specific structures in order to function [16][17][18].

Specific examples
Here we analyze specific cases of poorly folding terminators and sequesterers. The most poorly folding terminator is the ThiD terminator of Thermoanaerobacter tengcongensis (Tte) which folds with efficiency e s =0.18 at the fastest transcription rate (smallest timescale) ρ = 1/τ K R t =250 MC steps/nt transcribed. Similarly, the ThiC riboswitch of Sinorhizobium meliloti (Sm) stands out for having the lowest observed efficiency (e s = 0.292) at the slowest transcription rate (largest timescale), ρ=512000 MC steps/nt transcribed. Two alternate folds of each sequence are illustrated in Fig. 5. The most common specific fold of Tte-ThiD (Fig. 5a), which occurs in 35% of folding attempts, shares no common pairs with the MFE structure (Fig. 5b), which occurs in 6% of folding attempts. Likewise, for Sm-ThiC, the most common specific fold (Fig. 5c) occurs in 10% of attempts and shares no common pairs with the MFE structure (Fig. 5d), which occurs in 3% of attempts.
The misfolded terminator (Fig. 5a) lacks the necessary hairpin preceding the poly-U pause site that terminates transcription. It is notable that the Shine-Dalgarno sequence remains sequestered in the misfolded sequesterer, suggesting that perhaps the function is preserved. This might explain how low folding efficiency sequesterers could remain functional even while misfolded on the time scale of translation initiation.
At the largest timescale, ρ=512000, the efficiency of Tte-ThiD rises to 91%. To understand the high efficiency of Tte-ThiD relative to Sm-ThiC at long times, we compare their free energy landscapes in Fig. 6. The misfold of Tte-ThiD is relatively weakly bound (only -2.3 kcal/mol) with a barrier of 5.8 kcal/mol separating the misfold from the MFE structure. This barrier has high entropy, as it Figure 6. Free energy landscapes in units of kcal/mol. Completely unbound structures have energy 0. Basins of depth less than 2.5 have been suppressed Tte-ThiD (left) structure number 6 corresponds to the most common fold (Fig. 5a). Sm-ThiC (right) structure number 5 corresponds to the most common fold (Fig. 5c). In both cases, structure number 1 is the MFE fold (Figs. 5(b,d)).  corresponds to complete unfolding followed by almost any single base pairing yielding a net energy for the saddle state of +3.5 kcal/mol. This high barrier entropy reduces the effective free energy barrier [5]. Furthermore, as Tte is a thermophile, relatively high thermal energy is available to aid in escape from metastable traps. In contrast, Sm-ThiC is relatively strongly bound (-11.6 kcal/mol). The saddle state separating the misfold from the MFE is only partially unbound, at energy -4.0, but the net barrier of 7.6 kcal is nearly 2 kcal/mol (about 3RT ) larger than for Tte-ThiD and also is relatively low entropy.
The common misfolds of both Tte-ThiD and Sm-ThiC share a common feature -their paired nucleotides lie to the 5 ′ (earlier transcribed) side of the pairs comprising the MFE structures. That is, they contain structure that can form before the sequence is fully transcribed. To see how widespread this mechanism is, we examined the 16 sequesterers that fold with efficiency less that 0.5 at our standard transcription rate, ρ=4000. In all but one case the most common misfold places the hairpin loop to the 5 ′ side of its location in the MFE structure. That is, they involve structures that can form earlier in time than the MFE. The sole exception, is a very short sequence for which a few missing pairs reduce the matched fraction f below 0.7 even while the sequence lies in the basin of the MFE structure.

Conclusions
In conclusion, this study addressed whether riboswitch transcriptional terminators fold with unusually high efficiency, indicating selection for reliability of folding. It was shown that transcriptional terminators in TPP riboswitches are unusually easy to fold during transcription in comparison with Shine-Dalgarno sequesterers, resulting in a strongly significant p-value for the null hypothesis. Experimental validation of this prediction might be feasible using optical tweezer studies [19].
Detailed examination of a specific terminator (Tte-ThiD and sequesterer (Sm-ThiC) which fold with relatively low efficiency, reveals a generic mechanism for misfolding, namely trapping into minimum free energy conformations of partially transcribed sequences that become potentially long-lived metastable states of the fully transcribed sequence. We also suggest that sequesterers may be more tolerant of misfolds than terminators, provided that the Shine-Dalgarno sequence remains bound in the misfolded structure.