## 1. Introduction

The riboswitch is a mechanism of self-regulation in messenger RNA that is found primarily in metabolic genes of bacteria. Riboswitches possess an aptamer that is capable of binding a specific ligand and an expression platform that regulates the gene’s expression according to the binding state of the aptamer [1]. The expression platform can regulate expression by forming an intrinsic (rho-independent) terminator hairpin, by sequestering a Shine-Dalgarno ribosomal binding site, or by cleaving the messenger RNA. The terminator hairpin halts transcription, while the Shine-Dalgarno sequesterer, also a hairpin, prevents translation.

Because riboswitches function through conformational changes resulting from the ligand-bound or unbound state of the aptamer, they rely on both RNA thermodynamics and structural kinetics. Of particular importance is the secondary structure, the pattern of pairing among complementary bases (see Figure 1). This secondary structure forms both during and after mRNA transcription [2,3], leading to a time-dependent free energy landscape for RNA folding. The importance of kinetics for the operation of some terminator-type riboswitches is supported by the presence of transcriptional pause sites following the aptamer and antiterminator [4].

**Figure 1.**
Secondary structure of the aptamer and terminator of the Bacillus subtilis ykoF riboswitch. (**a**) Bound state, aptamer formed, transcription off. The P1 stem of the aptamer (pink) conflicts with the antiterminator (green), allowing formation of the terminator (blue), and, thus, halting transcription via the poly-U pause site (orange). (**b**) Unbound state, aptamer unformed, transcription on. Destabilizing the aptamer allows formation of the antiterminator, which conflicts with the terminator, and, hence, allows transcription to proceed.


Here, we concentrate on the dynamical folding of the expression platform as it grows during transcription. A minimum free energy (MFE) structure adopted by an incomplete sequence may become metastable once the sequence is complete. The lifetime [5] of such a metastable structure may exceed the time allowed for the switch to function, leading to a failure of gene regulation. By comparing the folding efficiencies of the terminator hairpins of transcriptional terminator-type riboswitches with those of translational sequesterers, we suggest that riboswitch transcriptional terminators have been naturally selected to fold reliably under the time constraint imposed by the mRNA transcription rate. By inspecting specific poorly folding sequesterers, we propose that even misfolded sequesterers may retain some function, provided their Shine-Dalgarno sequence remains bound.

## 2. Methods

#### 2.1. Sequences

Riboswitch aptamers are highly conserved and well annotated in the Rfam database [6]. Unfortunately, the expression platform sequences are poorly conserved and are not generally annotated. Their hairpin topology is their main conserved feature, along with either a trailing poly-U pause site in a terminator (see Figure 1) or a Shine-Dalgarno ribosomal binding site in a sequesterer. We choose to study a set of TPP (thiamine pyrophosphate, the active form of vitamin B$_{1}$) riboswitches whose expression platforms have been independently annotated [7]. Of the 135 annotated riboswitches, 73 are classified as sequesterer-type, 52 as terminator-type, nine as both terminator and sequesterer, and one as neither. We examine only those expression platforms with a definite classification, and we include an additional four nucleotides on each end to provide genomic context for our folding studies. All sequences studied fold to a hairpin as their MFE structure, as annotated. Some statistical properties of the resulting sequences are shown in Table 1. Notice that, on average, the sequesterers are longer than the terminators by 10 nucleotides and that their lengths are highly variable. To explore the dependence of folding efficiency on length, we construct an artificial family of extended terminators by randomly adding five base pairs to each terminator, drawing the additional pairs from those already present in the original terminators. We then randomly shuffle the pairs while preserving the topology of the bulges and hairpin loop in the minimum free energy structure.
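The construction of the extended terminators can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: each new pair is stacked immediately inside a randomly chosen existing pair (which lengthens the stem without creating new loops), its identity is drawn from the hairpin's own pairs rather than from the whole terminator population, and the final shuffle permutes pair identities among the paired positions while leaving the dot-bracket structure and the unpaired bases untouched. All function names are hypothetical.

```python
import random


def pairs_from_dot_bracket(structure):
    """Return sorted (i, j) index pairs for a nested dot-bracket string."""
    stack, pairs = [], []
    for k, ch in enumerate(structure):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), k))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)


def extend_hairpin(seq, structure, n_pairs=5, rng=None):
    """Add n_pairs base pairs, each stacked just inside a random existing pair.

    Assumption: new pair identities are drawn from this hairpin's own pairs.
    """
    rng = rng or random.Random()
    pool = [(seq[i], seq[j]) for i, j in pairs_from_dot_bracket(structure)]
    for _ in range(n_pairs):
        i, j = rng.choice(pairs_from_dot_bracket(structure))
        a, b = rng.choice(pool)
        # seq[i + 1:j] is balanced, so wrapping it in a new pair keeps the nesting valid.
        seq = seq[:i + 1] + a + seq[i + 1:j] + b + seq[j:]
        structure = structure[:i + 1] + "(" + structure[i + 1:j] + ")" + structure[j:]
    return seq, structure


def shuffle_pairs(seq, structure, rng=None):
    """Permute pair identities among paired positions; unpaired bases stay put."""
    rng = rng or random.Random()
    pairs = pairs_from_dot_bracket(structure)
    identities = [(seq[i], seq[j]) for i, j in pairs]
    rng.shuffle(identities)
    out = list(seq)
    for (i, j), (a, b) in zip(pairs, identities):
        out[i], out[j] = a, b
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(1)
    seq, st = "GCGCUUCGGCGC", "((((....))))"
    long_seq, long_st = extend_hairpin(seq, st, n_pairs=5, rng=rng)
    print(long_st, shuffle_pairs(long_seq, long_st, rng))
```

Stacking new pairs inside existing ones is one simple way to keep the bulge and loop topology fixed; inserting pairs elsewhere would require explicit bookkeeping of loop boundaries.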

#### 2.2. Folding

For minimum free energy calculations of secondary structures, `RNAfold` [8] is used with the `ViennaRNA` 1.4 energy model [9] at temperature $T = 37^{\circ}$C. Note that this model cannot capture the influence of tertiary contacts or of pseudoknots. The default energy parameters for 1 M NaCl are used even though cellular conditions are 150–250 mM Na$^{+}$ and 5–10 mM Mg$^{2+}$, since the energetics of secondary structures under these conditions are similar [10]. That is, 1 M NaCl has a roughly equivalent effect on secondary-structure stability to real cellular conditions, since the doubly charged Mg$^{2+}$ is far more effective at compensating the negatively charged phosphate backbone of nucleic acids than the singly charged Na$^{+}$. A suitable validated energy model for true cellular conditions is not available [11].

**Table 1.**
Sequesterer, terminator and extended terminator sequence average properties. Minimum free energy (MFE) frequency and ensemble diversity represent the frequency of the MFE structure and the diversity of the secondary structure ensemble at T = 37 ${}^{\circ}$C, as obtained from `RNAfold` [8]. A, C, G and U are nucleotide fractions.

| Property | Sequesterers | Terminators | Extended |
|---|---|---|---|
| Length (nucleotides) | 47.7 ± 11.2 | 37.8 ± 5.7 | 47.8 ± 5.7 |
| MFE (kcal/mol) | −14.9 | −16.3 | −27.4 |
| MFE frequency | 0.35 | 0.52 | 0.53 |
| Ensemble diversity | 1.97 | 0.70 | 0.58 |
| A (%) | 23.8 | 21.7 | 22.3 |
| C (%) | 26.1 | 18.3 | 18.5 |
| G (%) | 23.7 | 19.7 | 20.9 |
| U (%) | 26.4 | 40.3 | 38.3 |
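For reference, the two ensemble quantities in Table 1 follow from the partition function $Z$. The frequency of the MFE structure is its Boltzmann weight divided by $Z$, i.e., $\exp[-(\Delta G_{\mathrm{MFE}} - \Delta G_{\mathrm{ens}})/RT]$ with $\Delta G_{\mathrm{ens}} = -RT \ln Z$, and the ensemble diversity, as we understand `RNAfold -p` to report it, is the mean base-pair distance between two structures drawn independently from the ensemble, $2\sum_{i<j} p_{ij}(1 - p_{ij})$ in terms of the pair probabilities $p_{ij}$. The sketch below assumes these inputs are already available; it does not compute $Z$ itself, and the function names are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def mfe_frequency(dg_mfe, dg_ensemble, temp_c=37.0):
    """Boltzmann probability of the MFE structure: exp(-(dG_mfe - dG_ens) / RT)."""
    rt = R_KCAL * (temp_c + 273.15)
    return math.exp(-(dg_mfe - dg_ensemble) / rt)


def ensemble_diversity(pair_probs):
    """Mean base-pair distance of the ensemble, 2 * sum_{i<j} p_ij * (1 - p_ij).

    pair_probs maps (i, j) with i < j to the probability that i pairs with j;
    pairs that are absent from the mapping have probability zero.
    """
    return 2.0 * sum(p * (1.0 - p) for p in pair_probs.values())
```

For example, an ensemble free energy 0.4 kcal/mol below the MFE corresponds to an MFE frequency of about 0.52 at 37 °C, and a structure whose pairs all have probability 1 has zero diversity.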

Folding is simulated at the level of secondary structure by kinetic Monte Carlo using the `ViennaRNA` program `kinfold` [12]. Transition rates in `kinfold` are given in arbitrary units that require calibration to real time. As an estimate for the `kinfold` timescale, $\tau_K$ is taken to be about $5\ \mu\mathrm{s/step}$, from the calibration of Liu and Ou-Yang [11]. To simulate folding during transcriptional growth, additional nucleotides are added to the $3^{\prime}$ end of the chain at regular time intervals. Typical bacterial transcription rates, $R_t$, range from 20 to 80 nt/s, with 50 nt/s taken as standard. The possibility that a significant transcriptional pause might occur inside the antiterminator or the terminator is neglected, though this could be important in some specific cases [4]. Simulating at the level of secondary structure is more efficient, though less realistic, than applying molecular dynamics to coarse-grained continuum models [13,14].
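The growth protocol can be summarized by a small driver loop. The sketch below is not `kinfold`'s implementation: it assumes a caller-supplied function `mc_step(prefix, state, rng)` (a hypothetical interface) that performs one kinetic Monte Carlo move on the currently transcribed prefix and extends the structure when the prefix grows. The loop simply lengthens the prefix by one nucleotide after every `rho` moves, so `rho` plays the role of the number of MC steps per transcribed nucleotide. Whether, and for how long, folding continues after the last nucleotide is left as a parameter, `post_steps`, which is our assumption.

```python
import random


def cotranscriptional_fold(sequence, mc_step, rho, start_length=1,
                           post_steps=0, rng=None):
    """Run rho Monte Carlo moves per nucleotide while the chain grows 5' -> 3'.

    mc_step(prefix, state, rng) -> state is supplied by the caller (hypothetical
    interface); state is None before the first move. post_steps extra moves are
    run on the full-length chain afterwards (assumption).
    """
    rng = rng or random.Random()
    state = None
    for length in range(start_length, len(sequence) + 1):
        prefix = sequence[:length]  # nucleotides transcribed so far
        for _ in range(rho):
            state = mc_step(prefix, state, rng)
    for _ in range(post_steps):
        state = mc_step(sequence, state, rng)
    return state
```

Keeping the move function separate from the growth schedule makes it easy to swap in a different folding model while holding the transcription schedule fixed.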

Because we focus on the competition between folding rates and transcription rates, the chief parameter governing the simulation is the inverse of the product of these two scales, $\rho = 1/(\tau_K R_t)$, which counts the Monte Carlo (MC) steps performed between successive nucleotide additions. Our standard values, $\tau_K = 5\ \mu\mathrm{s/step}$ and $R_t = 50$ nt/s, give $\rho = 4000$ MC steps/nt transcribed. Because of the range of transcription rates, $R_t$, as well as the uncertainty in the timescale calibration, $\tau_K$, we carry out simulations over a range of values of $\rho$. Our primary result, the high efficiency of terminator folding relative to sequesterers, holds over several orders of magnitude in $\rho$.
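As a quick arithmetic check of $\rho = 1/(\tau_K R_t)$, the helper below (hypothetical name) evaluates $\rho$ at the calibrated $\tau_K = 5\ \mu$s for the ends and middle of the quoted transcription-rate range.

```python
def steps_per_nucleotide(tau_k, rate):
    """rho = 1 / (tau_K * R_t): MC steps per nucleotide transcribed.

    tau_k is in seconds per kinfold MC step; rate is in nucleotides per second.
    """
    return 1.0 / (tau_k * rate)


for rate in (20, 50, 80):
    print(rate, round(steps_per_nucleotide(5e-6, rate)))
```

With the calibration held fixed, the physiological range of 20 to 80 nt/s alone spans $\rho$ from 10,000 down to 2500 MC steps/nt; uncertainty in $\tau_K$ widens this further, which is why $\rho$ is scanned over orders of magnitude.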

#### 2.3. Statistical Analysis of Distributions

Results of this study will be presented in the form of distributions over repeated kinetic folding attempts for many individual sequences. For example, Figure 2a displays a histogram of the relative frequency, $P(f)$, with which terminators reach a given fraction, $f$, of their MFE base pairs under our standard growth conditions. According to this figure, across our complete population of transcriptional terminators, each folded 100 times, 100% of the expected base pairs are obtained in the majority of trials, and more than 60% of the expected pairs are obtained in all trials. For Shine-Dalgarno sequesterers, by contrast, none of the expected pairs are obtained in a non-negligible subset of trials. It is not known what fraction of the MFE hairpin structure is required for successful termination, but one can set a threshold, $t$, anywhere between 1% and 80% with almost no impact on the proportions of terminator folds lying above and below it, because the terminator distribution nearly vanishes over this range. We use the fraction of expected pairs as a metric for termination, rather than the free energy of the folded structure, because deep metastable traps are precisely what must be avoided for efficient folding and termination, and such a trap can have a favorable free energy while lacking the MFE hairpin altogether.

**Figure 2.**
Normalized distributions of folding performances for terminator- and sequesterer-type riboswitches from a wide range of prokaryotes. (**a**) Histograms of folding fractions, f, combined for all sequences, s, of each specific type. (**b**) Histograms of folding efficiencies, ${e}_{s}$, for individual sequences, s. (**c**) Cumulative distribution of folding efficiencies, including extended terminators.


For a given sequence, $s$, its folding efficiency, $e_s$, is defined as the fraction of attempted folds that form a viable hairpin. Define the viability, $v$, of a fold as a function of its MFE structure fraction, $f$, through the equation $v(f) = \theta(f - t)$, where $\theta$ is the step function and $t$ is the viability threshold. Thus, $e_s = \langle v(f) \rangle$, averaged over independent folding attempts. If sequence $s$ has probability distribution $P_s(f)$ for folding to MFE fraction $f$, its efficiency can be evaluated as $e_s = \int_0^1 P_s(f)\, v(f)\, \mathrm{d}f = \int_t^1 P_s(f)\, \mathrm{d}f$. This is precisely the fraction of attempted folds whose folding fraction, $f$, lies above the threshold, $t$. As discussed above, considerable freedom exists in the choice of the threshold, but $t = 0.7$ is taken as a reasonably conservative limit, because a high fraction of the MFE structure is presumably required for actual functionality of the terminator. Hence, we define a structure with a fraction, $f$, of its MFE base pairs below $t = 0.7$ as “misfolded”.
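In discrete form, $e_s$ is just the mean of the step function over the recorded folding attempts. The sketch below uses hypothetical names and assumes that a fold with $f$ exactly equal to $t$ counts as viable, a convention the definition above leaves open.

```python
def viability(f, t=0.7):
    """Step function v(f) = theta(f - t); f == t counts as viable (assumption)."""
    return 1.0 if f >= t else 0.0


def folding_efficiency(fractions, t=0.7):
    """e_s: mean viability over independent folding attempts of one sequence."""
    if not fractions:
        raise ValueError("need at least one folding attempt")
    return sum(viability(f, t) for f in fractions) / len(fractions)


attempts = [1.0, 1.0, 0.9, 0.75, 0.5, 0.0]  # hypothetical f values
print(folding_efficiency(attempts))  # 4 viable of 6, about 0.667
```

Since only $f \ge t$ matters, moving $t$ anywhere within a range where the histogram of $f$ is empty leaves $e_s$ unchanged, which is the robustness argument made above.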

## 3. Terminators vs. Sequesterers

Riboswitch intrinsic terminator hairpins can be expected to fold with greater efficiency than sequesterers because terminators act at the time of transcription. The requirement that terminators perform within the transcription time means that they must fold quickly. Sequesterers, by contrast, act at the time of translation, which effectively relaxes this constraint. Here, the folding efficiencies of sequesterers are compared to those of transcriptional terminator hairpins across a family of riboswitches. TPP-binding riboswitches are chosen because of the availability of annotated terminator and sequesterer riboswitches [7].

**Figure 3.**
Proportion of thiamine pyrophosphate (TPP) terminators (black line) and sequesterers (red line) that fold efficiently (i.e., with $e \ge 0.8$) at various timescales, $\rho$. Data points indicate the individual folding efficiencies, $e_s$, of each hairpin sequence, $s$. The green line at $\rho = 4000$ Monte Carlo (MC) steps/nt transcribed indicates the timescale for $\tau_K = 5\ \mu\mathrm{s}$ and $R_t = 50$ nt/s.


According to Figure 2, the terminator hairpins do indeed fold quite efficiently, with all but one of Rodionov’s annotated terminators having a folding efficiency greater than 80% under our standard growth conditions. The sequesterers, however, fold with substantially lower efficiency.

Table 2 enumerates the numbers of hairpins of each type folding efficiently ($e \ge 80\%$) and inefficiently ($e < 80\%$). The $p$-value for the null hypothesis (i.e., the assertion that the proportion of efficient sequesterers equals the proportion of efficient terminators) is $p = 3 \times 10^{-7}$ (Fisher exact test), providing strong support for the claim that terminator-type riboswitch hairpins fold with higher efficiency during transcription than sequesterer-type hairpins do.
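The Fisher exact test on Table 2 can be reproduced with the standard library alone. The text does not state whether the reported $p$ is one- or two-sided, so this sketch (hypothetical function name) returns both: the one-sided value is the hypergeometric probability, with all margins fixed, of at least as many efficient terminators as observed, and the two-sided value sums every table no more probable than the observed one. `scipy.stats.fisher_exact` implements the same test if SciPy is available.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Exact test for the 2x2 table [[a, b], [c, d]] (rows: groups, columns: outcomes).

    Returns (p_greater, p_two_sided). p_greater is the probability, under the
    hypergeometric null with fixed margins, of a top-left count >= a; the
    two-sided p sums all tables no more probable than the observed one.
    """
    n_row1, n_col1, total = a + b, a + c, a + b + c + d
    lo, hi = max(0, n_row1 - (total - n_col1)), min(n_row1, n_col1)
    # Integer weights share the denominator comb(total, n_row1), so comparisons are exact.
    weight = {k: comb(n_col1, k) * comb(total - n_col1, n_row1 - k)
              for k in range(lo, hi + 1)}
    denom = comb(total, n_row1)
    p_greater = sum(w for k, w in weight.items() if k >= a) / denom
    p_two = sum(w for w in weight.values() if w <= weight[a]) / denom
    return p_greater, p_two


# Table 2: rows = terminators, sequesterers; columns = efficient, inefficient.
p_one, p_two = fisher_exact_2x2(51, 1, 45, 28)
print(f"one-sided p = {p_one:.1e}, two-sided p = {p_two:.1e}")
```

Both values come out well below $10^{-6}$, so the conclusion does not hinge on which sidedness was used.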

Figure 3 shows the proportion of efficiently folding terminators and sequesterers over a range of timescales, $\rho = 1/(\tau_K R_t)$, allowing $\tau_K$ and $R_t$ to vary over orders of magnitude without affecting the conclusion that terminators fold with higher efficiency than sequesterers.
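For a fixed calibration, each value of $\rho$ corresponds to a transcription rate $R_t = 1/(\tau_K \rho)$. The sketch below converts $\rho$ back to an implied rate at $\tau_K = 5\ \mu$s and computes the proportion of sequences with $e_s \ge 0.8$, the quantity plotted in Figure 3. The doubling grid from 250 to 512,000 is our assumption, chosen only because those are the extreme values quoted in Section 4; the function names are hypothetical.

```python
def implied_rate(rho, tau_k=5e-6):
    """Transcription rate R_t (nt/s) implied by rho MC steps/nt at tau_k s/step."""
    return 1.0 / (tau_k * rho)


def proportion_efficient(efficiencies, e_min=0.8):
    """Fraction of sequences whose folding efficiency e_s is at least e_min."""
    return sum(e >= e_min for e in efficiencies) / len(efficiencies)


rho_grid = [250 * 2**k for k in range(12)]  # 250, 500, ..., 512000 (assumed grid)
for rho in rho_grid:
    print(f"rho = {rho:>6}  ->  R_t = {implied_rate(rho):.3g} nt/s")
```

At $\tau_K = 5\ \mu$s, this grid runs from 800 nt/s down to about 0.39 nt/s, far beyond the physiological 20 to 80 nt/s, which is what lets the comparison absorb large errors in either $\tau_K$ or $R_t$.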

**Table 2.**
Efficiency table for Fisher’s exact test comparing terminator hairpins to Shine-Dalgarno sequesterers, when grown at 50 nt/s and assuming $\tau_K = 5\ \mu\mathrm{s}$.


|  | Efficient ($e \ge 0.8$) | Inefficient ($e < 0.8$) |
|---|---|---|
| Terminator | 51 | 1 |
| Sequesterer | 45 | 28 |

What explains the relative folding efficiencies of terminators and sequesterers? As outlined in Table 1, some gross features of rho-independent terminator sequences differ from those of Shine-Dalgarno sequesterers. Perhaps the primary difference lies in their length distributions: TPP rho-independent terminators are 38 nucleotides long on average, while the Shine-Dalgarno sequesterers average 48 nucleotides. Longer sequences tend to possess more, and deeper, metastable states that compete with the MFE state. The length dependence of folding efficiency was tested by duplicating five base pairs in each TPP rho-independent terminator to mimic the lengths of sequesterers. The results shown in Figure 2c indicate that, while lengthening the hairpins does somewhat impair efficient folding, the length difference alone does not account for the difference in folding efficiencies. Furthermore, while we note a weak correlation of decreasing efficiency with increasing terminator sequence length, neither the extended terminators nor the sequesterers exhibit any significant correlation between efficiency and length.
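A length-efficiency correlation of this kind can be checked with a Pearson coefficient and a permutation test. The text does not say which correlation measure or significance test was used, so this Python sketch (hypothetical helper names) shows one reasonable choice rather than the authors' analysis.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=10000, rng=None):
    """Two-sided permutation p-value for H0: no association between x and y."""
    rng = rng or random.Random()
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any pairing between x and y
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0
```

A permutation test makes no normality assumption, which suits efficiencies that pile up near 0 and 1; a rank-based coefficient such as Spearman's would be an equally defensible choice.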

**Figure 4.**
Frequency weighted sequence logos [15] for TPP rho-independent transcriptional terminators (**a**) and Shine-Dalgarno sequesterers (**b**). Regions 1–5 correspond, respectively, to the first half of the $5^{\prime}$ side of the stem, the second half of the same, the loop, the first half of the $3^{\prime}$ side of the stem and the second half of the same.


A second difference lies in the nucleotide frequencies (Table 1) and their distribution among five regions of the hairpins, as illustrated in Figure 4. Here, region 3 represents the hairpin loop, with regions 1 and 2 lying along the $5^{\prime}$ side of the hairpin and regions 4 and 5 along the $3^{\prime}$ side. Terminators exhibit an excess of U in region 5, associated with the beginning of the poly-U pause site, and a weak corresponding enhancement of complementary A nucleotides in region 1. Sequesterers, in contrast, exhibit an enhancement of A and G in regions 4 and 5, corresponding to the Shine-Dalgarno consensus sequence AGGAGG, and a corresponding enhancement of complementary C and U in regions 1 and 2. Another difference is the excess U in the loop region 3 of terminators, which can be attributed to an internal pause site allowing time for aptamer and antiterminator folding [4] prior to completion of terminator transcription. The enhancement of the specifically binding C and its non-complementary U in regions 1 and 2 of the sequesterer might have been expected to aid folding efficiency; yet the terminators, dominated in most regions by the promiscuously binding U and G, still manage to fold with relatively high efficiency. However, the weak enhancement of specifically binding A in region 1 of the terminator, complementary to the poly-U pause site in region 5, may play some small role in terminator folding efficiency.
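The five-region decomposition behind Figure 4 can be sketched as follows for single-hairpin structures in dot-bracket notation. We assume (our choice, not specified above) that the $5^{\prime}$ side runs from the first nucleotide through the base closing the hairpin loop, that the $3^{\prime}$ side runs from the base closing the loop on the other side to the end, that any flanking context nucleotides fall into regions 1 and 5, and that an odd-length side puts its extra nucleotide in the second half. Helper names are hypothetical.

```python
from collections import Counter


def hairpin_regions(seq, structure):
    """Split a single-hairpin sequence into regions 1-5 (stem halves and loop)."""
    loop_start = structure.rindex("(") + 1  # first unpaired base of the hairpin loop
    loop_end = structure.index(")", loop_start)  # 3' base of the loop-closing pair
    five, loop, three = seq[:loop_start], seq[loop_start:loop_end], seq[loop_end:]
    h5, h3 = len(five) // 2, len(three) // 2
    return [five[:h5], five[h5:], loop, three[:h3], three[h3:]]


def region_composition(hairpins):
    """Per-region nucleotide fractions pooled over (seq, dot-bracket) pairs."""
    counts = [Counter() for _ in range(5)]
    for seq, structure in hairpins:
        for counter, region in zip(counts, hairpin_regions(seq, structure)):
            counter.update(region)
    result = []
    for counter in counts:
        total = sum(counter.values()) or 1  # avoid dividing by zero for empty regions
        result.append({nt: counter[nt] / total for nt in "ACGU"})
    return result


print(hairpin_regions("GGGAAACCC", "(((...)))"))  # ['G', 'GG', 'AAA', 'C', 'CC']
```

Pooled per-region fractions like these are the raw material for a sequence logo; the logo additionally scales each letter by the information content of its region.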

Overall, neither the differences in sequence length nor those in nucleotide content appear capable of explaining the difference in folding efficiency between terminators and sequesterers. The most likely available explanation is simply that the folding efficiencies differ as a result of natural selection. For terminators, which must fold under the constraint of short transcription time, selection pressure apparently favors relatively short hairpins and disfavors sequences containing metastable traps. This selection pressure is reduced or absent in the case of Shine-Dalgarno sequesterers. Indeed, as evidenced in Figure 3, many sequesterers fail to fold efficiently even on very long timescales. Perhaps sequesterers function in an ensemble of metastable structures, provided the Shine-Dalgarno sequence remains bound, whereas transcriptional terminators require very specific structures in order to function [16,17,18].

## 4. Specific Examples

Here, we analyze specific cases of poorly folding terminators and sequesterers. The most poorly folding terminator is the ThiD terminator of Thermoanaerobacter tengcongensis (Tte), which folds with efficiency $e_s = 0.18$ at the fastest transcription rate (smallest timescale), $\rho = 1/(\tau_K R_t) = 250$ MC steps/nt transcribed. Similarly, the ThiC riboswitch of Sinorhizobium meliloti (Sm) stands out for having the lowest observed efficiency ($e_s = 0.292$) at the slowest transcription rate (largest timescale), $\rho = 512{,}000$ MC steps/nt transcribed. Two alternate folds of each sequence are illustrated in Figure 5. The most common specific fold of Tte-ThiD (Figure 5a), which occurs in 35% of folding attempts, shares no base pairs with the MFE structure (Figure 5b), which occurs in 6% of folding attempts. Likewise, for Sm-ThiC, the most common specific fold (Figure 5c) occurs in 10% of attempts and shares no base pairs with the MFE structure (Figure 5d), which occurs in 3% of attempts.

The misfolded terminator (Figure 5a) lacks the hairpin preceding the poly-U pause site that is needed to terminate transcription. Notably, the Shine-Dalgarno sequence remains sequestered in the misfolded sequesterer (Figure 5c), suggesting that its function may be preserved. This might explain how sequesterers with low folding efficiency could remain functional, even while misfolded on the timescale of translation initiation.

**Figure 5.**
Alternate folds of low efficiency terminators and sequesterers. (**a,b**) Most common specific fold and MFE structure of the Tte-ThiD terminator sequence. Nucleotides forming the stem of the terminator are highlighted in blue, while the poly-U pause site is in orange. (**c,d**) Most common specific fold and MFE structure of Sm-ThiC. The Shine-Dalgarno sequence is highlighted in blue, while the translation start site is highlighted in orange.


At the largest timescale, $\rho = 512{,}000$, the efficiency of Tte-ThiD rises to 91%. To understand the high efficiency of Tte-ThiD relative to Sm-ThiC at long times, we compare their free energy landscapes in Figure 6. The misfold of Tte-ThiD is relatively weakly bound (only −2.3 kcal/mol), with a barrier of 5.8 kcal/mol separating the misfold from the MFE structure. This barrier has high entropy, as it corresponds to complete unfolding followed by almost any single base pairing, yielding a net energy for the saddle state of +3.5 kcal/mol. This high barrier entropy reduces the effective free energy barrier [5]. Furthermore, as Tte is a thermophile, relatively high thermal energy is available to aid in the escape from metastable traps. In contrast, Sm-ThiC is relatively strongly bound (−11.6 kcal/mol). The saddle state separating the misfold from the MFE is only partially unbound, at an energy of −4.0 kcal/mol, but the net barrier of 7.6 kcal/mol is nearly 2 kcal/mol (about $3RT$) larger than for Tte-ThiD and also has relatively low entropy.
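The barrier comparison is simple arithmetic on the energies quoted above. The sketch below computes each barrier as the saddle energy minus the misfold energy, expresses the difference in units of $RT$ at 37 °C, and prints the Arrhenius-type factor that this difference alone would imply if the two barriers had equal entropy, which, as noted above, they do not. The function name is hypothetical.

```python
import math

RT_37C = 0.0019872 * 310.15  # kcal/mol at 37 °C, about 0.616


def barrier(misfold, saddle):
    """Barrier height (kcal/mol) to leave a misfold via the given saddle state."""
    return saddle - misfold


tte = barrier(-2.3, 3.5)    # Tte-ThiD: 5.8 kcal/mol
sm = barrier(-11.6, -4.0)   # Sm-ThiC: 7.6 kcal/mol
diff = sm - tte             # 1.8 kcal/mol
print(f"{tte:.1f} {sm:.1f} {diff:.1f} kcal/mol = {diff / RT_37C:.1f} RT")
print(f"naive rate ratio exp(diff/RT) = {math.exp(diff / RT_37C):.0f}")
```

The 1.8 kcal/mol difference is about $2.9\,RT$, consistent with the "about $3RT$" estimate; the naive factor ignores the entropy difference between the two saddles.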

The common misfolds of both Tte-ThiD and Sm-ThiC share a feature: their paired nucleotides lie to the $5^{\prime}$ (earlier transcribed) side of the pairs comprising the MFE structures. That is, they contain structure that can form before the sequence is fully transcribed. To see how widespread this mechanism is, we examined the 16 sequesterers that fold with efficiency less than 0.5 at our standard transcription rate, $\rho = 4000$. In all but one case, the most common misfold places the hairpin loop to the $5^{\prime}$ side of its location in the MFE structure; that is, these misfolds involve structures that can form earlier in time than the MFE. The sole exception is a very short sequence for which a few missing pairs reduce the matched fraction, $f$, below 0.7, even though the structure lies in the basin of the MFE structure.
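The check applied to the 16 poorly folding sequesterers, namely whether the most common misfold places its hairpin loop $5^{\prime}$ of the MFE loop, can be expressed as a comparison of loop positions in dot-bracket notation. This sketch (hypothetical names) assumes each structure contains exactly one hairpin and uses the midpoint of the loop as its position; structures with several hairpins would need an extra rule for choosing which loop to compare.

```python
def loop_midpoint(structure):
    """Midpoint index of the hairpin loop of a single-hairpin dot-bracket string."""
    start = structure.rindex("(") + 1       # first unpaired base of the loop
    end = structure.index(")", start)       # base closing the loop on the 3' side
    return (start + end - 1) / 2


def loop_is_upstream(misfold, mfe):
    """True if the misfold's loop lies 5' of (earlier than) the MFE loop."""
    return loop_midpoint(misfold) < loop_midpoint(mfe)


print(loop_is_upstream("((...))......", "......((...))"))  # True
```

Comparing loop positions rather than pair overlap captures the timing argument directly: a loop closer to the $5^{\prime}$ end can close before the downstream MFE partners have been transcribed.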

**Figure 6.**
Free energy landscapes in units of kcal/mol. Completely unbound structures have energy of zero. Basins of depth less than 2.5 kcal/mol have been suppressed. (**a**) Tte-ThiD; structure number 6 corresponds to the most common fold (Figure 5a). (**b**) Sm-ThiC; structure number 5 corresponds to the most common fold (Figure 5c). In both cases, structure number 1 is the MFE fold (Figure 5b,d).


## 5. Conclusions

In conclusion, this study addressed whether riboswitch transcriptional terminators fold with unusually high efficiency, indicating selection for reliable folding. Our simulations show that transcriptional terminators in TPP riboswitches are unusually easy to fold during transcription in comparison with Shine-Dalgarno sequesterers, allowing the null hypothesis of equal efficiencies to be rejected with a highly significant $p$-value. Experimental validation of this prediction might be feasible using optical tweezer studies [19].

Detailed examination of a specific terminator (Tte-ThiD) and sequesterer (Sm-ThiC), both of which fold with relatively low efficiency, reveals a generic mechanism for misfolding: trapping in minimum free energy conformations of partially transcribed sequences that become potentially long-lived metastable states of the fully transcribed sequence. We also suggest that sequesterers may be more tolerant of misfolds than terminators, provided that the Shine-Dalgarno sequence remains bound in the misfolded structure.