Site-Specific Incorporation of Functional Components into RNA by an Unnatural Base Pair Transcription System

Toward the expansion of the genetic alphabet, an unnatural base pair between 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa) functions as a third base pair in replication and transcription, and provides a useful tool for the site-specific, enzymatic incorporation of functional components into nucleic acids. We have synthesized several modified-Pa substrates, such as alkylamino-, biotin-, TAMRA-, FAM-, and digoxigenin-linked PaTPs, and examined their transcription by T7 RNA polymerase using Ds-containing DNA templates with various sequences. The Pa substrates modified with relatively small functional groups, such as alkylamino and biotin, were efficiently incorporated into RNA transcripts at the internal positions, except for those less than 10 bases from the 3′-terminus. We found that the efficient incorporation into a position close to the 3′-terminus of a transcript depended on the natural base contexts neighboring the unnatural base, and that pyrimidine-Ds-pyrimidine sequences in templates were generally favorable, relative to purine-Ds-purine sequences. The unnatural base pair transcription system provides a method for the site-specific functionalization of large RNA molecules.


Introduction
Expansion of the genetic alphabet of DNA by an unnatural base pair system could be a powerful tool for the site-specific incorporation of extra functional components into nucleic acids and proteins. The realization of this genetic expansion system requires the development of an unnatural base pair that functions in biological systems, such as replication, transcription, and translation, with highly exclusive selectivity as a third base pair, along with the natural A-T and G-C pairs. Researchers are attempting to create expanded systems, and many unnatural base pairs have been designed and tested in in vitro biological systems [1][2][3][4][5]. Among them, some unnatural base pairs have exhibited high selectivity as a third base pair in PCR amplification and/or transcription [6][7][8][9][10][11][12][13][14][15][16][17][18][19].
In the course of our research, we developed several unnatural base pairs, such as those between 2-amino-6-thienylpurine (s) and 2-oxopyridine (y) [13,20,21,22], 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa) [6], Ds and 2-nitropyrrole (Pn) [23], and Ds and 2-nitro-4-propynylpyrrole (Px) [7,8], toward practical use in DNA/RNA-based biotechnologies. The Ds-Px pair exhibits the highest selectivity and efficiency in PCR amplification, and the Ds-Pa pair is suitable for the site-specific incorporation of Pa and its modified bases, as well as Ds, into RNA by transcription. The substrates of modified-Pa bases, in which several functional groups are attached to position 4 of Pa via a propynyl linker, can be site-specifically incorporated into RNA, opposite Ds in templates, by T7 RNA polymerase. For transcription involving the Ds-Pa pair, DNA templates containing Ds are prepared by PCR amplification using the Ds-Px pair. Thus, the combination of the Ds-Px and Ds-Pa pairs would be useful for the direct site-specific modification of long RNA molecules by T7 transcription.
To provide a highly versatile method for generating functional RNA molecules with an increased number of components by the Ds-Pa pair system, we here report the site-specific incorporation of several modified-Pa substrates, linked with alkylamino, biotin, or fluorescent groups, into RNA transcripts (from 17-mer to 48-mer) using DNA templates containing Ds with different sequence contexts and lengths. We previously reported that the PCR amplification efficiency involving the Ds-Px pair depends on the natural base contexts around the unnatural base. For example, we found that the efficiency of pyrimidine-Ds-pyrimidine sequences in both the template and complementary growing strands is higher than that of purine-Ds-purine sequences [8]. In this report, we examined whether the sequence dependency is also observed in transcription involving the Ds-Pa pair, as in the case of PCR amplification involving the Ds-Px pair. We found that Pa substrates modified with relatively small functional groups, such as alkylamino and biotin, were efficiently and selectively incorporated into RNA transcripts at the internal positions, except for those less than 10 bases from the 3′-terminus, without any sequence dependency. Although the incorporation efficiency and selectivity were reduced for Pa substrates modified with relatively large functional groups, such as TAMRA, FAM, and digoxigenin, their incorporation can be practically used for the site-specific labeling of RNA transcripts. In addition, the efficiency of the site-specific incorporation of all modified-Pa substrates into a position close to the 3′-terminus of a transcript significantly depends on the natural base sequence contexts around the unnatural base in templates. In this case, the sequence dependency is similar to that in replication involving the Ds-Px pair. 
Overall, the specific transcription using the Ds-Pa pair could be a useful tool for the site-specific functionalization of long RNA molecules, based on a fundamental understanding of the characteristics of the Ds-Pa pair in transcription.
These nucleoside derivatives were then converted to the triphosphate substrates (bearing free amino groups in the case of the alkylamino-PaTPs) by Eckstein's method. Biotin-PaTP [6] and Dig-hx-PaTP were prepared by treating NH2-PaTP with biotin N-hydroxysuccinimide ester and digoxigenin-3-O-methylcarbonyl-ε-aminocaproic acid N-hydroxysuccinimide ester, respectively (Scheme 3). TAMRA-hx-PaTP and FAM-hx-PaTP were prepared by treating NH2-hx-PaTP with 5-carboxytetramethylrhodamine N-hydroxysuccinimidyl ester and 5-carboxyfluorescein N-hydroxysuccinimidyl ester, respectively (Scheme 2). The relatively large groups (TAMRA, FAM, and digoxigenin) were attached to the 4-aminopropynyl Pa base via an aminohexyl linker, to reduce steric hindrance within the complex of nucleic acids and polymerase during transcription. All modified-Pa triphosphates were purified by DEAE Sephadex column chromatography and C18 HPLC, and their structures were confirmed by 1H- and 31P-NMR and mass spectrometry.
DNA templates containing Ds were prepared by standard phosphoramidite chemistry, using a DNA synthesizer [6]. The synthesized DNA fragments were purified by gel electrophoresis. Full-length template strands (from 35-mer to 65-mer) containing one Ds base were hybridized with a 21-mer non-template strand including the T7 promoter region for transcription (Figure 1).

T7 Transcription for 48-Mer Transcripts Containing Modified-Pa at Internal Positions
We first examined the incorporation efficiency of each modified-PaTP, as well as PnTP, in transcription by T7 RNA polymerase, using 65-mer DNA templates containing one Ds base, located at the site complementary to position 21 of the 48-mer transcripts (Figure 2A). To investigate the dependence on the natural base sequence around Ds in the templates, we used four templates containing Ds in different sequence contexts, 3′-CDsC-5′, 3′-CDsT-5′, 3′-TDsC-5′, and 3′-GDsG-5′, as well as the control templates 3′-CAT-5′ and 3′-GAG-5′, which comprise only natural bases. Transcription was performed in the presence of 1 mM natural NTPs, 0 or 1 mM modified-PaTP or PnTP, [γ-32P]GTP, DNA templates, and T7 RNA polymerase, at 37 °C for 3 h. After transcription, the 5′-labeled transcripts were analyzed by gel electrophoresis (Figure 2B), and the relative yields of the full-length transcripts (48-mer) were determined by comparison with the yield obtained with the control template (Figures 2B and 3A). The mobilities of the full-length transcripts on the gel differed depending on the modification of the Pa base. The incorporation of PaTPs modified with large molecules (TAMRA-hx-PaTP, FAM-hx-PaTP, and Dig-hx-PaTP) reduced the mobility of the transcripts, and the incorporation of TAMRA-hx-Pa or FAM-hx-Pa into the full-length transcripts was confirmed by monitoring their fluorescence with a bio-imager, FLA-7000 (data not shown). Truncated products (20-mer), resulting from pausing before the unnatural base position, were also detected. All of the unnatural base substrates (PaTP, PnTP, and the modified-PaTPs), except for the PaTPs modified with large molecules, were efficiently incorporated into RNA by T7 transcription, and the transcription efficiency (86–139%) was independent of both the sequence context and the amount of truncated products (20-mer).
Furthermore, we found that PnTP, with a 2-nitro group instead of the 2-aldehyde group of Pa, is also a favorable substrate as a pairing partner of Ds in transcription, as shown in Figures 2 and 3A.
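The relative yields above are a simple normalization of the full-length band to the control-template band. A minimal sketch, assuming quantified band intensities are already available (the function names and the placeholder numbers are hypothetical, not data from this study):

```python
"""Relative yield of full-length transcripts, normalised to a control template.

Sketch only: intensities would come from quantified gel bands; the values in
the usage example are hypothetical placeholders, not data from this study.
"""
from typing import Sequence


def relative_yield(sample_intensity: float, control_intensity: float) -> float:
    """Return the full-length band as a percentage of the control band.

    Transcripts are 5'-labelled with [gamma-32P]GTP, whose gamma-phosphate is
    kept only on the initiating nucleotide, so each molecule carries a single
    label and band intensity can be treated as proportional to molecule count.
    """
    if control_intensity <= 0:
        raise ValueError("control intensity must be positive")
    return 100.0 * sample_intensity / control_intensity


def mean(values: Sequence[float]) -> float:
    """Average replicate values (e.g. three independent transcription runs)."""
    if not values:
        raise ValueError("need at least one replicate")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical replicate band intensities (arbitrary units).
    samples = [41.0, 44.0, 46.0]
    controls = [50.0, 50.0, 50.0]
    yields = [relative_yield(s, c) for s, c in zip(samples, controls)]
    print(f"relative yield: {mean(yields):.0f}%")
```

Because the label sits only at the 5′-terminus, longer and shorter products of the same reaction can be compared on a per-molecule basis, which is what makes this normalization meaningful.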
Figure 3 (partial caption). The values were averaged from three data sets. (b) Correlation between the incorporation efficiency and selectivity of PaTPs modified with large molecules in T7 transcription. The selectivity was determined as the ratio of the relative yields of the 48-mer transcripts with and without modified-Pa bases (TAMRA-hx-Pa, FAM-hx-Pa, and Dig-hx-Pa) from gel analysis of the transcripts.
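The selectivity defined above (shifted versus unshifted full-length band) and its correlation with incorporation efficiency can be computed as follows. This is a sketch under stated assumptions: the helper names are hypothetical, `fraction_modified` is an assumed alternative percentage form rather than the definition used in the text, and Pearson's r is used here as one common measure of "correlated well".

```python
"""Incorporation selectivity from shifted/unshifted bands, and its correlation
with incorporation efficiency. Hypothetical helpers; no study data included."""

import math
from typing import Sequence


def with_to_without_ratio(shifted: float, unshifted: float) -> float:
    """Ratio of the shifted (modified-Pa) full-length band to the unshifted one."""
    if unshifted <= 0:
        raise ValueError("unshifted band intensity must be positive")
    return shifted / unshifted


def fraction_modified(shifted: float, unshifted: float) -> float:
    """Assumed alternative form: modified share of all full-length product."""
    total = shifted + unshifted
    if total <= 0:
        raise ValueError("no full-length product")
    return shifted / total


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a series with zero variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```

In use, one would pair each template's relative yield with its selectivity value and pass the two lists to `pearson_r`; a value near +1 would correspond to the trend described in the text.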
The PaTPs modified with the large molecules were less effective, and the transcription yields were reduced by nearly half. In addition, these PaTPs exhibited reduced incorporation selectivity opposite Ds in templates, and full-length transcripts lacking modified-Pa also appeared on the gel. We determined the incorporation selectivities of TAMRA-hx-PaTP, FAM-hx-PaTP, and Dig-hx-PaTP from the ratios of the amounts of the full-length transcripts with modified-Pa to those without modified-Pa (Figure 3B).
Figure 4. The 5′-32P-labeled transcripts were partially digested with either RNase T1 (T1) or alkali (AL). A portion of the partially alkali-digested transcripts was treated with streptavidin magnetic beads to capture the RNA fragments containing Biotin-Pa (AL+SA). Each digested fragment was analyzed on a 10% polyacrylamide gel containing 7 M urea.
Since the bands of the full-length transcripts containing these modified-Pa bases were shifted on the gel, the incorporation selectivity of each modified-Pa could be determined by comparing the band densities of the full-length transcripts with and without the modified-Pa bases. Although the incorporation selectivities of the PaTPs modified with the large molecules were relatively low, the selectivity of each modified-Pa correlated well with its incorporation efficiency into RNA: depending on the sequence context of the template, more efficient incorporation tended to be more selective. Next, we determined the incorporation position of Biotin-Pa in the full-length transcripts. We previously reported the highly selective incorporation of PaTP and Biotin-PaTP opposite Ds in transcription, using DNA templates containing 3′-CDsT-5′ and 3′-ADsT-5′ sequences, respectively [6].
Therefore, we confirmed the incorporation positions of Biotin-Pa in the full-length transcripts obtained with DNA templates in four different sequence contexts, 3′-CDsC-5′, 3′-CDsT-5′, 3′-TDsC-5′, and 3′-GDsG-5′. The 32P-labeled full-length transcripts were partially digested by either alkali or RNase T1, and the digested products were incubated with streptavidin magnetic beads. This treatment removed the Biotin-Pa-containing fragments from the digests, and the remaining fragments lacking Biotin-Pa were analyzed by gel electrophoresis (Figure 4). By comparing the sequencing ladders obtained by the alkali and RNase T1 treatments, the position of Biotin-Pa incorporation was determined as the point at which the ladder first disappeared after streptavidin treatment.
In the streptavidin-treated patterns of all four transcripts, the ladder bands corresponding to fragments larger than 21-mer were almost undetectable. Based on the extent of ladder disappearance, more than 90% of Biotin-Pa was incorporated at the desired position in the transcripts. In the alkali-digested products without streptavidin treatment, the ladder bands corresponding to fragments larger than 21-mer were markedly shifted on the gel. From these results, we concluded that Biotin-PaTP was site-specifically incorporated into the transcripts at position 21, opposite Ds in the templates, with a selectivity of more than 90% that was independent of the sequence context.
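The ladder-disappearance reasoning can be expressed as a small calculation on band intensities from the untreated and streptavidin-treated lanes. This is a minimal sketch under assumptions not stated in the text: the lanes are equally loaded, and a 5′-labeled fragment of length n spans positions 1..n, so every fragment that reaches the Biotin-Pa site is captured. How the returned length maps onto a nucleotide position depends on the ladder-numbering convention used on the gel. All names are hypothetical.

```python
"""Locate a biotinylated base from alkaline ladders before/after streptavidin capture.

Assumptions: equally loaded lanes; a 5'-labelled fragment of length n spans
positions 1..n. Intensities are hypothetical inputs keyed by fragment length.
"""

from typing import Dict, Optional


def retained_fraction(untreated: Dict[int, float], treated: Dict[int, float]) -> Dict[int, float]:
    """Fraction of each ladder band that survives streptavidin treatment."""
    return {n: treated.get(n, 0.0) / untreated[n] for n in untreated if untreated[n] > 0}


def incorporation_position(untreated: Dict[int, float], treated: Dict[int, float],
                           threshold: float = 0.5) -> Optional[int]:
    """Shortest fragment whose band is mostly depleted (first disappearance)."""
    frac = retained_fraction(untreated, treated)
    for n in sorted(frac):
        if frac[n] < threshold:
            return n
    return None


def capture_fraction(untreated: Dict[int, float], treated: Dict[int, float], site: int) -> float:
    """Share of signal removed for fragments at or beyond the site."""
    longer = [n for n in untreated if n >= site and untreated[n] > 0]
    if not longer:
        raise ValueError("no fragments at or beyond the site")
    kept = sum(treated.get(n, 0.0) for n in longer)
    total = sum(untreated[n] for n in longer)
    return 1.0 - kept / total
```

A `capture_fraction` above 0.9 would correspond to the ">90% at the desired position" estimate; in practice one would normalize the treated lane to fragments shorter than the site rather than assume equal loading.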

T7 Transcription for Transcripts Containing Pa at Different Positions from the 3′-Terminus
We next examined how the position of Pa incorporation, especially close to the 3′-terminus, in transcripts affects the incorporation efficiency. We previously reported that transcription efficiency involving the Ds-Pa pairing decreased when using short DNA templates (such as 35-mer) [6].
Thus, we prepared a series of DNA templates (42- to 65-mer), shortened stepwise from the 5′-terminus, and examined T7 transcription with them (Figure 5). For these experiments, we chose the DNA template containing the 3′-GDsG-5′ sequence, which was expected to be an inferior template based on our replication experiments involving related unnatural base pairs [7,8]. Using these templates and PaTP, we compared the yields of the transcripts (25-, 28-, 33-, 38-, 43-, and 48-mer).
The full-length transcripts (48-, 43-, 38-, and 33-mer) obtained from the 65-, 60-, 55-, and 50-mer templates appeared as fairly clear bands on the gel, and their relative yields were more than 71% of that obtained with the control template (ContTemp-65) and natural NTPs. In contrast, the full-length transcripts (28-mer and 25-mer) from the 45- and 42-mer templates were difficult to discern as a single main band on the gel, and their yields were reduced to 35–36%. With the 45-mer and 42-mer templates, several ladder bands larger than the full-length transcripts also appeared on the gel. Although we do not yet know how these bands were generated, they might reflect an intrinsic problem caused by transcript-transcript or transcript-template interactions that depend on the natural base sequence contexts [24], or the instability of the complex formed by the template, the nascent transcript, the substrate, and T7 RNA polymerase. Even allowing for this problem, the incorporation of Pa at a position less than 12 bases from the 3′-terminus of the transcript significantly reduces the transcription efficiency involving the Ds-Pa pair when templates with the inferior 3′-GDsG-5′ sequence are used. These results prompted us to examine how the sequence contexts in templates affect transcription efficiency when the Pa incorporation site is close to the 3′-terminus of the transcript.
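The link between template length and the distance of Pa from the transcript 3′-terminus is simple arithmetic. In this sketch the 17-nucleotide offset between template and transcript length is inferred from the pairs given in the text (65-mer to 48-mer, 60-mer to 43-mer, and so on), and Pa sits at position 21 in every transcript; the constant and function names are illustrative only.

```python
"""Distance of the Pa site from the transcript 3'-terminus for the template series.

PROMOTER_OFFSET is inferred from the template/transcript pairs in the text
(65 -> 48, 60 -> 43, ..., 42 -> 25); Pa is at position 21 in every transcript.
"""

PROMOTER_OFFSET = 17  # template length minus transcript length (inferred)
PA_POSITION = 21      # position of Pa in every transcript of the series


def transcript_length(template_length: int) -> int:
    """Run-off transcript length for a given template length."""
    return template_length - PROMOTER_OFFSET


def bases_after_pa(template_length: int, pa_position: int = PA_POSITION) -> int:
    """Number of nucleotides transcribed downstream (3') of the Pa site."""
    return transcript_length(template_length) - pa_position


if __name__ == "__main__":
    for template in (65, 60, 55, 50, 45, 42):
        print(f"{template}-mer template -> {transcript_length(template)}-mer transcript, "
              f"{bases_after_pa(template)} nt after Pa")
```

The two poorly transcribed templates (45- and 42-mer) leave only 7 and 4 nucleotides after Pa, whereas the templates giving more than 71% relative yield leave 12 or more, consistent with the threshold stated above.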
Taken together, the incorporation of Pa and modified-Pa, as well as of Pn, close to the 3′-terminus of the transcript was less efficient than the internal incorporation shown in Figure 3.

DNA Template Preparation for the Ds-Pa Transcription System
For practical use of the Ds-Pa transcription system demonstrated here, the preparation of long Ds-containing templates is an important consideration. The general schemes for preparing Ds-containing templates and for the specific transcription of modified-PaTPs are summarized in Figure 9. Ds-containing DNA templates for modified-PaTP incorporation can be prepared by several methods. Templates shorter than 80-mer (including the T7 promoter) are prepared directly by chemical synthesis on a DNA synthesizer, as shown here; in this case, templates that are double-stranded only in the promoter region can be used. Templates longer than 100-mer can be prepared as follows: (1) Chemical synthesis + enzymatic ligation: chemically synthesized 5′-phosphorylated DNA fragments are annealed in double-stranded form and ligated by T4 DNA ligase [6,15]. T can be used at the position opposite Ds instead of the complementary unnatural base, Px. The DNA fragments are phosphorylated either during chemical synthesis, using commercially available phosphorylation amidite reagents, or afterwards with T4 polynucleotide kinase. (2) Fusion PCR: for the internal introduction of Ds into long templates (more than 200-mer), fusion PCR methods involving the Ds-Px pair can be used with dDsTP and dPxTP (unpublished data, Kimoto and Hirao). (3) PCR: for introducing Ds at a template position corresponding to a site near the 3′-terminus of the transcript, PCR amplification with Ds-containing primers is useful [15,25]. In all three methods, Ds should be introduced more than 10 bases away from the terminus of each DNA fragment.
Figure 9. Schemes for the preparation of Ds-containing templates for T7 transcription, using Ds-containing DNA fragments.
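The template-preparation guidelines above can be encoded as a small planning check. This is a hypothetical helper, not a tool from this work: the length bands come from the text, no method is suggested for the 80- to 100-mer range that the text does not cover, and applying the 10-base rule to both ends of each fragment is an assumption.

```python
"""Hypothetical planning helper encoding the template-preparation guidelines.

Assumptions: direct synthesis below 80-mer and ligation/fusion PCR/PCR above
100-mer, as in the text; no rule is given for 80-100-mer, so none is suggested;
the more-than-10-bases rule is applied to both fragment ends.
"""

from typing import List


def candidate_methods(template_length: int, ds_near_transcript_3prime: bool = False) -> List[str]:
    """Return preparation routes consistent with the stated length bands."""
    if template_length < 80:
        return ["direct chemical synthesis"]
    if template_length <= 100:
        return []  # not covered by the guidelines in the text
    methods = ["chemical synthesis + T4 DNA ligase ligation"]
    if template_length > 200:
        methods.append("fusion PCR with dDsTP and dPxTP")
    if ds_near_transcript_3prime:
        methods.append("PCR with a Ds-containing primer")
    return methods


def ds_position_ok(fragment_length: int, ds_position: int, min_gap: int = 10) -> bool:
    """True if Ds (1-based position) lies more than min_gap bases from both ends."""
    if not 1 <= ds_position <= fragment_length:
        raise ValueError("Ds position outside the fragment")
    bases_before = ds_position - 1
    bases_after = fragment_length - ds_position
    return bases_before > min_gap and bases_after > min_gap
```

For example, `candidate_methods(250)` lists both ligation and fusion PCR, and `ds_position_ok(40, 5)` flags a Ds placed too close to the 5′-end of a 40-mer fragment.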

General
Reagents and solvents were purchased from standard suppliers and used without further purification. Reactions were monitored by thin-layer chromatography (TLC) on 0.25 mm silica gel 60 plates containing a 254 nm fluorescent indicator (Merck). 1H-NMR, 13C-NMR, and 31P-NMR spectra were recorded on a Bruker (300-AVM) magnetic resonance spectrometer. Nucleosides were purified on a Gilson HPLC system with a preparative C18 column (Waters μ-BONDASPHERE, 150 × 19 mm). The triphosphate derivatives were purified by DEAE-Sephadex A-25 column chromatography (300 × 15 mm; eluted with a linear gradient of 50 mM to 1 M triethylammonium bicarbonate) and by HPLC on a C18 column (CAPCELL PAK, 250 × 4.6 mm, SHISEIDO), eluted with a linear gradient of CH3CN in 100 mM triethylammonium acetate, pH 7.0. High-resolution mass spectra (HRMS) and electrospray ionization mass spectra (ESI-MS) were recorded on a JEOL JM 700 or GC mate mass spectrometer and on a Waters micromass ZMD 4000 equipped with a Waters 2690 LC system or a Waters UPLC-MS (H class) system, respectively. DNA templates were synthesized with an Applied Biosystems 392 DNA synthesizer, using CE phosphoramidite reagents for the natural and Ds bases (Glen Research), and were purified by gel electrophoresis. The syntheses of PaTP, Pa′TP, NH2-PaTP, and Biotin-PaTP were described previously [6]. Biotin-PaTP is commercially available from Glen Research, and the other modified-PaTPs are available from TagCyx Biotechnologies. T7 RNA polymerase and T4 polynucleotide kinase were purchased from Takara. Antarctic phosphatase and streptavidin magnetic beads were purchased from New England Biolabs. The natural NTP Set (100 mM solutions: ATP, CTP, GTP, and UTP) and RNase T1 were purchased from GE Healthcare. [γ-32P]GTP and [γ-32P]ATP were purchased from PerkinElmer. The Escherichia coli tRNA mixture was purchased from Sigma. Gel images were analyzed with a bio-imaging analyzer, FLA-7000 (Fuji Film).
1-(5-O-Dimethoxytrityl-β-D-ribofuranosyl)-4-iodopyrrole-2-carbaldehyde (1.7 g) was purified by silica gel column chromatography (1% MeOH in CH2Cl2), and 2.6 mmol (1.7 g) of the purified product was co-evaporated with dry pyridine three times. The residue was dissolved in pyridine (26 mL), and acetic anhydride (979 μL, 10.4 mmol) was added. The mixture was stirred at room temperature for 12 h, then diluted with EtOAc and washed with saturated aqueous NaHCO3. The organic layer was washed with brine, dried over MgSO4, and evaporated in vacuo. The product was purified by silica gel column chromatography in CH2Cl2 to give 1.77 g of 2 (89% yield from 1).
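The reagent quantities in this step can be cross-checked with a short stoichiometry calculation. This sketch assumes a literature density of about 1.082 g/mL for acetic anhydride (not a value given in the text); the result, about 4 equivalents, would correspond to a twofold excess per free hydroxyl if both the 2′- and 3′-OH groups are acetylated, the 5′-OH being protected by the DMTr group.

```python
"""Stoichiometry check for the acetylation step (sketch; the density is an
approximate literature value for acetic anhydride, not taken from the text)."""

AC2O_MW = 102.09       # g/mol, acetic anhydride
AC2O_DENSITY = 1.082   # g/mL near 20 degrees C (approximate)


def mmol_from_volume(volume_ul: float, density_g_per_ml: float, mw_g_per_mol: float) -> float:
    """Convert a liquid reagent volume in microlitres to millimoles."""
    grams = volume_ul / 1000.0 * density_g_per_ml
    return grams / mw_g_per_mol * 1000.0


nucleoside_mmol = 2.6
pyridine_ml = 26.0
ac2o_mmol = mmol_from_volume(979.0, AC2O_DENSITY, AC2O_MW)

print(f"Ac2O: {ac2o_mmol:.1f} mmol")                                      # 10.4 mmol
print(f"equivalents: {ac2o_mmol / nucleoside_mmol:.1f}")                  # 4.0
print(f"substrate concentration: {nucleoside_mmol / pyridine_ml:.2f} M")  # 0.10 M
```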

Conclusions
We have examined the site-specific incorporation of functional components into RNA by an unnatural base pair transcription system, using the Ds-Pa pairing. The Pa substrate can easily be chemically modified with a wide range of functional groups via the propynyl linker. These modified-PaTPs can be site-specifically incorporated into RNA transcripts by conventional T7 transcription, provided that the characteristics of the Ds-Pa pair in transcription are taken into account.
The PaTPs modified with small molecules (MW less than ~240, such as PaTP, Pa′TP, NH2-PaTP, and Biotin-PaTP) are efficiently incorporated into a specific internal position of a transcript, except for positions less than 10 bases from the 3′-terminus, without any natural base sequence dependency. For incorporation close to the 3′-terminus of a transcript, pyrimidine-Pa-pyrimidine sequences in the transcript (corresponding to purine-Ds-purine sequences in the template) should be avoided because of low transcription efficiency. Our previous results revealed that the incorporation selectivity of PaTP opposite Ds in templates is more than 90%, and that the misincorporation rate of PaTP opposite the natural bases in templates is around 0.2% per base [6]. The present experiments using Biotin-PaTP also showed high selectivity (>90%).
The PaTPs modified with large molecules (MW larger than ~480, such as TAMRA-hx-PaTP, FAM-hx-PaTP, and Dig-hx-PaTP) exhibited reduced incorporation efficiency and selectivity in T7 transcription, and their incorporation into a position less than 10 bases from the 3′-terminus of the transcripts should be avoided. However, for their internal incorporation into RNA (less than ~50-mer), the intended transcripts can be isolated by gel electrophoresis, because the mobility of the transcripts containing the modified-Pa bases is significantly shifted on a gel, relative to that of transcripts lacking the modified-Pa bases.
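The two design rules above (avoid a pyrimidine-Pa-pyrimidine context near the 3′-terminus, and avoid placing large modifications within 10 bases of the 3′-terminus) can be checked mechanically from the template context. This is a hypothetical screening helper under stated assumptions; it only restates the guidelines and does not predict yields. Because the template is read 3′ to 5′ while the transcript grows 5′ to 3′, the flanking bases keep the same order.

```python
"""Hypothetical screening helper for a planned Pa (or modified-Pa) site.

Encodes two guidelines as stated assumptions: (1) within 10 nt of the
transcript 3'-terminus, a pyrimidine-Pa-pyrimidine context (purine-Ds-purine
in the template) gives low yields; (2) large modifications (TAMRA-hx, FAM-hx,
Dig-hx) should not be placed within 10 nt of the 3'-terminus.
"""

from typing import Tuple

RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def transcript_context(left: str, right: str) -> str:
    """Template 3'-(left)Ds(right)-5' -> transcript 5'-(left')Pa(right')-3'."""
    return f"5'-{RNA_COMPLEMENT[left]}Pa{RNA_COMPLEMENT[right]}-3'"


def screen_site(left: str, right: str, nt_to_3prime_end: int,
                large_modification: bool = False, min_distance: int = 10) -> Tuple[str, bool]:
    """Return the transcript context and whether the design should be reconsidered."""
    t_left, t_right = RNA_COMPLEMENT[left], RNA_COMPLEMENT[right]
    pyr_pa_pyr = t_left not in PURINES and t_right not in PURINES
    near_end = nt_to_3prime_end < min_distance
    flag = near_end and (pyr_pa_pyr or large_modification)
    return transcript_context(left, right), flag


if __name__ == "__main__":
    for left, right in (("C", "C"), ("C", "T"), ("T", "C"), ("G", "G")):
        ctx, flag = screen_site(left, right, nt_to_3prime_end=7)
        print(f"template 3'-{left}Ds{right}-5' -> {ctx}: {'reconsider' if flag else 'ok'}")
```

With Pa 7 nucleotides from the end, only the 3′-GDsG-5′ template (transcript 5′-CPaC-3′) is flagged, matching the observation that purine-Ds-purine templates are the unfavorable case.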
This specific transcription involving the Ds-Pa pair could provide a useful tool for the site-specific introduction of functional groups of interest into long RNA molecules. Because the incorporation of PaTPs modified with large molecules, such as fluorescent dyes, is unfavorable for specific transcription, an alternative is to incorporate the NH2-Pa base, which is incorporated efficiently, and use it as a site for post-transcriptional modification with large functional molecules.