Structural Insights into the Respiratory Syncytial Virus RNA Synthesis Complexes

RNA synthesis in respiratory syncytial virus (RSV), a negative-sense (−) nonsegmented RNA virus, consists of viral gene transcription and genome replication. Gene transcription includes positive-sense (+) viral mRNA synthesis, 5′-RNA capping and methylation, and 3′ end polyadenylation. Genome replication includes (+) RNA antigenome and (−) RNA genome synthesis. RSV executes viral RNA synthesis using an RNA synthesis ribonucleoprotein (RNP) complex comprising four proteins: the nucleoprotein (N), the large protein (L), the phosphoprotein (P), and the M2-1 protein. We provide an overview of RSV RNA synthesis and of the structural insights into the RSV gene transcription and genome replication processes. We propose a model of how the four essential proteins coordinate their activities in the different RNA synthesis processes.


Overview of the Respiratory Syncytial Virus (RSV)
The human respiratory syncytial virus (HRSV or RSV) is an enveloped virus with a negative-sense (−) genome in the genus Orthopneumovirus, family Pneumoviridae, order Mononegavirales [1]. Since its first isolation in 1955, RSV has been a leading cause of infant mortality from viral infection in the US and worldwide [2]. RSV is the most common cause of hospitalization of infants for respiratory problems, with an estimated 80,000 cases each year in the US [3]. RSV reinfection remains common in all age groups, causing bronchiolitis in infants, common colds in adults, and more serious respiratory illnesses in older adults and immunocompromised patients [4]. Unfortunately, no effective vaccine or antiviral therapy is available to prevent or treat RSV [5-8].

RSV Virion and Genome
The RSV virions are either spherical particles 100-350 nm in diameter or long filaments up to 10 µm in length and 60-200 nm in diameter. The RSV virion comprises an RNA synthesis ribonucleoprotein (RNP) complex packaged in a lipid envelope derived from the host cell membrane. The RNA synthesis RNP consists of four proteins essential for RSV RNA synthesis: the nucleoprotein (N), the large polymerase protein (L), the phosphoprotein (P), and the processivity factor M2-1. The RSV envelope contains three

Figure 1. Schematic diagram of the genome organization and RNA synthesis of RSV. The leader (Le, cyan) and trailer (Tr, gold) are at the 3′ and 5′ ends of the genome, respectively. The genes N, P, M2, L (blue), NS1, NS2, M, SH, G, and F (gray) are flanked by gene start (GS, white) and gene end (GE, black) signals. Two replication promoters (magenta arrows) at the 3′ ends of the genome and antigenome and a single transcription promoter (green arrow) are shown. The mRNA transcription products, with 5′ methylguanosine (mG) caps and 3′ poly-A (An) tails, are shown below their respective genes. The genome and antigenome are encapsidated by N proteins (yellow circles).

RSV RNA Synthesis
RSV initiates viral infection by delivering into the host cell a virus-specific RNA synthesis RNP required for both replication of the full-length genome and transcription of individual viral genes [9]. RNA synthesis is carried out by the RNA-dependent RNA polymerase (RdRp) complex, which consists of the catalytic core L and the cofactor P. L is a 250 kDa polypeptide that executes the synthesis of viral genomic or antigenomic RNAs and mRNAs and catalyzes three distinct enzymatic activities: ribonucleotide polymerization, mRNA 5′ cap addition, and cap methylation [6]. Interestingly, the mRNA caps are synthesized by unique chemical reactions: (a) the cap forms via a covalent L:RNA intermediate, a mechanism distinct from that of eukaryotes and all other virus orders; (b) the cap is methylated at the 2′-O position first, followed by the N-7 position, the reverse of the order used for mammalian mRNAs [10,11] (highlighted in Figure 2). The RNA template for the polymerase is not naked RNA but rather a helical nucleocapsid (NC, N:RNA), a single-stranded RNA continuously encapsidated by the N protein, shown as yellow circles on the (−) genome and the positive-sense (+) antigenome (Figure 1). Each RSV N coats seven nucleotides (nts) [12,13], and the entire RSV genome of 15,222 nts (A2 strain, GenBank accession number M74568) requires more than 2000 copies of N.
There is only one promoter (green arrow) for transcription, at the 3′ end of the Le region (positions 1-11) (Figure 1). The polymerase initiates RNA synthesis at this promoter to produce an uncapped leader complementary (LeC) RNA and progresses along the genome. After recognizing the first gene start (GS) signal, the polymerase initiates mRNA synthesis, caps and methylates the 5′ end of the mRNA, and elongates it. When the polymerase reaches the gene end (GE) signal, it polyadenylates the transcript using a short U-rich template, releases the nascent mRNA, and then reinitiates mRNA synthesis at the next GS signal downstream.
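The N copy number quoted above can be checked with a line of arithmetic. The following is a minimal sketch, assuming that N coats the genome end to end with no gaps and that each N covers exactly seven nucleotides; the function and constant names are illustrative and do not come from the cited studies.

```python
GENOME_LENGTH_NT = 15_222  # RSV A2 genome length quoted in the text
NT_PER_N = 7               # nucleotides coated by one N protomer, per the text


def n_copies_required(genome_length_nt: int, nt_per_n: int) -> int:
    """Smallest number of N protomers that covers every nucleotide.

    Assumes gap-free, end-to-end coating (an idealization); a partially
    covered final stretch still needs one more protomer, hence the ceiling.
    """
    if genome_length_nt <= 0 or nt_per_n <= 0:
        raise ValueError("lengths must be positive integers")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-genome_length_nt // nt_per_n)


copies = n_copies_required(GENOME_LENGTH_NT, NT_PER_N)
print(f"N protomers needed: {copies}")
```

Under these assumptions the estimate is 2175 protomers, consistent with the "more than 2000 copies" stated above.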
There are two replication promoters (magenta arrows), one embedded in the Le sequence and another residing at the 3′ end of the (+) antigenome (Figure 1). During replication, the polymerase binds to the promoter in the Le region (positions 1-34) and initiates RNA synthesis at the 3′ end. It copies the template to generate a full-length (+) RNA antigenome, ignoring the cis-acting GS and GE signals. At the 3′ end of the antigenome is the complement of the trailer sequence (TrC), which shares 88% sequence identity with the Le and contains a promoter. The polymerase uses this promoter to generate (−) RNA genome progeny. The promoters for transcription and replication overlap but differ in length, and all promoters can be recognized by the same polymerase [12,14] (Figure 1).
Much of our current understanding of RNA synthesis comes from studies of prokaryotic or eukaryotic DNA- or RNA-dependent RNA polymerases [15-18]. RNA synthesis by RSV and other nonsegmented negative-sense (NNS) RNA viruses is believed to follow the "two-metal-ion" mechanism of catalysis [19] but differs from that of the cellular polymerases in three main respects. First, rather than naked RNA, the authentic RNA template is wrapped by N as a helical NC. Second, the N:RNA genome serves as the template for both transcription and replication, and the promoters of these two processes overlap, requiring the polymerase to be highly regulated. Third, instead of having a promoter for each gene, all viral mRNAs are synthesized from a single promoter, with the polymerase stopping and reinitiating synthesis along the genome. RSV transcription and replication appear straightforward, requiring the RNA polymerase, the promoters at the 3′ ends of the genome and antigenome RNAs, and short cis-acting signals flanking each of the genes.
Further, studies using the RSV minigenome replicon, in which the RNP complexes are reconstituted in cells by supplying the trans-acting protein and RNA components, have shown that replication requires N, L, and P, whereas transcription requires N, L, P, and M2-1 [5,6]. Remarkably, using these minimal elements, RSV coordinates multiple biosynthetic events to generate appropriate ratios of encapsidated antigenome and genome RNAs, as well as multiple monocistronic 5′-capped, methylated, and 3′-polyadenylated viral mRNAs, to ensure faithful virus propagation [9,20] (Figure 1). Reviews of RSV and NNS RNA viruses can be found in [5,6,9].

Structural Insights of the RSV RNA Synthesis Complex
Over the past two decades, advances in structural and biochemical studies of the Mononegavirales RNA synthesis RNPs have provided rich insights into RSV RNA synthesis and viral replication. For comparison, we use the well-studied vesicular stomatitis virus (VSV) and human parainfluenza virus (HPIV), representative members of the Rhabdoviridae and Paramyxoviridae families, respectively, and human metapneumovirus (HMPV) as a control in the Pneumoviridae family. Comparing the RNP structures of RSV and other representative viruses sheds light on the shared and distinct transcription and replication strategies. As highlighted above, N, L, P, and M2-1 are the minimal components for the in vitro reconstitution of the RSV RNA synthesis machinery. This review focuses on RSV and summarizes structural insights into the four proteins that constitute the RSV RNA synthesis RNP. Readers interested in more targeted reviews will find recommendations in each subsection.

The RSV L Protein
The RSV L is a single polypeptide of 2165 residues and is the catalytic core of the RSV polymerase. It is a monomeric but multifunctional enzyme bearing three distinct enzymatic domains: the RNA-dependent RNA polymerization (RdRp), cap addition (Cap), and cap methylation (MT) domains. Besides these three functional domains, the RSV L also contains two structural domains, the connector domain (CD) and the C-terminal domain (CTD). The RSV RdRp activities share similarities with those of other viral RNA-dependent RNA polymerases (RdRps), but the Cap and MT activities are unique to the Mononegavirales and distinct from capping in host cells.
The RdRp, Cap, CD, MT, and CTD domains of the Mononegavirales L are color-coded in blue, green, yellow, pink, and cyan, respectively, in Figure 3A. In the determined structures of the RSV polymerase, only the RdRp and Cap domains of the RSV L are visible, with the catalytic sites of the RdRp domain highlighted as magenta spheres [21,22] (Figure 3B). The cartoon representation of the RSV L is shown in the dotted box. Consistent with this, the closely related HMPV polymerase structure also lacks the CD, MT, and CTD domains of L [23]. In contrast, both the VSV L and HPIV L structures contain all five domains [24-26] (Figure 3C,D). Note that all three structures presented here are superposed based on the RdRp domain. Interestingly, the positions of the MT and CTD domains are swapped between the VSV L and HPIV L [26,27]. Readers interested in prior reviews on viral and Mononegavirales RdRps may consult [27-29], and the unique capping mechanisms are reviewed in [10,11].

The RSV P Protein
In general, the P protein contains three domains: the N-terminal domain (NTD), the oligomerization domain (OD), and the C-terminal domain (CTD), which are indicated as P NTD (magenta), P OD (red), and P CTD (orange), respectively (Figure 4A). The RSV P is a protein of 241 residues with multiple flexible regions and is the essential cofactor of the RSV polymerase. The RSV P exists as a tetramer, acting as a multimodular adapter that coordinates the activities of the RNA synthesis complex and interacts with RNA-free N (N⁰), the N:RNA complex, and M2-1 [5,6]. Thus far, the structures of the RSV polymerase (L:P) and the RSV M2-1:P complex have been determined [21,22,30]. The cartoon representation of the RSV P is shown in the dotted box (Figure 4B). The ribbon diagrams of the P NTD, which interacts with M2-1, and of the P OD (red) and P CTD, which bind to L, are shown with the terminal residue numbers indicated. The flexible or unmodeled regions are linked by dotted lines, and only one copy of the M2-1-binding domain is shown (Figure 4B). The M2-1-binding motif of P on M2-1 is illustrated in Section 4.4.
The VSV P is the most extensively studied among the Mononegavirales P proteins. Previously, the co-crystal structures of the N⁰-binding motif, the N:RNA-binding motif, and the oligomerization domain of P interacting with their respective binding partners were determined [31-33] (Figure 4C). Recently, the cryo-EM structure of the VSV polymerase revealed several fragments of P NTD that interact with the CTD, RdRp, and CD domains of L [24]. Again, the flexible or unmodeled regions are linked by dotted lines, with the terminal residue numbers indicated (Figure 4C). The architectures of the VSV NC:P and N⁰P complexes highlight the N:RNA-binding and the N⁰-binding motifs, respectively (Figure 5D,E). The HPIV P model was extracted from the structure of the HPIV polymerase [26] (Figure 4D). P OD (red) sits on the RdRp domain of HPIV L (Figure 3D) and points away from it, and the P CTD (orange) of one chain of P also interacts with L. Interestingly, there are several notable differences among the P proteins from the different families presented here: (1) the RSV P and HPIV P are tetramers, but the VSV P is a dimer; (2) the oligomerization domain of HPIV P is more than twice as long as the RSV P OD or VSV P OD; (3) the RSV P and VSV P are similar in size (241 aa vs. 265 aa), but the HPIV P is much longer (392 aa).

The RSV N Protein
The RSV N is a protein of 391 residues, consisting of two core domains, the N-terminal domain (NTD, light green) and the C-terminal domain (CTD, yellow), plus the N-terminal motif (N-arm, green) and the C-terminal motif (C-arm, blue) at the termini of the core domains (Figure 5A). The crystal structure of the RSV N:RNA pseudo-ring revealed that N has two core lobes (NTD and CTD) with the RNA bound in the central groove, and that each N coats 7 nt of RNA [13]. Three consecutive N subunits are shown, with the middle N highlighted in color and the left and right N molecules in gray. The interacting RNA molecules are colored in red (Figure 5B). Both the N-arm and the C-arm of N connect adjacent N subunits in the RNA-bound ring, providing a significant stabilizing interaction. The cartoon representation of the RSV N:RNA is shown in the dotted box (Figure 5B). The structure of the HPIV N:RNA is also shown in a similar orientation. The N-arm and C-arm of the HPIV N are slightly different from those of the RSV N, but their locations and roles are similar [41] (Figure 5C).
The first high-resolution structures of Mononegavirales N:RNA were those of the VSV N:RNA by Green et al. and the rabies virus (RABV) N:RNA by Albertini et al. in 2006 [42,43], providing the first glimpses of how RNA is encapsidated by N (Figure 5D). Both VSV and RABV belong to the family Rhabdoviridae. Their structures are similar, with every N protein coating nine nucleotides, although the VSV N:RNA is a 10-mer ring, while the RABV N:RNA is an 11-mer ring. Subsequently, several other Mononegavirales N:RNA structures were determined, including those of RSV, HMPV, HPIV, measles virus (MeV), and Ebola virus (EBOV) [13,41,44-47]. The Luo group also determined the structure of the P CTD bound to the VSV N:RNA [32]. The structure revealed that the P CTD (orange) binds to the CTD and the C-arm of two adjacent N proteins (Figure 5D). Interestingly, studies of the interaction between the RSV P CTD and the RSV N suggest that in RSV, the P CTD interacts with the N NTD rather than the N CTD, which may be unique to Pneumoviridae N proteins [48,49].
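The ring sizes and per-protomer footprint given above fix how much RNA each ring accommodates. The sketch below simply multiplies the two figures from the text, assuming every protomer in the ring is fully occupied by RNA; the class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NucleocapsidRing:
    virus: str
    protomers: int  # N subunits per ring, as quoted in the text
    nt_per_n: int   # nucleotides coated by each N, as quoted in the text

    @property
    def rna_length_nt(self) -> int:
        # Assumes every protomer binds its full complement of nucleotides.
        return self.protomers * self.nt_per_n


RINGS = [
    NucleocapsidRing("VSV", protomers=10, nt_per_n=9),
    NucleocapsidRing("RABV", protomers=11, nt_per_n=9),
]

for ring in RINGS:
    print(f"{ring.virus}: {ring.protomers}-mer ring, {ring.rna_length_nt} nt of RNA")
```

With these inputs, the VSV ring holds 90 nt and the RABV ring 99 nt.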
Biochemical studies suggest that P binds to RNA-free N (N⁰) monomers and delivers them to nascent RSV genomes or antigenomes [50]. The P NTD inhibits N self-assembly and acts as a chaperone for monomeric N. In 2011, Leyrat et al. determined the co-crystal structure of the VSV N⁰P [31]. While the N is in the same orientation as in the N:RNA structure, the P NTD (magenta) binds at a location similar to that occupied by the RNA in the N:RNA complex (Figure 5E). The structures of the N⁰P from other Mononegavirales, such as HMPV, HPIV, MeV, and Nipah virus (NiV), have also been determined, providing additional insights into such interactions [44,51-53]. Briefly, the available evidence highlights the shared role of the P NTD in interacting with N⁰. Exploiting this feature, our group and others have recently co-expressed the P NTD with N to generate N⁰, which can be used for de novo assembly of virus-specific RNA templates [54-56]. Readers interested in prior reviews on the N and nucleocapsid structures of Mononegavirales may consult [57-59].

The RSV M2-1 Protein
M2-1 acts as an antiterminator to ensure the full-length transcription of all ten RSV mRNAs, and it is impossible to rescue infectious RSV from cDNAs without M2-1 [60]. M2-1 exists as a tetramer in solution, and an early study showed that M2-1 directly interacts with RNA and P in a competitive manner [61]. However, it is not clear whether the interactions with RNA and P are strictly mutually exclusive.
The 194-residue RSV M2-1 consists of three distinct domains: the zinc-binding domain (ZBD: 1-31), an oligomerization domain (OD: 32-68), and a core domain (CD). The domain organization of M2-1 is shown in Figure 6A. In 2014, Tanner et al. determined the crystal structures of apo RSV M2-1 in two different space groups, which revealed a symmetrical tetramer configuration [62] (Figure 6B). The cartoon representation of the RSV M2-1 is shown in the dotted box. Interestingly, Leyrat et al. determined the crystal structure of the related apo HMPV M2-1, which showed an asymmetric tetramer with three of the protomers in a closed conformation and one protomer in an open conformation [63] (Figure 6C). The crystal structures of the HMPV M2-1 in complex with adenosine monophosphate (AMP) and 5-nt DNA fragments were also determined, revealing RNA-binding surface sites consistent with the NMR and mutagenesis studies [63]. Readers interested in a prior review on RSV M2-1 may consult [64].
The structure of the RSV M2-1:P complex has also been determined [30] (Figure 6D). In the same study, it was demonstrated that high-affinity RNAs could outcompete P [30]. More recently, in 2020, our group determined a co-crystal structure of M2-1 bound to a short positive-sense gene-end RNA (SH7) [65]. We used both experiments and simulations to reveal that RNA interacts with two separate domains of M2-1, the ZBD and the CD, independently of each other [65] (Figure 6E). It was also shown that the matrix protein (M) directly interacts with M2-1, and M2-1 is likely to form a layer below M while interacting with the RNP complex [66]. Recent fluorescence microscopy studies by Bouillier et al. have shown that M2-1 is localized to the inclusion body-associated granule (IBAG), the site of active viral RNA synthesis [67].

The Model of RSV RNA Synthesis
On the basis of current data, we propose a model of sequential RSV RNA synthesis (Figure 7). In general, L alone adopts an open conformation (A). P coordinates the activity of L and plays a previously unappreciated role in the arrangement of the L domains. Upon binding, tetrameric P locks the CD, MT, and CTD domains of L (gray) into a closed conformation, forming the pre-initiation complex (B). The pre-initiation complex then recruits the N:RNA to form the initiation complex and start de novo RNA synthesis (C). These domains then adopt an open conformation upon promoter recognition and remain open during elongation (D-G), resulting in higher mobility of the CD, MT, and CTD domains of L. Depending on N⁰, RNA synthesis proceeds to either transcription (capping and methylation, continued elongation, and polyadenylation, D-F) or replication (G). Even within this proposed framework, some ambiguity remains. For example, (1) P may remain bound to L at all times; (2) the domains of L may undergo significant conformational rearrangements in response not only to P but also to other cofactors, such as M2-1 and N; and (3) active RNA synthesis may require the coordination of multiple (e.g., two) L proteins at the same time.

Conclusions and Discussion
Undoubtedly, the visualization of the RSV RNA synthesis RNP has provided rich insights into how RSV integrates viral transcription and replication as an essential part of the viral life cycle. The elucidation of the interaction surfaces and catalytic domains of the RNA synthesis RNP that are distinct from their host cell counterparts will facilitate the rational design of novel therapeutics [11,68-70]. Prior studies mainly focused on the atomic details of one or two components of the RNA synthesis RNP at a time. It is worth noting that one highly dynamic field concerning the functional aspects of viral replication is the formation of membraneless liquid organelles, as demonstrated in the work of Rincheval et al. [71]. Future studies will be geared toward understanding how all four proteins are orchestrated in the assembly and function of the RSV RNA synthesis RNP at the molecular level. In particular, how the RSV RNA synthesis RNP (1) carries out replication in complex with the authentic N:RNA template, (2) executes the transcriptional steps in a step-wise manner, and (3)