Structural Biology of Bacterial RNA Polymerase

Since its discovery and characterization in the early 1960s (Hurwitz, J. The discovery of RNA polymerase. J. Biol. Chem. 2005, 280, 42477–42485), an enormous amount of biochemical, biophysical and genetic data has been collected on bacterial RNA polymerase (RNAP). In the late 1990s, structural information on bacterial RNAP began to emerge, providing unprecedented insights into the function and mechanism of RNA transcription. In this review, I list all structures related to bacterial RNAP (as determined by X-ray crystallography and NMR methods and available from the Protein Data Bank), describe their contributions to bacterial transcription research and discuss the role that small molecules play in inhibiting bacterial RNA transcription.


Early Research on the Structure of Bacterial RNA Polymerase
The common core of multi-subunit RNAP in cellular organisms is composed of five subunits that are conserved in all three domains of life. The bacterial RNAP core enzyme is the simplest and best-characterized form, consisting of α (two copies), β, β', and ω subunits (Figures 1 and 2a). The core enzyme binds template DNA and synthesizes RNA; it is complemented by a σ factor to form a holoenzyme that recognizes the promoter sequence and begins promoter-specific transcription [1,2]. Structural study of bacterial RNAP by electron microscopy [3] began in the mid-1960s. Crystallization of RNAP isolated from Thermus thermophilus was first reported in the late 1970s [4]; however, the X-ray crystal structure was not determined until the end of the millennium. Before the complete structure of RNAP was determined, stable domains and subcomplexes within RNAP were targeted for structural studies (Table 1). These structures were important guides for building the entire structure of RNAP.
The first atomic view of RNAP was obtained from the C-terminal domain of the Escherichia coli RNAP α subunit (residues 250-329), also known as αCTD (PDB: 1COO) [5], which plays important roles in regulating transcription via interaction with many transcription factors (Figure 2a) and also binds to the upstream promoter DNA [6]. The structure of αCTD was determined by NMR, which revealed its compact structure and a protein topology distinct from that of other DNA-binding proteins. The characterization of the structure of αCTD was a springboard for a series of mutagenesis experiments that revealed communication of bacterial RNAP with numerous transcription factors during gene regulation.
Figure 2.
(a) Three-dimensional representation of the interaction between RNAP and transcription factors. The E. coli RNAP holoenzyme is shown as a molecular surface representation (α subunits: white; β subunit: cyan; β' subunit: pink; ω subunit: gray; σ70: orange; σ region 1.1: red). Transcription factor binding sites are indicated in double quotation marks and PDB codes of structures are shown in brackets; (b) Three-dimensional representation of the interaction between σ and anti-σ factors. The E. coli RNAP holoenzyme is shown as a molecular surface representation, and only the core enzyme is partially transparent (α subunits: white; β subunit: cyan; β' subunit: pink; ω subunit: gray; σ70: orange). Targets of anti-σ factors are indicated in double quotation marks and PDB codes of structures are shown in brackets.
A subsequent study revealed the X-ray crystal structure of the α subunit N-terminal domain (αNTD) (PDB: 1BDF) [7]. The structure showed the α subunit homodimer, which is an essential platform for binding of the largest subunits, β and β' (Figure 1). The β and β' subunits form the catalytic center of RNA synthesis and also provide binding sites for the double-stranded downstream DNA, the DNA/RNA hybrid formed during transcription, and the RNA.
These subunits are highly conserved in bacteria; however, large sequence insertions found in these subunits characterize specific evolutionary lineages of bacteria. These insertions can be isolated as stable domains and crystallized for determining X-ray structures (Table 1). These structures have helped to provide atomic images of bacterial RNAP, because the lineage-specific insertions are located on the peripheral surface of RNAP and the electron density maps of these domains are of relatively poor quality in the bacterial RNAP crystals.
σ factor transiently associates with the core enzyme for promoter recognition and dissociates from it once RNAP starts processive RNA synthesis (Figure 1). Proteolysis of σ factor defined its domain organization, and structures of some stable domains have been determined by X-ray crystallography and NMR (Table 1). In 1996, the first image of σ factor was obtained from the E. coli group I σ70 (also known as σD) N-terminal domain containing regions 1.2-2.4 (PDB: 1SIG) [15], which provided insight into the recognition of the −10 element and melting of the promoter DNA by σ regions 2.4 and 2.3, respectively. A nearly complete view of σ factor was obtained from two proteolytic fragments of Thermus aquaticus σA. One fragment contained σ domain 2 (σ2: region 1.

An Explosion of Structural Information on Bacterial RNA Polymerase
The entire structure of bacterial RNAP was first described as a core enzyme form and was isolated from the thermophilic bacterium T. aquaticus (PDB: 1HQM) [29]. This was an important milestone in the study of bacterial transcription that provided a structural framework for four decades of bacterial transcription research. The structure revealed a unique crab claw-shaped molecule, which was distinct from the T7 phage-like single-subunit RNAP family composed of right-hand-shaped molecules. The configuration of the bacterial RNAP active site was also different from that of the single-subunit RNAP [46], even though these enzymes use the same two-metal ion mechanism [47] for RNA synthesis. Comparison of cellular RNAPs from three domains of life, including eukaryotic RNAPs I [48,49] and II [50] as well as archaeal RNAP [51], revealed a conserved overall shape with multi-subunit arrangement and an active site cleft with conserved motifs including a bridge helix (separating the main and secondary channels), trigger loop (for RNA synthesis and cleavage) and switches (for accommodating DNA and RNA into the RNAP clefts).

Structural Basis of Transcription Elongation
Crystals of the transcription elongation complex were prepared using T. thermophilus RNAP and a synthetic DNA/RNA scaffold, and structures were determined with and without a nucleotide triphosphate substrate (PDB: 2O5I, 2O5J) [42,43]. Structures revealed atomic details of RNAP and DNA/RNA interactions within the DNA binding main channel and the RNA exit channel and showed how the RNAP mobile element trigger loop changes conformation throughout nucleotide substrate selection and phosphodiester bond formation.
During RNA extension, RNAP temporarily stops transcription (transcription pausing) by several mechanisms including σ factor-dependent promoter proximal pausing, elemental pausing, RNA hairpin-dependent pausing and backtrack pausing (Figure 1) [52]. The structure of an elemental-paused elongation complex using Thermus RNAP and DNA/RNA scaffolds derived from the E. coli his-pause sequence showed a unique RNAP conformation including an open-clamp and kinked bridge helix that may inhibit processive RNA extension (PDB: 4GZY, 4GZZ) [44].
Misincorporated nucleotides in RNA transcripts slow down the rate of transcription and result in the backward movement of RNAP (backtracking), which ejects the 3' RNA end into the secondary channel; the misincorporated nucleotides are then cleaved by the RNAP endonuclease activity (Figure 1). This activity is further stimulated by the elongation factor GreA. The structure of a single-nucleotide-backtracked elongation complex showed a sharply bent RNA backbone at the unpaired RNA base, which was accommodated in the proofreading cavity of the RNAP active site (PDB: 4WQS) [45]. This study also revealed the structure of RNAP in complex with a GreA-like protein, showing that the GreA coiled-coil domain containing acidic residues binds to the RNAP active site through the secondary channel and coordinates Mg2+, which is essential for RNA cleavage (PDB: 4WQT).
The transcription factor RapA is an ATP-dependent DNA translocase, which reactivates the stalled elongation complex (Figure 1). The structure of the elongation complex with RapA showed that RapA binds around the RNA exit site, making the RNA channel just wide enough for single-stranded RNA to pass (PDB: 4S20) [53].

Promoter-Dependent Transcription: How RNA Polymerase Recognizes Promoter DNA Sequences and Initiates Transcription
Bacterial RNAP recognizes promoter DNA and initiates transcription as the holoenzyme, which contains the catalytic core enzyme plus the promoter-recognition σ factor (Figure 1). Structures of the holoenzyme containing the group I σ factor σA (also known as SigA), isolated from T. aquaticus and T. thermophilus, showed that three σ domains (σ domains 2, 3, and 4) are arranged on the surface of the core enzyme so as to recognize the −35 and −10 elements (separated by ~17 bp of DNA) (PDB: 1L9U, 1IW7) [32,33] (Table 1). The structure of the holoenzyme in complex with fork junction promoter DNA (−41 to −7 relative to the transcription start site at +1) showed that the promoter DNA lies across one face of the holoenzyme and is positioned completely outside the RNAP cleft (PDB: 1L9Z) [38]. Universally conserved aromatic residues in σ region 2.3 are ideally positioned to stack on the exposed face of the base pair at the upstream edge of the transcription bubble.
Atomic details of −35 element recognition by σ domain 4 (σ4), as revealed by a co-crystal structure, showed that the helix-turn-helix motif of σ4 reads the hexameric DNA sequence (PDB: 1KU7) [16]. Details of the interaction between σ2 and the −10 element were obtained from the structure of σ2 bound to the single-stranded −10 element (PDB: 3UGO, 3UGP) [18] and the T. thermophilus RNAP holoenzyme in complex with the promoter DNA fragment (−12 to +12 on DNA) (PDB: 4G7H, 4G7O) [39]. The holoenzyme/DNA complex structure also showed interactions between σ region 3.2 and template DNA, as well as contacts involving the core enzyme and non-template DNA in the transcription bubble, which position the transcription start site of the template DNA at the RNAP active center for initiation of transcription. Cellular RNAPs start transcription by de novo RNA priming (Figure 1). Structures of the T. thermophilus de novo transcription initiation complex revealed unique contacts made by the initiating NTPs bound at the transcription start site with template DNA and with RNAP (PDB: 4Q4Z, 4OIO) [40,41]. The RNAP-promoter DNA complex was active in the crystalline state and capable of transcribing RNA to 6-mer lengths, allowing determination of the structure of the initial transcription complex (PDB: 4Q5S) [40]. The structure showed that RNAP-RNA contacts stabilized the short RNA transcript in the active site cleft and that the RNA 5' end displaced the σ finger from its position near the active site, which was suggested as a first step in σ release from the RNAP.

Structures of Alternative σ Factors
Bacterial RNAP uses a group I σ factor (σ70 in E. coli and σA in other bacteria) for transcribing housekeeping genes required for log-phase growth. Other classes of σ factors, including the group II σ factors (σS, σH, and σF in E. coli), the extracytoplasmic function (ECF) σ factors (σE and FecI in E. coli), and σN (also known as σ54), direct RNAP to genes for induction of stress responses, flagella synthesis, and spore formation in spore-forming bacteria, such as Bacillus. Structures of these alternative σ factors were determined as stable domains (Table 1) and provided insights into the mechanism of promoter DNA recognition. Structures of some alternative σ factors have been determined as σ/anti-σ factor complexes, and these structures will be described below (Table 2 and Figure 2b). To date, no high-resolution structure of a bacterial holoenzyme containing an alternative σ factor has been reported.

A New Era of Structural Study of Bacterial Transcription Using E. coli RNA Polymerase
Crystal structures of bacterial RNAPs were initially determined using enzymes from the genus Thermus. Due to a high degree of RNAP sequence conservation among all bacterial species, mechanistic insight derived from the Thermus RNAP structure has been generalized to represent the transcription apparatus in all bacteria. Nevertheless, understanding the structure of E. coli RNAP is essential to fully interpret the enormous amount of biochemical, biophysical, and genetic data that has been collected on E. coli RNAP. In 2013, the first crystal structure of the E. coli RNAP σ70 holoenzyme was determined (PDB: 4YG2) [35]. E. coli RNAP can be readily prepared using an overexpression system, which allows RNAP structural study to move in a new direction using RNAP mutants.
Characterization of E. coli RNAP has created new possibilities for the structural study of bacterial RNAP. Sixteen structures of E. coli RNAP with transcription factors or small molecules have already been determined, including the long-awaited structure of E. coli RNAP in complex with (p)ppGpp, a master regulator of stress response in bacteria (PDB: 4JK1, 4JK2, 4JKR) [74,75] (Table 3 and Figure 3). A ppGpp binding site on T. thermophilus RNAP had also been determined from the crystal structure of the T. thermophilus RNAP-ppGpp complex (PDB: 1SMY) [76], but its relevance to E. coli transcription regulation by ppGpp could not be verified [77].

Transcription Regulation: How RNA Polymerase Communicates with Transcription Factors
Binding of alternative σ factors to the RNAP core enzyme is regulated by anti-σ factors, each of which forms a stable complex with its target σ factor and blocks core RNAP binding. Structures of σ/anti-σ factor complexes have shown various ways in which the core enzyme-binding interface of σ factor can be masked (Table 2 and Figure 2b). The σ/anti-σ complexes have also helped to provide high-resolution images of σ factor, since σ factor is notorious for having a heterogeneous conformation and is difficult to crystallize in isolation.
Bacterial RNAP physically interacts with protein factors during gene regulation. Most bacterial transcription factors bind upstream of promoter DNA and recruit RNAP to target promoters and/or facilitate formation of the open complex. Three crystal structures of ternary complexes, each containing a transcription factor, an RNAP domain, and DNA, have provided insights into how these factors influence transcription activation through different mechanisms (Table 2 and Figure 2a).
Catabolite activator protein (CAP) (also known as cAMP receptor protein, CRP) is the best-characterized bacterial transcription factor, providing insights into the interaction between the helix-turn-helix motif and DNA, the allosteric control of DNA binding, and transcription activation. In classical lactose operon regulation [78], CAP binds DNA ~60 bp upstream from the transcription start site (class I transcription factor binding site) and interacts with αCTD [6,79]. The crystal structure of the CAP-αCTD-DNA ternary complex provided a first view of the transcription factor-RNAP interaction as well as the αCTD-DNA interaction (PDB: 1LB2) [61]. The structure showed that these proteins interact through a small interface, which is nevertheless sufficient for RNAP to target the promoter. A low-resolution view of the transcription initiation complex including E. coli RNAP, CAP, and promoter DNA has been determined by cryo-electron microscopy (EMDB: EMD-5127; PDB: 3IYD) [68].
Bacteriophage λ cI protein (λcI) plays a central role in the lytic-to-lysogenic growth switch and is able to both activate and repress transcription. For activation of transcription, λcI binds just upstream of the −35 element (class II transcription factor binding site) and interacts with σ domain 4 (σ4). The crystal structure of the λcI-σ4-DNA ternary complex showed that these proteins also use a small interface for recruiting σ4 to the −35 element (PDB: 1RIO) [66].
PhoB is a classical two-component response regulator, which interacts with σ4 to activate gene expression. In contrast with λcI-dependent transcription activation, PhoB-dependent promoters lack a canonical −35 element. The structure of the PhoB-σ4-DNA ternary complex showed that contact between σ4 and the DNA major groove is less extensive; however, interaction between σ4 and PhoB compensates for this, allowing recruitment of RNAP to a target promoter lacking the −35 element (PDB: 3T72) [67].
Spx is a global transcription regulator in Bacillus subtilis that interacts with a large interface on αCTD, thereby forming a stable Spx-αCTD complex without a DNA platform. Spx regulates gene expression under conditions of disulfide stress, which is sensed by disulfide bond formation between two cysteine residues in Spx. Crystal structures of both oxidized and reduced forms of the Spx-αCTD complex provided insight into how the disulfide bond affects DNA binding (PDB: 1Z3E, 3GFK, 3IHQ) [62–64].
Bacteriophages produce transcription factors that bind directly to host RNAPs and hijack the host transcription machinery to transcribe phage genomes. Several structural models of bacterial RNAP in complex with phage proteins have been used to explain how phage proteins remodel RNAP to redirect the transcription machinery to the phage genome (Table 2 and Figure 2a) [17,36,70–73].

How Small Molecules Inhibit RNA Transcription
Bacterial RNAP is essential for cell growth and viability, and the lack of sequence and structural similarity between certain areas of bacterial and eukaryotic RNAP makes this enzyme an attractive target for antibiotic development. Crystal structures of bacterial RNAP in complex with small-molecule transcription inhibitors have been used to characterize binding sites, mechanisms of action, and mechanisms of resistance (Table 3 and Figure 3). Currently, two bacterial RNAP inhibitors, rifampin (also known as rifampicin) and fidaxomicin (also known as DIFICID® and lipiarmycin), are used in clinical practice. Rifampin, a semisynthetic rifamycin, is the cornerstone of current tuberculosis treatment. Structures of the RNAP-rifampin complex (T. aquaticus core enzyme, PDB: 1YNN; E. coli holoenzyme, PDB: 4KMU) provided a detailed view of the interaction between rifampin and the β subunit and explained how rifampin blocks RNA extension [80,81]. The rifampin binding site is also recognized by sorangicin, which prevents extension of short RNA (PDB: 1YNJ) [82]. Chemical modifications of rifampin have provided an additional RNAP-drug interface that may enhance drug affinity and efficacy, as well as reduce the frequency of spontaneous resistance mutations. Structures of RNAP holoenzymes in complex with rifampin derivatives (T. thermophilus holoenzyme, PDB: 2A68, 2A69; E. coli holoenzyme, PDB: 4KN4, 4KN7) showed an additional interaction with σ region 3.2 (σ finger) that may influence binding of template DNA at the active site, thereby reducing the efficiency of transcription initiation [81,83]. The structure of RNAP in complex with GE23077 (PDB: 4MQ9, 4OIN, 4OIP), a drug that inhibits nucleotide binding, provided a framework for creating a bipartite compound by tethering rifampin to GE23077; this compound has superior affinities for both rifampin-susceptible and rifampin-resistant RNAPs (PDB: 4OIR) [41].
Streptolydigin binds to key elements of the RNAP active site (bridge helix and trigger loop) and inhibits all reactions involved in RNAP transcription by blocking conformational changes in these elements that occur during transcription. Structures of the RNAP-streptolydigin complex revealed that it traps the bridge helix and trigger loop in straight and unfolded conformational states, respectively, by direct interaction with these elements (PDB: 2PPB, 1ZYR, 2A6H) [34,42,84].
Myxopyronin binds to the RNAP switch region, which is a hinge of the RNAP clamp whose function is to open and close the DNA-binding channel. Upon binding, myxopyronin inhibits transcription initiation by preventing formation of the promoter open complex. Structural and biochemical studies on the T. thermophilus RNAP-myxopyronin complex (PDB: 3DXJ, 3EQL) have led to two proposed models of inhibition: myxopyronin inhibits transcription (1) by preventing the opening of the RNAP clamp that permits entry and unwinding of DNA (hinge jamming mechanism) [85] or (2) by interfering with interactions between RNAP and the unwound DNA template strand [86].

Figure 3.
Three-dimensional representation of targets of small-molecule RNAP inhibitors. E. coli RNAP holoenzyme is depicted as an α-carbon backbone trace with partially transparent molecular surfaces (α subunits: white; β subunit: cyan; β' subunit: pink; ω subunit: gray; σ70: orange). Small-molecule inhibitors bound to RNAP are depicted as CPK models. Chemical structures of small-molecule inhibitors and mechanisms of transcription inhibition are indicated.
A new structure of a bacterial RNAP-inhibitor complex was obtained by using salinamide, which binds the N-terminal end of the bridge helix and prevents nucleotide addition (PDB: 4MEX) [37]. Squaramide is a non-natural bacterial RNAP inhibitor isolated by high-throughput screening, and its binding site was predicted at the switch region, based on analysis of squaramide-resistant RNAP mutants followed by computational modeling using the structure of the T. thermophilus RNAP-myxopyronin complex [88]. Crystal structures of E. coli RNAP in complex with squaramides were determined, structurally confirming that squaramides bind to RNAP switches (PDB: 4YFK, 4YFN) [87].
During the past decade, our understanding of bacterial transcription processes has drastically improved as a result of structural studies on bacterial RNAP. I hope this review will be a useful reference for researchers who study the mechanism, structure and function of bacterial RNAP transcription and that the structures presented in this review will provide guidelines for designing new experiments.