1. Introduction
Bacteriophages were first discovered in the early 20th century due to their ability to kill bacteria [
1]. Apart from their therapeutic uses, bacteriophages were found to encode proteins that carried out many of the same basic processes that are found in eukaryotic cells. The T4 bacteriophage, which infects
Escherichia coli, is one of the best-studied viruses in this group. Its double-stranded DNA genome encodes all of the proteins necessary to carry out viral DNA replication in the infected cell. The components of the T4 replisome can be purified and used to reconstitute DNA replication
in vitro. This system has been well characterized as a model for DNA replication at a fork [
2,
3,
4]. The T4 replisome consists of eight proteins, which together catalyze coordinated leading and lagging strand synthesis (
Figure 1). These proteins are similar in structure and function to their eukaryotic homologues [
5]. Studies on the T4 system have contributed greatly to the understanding of DNA replication and paved the way for current studies on human and yeast DNA replication. This review covers the current understanding of T4 DNA replication and highlights areas where recent research has yielded new mechanistic insight into the functioning of the T4 replisome. For more detail on other prokaryotic model systems, see recent reviews highlighting studies of the T7 bacteriophage and
E. coli replisomes [
6,
7,
8].
Figure 1.
A model of the T4 bacteriophage DNA replisome. Replication of T4 genomic DNA is accomplished by a replication complex composed of eight proteins. The helicase (gp41) and primase (gp61) interact to form the primosome with the assistance of the helicase loader (gp59). The primosome complex encircles the lagging strand DNA, unwinding duplex DNA while synthesizing RNA primers for use by the lagging strand polymerase (gp43). DNA synthesis on both strands is catalyzed by a holoenzyme complex formed by a polymerase (gp43) and a trimeric processivity clamp (gp45). The clamp is loaded onto the DNA by the clamp loader complex (gp44/62). The leading and lagging strand holoenzymes interact to form a dimer. Single-stranded DNA formed by the helicase is coated with single-stranded DNA-binding protein (gp32).
2. T4 Replication Fork Components
T4 replication can be initiated via several different pathways [
9]. Two specialized structures, R-loops and D-loops, have been shown to be important. R-loops form at T4 origin sites where an RNA primer is synthesized. D-loops are formed by the recombination machinery and are used to initiate origin-independent DNA synthesis. These two mechanisms of DNA replication initiation have been reviewed elsewhere [
10].
Synthesis of the T4 genomic DNA is accomplished by a holoenzyme complex composed of the gp43 polymerase and the gp45 sliding clamp [
11,
12,
13]. On the leading strand, DNA synthesis is carried out continuously by one holoenzyme complex. On the lagging strand, DNA is synthesized in the opposite direction of the progression of the replication fork. Multiple priming events allow a second holoenzyme complex to carry out DNA synthesis discontinuously in 1 to 2 kb fragments known as Okazaki fragments. While there is no available crystal structure for the T4 gp43, the structure for the RB69 bacteriophage gp43 has been solved alone and as part of a binary and ternary complex [
14,
15,
16]. The two proteins are 62% identical and 74% similar, so their topologies are likely very similar. The RB69 structure reveals five conserved domains in a configuration similar to that of the eukaryotic B family polymerases. The
N-terminus contains a 3′ to 5′ exonuclease active site. This truncated exonuclease domain from T4 gp43 has been isolated and the structure solved [
12]. The catalytic activity of this domain is independent from the rest of the polymerase, as it retains full exonuclease activity
in vitro [
17]. The
C-terminus of RB69 gp43 is organized into conserved finger, palm, and thumb domains, which catalyze DNA polymerization 5′ to 3′ [
15].
The T4 sliding clamp, gp45, is a ring-shaped, trimeric protein that serves as a processivity factor for the polymerase [
18,
19]. The inner diameter of the ring is about 35 Å, which is large enough to accommodate duplex DNA. Unlike clamps in other systems, the T4 clamp exists in solution as a partially open ring with one of the three subunit interfaces disrupted [
20,
21,
22]. Once loaded onto DNA, the interior of the clamp interacts with the DNA phosphate backbone through a number of basic residues and anchors the polymerase to the DNA [
19]. gp43 has a C-terminal PIP box domain that mediates the interaction of the polymerase and the sliding clamp [
23].
The circular gp45 clamp is loaded onto the DNA by a clamp loader complex. In T4, four gp44 subunits associate with one gp62 subunit forming the gp44/62 clamp loader [
24]. Each gp44 subunit binds ATP and the complex has a strong DNA-dependent ATPase activity [
25,
26]. The clamp loader is a member of the AAA+ family of ATPases, but unlike other enzymes of this type, clamp loaders are pentameric rather than hexameric. This asymmetry results in a gap that allows the clamp loader to specifically recognize the primer-template junction when loading a clamp [
27,
28].
The T4 helicase, gp41, forms a hexamer upon binding GTP or ATP [
29]. This active form of the helicase hydrolyzes GTP/ATP to move along single-stranded DNA [
30,
31]. Electron microscopy has revealed that there are two forms of the hexameric gp41, a symmetric ring and a gapped asymmetric ring [
32]. The “open” ring is thought to be important for the loading of the helicase onto DNA [
29]. As part of the replication fork, gp41 unwinds the double-stranded DNA by traveling 5′ to 3′, encircling the lagging strand while excluding the leading strand [
33]. The preferred substrate for the helicase is a forked DNA with both 5′ and 3′ single-stranded DNA regions, suggesting the protein interacts with both the leading and lagging strands [
33,
34]. T4 also encodes two other helicases, UvsW and Dda. Both accessory helicases have been suggested to have roles in replication initiation, recombination, and repair (see review [
35]).
Priming on the lagging strand is catalyzed by the gp61 primase, which interacts with gp41 to form the primosome [
36]. This primosome synthesizes pentaribonucleotides from 5′-GTT-3′ priming sites. The 3′-T is necessary for priming but is not used to template the primer; the resulting primers have the sequence 5′-pppACNNN-3′ [
37]. At high concentrations
in vitro, gp61 alone can synthesize some RNA primers, but they are typically dimers primed from a 5′-GCT-3′ site [
37,
38]. In the presence of gp41, the rate of primer synthesis increases and shifts to pentaribonucleotide products primed from 5′-GTT-3′ sites, which is the priming site used
in vivo [
38,
39]. gp61 alone is monomeric, but in the presence of gp41 and/or DNA, it oligomerizes into a hexameric ring [
32,
40].
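The priming rule described above can be illustrated with a short sketch. The function below scans a lagging strand template (written 5′ to 3′) for 5′-GTT-3′ sites and returns the pentaribonucleotide that would be templated at each one, starting opposite the central T and leaving the 3′-T uncopied. The function name and the example template are hypothetical, and the sketch considers sequence only; it says nothing about how often a given site is actually used or about the dependence on gp41.

```python
# Illustrative sketch only: the function name and example template are
# assumptions, not taken from the cited studies.
RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def predicted_primers(template, site="GTT", primer_length=5):
    """Return (site_index, primer) pairs for each priming site.

    `template` is the DNA template strand written 5' to 3' using only
    A, C, G and T. The 3'-T of each site is required for recognition but
    is not copied, so the primer starts opposite the central T of the
    site and is extended toward the 5' end of the template.
    """
    template = template.upper()
    primers = []
    start = template.find(site)
    while start != -1:
        end = start + 2               # exclusive end: skips the 3'-T
        begin = end - primer_length   # 5' boundary of the templated stretch
        if begin >= 0:                # skip sites too close to the 5' end
            segment = template[begin:end]
            # The primer is the RNA reverse complement of the segment.
            primer = "".join(RNA_COMPLEMENT[base] for base in reversed(segment))
            primers.append((start, primer))
        start = template.find(site, start + 1)
    return primers


if __name__ == "__main__":
    for index, primer in predicted_primers("CCGATGTTAAGTT"):
        print(index, "5'-ppp" + primer + "-3'")
```

Every primer returned begins with AC, matching the 5′-pppACNNN-3′ pattern, because the first two templating bases are always the T and G of the priming site.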
Exposed single-stranded DNA is bound by gp32, which is necessary for DNA replication
in vivo. It has many functions including preventing the formation of DNA secondary structure, protecting DNA from nuclease digestion, and stimulation of the gp43 synthesis rate and processivity [
41,
42,
43]. A crystal structure of gp32 in complex with DNA reveals three domains. The N-terminus binds other gp32 monomers allowing for oligomerization, the
C-terminus mediates interactions with other proteins such as the T4 polymerase, and the core domain binds single-stranded DNA [
44].
In vivo, a helicase loader, gp59, is required for origin-dependent initiation of replication [
45]. In the presence of gp32, the helicase cannot efficiently load onto the DNA fork without the addition of gp59. gp59 interacts with gp41 stoichiometrically and helps to displace gp32, allowing the helicase to load [
46]. gp59 is thought to mediate loading by inducing a conformational change in gp41 that promotes DNA binding [
47]. It is unclear if gp59 dissociates or remains as part of the replication complex [
48,
49]. Binding events between gp43 and gp59 have been observed using single-molecule FRET [
50].
4. Holoenzyme Processivity
The holoenzyme on the leading strand synthesizes DNA in the same direction as the movement of the replication fork.
In vivo, the T4 genome can be synthesized within 15 min [
55]. The half-life of the holoenzyme complex has been measured as 11 min as part of a moving fork and about 6 min on a small, defined DNA fork structure [
56,
57]. Given the half-life of the holoenzyme and the speed of synthesis, it is possible that the entire T4 genome could be synthesized by a single holoenzyme on the leading strand. While this highly processive holoenzyme would be advantageous on the leading strand, the lagging strand is synthesized discontinuously and the holoenzyme must repeatedly dissociate and rebind for synthesis of each Okazaki fragment.
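The plausibility of this estimate can be checked with a short calculation. The sketch below assumes that holoenzyme dissociation is a simple first-order (single-exponential) process, which is an assumption made here for illustration rather than a result of the cited studies, and uses the half-lives and the 15 min synthesis time quoted above. The function name is hypothetical.

```python
# Illustrative sketch only: first-order dissociation is an assumption,
# and the function name is hypothetical.

def fraction_remaining(elapsed_min, half_life_min):
    """Fraction of complexes still bound after `elapsed_min` minutes,
    assuming single-exponential decay with the given half-life."""
    return 0.5 ** (elapsed_min / half_life_min)


if __name__ == "__main__":
    genome_time_min = 15  # in vivo genome synthesis time quoted in the text
    for label, half_life in (("moving fork", 11), ("small defined fork", 6)):
        remaining = fraction_remaining(genome_time_min, half_life)
        print(f"{label}: {remaining:.2f} of holoenzymes still bound")
```

Under this assumption, roughly 39% of holoenzymes at a moving fork would still be bound after 15 min, compared with about 18% for the 6 min half-life measured on the small fork structure. A substantial fraction of holoenzymes would therefore outlast the full synthesis time.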
A more recent study probing the processivity of the T4 holoenzyme confirmed the long half-life during replication using a standard dilution experiment [
58]. However, it was found that an inactive mutant of the polymerase (D408N) was able to rapidly displace the wild-type polymerase and inhibit DNA synthesis. This inhibition occurred on both the leading and lagging strands. These results suggest that although the polymerase will not readily dissociate on its own, it can be actively displaced by a second polymerase without affecting DNA synthesis. The exchange process was termed dynamic processivity and is thought to be mediated through interactions with gp45 [
58]. The
C-terminus of gp43 is essential for polymerase binding to the clamp, but its deletion does not affect DNA polymerization [
23]. When polymerase containing this deletion was used as a trap, it could no longer displace the replicating polymerase [
58]. As the clamp is trimeric, it is hypothesized that multiple polymerases could bind and facilitate the exchange. This “toolbelt” model for the clamp has been suggested in other systems as well, with numerous proteins involved in DNA replication and repair also containing clamp binding domains [
59,
60]. In the T7 system, where there is no sliding clamp, the exchange process has been shown to be mediated by an interaction between the polymerase and the helicase [
61]. It is thought that the helicase can bind multiple polymerases facilitating exchange on the leading strand and recycling on the lagging strand.
5. Coupling of Helicase and Polymerase for Leading Strand Synthesis
While both gp41 helicase and gp43/gp45 holoenzyme can function independently
in vitro to unwind duplex DNA, the two enzymes work best when their activities are combined. The helicase alone is significantly slower and less processive than the replication fork, and the holoenzyme is very inefficient at strand displacement synthesis [
33,
62]. Together, the helicase and holoenzyme are able to efficiently carry out leading strand synthesis [
63]. In the presence of a macromolecular crowding reagent, only gp43 and gp41 are needed, indicating the clamp does not play a role [
64]. While the functional coupling between the two proteins has been clearly demonstrated, there is no evidence of a physical interaction between gp43 and gp41 [
65,
66]. One study also found that the T4 polymerase could be replaced with another processive polymerase and still carry out strand displacement synthesis, but could not be replaced with a low processivity polymerase [
65]. This suggests that each enzyme is stabilized on the DNA replication fork by the activity of the other, with the helicase providing single-stranded DNA that the polymerase then traps.
In the T7 system, it was reported that nucleotide incorporation by the polymerase provided the driving force to stimulate helicase activity, but a detailed mechanism for helicase-polymerase coupling was not described [
67]. A more recent single-molecule study of the coupling in the T4 system used magnetic tweezers to monitor both coupled and uncoupled activity [
68]. A DNA hairpin was tethered to a glass slide with a magnetic bead on the other end. Force was applied to destabilize the duplex and assist enzymes in opening the hairpin. At low force, where the duplex of the hairpin is stable, the helicase moved six times slower than its maximal translocation rate and showed sequence-dependent pausing. As higher force was applied, the rate of helicase activity increased dramatically. Additionally, at low helicase concentrations, significant helicase slippage was observed, involving the reannealing of tens to hundreds of base pairs. This fits with the passive model of helicase activity previously demonstrated, in which the helicase is not efficient in destabilizing duplex DNA and relies on transient fraying of base pairs to move forward [
69].
The T4 holoenzyme was found to have very low strand displacement activity at low force and mainly exhibited exonuclease activity [
68]. When higher forces destabilized the duplex, the holoenzyme was able to replicate the hairpin at maximal speeds. At moderate forces, the holoenzyme exhibited pausing and stalling. The proportion of holoenzymes observed synthesizing DNA, pausing, or degrading DNA was highly dependent on the force used. This indicates that at higher forces the holoenzyme is able to stay in the polymerization mode, while lower forces shift the holoenzyme to the exonuclease mode. When pausing and exonuclease events were excluded from analysis, the holoenzyme activity fit a model of a strongly active motor. The basis for collaborative coupling then emerges in a model where the helicase provides the single-stranded DNA for the holoenzyme, but also prevents the fork regression pressure from switching the polymerase into the exonuclease mode. As the holoenzyme is kept in its highly processive polymerization mode, it stimulates the activity of the helicase and prevents backward slippage [
68].
6. Coordination of Helicase and Priming on the Lagging Strand
The leading and lagging strands are thought to be synthesized at the same net rate, despite the need for repeated priming and extension events on the lagging strand [
4,
70,
71]. Priming is catalyzed by a gp61-gp41 complex known as the primosome. Both priming and DNA unwinding activity are stimulated when both proteins are present [
34,
38,
39]. There is strong biochemical evidence for the interaction of the hexameric gp41 helicase and oligomeric gp61 primase [
34,
36,
72,
73]. Importantly, a gp61-gp41 fusion protein has been shown to have close to wild-type priming and helicase activity and can successfully catalyze coordinated leading and lagging strand synthesis [
74].
This tight coordination of activity is clear, despite the fact that the helicase travels 5′ to 3′ unwinding duplex DNA while the primase synthesizes primers 3′ to 5′ on the same strand. There are three models for how this coupling can occur. The first model suggests that the helicase, and possibly the whole replisome, pauses while the primers are being synthesized. In the second model, primase subunits dissociate from the helicase and are left behind to synthesize primers. In the third model, coupling is accomplished by the formation of priming loops wherein the lagging strand folds back allowing for priming. The loop is then released after the primer is synthesized.
By observing helicase and priming activity on DNA hairpins using magnetic tweezers, the contribution of each of the three models to T4 primosome activity could be tested directly [
74]. In the T7 system, both pausing of the primosome [
75] and priming loops have been reported [
76]. The T4 study yielded no evidence of pausing of the T4 primosome. However, clear evidence of both primase disassembly and looping was seen in these experiments, indicating that T4 uses two different mechanisms to couple the helicase and primase (
Figure 2). While primase disassembly was the predominant mode, in the case where the primase and helicase were fused only the looping mechanism was seen.
Figure 2.
The two models of primosome activity used by T4 to initiate lagging strand synthesis. The helicase (gp41) and primase (gp61) interact as stacked rings encircling the lagging strand. This complex unwinds duplex DNA while synthesizing pentaribonucleotide RNA primers for use by the lagging strand polymerase (gp43). Primer synthesis occurs while the helicase continues to unwind DNA in the opposite direction. Two models have been proposed to accommodate these coupled activities. In the primosome disassembly model (shown left), one of the primase subunits dissociates from the primosome complex and remains with the newly synthesized primer. In the DNA looping model (shown right), the excess DNA unwound by the helicase during primer synthesis loops out allowing the primase to stay intact. In both models, the clamp loader (gp44/62) loads a clamp (gp45) onto the newly synthesized primer. The lagging strand polymerase is then signaled to release and recycle to the new primer.
7. Recycling of the Lagging Strand Polymerase
The trombone model was proposed to explain the coordination of leading and lagging strand synthesis with the two polymerases synthesizing in opposite directions. In this model, the lagging strand DNA loops out during the formation of each Okazaki fragment [
4]. These loops have been visualized in electron micrographs of T4 replication products [
48]. The lagging strand polymerase is retained as part of the replisome after completing synthesis of each Okazaki fragment [
4]. It dissociates from the DNA, but then rapidly binds the next primer to continue synthesis. This recycling of the lagging strand polymerase is supported by numerous studies. While the clamp, clamp loader, primase, and gp32 have all been shown to exchange with proteins in solution during replication, the polymerase is resistant to dilution [
77,
78,
79,
80]. The size of the Okazaki fragments is also independent of polymerase concentration [
4,
58]. Importantly, the leading and lagging strand polymerases interact in the presence of DNA, which provides a mechanism for tethering the lagging strand polymerase to the replisome [
66].
While the holoenzyme on the leading strand is highly processive, on the lagging strand it must repeatedly dissociate. The trigger for the dissociation of the lagging strand polymerase has not been clearly defined despite a number of studies. Several models have been proposed; two have gained the most support, and evidence suggests that both play a role during replication [
81]. The collision model proposes that the lagging strand polymerase dissociates after colliding with the end of the previous Okazaki fragment, and this stimulates the primase to synthesize a new primer [
62,
82]. However, it has also been shown that dissociation of the lagging strand polymerase can occur before it reaches the previous Okazaki fragment, leaving single-stranded DNA gaps [
81]. To account for this observation, the signaling model has been proposed where recycling is triggered by the synthesis of a new primer and the timing controlled by gp61 [
80,
81,
83]. Recently, additional signals have been proposed to regulate this recycling in other replication systems such as
E. coli and T7. These new triggers include tension-induced dissociation of the polymerase [
84], primer availability [
85], and a third polymerase [
86]. While it has been shown that a third T4 polymerase does not seem to play a role in Okazaki fragment synthesis [
87], the nature of the signal for recycling is still unknown.