The DNA Recognition Motif of GapR Has an Intrinsic DNA Binding Preference towards AT-rich DNA

The nucleoid-associated protein GapR found in Caulobacter crescentus is crucial for DNA replication, transcription, and cell division. Associated with overtwisted DNA in front of replication forks and at the 3′ end of highly expressed genes, GapR can stimulate gyrase and topo IV to relax (+) supercoils, thus facilitating the movement of the replication and transcription machinery. GapR forms a dimer-of-dimers structure in solution that can exist in either an open or a closed conformation. It initially binds DNA in the open conformation and then undergoes structural rearrangement to form a closed tetramer, with DNA wrapped in the central channel. Here, we show that the DNA binding domain of GapR (residues 1–72, GapRΔC17) exists as a dimer in solution and adopts the same fold as the two dimer units in the full-length tetrameric protein. It binds DNA at the minor groove and reads the spatial distribution of DNA phosphate groups through a lysine/arginine network, with a preference towards AT-rich overtwisted DNA. These findings indicate that the dimer unit of GapR has an intrinsic DNA binding preference. Thus, at the initial binding step, the open tetramer of GapR, with its two relatively independent dimer units, can be recruited more efficiently to overtwisted regions.


Introduction
GapR is a nucleoid-associated protein (NAP) that is highly conserved throughout the α-proteobacteria. In Caulobacter crescentus, GapR is essential for normal growth and cell division [1]. Depleting GapR leads to defects in DNA replication, chromosome segregation, and cell division, while constitutive expression of GapR is lethal [2,3].
GapR shows a preference towards AT-rich DNA and overtwisted DNA, with a dynamic chromosome-binding pattern that depends on the cell cycle and on transcription activity [1,3,4]. It is more enriched around the origin of replication at the swarmer cell stage and moves with the replication fork after chromosome replication initiates [2]. It is also associated with highly expressed genes and operons [3]. Both the DNA synthesis inhibitor novobiocin and the RNA polymerase inhibitor rifampicin can lead to the redistribution of GapR within minutes [2]. The binding of GapR to overtwisted DNA in front of the replication forks or downstream of highly transcribed genes stimulates gyrase and topo IV to remove the (+) supercoils that arise during the unwinding of the DNA duplex, facilitating replication and transcription [3,5].
Monica et al. first determined the crystal structure of GapR in complex with DNA [3]. They found that GapR can form a dimer-of-dimers that exists as a closed ring and fully encircles DNA in the middle. This structure suggested that GapR might only be able to recognize overtwisted DNA, as the larger diameter of B-DNA cannot be accommodated. Moreover, as it is implausible for the circular genomic DNA to enter a closed ring, they proposed that free GapR exists as a dimer that tracks along DNA and searches for sites with narrow minor and expanded major grooves, where it reorganizes with another dimer to form a closed tetramer with DNA in the center [3]. However, several later studies found that GapR remains a tetramer in solution even when DNA is absent [6–8]. Michael et al. proposed another model for GapR binding to DNA, in which the GapR tetramer adopts an open conformation when it initially binds to DNA and then undergoes structural rearrangement to fully encircle the DNA [6]. This second binding model is supported by our findings that free GapR adopts multiple conformations in solution, which are in dynamic exchange equilibrium, and that it adopts an open tetramer conformation in another crystal structure in complex with DNA [7]. Michael et al. also found that, besides overtwisted DNA, GapR can bind to B-form DNA, as the size of the central channel is slightly adjustable. They proposed that the GapR tetramer may scan along DNA and search for overtwisted, high-affinity binding sites [6].
However, why GapR prefers to bind overtwisted DNA has not been fully elucidated, although the size of the central channel may provide a partial explanation. Moreover, when GapR is released from DNA by the moving RNA or DNA polymerase, it would be inefficient for it to bind randomly to another region of the chromosome and scan for new functional sites as a closed tetramer, because its movement may be hindered by other nucleoid-associated proteins and by the complex structure of the chromosome.
Here, we show that the truncation mutant GapRΔC17, which represents the DNA recognition motif of full-length GapR and exists as a dimer, exhibits a DNA binding preference similar to that of full-length GapR. We determined the crystal structure of GapRΔC17 and elucidated its DNA recognition mechanism through NMR titration-based Haddock modeling. These findings indicate that the dimer unit of GapR has an intrinsic binding preference towards AT-rich overtwisted DNA that does not require the formation of a closed tetramer. This suggests that, at the initial binding step, free GapR, as an open tetramer with two relatively independent dimer units, can efficiently target overtwisted regions.

Dimeric GapRΔC17 Has the Same Fold as the Dimer Units of Full-Length Tetramer GapR
In a previous study, we found that GapRΔC17 (residues 1–72, 8.2 kDa), in which most of the α3 helix residues are deleted, exists as a dimer in solution: the molecular weights determined by analytical ultracentrifugation (AUC) and by size-exclusion chromatography coupled to multi-angle light scattering (SEC-MALS) are 19.4 and 19.8 kDa, respectively. Moreover, a dimer band, but no tetramer band, was detected in the chemical cross-linking experiment [7].
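The oligomeric-state argument above is simple arithmetic, and a short sketch makes it explicit. The masses are the ones quoted in the text; the helper name `nearest_oligomer` is hypothetical, and whether the quoted 8.2 kDa includes the affinity tag is not stated here, so the ratios should be read as approximate.

```python
# Minimal sketch: compare measured masses with multiples of the monomer mass.
# Values are taken from the text; the helper name is illustrative only.
MONOMER_KDA = 8.2
measured_kda = {"AUC": 19.4, "SEC-MALS": 19.8}


def nearest_oligomer(measured, monomer):
    """Return (mass ratio, nearest integer number of subunits)."""
    ratio = measured / monomer
    return ratio, round(ratio)


for method, mass in measured_kda.items():
    ratio, n = nearest_oligomer(mass, MONOMER_KDA)
    print(f"{method}: {mass} kDa / {MONOMER_KDA} kDa = {ratio:.2f} -> n = {n}")
```

Both measurements lie closer to the dimer mass (16.4 kDa) than to the trimer (24.6 kDa) or tetramer (32.8 kDa) mass, in line with the cross-linking result.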
Here, we have determined the crystal structure of GapRΔC17 at a resolution of 2.08 Å (PDB code 6K5X), with the crystallographic data statistics summarized in Table 1. Consistent with previous studies, this structure reveals a homodimer of GapRΔC17. Each monomer adopts an L-shape with two α-helices (α1, residues 14–51; α2, residues 55–64) (Figure 1A). Besides the salt bridges formed by residues R26-E47, E28-R65, and E31-K66 (Figure 1B), the dimer is mainly stabilized by intermolecular hydrophobic packing (Figure 1C). The surface on the α2 helix side is highly positively charged, while the other side is negatively charged (Figure 1D). GapRΔC17 generally has the same fold as the dimer units of the full-length tetramer protein (backbone RMSD = 0.95 Å), with slight changes in the positions of the two α2 helices (Figure 1E).

GapRΔC17 Can Bind to DNA with a Preference towards AT-rich Sequences
Compared with the full-length protein, GapRΔC17 contains all the residues interacting with DNA and is essentially the DNA recognition motif of GapR. Interactions between GapRΔC17 and DNA were studied by using 2D ¹H–¹⁵N HSQC-based NMR titration experiments, as most of the backbone NH signals of GapRΔC17 could be observed and assigned (Figure 2A). Adding double-stranded 6A DNA (CGCAAAAAAGCG) to GapRΔC17 caused gradual shifts of many NH signals (Figure 2B). The most significantly affected residues are L30, E31, E33, A35, E36, M38, K42, E43, V44, A46, E47, K49, V55, K59, V61, R63, R69, and R72, with combined chemical shift changes over 0.06 ppm, while the N-terminal region residues are less perturbed (Figure 2C). When different DNA molecules were added, the NH signals of GapRΔC17 moved in almost the same directions, and the residues listed above were always the most affected (Figure 3C,D), suggesting that GapRΔC17 binds these DNA molecules in the same mode. However, DNA molecules with AT-rich sequences in the middle, such as 6A and 3AT DNA, caused much larger chemical shift perturbations than those with GC-rich sequences in the middle, such as 3CG and 6CG DNA (Figure 3C,D), suggesting that GapRΔC17 binds AT-rich DNA molecules more tightly. The equilibrium dissociation constants (Kd) of GapRΔC17 for 6A and 6CG DNA, measured by microscale thermophoresis (MST), are 1.1 ± 0.1 µM and 18 ± 2 µM, respectively (Figure 3E). These results suggest that GapRΔC17 has a preference towards AT-rich sequences, as does the full-length tetrameric GapR protein.
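To put the two Kd values on a common scale, the ratio can be converted into a difference in standard binding free energy with ΔΔG° = RT ln(Kd,6CG / Kd,6A). The sketch below is illustrative only: the temperature of 298.15 K is an assumption (it is not stated alongside the Kd values), and the function name is hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)
T_K = 298.15           # ASSUMED temperature; not given with the Kd values


def binding_free_energy(kd_molar, temperature=T_K):
    """Standard binding free energy, dG = RT ln(Kd / 1 M), in kJ/mol."""
    return R_KJ * temperature * math.log(kd_molar)


kd_6a = 1.1e-6   # M, 6A DNA (from the text)
kd_6cg = 18e-6   # M, 6CG DNA (from the text)

fold = kd_6cg / kd_6a
ddg = binding_free_energy(kd_6cg) - binding_free_energy(kd_6a)
print(f"fold preference for 6A over 6CG: {fold:.1f}")
print(f"ddG (6CG - 6A): {ddg:.1f} kJ/mol")
```

Under this assumption, the roughly 16-fold difference in Kd corresponds to about 7 kJ/mol, a modest preference of the size expected when several electrostatic contacts are each slightly better placed rather than when a base-specific interaction is gained.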

GapRΔC17 Binds DNA at the Minor Groove through Electrostatic Interactions between Its Lysine/Arginine Network and DNA Phosphate Groups
Based on the NMR titration experiments, structure models of GapRΔC17 in complex with DNA were built by using the Haddock 2.4 program [10] to elucidate the DNA binding mechanism of GapRΔC17. When the crystal structure of 6A DNA (PDB code 1D98) [11] was used, the top cluster contained 39 models with an RMSD of 4.2 ± 1.2 Å (relative to the model with the lowest Haddock score) (Table 2). These models indicate that the GapRΔC17 dimer binds 6A DNA at the minor groove, with the two α1 helices parallel to the DNA backbones and the two α2 helices parallel to the DNA axis (Figure 3A). Residues K34, K42, and K49 from the two α1 helices and residues K56, K59, R63, and K66 from the two α2 helices bind to DNA phosphate groups on the two sides of the minor groove through electrostatic interactions, with no sequence-specific interactions with DNA base pairs. Although not all of these 14 residues are in close contact with DNA simultaneously, they form a positively charged clamp that half wraps the DNA, which may tolerate rotation and shift of the protein on DNA. The 39 models in the cluster do not exhibit a uniform binding pattern of these residues; the right panel of Figure 3A shows a schematic diagram of the interactions in the representative structure with the lowest Haddock score.
[Figure 1, caption fragment: (D) The electrostatic potential surface of GapRΔC17, computed by using APBS [9]. (E) Superposition of the GapRΔC17 dimer with a dimer unit from the full-length tetrameric GapR (PDB code 6CG8).]
To validate the DNA binding mode of GapRΔC17, NMR titration-based competition experiments were performed with netropsin, a DNA minor groove binding molecule. When netropsin was gradually added to a sample of the GapRΔC17/6A complex, the NH signals of GapRΔC17 shifted to the positions of free GapRΔC17 (Figure 4A), indicating that netropsin strongly interferes with the interaction between GapRΔC17 and 6A DNA, which supports the minor groove binding mode of GapRΔC17. Moreover, the binding affinities of the K34A, K42A, K49A, K56A, K59A, R63A, and K66A mutant proteins for 6A DNA, measured by MST, are significantly lower than that of the wild-type protein (Figure 4B), indicating that all of these lysine/arginine residues are important for DNA binding. These residues are highly conserved among GapR homologs (Figure 4C).
The complex models of GapRΔC17/6A DNA suggest that DNA shape and the spatial distribution of phosphate groups are related to the binding affinity, because they affect the pattern of electrostatic interactions. AT-rich DNA generally has a higher level of propeller twist [12], and overtwisting can narrow the diameter and minor groove width of DNA [13]; both features might be preferred by GapRΔC17.
A 12-bp GC-rich DNA molecule, 3CG (CTACGCGCGTAG) (PDB code 5MVK) [14], was also docked with GapRΔC17, and the results were compared with those for 6A DNA. GapRΔC17 also binds 3CG DNA at the minor groove, as it does 6A DNA (Figure 3B). However, 3CG DNA is less twisted and has a much wider minor groove than 6A DNA (Figure 3C), so its shape might match GapRΔC17 less well: the GapRΔC17/3CG DNA models in the top cluster revealed fewer close contacts between DNA phosphate groups and the interface lysine/arginine residues, and their intermolecular electrostatic energies are generally higher than those of the GapRΔC17/6A DNA models (Figure 3D). These findings may explain why GapRΔC17 prefers AT-rich overtwisted DNA sequences with narrow minor grooves.
The DNA binding mode of GapRΔC17 is similar to those of the two dimer units in the open tetramer/DNA complex, which also mainly bind DNA at the minor groove. Thus, the open tetramer/DNA complex likely represents a local energy minimum, stabilized during crystallization, in which both dimer units adopt a high-affinity DNA binding mode.

Discussion
Binding of GapR to the overtwisted regions in front of the replication fork and RNA polymerase stimulates gyrase and topo IV to eliminate (+) supercoils, which is essential for normal replication and transcription [3]. Throughout the cell cycle, the GapR level remains relatively constant. The movement of the replication fork continually recruits GapR at the front and leaves zones of GapR depletion behind [4]. Moreover, when one transcription event ends and another begins, GapR must reach the new locus in time. Thus, GapR shows a dynamic distribution pattern and can redistribute rapidly after antibiotic treatments targeting DNA replication and transcription [2]. How GapR quickly binds to new sites is not well understood.
It was found that the size of the central tunnel of GapR in the closed form can have minor variations, which can accommodate both B-form DNA and overtwisted DNA [6]. Moreover, in solution, the relative position of the bound DNA inside the central channel of the closed tetramer changes dynamically [7]. Thus, it was proposed that GapR slides along DNA to scan for high-affinity sites and changes its position when the structure of DNA is affected by replication or transcription. However, as the chromosome may have complicated structures organized by other NAPs [15], this diffusion might be hindered and less efficient for the global redistribution of GapR.
Here, we show that GapRΔC17 prefers to bind AT-rich DNA with narrow minor grooves, implying an alternative route for the fast redistribution of GapR. Full-length GapR has two dimer units, each of which represents a DNA recognition motif with an intrinsic DNA binding preference. This gives GapR the ability to bind selectively to AT-rich overtwisted regions in the open form, even before it fully encircles the DNA. Thus, when released into the cytoplasm by the moving replication fork or the transcription machinery, GapR can redistribute to other high-affinity sites with high efficiency.

Protein Expression and Purification
The coding sequence for GapRΔC17 was synthesized according to the gapR gene (CCNA_03428, UniProt) of Caulobacter crescentus and was then cloned into the NdeI and XhoI sites of a pET-21a (+) vector (Novagen, Beijing, China), followed by a C-terminal His6-tag. Point mutations of GapRΔC17 were generated by using a site-directed mutagenesis kit (SBS Genetech, Beijing, China). The plasmids were then transformed into Escherichia coli Rosetta (DE3) competent cells (CWBIO, Beijing, China). Bacteria were cultured in 1 L Luria-Bertani (LB) medium at 35 °C until the optical density at 600 nm reached 0.8. Expression of unlabeled proteins was then directly induced with 0.5 mM IPTG (isopropylthio-β-D-galactoside). For the preparation of NMR samples, bacteria were centrifuged (3260 rpm, 5 min), transferred to 500 mL of ¹⁵N-labeled or ¹⁵N/¹³C/²H-labeled M9 medium, and allowed to recover for 40 min before induction of expression.
After 6 h of IPTG induction, bacteria were harvested by centrifugation, resuspended in 30 mL lysis buffer (50 mM sodium phosphate, 1 M NaCl, and 20 mM imidazole, at pH 8.0), and then lysed by sonication. After centrifugation, the target protein in the supernatant was purified on a Ni-NTA column (Qiagen, Hilden, Germany) by standard methods and eluted in the elution buffer (50 mM sodium phosphate, 1 M NaCl, and 250 mM imidazole, at pH 8.0). The sample was further purified by size-exclusion chromatography on a Superdex 75 column (GE Healthcare Life Sciences, USA) equilibrated in 50 mM sodium phosphate and 150 mM NaCl, at pH 7.0. Fractions corresponding to the target proteins, as confirmed by SDS/PAGE (Genscript, Beijing, China), were pooled, concentrated, and stored at −80 °C.

Protein Crystallization
GapRΔC17 was concentrated to 10 mg/mL in a buffer containing 50 mM HEPES (pH 7.0) and 300 mM NaCl and crystallized by hanging drop vapor diffusion at room temperature, using equal volumes of protein sample and crystallization solution consisting of 0.1 M sodium cacodylate (pH 6.0) and 15% PEG4000. Crystals of the selenomethionyl derivative of the construct grew under similar conditions. For data collection, crystals were soaked briefly in a cryoprotectant comprising crystallization solution with 30% glycerol and then frozen in liquid nitrogen.

Crystallographic Structure Determination
The diffraction data of the native protein were collected at beamline 3W1A at the Beijing Synchrotron Radiation Facility, while the SAD data of the selenomethionyl derivative were collected at beamline BL17U at the Shanghai Synchrotron Radiation Facility (Shanghai, China). Both the native and SAD datasets were integrated and scaled by using HKL2000 [16]. The space group identified for GapRΔC17 was I4₁22, with one molecule in the asymmetric unit of the native crystal. The Phenix v1.19.2 program (Berkeley, CA, USA) was used to locate the heavy atoms and to calculate the initial phases, leading to an interpretable electron density map. Manual model building was carried out by using the program Coot [17]. The final model was refined to an Rwork/Rfree of 25.1%/26%. Data collection and the statistics of the final model are summarized in Table 1.

NMR Titration Experiments
The NMR sample of GapR ∆C17 used for NMR titration experiments contained 0.1 mM 15

MST Assay
A total of 200 nM wild-type or mutant GapRΔC17 dimer was incubated in the dark with the red fluorescent dye NT-647 in a buffer with 20 mM PBS, 150 mM NaCl, and 0.02% (w/v) n-dodecyl-β-D-maltoside (pH 7), for 30 min, at room temperature. Samples contained 20 nM labeled dimer with DNA molecules at 16 different ratios. MST assays were performed at 60% MST power and 40% excitation power, using the Monolith NT.115 instrument (NanoTemper Technologies, München, Germany). Each measurement was repeated three times. Data analyses were performed by using the MO Affinity Analysis v.2.2.4 software (München, Germany).
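For orientation, the expected shape of such a titration can be sketched with the standard 1:1 mass-action binding model, which accounts for depletion of the titrant by the fixed 20 nM labeled dimer. This is a generic illustration and not necessarily the model implemented in the analysis software; the top DNA concentration (100 µM) and the two-fold dilution factor are hypothetical placeholders, as the actual series is not given in the text.

```python
import math


def fraction_bound(dna_um, kd_um, protein_um=0.02):
    """Fraction of labeled protein in complex for 1:1 binding (all in uM)."""
    total = protein_um + dna_um + kd_um
    complex_um = (total - math.sqrt(total ** 2 - 4.0 * protein_um * dna_um)) / 2.0
    return complex_um / protein_um


def dilution_series(top_um, n_points=16, factor=2.0):
    """Serial dilution: n_points concentrations starting from top_um."""
    return [top_um / factor ** i for i in range(n_points)]


# HYPOTHETICAL top concentration and dilution factor, for illustration only.
series = dilution_series(100.0)
for kd in (1.1, 18.0):  # Kd values reported for 6A and 6CG DNA, in uM
    curve = [fraction_bound(c, kd) for c in series]
    print(f"Kd = {kd} uM: bound fraction from {curve[-1]:.4f} to {curve[0]:.3f}")
```

Because the labeled dimer concentration (0.02 µM) is far below either Kd, the half-saturation point of each curve falls very close to Kd itself, which is what makes the fitted value well determined.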

Generation of Haddock Models of GapRΔC17 and DNA
Crystal structures of GapRΔC17, 6A DNA, and 3CG DNA were energy-minimized with AMBER before docking. The GapRΔC17 residues that were most significantly perturbed in the NMR titration experiments (L30, E31, E33, A35, E36, M38, K42, E43, V44, A46, E47, K49, V55, K59, V61, R63, R69, and R72) and DNA residues 5–11 and 17–23 were defined as active residues, in order to prevent translational displacement of GapRΔC17 along the DNA. Passive residues were defined automatically. Solvated docking was performed by using the guru interface of the Haddock server with the default parameters [20]. A total of 1000 structures were generated in the rigid-body docking stage. The 200 best models were subjected to semi-flexible refinement and final refinement and were then clustered with an RMSD cutoff of 7.5 Å and a minimum cluster size of 4.
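The role of the two clustering parameters can be illustrated with a generic greedy, neighbor-count clustering over a precomputed pairwise RMSD matrix. This is a sketch of the general idea only, not the docking server's own implementation; the function name and the toy distance matrix are hypothetical.

```python
def cluster_by_rmsd(rmsd, cutoff=7.5, min_size=4):
    """Greedy clustering: repeatedly take the model with the most
    neighbors within `cutoff` as a cluster center, until the largest
    remaining group is smaller than `min_size`."""
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        # Neighbors of each remaining model (each model counts itself).
        neighbors = {
            i: [j for j in remaining if rmsd[i][j] <= cutoff]
            for i in remaining
        }
        center = max(sorted(remaining), key=lambda i: len(neighbors[i]))
        members = neighbors[center]
        if len(members) < min_size:
            break  # leftover models stay unclustered
        clusters.append(sorted(members))
        remaining -= set(members)
    return clusters


# Toy data: 1D "positions" stand in for models; |difference| stands in for RMSD.
positions = [0, 1, 2, 3, 20, 21, 22, 23, 24, 50]
toy_rmsd = [[abs(a - b) for b in positions] for a in positions]
print(cluster_by_rmsd(toy_rmsd))
```

In the toy example, the lone model at position 50 is left unclustered because it cannot form a group of at least four, which mirrors how the minimum cluster size discards isolated docking solutions.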

Informed Consent Statement: Not applicable.
Data Availability Statement: The structure of GapR ∆C17 was deposited at The RCSB Protein.