A Bit Stickier, a Bit Slower, a Lot Stiffer: Specific vs. Nonspecific Binding of Gal4 to DNA

Transcription factors regulate gene activity by binding specific regions of genomic DNA thanks to a subtle interplay of specific and nonspecific interactions that is challenging to quantify. Here, we exploit the Reflective Phantom Interface (RPI), a label-free biosensor based on optical reflectivity, to investigate the binding of the N-terminal domain of Gal4, a well-known gene regulator, to double-stranded DNA fragments with or without its consensus sequence. The analysis of RPI-binding curves provides interaction strength and kinetics and their dependence on temperature and ionic strength. We found that the binding of Gal4 to its cognate site is stronger, as expected, but also markedly slower. We performed a combined analysis of specific and nonspecific binding (equilibrium and kinetics) by means of a simple model based on nested potential wells and found that the free energy gap between specific and nonspecific binding is only of the order of 1 kcal/mol. We investigated the origin of such a small value by performing all-atom molecular dynamics simulations of Gal4-DNA interactions. We found a strong enthalpy-entropy compensation, by which the binding of Gal4 to its cognate sequence entails a DNA bending and a striking conformational freezing, which could be instrumental in the biological function of Gal4.


Introduction
Protein-DNA interactions play essential roles in several biological functions in cells, such as gene transcription and DNA replication, repair and recombination. To perform their regulatory functions, many DNA-binding proteins, among which are transcription factors (TFs), need to bind to specific double-stranded (ds) sites in the presence of an overwhelming number of nonspecific dsDNA tracts. These proteins thus have to be optimized both for specific binding and for effective searching, which proceeds through a combination of sliding along the dsDNA, switching between contacting dsDNA segments and hopping via unbinding, 3D diffusion and binding [1][2][3]. Therefore, the specific binding, with its strength and molecular conformations, is just one of the key ingredients of the regulation mechanisms. Equally crucial are the kinetic on and off rates, which gauge the role of hopping; the strength and nature of the nonspecific binding, which control the sliding; and the flexibility and multivalence of the protein, which lower the barrier for intersegment switching. The balance among these many factors is often achieved through the combination of two or more positively charged DNA-binding domains and the presence recruitment of the RNA polymerase II transcription machinery to a downstream-located core promoter region [23]. As schematically shown in Figure 1A, Gal4 binds DNA as a homodimer, with two zinc finger domains (Zn2/Cys6-fold group) making base pair-specific contacts to highly conserved CGG triplets at the ends of the consensus sequence, while flexible linkers and dimerization elements contact the phosphate backbone within the inner 11 base pairs.
We selected Gal4 because of a combination of factors: it has been extensively studied with different approaches, including label-free biosensing [24], and a crystallographic description of its interaction with the cognate site is available [21]; despite thorough structural knowledge, the mechanisms yielding selective regulation by this and other yeast transcription factors are still unclear [25,26].
Considered together, the previous investigations of Gal4-DNA binding, which took place over the last decades and involved a variety of techniques, give a sense of the intrinsic uncertainty of the experimental determinations in this field: the available estimates for the dissociation constant of its complex with the consensus sequence span several orders of magnitude, ranging from 200 nM [24] down to 0.5 nM [27]. Most studies, however, found values in the 3-30-nM range [22,26,28], in line with several other transcription factors [29,30]. The available data for the kinetics of Gal4 unbinding from its cognate site give k_off ≈ 10⁻⁴ s⁻¹, obtained with single-molecule experiments [31] and Surface Plasmon Resonance imaging [24]. Nonspecific interactions measured by EMSA indicate 10- to 1000-fold weaker binding [26], in line with what was found for other zinc finger proteins [15].
In this work, we present a thorough study of specific and nonspecific Gal4-dsDNA binding, which includes a methodological novelty, the development of a simple kinetic model to analyze the results and MD simulations to interpret them. Specifically, we exploited the RPI to measure the equilibrium and kinetics of Gal4-dsDNA binding at different ionic strengths. We performed a combined analysis of specific and nonspecific binding using a hierarchical two-step process model, which enables extracting the difference in free energy between the two modes of interaction. In parallel, we performed long, state-of-the-art all-atom molecular dynamics simulations of Gal4 contacting dsDNA, which offered a detailed description of the specific and nonspecific binding of Gal4, including protein conformations, bond distribution and DNA bending. Experimental data and computer simulations consistently indicate that the binding of Gal4 to its cognate site involves a marked entropy/enthalpy compensation.

Gal4 Binding to Specific and Nonspecific DNA Sequences
The sensing surface of the RPI is a nonreflecting glass substrate, coated with a polymer to reduce nonspecific binding, on which "receptors" are chemically immobilized in spots [18,19] (Figure 1B). The RPI raw signal is the reflected light intensity from each spot on the sensing surface, which can be converted, with no free parameter, into the molecular surface mass density σ (see Supplementary Text S3). To explore specific vs. nonspecific Gal4-DNA binding, we prepared surfaces with spots containing four distinct dsDNA probes: two 40-base-pair (bp) blunt-ended dsDNA and two hairpins (40-bp-long ds stem plus eight-base-long loop), differing only in the presence or absence of the consensus sequence from position 28 to position 45 (Figure 1B and Figure S1). All probes have a single-strand spacer of 10 adenosines to provide flexibility and increase the distance to the surface. The consensus sequence we used, 5′-CGG-N11-CCG-3′, is based on a large body of previous studies showing sequence-specific DNA binding by Gal4 as a dimer (e.g., Marmorstein [21]; see Supplementary Text S4). The high conservation of CGG motifs has been recently confirmed by protein-binding microarray (PBM) studies to reflect a high affinity for the Gal4 DNA-binding domain of each monomer [32]. In this study, we considered as nonspecific any sequence lacking such a consensus and analyzed in detail two of them, the NSP sequence (Figure 1B) and the CTRL sequence (Figure S2 and Table S1), the latter chosen to minimize the affinity for cryptic sites (see Supplementary Text S4 for more details).
Examples of Gal4-binding curves measured for spots of specific and nonspecific dsDNA probes (listed in Table 1) are shown in Figure 1B. When the consensus sequence is present, the amount of bound proteins is larger and the time to saturation longer, indicating a stronger but overall slower interaction with the specific tract. This behavior can be better appreciated in experiments in which Gal4 is added in stepwise increasing concentrations c. Figure 2A shows σ(t), the time evolution of the protein mass accumulating on the four families of spots following the injection of Gal4 in the measuring cell. At all concentrations, the binding of Gal4 to the spots carrying the consensus sequence is more "efficient", since these spots capture a larger amount of Gal4 proteins, and "slower", since it takes more time to plateau. A similar difference is also observed with respect to other nonspecific control strands with different sequences (Figure S2).
Table 1. DNA oligomers used in the study. Sequences 1-4 were grafted on the Reflective Phantom Interface (RPI)-sensing surface. Sequences 5 and 6 were used to hybridize sequences 3 and 4, respectively. The red part represents the region that differs between specific and nonspecific strands. CGG and CCG sequences, important for Gal4 binding, are underlined.
We have fitted each increment of σ(t) by
$$\sigma(c,t) = \Sigma(c)\left(1 - e^{-\Gamma(c)\,t}\right) \qquad (1)$$
where the two fitting parameters are the extrapolated asymptote at each injected concentration, Σ(c), and the growth rate Γ(c). It is worth noticing that, since Γ(c) = k_on c + k_off, where k_on and k_off are the kinetic rates for binding and unbinding, the measurement of the rising time does not simply reflect the binding rate but, rather, conveys information on both. Figure 2B shows the values obtained for the asymptotic value Σ(c) for the pair of specific and nonspecific hairpin probes. These can, in turn, be fitted with a simple Langmuir adsorption curve:
$$\Sigma(c) = \Sigma_\infty\,\frac{c}{c + K_d} \qquad (2)$$
where the dissociation constant K_d = k_off/k_on. In the fitting process, the saturation value Σ_∞ is not constrained but is kept the same for each pair of specific and nonspecific probes. This corresponds to the assumption that the maximum number of proteins hosted by a single probe duplex at large protein concentration depends on the probe length but not on the presence of the specific tract. Remarkably, the value of Σ_∞ obtained from repeated measurements in the same conditions of Figure 2 corresponds to 0.9 ± 0.1 Gal4 homodimers per DNA strand (see Supplementary Text S5). This evidence does not exclude the possibility that, at large concentration, more than one protein can bind to a single DNA probe strand, whether or not it contains the specific tract. Indeed, the total length of the dsDNA probes roughly corresponds to the size of two Gal4 homodimers. However, our analysis indicates that the possible binding of a second protein on the same DNA strand is either unlikely or characterized by a much larger K_d, hence not affecting the analysis of the initial part of the Langmuir isotherm proposed in this study. This analysis enables determining the K_d values summarized in Figure 2D. As expected, Gal4 interacts with its cognate site more strongly (K_d = 25-35 nM) than with generic sequences (K_d = 160-240 nM).
These values indicate a free-energy difference of about 0.9-1.2 kcal/mol between specific and nonspecific sequences, similar to what was observed for the Max protein in reference [11]. The values depend only slightly on the dsDNA probe density on the spots (Figure S3) and on the background treatment (Figure S4). The association rate k_on is obtained from the measured initial slope σ'(c) after each stepwise concentration increment (Figure 2C). Since σ'(c) = Σ(c)Γ(c) = Σ_∞ k_on c, k_on is obtained as the slope of the linear fit of σ'(c)/Σ_∞ vs. c. The extracted k_on is very similar for specific and nonspecific interactions, being less than 20% larger in specific spots, which suggests that k_on is the same in the two cases. By adopting this assumption, i.e., fitting all data as a single set (dashed line in Figure 2C), we obtain k_on = (1.6 ± 0.6) × 10⁻⁵ s⁻¹ nM⁻¹, with the differences in K_d mainly ascribed to the different rate k_off = K_d·k_on with which Gal4 unbinds from duplexes. In the case of strands carrying the Gal4 consensus sequence, we obtain k_off = 3.1-6.9 × 10⁻⁴ s⁻¹, while, in the case of generic dsDNA, k_off = 1.4-4.7 × 10⁻³ s⁻¹, indicating a detachment almost 10 times faster. The measured k_off for the consensus sequence is similar, within a factor of three, to the value obtained for Gal4 in previous studies by Surface Plasmon Resonance imaging [24], whereas our value of k_on is about 25 times larger, suggesting a faster access of the protein to the DNA strands on the surface in our conditions. Besides differences in the composition and passivation of the sensing surface, it must be noted that our approach relies on a global analysis of both specific and nonspecific binding (see Supplementary Text S1) measured at low concentrations of Gal4, thus far from the saturation of surface probe sites and consequent crowding effects.
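The relations used above, the single-exponential growth of Equation (1), the Langmuir isotherm of Equation (2) and the identity σ'(c) = Σ(c)Γ(c) = Σ_∞ k_on c, can be checked numerically with a minimal Python sketch. The function names, the injected concentration of 50 nM, the normalization Σ_∞ = 1 and the use of the midpoint K_d = 30 nM of the reported 25-35 nM range are illustrative assumptions, not values or code taken from the study.

```python
import math


def langmuir(c, sigma_inf, k_d):
    """Equation (2): asymptotic surface density at concentration c (nM)."""
    return sigma_inf * c / (c + k_d)


def growth_rate(c, k_on, k_off):
    """Gamma(c) = k_on * c + k_off, in 1/s."""
    return k_on * c + k_off


def sigma_t(t, c, sigma_inf, k_on, k_off):
    """Equation (1): surface density a time t (s) after the injection."""
    k_d = k_off / k_on
    gamma = growth_rate(c, k_on, k_off)
    return langmuir(c, sigma_inf, k_d) * (1.0 - math.exp(-gamma * t))


# Reported shared association rate; the other values are assumptions.
K_ON = 1.6e-5          # 1/(s nM), from the global fit quoted in the text
K_D_SPECIFIC = 30.0    # nM, assumed midpoint of the reported 25-35 nM range
SIGMA_INF = 1.0        # arbitrary normalization (assumption)
C = 50.0               # nM, hypothetical injected concentration

# k_off follows from K_d = k_off / k_on.
k_off_specific = K_D_SPECIFIC * K_ON

# The initial slope Sigma(c) * Gamma(c) reduces to Sigma_inf * k_on * c.
initial_slope = (langmuir(C, SIGMA_INF, K_D_SPECIFIC)
                 * growth_rate(C, K_ON, k_off_specific))
```

With these inputs, `k_off_specific` falls inside the reported 3.1-6.9 × 10⁻⁴ s⁻¹ interval, and `initial_slope` does not depend on K_d, which is why the initial slope isolates k_on.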
In general, similar equilibrium or kinetic rate constants of protein interactions can be determined with both solution-phase methodologies and surface biosensors [33]. However, the surface immobilization of nucleic acids can provide an accumulation of charges that induces the electrostatic effects typical of large solution-phase concentrations comparable to those of the DNA within the nucleus [20].
To better explore the relevance of the electrostatic component of the interactions, we performed experiments at various values of ionic strength I_s around the standard value I_s = 150-mM NaCl. The electrostatic effects are relevant, as indicated by the decreasing amount of bound proteins (Figure S5). The behaviors of K_d, k_on and k_off vs. I_s are shown in Figure 3A-C. In the presence of specific bonds, K_d progressively increases with I_s (Figure 3A, red dots). A similar behavior has been observed for various DNA-binding proteins and is mostly related to the release of counterions upon binding [34,35]. In the absence of specific interactions, K_d increases with I_s more rapidly, indicating a stronger weakening of the nonspecific bonds up to I_s = 200-mM NaCl, above which it sharply falls (Figure 3A, blue dots). This nonmonotonic behavior leads to a maximum difference between specific and nonspecific equilibrium constants, remarkably located around 150-mM NaCl. Further insight is given by the kinetics. k_on monotonically decreases with I_s, as expected from the reduced electrostatic attraction range (Figure 3C). The escape of Gal4 from a generic dsDNA is made easier by increasing the salt concentration up to 150-mM NaCl, above which k_off sharply drops (Figure 3B). When specific interactions are present, the dependence of k_off on I_s is instead monotonic and much milder. This somewhat surprising behavior could be understood in the following way. The weakening of the electrostatic attraction is more relevant for nonspecific interactions, which are less stabilized by hydrogen bonds (HB). However, at large I_s, the value of the nonspecific k_off approaches that of specific interactions, indicating a similar stability in the two situations and thus suggesting that the narrowed electrostatic self-repulsion favors the onset of new attractive interactions, possibly additional HB made accessible by previously inaccessible conformations.

Analysis of Equilibrium and Kinetics through a Nested-Well Binding Model
The specific docking of Gal4 to its consensus sequence is known to depend on the formation of about 20 HB ( Figure 1A), which require Gal4 to be in a precise position and orientation with respect to the dsDNA and to adopt a definite molecular conformation. Thus, when Gal4 is in contact with its consensus sequence but its position/orientation/conformation is not the one enabling H-bonding, its interaction must resemble those relevant for generic dsDNA. This agrees with the notion that interactions of Gal4 to its consensus sequence are intrinsically preceded by those to nonspecific dsDNA that control sliding and hopping [36]. Since we have parallel access to specific and nonspecific observations, we aim to disentangle the two components by performing a differential analysis of our data.
To this end, we developed a simple model embodying this notion of specific-through-nonspecific interactions. Our model shares features with previous ones that were proposed to incorporate into simple kinetic equations the notion that the target search of transcription factors crucially depends on nonspecific binding, which might have the role of an "antenna" [37] or of a "funnel" [38], facilitating the docking on the cognate site. The model we propose here is, however, simpler than previous ones, as a consequence of the simultaneous access, afforded by our experimental design, to the binding to specific and nonspecific DNA strands of equal sizes.
In our model, we introduce a reaction coordinate x, ordering all possible Gal4 molecular conformations, which are the same around the DNA strands whether or not they carry the specific sequence. We thus envision the specific binding as encompassing a set of x coordinates surrounded by regions of nonspecific interactions, as in the "Nested-Well" (NW) model sketched in Figure 1C and discussed in Supplementary Text S6, which comprises two consecutive binding reactions: reaction 1 (from unbound to nonspecific interactions) and reaction 2 (from nonspecific interactions to specific binding). In the latter, the equilibrium dissociation coefficient K_2 is defined as K_2 = σ_1/σ_2, the ratio between the surface densities of proteins nonspecifically (σ_1) and specifically (σ_2) bound to the dsDNA probes. In the limit of large K_2 (i.e., vanishing depth of the inner well), the NW model becomes a single-well model describing the binding to strands that do not carry the target sequence, with the equilibrium coefficient K_1. The binding equilibrium that is measured when Gal4 interacts with dsDNA carrying its cognate site, and that involves both specific docking and nonspecific interactions, should instead be compared to the binding of the whole NW system. In this case, K_d = K_1 K_2/(1 + K_2), or
$$K_2 = \frac{K_d}{K_1 - K_d}$$
Having experimental access to both K_1 and K_d (see Figure 3A), it is straightforward to extract K_2 as a function of the ionic strength, as shown in Figure 3D. We find that the smallest K_2, corresponding to the tightest specific binding, is found at the concentration of 150-mM NaCl.
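The extraction of K_2 from the two measured dissociation constants is a one-line inversion of K_d = K_1 K_2/(1 + K_2). The Python sketch below illustrates it; the function names are hypothetical, and the inputs K_d = 30 nM and K_1 = 200 nM are assumed midpoints of the ranges quoted earlier, not fitted values from the study.

```python
def k2_from_measured(k_d, k_1):
    """Invert K_d = K_1 * K_2 / (1 + K_2) to get K_2 = K_d / (K_1 - K_d)."""
    if not 0.0 < k_d < k_1:
        raise ValueError("the nested-well model requires 0 < K_d < K_1")
    return k_d / (k_1 - k_d)


def nested_well_kd(k_1, k_2):
    """Overall dissociation constant of the nested-well system."""
    return k_1 * k_2 / (1.0 + k_2)


def fraction_specific(k_2):
    """Fraction of bound proteins in the inner (specific) well.

    Follows from K_2 = sigma_1 / sigma_2:
    sigma_2 / (sigma_1 + sigma_2) = 1 / (1 + K_2).
    """
    return 1.0 / (1.0 + k_2)


# Assumed midpoints of the reported ranges (nM).
K_D = 30.0    # specific probes, reported range 25-35 nM
K_1 = 200.0   # nonspecific probes, reported range 160-240 nM

K_2 = k2_from_measured(K_D, K_1)
docked = fraction_specific(K_2)
```

Note that `fraction_specific(K_2)` equals 1 − K_d/K_1, so the share of specifically docked proteins can be read directly from the ratio of the two measured dissociation constants.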
The solution of the NW model also provides the time dependence of the amount of Gal4 adhering to DNA after a stepwise increment of its concentration, which is found to depend on k_on1 and k_off1 (the kinetic rate constants for the nonspecific binding of Gal4 on the duplexes) and on k_on2 and k_off2 (the rate constants for the unimolecular reaction from the nonspecific to the specific state for the proteins already bound to the DNA). We find that the binding kinetics to a NW probe is a double exponential, with shorter (τ_S) and longer (τ_L) characteristic times:
$$\sigma(c,t) = \Sigma(c)\left[1 - B\,e^{-t/\tau_L} - (1-B)\,e^{-t/\tau_S}\right]$$
where B, τ_S and τ_L are explicit functions of the equilibrium and kinetic coefficients for transitions 1 and 2 (see Supplementary Text S6). At moderate K_2, of interest for this analysis, the response is nearly exponential and is dominated by τ_L (B ≈ 1). τ_L depends on k_off2: when k_off2 is small, τ_L becomes large, as expected because of the slower escape from the inner well; when k_off2 is large, τ_L reaches a limiting value larger than the response time for nonspecific binding (τ_L∞ > τ_1). τ_L∞ depends on K_1, K_2 and τ_1 and corresponds to the time involved in the escape from the outer well of the nonspecifically bound proteins that, however, are in constant equilibrium with those that are specifically bound (Figures S6 and S7). The data shown in Figure 1B were simultaneously fit to this kinetic model (continuous lines). We find a good agreement with the data, indicating that the NW model captures the differences in both the binding strength and kinetics. In particular, we find τ_L ≈ τ_L∞, as is apparent by comparing the NW fitting curve with such limiting exponential behavior, σ(c,t) = Σ(c)(1 − e^(−t/τ_L∞)) (dashed line), indicating that the residence time of Gal4 on its consensus sequence is shorter than k_off1⁻¹ ≈ 300 s. The nearly exponential kinetics, predicted and observed, justifies the use of Equation (1) in the general analysis of our data.
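To illustrate how two characteristic times arise, the sketch below assumes the simplest linear rate equations compatible with the two-step scheme: dσ_1/dt = k_on1 c (Σ_∞ − σ_1 − σ_2) − (k_off1 + k_on2) σ_1 + k_off2 σ_2 and dσ_2/dt = k_on2 σ_1 − k_off2 σ_2. These equations are an assumption made for illustration and are not taken from Supplementary Text S6; the values of c and k_on2 are hypothetical. The two relaxation times are the inverse eigenvalues of the resulting 2 × 2 rate matrix.

```python
import math


def nested_well_times(c, k_on1, k_off1, k_on2, k_off2):
    """Return (tau_S, tau_L) for the assumed two-step linear kinetics.

    The relaxation rates solve lambda^2 - trace*lambda + det = 0, where
    trace and det belong to minus the rate matrix of (sigma_1, sigma_2).
    """
    a = k_on1 * c + k_off1
    trace = a + k_on2 + k_off2
    det = k_off2 * a + k_on2 * k_on1 * c
    # The discriminant is non-negative for positive rates; clamp rounding noise.
    root = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    fast_rate = 0.5 * (trace + root)
    slow_rate = 0.5 * (trace - root)
    return 1.0 / fast_rate, 1.0 / slow_rate


# Illustrative parameters (assumptions unless stated otherwise).
C = 50.0                 # nM, hypothetical injected concentration
K_ON1 = 1.6e-5           # 1/(s nM), association rate reported in the text
K_1 = 200.0              # nM, assumed midpoint of the nonspecific K_d range
K_OFF1 = K_1 * K_ON1     # 1/s
K_2 = 0.16               # value quoted in the text at 150-mM NaCl
K_ON2 = 0.1              # 1/s, hypothetical fast inner-well exchange
K_OFF2 = K_2 * K_ON2     # 1/s, so that sigma_1/sigma_2 = K_2 at equilibrium

tau_s, tau_l = nested_well_times(C, K_ON1, K_OFF1, K_ON2, K_OFF2)

# Single-well response time of a nonspecific probe at the same concentration.
tau_1 = 1.0 / (K_ON1 * C + K_OFF1)

# Fast-exchange limit: one exponential with K_d = K_1 * K_2 / (1 + K_2).
tau_l_inf = 1.0 / (K_ON1 * C + K_ON1 * K_1 * K_2 / (1.0 + K_2))
```

Under these assumptions, the slow time exceeds the nonspecific response time and approaches the fast-exchange limit when the inner-well exchange is rapid, in line with the behavior described for τ_L and τ_L∞.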
Despite its simplicity, the NW model enables describing the Gal4 binding by capturing the slower approach to the saturation of specific interactions, by justifying that k_on is the same in specific and nonspecific interactions and by providing a means to quantify the nonspecific-to-specific transition. Further details are available in Supplementary Text S6.

Entropy-Enthalpy Compensation
The selectivity of Gal4 for its consensus sequence observed in this study and expressed by the coefficient K_2 is, at best, K_2 ≈ 0.16 at 150-mM NaCl, meaning that, out of 10 Gal4 dimers bound to the dsDNA probe containing the consensus sequence, 10/1.16 ≈ 8.6 are actually docked to a cognate sequence, while 1.4 contact the dsDNA without adopting the specific binding conformation. One could wonder how this 6:1 ratio could manage to regulate the gene expression in vivo, where the ratio between the number of DNA bases involved in the consensus sequence vs. all bases present in the system is not of the order of 10⁻¹, as it is here, but, rather, of the order of 10⁻⁶ or less, a notion suggesting that only a tiny minority of the Gal4 molecules actually manages to dock on the DNA target.
The weak selectivity revealed by K_2 indicates that the free energy difference between specific and nonspecific binding is rather small, ∆G = RT ln(K_2) ≈ −1.1 kcal/mol, where R is the gas constant. Intriguingly, this figure is much smaller than the one expected from the large number of HB involved in the docking of Gal4, which should provide an enthalpic gain upon specific binding an order of magnitude larger [39][40][41]. To explore the entropic and enthalpic components of ∆G, we performed binding experiments analogous to those in Figure 2 as a function of the temperature (see Supplementary Text S2 and Figure S8), from which we extracted K_2(T), shown in Figure 4. By fitting these data with K_2 = exp(∆H/RT − ∆S/R) (dashed line), we obtain ∆H ≈ −12.8 ± 3 kcal/mol and ∆S ≈ −38.7 ± 9 cal/(mol K), confirming the expectation of an enthalpic gain upon specific binding more than 10 times larger than the measured ∆G, more than 90% of which is compensated by a similarly large entropic penalty.
This emerging entropy-enthalpy compensation indicates that the enthalpy made available by the HB is spent more in entropy reduction than to localize Gal4 on the consensus sequence, in turn suggesting that conformational freezing upon docking, rather than the binding strength, might be a key to understanding the biological function of Gal4. Indeed, conformational freezing could be instrumental in the specific binding of Gal4 with several other coactivator proteins, a necessary step toward the activation of the gene expression [42].
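The size of the compensation can be verified with the relations ∆G = ∆H − T∆S and K_2 = exp(∆G/RT). The short Python sketch below uses the central values of the fit quoted above; the temperature of 298.15 K is an assumption made for illustration, since the text does not state the temperature at which K_2 ≈ 0.16 was measured.

```python
import math

R = 1.987204e-3   # gas constant in kcal/(mol K)


def delta_g(delta_h, delta_s, temperature):
    """Free energy difference in kcal/mol: dG = dH - T * dS."""
    return delta_h - temperature * delta_s


def k2_from_thermo(delta_h, delta_s, temperature):
    """K_2 = exp(dH / RT - dS / R), the van 't Hoff form used in the fit."""
    return math.exp(delta_g(delta_h, delta_s, temperature) / (R * temperature))


DELTA_H = -12.8      # kcal/mol, central value of the fit
DELTA_S = -38.7e-3   # kcal/(mol K), central value of the fit
T = 298.15           # K, assumed temperature (not stated in the text)

dg_fit = delta_g(DELTA_H, DELTA_S, T)

# Fraction of the enthalpic gain cancelled by the entropic term T * dS.
compensated = (T * DELTA_S) / DELTA_H

# Free energy implied by the measured selectivity K_2 = 0.16.
dg_measured = R * T * math.log(0.16)
```

With the central fit values, slightly more than 90% of the enthalpic gain is cancelled by the entropic penalty, and the resulting ∆G differs from the one implied by K_2 ≈ 0.16 by less than the quoted uncertainties on ∆H and ∆S.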

Structure and Interface of the DNA-Gal4 Complex: A Molecular Dynamics Study
We thus investigated, through state-of-the-art all-atom molecular dynamics simulations, the conformational freedom and interface of DNA and Gal4, both when isolated and within their complex in the presence or absence of a consensus sequence. We tracked their relative motion and evaluated their thermodynamics in the binding. Figure 5A-C displays representative snapshots of the isolated Gal4 and of the specific and nonspecific Gal4-DNA complexes, respectively. While the isolated Gal4 assumes a rather compact conformation, as already suggested [21], the interaction with DNA forces more open protein configurations. To quantify such an effect, we calculated the secondary structure percentage along the amino acid sequence, subdivided into an unstructured coil, beta-like configuration or alpha helix. In Figure 6A, we plot the lost and gained secondary structures as the differences between specific binding and isolated proteins (top panel) and between nonspecific binding and isolated proteins (bottom panel). Regions of DNA-Gal4 close proximity are shaded. Most of the interface tracts undergo significant conformational changes, mainly from folded to coil configurations (blue to white bars), while regions not involved in the binding are much less affected. The transition is more marked in the specific complex, such as for residues 15-25.

To address the structural changes of the DNA sequences, we computed several inter-base-pair quantities, the grooves' depth and width and the bending angle for each frame of MD simulations using Curves+ [39]. In the case of specific DNA sequences, we also observed a 9° increase in the average bending of the DNA toward the protein, as compared to the isolated DNA oligomer (Figure 5B). This significant structural effect, which contributes to the overall conformational entropy loss, can be appreciated in other inter-base-pair quantities (like twist) and in the groove dimensions (Figures S10 and S11). On the contrary, in the case of nonspecific DNA sequences, no relevant changes were observed.
The conformational changes are correlated with the relative motion between Gal4 and the DNA, as measured through the center of mass of the protein via the RMSD time series (Figure 6B) or the change of distance along the z-axis parallel to the DNA axis (∆z) between a specific amino acid and the base pairs involved in HB (Figure S9). Repeated simulations show that Gal4 in complex with the nonspecific DNA sequence can visit different binding sites, since ∆z can reach 15 Å, and that it remains quite flexible, as demonstrated by the large observed fluctuations. In contrast, Gal4 in complex with the specific sequence retains most of its initial contacts, and much smaller relative movements of the protein along the DNA are observed. The difference between the two situations is also apparent when studying the HB and their dynamics along the trajectories (Tables S2 and S3). Indeed, the average number of active HB at each frame agrees with the crystallographic data and is only slightly larger for the specific case, ⟨n_HB,s⟩ ≈ 20 vs. ⟨n_HB,ns⟩ ≈ 16. Instead, when considering only the HB that are stable along the trajectory (with an occupancy of 30% or higher), the specific HB are far more numerous, n_HB,s = 14 vs. n_HB,ns = 6, indicating that the contacts are mostly preserved in the former case, while they are continuously refreshed in the latter. Moreover, for the nonspecific case, different HB are observed in the two repeats, suggesting that different conformations can be explored and that, if the simulations were longer, the breaking of these HB and a change of the relative position between the protein and the DNA would likely be observed.
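The distinction between HB that are active in a given frame and HB that are stable along the trajectory can be illustrated with a minimal sketch. It assumes that the trajectory has already been reduced to a boolean table with one row per frame and one column per donor-acceptor pair; the function name `stable_hbond_count`, the toy data and the array layout are hypothetical and are not taken from the analysis pipeline of this work. Only the 30% occupancy threshold comes from the text.

```python
import numpy as np

def stable_hbond_count(hb_active, min_occupancy=0.30):
    """Summarize a hydrogen-bond table of shape (n_frames, n_hbonds).

    hb_active[i, j] is True when bond j is formed in frame i
    (hypothetical input layout, assumed for illustration).
    Returns the mean number of bonds active per frame, the number of
    bonds whose occupancy is at least min_occupancy, and the
    per-bond occupancy.
    """
    hb_active = np.asarray(hb_active, dtype=bool)
    # Occupancy: fraction of frames in which each bond is present.
    occupancy = hb_active.mean(axis=0)
    # Average number of simultaneously active bonds per frame.
    mean_active_per_frame = hb_active.sum(axis=1).mean()
    # Stable bonds: occupancy at or above the threshold (30% in the text).
    n_stable = int((occupancy >= min_occupancy).sum())
    return mean_active_per_frame, n_stable, occupancy

# Toy example: one persistent bond and three short-lived ones.
toy = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
], dtype=bool)
mean_active, n_stable, occ = stable_hbond_count(toy)
```

In the toy table, several bonds are active in most frames, yet only the first column passes the occupancy threshold. This is the same qualitative pattern described for the nonspecific complex, where contacts are continuously refreshed rather than preserved.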
Overall, these findings, together with the number (147 ± 11 and 153 ± 14 for the specific and nonspecific complexes, respectively) and distribution of interfacial water molecules solvating DNA and protein (Figure S12), defined as water molecules within 4 Å of both the protein and the DNA sequences, indicate the crucial importance of taking into account the dynamic nature of the interfaces to correctly describe the stability and specificity of Gal4 binding.
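The geometric definition of interfacial water used above (within 4 Å of both the protein and the DNA) can be sketched for a single frame as follows. This is only an illustration under stated assumptions: coordinates are given in Å as plain arrays, one point per water molecule (for instance the oxygen atom), and the function name `interfacial_water_mask` is hypothetical. The brute-force distance matrix is adequate for a toy case; a real trajectory analysis would normally rely on a neighbor-search routine.

```python
import numpy as np

def interfacial_water_mask(water_xyz, protein_xyz, dna_xyz, cutoff=4.0):
    """Flag waters lying within `cutoff` (in Å) of BOTH protein and DNA.

    All inputs are (n, 3) coordinate arrays for one frame
    (assumed layout, for illustration only).
    """
    water_xyz = np.asarray(water_xyz, dtype=float)
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    dna_xyz = np.asarray(dna_xyz, dtype=float)

    def near(points, atoms):
        # Pairwise distances between every water and every solute atom.
        d = np.linalg.norm(points[:, None, :] - atoms[None, :, :], axis=-1)
        # True if at least one solute atom is within the cutoff.
        return (d <= cutoff).any(axis=1)

    return near(water_xyz, protein_xyz) & near(water_xyz, dna_xyz)

# Toy frame: one protein atom, one DNA atom, three waters.
protein = [[0.0, 0.0, 0.0]]
dna = [[6.0, 0.0, 0.0]]
waters = [[3.0, 0.0, 0.0],   # 3 Å from both: interfacial
          [1.0, 0.0, 0.0],   # close to the protein only
          [20.0, 0.0, 0.0]]  # bulk water
mask = interfacial_water_mask(waters, protein, dna)
n_interfacial = int(mask.sum())
```

Counting the flagged waters frame by frame and averaging over the trajectory would give a mean and a standard deviation of the kind quoted above.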

Entropy-Enthalpy Compensation upon Binding from the Molecular Dynamics Study
The all-atom molecular dynamics simulations can also provide estimates of the various contributions to the binding free energy that, albeit difficult to compare quantitatively with experimental results, can support their interpretation. Among them (defined and discussed in Supplementary Text S7), we identified polar terms (including the electrostatic energy and the polar contribution to the solvation energy) and nonpolar terms (including van der Waals interactions), related to the different number of HB and contacts for the two binding modes. Such terms are all favorable for both binding modes with respect to the unbound state (Tables S4 and S5), with an overall difference between the specific and nonspecific complexes of about 35 kcal mol⁻¹, indicating that specific Gal4-DNA binding is strongly favored by enthalpy. This figure is counterbalanced by entropic contributions that are far more favorable for the nonspecific interactions (T∆S ~ −17 kcal mol⁻¹) than for specific binding (T∆S ~ +26 kcal mol⁻¹). Several factors contribute to this entropy difference, as further discussed in Supplementary Text S7 and Table S4: (i) the protein has access to a significantly higher number of conformations when bound to the nonspecific DNA sequence (Figure 5), (ii) the protein can slide along the dsDNA only when undocked (Figure 6B) and (iii) the dsDNA is bent and stiffened by Gal4 binding in the specific case (Figures S10 and S11).
Although additional terms not considered here, like the entropic effect due to the loss of bound water molecules when forming the complex [43] (a large number of water molecules was observed for the specific complex; see Supplementary Text S7 for more details), may also contribute to the total binding affinity, and despite the potential role of several simplifying assumptions, our simulations unequivocally show a very relevant entropy-enthalpy compensation mechanism for the specific binding in which a relatively small free-energy reduction results from the differences of large quantities.

Discussion and Conclusions
In this article, we report a quantitative analysis of the binding of the yeast gene regulator Gal4 to dsDNA. We measured the binding strength and kinetics of Gal4 to dsDNA oligomers with different sequences thanks to the real-time multiplexing capacity of the recently introduced RPI technology. The Gal4-DNA interaction is only one of several molecular interactions required for galactose-dependent gene control in yeast, which inevitably limited the scope of our modelling. Nevertheless, the mode of specific DNA recognition by Gal4, a paradigmatic transcriptional activator, is especially worthy of being fully understood, as it has the potential to affect the subsequent events that include the galactose-dependent unmasking of transcription activation domains followed by their interactions with subunits of various coactivator complexes, like SAGA and Mediator [44].
Overall, the combination of our experimental observations, comparative analysis through a simple model and molecular dynamics simulations suggests the following description of the Gal4-DNA recognition process. The protein is attracted towards DNA primarily by an electrostatic interaction, finely modulated by ionic strength. The first binding is not specific to the DNA sequence and involves a relevant number of HBs, although it leaves a large conformational freedom to the protein-DNA complex. Upon random sliding and rearrangements, the protein binds to the consensus sequence with an enthalpy gain almost compensated by a large entropy loss.
Our quantitative analysis provides new clues to understand the specificity and efficacy of the action of Gal4, which may be relevant for various other transcription factors. It has been repeatedly noticed that the specificity of transcription factors, especially in yeast and eukaryotic cells, is not sufficient to provide the necessary transcription selectivity, which can only be provided by cooperativity among the different transcription factors at the same DNA regulatory regions. Indeed, our results on Gal4 imply that, when the target sequence is diluted in the 10⁷ base-long yeast genome, the nonspecific binding should largely dominate, even assuming a large fraction of chromatinization. While this might appear to be the reasonable and expected condition enabling the search for the needle-in-the-haystack cognate site through sliding and hopping, one might also wonder what prevents spurious transcription signaling. We argue that the large entropic cost of specific binding, associated with a precisely shaped Gal4-DNA complex, might be a critical element in recruiting the cofactors necessary to initiate transcription, thereby minimizing ectopic transcription initiation events. Accordingly, out of the many conformations that are compatible with interacting with a generic dsDNA, only a tiny fraction matches those induced by specific docking. The overall outcome of these effects might well be described as DNA sequence-induced structural changes, like those reported for glucocorticoid receptor binding to its cognate site [45,46]. We speculate that such conformational constraints, combined with the docking lifetime, are key to the biological action of Gal4, as well as of other TFs.
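The statement that nonspecific binding should dominate in the genome can be made concrete with a back-of-the-envelope estimate. The sketch below assumes simple Boltzmann weighting between one cognate site and about 10⁷ equivalent nonspecific sites, a free energy gap of about 1 kcal/mol as found in this work, and a temperature of 30 °C; it neglects chromatin, cooperativity and any sequence dependence of the nonspecific affinity. The function name `specific_fraction` and its parameters are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def specific_fraction(gap_kcal, n_nonspecific, n_specific=1, temp_k=303.15):
    """Fraction of DNA-bound protein sitting on the cognate site(s).

    gap_kcal is the free energy advantage of specific over nonspecific
    binding (positive when specific binding is stronger). Equal access
    to all sites is assumed, which is a strong simplification.
    """
    # Statistical weight of the specific sites relative to one nonspecific site.
    w_specific = n_specific * math.exp(gap_kcal / (R_KCAL * temp_k))
    return w_specific / (w_specific + n_nonspecific)

# One cognate site against ~1e7 nonspecific sites with a 1 kcal/mol gap.
frac = specific_fraction(1.0, 1e7)
```

Under these assumptions, the Boltzmann factor of a 1 kcal/mol gap is only about 5, so the cognate site hosts less than one bound protein in a million. This is consistent with the argument that selectivity must rely on additional mechanisms.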

RPI Measurements
The RPI measurements were performed by using the experimental set-up and the analysis procedure described in reference [18]. Briefly, Gal4 was injected into the RPI cartridge to reach a final concentration c from 0.08 nM up to 50 nM. We avoided larger protein concentrations that can result in aggregation. All the experiments were performed at 30 °C under stirring. Raw images of the reflecting surface were converted into surface density signals, as detailed in Supplementary Text S3. The binding curves were analyzed as described in the text to extract the equilibrium and kinetic parameters.
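As a generic illustration of how equilibrium and kinetic parameters can both be read from a surface binding curve, the sketch below implements a standard 1:1 Langmuir model. This is an assumption made for illustration and is not necessarily the model used in this work; the function name `langmuir_signal` and the rate constants in the example are hypothetical values, not measured ones.

```python
import math

def langmuir_signal(t, conc, k_on, k_off, sigma_max=1.0):
    """Surface signal at time t (s) after injecting analyte at `conc` (M).

    1:1 Langmuir kinetics (assumed model):
      k_obs    = k_on * conc + k_off      observed relaxation rate
      K_d      = k_off / k_on             equilibrium dissociation constant
      sigma_eq = sigma_max * conc / (conc + K_d)
    """
    k_obs = k_on * conc + k_off
    k_d = k_off / k_on
    sigma_eq = sigma_max * conc / (conc + k_d)
    # Exponential approach to the equilibrium plateau.
    return sigma_eq * (1.0 - math.exp(-k_obs * t))

# Hypothetical rates giving K_d = 1 nM; protein injected at 1 nM.
k_on_demo, k_off_demo = 1e6, 1e-3   # M^-1 s^-1 and s^-1 (illustrative only)
plateau = langmuir_signal(1e5, 1e-9, k_on_demo, k_off_demo)
start = langmuir_signal(0.0, 1e-9, k_on_demo, k_off_demo)
```

In such a model, the plateau at each concentration constrains K_d, while the relaxation rate as a function of concentration separates k_on from k_off. This is why slower binding and stronger binding can be distinguished in the same experiment.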

All-Atom Molecular Dynamics Simulations and Analysis
To model Gal4, we started from the crystallographic structure of the DNA-bound protein (PDB 3COQ). For the DNA sequences, the free structure was energy-optimized using the internal/helicoidal variable modeling program JUMNA [48] with the AMBER par98 force field. For the nonspecific complex, we superimposed the specific complex and the nonspecific DNA sequence to determine the corresponding protein-DNA contacts to be preserved during the minimization of the nonspecific protein-DNA complex. For the protocols, solvent models and Debye-Hückel salt treatment, see Supplementary Text S7. Microsecond all-atom molecular dynamics (MD) simulations were performed using the GROMACS 5 package [49] on the protein and DNA molecules alone and on their specific and nonspecific complexes (two repeats). See Supplementary Text S7 for all details on the MD protocols and HB treatments. The conformational analysis of the dsDNA was performed using Curves+ [50], which provides a full set of helical, backbone and groove geometry parameters. The HB and solvating water molecules were identified based on distance and angle cut-offs (see Supplementary Text S7), upon which the thermodynamic quantities were estimated.