A Glimpse into the Structural Properties of the Intermediate and Transition State in the Folding of Bromodomain 2 Domain 2 by Φ Value Analysis

Bromodomains (BRDs) are small protein interaction modules of about 110 amino acids that selectively recognize acetylated lysine in histones and other proteins. These domains have been identified in a variety of multi-domain proteins involved in transcriptional regulation or chromatin remodeling in eukaryotic cells. BRD inhibition is considered an attractive therapeutic approach in epigenetic disorders, particularly in oncology. Here, we present a Φ value analysis of the folding pathway of the second domain of BRD2 (BRD2(2)). Using an extensive mutational analysis based on 25 site-directed mutants, we provide structural information on both the intermediate and the late transition state of BRD2(2). The data reveal that the C-terminal region is part of the initial folding nucleus, while the N-terminal region of the domain consolidates its structure only later in the folding process. Furthermore, only a small number of native-like interactions were identified, suggesting the presence of a non-compact, partially folded state with scarce native-like characteristics. Taken together, these results indicate that BRD2(2) folds by a hierarchical mechanism in which non-native interactions play a significant role.


Introduction
Correct folding of biological macromolecules is crucial for living cells, as cellular biochemical processes rely on finely tuned intermolecular recognition events that depend on structural complementarity between the interacting molecules. This biochemical principle, known as the structure-function relationship, is particularly evident for structurally complex macromolecules such as proteins. However, despite decades of experimental, theoretical, and computational efforts, the mechanism of protein folding remains one of the major open problems in molecular biology.
As for any chemical reaction, a clear description of the folding of a protein would require the identification and structural characterization of each of the molecular species transiently populated during the process [1]; in the case of protein folding, however, experimental difficulties arise from the intrinsic cooperativity of the process and from the large number of weak interactions that form, on a biologically relevant timescale, along the path from the denatured to the native state [2].
In this context, describing the folding mechanism of proteins that populate partially folded intermediates is particularly valuable, as it offers the opportunity to follow the evolution of structure formation. The BET (bromodomain and extra-terminal domain) bromodomains (BRDs) represent a useful experimental system for this purpose, not only because of their small size (about 100 amino acids) and structural simplicity, but also because their role in a variety of pathophysiological processes is becoming increasingly evident [3,4]. Recently, the second BRD of BRD2 (hereafter, BRD2(2)) has been highlighted as an essential node in the cellular response to SARS-CoV-2 infection [5,6].
BRDs are structural motifs that are known to recognize and bind to acetylated Lys residues in histones. The available structures of a variety of BRDs in the Protein Data Bank (PDB) show that this domain consists of a helical bundle composed of four conserved α-helices (αZ, αA, αB, and αC, from the N- to the C-terminus of the domain) connected by loop regions of variable length, notably the ZA loop connecting helices αZ and αA and the BC loop connecting helices αB and αC [7] (Figure 1). These domains act as modulators of eukaryotic gene expression [7,8] by recognizing and binding to the N-terminal tails of histone proteins containing one or more acetyl-lysine residues (AcK). The specific recognition of the post-translationally modified AcK involves a set of highly conserved residues in the hydrophobic core of the domain [9]; however, it has recently been proposed that the highly flexible and less conserved ZA loop may also contribute to the binding mechanism [10][11][12].
Figure 1. The structure of the bromodomains is a four-helix bundle formed by four conserved α-helices, αZ, αA, αB, and αC, connected by loop regions (ZA and BC loops) of variable length. The hydrophobic binding pocket is located at one end of the bundle (on top of the structure represented here) and is surrounded by residues located on the ZA and BC loops.
We recently described the kinetic folding mechanism of two BET bromodomains, the second BRD of BRD2 (BRD2(2)) and the first BRD of BRD4 (BRD4(1)), demonstrating by rapid-mixing and temperature-jump (un)folding experiments that both populate a transient, obligatory intermediate [13]. Here, we take a further step in elucidating the folding of these domains by providing a structural characterization of the intermediate (I) and the late transition state (TS2) of BRD2(2) by Φ value analysis [14], based on the kinetic analysis of 25 site-directed mutants.

Urea-Induced Equilibrium Unfolding
Twenty-five site-directed mutants were designed and produced to perform the Φ value analysis on BRD2(2). The mutants were designed following the accepted guidelines for Φ value analysis [14][15][16] and characterized by equilibrium denaturation experiments. The thermodynamic stability of the different BRD2(2) mutants was measured by urea-induced equilibrium unfolding experiments [13] at pH 7.5 and 20 °C, monitoring the change in ellipticity at 222 nm by CD spectroscopy (Supplementary Figure S1). For all mutants, the reversible urea-induced denaturation monitored by far-UV CD showed a sigmoidal dependence on denaturant concentration that could be fitted to a two-state model. The free energies of urea-induced unfolding, ∆G_H2O, of the mutants and of the wild type were obtained by globally fitting the whole dataset with a shared m-value of 1.93 kcal mol⁻¹ M⁻¹ [13] (Table 1). The mutants clustered into three classes according to their thermodynamic stability with respect to the wild type (∆∆G_H2O … (3)).
Table 1. … Data were globally fitted to a two-state model according to Equation (2), with the m-values shared between the datasets. Data are reported as the mean ± SE of the fit.
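Because every variant was fitted with the same m-value, the stability difference between the wild type and a mutant is fixed by their urea midpoints. The short Python sketch below illustrates this arithmetic; the function names are hypothetical, and the midpoints in the example are made-up numbers, not data from Table 1.

```python
def midpoint(dG_h2o: float, m: float) -> float:
    """Urea midpoint C_m (M): the concentration at which dG_unfolding = 0."""
    return dG_h2o / m


def ddG_from_midpoints(cm_wt: float, cm_mut: float, m_shared: float) -> float:
    """Destabilization ddG = dG_H2O(wild type) - dG_H2O(mutant), in kcal/mol.

    With a single shared m-value, dG_H2O = m * C_m for every variant,
    so the difference reduces to m * (C_m,wt - C_m,mut).
    """
    return m_shared * (cm_wt - cm_mut)


M_SHARED = 1.93  # kcal mol^-1 M^-1, the shared m-value reported in the text

# Hypothetical midpoints, for illustration only (not data from this study)
print(round(ddG_from_midpoints(3.0, 2.5, M_SHARED), 3))  # 0.965
```

A consequence of sharing m is that any error in the common m-value scales all ∆∆G values by the same factor, so their ranking is preserved.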

Folding Kinetics
As recalled above, we previously demonstrated that the two BET-BRDs, BRD2(2) and BRD4(1), follow a three-state mechanism, involving an on-pathway folding intermediate (I) in the sub-millisecond time regime [13]. In the same work, on the basis of a parameter derived from the kinetic data, i.e., the β-Tanford values [16,17], we estimated that, in the case of BRD2(2), the intermediate is characterized by limited compactness and poor native-like characteristics.
Therefore, in order to map in more detail the structural features of the transient species along the folding pathway of the BRD2(2) domain at a residue level and to identify the interactions stabilizing the intermediate and the folding transition state, we subjected all of the mutants to (un)folding kinetic experiments.
The complete (un)folding kinetics dataset (chevron plot) obtained for each mutant versus wild-type BRD2(2) is reported in Figure 2 (representative kinetic folding and unfolding time courses are shown in Supplementary Figure S2).
Figure 2. Chevron plots of BRD2(2) and its mutants. In the different panels, the chevron plots of BRD2(2) wild type (black) and its mutants (red) are shown. All of the experiments were carried out in 50 mM Tris-HCl, pH 7.5, 0.2 M NaCl, and 2 mM DTT. The data were globally fitted to a three-state folding mechanism, sharing the m-values.
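The three-state chevron fit can be illustrated with a model in which D and I interconvert in a rapid pre-equilibrium that precedes the rate-limiting I → N step. This is a common form of the on-pathway three-state scheme, but it is an assumption here, since the fitting equation itself is not reproduced in this section; all parameter names are hypothetical. The second function also shows how Tanford β values, such as the one used earlier to gauge the compactness of I, follow from the m-values.

```python
import math

R = 1.987e-3   # gas constant, kcal mol^-1 K^-1
T = 293.15     # 20 °C, the temperature of the kinetic experiments
RT = R * T     # ~0.58 kcal/mol


def k_obs_three_state(x, k_in0, m_in, k_ni0, m_ni, dG_di0, m_di):
    """Observed rate constant (s^-1) at urea concentration x (M).

    Assumed scheme: D <-> I in rapid pre-equilibrium, then I <-> N rate-limiting.
    k_in0, k_ni0: I->N and N->I rate constants at 0 M urea (s^-1)
    dG_di0: stability of I relative to D at 0 M urea (kcal/mol)
    m_in, m_ni, m_di: positive m-values (kcal mol^-1 M^-1)
    """
    k_in = k_in0 * math.exp(-m_in * x / RT)     # folding of I slows with urea
    k_ni = k_ni0 * math.exp(m_ni * x / RT)      # unfolding of N speeds up
    K_di = math.exp(-(dG_di0 - m_di * x) / RT)  # [D]/[I] in the pre-equilibrium
    # Only the fraction of non-native molecules sitting in I can fold to N
    return k_in / (1.0 + K_di) + k_ni


def global_parameters(k_in0, m_in, k_ni0, m_ni, dG_di0, m_di):
    """Overall stability, overall m-value and Tanford beta values of I and TS2."""
    dG_dn = dG_di0 + RT * math.log(k_in0 / k_ni0)  # dG_DN = dG_DI + dG_IN
    m_dn = m_di + m_in + m_ni
    beta_i = m_di / m_dn              # fraction of m_DN already buried in I
    beta_ts2 = (m_di + m_in) / m_dn   # fraction buried at TS2
    return dG_dn, m_dn, beta_i, beta_ts2
```

In a global fit, the m-values would be shared across the wild-type and mutant datasets while log10 k_obs is fitted against urea concentration with a nonlinear least-squares routine; that fitting step is omitted here.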
Following the same approach that we used to characterize the folding mechanism of BRD2(2) [13], the complete dataset was globally fitted to a three-state folding mechanism with shared m-values (Equation (4) in Materials and Methods). The calculated folding and unfolding parameters, together with the Φ values associated with the intermediate (I) and the late transition state (TS2), are listed in Table 2. The robustness of the analysis is supported by the good agreement between the ∆G_D-N values obtained from equilibrium denaturation (Table 1) and from (un)folding kinetics (Table 2).
Table 2. Kinetic parameters of BRD2(2) and its mutants (see Materials and Methods) with Φ value calculation.
In accordance with the standard methodology [16], the calculated Φ values were divided into three groups (low: Φ < 0.3; intermediate: 0.3 < Φ < 0.7; high: Φ > 0.7) and mapped onto the native structure of BRD2(2). As shown in Figure 3a, the structural distribution of the Φ values indicated that the intermediate was characterized by a few native-like contacts, identified by high Φ(I) values. Such a small number of native-like interactions is consistent with a non-compact, partially folded state with scarce native-like characteristics, as hypothesized earlier on the basis of the low β-Tanford value [13]. The high Φ(I) values are located primarily in helix αB (A411, A415, A416), a region that represents the initial folding nucleus. Because the number of native-like contacts increased only marginally later in the process, the sequence of BRD2(2) does not appear to be optimized for efficient folding. Indeed, only two additional high Φ values were measured for the late transition state TS2 (A416 in helix αB and V399 in helix αA) (Figure 3b). Interestingly, such a scenario, implying a rugged folding landscape, has been proposed earlier for another, unrelated, small four-helix bundle protein [18].
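Assuming the same three-state scheme (a D ⇌ I pre-equilibrium followed by a rate-limiting I → N step), the Φ values for I and TS2 can be computed from wild-type and mutant parameters as sketched below. The dictionary keys are hypothetical, the 0.3 and 0.7 thresholds come from the text, and assigning the exact boundary values 0.3 and 0.7 to the intermediate class is our own choice, since the text leaves them unassigned.

```python
import math

RT = 1.987e-3 * 293.15  # kcal/mol at 20 °C


def phi_values(wt, mut):
    """Return (Phi_I, Phi_TS2) from three-state parameters at 0 M urea.

    wt, mut: dicts with keys
      'dG_dn': overall stability D-N (kcal/mol)
      'dG_di': stability of I relative to D (kcal/mol)
      'k_in' : I -> N rate constant (s^-1)
    All ddG terms are wild type minus mutant (destabilization > 0).
    """
    ddG_dn = wt["dG_dn"] - mut["dG_dn"]
    ddG_di = wt["dG_di"] - mut["dG_di"]
    # TS2 sits on top of the I -> N barrier, so its ddG adds the change
    # in the I -> N activation free energy to the change in stability of I.
    ddG_dts2 = ddG_di + RT * math.log(wt["k_in"] / mut["k_in"])
    return ddG_di / ddG_dn, ddG_dts2 / ddG_dn


def phi_class(phi):
    """Bin a Phi value using the thresholds quoted in the text."""
    if phi < 0.3:
        return "low"
    if phi <= 0.7:
        return "intermediate"
    return "high"
```

A mutation that leaves k_in unchanged gives Φ(TS2) = Φ(I); a mutation that slows the I → N step raises Φ(TS2) above Φ(I).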
In contrast, the N-terminal region of the domain (helix αZ and the ZA loop) appeared to be characterized by low Φ(I) values, suggesting that this region consolidates its structure only later in the folding process. Inspection of Table 2 shows that some Φ values, all involving residues in or interacting with the ZA loop, were unusually high (Φ > 1). For two of them (A380G and L381A), the ∆∆G was very low (<0.6 kcal/mol), precluding a reliable interpretation. The high Φ values of A439G, however, which is located in the C-terminal helix αC and contacts F372 in the ZA loop, were observed for both the intermediate and TS2, suggesting that this residue is involved in non-native interactions in both of these metastable states. Non-native interactions, as probed by unusual Φ values, have been found in other proteins, where they are often located in regions stabilizing folding intermediates [19] or in regions crucial for protein function, such as surfaces involved in recognition and binding [18,20]. In this context, it is interesting to note that the conformational plasticity of the ZA loop of BRDs, evidenced by molecular dynamics simulations, has recently been proposed to provide the malleable interaction surface that allows BRD domains to interact with their different target peptides [21].
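The screening logic applied to the anomalous Φ values can be written as a small helper. This is a hypothetical sketch: the 0.6 kcal/mol cutoff mirrors the value quoted above, and labeling Φ > 1 as a candidate non-native interaction follows the interpretation given in the text rather than a formal statistical test.

```python
def phi_flag(phi, ddG_dn, min_ddG=0.6):
    """Hypothetical screening helper for a Phi value (energies in kcal/mol).

    min_ddG mirrors the 0.6 kcal/mol level below which the text
    considers the perturbation too small for a reliable Phi value.
    """
    if ddG_dn < min_ddG:
        return "unreliable"   # destabilization too small to interpret
    if phi > 1.0:
        return "anomalous"    # candidate non-native interaction
    return "classical"        # interpretable within the 0-1 range
```

The order of the checks matters: a Φ > 1 computed from a very small ∆∆G is reported as unreliable rather than anomalous, which matches how A380G and L381A are treated in the text.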
To obtain an overall description of the structural and energetic properties of the intermediate and of the late transition state TS2, we analyzed the effects of the structural perturbations induced by mutagenesis by plotting the ∆∆G of the intermediate (∆∆G_D-I) and of TS2 (∆∆G_D-TS2) versus that of the native state (∆∆G_D-N) (Figure 4). This kind of analysis, known as Brønsted plot analysis [16], is commonly used to obtain information on the folding landscape explored by proteins [14]. Whereas a linear dependence is indicative of a pure nucleation mechanism, a more scattered Brønsted plot suggests the development of different nuclei, as predicted by a diffusion-collision folding mechanism [22]. Although the Brønsted plots for both I and TS2 displayed an overall linear dependence, the data for I were more dispersed (R = 0.47 vs. 0.77 for TS2) and the slope was lower (0.25 vs. 0.6), strengthening the hypothesis that the formation of the intermediate proceeds along a non-cooperative, rugged energy landscape, whereas a more cooperative process leads to the formation of the native-like transition state TS2. Inspection of Figure 4a shows that almost all of the positions with ∆∆G_D-I values above the overall trend (residues 399, 411, 418, and 439) are clustered in the hydrophobic core of the domain, whereas the residues with lower ∆∆G_D-I values (residues 350, 357, 367, and 378) are mainly located in helix αZ and the ZA loop. These results indicate that the stability of the intermediate does not rely on residues in the N-terminal part of the domain; instead, the intermediate is mainly stabilized by a diffuse nucleus involving a limited number of residues located in helices αA, αB, and αC in the C-terminal half of BRD2(2).
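The slope and correlation coefficient reported for a Brønsted plot can be obtained by ordinary least-squares regression of ∆∆G_D-I (or ∆∆G_D-TS2) on ∆∆G_D-N. The sketch below assumes a fit with a free intercept; a fit forced through the origin is another common choice, and the text does not state which was used. The function name is hypothetical.

```python
import math


def bronsted(ddG_dn, ddG_state):
    """Least-squares slope and Pearson R of ddG_state versus ddG_dn.

    ddG_dn:    list of ddG_D-N values (kcal/mol), one per mutant
    ddG_state: matching ddG_D-I or ddG_D-TS2 values (kcal/mol)
    """
    n = len(ddG_dn)
    mx = sum(ddG_dn) / n
    my = sum(ddG_state) / n
    sxx = sum((x - mx) ** 2 for x in ddG_dn)
    syy = sum((y - my) ** 2 for y in ddG_state)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ddG_dn, ddG_state))
    slope = sxy / sxx               # average fraction of ddG_D-N felt by the state
    r = sxy / math.sqrt(sxx * syy)  # scatter around the linear trend
    return slope, r
```

Read this way, a slope of 0.25 with R = 0.47 (for I) describes a state that, on average, feels a quarter of the native-state destabilization with substantial scatter between positions.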
In contrast, the Brønsted plot for TS2 (Figure 4b) shows a better correlation, indicating that, as observed in other protein domains [23] and theoretically predicted [24], the late transition state TS2 is more native-like and represents a distorted version of the native state. Overall, these findings agree with the distribution of the Φ values in the structure of BRD2(2) discussed above (Figure 3) and indicate that the intermediate is mainly stabilized by a small hydrophobic nucleus in the C-terminal half of the domain, involving residues in helices αA, αB, and αC.

Conclusions
Although BRDs are protein domains that play crucial roles in many cellular processes, fundamental aspects such as their folding mechanism remain largely unexplored. The complete characterization of the folding of BRD2(2) by Φ value analysis provided in this work allowed us to obtain, for the first time in this protein class, structural information on both the intermediate I and the transition state TS2. Moreover, by analyzing the contributions of native and non-native interactions at early and late stages of folding, we could depict a rugged folding landscape and hypothesize that the evolutionary pressure to maintain the function of BRD2(2) may have decreased its folding efficiency. Future work will test this hypothesis by relating the folding efficiency of BRD2(2) to its binding properties.

Site-Directed Mutagenesis
The constructs encoding the site-directed mutants of BRD2(2) were obtained using the gene encoding BRD2(2) wild type as a template to perform site-directed mutagenesis with the QuickChange Lightning Site-Directed Mutagenesis kit (Agilent Technologies, Santa Clara, CA) according to the manufacturer's instructions. All mutations were confirmed by DNA sequencing analysis.

Protein Expression and Purification
The BRD2(2) wild type and all of the site-directed mutants were expressed in E. coli Rosetta cells. Bacterial cells were grown at 37 °C in LB medium containing 30 µg/mL kanamycin until the OD600 reached 0.6, and protein expression was then induced with 0.5 mM IPTG. After induction, cells were grown overnight at 18 °C and then collected by centrifugation.
To purify the protein, the bacterial pellet was resuspended and treated as described previously [25]. Protein purity was analyzed by SDS-PAGE, and the structural integrity of the purified proteins was checked by circular dichroism (CD) spectra in the far- and near-UV regions. Protein concentration was determined spectrophotometrically using a molar absorption coefficient at 280 nm (ε280) of 15,930 M⁻¹ cm⁻¹ for the wild type and all mutants, based on a molecular mass of 13,351.3 Da, calculated according to Gill and von Hippel [26].
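The concentration determination is a direct application of the Beer-Lambert law with the ε280 and molecular mass given above. The sketch below (hypothetical helper names, 1 cm path length by default) converts an absorbance reading to molar concentration and converts the mass concentration used in the equilibrium experiments to micromolar.

```python
EPS_280 = 15930.0   # M^-1 cm^-1, molar absorption coefficient from the text
MW = 13351.3        # g/mol (Da), molecular mass from the text


def molar_conc(a280: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c (M) = A280 / (epsilon * path length)."""
    return a280 / (EPS_280 * path_cm)


def ug_per_ml_to_uM(c_ug_ml: float) -> float:
    """Convert a mass concentration (µg/mL = mg/L) to micromolar."""
    return c_ug_ml / MW * 1e3  # mg/L divided by g/mol gives mM; x1000 for µM
```

With these values, the 80 µg/mL used in the equilibrium experiments corresponds to roughly 6 µM protein, and the 3 µM used in the stopped-flow experiments would give an A280 of only about 0.048 in a 1 cm cell.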

Equilibrium Experiments
Equilibrium unfolding experiments were carried out at 20 °C in 20 mM Tris-HCl, pH 7.5, 0.2 M NaCl, and 200 µM DTT. CD measurements were carried out with a JASCO J-720 spectropolarimeter using a 0.2 cm cuvette. BRD2(2) and all of the site-directed mutants, at a constant concentration of 80 µg/mL, were incubated at 20 °C at increasing urea concentrations (0–9.0 M). When equilibrium was reached, far-UV CD spectra were recorded. The reversibility of the unfolding of the BRD2(2) wild type and mutants was checked as described previously [13]. All equilibrium unfolding experiments were performed in triplicate. Urea-induced equilibrium unfolding transitions monitored by far-UV CD ellipticity changes were analyzed by fitting the baseline and transition region data to a two-state linear extrapolation model [27] according to:
$$\Delta G_{\mathrm{unfolding}} = \Delta G_{\mathrm{H_2O}} - m[X] = -RT \ln K_{\mathrm{unfolding}} \qquad (1)$$
where ∆G_unfolding is the free energy change for unfolding at a given denaturant concentration, ∆G_H2O is the free energy change for unfolding in the absence of denaturant, m is a slope term that quantifies the change in ∆G_unfolding per unit concentration of denaturant, R is the gas constant, T is the temperature, and K_unfolding is the equilibrium constant for unfolding. The model expresses the signal as a function of denaturant concentration:
$$y_i = \frac{(y_N + s_N[X]_i) + (y_U + s_U[X]_i)\,\exp\!\left[-\left(\Delta G_{\mathrm{H_2O}} - m[X]_i\right)/RT\right]}{1 + \exp\!\left[-\left(\Delta G_{\mathrm{H_2O}} - m[X]_i\right)/RT\right]} \qquad (2)$$
where y_i is the observed signal, y_U and y_N are the baseline intercepts for the unfolded and native protein, s_U and s_N are the baseline slopes for the unfolded and native protein, [X]_i is the denaturant concentration after the ith addition, ∆G_H2O is the extrapolated free energy of unfolding in the absence of denaturant, and m is the slope of a plot of ∆G_unfolding versus [X]. Data were globally fitted with the m-values shared between the datasets; all other parameters were unconstrained.
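The two-state linear extrapolation model described above can be evaluated directly. The Python sketch below computes the equilibrium constant and the predicted signal at a given urea concentration; function and parameter names are hypothetical, R is taken in kcal mol⁻¹ K⁻¹, and the global nonlinear fit itself (performed by the authors in GraphPad Prism) is not reproduced.

```python
import math

R = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def k_unfolding(x, dG_h2o, m, T=293.15):
    """Equilibrium constant for unfolding at denaturant concentration x (M).

    Linear extrapolation: dG_unfolding = dG_h2o - m*x = -RT ln K_unfolding
    """
    return math.exp(-(dG_h2o - m * x) / (R * T))


def two_state_signal(x, dG_h2o, m, y_n, s_n, y_u, s_u, T=293.15):
    """Predicted signal: population-weighted average of the two baselines."""
    K = k_unfolding(x, dG_h2o, m, T)
    native = y_n + s_n * x      # sloping native baseline
    unfolded = y_u + s_u * x    # sloping unfolded baseline
    return (native + unfolded * K) / (1.0 + K)
```

At the midpoint x = ∆G_H2O/m, K_unfolding = 1 and the signal lies halfway between the two baselines. For a global fit with a shared m, the residuals for all variants would be minimized simultaneously, with m common to all datasets and ∆G_H2O and the baselines free for each.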
All unfolding transition data were fitted using GraphPad Prism 5. Data were normalized between 0 and 100%, where 0 corresponds to the molar ellipticity at 222 nm of the native protein, the smallest value (at 0 M urea), and 100 corresponds to the molar ellipticity at 222 nm of the unfolded protein, the largest value (at 9 M urea).
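The normalization described above is a linear rescaling between the native (0 M urea) and unfolded (9 M urea) ellipticities. A minimal sketch with a hypothetical function name follows; because the molar ellipticity of a helical protein at 222 nm is negative, the native value is the smaller (more negative) one.

```python
def normalize_percent(signal, native_value, unfolded_value):
    """Rescale CD signals to 0 % (native, 0 M urea) ... 100 % (unfolded, 9 M urea)."""
    span = unfolded_value - native_value
    return [100.0 * (s - native_value) / span for s in signal]


# Hypothetical ellipticities (arbitrary units), for illustration only
print(normalize_percent([-20.0, -10.0, 0.0], -20.0, 0.0))  # [0.0, 50.0, 100.0]
```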

Kinetic Experiments
Unfolding and refolding kinetics experiments were performed using an SX-18 stopped-flow apparatus (Applied Photophysics, Leatherhead, UK). The protein samples were excited at 280 nm, and fluorescence emission was measured using a 320 nm cutoff glass filter. The final protein concentration was typically 3 µM. At least five individual traces were acquired and averaged for each experiment, and all averaged traces were satisfactorily fitted with a single-exponential equation. Experiments were conducted in 50 mM Tris-HCl, pH 7.5, 0.2 M NaCl, and 2 mM DTT, with urea concentrations ranging from 0.7 M to 8.1 M, at 20 °C.
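Averaging the repeated traces and extracting an observed rate constant can be sketched as follows. Note the swapped-in technique: instead of the nonlinear single-exponential fit used by the authors, this sketch linearizes the decay by regressing ln|y − y∞| on time, which assumes that the final signal y∞ is known (for example, from the plateau). All names are hypothetical.

```python
import math


def average_traces(traces):
    """Point-by-point average of equally sampled traces."""
    n = len(traces)
    return [sum(points) / n for points in zip(*traces)]


def k_obs_loglinear(t, y, y_inf):
    """Estimate k_obs for y = y_inf + A*exp(-k_obs*t).

    Linear regression of ln|y - y_inf| on t; the slope is -k_obs.
    Points that sit exactly on y_inf are skipped to avoid log(0).
    """
    pts = [(ti, math.log(abs(yi - y_inf))) for ti, yi in zip(t, y) if yi != y_inf]
    n = len(pts)
    mt = sum(p[0] for p in pts) / n
    ml = sum(p[1] for p in pts) / n
    sxy = sum((p[0] - mt) * (p[1] - ml) for p in pts)
    sxx = sum((p[0] - mt) ** 2 for p in pts)
    return -sxy / sxx
```

The linearization weights noisy late points heavily, which is one reason a direct nonlinear fit is usually preferred for real data; the sketch only shows what k_obs represents.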
The semilogarithmic plot (chevron plot) of each mutant was fitted on the basis of a three-state folding scheme with an on-pathway intermediate as previously reported [13,28] by using the following equation: