Not Cleaving the His-tag of Thal Results in More Tightly Packed and Better-Diffracting Crystals

Flavin-dependent halogenases chlorinate or brominate their substrates in an environmentally friendly manner, requiring only the cofactor reduced flavin adenine dinucleotide (FADH₂), oxygen, and halide salts. The tryptophan 6-halogenase Thal exhibits two flexible loops, which become ordered (substrate-binding loop) or adopt a closed conformation (FAD loop) upon substrate or cofactor binding. Here, we describe the structure of NHis-Thal-RebH5, which carries an N-terminal His-tag from pET-28a. It crystallized in a different space group (P2₁) and, surprisingly, diffracted to a higher resolution (1.63 Å) than previously deposited Thal structures with cleaved His-tag (P6₄; ~2.2 Å). Interestingly, the binding of glycine in the active site can induce an ordered conformation of the substrate-binding loop.


Introduction
Many natural products, like chloramphenicol [1], vancomycin [2,3], and others [4][5][6], carry a halogen substituent that is important for their biological activity. Conventional chemical synthesis of these products is unfavorable: it requires toxic elemental halogens and Lewis acid catalysts and lacks substrate specificity as well as regioselectivity. Therefore, flavin-dependent halogenases (FDHs) have come into focus as potential biocatalysts for the halogenation of aromatic compounds in organic synthesis, because they halogenate regioselectively, substrate-specifically, and in an environmentally friendly manner [7,8], requiring only molecular oxygen, reduced flavin adenine dinucleotide (FADH₂), and halide salts (mainly Cl⁻ or Br⁻) [7]. Tryptophan halogenases are one group of FDHs; depending on the orientation of the substrate tryptophan within the active site, they halogenate the indole moiety at position C5 (e.g., PyrH [9]), C6 (e.g., Thal [10]), or C7 (e.g., RebH [11,12], PrnA [13]) [9,10]. Structurally, the active site is located at the interface of two subdomains, a conserved box-shaped subdomain and a pyramidal subdomain. The cofactor flavin adenine dinucleotide (FAD) is bound by the box-shaped subdomain, and the halide ion is bound near the isoalloxazine ring of the cofactor [13].
Crystal structures of various tryptophan halogenases revealed two loop regions that undergo conformational changes upon ligand binding [14]. The substrate-binding loop is disordered when the substrate-binding site (hereafter referred to as the active site) is empty and adopts a defined structure when tryptophan is bound. The FAD loop is disordered (e.g., in PyrH [9] and BrvH [15]) or adopts an open conformation (e.g., in Thal [10]) in the apo enzyme, but closes in the presence of FAD (e.g., in PrnA [13], PyrH [9], SttH [16], and Tar14 [17]). Crystal packing can prevent some of these movements; e.g., in RebH, the FAD loops of two symmetry mates form a crystal contact that locks the loop in the open conformation even when FAD is bound [11,12]. In apo Xcc4156, the open FAD loop participates in a crystal contact, so that attempts to soak FAD into these crystals destroy them [18]. Therefore, it may be of interest to obtain more than one crystal form.

Plasmids and Molecular Cloning
The plasmid vector pET-28a-thal-rebH5 encoding Thal-RebH5, a quintuple mutant derived from wild-type Thal (UniProt ID: A1E280) from Streptomyces albogriseolus, was kindly provided by Dr. Hannah Minges, Organic Chemistry III, Bielefeld University. The molecular cloning and mutagenesis procedures have been described previously [10].

Expression and Purification of Recombinant Thal-RebH5
The expression and purification of NHis-Thal-RebH5 were performed according to the protocol described for wt-Thal [10], except that Co-NTA resin (PureCube Co-NTA agarose, Cube Biotech, Monheim am Rhein, Germany) was used instead of Ni-NTA and the His-tag was not cleaved.

Crystallization and Data Collection
Purified NHis-Thal-RebH5 (protein buffer: 10 mM Tris pH 7.4, 50 mM NaCl, 1 mM TCEP) was crystallized by sitting-drop vapor diffusion at 20 °C, mixing protein solution (~15 mg·mL⁻¹) and reservoir solution in a 1:1 ratio. Two reservoir solutions were used: (1) 0.1 M bicine pH 9.0, 20% (w/v) PEG 4000, 10% (v/v) glycerol, and 0.02 M amino acid mix (L-Glu, L-Ala, D-Ala, Gly, L-Lys, D-Lys, L-Ser, and D-Ser); and (2) 0.1 M bicine pH 9.0 and 10% (w/v) PEG 4000 supplemented with 5 mM tryptophan. Within four days, crystals of different shapes appeared, ranging from almost hexagonal prisms (see Figure 1a) to plates (Figure 1b). The crystal habit of NHis-Thal-RebH5 was inferior to that of untagged Thal (Figure 1d): crystals of NHis-Thal-RebH5 were generally smaller, often plate-shaped instead of well extended in all three dimensions, and often intergrown (Figure 1c). For cryoprotection, the crystals were transferred to reservoir solution supplemented with glycerol (20% (v/v) for (1) and 35% (v/v) for (2)) before being flash-cooled and stored in liquid nitrogen. Crystals of His-tagged Thal were mechanically less robust than those of untagged Thal and broke easily during mounting. Data sets of NHis-Thal-RebH5 were collected at a wavelength of 0.98 Å and a temperature of 100 K at beamline P13, operated by EMBL Hamburg at the PETRA III storage ring (DESY, Hamburg, Germany) [22].

Data Processing, Structure Determination and Refinement
The data sets were processed with XDS [23] and scaled with XSCALE [23]. Crystals (1) and (2) of NHis-Thal-RebH5 diffracted to 1.63 Å and 1.84 Å, respectively, with the resolution cut off at I/σ(I) ≈ 2. The crystals (see Figure 1) belonged to space group P2₁, as determined with Aimless [24] from the CCP4 package [25]. The structures of NHis-Thal-RebH5 were solved by molecular replacement with Phaser [26], using chain A of apo-Thal (PDB ID 6H43) as the search model. Phaser placed two molecules per asymmetric unit without clashes.
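The resolution cut-off described above (truncating the data where the mean I/σ(I) falls to about 2) can be sketched in a few lines of Python. This is an illustrative sketch only, not what XDS or XSCALE do internally: the function name `resolution_cutoff` is hypothetical, and the per-shell statistics in the example are invented placeholder values rather than statistics from these data sets.

```python
from typing import Optional, Sequence, Tuple


def resolution_cutoff(
    shells: Sequence[Tuple[float, float]], min_i_over_sigma: float = 2.0
) -> float:
    """Return the high-resolution limit (in Å) of the outermost shell whose
    mean I/sigma(I) is still at least ``min_i_over_sigma``.

    ``shells`` holds (d_min, mean_i_over_sigma) pairs, one per resolution shell.
    """
    # Walk from low to high resolution (large d_min first).
    ordered = sorted(shells, key=lambda s: s[0], reverse=True)
    cutoff: Optional[float] = None
    for d_min, i_over_sigma in ordered:
        if i_over_sigma < min_i_over_sigma:
            break  # mean signal has dropped below the threshold; stop here
        cutoff = d_min
    if cutoff is None:
        raise ValueError("no shell reaches the requested I/sigma(I)")
    return cutoff


# Hypothetical shell statistics: (d_min in Å, mean I/sigma(I)).
example_shells = [(4.0, 30.2), (2.8, 15.7), (2.2, 6.1), (1.9, 2.6), (1.8, 2.1), (1.7, 1.4)]
print(resolution_cutoff(example_shells))  # 1.8
```

With these invented shells, 1.8 Å is the last shell at or above I/σ(I) = 2, so the data would be truncated there.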
For the NHis-Thal-RebH5 structures, an extensive water search was performed using AutoBuild [27] from the PHENIX suite [28]. The structures were further improved by model building in COOT [29] and by restrained refinement, in Refmac5 [30] with NCS restraints for (1), and in PHENIX with TLS refinement (one TLS group per chain) and NCS restraints for (2). The structures were optimized and validated with COOT [29] and the PDB validation server [31] until the R values converged to R_free = 20.2% and R_work = 16.7% for (1) and R_free = 19.9% and R_work = 16.6% for (2). Data processing and refinement statistics are summarized in Table 1. Figures of the final structures were generated with PyMOL (Schrödinger, New York, NY, USA). RMSD values were derived from pairwise structure-based alignments with the DALI server [32] (all-against-all mode).
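R_work and R_free follow the standard crystallographic residual R = Σ||F_obs| − |F_calc|| / Σ|F_obs|, evaluated over the working reflections and over the test reflections excluded from refinement, respectively. The Python sketch below is a simplified illustration under stated assumptions: |F_calc| is taken to be already scaled to |F_obs| (refinement programs such as Refmac5 or PHENIX handle scaling and bulk solvent themselves), and the function name `r_factors` and the amplitudes are hypothetical.

```python
from typing import Sequence, Tuple


def r_factors(
    f_obs: Sequence[float], f_calc: Sequence[float], is_free: Sequence[bool]
) -> Tuple[float, float]:
    """Return (R_work, R_free) from observed and pre-scaled calculated amplitudes.

    R = sum(| |Fo| - |Fc| |) / sum(|Fo|), computed separately for the working
    set (is_free False) and the free/test set (is_free True).
    """
    if not (len(f_obs) == len(f_calc) == len(is_free)):
        raise ValueError("input sequences must have the same length")
    # Running [numerator, denominator] sums for the work and free sets.
    work = [0.0, 0.0]
    free = [0.0, 0.0]
    for fo, fc, flag in zip(f_obs, f_calc, is_free):
        target = free if flag else work
        target[0] += abs(abs(fo) - abs(fc))
        target[1] += abs(fo)
    if work[1] == 0 or free[1] == 0:
        raise ValueError("both the working and the free set need reflections")
    return work[0] / work[1], free[0] / free[1]


# Hypothetical amplitudes: three working reflections and one test reflection.
r_work, r_free = r_factors([100.0, 80.0, 60.0, 40.0], [90.0, 84.0, 57.0, 44.0],
                           [False, False, False, True])
print(f"R_work = {r_work:.1%}, R_free = {r_free:.1%}")  # R_work = 7.1%, R_free = 10.0%
```

Because the free reflections are never used to fit the model, R_free serves as an unbiased check against overfitting, which is why it is reported alongside R_work.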

Accession Numbers
The atomic coordinates and structure factors of NHis-Thal-RebH5 (1) and (2) have been deposited in the PDB as entries 7AQU and 7AQV, respectively.

Crystallization Details and Comparison to Thal-RebH5
NHis-Thal-RebH5 crystallizes under different conditions than untagged Thal-RebH5 [10], with a different and variable crystal morphology ranging from plates, which were often intergrown, to almost hexagonal prisms (Figure 1). Even so, the NHis-Thal-RebH5 crystals diffracted to 1.63-2.17 Å, better than untagged Thal-RebH5 (>2.28 Å in 98% of cases) and untagged wt-Thal (>2.2 Å). The space group P2₁ differs from that of untagged Thal, which crystallized in P6₄. However, both crystal forms contain two monomers per asymmetric unit.

Comparison of the Previously Described P6₄ and the New P2₁ Crystal Forms
NHis-Thal-RebH5 crystallizes as a homodimer but, like Thal-RebH5 (PDB ID 6IB5), is a monomer in solution (size exclusion chromatography). Both protomers of the new structures are very similar to each other and to those of the structures previously obtained in space group P6₄ (see Table S1 in the supplementary material). All five mutations distinguishing Thal-RebH5 from wild-type Thal (Thal → Thal-RebH5: V52I, V82I, S360T, G469S, T470N) are well defined by 2Fo-Fc density. Structures (1) and (2) of NHis-Thal-RebH5 are isomorphous, and the corresponding crystals were grown under similar conditions (same buffer, pH, and PEG) in an optimization grid screen. Condition (1) additionally contained glycerol and an amino acid mix derived from the original crystallization condition of the Morpheus screen [33], and condition (2) additionally contained 5 mM L-Trp in order to obtain a substrate-bound structure. No density for Trp was present in the active sites of (2). This is consistent with our observations for crystals of untagged Thal-RebH5, where only 6 of 47 data sets of Thal-RebH5 co-crystallized or soaked with L-Trp showed at least some density for bound Trp [10]. Even after soaking the P6₄ crystals with saturated L-Trp solution, the resulting difference density could not be modeled reliably [10], indicating that Thal-RebH5 binds Trp less tightly than wt-Thal.
Crystals 2020, 10, 1135
In (1), the active site of chain B is empty, whereas that of chain A contains a glycine that superposes with the backbone of the native substrate tryptophan (Figure 2). Accordingly, the electron density for the substrate-binding loop is better defined in chain A than in chain B, where amino acids (AA) 448-451 and 456 were not modeled (Figure 3), although it is not as well defined as when the substrate tryptophan is bound. This supports previous observations that predominantly the active site of chain A binds substrate, whereas chain B remains empty or binds substrate with lower occupancy [10,14]. In structure (2), with two empty active sites, neither substrate-binding loop is defined, and these residues were not modeled (chain A: AA 450-456; chain B: AA 452-456) (Figure 3). Tyr366 is of interest as it may couple the conformations of the substrate-binding loop and the FAD loop [14]. The side chain of Tyr366 adopts two very different conformations depending on whether tryptophan occupies the active site. In both chains of (1) and (2), Tyr366 adopts the conformer typical of an empty active site. Owing to the higher resolution, a previously predicted hydrogen bond between Tyr366 and Asn54 could be modeled without clashes, at a distance of about 2.4 Å in both chains of both structures.
Phe112 undergoes a peptide flip upon binding of the substrate tryptophan. In the new P2₁ crystal form, both chains of structures (1) and (2) show Phe112 in the conformation typical of the unbound state, consistent with the absence of tryptophan from the active sites.
The FAD loop (AA 40-49), which is visible in all previously determined and published structures of Thal, is partly undefined in chain A of (1) (AA 41-44) and (2) (AA 42-44), and these residues were not modeled (Figure 3). When the electron density map is contoured below 0.6 rmsd, density for these residues appears, suggesting that the FAD loop is not completely disordered. In chain B of (1) and (2), the FAD loop is defined and adopts an open conformation that differs slightly from that in Thal-RebH5 (PDB ID 6IB5) (see Figure S1 in the supplementary material).
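Electron density maps are usually contoured in rmsd (σ) units, i.e., relative to the root-mean-square deviation of the map values, so lowering the contour level from the common 1.0 rmsd to about 0.6 rmsd reveals weaker features such as the density of a partly ordered loop. The Python sketch below illustrates this on a hypothetical one-dimensional list of map values; it assumes the absolute threshold is mean + level × rmsd, and the function names are illustrative, not part of COOT or any other program.

```python
import math
from typing import Sequence


def contour_threshold(map_values: Sequence[float], level_rmsd: float) -> float:
    """Absolute density value for a contour level given in rmsd units.

    Assumes threshold = mean + level_rmsd * rmsd of the map values.
    """
    n = len(map_values)
    mean = sum(map_values) / n
    rmsd = math.sqrt(sum((v - mean) ** 2 for v in map_values) / n)
    return mean + level_rmsd * rmsd


def points_above(map_values: Sequence[float], level_rmsd: float) -> int:
    """Count grid points whose density is at or above the contour threshold."""
    threshold = contour_threshold(map_values, level_rmsd)
    return sum(1 for v in map_values if v >= threshold)


# Hypothetical map values with mean 0 and rmsd sqrt(2) (about 1.41).
values = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(points_above(values, 1.0))  # 1 (only 2.0 lies above 1.41)
print(points_above(values, 0.6))  # 2 (1.0 and 2.0 lie above 0.85)
```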

In structure (1), the N-terminal His-tag and the linker are not defined by electron density. In structure (2), however, the linker could be partly modeled for chain B, up to Val-5. The modeled linker extends out of the dimer (Figure 3) between two symmetry-related molecules. Superposition with Thal-RebH5 (6IB5) revealed almost the same amount of space for a possible linker in the P6₄ crystals of untagged Thal-RebH5. The environment of the N-terminus differs between the two protomers in the new crystal form. Residue D2 is the first amino acid defined in both chains of (1). In chain A, D2 is closest to K267 of a symmetry-related chain A (see Figure S3c in the supplementary material). In chain B, D2 is closest to E424 of a symmetry-related chain B (see Figure S3d in the supplementary material). In both chains of (1) and in chain A of (2), D2 forms an intramolecular salt bridge with R4. In chain B of (2), the D2 side chain adopts a different conformation and interacts with the partly modeled linker, thereby stabilizing it. The defined linker in chain B of (2) forms an extended crystal contact with a symmetry-related chain B, e.g., via hydrogen bonds of G-2 with the side chains of R420 and E424 of that chain (see Figure S3d in the supplementary material).

Discussion
Comparison of Thal-RebH5 structures with His-tag ((1) and (2)) and without His-tag (6IB5) revealed high similarity of the structures themselves and of their functional regions, such as the FAD loop, the substrate-binding loop, the active site, the possible peptide flip of Phe112, and the switch residue Tyr366. In chain B of structure (2), the extended N-terminus with the partly modeled His-tag linker forms an additional crystal contact (Figure 3). The crystals looked less promising and were mechanically less robust than previously obtained crystals of Thal constructs with cleaved His-tag.

Differing Functional Regions
One observation was the presence of a glycine in the active site of chain A of NHis-Thal-RebH5 (1) and, consequently, a more ordered substrate-binding loop in chain A than in chain B, whose active site is empty. This phenomenon was already observed in apo-Thal (6H43) [10], where unmodeled 2Fo-Fc density within the active site and a defined substrate-binding loop were detected in chain A but not in chain B. This behavior was later analyzed statistically and described as negatively cooperative binding of the substrate tryptophan and the cofactor FAD [14]. When the substrate tryptophan is bound, the substrate-binding loop becomes ordered, and when both substrate and cofactor are bound, the substrate is always bound in chain A, whereas FAD is bound in chain B.
Another observation was the partly undefined FAD loop of chain A in (1) and (2). One possible reason is that residue I42 of chain A would clash strongly with P262 of chain B of a symmetry-related molecule. In chain B, I42 does not clash and even makes stabilizing van der Waals interactions with a neighboring molecule (see Figure S3a,b in the supplementary material). In addition, R44 of chain B forms a crystal contact with D307 of a symmetry-related chain A, whereas chain A makes no such crystal contact.
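Whether a residue such as I42 would clash with a symmetry-related molecule comes down to interatomic distances between the two residues. The Python sketch below lists atom pairs closer than a cutoff; it assumes the symmetry-related coordinates have already been generated, and the function name `close_contacts`, the single 3.0 Å cutoff, and the coordinates are hypothetical placeholders rather than values from these structures. Dedicated validation tools typically use atom-specific van der Waals radii instead of one fixed cutoff.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def close_contacts(res_a: Dict[str, Coord], res_b: Dict[str, Coord],
                   cutoff: float = 3.0) -> List[Tuple[str, str, float]]:
    """Return (atom_a, atom_b, distance) for all pairs closer than ``cutoff`` (Å),
    sorted from the shortest distance upward."""
    hits = []
    for (name_a, xyz_a), (name_b, xyz_b) in product(res_a.items(), res_b.items()):
        d = math.dist(xyz_a, xyz_b)  # Euclidean distance (Python 3.8+)
        if d < cutoff:
            hits.append((name_a, name_b, d))
    return sorted(hits, key=lambda h: h[2])


# Hypothetical coordinates (Å) for two side-chain atoms of each residue.
ile = {"CD1": (0.0, 0.0, 0.0), "CG2": (1.5, 0.0, 0.0)}
pro = {"CG": (2.0, 0.0, 0.0), "CD": (5.0, 0.0, 0.0)}
print(close_contacts(ile, pro))  # [('CG2', 'CG', 0.5), ('CD1', 'CG', 2.0)]
```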
In all other Thal structures, the FAD loop is not stabilized by a crystal contact but is nevertheless always defined by electron density [10,14]. The open conformation of the FAD loop seen in the P6₄ structures might be the lowest-energy conformation in the absence of FAD and might even be present in solution.

Impact of V_M on Diffraction
The denser packing of P2₁ compared with P6₄ crystals, as indicated by the lower Matthews coefficient of 2.28 Å³/Da for P2₁ (NHis-Thal-RebH5) compared with 3.33 Å³/Da for P6₄ (apo-Thal, 6H43), is another possible explanation for the better diffraction. In 1968, Matthews analyzed 116 protein crystal structures and estimated solvent contents of 27-65% and V_M values of 2.15-2.61 Å³/Da [35]. Hence, the untagged P6₄ structures of Thal lie at the high end of the V_M distribution. In 2002, Kantardjieff and Rupp [36] reanalyzed the V_M distribution of 10,471 non-redundant protein crystal forms and observed a broader range, with a mode of 2.34 Å³/Da corresponding to a solvent content of ~47%. They also analyzed how V_M varies with resolution and observed lower V_M values at higher resolution, indicating that tightly packed crystals tend to diffract better.
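The Matthews coefficient is V_M = V_cell / (Z · M), where V_cell is the unit-cell volume, Z the number of protein molecules in the unit cell (symmetry operators × molecules per asymmetric unit), and M the molecular mass; the solvent fraction then follows approximately as V_solv ≈ 1 − 1.23/V_M, assuming a protein partial specific volume of about 0.74 cm³/g. The Python sketch below applies these standard relations; the function names and the example unit cell are hypothetical, whereas the three V_M values in the loop are those quoted in the text.

```python
import math


def monoclinic_cell_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Unit-cell volume (Å^3) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(cell_volume: float, n_sym_ops: int,
                         mol_per_asu: int, mol_weight_da: float) -> float:
    """V_M = V_cell / (Z * M) in Å^3/Da, with Z = n_sym_ops * mol_per_asu."""
    z = n_sym_ops * mol_per_asu
    return cell_volume / (z * mol_weight_da)


def solvent_fraction(v_m: float) -> float:
    """Approximate solvent fraction 1 - 1.23 / V_M (partial specific volume ~0.74 cm^3/g)."""
    return 1.0 - 1.23 / v_m


# Hypothetical P2_1 cell (2 symmetry operators) with 2 molecules of 30 kDa per ASU.
volume = monoclinic_cell_volume(50.0, 100.0, 60.0, 105.0)
print(f"{matthews_coefficient(volume, 2, 2, 30000.0):.2f}")  # 2.41

# Solvent contents implied by the V_M values quoted in the text.
for v_m in (2.28, 3.33, 2.34):
    print(f"V_M = {v_m:.2f} Å^3/Da -> {solvent_fraction(v_m):.1%} solvent")
# 46.1% (P2_1), 63.1% (P6_4), 47.4% (mode reported by Kantardjieff and Rupp)
```

By this approximation, the P2₁ form contains roughly 46% solvent and the P6₄ form roughly 63%, which makes the difference in packing density explicit.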

Impact of His-tag on Crystallization
Changing the position of the His-tag (N- or C-terminal) can already affect the protein [37], e.g., its activity, inhibition, or conformation [38,39]. Published reports on the impact of His-tags on crystallization are contradictory. Smits et al. [39] report that only the His-tagged construct yielded crystals, whereas Plavša et al. [40] improved the diffraction quality in terms of resolution, spot shape, and spot size by adding thrombin in situ immediately before crystallization to cleave the His-tag (using a pET-28a vector). This also resulted in a different crystal morphology, as we observed for tagged and untagged Thal-RebH5 (see Figure 1). The His-tag can change the properties of the protein [41,42]; therefore, tagged and untagged protein may not crystallize under the same conditions [43], just as with NHis-Thal-RebH5 and untagged Thal-RebH5.
Bucher et al. [44] chose a more systematic approach. They analyzed the crystallization behavior of the protein PfuMBP carrying various short tags and detected an influence of the tag's amino acid sequence on crystal formation and diffraction. On the other hand, they observed no impact of the His-tag on the crystallization of PfuMBP and, in the context of the numerous deposited His-tagged structures, concluded that the His-tag is "not intrinsically detrimental to the formation of protein crystals" [44] (p. 397).
In 2007, Carson et al. [45] statistically analyzed the impact of His-tags on crystallization by comparing 1142 tagged with 11,906 untagged protein structures deposited in the PDB and found that "the resolution and R-factors were consistently better for the his-tagged structures but the differences are not statistically significant" [45] (p. 301). Only small structural differences between tagged and untagged protein structures were detected, which agrees with the high similarity we observed between the structures themselves and their functional regions. Moreover, the majority of the tags (90%) were disordered or involved in crystal contacts, and the tags were on average ten residues long [43,45]. This again agrees with our observation of reliable density for only five residues of the linker and of an additional crystal contact involving the N-terminus of chain B. In addition, most of the His-tags protruded from the protein, just as in NHis-Thal-RebH5 (2) [43,45]. Overall, "structures with well-defined His-tags are rare" [41] (p. 300) and "have only marginally different values" [41] (p. 299) of R-factors, V_M, and B-factors [45].
In summary, the impact of His-tags on the crystallization of a given protein cannot be fully predicted, but so many His-tagged structures have been deposited in the PDB that there is no mandatory need to cleave the His-tag. Tagged and untagged Thal-RebH5 are a good example showing that it is worth testing a different construct and a different crystal form, even when the crystals do not look promising. Furthermore, the binding of a glycine in the active site of the tryptophan 6-halogenase Thal can induce an ordered conformation of the substrate-binding loop.