Effects of Proline Substitutions on the Thermostable LOV Domain from Chloroflexus aggregans

Abstract: Light-oxygen-voltage (LOV) domains are ubiquitous photosensory modules found in proteins from bacteria, archaea and eukaryotes. Engineered versions of LOV domains have found widespread use in fluorescence microscopy and optogenetics, with improved versions being continuously developed. Many of the engineering efforts have focused on the thermal stabilization of LOV domains. Recently, we described a naturally thermostable LOV domain from Chloroflexus aggregans. Here we show that the discovered protein can be further stabilized using proline substitution. We tested the effects of three mutations, and found that the melting temperature of the A95P mutant is raised by approximately 2 °C, whereas mutations A56P and A58P are neutral. To further evaluate the effects of the mutations, we crystallized the variants A56P and A95P, while the variant A58P did not crystallize. The obtained crystal structures do not reveal any alterations in the proteins other than the introduced mutations. Molecular dynamics simulations showed that mutation A58P alters the structure of the respective loop (Aβ-Bβ), but does not change the general structure of the protein. We conclude that proline substitution is a viable strategy for the stabilization of the Chloroflexus aggregans LOV domain. Since the sequences and structures of the LOV domains are overall well-conserved, the effects of the reported mutations may be transferable to other proteins belonging to this family.


Introduction
Light-oxygen-voltage (LOV) domains are ubiquitous photosensory modules found in proteins from bacteria, archaea, fungi, plants and protists [1,2]. They bind different flavins, mainly FMN and FAD, as cofactors, and absorb blue and ultraviolet light [1]. Upon illumination, LOV domains undergo conformational changes that may result in a variety of outcomes: they may partially unfold, form homo- or heterodimers, translocate to the plasma membrane, or regulate the activity of a kinase or some other effector domain [1,3-6].
The diffraction data were collected at 100 K on the BioMAX macromolecular crystallography beamline at MAX IV Laboratory (Lund, Sweden). Diffraction images were processed using XDS [34]. POINTLESS and AIMLESS [35] were used to merge, scale and assess the quality of the data, as well as to convert intensities to structure factor amplitudes and generate Free-R labels.

Structure Determination and Refinement
The structures of CagFbFP-A56P and CagFbFP-A95P were solved by molecular replacement with MOLREP [36], using the CagFbFP structure (PDB ID 6RHF) [25] as a search model. The resulting models were refined manually using Coot [37] and REFMAC5 [38].

Molecular Dynamics (MD) Simulations
The initial coordinates for the wild type CagFbFP simulations were taken from the X-ray structure (PDB ID 6RHF) [25]. The models of the CagFbFP-A58P variant were constructed using the SWAP function in YASARA Structure Version 17.4.17 [39], and optimized using the SCWRL rotamer library search [40]. The lowest energy conformers were selected for further studies. Two sets of independent simulations for each conformer of Ala58 and Pro58 ("A" and "B") were carried out. The protonation states of titratable residues at pH 7 were assigned on the basis of pKa calculations using the PROPKA 3.1 program [41] and visual inspection; all charged residues were kept at their standard protonation states. Side chains of Asn and Gln residues were checked for possible flipping. The phosphate group of the FMN cofactor was deprotonated, carrying a charge of −2e. Consequently, CagFbFP and CagFbFP-A58P dimers had a total charge of −4e. To neutralize the systems, solvent water molecules that were at least 5.5 Å away from any protein atoms were replaced by Na+ ions. Hydrogen atoms were added, employing the tleap module of AmberTools14 [42]. Crystal water molecules were kept; the protein was solvated in a water box centered at the center of mass to ensure a water layer of 12 Å around the protein. The total size of the simulated systems was ~43,000 atoms, including ~13,200 TIP3P [43] water molecules. All MD simulations were carried out using the Amber14 program [42] with the Amber ff99SB [44,45] all-atom force field for proteins, the general Amber force field (GAFF) [46] for flavin mononucleotide (FMN) and the TIP3P model for water [43]. We used the atomic charges and force field parameters for the FMN moiety reported in our previous work [47]. Initially, the solvent and the ions, and then the whole system, were subjected to minimization using 10,000 steps of steepest descent, followed by 3000 steps of conjugate-gradient minimization.
The system was then slowly heated from 0 to 300 K over 50 ps. In all simulations, constant pressure periodic boundary conditions using the Particle Mesh Ewald (PME) [35] method were employed. To calculate the electrostatic interactions, a cutoff of 10 Å was used. After heating, the systems were equilibrated for 1000 ps at 300 K. Finally, three independent 50 ns-long production runs were performed for each alternative conformer of the WT and A58P variants. PyMOL [48], VMD [49] and AmberTools14 [42] were used for molecular visualizations and analysis of MD simulations.
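The charge and sampling bookkeeping described above can be written out as a short calculation. The sketch below is only illustrative: the function and variable names are hypothetical, and it assumes, as implied by the stated total of −4e, that the protein chains themselves carry no net charge.

```python
def counterions_needed(net_charge: int, ion_charge: int = 1) -> int:
    """Number of counterions of charge `ion_charge` that neutralizes `net_charge`."""
    if net_charge * ion_charge > 0:
        raise ValueError("counterion has the same sign as the system charge")
    return abs(net_charge) // abs(ion_charge)


FMN_CHARGE = -2                   # deprotonated phosphate group, from the text
PROTOMERS_PER_DIMER = 2
PROTEIN_CHARGE_PER_PROTOMER = 0   # assumption implied by the -4e dimer charge

dimer_charge = PROTOMERS_PER_DIMER * (FMN_CHARGE + PROTEIN_CHARGE_PER_PROTOMER)
n_sodium = counterions_needed(dimer_charge)  # Na+ ions replacing water molecules

# Production sampling: 3 runs of 50 ns for each conformer (A, B) of each variant.
variants = ("WT", "A58P")
conformers = ("A", "B")
runs_per_conformer = 3
run_length_ns = 50
total_ns = len(variants) * len(conformers) * runs_per_conformer * run_length_ns

print(f"dimer charge: {dimer_charge}e, Na+ ions: {n_sodium}, total sampling: {total_ns} ns")
```

Under these assumptions, each dimer requires four Na+ ions, and the production runs add up to 600 ns in total.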

Identification of Positions for Proline Substitutions
Recently, we have identified and studied a small thermostable flavin-based fluorescent protein CagFbFP, derived from a soluble light-oxygen-voltage (LOV) domain-containing histidine kinase from the thermophilic bacterium Chloroflexus aggregans [25]. The protein crystallized well, and an ultra-high resolution structure of CagFbFP has been determined [25]. While most of the protein is very well-ordered, backbones as well as side chains of residues Ala58 and Asp59, located in the loop between β-strands Aβ and Bβ, were observed to adopt two alternative conformations. We reasoned that targeting this region with mutations might stabilize it in a single conformation, and also stabilize the overall protein.
To understand the natural variability of the amino acids observed in these and other positions in different LOV domains, we prepared a multiple sequence alignment of different LOV-derived fluorescent proteins [9,19,25,50,51] (Figure 1). In addition to these proteins, we wanted to compare CagFbFP to its close homologs identified in the genomes of Chloroflexus aurantiacus [52] and Chloroflexus islandicus [53], with sequence identities of 72% and 87%, respectively. The sequence alignment revealed that the loop connecting Aβ and Bβ is shorter by one amino acid in the Chloroflexi proteins compared to others. At the same time, prolines are observed in iLOV, EcFbFP and DsFbFP at the position of CagFbFP's Ala56, as well as in most of the proteins, including the ones from Chloroflexus aurantiacus and Chloroflexus islandicus, at the position of CagFbFP's Ala58 (Figure 1). Following this observation, we calculated how often prolines are observed in the sequences of all LOV domains found by Glantz et al. [2]. Prolines are observed in ~47% of proteins at the position of Ala56, and in ~77% of proteins at the position of Ala58. Since a proline's backbone is naturally more rigid than that of other amino acids, it can stabilize certain kinks in the protein, and consequently the overall protein [27,28]. Additionally, mutating a particular amino acid to the consensus one often improves the stability of the resulting protein [26]. Consequently, mutating Ala56 and Ala58 into prolines might be beneficial for the stability of CagFbFP.
Careful analysis of the sequence alignment also revealed another position, that of CagFbFP's Ala95, situated at the N-terminus of the helix Fα, which is often occupied by prolines in other LOV proteins. Overall, 33% of the proteins in the dataset of Glantz et al. [2] contain a proline at this or the neighboring position. Consequently, we focused on probing the effects of the mutations A56P, A58P and A95P on the stability of CagFbFP.
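The proline frequencies quoted above amount to counting residues in alignment columns. A minimal sketch of such a count is given below; the function names and the toy alignment are hypothetical and are not taken from the dataset of Glantz et al. [2], and sequences with a gap at the queried column are excluded from the single-column frequency by assumption.

```python
def residue_frequency(alignment, column, residue="P"):
    """Fraction of sequences (ignoring gaps) that have `residue` at a 0-based column."""
    observed = [seq[column] for seq in alignment if seq[column] != "-"]
    if not observed:
        return 0.0
    return sum(1 for aa in observed if aa == residue) / len(observed)


def fraction_with_residue_in(alignment, columns, residue="P"):
    """Fraction of all sequences that have `residue` in at least one of `columns`."""
    if not alignment:
        return 0.0
    hits = sum(1 for seq in alignment if any(seq[c] == residue for c in columns))
    return hits / len(alignment)


# Toy aligned sequences (made up for illustration; "-" marks a gap).
toy_alignment = [
    "MAPAD",
    "MPAAD",
    "MA-PD",
    "MAPAE",
]

print(residue_frequency(toy_alignment, 2))              # proline at one column
print(fraction_with_residue_in(toy_alignment, (2, 3)))  # proline at this or the next column
```

The second function mirrors the "this or the neighboring position" criterion used for Ala95; which neighboring column is meant would have to be decided from the actual alignment.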
Figure 1. Sequence alignment of LOV-derived fluorescent proteins: … [19], EcFbFP [9], Pp1FbFP [50], Pp2FbFP (formerly PpFbFP [9]), DsFbFP [50], CreiLOV and VafLOV [51], six different FbFPs from thermophilic microorganisms [24], CagFbFP [25] and CagFbFP homologs from Chloroflexus aurantiacus and Chloroflexus islandicus. CagFbFP alanines that were substituted with prolines in this work are marked with asterisks.

Characterization of Ala→Pro CagFbFP Mutants
Following the identification of the prospective positions for Ala→Pro substitutions, we prepared atomistic models of the corresponding variants, CagFbFP-A56P, CagFbFP-A58P and CagFbFP-A95P, using PyMOL [48]. The models revealed that, as expected from the sequence alignment (Figure 1) and the structures of other LOV proteins, the replacement of Ala56, Ala58 or Ala95 with prolines should not disturb the protein backbone (Figure 2). Consequently, we produced the mutated variants and evaluated their properties in vitro. The absorption, excitation and emission spectra of CagFbFP-A56P, CagFbFP-A58P and CagFbFP-A95P are identical to those of CagFbFP, with the excitation maximum at ~449 nm and the emission maximum at ~495 nm (Supplementary Materials Figures S1-S3).
Crystals 2020, 10

Figure 2. Location of the prospective proline substitution sites (magenta, modeled using PyMOL [48]) mapped onto the CagFbFP structure (green, PDB ID 6RHF). Ala58 (top right) adopts two alternative conformations in the original structure [25].
To evaluate the thermal stability of the mutated variants, we measured the dependence of the fluorescence intensity on temperature during heating-induced denaturation and cooling-induced refolding of the purified proteins (Figure 3). Similarly to CagFbFP, the variants CagFbFP-A56P, CagFbFP-A58P and CagFbFP-A95P reveal two melting transitions upon denaturation, and only one upon refolding, as evidenced by the derivatives of fluorescence as a function of temperature (Figure 3). The results of these experiments are summarized in Table 1. Overall, the mutation A56P slightly stabilizes the protein, but does not facilitate refolding; A58P does not significantly influence denaturation, but slightly facilitates refolding; and A95P stabilizes the protein and facilitates refolding. In all cases, the refolding is not complete, and the refolded fraction is similar within the experimental errors. However, we should note that part of the difference in the fluorescence of the original and refolded samples is due to irradiation-induced damage of the chromophore, which is especially strong at elevated temperatures.
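Reading transition temperatures from the derivative of fluorescence with respect to temperature can be illustrated with a short numerical sketch. The code below is not the analysis procedure used in this work: the function names are hypothetical, and the synthetic curve with midpoints at 60 and 75 °C uses arbitrary values chosen only to produce two transitions. Real measurements would typically need smoothing before differentiation.

```python
import math


def derivative(temps, signal):
    """Central-difference dF/dT at interior points, as a list of (T, dF/dT)."""
    return [
        (temps[i], (signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1]))
        for i in range(1, len(temps) - 1)
    ]


def transition_temperatures(temps, signal):
    """Temperatures where dF/dT has a negative local minimum (steepest loss of fluorescence)."""
    d = derivative(temps, signal)
    found = []
    for i in range(1, len(d) - 1):
        slope = d[i][1]
        if slope < 0 and slope < d[i - 1][1] and slope <= d[i + 1][1]:
            found.append(d[i][0])
    return found


def sigmoid_loss(t, midpoint, width=1.5):
    """Fraction of fluorescence remaining for one two-state transition."""
    return 1.0 / (1.0 + math.exp((t - midpoint) / width))


# Synthetic melting curve with two transitions (arbitrary midpoints, in deg C).
temps = list(range(40, 91))
signal = [0.5 * sigmoid_loss(t, 60) + 0.5 * sigmoid_loss(t, 75) for t in temps]

print(transition_temperatures(temps, signal))
```

The same routine applied to a cooling curve would need the sign convention adjusted, since fluorescence recovers rather than decays upon refolding.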

Figure 3. … Each experiment was conducted independently four times, and the data were averaged for plotting. Characteristic unfolding and refolding temperatures are summarized in Table 1.

Table 1. Melting temperatures of CagFbFP and its proline mutants. Tm1 and Tm2 correspond to the two melting transitions (Figure 3), and Tr corresponds to the temperature of refolding. Errors are standard deviations of the values observed in four independent experiments.

Crystallization of the Mutated CagFbFP Variants
To gain structural information about the effects of proline substitutions, we attempted crystallization of the mutated CagFbFP variants. CagFbFP-A56P and CagFbFP-A95P formed large crystals, reaching up to 700 µm in size, which diffracted to 1.6 Å and belonged to the same space group as CagFbFP (P2₁2₁2). The data collection statistics are reported in Table 2. In contrast, CagFbFP-A58P did not form crystals; only amorphous aggregates were observed in some of the crystallization trials. The reason for this is not clear; possibly, as Ala58 is close to Arg86 and Gln123 of the adjacent protein chains in the crystals of CagFbFP, the mutation A58P resulted in steric clashes and prevented the formation of crystal contacts of the same type. While the resolution of the CagFbFP-A56P and CagFbFP-A95P crystal structures is lower than that of the original CagFbFP structure [25], it could likely have been improved by extensive screening of crystals and by using advanced diffraction data collection strategies.
Overall, the structures of CagFbFP-A56P and CagFbFP-A95P are very similar to the original CagFbFP structure, with the root mean square deviation of the positions of the backbone atoms of ~0.2 Å in each case. Polder OMIT maps [54] confirm the identity of the introduced mutations (Figure S4). The LOV domain fold is not changed, and the two protomers form an antiparallel dimer, with the hydrophobic surfaces of the β-sheets at the dimerization interface, as observed previously for CagFbFP [25]. The protein backbone structure is slightly altered around the mutation site in CagFbFP-A56P, and essentially unchanged in CagFbFP-A95P (Figure 4). Interestingly, in both mutants, Ala58 and Asp59 are still in two alternative conformations. Most likely, the difference in the free energies of the two conformations was not affected by the introduced mutations.
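The root mean square deviation quoted above is a simple function of paired atomic coordinates. The sketch below shows the formula only; it assumes that the two structures have already been superimposed and that atoms are listed in matching order (structure analysis programs normally perform an optimal superposition first). The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (in the input units) between paired 3D coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy backbone fragment (made-up coordinates, in angstroms) and a copy shifted by 0.2 A along x.
reference = [(1.0, 0.0, 0.0), (2.5, 1.1, 0.3), (3.9, 0.4, 1.2)]
shifted = [(x + 0.2, y, z) for x, y, z in reference]

print(round(rmsd(reference, shifted), 3))
```

For the toy fragment, every atom is displaced by the same 0.2 Å, so the RMSD equals that displacement.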

MD Simulations of the A58P Variant
To gain insight into the structure of the A58P variant, we conducted extensive MD simulations of the WT and mutated proteins. Both CagFbFP and CagFbFP-A58P preserve their overall structure in the simulations, as evidenced by the root mean square deviations of the atomic positions from the starting structure (Figure S5). Some of the simulations reveal relative motions of the protomers within the dimer (Figure S5), yet in each case, the structures of the individual protomers are unchanged (Figure S6). Analysis of backbone fluctuations (Figure S6) does not reveal any clear differences in the flexibility of the WT and A58P variants. However, we observed that the structure of the Aβ-Bβ loop was changed in the mutant (Figure 5). In particular, the backbone torsion angle φ is positive for Ala58 and negative for Pro58. The moderate displacement of the Aβ-Bβ loop in the A58P variant is likely the reason for its inability to form crystals similar to those of the WT protein. Interestingly, we do not observe any interconversion between the two alternative conformations of Ala58 or Pro58. Because of this, it is not clear whether one or the other conformation is preferred in the mutant.
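The torsion angle φ of residue i is the dihedral angle defined by the backbone atoms C(i−1), N(i), Cα(i) and C(i). A minimal sketch of the standard dihedral calculation is given below; it is not the analysis code used for the simulations, the function names are hypothetical, and the test points are arbitrary coordinates chosen so that the expected angles are obvious.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi(c_prev, n, ca, c):
    """Backbone phi of residue i from C(i-1), N(i), CA(i) and C(i) coordinates."""
    return dihedral(c_prev, n, ca, c)


# Arbitrary test geometry: central bond along z, outer atoms in the xy-plane.
print(phi((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # positive phi
print(phi((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1)))  # negative phi
```

Applied to each frame of a trajectory, such a function would give the time course of φ for residue 58, from which the sign difference between Ala58 and Pro58 could be read off.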

Discussion
In this work, we analyzed the multiple sequence alignment of several light-oxygen-voltage (LOV) proteins and the available structural information to identify three amino acid positions in a thermostable LOV protein, CagFbFP, which could stabilize the protein when substituted with prolines. Two of the identified mutations, A56P and A58P, had a mostly neutral effect on the protein's stability, whereas the third one, A95P, moderately stabilized the protein, and also improved its refolding temperature. Complementary structural studies show that the structure of the mutated proteins remains essentially unchanged, although the structure of the Aβ-Bβ loop is slightly different in the A58P variant.
Previously, multiple studies have probed the effects of mutations on various properties of LOV domains. Earlier studies focused on the effects on flavin binding and photochemical reactivity of mutating the conserved cysteine, which forms a covalent bond with the flavin cofactor during the photocycle, and of some random mutations [55,56]. Later studies probed the effects of mutations on other properties, particularly the absorption spectrum [47,50,57,58], photocycle lifetime [57,59], brightness of the cysteine-less variants [9,19,20,60], generation of radicals [15,61,62] and thermal stability [21-23]. Many of these mutations were rational, or could be rationalized after initial discovery, thus allowing one to apply the same principles to confer the desired properties on a different LOV domain.
We expect that the effects of the proline substitutions may also be transferable to other LOV domains, since the rationales that we employed (consensus design [26-28], stabilization by prolines [27,28]) should hold. Interestingly, out of three LOV domain thermal stabilization studies [21-23], only one reported proline substitutions [22], which were, however, not included in the most stable variant with multiple mutations. Thus, testing the effects of proline substitutions may be a complementary approach to recombination and directed evolution that speeds up the discovery of the most stable variants. We hope that our work will advance the development of efficient LOV-based tools for fluorescence microscopy and optogenetics.

Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4352/10/4/256/s1. Figure S1: Normalized absorption spectra of CagFbFP and its proline-substituted variants. Figure S2: Normalized fluorescence excitation spectra of CagFbFP and its proline-substituted variants. Figure S3: Normalized fluorescence emission spectra of CagFbFP and its proline-substituted variants. Figure S4: Omit (polder) maps for the mutants, (a) A56P and (b) A95P. The original structure of CagFbFP (PDB ID 6RHF) is shown in green, the structures of the mutants are shown in magenta. Polder electron density maps (green) are contoured at the level of 3 × r.m.s. Figure S5: Root mean square deviations of backbone atom positions as a function of time. Dimers of proteins harboring the conformers A and B of residue 58 were simulated three times (runs 1-3) for both the WT and A58P variants. The values were averaged over 1 ns time intervals. Some trajectories, such as WT B run 1, display relatively high overall RMSD as a consequence of the displacement of one protomer relative to the other. Structures of individual protomers are well conserved in all simulations (Figure S6).
Acknowledgments: Atomic coordinates and structure factors for the reported crystal structures have been deposited in the Protein Data Bank under the accession codes 6Y7R (CagFbFP-A56P) and 6Y7U (CagFbFP-A95P). X-ray diffraction data were collected at the BioMAX beamline at MAX IV Laboratory (Lund, Sweden). Simulations were performed with computing resources granted by JARA-HPC from RWTH Aachen University under project JARA0065.

Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations
FbFP: Flavin-based Fluorescent Protein
LOV: Light-Oxygen-Voltage
WT: Wild Type