Accurate Prediction of Absorption Spectral Shifts of Proteorhodopsin Using a Fragment-Based Quantum Mechanical Method

Many experiments have been carried out to display different colors of Proteorhodopsin (PR) and its mutants, but the mechanism of color tuning of PR was not fully elucidated. In this study, we applied the Electrostatically Embedded Generalized Molecular Fractionation with Conjugate Caps (EE-GMFCC) method to the prediction of excitation energies of PRs. Excitation energies of 10 variants of Blue Proteorhodopsin (BPR-PR105Q) in residue 105GLN were calculated with the EE-GMFCC method at the TD-B3LYP/6-31G* level. The calculated results show good correlation with the experimental values of absorption wavelengths, although the experimental wavelength range among these systems is less than 50 nm. The ensemble-averaged electric fields along the polyene chain of retinal correlated well with EE-GMFCC calculated excitation energies for these 10 PRs, suggesting that electrostatic interactions from nearby residues are responsible for the color tuning. We also utilized the GMFCC method to decompose the excitation energy contribution per residue surrounding the chromophore. Our results show that residues ASP97 and ASP227 have the largest contribution to the absorption spectral shift of PR among the nearby residues of retinal. This work demonstrates that the EE-GMFCC method can be applied to accurately predict the absorption spectral shifts for biomacromolecules.


Introduction
The rhodopsin proteorhodopsin and related proteins have aroused continuous and extensive interest both experimentally and theoretically among researchers, especially during the past 20 years [1][2][3][4][5]. Rhodopsin is a seven transmembrane α-helices (TM) protein which uses retinal as a chromophore. Retinal, the aldehyde of vitamin A and a polyene chromophore, is derived from β-carotene and utilized in the all-trans/13-cis configurations in microbial rhodopsins allowing certain microorganisms to convert light into metabolic energy. 11-cis/all-trans retinal acts as the molecular basis of animal vision and is attached by a Schiff base linkage to the conserved residue lysine (LYS231) sidechain in the middle of TM7. The retinal Schiff base (RSB) is protonated (RSBH+) in most cases [6].
Proteorhodopsin(PR), as a member of the microbial rhodopsin family, is a light-driven protein found in marine proteobacteria. Due to the widespread global distribution of proteobacteria in sea water, PR may have an important effect on global solar energy input in the biosphere [7,8]. The maximum absorption wavelength of PR is tuned according to the depth at which the bacteria live [9,10]. It is well known that a shorter wavelength of the light corresponds to a higher energy and larger power to penetrate deeper water. To date,

Computational Approaches
The ground state of microbial rhodopsin possesses an all-trans configuration for its chromophore [6]. Our previous study demonstrated that the protein environment plays an important role in the excited-state properties of GFP [13]. We therefore expect a similar dependence on protein environment for the electronic properties of the PR chromophore. To handle the full complexities of the PR chromophore-protein interactions, we use Electrostatic Embedded (EE)-GMFCC, which is an extension of GMFCC to include many-body environment effects in each fragment QM calculation using embedding charges that represent the remaining fragments [14,15].

The EE-GMFCC Method for Excited State and Its Application for PR105Q
Before applying the EE-GMFCC method to absorption spectrum calculations of PRs, we give a brief description of this quantum fragmentation approach to large biomolecular systems. The EE-GMFCC method was initially developed for the total energy calculation on the ground state of large proteins [14][15][16]. In the EE-GMFCC framework, a protein with N residues is divided into N-2 fragments with each residue capped by its neighboring residues (conjugate caps) [14,15,17]. Each fragment's energy is computed at the QM level. Interactions between fragments that are not directly bonded are handled by including two-body interactions in larger fragments formed by two residues in close contact within a distance cutoff. Typically, three-body and higher-order interactions within the EE-GMFCC scheme are small and can be neglected. The QM energies of the conjugated caps are deducted to avoid overcounting contributions from overlapping regions of the fragments.
The ground-state energy of a protein is calculated by EE-GMFCC as follows, where E denotes the sum of the self-energy of a fragment and the interaction energy between the fragment and its background charges. Cap * i−1 A i Cap i+1 represents the ith residue (A i ) covalently bonded with molecular caps of Cap * i−1 and Cap i+1 . The definition of molecular caps is given in ref. [14]. The first two sums calculate the one-body (1B) QM energy of the system, which accounts for the covalently-linked three neighboring residues. The third term in Equation (1) represents the two-body (2B) QM energy corrections, arising from the two non-covalently-linked residues that are spatially in close contact. The last two terms are applied to subtract the double-counted interaction between QM region and background charges in all 1B and 2B calculations. R ij is the minimum distance between any two atoms from residues i and j. λ 2 is the cutoff distance of 2B. q m(k i ) represents the charge of the mth atom of fragment k i . R m(k i )n(l i ) represents the distance between atoms m(k i ) and n(l i ). For a detailed description of the total ground-state energy calculation of proteins using the EE-GMFCC method, please refer to our previous work [14].
Recently, our group extended the ground state EE-GMFCC method to make it applicable for predicting the properties of excited states of the luminescent biomolecule GFP [13]. The calculation of excitation energy, transition dipole moment, and oscillator strength using EE-GMFCC showed good agreement with the corresponding full system QM results. In this study, based on our previous works [13,[18][19][20][21], the three-body QM interaction term was neglected because it only slightly improves the accuracy but with substantially more computational time. Moreover, the cutoff distance threshold for the two-body QM interactions was set to 4 Å to strike a compromise between attained accuracy and computational cost.
In its current formulation, the EE-GMFCC method is appropriate only for local excitations, wherein the dominant electronic response following excitation is localized on a single molecular unit, i.e. a chromophore. Such a local excitation can be defined by the excited state of the chromophore in isolation being qualitatively similar to its excited state in the protein environment. We expect this to be the case for PRs, because its non-standard residue LYR231 has been identified as the chromophore responsible for light absorption.
After we define the locally excited region, fragments including this region are defined as excited fragments (EF), whereas fragments excluding the excited region are defined as unexcited fragments (UEF). Then the total excited-state energy of the protein can be described as the summation of the excited state energy E Excited−State EE−GMFCC (EF) from the contributions of excited fragments and the ground-state energy E Excited−State EE−GMFCC (UEF) from unexcited fragments as follows: Most denotations in Equation (2) (the EE-GMFCC method for excited state) are similar to the ground state calculation. Here we just point out the key differences. The superscript "prime" represents the excited-state energy of the subsystem containing the localized excitation region (residue m). In addition, for the unexcited fragments, the energy calculation is the same as the ground state. Then the total EE-GMFCC energy at the excited state is given by: Here, we assume that the atomic charges of the protein at the excited state are approximated to be the same as those at the ground state. We subtract Equation (1) from Equation (3) to obtain the excitation energy of the protein based on the EE-GMFCC approach. The final expression of the EE-GMFCC excitation energy is as follows: In this study, m is the residue number of chromophore LYR. ω stands for the excitation energy. The QM region is given in the brackets including two kinds of fragments, namely Cap * i−1 A i Cap i+1 and Cap * i Cap i+1 , whereas the remaining part of the protein was represented by background atomic charges. The QM calculations were performed using the Terachem package [22][23][24]. Cap * i−1 A i Cap i+1 represents the QM region of the ith residue (A i ) covalently bonded with molecular caps of Cap * i−1 and Cap i+1 . The definition of molecular caps is given in ref. [14]. The first two summations of Equation (4) calculate the one-body (1B) QM contributions to the total excitation energy of the system.
There are two kinds of fragment interactions in Equation (4), which includes one-body (1B) and two-body (2B) QM interaction terms. The one-body (1B) term represents the fragment with sequentially connected two residues or three residues, whereas the twobody (2B) QM interaction term represents the interaction between two non-neighboring residues that are spatially in close contact within a distance threshold. In this study, for PRs, the 1B term includes the chromophore LYR231 and its neighboring residues, namely, Fragment(230), Fragment(231), Fragment(232), Concap(230), and Concap(231), where Fragment(230), Fragment(231), and Fragment(232) contain residues 229-231, 230-232, and 231-233, respectively, and Concap(230) and Concap(231) contain residues 230-231 and 231-232, respectively (see Figure 1 and Figure S1 of the Supplementary Materials). The 2B term in this study represents the QM interaction between LYR231 and non-neighboring residue that is within 4 Å of LYR231. Therefore, Equation (4) becomes: where ω 1B and ω 2B denote the 1B and 2B QM interactions, respectively. The last term ω 231,j − ω 231 is the sum of 2B QM corrections, namely, the excitation energy between LYR231 and its non-neighboring residue j, if the distance between any pair of atoms from LYR231 and residue j is less than or equal to 4 Å. In the 1B term, the interaction between the chromophore and non-neighboring residues is described by the embedding electrostatic charges. The interactions between the chromophore and non-neighboring residues in spatially close contact are subjected to quantum mechanical treatment in the 2B QM calculations.  is the sum of 2B QM corrections, namely, the excitation energy between LYR231 and its non-neighboring residue j, if the distance between any pair of atoms from LYR231 and residue j is less than or equal to 4 Å . In the 1B term, the interaction between the chromophore and non-neighboring residues is described by the embedding electrostatic charges. The interactions between the chromophore and non-neighboring residues in spatially close contact are subjected to quantum mechanical treatment in the 2B QM calculations. Graphical representation of fragments, concaps, and 2B QM interaction. The background α-helix cartoon of the PR protein with 80% transparency is described by background charges, whereas the sticks of the corresponding residues represent the QM region in the EE-GMFCC calculation. The green sticks denote the chromophore LYR231. The structures all come from the same configuration of the PR protein.

Homology Modelling
In this work, the structure of Blue-absorbing PR (BPR-PR105Q) was prepared by means of a hybrid Quantum Mechanics/Molecular Mechanics (QM/MM) approach and MD simulation [25]. The initial structure was taken from the X-ray crystal structure of BPR (PR105Q) in the dark state (PDB id: 4JQ6, chain B), which we chose as the template of PR105Q (PR with GLN105). The MODELLER [26] software, was utilized for homology modelling of PR105Q. The alignment of PR105Q and chain B of 4JQ6 was performed using TM-align [27]. Residue LYS231 and retinal were combined in a protonated retinal Schiff base (RSBH+) linkage as a non-standard residue in the Amber package [28,29], which we named LYR231 in this work. The structure of BPR (PR105Q) and LYR231 are shown in Figure 1. Graphical representation of fragments, concaps, and 2B QM interaction. The background α-helix cartoon of the PR protein with 80% transparency is described by background charges, whereas the sticks of the corresponding residues represent the QM region in the EE-GMFCC calculation. The green sticks denote the chromophore LYR231. The structures all come from the same configuration of the PR protein.

Homology Modelling
In this work, the structure of Blue-absorbing PR (BPR-PR105Q) was prepared by means of a hybrid Quantum Mechanics/Molecular Mechanics (QM/MM) approach and MD simulation [25]. The initial structure was taken from the X-ray crystal structure of BPR (PR105Q) in the dark state (PDB id: 4JQ6, chain B), which we chose as the template of PR105Q (PR with GLN105). The MODELLER [26] software, was utilized for homology modelling of PR105Q. The alignment of PR105Q and chain B of 4JQ6 was performed using TM-align [27]. Residue LYS231 and retinal were combined in a protonated retinal Schiff base (RSBH+) linkage as a non-standard residue in the Amber package [28,29], which we named LYR231 in this work. The structure of BPR (PR105Q) and LYR231 are shown in Figure 2. The distances between the chromophore and all residues are given in Supplementary Figure

Force Field Construction of the Non-Standard Residue LYR231
To make direct comparison to experiment, absorption spectra of the wild-type protein and several other mutated proteins were computed under high pH conditions, such that the ASP97 residue near the Schiff base was deprotonated. For the structure of chromophore, a LYS residue jointed with retinal (named LYR) was formed by the Schiff base at the connection site. The isomeric all-trans configuration occupied more than 70% for almost all of the mutant proteins. Therefore, we chose the all-trans isomer as the initial structure of LYR (see Figure 2) [12].
We set the residue ASP97 at the deprotonated state. The experimental value of the absorption peak compared in this study was the measured value when the protein was solvated in high pH solution. The Amber18 program was utilized to obtain the force-field parameters [29]. The parameters of chromophore were created using the Generalized Amber Force Field (GAFF) [30]. The Amber ff14SB force field and the TIP3P water model [31] were used for other parts of the protein [32] and water, respectively. Force-field parameters of non-standard residue (LYR231, the chromophore) were obtained from the ANTE-CHAMBER module with the AM1-BCC charge model [33] using the semi-empirical quantum mechanics (sqm) method [34]. The side view of PR. In (a,b), the green moiety denotes the chromophore, and the blue moiety is residue 105. (c) The structure of LYR and all hydrogens were removed for simplicity. All atom names are marked for clarity.

Force Field Construction of the Non-Standard Residue LYR231
To make direct comparison to experiment, absorption spectra of the wild-type protein and several other mutated proteins were computed under high pH conditions, such that the ASP97 residue near the Schiff base was deprotonated. For the structure of chromophore, a LYS residue jointed with retinal (named LYR) was formed by the Schiff base at the connection site. The isomeric all-trans configuration occupied more than 70% for almost all of the mutant proteins. Therefore, we chose the all-trans isomer as the initial structure of LYR (see Figure 2) [12].
We set the residue ASP97 at the deprotonated state. The experimental value of the absorption peak compared in this study was the measured value when the protein was solvated in high pH solution. The Amber18 program was utilized to obtain the force-field parameters [29]. The parameters of chromophore were created using the Generalized Amber Force Field (GAFF) [30]. The Amber ff14SB force field and the TIP3P water model [31] were used for other parts of the protein [32] and water, respectively. Force-field parameters of non-standard residue (LYR231, the chromophore) were obtained from the ANTECHAM-BER module with the AM1-BCC charge model [33] using the semi-empirical quantum mechanics (sqm) method [34].

Molecular Dynamics (MD) Simulation
The initial protein structure was solvated in a cubic periodic box of TIP3P water molecules with each side at least 10 Å from the nearest solute atom, and the total size of the cubic box was 87.298 × 71.649 × 57.894 Å 3 . The counter ions Na + were added to neutralize the entire system according to its total net charge. The molecular structure of PR105Q was optimized under molecular mechanics using two steps. First, the protein was restrained and all other molecules were relaxed. Next, the entire system was energy minimized. Each minimization procedure consisted of 10,000 steps of the steepest descent optimization, followed by 40,000 steps of the conjugate gradient optimization approach.
We used a set of configurations extracted from simulations to compute the averaged absorption spectra using QM calculations. The approach based on the sequential use of simulations and quantum mechanical calculations (denominated Sequential QM/MM) has been widely used to calculate spectroscopic properties of molecules in liquid environments [35]. A wide variety of works use this sequential method to predict absorption spectra and other properties, such as NMR and emission spectra, supplying converged results similar to the experimental results.
To obtain the ensemble-average absorption spectra, conformations of PR105Q were extracted from Quantum Mechanics/Molecular Mechanics (QM/MM) molecular dynamics (MD) simulation trajectories [28,36,37]. The QM/MM MD simulation of PR was carried out using Amber18 [38] interfaced to the Gaussian09 package [39]. The QM part, consisting of the residue LYR and residues 105 and 200, was treated with the M06-2X functional in conjunction with the 6-31G* basis set [40].
Schapiro and co-workers studied the initial excited state dynamics of GPR, and their simulations indicated that the retinal-TYR200 interaction played an important role in the outcome of the photo isomerization [25]. In this study, during classical MD simulation, the fluctuation of TYR200 had a significant influence on the excitation energy of the chromophore LYR231. Therefore, we performed MD simulation of PR using the QM/MM method, and TYR200 was included in the QM region.
The detailed procedure of MD simulation is given as follows. First, the optimized system was heated to 300 K in 50 ps. Secondly, a 500 ps equilibration run with classical MD was carried out before the final 10 ps QM/MM MD simulation for a production run at 300 K [41]. Langevin dynamics with a collision frequency of 1.0 ps −1 was used to control the temperature. A 25 Å cutoff was applied to the QM/MM electrostatic interactions, and the SHAKE algorithm was applied to restrain bonds with hydrogen atoms.
The QM/MM MD simulations were performed for PR105Q and nine other mutants [28,29]. A total of 100 snapshots were evenly extracted from the trajectory of the last ps QM/MM MD simulation with a time interval of 10 fs. Each conformation of the selected 100 snapshots was subsequently calculated by the EE-GMFCC method to obtain the vertical excitation energy (Equation (5)) [13]. Here we chose the B3LYP density functional in TDDFT calculations [42].

Key Residue Mutation and Fragment-Based QM Calculations for Excitation Energies
Residue 105 plays an important role in determining the color of PR, where BPR (PR105Q) and GPR (PR105L) have GLN and LEU at this position, respectively [12]. Accurate predictions of absorption spectral shifts upon point mutations are critical to the rational mutagenesis design of PR [11,[43][44][45]. Here, residue 105 is a vital residue that affects the absorption spectral shift of PR. Kandori and co-workers carried out an experimental investigation in which several different point mutations were introduced for the color determining residue 105 [12].
For the QM calculations, water molecules were removed from the conformation of MD simulation trajectory. The excitation energy of the protein was calculated with the EE-GMFCC method at the TD-B3LYP/6-31G* level [46,47] using Equation (5). Figure 3 shows the work flow of the complete computational protocol utilized in this study for model construction, MD simulation, and excitation energy calculations. We used Ambertools to complete mutation operation for PR. First, we removed sidechain atoms of residue 105, while the backbone atoms (C, N, CA, O) of residue 105 and other parts of the protein were reserved. Second, we changed the residue name of R105 to the name of the residue to be mutated. Third, the sidechain atoms of the new residue were added to the backbone of R105 by the Amber program. Finally, the added residue was energy minimized to avoid unreasonable repulsive interactions with nearby residues. The minimization process was undertaken using Amber18 [28]. Ambertools to complete mutation operation for PR. First, we removed sidechain atoms of residue 105, while the backbone atoms (C, N, CA, O) of residue 105 and other parts of the protein were reserved. Second, we changed the residue name of R105 to the name of the residue to be mutated. Third, the sidechain atoms of the new residue were added to the backbone of R105 by the Amber program. Finally, the added residue was energy minimized to avoid unreasonable repulsive interactions with nearby residues. The minimization process was undertaken using Amber18 [28].

Comparison between Calculated Excitation Energies and Experiment for Wild-Type PR and Its Mutants
From experimental investigation, residue 105 is an important amino acid that influences the absorption spectrum of the chromophore in PR. In this study, we mutated residue 105 to other amino acids for predicting the absorption spectral shift. Vertical excitation energies were calculated on mutated PRs using the EE-GMFCC method. Predicted excitation energies were compared with corresponding experimental absorption peaks. Figure 4 shows that the correlation coefficient (R) between the EE-GMFCC results (1B + 2B) and experimental values is 0.937, and the equation of linear regression is y = 0.995x

Comparison between Calculated Excitation Energies and Experiment for Wild-Type PR and Its Mutants
From experimental investigation, residue 105 is an important amino acid that influences the absorption spectrum of the chromophore in PR. In this study, we mutated residue 105 to other amino acids for predicting the absorption spectral shift. Vertical excitation energies were calculated on mutated PRs using the EE-GMFCC method. Predicted excitation energies were compared with corresponding experimental absorption peaks. Figure 4 shows that the correlation coefficient (R) between the EE-GMFCC results (1B + 2B) and experimental values is 0.937, and the equation of linear regression is y = 0.995x + 4.718. In contrast, the results given by EE-GMFCC without two-body (2B) QM interaction corrections yielded a worse agreement with the experiment. The correlation coefficient given by EE-GMFCC with 1B correction is merely 0.710 (see Figure 4a), indicating that, perhaps unsurprisingly, the residues which are spatially in close contact with the central chromophore have the greatest impact on the excitation energy of the chromophore in PR. From Table 1, we can see that the mean unsigned error (MUE) of the predicted excitation energies of those 10 systems decreases from 8.0 to 3.5 nm calculated by EE-GMFCC from 1B to 1B + 2B, compared to experimental values. This is direct evidence that the absorption wavelength of the chromophore is affected by its surrounding chemical environment.

Mutation
Exp ( [12].  (4)). 1B and 2B represent the excitation energy contributions from the covalently-linked three neighboring residues, and the two noncovalently-linked residues that are spatially in close contact, respectively.  (4)). 1B and 2B represent the excitation energy contributions from the covalently-linked three neighboring residues, and the two non-covalently-linked residues that are spatially in close contact, respectively. Table 1. Calculated excitation energy using EE-GMFCC at the TD-B3LYP/6-31G* level for different mutations of PR. "Dev." stands for the deviation from the experimental value. 1B and 2B denote the excitation energy calculated by 1B correction, and 1B + 2B corrections, respectively. 1B and 2B represent the excitation energy contributions from the covalently-linked three neighboring residues, and the two non-covalently-linked residues that are spatially in close contact, respectively. MUE denotes the mean unsigned error.

Mutation
Exp ( One hundred snapshots extracted from the last ps of QM/MM MD trajectory were selected to calculate the average excitation energy of PR. The excitation energy distribution of these 100 conformations of PR105D is shown in Figure 5. As shown in Figure 5, the conformations with the predicted excitation energy close to the experimental absorption peak have the largest population. The most probable absorption wavelength is almost equal to the average value, which is in good agreement with the experimental data [12]. Excitation energy distributions for the other nine mutants are given in Supplementary Figure S4 of the Supplementary Materials.

OR PEER REVIEW
10 of 20 One hundred snapshots extracted from the last ps of QM/MM MD trajectory were selected to calculate the average excitation energy of PR. The excitation energy distribution of these 100 conformations of PR105D is shown in Figure 5. As shown in Figure 5, the conformations with the predicted excitation energy close to the experimental absorption peak have the largest population. The most probable absorption wavelength is almost equal to the average value, which is in good agreement with the experimental data [12]. Excitation energy distributions for the other nine mutants are given in Supplementary Figure S4 of the Supplementary Materials.

Residue-Based Decomposition of Excitation Energies
Next, we studied the excitation energy contribution from each residue around the chromophore. To avoid interference from embedding charges, we used the GMFCC scheme, which turns off the background charges in each fragment QM calculation, because 1B and 2B QM interactions in EE-GMFCC include many-body environmental effects through electrostatic embedding.
The GMFCC scheme leads directly to the excitation energy contribution from each residue around the chromophore LYR231 by subtracting the single chromophore excitation energy from the two-body (residue-chromophore) excitation energy [48]. Our previous work used the GMFCC method to provide a qualitative prediction of the relative shift that each residue contributes to the excitation energy of the GFP chromophore [13]. Here, we applied the same approach, in which the 2B distance threshold was set to 4 Å between the chromophore and nearby residues. The excitation energy contribution of each residue around the chromophore was calculated by GMFCC based on 100 snapshots from QM/MM MD simulation. The excitation energies were also calculated at the TD-B3LYP/6-31G* level for consistency.
The per-residue decomposition of the average excitation energy of the PR105D protein is shown in Table 2 and Figure 6. Residues spatially close to chromophore had the

Residue-Based Decomposition of Excitation Energies
Next, we studied the excitation energy contribution from each residue around the chromophore. To avoid interference from embedding charges, we used the GMFCC scheme, which turns off the background charges in each fragment QM calculation, because 1B and 2B QM interactions in EE-GMFCC include many-body environmental effects through electrostatic embedding.
The GMFCC scheme leads directly to the excitation energy contribution from each residue around the chromophore LYR231 by subtracting the single chromophore excitation energy from the two-body (residue-chromophore) excitation energy [48]. Our previous work used the GMFCC method to provide a qualitative prediction of the relative shift that each residue contributes to the excitation energy of the GFP chromophore [13]. Here, we applied the same approach, in which the 2B distance threshold was set to 4 Å between the chromophore and nearby residues. The excitation energy contribution of each residue around the chromophore was calculated by GMFCC based on 100 snapshots from QM/MM MD simulation. The excitation energies were also calculated at the TD-B3LYP/6-31G* level for consistency.
The per-residue decomposition of the average excitation energy of the PR105D protein is shown in Table 2 and Figure 6. Residues spatially close to chromophore had the greatest influence on the excitation energy of the protein. ASP97, ASP105, and ASP227 yield the largest blue shifts, whereas MET134, PHE152, and LEU135 show red shifts of PR105D. As shown in Figure 6, the most blue-shifted residue is ASP97, yielding a wavelength shift of −59.5 nm (+267.9 meV), compared to the chromophore alone, whereas the most red-shifted residue is MET134, with a wavelength shift of +6.6 nm (−26.3 meV). Table 2. Excitation energy decomposition per residue based on an ensemble average over 100 conformations. The residues are spatially in close contact with the chromophore and their QM contributions are calculated by GMFCC at the TD-B3LYP/6-31G* level. "Ex" represents the excitation energy in eV. ∆Ex represents the excitation energy difference between 2B (residue + LYR231) and LYR231. ∆WL represents the wavelength difference between 2B (residue + LYR231) and LYR231 in nm.

Res. Name
Ex (

The Local Electric Field along the Retinal
It is known that the polyene chain of the retinal chromophore has considerable chargetransfer character in its lowest excited state, which we confirm below. It is therefore reasonable to expect that point mutations of the protein can modulate the excitation energy of the retinal through changes in the electrostatic field. To test this idea, we investigated the electric field along the polyene chain under different point mutations. A measure of the electric field along the polyene chain is given by [49][50][51]: where E c10−c1 is the electric field induced by protein residues from C1 to C10 on the conjugate chain, which approximates the electric field along the retinal chain (see Figure 2). The term i/ ∈ LYR denotes that charges of residue LYR231 were excluded from the calculation of electric field. q i represents the atomic charge of the ith atom,

The Local Electric Field along the Retinal
It is known that the polyene chain of the retinal chromophore has consider charge-transfer character in its lowest excited state, which we confirm below. It is th fore reasonable to expect that point mutations of the protein can modulate the excita energy of the retinal through changes in the electrostatic field. To test this idea, we in tigated the electric field along the polyene chain under different point mutations. A m ure of the electric field along the polyene chain is given by [49][50][51]: where 10− 1 is the electric field induced by protein residues from C1 to C10 on the jugate chain, which approximates the electric field along the retinal chain (see Figur The term i∉ LYR denotes that charges of residue LYR231 were excluded from the calc tion of electric field. represents the atomic charge of the ith atom, ⃗ denotes the c dinate vector of ith atom, and |⃗ − ⃗ | represents for the distance between atom i atom C10. The 1B excitation energies calculated by the EE-GMFCC method show good corr tion with the average electric field based on the Amber ff14SB force field [32] and po ized protein-specific charge (PPC) [49,52-54] models for the 100 snapshots extracted f The 1B excitation energies calculated by the EE-GMFCC method show good correlation with the average electric field based on the Amber ff14SB force field [32] and polarized protein-specific charge (PPC) [49,[52][53][54] models for the 100 snapshots extracted from the QM/MM MD simulation (see Table 3 and Figure 7). The correlation coefficient (R) between 1B excitation energies and the electric fields from the Amber ff14SB charge model is 0.829, and the equation of linear regression is y = 39.5x − 77.1. In comparison, the correlation coefficient (R) between 1B excitation energies and the electric fields from the PPC charge model is 0.771, and the equation of linear regression is y = 39.7x − 76.8. The correlations between the average electric fields (with the Amber and PPC charge models) and 2B excitation energies calculated by EE-GMFCC are shown in Supplementary Figure S7 Table 3. With the Amber and PPC charge models, the calculated average electric field (in MV/cm) between C10 and C1 (see Figure 2) along the polyene chain of chromophore based on the ensemble average over 100 snapshots from QM/MM MD simulation. Experimental values (eV) of excitation energies are given for comparison. 2B represents the two-body corrected QM excitation energy, and 1B represents the one-body QM excitation energy calculated by EE-GMFCC. (with the Amber and PPC charge models) and experimental excitation energies are shown in Supplementary Figure S8 of the Supplementary Materials. Table 3. With the Amber and PPC charge models, the calculated average electric field (in MV/cm) between C10 and C1 (see Figure 2) along the polyene chain of chromophore based on the ensemble average over 100 snapshots from QM/MM MD simulation. Experimental values (eV) of excitation energies are given for comparison. 2B represents the two-body corrected QM excitation energy, and 1B represents the one-body QM excitation energy calculated by EE-GMFCC.   Figure 8 shows that the HOMO magnitude decreases from C10 to C1, whereas the LUMO magnitude decreases in the reverse direction to that of the HOMO. From natural transition orbital (NTO) analysis, the HOMO and LUMO are the dominant orbitals involved in this excitation. The electron excitation from the HOMO to LUMO of the chromophore thus has considerable charge-transfer character. Based on the direction of charge transfer along the polyene chain, we expect that increasing the environmental electric field along the conjugate chain from C10 to C1 will result in a blue shift of the electronic excitation [43]. Conversely, decreasing the magnitude of the electric field will cause a red shift of the electronic excitation. This expectation is borne out in our results: Figure 7 shows that the largest electric field between C10 and C1, with a value of 22.7 MV/cm (with the Amber force field), is observed for PR105D with a corresponding 1B excitation energy of 2.509 eV, which is larger than PR105L with 19.8 MV/cm. Similarly, the smallest electric field of 13.2 MV/cm is observed for PR105K, which has the smallest 1B excitation energy of 2.315 eV. A higher strength of the electric field corresponds to a higher excitation energy, which yields a blue shift (see Figure 9).  Figure 8 shows that the HOMO magnitude decreases from C10 to C1, whereas the LUMO magnitude decreases in the reverse direction to that of the HOMO. From natural transition orbital (NTO) analysis, the HOMO and LUMO are the dominant orbitals involved in this excitation. The electron excitation from the HOMO to LUMO of the chromophore thus has considerable charge-transfer character. Based on the direction of charge transfer along the polyene chain, we expect that increasing the environmental electric field along the conjugate chain from C10 to C1 will result in a blue shift of the electronic excitation [43]. Conversely, decreasing the magnitude of the electric field will cause a red shift of the electronic excitation. This expectation is borne out in our results: Figure 7 shows that the largest electric field between C10 and C1, with a value of 22.7 MV/cm (with the Amber force field), is observed for PR105D with a corresponding 1B excitation energy of 2.509 eV, which is larger than PR105L with 19.8 MV/cm. Similarly, the smallest electric field of 13.2 MV/cm is observed for PR105K, which has the smallest 1B excitation energy of 2.315 eV. A higher strength of the electric field corresponds to a higher excitation energy, which yields a blue shift (see Figure 9). es 2021, 26, x FOR PEER REVIEW 14 of 20  As shown in Figure 9, the electric fields of PR105D (22.7 MV/cm), PR105L (19.8 MV/cm), PR105V (17.0 MV/cm), and PR105K (13.2 MV/cm) decrease gradually as the excitation energy progressively decreases in the order of 2.509, 2.396, 2.390, and 2.315 eV. As discussed above, there is a certain linear correlation between the electric field and excitation energy. It is worth noting that a prediction of the excitation energy change upon point mutation based on an electric field calculation is much faster compared to the ab initio EE-GMFCC calculations, although the correlation is not perfect in this study. The possible cause of such deviation might arise from the inaccuracy of the charge model used in the electric field calculations. To investigate the effect of the charge model on the electric field calculations of PRs, we tested the PPC charge model for comparison with the Amber ff14SB charge model [32]. The PPC model takes the polarization-induced effect of the protein into consideration by assigning a polarized atomic charge to each atom using the self-consistent RESP [57,58] fitting scheme for each amino acid, with the rest of the protein acting as an electrostatic field. Here, we utilized the PPC method to refit the atomic charges of the protein, and applied the PPC charge model to the PR system for each electric field calculation. Further details of the PPC charge fitting scheme are provided in Ref. [59]. The fitted PPC charges replace the original Amber charges for each atom of the PR systems, and the electric field is calculated using Equation (6). The results of EE-GMFCC-2B and a comparison of performance between Amber and PPC charge models are given in Supplementary Figures  S7 and S8 of the Supplementary Materials. In general, the performance of the PPC model is not superior to the result predicted by the Amber ff14SB charge model, indicating that a more sophisticated method might need to be applied to make electric field-excitation energy correlations for quantitative accuracy for PR systems. As shown in Figure 9, the electric fields of PR105D (22.7 MV/cm), PR105L (19.8 MV/cm), PR105V (17.0 MV/cm), and PR105K (13.2 MV/cm) decrease gradually as the excitation energy progressively decreases in the order of 2.509, 2.396, 2.390, and 2.315 eV. As discussed above, there is a certain linear correlation between the electric field and excitation energy. It is worth noting that a prediction of the excitation energy change upon point mutation based on an electric field calculation is much faster compared to the ab initio EE-GMFCC calculations, although the correlation is not perfect in this study. The possible cause of such deviation might arise from the inaccuracy of the charge model used in the electric field calculations. To investigate the effect of the charge model on the electric field calculations of PRs, we tested the PPC charge model for comparison with the Amber ff14SB charge model [32]. The PPC model takes the polarization-induced effect of the protein into consideration by assigning a polarized atomic charge to each atom using the selfconsistent RESP [57,58] fitting scheme for each amino acid, with the rest of the protein

Conclusions
In this study, we applied the Electrostatically Embedded Generalized Molecular Fractionation with Conjugate Caps (EE-GMFCC) method to predict the excitation energy of PRs. Excitation energies of wild-type PR and its nine mutants were calculated with the EE-GMFCC method at the TD-B3LYP/6-31G* level over hundreds of thermally sampled snapshots from ab initio QM/MM molecular dynamics simulations.
The calculated excitation energies show good correlation with the experimental values of absorption wavelengths despite the fact that the experimental wavelengths among these ten systems vary by less than 50 nm. The correlation coefficient (R) between the EE-GMFCC results (1B + 2B) and experimental values is 0.937. In contrast, the results calculated by EE-GMFCC without two-body QM interaction corrections yield poorer agreement with the experiment. The correlation coefficient given by EE-GMFCC with 1B corrections was merely 0.710, indicating that the residues which are spatially in close contact with the central chromophore have the greatest impact on the excitation energy of the chromophore in PR.
We also utilized the GMFCC method to decompose the excitation energy contributions of residues near the chromophore. The most blue-shifting residue is ASP97, which yields a −59.5 nm (+267.9 meV) wavelength shift in average, whereas the most red-shifting residue is MET134 with a +6.6 nm (−26.3 meV) wavelength shift. The overall spectral shift of the 2B QM correction on PR mutants was small, mainly due to a cancellation between blue-shifting and red-shifting residues.
The calculated excitation energies using the EE-GMFCC method with 1B corrections show good correlations with the predicted average electric field using the Amber and PPC charge models, and the correlation coefficients (R) between them are 0.829 and 0.771, respectively. Predicting the excitation energy change based on the average electric field could be an alternative and efficient approach for the rational design of PRs with tailored photo-physical properties. Overall, our results demonstrate that the EE-GMFCC method is a useful tool for accurately and efficiently predicting the excited-state properties of large biological systems.
In this work, a relatively short 1 ps of trajectory data was analyzed because of the two following problems. First, MD simulations using classical force fields do not fully sample the correct configurations of the retinal structure due to the low accuracy of the force field, and we found that the predicted absorption wavelengths of incorrect retinal conformations deviate substantially from the experimental values; thus, a QM/MM approach was critical. Second, QM/MM MD simulations with more than 100 atoms in the QM region are very time consuming, limiting us to trajectories of 10 ps in length, which left 1 ps of production data following equilibration. Although it would have been ideal to use more uncorrelated points for the computation of the excitation energy, given the close comparison with the experiment and the low fluctuations observed, we think that this sampling is sufficient to showcase our methodology.
It is also worth noting that the 1-ps QM/MM MD trajectory that we use for analysis will not sample long timescale protein fluctuations [60]; however, the good agreement found between theoretical excitation energies and experimental measurements suggests that >1-ps fluctuations have a minor influence on the average absorption energy. This finding is in contrast to solvated chromophores, which can have large couplings between the excitation energy and >1-ps solvation dynamics [61][62][63][64]. The origin of this disparity could be due to a relatively conserved environment of the chromophore in the protein matrix, unlike in solvent, although we cannot fully rule out a fortuitous agreement of our results with experiment. The role of long-time fluctuations of the protein on the excitation energies of the PR chromophore is an interesting open question that we hope to address in the future.