Simulating Absorption Spectra of Flavonoids in Aqueous Solution: A Polarizable QM/MM Study

We present a detailed computational study of the UV/Vis spectra of four relevant flavonoids in aqueous solution, namely luteolin, kaempferol, quercetin, and myricetin. The absorption spectra are simulated by exploiting a fully polarizable quantum mechanical (QM)/molecular mechanics (MM) model, based on the fluctuating charge (FQ) force field. Such a model is coupled with configurational sampling obtained by performing classical molecular dynamics (MD) simulations. The calculated QM/FQ spectra are compared with the experiments. We show that an accurate reproduction of the UV/Vis spectra of the selected flavonoids can be obtained by appropriately taking into account the role of configurational sampling, polarization, and hydrogen bonding interactions.


Introduction
Flavonoids belong to the family of polyphenolic secondary metabolites, which are widely found in natural products, such as vegetables and fruits [1]. In particular, their structure derives from phenylchromene, which after being hydroxylated or methoxylated in different positions leads to the different flavonoid compounds [2]. Flavonoids have attracted much interest, due to their biochemical and antioxidant effects which might be beneficial to treat different diseases such as cancer, Alzheimer's disease, and atherosclerosis [3][4][5]. Thanks to their antioxidant and anti-inflammatory properties, together with their ability to inhibit enzyme functions, a large number of flavonoid compounds have been used in a plethora of medicinal, pharmaceutical, and cosmetic applications [6][7][8]. Recently, flavonoids have been proposed as potential drugs for therapeutics against coronavirus disease 2019 (COVID-19) [9][10][11].
Flavonoids also protect plants thanks to their ability to filter the UV radiation, indicating unique optical properties [27]. The latter are very sensitive to the environment, for instance flavonoids' maximum UV/Vis absorption occurs at different wavelengths when they are dissolved in different solvents. However, flavonoids natural environment is water, which is ubiquitous in both vegetables and fruits [28][29][30]. In this work, we model the absorption spectra of Luteolin (L), Kaempferol (K), Quercetin (Q), and Myricetin (M) in aqueous solution (see Figure 1 for their molecular structures) [28]. Luteolin is a yellow dye, which can be derived from the plant Reseda luteola [31]; kaempferol can

Computational Protocol
In order to investigate the electronic properties of the selected molecules dissolved in aqueous solution, we rely on a well established multi-step computational protocol [35,36]: 1. Definition of the system. The geometry of each of the four flavonoids depicted in Figure 1 was first optimized by resorting to an implicit Polarizable Continuum Model (PCM) [40] description of the solvent, and then surrounded by a number of randomly-placed water molecules large enough to represent the solvation shell. 2. Classical MD simulations. An equilibration (NPT) and a subsequent production (NVT) runs were performed in order to sample the system under study. In particular, MD production runs were carried out for each of the four molecules for a time long enough to obtain an accurate sampling of the phase space, in order to correctly reproduce all possible system configurations and their relative energy.
In order to recover the directionality of hydrogen bonding (HB) interactions, we also placed off-site charges (the so-called virtual sites (VS) or dummy atoms) to better describe the lone pairs of the Oxygen atom (see Figure 2 for a graphical representation) [44,65,66]. Therefore, two different classical MD simulation runs were performed for each molecule, i.e., with or without the inclusion of VS (MD VS or MD noVS , respectively). From MD runs, a set of snapshots was extracted to be used in QM/FQ calculations.
3. Definition of the different regions of the two-layer QM/FQ scheme and their boundaries. For each snapshot extracted from MD runs, a sphere centered on the solute was cut. The radius of the droplet was chosen to retain specific solute-water interactions. 4. QM/FQ calculations and comparison with experimental data reported in [28,67].
QM/FQ excitation energies calculations were performed on the set of structures obtained for the four molecules at step 2 of the protocol. The results obtained for each spherical snapshot were then extracted and averaged to produce the final spectrum, which was compared with experiments.
All QM calculations were performed using a locally modified version of the Gaussian16 suite [68]. In all instances, the B3LYP functional in combination with the 6-311+g(d,p) basis set were employed. All MD runs were performed by using the GROMACS package [69]. The GAFF force field was adopted to describe both intramolecular and intermolecular interactions of the solutes [70]. The TIP3P force field was used to describe water molecules [71]. In order to refine the solute electrostatic interactions, RESP charges [72] were computed at the B3LYP/6-311 + g(d,p) level of theory. Electrostatic interactions were treated by using the particle-mesh Ewald (PME) method with a grid spacing of 0.16 Å and a spline interpolation of order 4 [73]. The cross-interactions for Lennard-Jones terms were calculated using the Lorentz-Berthelot mixing rules and intramolecular interactions between atom pairs separated up to three bonds were excluded.
A single molecule was dissolved in a cubic box containing at least 2500 water molecules. In MD VS , the position of virtual sites was determined by performing a molecular orbital localization by means of the Boys procedure [74]. In particular, a pair of VS was assigned to each Oxygen atom (green spheres in Figure 2). The charge of each oxygen atom was uniformly split between its virtual sites.
The molecular systems were initially brought to 0 K applying the steepest descent minimization procedure and then heated to 298.15 K in an NVT ensemble using the velocity-rescaling [75] method with an integration time step of 0.1 fs and a coupling constant of 0.1 ps for 200 ps. A 500 ps long NPT run was performed, using the Parrinello-Rahman barostat and a coupling constant of 1.0 ps, to obtain a uniform distribution of molecules in the box and for thermalization purposes. Finally, 30 ns long MD production runs were carried out in the NVT ensemble, with a 0.1 fs time step.
A snapshot every 100 ps was extracted in order to obtain a total of 300 uncorrelated snapshots for each system. From each snapshot, a solute-centered sphere with radius 15 Å was cut. For each obtained droplet, the solute excitation energies were computed using the polarizable QM/FQ model [35,36,42]. The QM portion was treated at the B3LYP/6-311+g(d,p) level. The FQ SPC parametrization proposed by Rick et al [59]. was exploited. All computed absorption spectra were convoluted with a Gaussian band shape with a Full Width at Half Maximum (FWHM) of 0.37 eV, and then averaged to obtain the final spectrum.

MD Analysis
The analysis of each MD trajectory was performed by using the TRAVIS package [76]. Two different observables are presented and discussed: the radial distribution function (RDF) and the dihedral distribution function (DDF). Then, hydration patters defining HB interactions are discussed for the atomic sites highlighted in Figure 2. All results are commented for both MD noVS and MD VS .

Conformational Analysis Based on MD Simulations
DDFs were extracted for four dihedral angles (α, β, γ, δ, see Figure 2 for their definition) from both MD VS and MD noVS . α, β, and γ represent the dihedral angle between the molecular plane and the selected hydroxyl groups, whereas δ describes the relative orientation of the phenyl group with respect to the main molecular plane (see Figure 2). The α, β, γ DDFs for Myricetin M as obtained from both MD noVS and MD VS are reported in Figure 3. The same DDFs for the other flavonoids are reported in Figures S1-S3 in the Supplementary Materials (SM).
The α DDF depicted in Figure 3 is centered at 0 and 180 degrees, with a range of variability of about 50 degrees. This indicates that the hydroxyl group involved in the definition of α presents a high rotation freedom. In addition, all studied molecules present similar α DDFs (see Figures S1-S3 in the SM), showing that the different number of OH groups on the phenyl moiety does not affect α conformational stability. The β and γ DDFs are characterized by a sharp distribution centered at zero degrees. Such a low rotational freedom of the selected hydroxyl groups is due to the intramolecular interaction which is established with the carboxyl Oxygen atom (O3, see Figure 2). In addition, the narrower spreading of β as compared to γ can be attributed to the highest stability of the six-term ring with respect to the five-term ring formed for the two angles, respectively. Similarly to α, the distributions of both β and γ angles are not affected by the different number of OH groups in the phenyl moiety (see Figures S1-S3 in the SM). As a final comment on α, β, and γ, we note that a qualitatively analog conformational analysis can be obtained by using both MD noVS and MD VS (left and right panels of Figure 3). In Figure 4, δ DDFs for all studied molecules are graphically depicted. The distribution of δ dihedral angle gives insight on the planarity of the molecules because it represents the relative orientation of the two main portions of the system. We first note that δ DDFs for K, Q, and M are not characterized by a main preferential angle, and two different configurations can be identified, centered at ±60 and ±120 degrees. Therefore, for such systems, the two main portions never lie on the same plane. Such a consideration is valid for both MD noVS and MD VS . A different picture arises for L δ DDF.

MD noVS MD VS
In particular, in case of MD noVS , the same preferential angles reported for the other molecules can be identified; however, the planar configuration is also sampled. When a more accurate description of the directionality of molecule-water interactions is taken into account by including virtual sites in the MD (MD VS ), the δ DDF has two main peaks centered at about ±60, thus showing largest molecular stiffness. Therefore, the absence of the OH group involved in the definition of γ dihedral angle significantly affects the rigidity of the molecule.

Hydration Pattern
All studied flavonoids are characterized by several hydroxyl groups, which can strongly interact with the surrounding water molecules via HBs. In order to study hydration patterns in aqueous solutions, both MD noVS and MD VS were analyzed by extracting RDFs between flavonoids' Oxygen atoms and water Hydrogen atoms (Hw). Myricetin RDFs for all interacting sites are depicted in Figure 5. We focus on this system because it is characterized by the largest number of hydroxyl groups; similar data for Luteolin, Kaempferol, and Quercetin are collected in Figures S4-S6 in the SM. All RDFs reported in Figure 5 show a peak at about 1.8 Å, of which the intensity varies for the different oxygen atoms. In particular, in most cases, a strong HB interaction is present (high intensity peak), whereas, for O6, a weaker interaction is reported in both MD noVS and MD VS . Such a behavior can be explained by considering that the O6 hydroxyl group is involved in HB interactions with both O5 and O7 hydroxyl groups. This is confirmed by the fact that O6 RDFs computed for the other flavonoids do not show any significantly lower intensity with respect to RDFs related to the other Oxygen atoms (see Figures S4-S6 in SM).
The inclusion of off-site VSs has two main effects: the intensities of the RDF first peaks increase and their maxima are located at shorter r distances, thus indicating that a stronger solute-solvent HB interaction is described. The only notable exception is O3; however, such a behavior is not unexpected because O3 is involved in intramolecular HB between O2 and O4 hydroxyl groups. The inclusion of VSs better describes such an interaction, without affecting solute-solvent HBs.
To further analyze hydration patterns, we also computed the number of water molecules interacting with each flavonoid molecule, by extracting from MD runs the running coordinating number (RCN), see Table 1. The latter is the integral of the RDF first peak, and is related to the average number of water molecules in the first solvation shell interacting with the selected oxygen site. We first notice that, for all molecules, each oxygen site is bonded to water by at least one HB interaction; the only notable exception is O6 of myricetin, thus further confirming the comments above. Overall, the number of HBs is larger for MD VS than MD noVS . This is again not surprising, and it is due to the fact that the inclusion of VSs allows for a refined, and more physically consistent, description of flavonoid-water HB interactions. In addition, a different picture arises for O3; in this case, a decrease in the number of hydrogen-bonded water molecules is reported for all the studied molecules. In fact, the inclusion of VSs leads to a stronger intramolecular HB interaction, with a consequent decrease of intermolecular HBs with water molecules.

Excitation Energies
QM/FQ excitation energies were calculated on 300 uncorrelated snapshots extracted from MD trajectories; such a number was chosen to assure the convergence of the final spectra. All calculations were performed at the TD-B3LYP/6-311 + G(d,p) level, and the first 20 excited states were computed. QM/FQ raw data recovered from each set of snapshots are reported as stick spectra in Figure 6 for the case of MD noVS runs. As it can be noticed, a large variability both in band intensities and energies is predicted for all flavonoids, similarly to what has already been reported for other systems dissolved in aqueous solution [77][78][79]. Similar stick spectra were obtained from snapshots extracted from MD VS (see Figure S7 in the SM). Stick data were then convoluted by using a Gaussian function (with FWHM = 0.37 eV), and averaged to obtained the final computed UV/Vis spectra, which are also depicted in Figure 6.
The convoluted spectrum of each flavonoid is characterized by two main bands. The first is placed at about 3.4 eV (364 nm) for K, Q and M, whereas for L it is blueshifted by about 0.15 eV (349 nm); the second band is centered in the region 4.7-5 eV (264-248 nm) for all systems. For K, Q, and M, the spectrum is dominated by the second peak, whose intensity is almost twice that of the first band. A different situation is reported for L, for which the two bands have almost equal intensities. The first band at about 3.4-3.6 eV (364-344 nm) is due to a pure HOMO-LUMO transition, whereas the second band results from a combination of several excitations. In order to investigate the nature of the first electronic transition, the involved molecular orbitals (MOs) are plotted in Figure 7. We see that the first excitation can easily be identified as a π − π * transition. In particular, we note that the carbonyl Oxygen atom (O4) takes part into the electronic excitation. Therefore, the differences between L and the other flavonoids reported for the first transition can be ascribed to the fact that in K, Q, and M, O3 is hydrogen bonded to the O4 hydroxyl group (see Figure 7, right), whereas in L O3 is free to form a (weak) HB interaction with the surrounding water molecules. As a consequence, the π − π * excitation is redshifted by about 0.15 eV for K, Q, and M. Figure 7 also reports excitation energies of the first electronic transition for both MD noVS and MD VS . Moving from K to Q and M, a redshift is described by both MDs. This behavior can be explained by considering that the number of hydroxyl groups increases moving from K to M. As a consequence, a larger number of HB sites with the surrounding water molecules are involved, yielding a redshift which is typically observed for π − π * transitions [38,[80][81][82][83]. Such a redshift is not present in Q and M because the bridge OH group in M is only weakly involved in HB interactions with the solvent (see Table 1). We also note that MD VS always predicts higher excitation energies than MD noVS . This is not surprising and it is once again related to the reduced number of water molecules hydrogen bonded to O3 (see Table 1). 3 3.5 As mentioned above, a large spreading in both energies and intensities is reported in stick spectra depicted in Figure 6. In order to investigate the origin of such a variability, we studied how the excitation energy depends on the dihedral angles previously analyzed (see Section 2.2.1). In particular, Figure 8 reports computed excitation energies of the first transition as a function of the δ dihedral angle (see Figure 2 for its definition). Similar plots for α, β, and γ are given in Figures S8-S10 in the SM. We see that the computed values are clusterized around the most populated dihedral angles for both MD runs (see Figure 4). In addition, independently from the value of the dihedral angle, computed energies span the region between 2.9 and 3.9 eV. This demonstrates that the origin of the large variability reported in the stick spectra is the spatial arrangement of water molecules around the solutes. We finally move to the comparison between computed and experimental spectra, the latter reproduced from Refs. [28,67]. Such a comparison is shown in Figure 9. The experimental absorption spectrum of each molecule is characterized by two main peaks, the first lying at about 3.4 eV (365 nm) for K, Q, and M and 3.6 eV (345 nm) for L. The second peak is placed in the region 4.7-5 eV, and has a different shape for each flavonoid. In particular, for L and M, two peaks are present, probably due to vibronic coupling [84][85][86]. In addition, for L and Q, the first peak is the most intense one, whereas the opposite holds for K. For M, the intensities of the two bands almost coincide.
The agreement between QM/FQ and experimental spectra is excellent for all flavonoids, especially if QM/FQ is combined with MD VS runs. In fact, the main difference between MD VS and MD noVS results is the relative intensity of the two band. For K, Q, and M, MD noVS underestimates the intensity of the first transition with respect to the experimental findings. Therefore, a refined modeling of intermolecular interactions by means of off-site VSs seems to be crucial to achieve an accurate reproduction of the general shape of the experimental spectra. The only notable exception is L, for which the experimental second band lies in between MD noVS and MD VS . However, the overall agreement between the experimental and computed spectra is not compromised.  Figure 9. Computed absorption spectra of each studied flavonoid in aqueous solution obtained by using QM/FQ coupled with both MD VS and MD noVS . Experimental UV/Vis spectra reproduced from Refs. [28,67] are also reported.

Summary and Conclusions
In this work, UV-Vis absorption spectra of four flavonoids, namely luteolin, kaempferol, quercetin, and myricetin, dissolved in aqueous solution have been simulated by exploiting a fully polarizable QM/FQ approach combined with MD simulations, performed by either including or discarding off-site VSs. For all investigated molecules, MD noVS and MD VS sample the configurational space in a similar way, whereas a different description of solute-solvent HB interactions arises. This behavior is not unexpected and it is due to a refined description of HB interactions obtained by including VSs in MD runs.
The QM/FQ spectrum of each flavonoid has been computed on a set of uncorrelated snapshots extracted from both MD trajectories. The observed differences between the various systems have been discussed in light of solute-solvent HB interactions arising from MD runs, showing that they significantly affect computed spectra. In particular, the presence of a larger number of potential HB sites, as in case of Q and M as compared to L and K, produces a redshift of the first electronic excitation, which has a π − π * nature. Finally, the comparison between computed and experimental spectra shows that MD VS provides the best agreement, probably due to better configurational sampling and HB description.
The agreement with experiments, in particular for luteolin, might be increased by exploiting polarizable force fields (instead of a standard fixed-charges one) in MD simulations, i.e., by performing a so-called polarizable MD simulation [87,88]. In this way, a more physically consistent way of describing the evolution of solute-solvent interactions could be obtained. In addition, off-site charges introduced in MD VS runs might be included in polarizable QM/MM calculations by resorting to polarizable force fields defined in terms of both fluctuating charges and fluctuating dipoles, similarly to what has recently been developed by some of the present authors [58,89,90]. Finally, in this work, the QM-MM coupling has been limited to the account of electrostatic solute-water interactions. The inclusion of non-electrostatic terms, i.e., Pauli repulsion and dispersion, might improve the agreement between computed and experimental data, in light of similar studies recently reported by some of us [91,92].