An Accurate Approach for Computational pKa Determination of Phenolic Compounds

Computational chemistry is a valuable tool, as it allows for in silico prediction of key parameters of novel compounds, such as pKa. In the framework of computational pKa determination, the literature offers several approaches based on different level of theories, functionals and continuum solvation models. However, correction factors are often used to provide reliable models that adequately predict pKa. In this work, an accurate protocol based on a direct approach is proposed for computing phenols pKa. Importantly, this methodology does not require the use of correction factors or mathematical fitting, making it highly practical, easy to use and fast. Above all, DFT calculations performed in the presence two explicit water molecules using CAM-B3LYP functional with 6-311G+dp basis set and a solvation model based on density (SMD) led to accurate pKa values. In particular, calculations performed on a series of 13 differently substituted phenols provided reliable results, with a mean absolute error of 0.3. Furthermore, the model achieves accurate results with -CN and -NO2 substituents, which are usually excluded from computational pKa studies, enabling easy and reliable pKa determination in a wide range of phenols.


Introduction
Phenols are a class of odorous compounds mainly present in plant essential oils that is attracting increasing scientific and applicative interest. Their use is extremely widespread, ranging from antimicrobials and antivirals to cancer treatments and antioxidants also adopted in fuels [1,2]. Native compounds are exploited in the biomedical field for their ability to interact with cell membranes, provoking cell death without triggering a specific pathway [3][4][5]. This behavior plays a fundamental role, as it often minimizes pharmacological resistance. Because antibiotic resistance is major cause of death every year [6], the possibility of developing new compounds combining the potentially flawless natural properties of phenols with tailored structural modification is compelling. To this end, computational chemistry is pivotal, offering the possibility of designing molecules and predicting their physicochemical properties [7]. It prevents time-and cost-consuming experiments, suggesting easily screenable sets of data. Computational calculation of acid-base dissociation constants (pKa) is among the most embraced methods [8,9].
The literature includes many strategies proposed to calculate pKa through computational models. The most common methodologies are the thermodynamic cycles theory and the direct approach [10]. Considering the dissociation equilibrium of a generic acid, HA: using the thermodynamic cycle theory, pKa determination requires the estimation of the Gibbs free energy of the acid-base equilibrium in the gas phase (∆G g ) and the solvation free energy for each species involved in the equilibrium (∆∆G solv ) [11]. Although accurate, this model is time-consuming and laborious. Conversely, the direct approach is based on the calculation of the Gibbs free energy of the acid-base equilibrium in solution (∆G sol ) [12] without requiring gas-phase calculations [11]. Although the direct method depends on different variables, such as the level of theory, the molecules involved in the equilibrium and the adopted solvation model, it has shown to be as accurate as the thermodynamic cycle theory [13]. However, both models require the free energy of the solvated proton, which cannot be calculated using DFT methods because H + has no electrons [13,14]. Therefore, a wide range of experimental values (between −252.6 and −271.7 kcal·mol −1 ) [11] is usually adopted, although this method is highly questioned [12]. Alternatively, correction factors and linear regressions are commonly used to obtain reliable results, leading to relatively accurate data [15][16][17][18].
To further improve precision in pKa calculation, in the last two decades, the inclusion of explicit water molecules in the first solvation shell has been proposed. Using water as the solvent in a continuum model, ∆pKa (i.e., pKa calc -pKa exp ) values were still higher than 1 unit [17]. Thus, the use of correction factors can be used to reduce the gap between the experimental pKa and the calculated value [17].
In order to overcome issues related to the use of experimental values for proton solvation free energy, the acid-base dissociation equilibrium can be rewritten as Equation (2) [18][19][20]: where n is the number of explicit water molecules.
In this model, the number of charged species is conserved on both sides of the equation; thus, errors in calculations are reduced [11]. This equation [20] allows for the use of continuum solvation theories with zero to several (n) explicit water molecules to shape a solvation cage around the compounds involved in the equilibrium. In this case, correction factors ensure reliable pKa values.
Taking into account the numerous models proposed in the literature, the aim of this work is to offer a practical, accurate and ready-to-use protocol for computational pKa determination of phenol derivatives through a direct approach that does not require correction factors or mathematical fitting [18,19]. A screening of functional and solvation models is thus proposed to accurately compute pKa. A series of differently substituted phenol derivatives was selected on the basis of their biological, pharmaceutical, industrial and synthetic interest in order to validate the proposed methodology.

Results and Discussion
In silico pKa determination requires investigation of a wide set of conditions [21]; in general, DFT calculations are preferred to ab initio calculations, owing to their lower computational and time costs. In this work, we screened multiple functionals in order to identify the most suitable model in terms of reliability and computational cost. B3PW91 was selected, owing to its reliability based on exchange-and gradient-corrected correlation functionals [22]. B3LYP was chosen because it is a versatile functional in modelling small and medium-sized organic molecules [23], and it uses 20% Hartree-Fock (HF) exchange. CAM-B3LYP is based on B3LYP but it better depicts long-range interactions fundamental in describing hydrogen bonds [24]. wB97XD includes a long-range correction based on the empirical dispersion term of van der Waals interactions [25,26]. Hence, such theories adequately describe hydrogen bonds and electrostatic interactions in water solutions [25]. Likewise, 6-311G+dp was selected as a basis set for its accuracy and reliability in describing the behavior of first-and second-row elements, with an acceptable computational cost [24]. Solvation continuum models were chosen based on their isotropic properties. In particular, SMD, CPCM and IEFPCM consider water a continuum dielectric field, but they differ in terms of solvent cavity description. Truhlar and colleagues defined SMD as a continuum solvation model effective for charged and neutral compounds in any solvent in which just a few parameters are known or required [23]. CPCM and IEFPCM are similar from a common end-user point of view, although IEFPCM has a higher computational cost [27].
Thus, pKa was calculated according to the dissociation equilibrium reported in Equation (2) [19,20]. Reaction energy (∆E dep ) was estimated as follows: where E A− , E H2O , E OH− and E HA are the electronic energies of each species (eventually calculated in the presence of explicit water molecules). After obtaining ∆E dep , it is possible to estimate the pKa according to the following equation: [28] pKa = ∆E dep /2.302RT + 15.74 (4) Preliminary studies were performed on phenol and thymol as leading compounds. pKa values were calculated according to equation 4, without adding explicit water molecules. The geometry was optimized for each functional in the vacuum, followed by the calculation of electronic energy in a continuum solvent using the specific solvation models. Table 1 lists ∆pKa values as pKa calc -pKa exp . Results show that with all analyzed functionals and solvation models, pKa values are largely under-or overestimated; thus, ∆pKa values are considerably higher than 1 pKa unit, which is the acceptable threshold value for studies of this sort. The accuracy of data seems not to be affected by the complexity of the theory or by the dielectric polarizability of the models. It is not possible to highlight a general trend. The worst values were obtained by applying the SMD model with B3LYP and B3PW91, whereas ∆pKa values with CPCM and IEFPCM are slightly better. CAM-B3LYP and wB97XD exhibited an opposite trend, as ∆pKa values obtained with SMD are lower than those obtained with the other solvation models.
To improve the model, one and two water molecules were made explicit for each species involved in the equilibrium. A general improvement in ∆pKa values was observed, including one explicit water molecule (Table 2). In particular, all ∆pKa values were less than 4 units and, with B3LYP and CAM-B3LYP in SMD, ∆pKa values of less than 1 were successfully obtained. Furthermore, for each functional, the SMD solvation model led to satisfactory values for acid dissociation constant, with a ∆pKa slightly higher than 1.
The addition of two explicit water molecules led to a further improvement in the system. Specifically, ∆pKa values lower than 1 were obtained with SMD (Table 3). B3LYP and CAM-B3LYP showed improved results, leading to a pKa calc value close to the experimental value for phenol, with a ∆pKa slightly higher than zero.  Figure 1 depicts the dependence of ∆pKa on the number of explicit water molecules for phenol; an increase in the number of water molecules results in a decrease in ∆pKa values to approximately zero. Therefore, no further investigations were performed with additional explicit water molecules.
B3LYP in SMD, ΔpKa values of less than 1 were successfully obtained. Furthermore, for each functional, the SMD solvation model led to satisfactory values for acid dissociation constant, with a ΔpKa slightly higher than 1.
The addition of two explicit water molecules led to a further improvement in the system. Specifically, ΔpKa values lower than 1 were obtained with SMD (Table 3). B3LYP and CAM-B3LYP showed improved results, leading to a pKacalc value close to the experimental value for phenol, with a ΔpKa slightly higher than zero.  Figure 1 depicts the dependence of ΔpKa on the number of explicit water molecules for phenol; an increase in the number of water molecules results in a decrease in ΔpKa values to approximately zero. Therefore, no further investigations were performed with additional explicit water molecules.  A well-finished cavity was obtained with SMD, which is recommended for calculation of the ΔG of solvation. In all cases, an ordered H-bonded closed network was observed around the −OH group [29,30]. As previously reported [12,17], the position of water molecules influences pKa; however, in this work, the optimized geometry with the minimum of energy (thus the most stable one) was selected to compute pKa.  A well-finished cavity was obtained with SMD, which is recommended for calculation of the ∆G of solvation. In all cases, an ordered H-bonded closed network was observed around the −OH group [29,30]. As previously reported [12,17], the position of water molecules influences pKa; however, in this work, the optimized geometry with the minimum of energy (thus the most stable one) was selected to compute pKa. Figure 3 shows the influence of water molecules in the electronic distribution on the potential electronic map (PEM). As expected, the presence of explicit solvent affects the electronic distribution, especially for phenate, for which the negative charge is delocalized. Such representation realistically describes what happens in solution, where water molecules stabilize the negative charge through H-bond interactions.
To substantiate such methodology, a set of differently substituted phenols was selected. Theoretical pKa was determined using the best-performing of the previously tested computational methods, i.e., 1H 2 O/CAM-B3LYP/SMD, 2H 2 O/CAM-B3LYP/SMD, 2H 2 O/CAM-B3LYP/PCM, 2H 2 O/CAM-B3LYP/CPCM and 2H 2 O/B3LYP/SMD. With such functionals and solvation models, ∆pKa values lower than 0.3 were obtained for phenol, in addition to reliable results for thymol. Thus, an assorted range of phenolic compounds bearing at least two alkyl substituents with varying steric hindrance and phenols substituted with electron-donating or electron-withdrawing groups was selected (i.e., halogens, methoxy, cyano and nitro groups). Importantly, halogens, methoxy, cyano and nitro substituents were exclusively included at position 4 of the phenols to avoid intramolecular hydrogen bonding at the reaction center [16].   To substantiate such methodology, a set of differently substituted phenols was selected. Theoretical pKa was determined using the best-performing of the previously tested computational methods, i.e., 1H2O/CAM-B3LYP/SMD, 2H2O/CAM-B3LYP/SMD, 2H2O/CAM-B3LYP/PCM, 2H2O/CAM-B3LYP/CPCM and 2H2O/B3LYP/SMD. With such functionals and solvation models, ΔpKa values lower than 0.3 were obtained for phenol, in addition to reliable results for thymol. Thus, an assorted range of phenolic compounds    To substantiate such methodology, a set of differently substituted phenols was selected. Theoretical pKa was determined using the best-performing of the previously tested computational methods, i.e., 1H2O/CAM-B3LYP/SMD, 2H2O/CAM-B3LYP/SMD, 2H2O/CAM-B3LYP/PCM, 2H2O/CAM-B3LYP/CPCM and 2H2O/B3LYP/SMD. With such functionals and solvation models, ΔpKa values lower than 0.3 were obtained for phenol, in addition to reliable results for thymol. Thus, an assorted range of phenolic compounds bearing at least two alkyl substituents with varying steric hindrance and phenols  Table 4 shows the experimental and calculated pKa values of the selected compounds, and the ∆pKa as the difference between experimental and calculated pKa, the mean absolute error (MAE) and standard deviation (std. dev.) for each computational method are reported in Table 5. Calculations performed with 1H 2 O/CAM-B3LYP/SMD led to satisfactory results. In particular, ∆pKa < 1.1 units (MAE = 0.72) was obtained for all dialkyl-substituted phenols, which is considered acceptable for theoretical pKa determinations. Satisfactory results were also achieved with halogens and methoxy substituents. However, considerable ∆pKa values were obtained with strong electron-withdrawing groups, such as -CN and -NO 2 , which significantly affect the electron density on the phenol ring. Conversely, good to excellent results were obtained with two explicit water molecules. In particular, the SMD solvation model achieved ∆pKa < 0.2 units for all dialkyl-substituted phenols, except for thymol, the pKa of which was overestimated, at 0.9 units. 4-Bromo-and 4-chlorothymol (4bromo-2-isopropyl-5-methylphenol and 4-chloro-2-isopropyl-5-methylphenol, respectively) pKa were satisfactorily calculated with all solvation models, and SMD was suitable for the accurate determination of the pKa of phenols with complex substituents, such as -CN and -NO 2 groups. In particular, the pKa of -NO 2 -substituted phenols was slightly underestimated (0.6 units), whereas with 4-hydroxybenzonitrile, a ∆pKa = 0.3 was obtained.
Such remarkable results strengthen the applicability of this method [31][32][33]. Cyano and nitro substituents are rarely reported in computational studies on pKa determination and are usually associated with a high ∆pKa [34,35]. Thus, the simulation of their electronic effect is challenging. With all the tested computational methods, with the exception of 2H 2 O/CAM-B3LYP/SMD, pKa calc values for phenols bearing -CN and -NO 2 groups were largely underestimated (Tables 4 and 5). On the contrary, 2H 2 O/CAM-B3LYP/SMD led to accurate results without the introduction of correction factors or mathematical regressions, as typically observed in the literature [16,28,29]. In addition, for all analyzed compounds, 2H 2 O/CAM-B3LYP/SMD led to the lowest MAE (0.39) and std. dev. (0.29), with maximum positive and negative deviations of 0.89 and −0.60, respectively, which is in the range of acceptable values for theoretical pKa determinations. Figure 4 shows the relationship obtained between pKa calc and pKa exp using such a computational method; the reported slope (1.019) is close to the ideal value (i.e., 1.000 for pKa calc = pKa exp ) [16], highlighting the accuracy of such a model, without requiring correction factors.
for thymol, the pKa of which was overestimated, at 0.9 units. 4-Bromo-and 4chlorothymol (4-bromo-2-isopropyl-5-methylphenol and 4-chloro-2-isopropyl-5methylphenol, respectively) pKa were satisfactorily calculated with all solvation models, and SMD was suitable for the accurate determination of the pKa of phenols with complex substituents, such as -CN and -NO2 groups. In particular, the pKa of -NO2-substituted phenols was slightly underestimated (0.6 units), whereas with 4-hydroxybenzonitrile, a ΔpKa = 0.3 was obtained. Such remarkable results strengthen the applicability of this method [31][32][33]. Cyano and nitro substituents are rarely reported in computational studies on pKa determination and are usually associated with a high ΔpKa [34,35]. Thus, the simulation of their electronic effect is challenging. With all the tested computational methods, with the exception of 2H2O/CAM-B3LYP/SMD, pKacalc values for phenols bearing -CN and -NO2 groups were largely underestimated (Tables 4 and 5). On the contrary, 2H2O/CAM-B3LYP/SMD led to accurate results without the introduction of correction factors or mathematical regressions, as typically observed in the literature [16,28,29]. In addition, for all analyzed compounds, 2H2O/CAM-B3LYP/SMD led to the lowest MAE (0.39) and std. dev. (0.29), with maximum positive and negative deviations of 0.89 and −0.60, respectively, which is in the range of acceptable values for theoretical pKa determinations. Figure 4 shows the relationship obtained between pKacalc and pKaexp using such a computational method; the reported slope (1.019) is close to the ideal value (i.e., 1.000 for pKacalc = pKaexp) [16], highlighting the accuracy of such a model, without requiring correction factors.  the presence of the electron-withdrawing substituent, strengthening the H bond between the −OH group and the first water molecule (H-bond length < 1.67 Å) but reducing the availability of the oxygen lone pair. Therefore, the second water molecule is more than 4 Å from the −OH group, preventing an H bond. Unexpectedly, an open cage is also energetically favored in the case of 4-methoxyphenol, being ca. 1.24 kcal⋅mol −1 more stable than the ordered H-bonded cage.

Computational Method and Data Analysis
DFT calculations were performed using Gaussian 16 rev. A.03. [36]. All compounds were geometrically optimized in vacuum and in continuum solvent. Calculations were performed using wB97XD, B3PW91, B3LYP and CAM-B3LYP functionals with 6-311G+dp basis set. For calculations in solution, the following continuum solvation models were used: SMD (solvation model based on density), CPCM (conductor-like polarizable continuum model) and IEFPCM (integral equation formalism polarizable continuum model). For each functional and basis set, electronic energy was collected in the presence of zero, one or two explicit water molecules [37]. The starting geometries for the hydrated molecules were based on chemical intuition, as proposed by Cunningham and colleagues [17], and drawn using GaussView 6.0 software. All calculations parameters were set up for geometry optimization (true minima) of each species, avoiding negative frequencies. pKa values were computed at 298.15 K. Optimized cartesian coordinates are reported in Supp. Info.

Experimental pKa Determination
Buffer solutions with different pH values were prepared using varying KH 2 PO 4 , K 2 HPO 4 and K 3 PO 4 concentrations. The pH of each solution was checked with a glass electrode. A 10 −2 M stock solution was prepared in methanol for the analyzed phenols (4-bromothymol and 4-nitrothymol). A volume of 30 µL of the stock solution was diluted in 3 mL of the buffer solutions at varying pH values, and the UV-vis absorption spectra were recorded. For all the tested compounds, the final concentration in the cuvette was 10 −4 M. For data analysis, the absorbance values were reported as a function of pH at the appropriate wavelength; then, sigmoidal fitting was used to obtain the inflection point ( Figures S1-S2), pKa exp (4-bromothymol) = 9.92 ± 0.04; pKa exp (4-nitrothymol) = 7.38 ± 0.06.

Conclusions
In this work, we developed an easy-to-use method based on the direct approach for computing pKa without introducing correction factors. Phenol and thymol were used as leading compounds of a set of molecules with various substituents on the aromatic ring. Four DFT functionals, three solvation models and a range of explicit water molecules (from zero to two) were compared.
For all analyzed phenols, 2H 2 O/CAM-B3LYP/6-311G+dp/SMD led to accurate results, with a mean absolute error of 0.37, which is an acceptable value for theoretical pKa determinations. In addition, this methodology achieved reliable results with nitro and cyano substituents, which are usually associated with very high ∆pKa values, proving that this method can be applied to a wide range of substituted phenols. Results were obtained without the introduction of any correction factors or mathematical regressions, making this approach a valid and accurate method for the direct calculation of the pKa of natural and synthetic phenols.

Data Availability Statement:
The data presented in this study are available upon request from the corresponding author.