A Computational Protocol Combining DFT and Cheminformatics for Prediction of pH-Dependent Redox Potentials

Discovering new materials for energy storage requires reliable and efficient protocols for predicting key properties of unknown compounds. In the context of the search for new organic electrolytes for redox flow batteries, we present and validate a robust procedure to calculate the redox potentials of organic molecules at any pH value, using widely available quantum chemistry and cheminformatics methods. Using a consistent experimental data set for validation, we explore and compare a few different methods for calculating reaction free energies, the treatment of solvation, and the effect of pH on redox potentials. We find that the B3LYP hybrid functional with the COSMO solvation method, in conjunction with thermal contributions evaluated from BLYP gas-phase harmonic frequencies, yields a good prediction of pH = 0 redox potentials at a moderate computational cost. To predict how the potentials are affected by pH, we propose an improved version of the Alberty-Legendre transform that allows the construction of a more realistic Pourbaix diagram by taking into account how the protonation state changes with pH.


Introduction
The main goal of this contribution is the establishment and validation of a standard procedure for the computational prediction of redox potentials of organic molecules undergoing a protoncoupled electron transfer. One of the main motivations for benchmarking such predictions is the computational screening of the vast chemical space of organic molecules to identify electrolytes for redox flow batteries (RFBs). [1][2][3][4][5][6][7][8][9][10][11][12][13] At some point in any computational workflow for materials discovery, one needs a reliable method to predict with reasonable accuracy and moderate computational cost the properties of an already pre-selected candidate pool. The redox potential is one of the most important properties of redox-active materials. Its pH dependence is especially important in aqueous RFBs where the pH affects the molecules' solubility and electrochemical behavior and can even be exploited to increase the voltage. 14 The main theoretical concepts and the most commonly used computational methods for modeling organic redox materials have recently been summarized by the present authors in a review article. 15 Here, we will not repeat those general concepts and focus instead on how different methods and details of the procedure affect the agreement with experimental data. We will use notation consistent with ref. 15 throughout this article. The experimental redox potentials at pH 0, 7, and 13 reported by Wedege et al. 16 for a set of molecules of the quinone family have been chosen as a consistent data set for validating our computational protocol. It includes molecules with a different number of aromatic rings (i.e., benzo-, naphtho-and anthraquinones) and a diverse range of substituents.

Methods and results
The procedure we present is designed to be integrated in a computational discovery workflow: it takes as input the molecular structures of the oxidized and reduced forms (Ox and Red), typically in a text string format such as SMILES, and outputs redox potentials as a function of pH. It can briefly be summarized as follows: 1) Determine the main protonation state at pH 0 and find the lowest energy conformer of Ox and Red, 2) Calculate the free energies of Ox and Red to obtain the standard redox potential ! , 3) Transform ! to the pH values of interest.
In proton-coupled redox reactions H + is one of the reagents. ! is defined in the thermodynamic standard state where [H + ] = 1 M or pH=0. Redox potentials at other pH values, i.e. at conditions different from the standard state, are described by the Nernst equation. 17 The pH of the solution also determines the protonation state of the reactants and products, and therefore affects their energies and the number of protons exchanged in the redox reaction. The redox potential depends essentially on the reaction free energy in solution (see eqs. (1) and (2)). In principle, the redox potential at any pH could be calculated from the free energies at the protonation state most abundant at that pH. However, at high pH most OH groups tend to be deprotonated so that the major species of hydroxylated reduced quinones can have several negative charges (up to 6 in the set considered here). Such multiple anions can be challenging for the convergence of electronic structure methods since the additional electrons may not be sufficiently stabilized. 15 Moreover, the accuracy of implicit solvation models is known to deteriorate when increasing the charge of the solute. 18 For these reasons it is preferable to compute the redox potential at pH=0 and then transform it to higher pH values 19,20 using expressions based on the Nernst equation such as eq. (5).

Initial guess: protonation and conformer.
Step 1 of the procedure outlined above, whose goal is to determine the most likely molecular structure as an initial guess for the subsequent geometry optimization, is performed with inexpensive cheminformatics and force field based tools. For each SMILES string, the protonation state of the major microspecies at pH=0 is determined using the pKa calculator of the cheminformatics software package ChemAxon. 21 The resulting SMILES string of the pH=0 structure is then converted to a 3D structure using Open Babel, 22,23 and its lowest energy conformation is determined using a conformer search script, part of the AMS software suite, 24 which is based on RDKit 25 and was locally modified by us to use the MMFF94 force field 26 instead of UFF. This modification was necessary because UFF does not correctly identify the lowest energy conformers with intramolecular hydrogen bonds, while MMFF94 does. This is due to a different treatment of electrostatic interactions (the main component of hydrogen bonding) in the two force fields. 27,28 The relationship between intramolecular hydrogen bonding and stability has been reported before. 10,29 Calculation of the redox potential at pH 0. For a general proton-coupled reduction reaction Ox + " + # H $ → Red where electrons and # protons are transferred to the molecule, the standard redox potential is: where n is the number of electrons exchanged, e is the elementary charge and SHE = 4.43 V 30 is the absolute potential of the standard hydrogen electrode. The reduction free energy in solution ∆ sol ! (expressed in eV) can be written as i.e. the difference between the free energies of solvated Red and Ox, balanced by the solvation free energy of # protons, where ∆ s ! (H $ ) = −11.38 eV. 31 The accuracy of the calculated redox potential depends essentially on the electronic structure methods and approximations adopted to compute the free energies in solution. Our goal is to determine which combination of methods performs best against experimental reference data.
Choice of standard method. The most common choice for computing molecular properties at a quantum chemical level is density functional theory (DFT) as it offers reasonably high accuracy at a moderate computational cost. We choose as a standard method the B3LYP hybrid density functional, a very popular choice as a general-purpose method for ground state geometry optimizations. The DFT-D3-BJ dispersion correction 32 is always included unless stated otherwise. We use the ADZP basis set (double zeta with polarization and diffuse functions) as it is a good compromise between size and cost for molecular properties. All DFT calculations were performed with the Amsterdam Density Functional (ADF) software version 2019.302. 33,34 Solvation and thermal contributions to free energy. Quantum chemical methods are often used in conjunction with implicit solvation models to include the effect of the solvent on the molecule's free energy. Here, we use the COSMO 35,36 implicit solvation model as a standard method. The geometry optimization can be performed directly in solution (with COSMO), or a single point COSMO calculation can be performed at the geometry optimized in gas phase. The thermal contribution to the free energy, which includes zero-point energy, vibrational enthalpy and entropy, is obtained from a harmonic vibrational frequency calculation in gas phase at the BLYP/ADZP level, which is much faster than B3LYP, since ADF cannot compute analytical gradients with hybrid functionals. This frequency calculation is done on a geometry optimized with the same method (BLYP in gas phase).
In Figure 1 we report scatter plots of computed vs. experimental redox potentials comparing different levels of approximation. The mean absolute error (MAE) and mean signed error (MSE) of each method are reported in the legend. In Figure 1a we observe that gas-phase electronic energies, neglecting solvation and thermal contributions, yield a rather bad prediction of experimental potentials: data are broadly scattered and there is a significant systematic error. Including the solvation contribution improves the agreement (MAE ~ 0.3 V) and geometry optimization with a solvation model gives slightly better results than in gas phase. The addition of the thermal contribution appears crucial for the prediction of redox potentials: the MAE is significantly reduced to 74 mV and the MSE is very small (6 mV), a negligible systematic error. This method (B3LYP optimization in COSMO + BLYP thermal correction) is the standard against which we will compare other methods. Figure 1b shows that computing the frequencies in gas or in solution has little effect on the thermal contribution, with a slight preference for the gas phase. We then compare two other solvation methods with COSMO. In Figure 1c we show that with SM12 37 single point calculations (optimization is not possible in ADF) at COSMO geometries we obtain a systematic error of 0.1 V. When empirically correcting for this error by subtracting the MSE from the computed potentials, the MAE becomes comparable with COSMO. In Figure 1d we assess the performance of the COSMO-RS method, 38,39 which combines quantum chemistry with statistical mechanics. The solvation free energy obtained with COSMO-RS was added to the gasphase energy. In this case, we also observe a systematic error of about 0.1 V, and when correcting for it, the MAE becomes almost identical to that of COSMO. Comparing electronic structure methods. Once the best set of approximations to the free energy contributions are determined, we turn our attention to the electronic structure method. While B3LYP is probably the most used off-the-shelf density functional for organic molecules, its performance in benchmark studies of molecular properties does not justify its popularity. In a recent benchmark study 40 performed with ADF, its performance for general properties and reaction energies (Figure 3 in ref. 40) is reported to be mediocre. Therefore, we consider it necessary to compare B3LYP with a few different electronic structure methods: the GGA functional BLYP and two of the best performing functionals according to ref. 40: the meta-hybrid M06-2X-D3(0) and the double hybrid rev-DOD-BLYP-D4. We also recalculate the gas phase electronic energies using the composite method G4MP2 implemented in the Gaussian 16 software. 41 Both the double hybrid and G4MP2 include an energy term obtained with the secondorder Møller-Plesset perturbation theory (MP2). As expected, BLYP performs worse than B3LYP (Figure 2a) even when correcting for the −0.2 V systematic error. The potentials obtained with M06-2X-D3(0) and 6-31G(2df,p) basis set with COSMO using Q-Chem 5.2 42 (with thermal contribution obtained in gas phase at the same level of theory) are affected by a 0.1 V systematic error. When correcting for that, the prediction is still slightly worse than B3LYP. With the double hybrid rev-DOD-BLYP-D4 (Figure 2c) we observe a negative systematic error and, after correcting for it, the MAE is not better than B3LYP. Similarly, Figure 2d shows that computing the electronic energies with the G4MP2 composite method and adding the B3LYP solvation free energy yields a systematic error (MSE = 0.17 V); when corrected, the method is not better than B3LYP. Despite expectations, the B3LYP functional in conjunction with COSMO implicit solvation model appears to deliver the most accurate prediction of redox potentials among the several methods we tested. Since this property is essentially an energy difference, we can attribute the small MAE and MSE achieved by this method to fortuitous error cancellations. It should be noted that the choices of the two constants that enter eq. (1), SHE and ∆ s ! (H $ ), obviously affect all data points by the same amount. The discussion about the correct values of these constants can be found elsewhere, [43][44][45] and it is not clear which values are consistent with the COSMO model, but we note that with the values chosen here the B3LYP/COSMO method has reasonably good predicting power without the need for empirical correction of systematic errors.
Limits of implicit solvation model. Now we turn our attention to the outliers. For some of the molecules in the data set, the calculated potential is as much as ~0.4 V away from the experimental value. From the method comparisons in Figures 1 and 2 we observe that these outliers are not affected by the choice of solvation model or electronic structure method. Therefore, we suspect that these large errors are due to the failure of implicit solvation models to capture specific interactions between molecule and solvent, which appear to have a large impact in isolated cases. This hypothesis can be verified by computing solvation free energies of these outliers with explicit solvent molecules. Since explicit solvation methods are time-consuming (both human and computational time) they are not suitable for high-throughput screening. Nevertheless, we select the outlier AQTH14 (for which the redox potential at pH=0 is underestimated by ~0.4 V) to verify if computing the solvation free energy in presence of explicit water molecules improves the agreement with the experimental potential. We adopt a clustercontinuum model as described by Bryantsev et al. 46 and compute the solvation free energies of the oxidized and reduced forms using the monomer cycle shown in Scheme 1 in ref. 46. Under this approximation, the cluster-continuum (cc) solvation free energy of a solute A solvated by n water molecules is where the first term is the COSMO solvation free energy of one water molecule in water, the second term is the free-energy change of one mole of water from 55.34 M (concentration of H2O in water) to 1 M, and the third is the free-energy change of one mole of an ideal gas from 1 atm to 1 M. The latter two corrections ensure that all reactants and products in gas phase and in solution are in the same 1 M standard state, as explained in ref. 46. The free energies sol ! (Red, Ox) in eq.
(2) are then obtained as sol is the sum of the B3LYP gasphase energy and the BLYP thermal contribution.
The literature on cluster-continuum models prescribes 46,47 that n should be increased until the solvation free energy ∆ s 44 (A) converges to a plateau. We manually built several clusters formed by the solute and n water molecules, with the goal of maximizing the number of solute-solvent and solvent-solvent hydrogen bonds. For each value of n, we pre-optimized several cluster geometries using the GFN-xTB semi-empirical method, 48 optimized the most stable ones with DFT and then performed a Boltzmann average of ∆ s 44 (A) over the different clusters. We built clusters for n = 1-9, 10, 12, 14, 16 and 18. The clusters with n = 1-9 were built by placing water molecules on only one side of the molecular plane, and those with n = 10-18 were built by repeating the clusters with n = 5-9 on the other side of the molecular plane, taking advantage of symmetry. Only one cluster was considered for n = 9, 18 respectively since nine waters were found to form a stable 3 by 3 cluster and no other stable geometries were found. As shown in Figure 3, the solvation free energies of both redox forms decrease when increasing n thanks to the solvating effect of the explicit water molecules, but none of them has converged to a plateau yet. The resulting redox potential becomes closer to the experimental value, although not linearly with n, confirming our hypothesis that the error is due to specific solute-solvent interactions missed by implicit solvation (n = 0). The non-linear behavior may be due to the fact that the manually built and optimized clusters do not constitute a sample large enough to achieve a converged average. We anticipate that better sampling, and convergence of the solvation free energy to a minimum, could be achieved by extracting solute-solvent clusters from a molecular dynamics simulation at finite temperature, but such undertaking is beyond the scope of this study.

Transformation to higher pH values.
When electron transfer is coupled to proton transfer, the redox potential depends on the pH. The transformation of the pH=0 redox potential to higher pH values can be achieved by modifying the free energies of Red and Ox according to the Alberty-Legendre transform: 20,49 sol +7 = sol ! + 7 (10)pH, where sol ! is the free energy at pH=0, 5 is the number of hydrogen atoms at pH=0, R is the Rydberg constant and T is the temperature. In contrast to equation 4.4-10 in ref. 49, here we neglect the dependence on the ionic strength. At a given pH the redox potential U is directly proportional to the free energy difference ∆ sol #5 and the slope of the U vs pH curve at 298.15 K will therefore be (10) = 0.059 V, multiplied by the number of protons that are transferred to the molecule at pH 0 C 5 (Red)− 5 (Ox)D . This approximation, under which the slope is constant and independent of pH, neglects the fact that the redox species will be deprotonated at pH > pK , .
We have refined this method by updating the slopes of sol #5 (Red) and sol #5 (Ox) at each of the pK , values obtained from ChemAxon. 21 In practice, we construct approximate Pourbaix diagrams for sol #5 (Red) and sol #5 (Ox) and consequently for #5 . For each redox form, we evaluate sol #5 ! for a set of pH values pH 6 = {0, … , pK ,7 , … , pK ,2 , … ,14} which includes any pK , values in the range (0,14). sol ! is given by eq. (2) and for pH 6 > 0 we use the formula The number of hydrogens is that of the major microspecies predicted by ChemAxon at a pH value pH 8,9: = pH 6"7 + 1/2(pH 6 − pH 6"7 ), i.e. halfway between the current and the previous. This ensures that the slope of the segment pH 6"7 − pH 6 always depends on the molecule's updated protonation state, including cases where pH 6"7 is one of the pK , s.
The free energies computed with either eq. (5) or (6) enter eq. (2) and the potential as a function of pH is then obtained from eq. (1). The resulting Pourbaix diagrams of all molecules are shown in Error! Reference source not found.. In Figure 4 we compare the experimental potentials with those computed at pH=0 and those transformed to pH=7 and 13 using eq. (5) and (6). At pH=7 there is not much difference between the two methods, because almost all molecules are in the same protonation state as at pH=0, i.e. there are no pK , s below 7. On the other hand, at pH=13 most molecules have undergone some deprotonation leading to a smaller slope of the U vs pH curve. As a result, using eq. (5) leads to a systematic underestimation of the potentials (MSE = −0.137 V). The adjustment of the slope using eq. (6) corrects this systematic error (MSE = −0.016 V) and significantly improves the agreement with the experimental potentials.
Considering the low computational cost of obtaining the pK , s and protonation states using ChemAxon, the proposed procedure to transform pH=0 potentials using eq. (6) should become a standard method for predicting redox potentials at any pH value. It is of course also possible, instead of transforming the pH=0 potentials, to compute the pH=7 and 13 potentials directly with DFT using the same procedure as for the pH=0 potential: 1) Determine the major protonation state at a given pH using ChemAxon; 2) Find lowest energy conformer; 3) Compute free energies of Ox and Red to determine the redox potential with eqs.
(1) and (2). pH=. Since most molecules in our data set have the same protonation state at pH 0 and 7, we only show the potentials obtained directly at pH=13. As shown in Error! Reference source not found., this method leads to large underestimation of the redox potentials, that we attribute to consistent failure of the implicit solvation model to capture the full solvation free energies, especially of the reduced forms which have larger negative charges. This result confirms that, as discussed earlier, it is much more accurate to compute free energies at pH=0 protonation states and then transform them to the pH of interest.

Summary and conclusion
In this contribution we have benchmarked a procedure to predict redox potentials of organic molecules from first principles. Taking as the input the SMILES strings of the reduced and oxidized forms, we show that widely available cheminformatics tools can be used to determine the main protonation state at pH=0 and the major conformer. Two DFT calculations for each redox form are then needed to compute the redox potential: one geometry optimization with COSMO implicit solvation and one geometry optimization with subsequent frequency calculation in gas phase. The former includes a reasonable estimate of the solvation free energy, and the latter provides an estimate of the thermal contribution to the free energy. Among several electronic structure methods tested (including composite methods), the B3LYP hybrid density functional was found to yield the best agreement with a consistent set of experimental potentials (MAE = 0.074 V and MSE = −0.006 V without empirical corrections). It may seem surprising that B3LYP, which is the most popular but not particularly new or generally accurate functional, performs better for redox potentials than other methods reported to perform best in other more comprehensive benchmarks. The very popular combination of B3LYP with double-ζ polarized basis sets (6-31G* or similarly ADZP) is believed to be the cause of the error cancellations (errors due to the functional and to the basis set incompleteness are similar and of opposite sign) 50 that make it reasonably accurate in many cases, especially when including a dispersion correction such as D3. 51 In a few cases the potential prediction was found to be up to 0.4 V off the experimental value. By employing a mixed explicit-implicit solvation model on one of these outliers, we have verified that such errors are very likely due to failure of the implicit solvation model to capture specific solvent-solute interactions. These exceptional cases cannot be predicted from the outset. In the near future, significantly more effort should be dedicated to the establishment of rapid protocols for computing reasonably good solvation free energies with explicit solvation. Such protocols that overcome the limitations of implicit solvation could then be used in high-throughput screening studies where more accurate prediction of redox potentials is desirable.
Finally, we presented a novel and computationally inexpensive procedure to transform the pH=0 redox potentials computed with DFT to any other pH value. Using the pK , s predicted by the ChemAxon cheminformatics software, 21 we construct an accurate estimation of the Pourbaix diagram that takes into account the changes in slope of the U vs. pH curve due to deprotonations of both redox forms. The potential predictions at pH=13 obtained with this variable slope approach correct the systematic underestimation resulting from the constant slope transformation that considered only the pH=0 protonation states. pH=. The MAE with respect to experimental potentials were found to be 0.121 V at pH=7 and 0.144 V at pH=13 with our new method.
In summary, we have established and validated a procedure to predict redox potentials of organic molecules at a modest computational cost, using standard DFT methods and widely available cheminformatics tools, that can be easily implemented in high-throughput screening studies. Redox potentials can be predicted with MAE below 0.15 V over a wide range of pH.