2.1. Virtual Isomerisation Experiments
Experimental data for nine solvents were used to validate the accuracy of the calculated equilibrium constants for the tautomerisation of 1 and 2. Linear relationships were achieved, despite the overestimation of ln(K_T) values. This systematic error is shared with other computational methods of predicting rate constants [28]. It is therefore convenient to normalise the equilibrium constants to assist data visualisation and interpretation. The proportionality achieved between experimental and calculated equilibrium constants (shown in Figure 2) means the latter also correlate with π* or β.
Figure 2 is also annotated with how calculated π* or β values are obtained, whereby a normalised calculated equilibrium constant corresponds to solvent polarity via the virtual free energy relationship equation.
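The mapping from a normalised calculated equilibrium constant to a polarity value can be sketched as a straight-line fit. The following is a minimal illustration only: the min-max normalisation, the function names, and every number are assumptions made for the example, not the procedure or data of this work.

```python
# Illustrative sketch only: all values are synthetic placeholders.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def normalise(value, low, high):
    """Min-max scaling (an assumed normalisation scheme)."""
    return (value - low) / (high - low)

# Hypothetical training solvents: calculated ln(K_T) and experimental pi*.
calc_ln_kt = [2.0, 3.5, 5.0, 6.5, 8.0]
exp_pi_star = [0.00, 0.25, 0.50, 0.75, 1.00]

low, high = min(calc_ln_kt), max(calc_ln_kt)
x_train = [normalise(v, low, high) for v in calc_ln_kt]
slope, intercept = fit_line(x_train, exp_pi_star)

def predict_pi_star(ln_kt):
    """Read a pi* estimate off the fitted line for a new calculated ln(K_T)."""
    return slope * normalise(ln_kt, low, high) + intercept

print(round(predict_pi_star(5.75), 3))
```

The same pattern would apply to β, using the tautomerisation of 2 as the training relationship.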
A noteworthy observation was the in silico recreation of the experimental deviation of acidic solvents from the free energy relationship describing the tautomerisation of 1 (Figure 2a). A greater proportion of the diketo-tautomer exists in an acetic acid solution than anticipated from just the dipolarity (π*) of the solvent. This is due to the protonation of 1, increasing the stabilisation of the diketo-tautomer compared to the enol [20]. This behaviour was validated by an additional virtual experiment in propionic acid (see Supplementary Material).
2.2. Dataset of Calculated Kamlet–Abboud–Taft Solvatochromic Parameters
A dataset of 175 solvents was taken from the work of Marcus [29]. This is the most complete collection of KAT parameters obtained under the same experimental conditions. Estimations of π* (Figure 3a) and β (Figure 3b) were derived from normalised virtual ln(K_T) values using the relationship established by the training datasets (Figure 2). The dipolarity of acidic solvents, including carboxylic acids, phenols, and fluoroalcohols, could not be described for the reason already provided. The π* values of water and perfluorinated alkanes were also overestimated. The accuracy of calculated β values was satisfactory except for amines and other highly basic (β > 0.80) solvents, at which point the model was unrepresentative. In experiment, 2 exists almost entirely (>99%) as its enol tautomer in ethanol (β = 0.75). Solvents with a considerably higher hydrogen bond accepting ability can therefore only minimally increase the enol tautomer concentration. By contrast, the experimental proportion of 1 in its enol form only reaches ca. 50% in low polarity hydrocarbons. It might appear that an experimental limitation is again mirrored by its virtual equivalent. However, the same issue affected the separate approaches developed by Diorazio and Waghorne (discussed in the Supplementary Material) [26,27]. This indicates it is not necessarily the use of the dimedone tautomerisation that restricts the valid range of β predictions, but perhaps a more fundamental problem of acid-base behaviour interfering with hydrogen bonding models.
The mean absolute errors (MAE) of the calculated π*, β, and α values are 0.15, 0.07, and 0.06, respectively (after removing ineligible compounds). An improvement in accuracy was sought, particularly for the prediction of π*. Previously, σ-moments generated by COSMOtherm have been used to estimate the Abraham solute parameters [25]. The affinity of a solvent towards a solute (quantified as chemical potential) can be described as a function of these σ-moments (Table 1) [30]. Although these parameters alone do not directly correlate with the KAT parameters (see Supplementary Material), they can be used to correct estimations of π* and β. It was found that the π* calculation error was proportional to the molecular surface area of the solvent. Similarly, the calculation of β was improved by accounting for the asymmetry of the charge distribution on the surface of a solvent molecule. The correction is sensitive to the chemical functionality of the solvent. Using acyclic ethers as an example, the error of a calculated π* value is corrected with Equation (1) (Figure 4a), and the error of a calculated β value is corrected according to Equation (2) (Figure 4b).
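A single-variable linear correction of this kind can be sketched as follows. This is a hedged illustration rather than a reproduction of Equation (1): the straight-line form with an intercept, the helper names, and all surface areas and π* values are hypothetical placeholders.

```python
# Illustrative sketch only: synthetic data for one solvent class.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(predicted, reference):
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)

# Hypothetical acyclic ethers: (surface area, calculated pi*, experimental pi*).
ethers = [
    (120.0, 0.40, 0.30),
    (140.0, 0.42, 0.28),
    (160.0, 0.44, 0.26),
    (180.0, 0.46, 0.24),
]
areas = [area for area, _, _ in ethers]
calc = [c for _, c, _ in ethers]
exp = [e for _, _, e in ethers]

# Regress the calculation error (calculated minus experimental) on area.
errors = [c - e for c, e in zip(calc, exp)]
slope, intercept = fit_line(areas, errors)

def corrected_pi_star(area, calc_pi_star):
    """Subtract the error predicted from the solvent's surface area."""
    return calc_pi_star - (slope * area + intercept)

corrected = [corrected_pi_star(a, c) for a, c in zip(areas, calc)]
mae_before = mean_absolute_error(calc, exp)
mae_after = mean_absolute_error(corrected, exp)
print(round(mae_before, 3), round(mae_after, 3))
```

For β, the regressor would be a σ-moment describing charge asymmetry instead of the surface area; the fitting step is otherwise unchanged.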
Figure 3a,b compare uncorrected and corrected calculated KAT parameters with experimental values for the entire Marcus dataset, with an increase in predictive accuracy compared to previously described literature methods.
Corrected calculated KAT parameters cannot be obtained for all solvents. For instance, if the number of solvents in the primary dataset with the same functionality was three or fewer, no correction was constructed (e.g., nitroalkanes, because only data for nitrobenzene and nitromethane are available). In addition, solvent types with erroneous uncorrected calculated KAT parameters (e.g., basic solvents such as amines) could not be transformed into valid estimations. A correction was applied to the calculated α values by setting all values below 0.10 to zero, mirroring experimental practices (refer to the Supplementary Material for discussion and the full KAT parameter dataset).
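The two rules in this paragraph (no correction for a functional class with three or fewer members, and α values below 0.10 set to zero) can be expressed compactly. The sketch below assumes a simple name-to-class dictionary; the class labels and the choice of example ethers are illustrative, not the classification used in the dataset.

```python
from collections import Counter

MIN_CLASS_SIZE = 4  # classes with three or fewer members receive no correction

def correctable_classes(class_of_solvent):
    """Return the functional classes large enough to build a correction."""
    counts = Counter(class_of_solvent.values())
    return {cls for cls, n in counts.items() if n >= MIN_CLASS_SIZE}

def clean_alpha(alpha, threshold=0.10):
    """Set calculated alpha values below the threshold to zero."""
    return 0.0 if alpha < threshold else alpha

# Hypothetical class assignments for a handful of solvents.
primary = {
    "diethyl ether": "acyclic ether",
    "dibutyl ether": "acyclic ether",
    "diisopropyl ether": "acyclic ether",
    "methyl tert-butyl ether": "acyclic ether",
    "nitrobenzene": "nitro",
    "nitromethane": "nitro",
}

print(correctable_classes(primary))
print(clean_alpha(0.04), clean_alpha(0.35))
```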
The KAT parameters for a secondary dataset of 23 new solvents were then obtained. The purpose of this exercise was to verify that the correction factors are meaningful to solvents not used to define the proportionality between σ-moments and the error of calculated π* and β values. Some of these solvents were also needed for the subsequent case studies. The introduction of multifunctional compounds in this second compound set increased the prediction difficulty, but the typical error remained acceptable for uncorrected π* (Figure 3c) and β values (Figure 3d). After correction, a marginal improvement to the MAE was achieved. There is an indication that the multifunctional nature of some of these additional solvents is not addressed by the correction factors. Specifically, the correction of β values for this secondary dataset showed a bias not observed in the primary dataset, with relatively large errors remaining large (but slightly reduced on average). The failure to correct large errors may be an anomaly arising from the small dataset, or because the original representation of the solvent surface charges on the more unusual solvent molecules is inaccurate. Assuming the latter, the adequacy of the linear single variable correction factors presently used will be investigated in the future, with non-linear multiparameter models sought.
Calculated α values were more erroneous on average because some solvents in the secondary dataset are capable of intramolecular hydrogen bonds and thus appeared aprotic in their σ-profile. One example is 2-methoxyethanol, which is calculated to form a hydrogen bond resulting in a 5-membered ring structure. This conformer no longer has the ability to donate a hydrogen bond. Higher energy conformations of these solvents not featuring an intramolecular hydrogen bond could be used to obtain a realistic prediction of α (see Supplementary Material).
It was also possible to evaluate the polarity of ionic liquids. Alternative methods are available for the calculation of β and α [31], but not π*. Although we can accurately calculate π* for the first time (Figure 3c) and also β (Figure 3d), the prediction of α failed because the electron density of cations that gives rise to an equivalent interaction to hydrogen bond donation in ionic liquids differs from that of neutral molecular solvents and true hydrogen bonds (see Supplementary Material). The calculation of π* and β is possible for mixtures of solvents including deep eutectic mixtures (data are provided in the Supplementary Material), but it is not possible to simply combine the σ-profiles of each component to determine α. Ionic liquid and deep eutectic mixture data were not subject to corrections.
2.3. Application of Calculated KAT Parameters to Free Energy Relationships
The most effective use of the KAT parameters is the construction of free energy relationships, which can have two purposes. Firstly, in the tradition of physical organic chemistry, information about the mechanism of the chemical process is revealed. Secondly, extrapolation of the free energy relationship predicts the properties of the optimum solvent that will maximise the phenomenon being measured. To establish whether calculated KAT parameters are accurate enough for this purpose, sixteen case studies from the literature were found. This exercise necessitated the use of independently chosen solvents, so it was not possible to avoid solvents with the least accurately calculated KAT parameters. The variable described by each free energy relationship (e.g., ln(k), ΔG°, etc.) was correlated with both experimental and calculated KAT parameters. Typically, the weakest correlations were obtained with uncorrected calculated KAT parameters. After correction, the calculated KAT parameters generally approached the accuracy of the corresponding empirical free energy relationship based on experimental KAT parameters. All the free energy relationships are explained in the Supplementary Material.
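Correlating a measured variable with KAT parameters amounts to a multiple linear regression. The sketch below is one way to do this with the standard library only, assuming a two-parameter relationship of the form ln(k) = c0 + p·π* + a·α; the coefficients and solvent parameters are synthetic placeholders chosen so the fit can be checked by eye.

```python
# Illustrative sketch only: synthetic data generated from known coefficients.

def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_lfer(rows, ys):
    """Least-squares fit of y = c0 + sum(c_i * x_i) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    ata = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    aty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(ata, aty)

# Hypothetical (pi*, alpha) pairs and noise-free ln(k) values.
kat = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.75, 0.2), (0.3, 0.9)]
true_c0, true_p, true_a = -10.0, 4.0, -2.0
ln_k = [true_c0 + true_p * p + true_a * a for p, a in kat]

c0, p_coef, a_coef = fit_lfer(kat, ln_k)
print(round(c0, 3), round(p_coef, 3), round(a_coef, 3))
```

Running the same fit once with experimental and once with calculated KAT parameters gives two sets of coefficients whose predictions can then be compared.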
To illustrate with one of the sixteen case studies, the Menschutkin reaction between 1,2-dimethylimidazole and benzyl bromide (studied by Skrzypczak and Neta, Scheme 1) [32] is accelerated by solvents with high π* values and decelerated by hydrogen bond donating solvents, as represented by the empirical free energy relationship in Equation (3).
They found propylene carbonate provided increased rates of reaction over the conventional solvent acetonitrile. The free energy relationship was computed using experimental (Figure 5a) and calculated (Figure 5b) KAT parameters with Equation (3) to obtain calculated ln(k) values. The corrected calculated KAT parameters, in particular, were able to replicate the solvent effect quantified by the empirical free energy relationship. The dataset of calculated KAT parameters now at our disposal was then screened, solving the free energy relationship for each solvent to find the potentially most beneficial reaction solvents. Out of 198 candidates (175 from the Marcus dataset and the 23 additional solvents), propylene carbonate provided the tenth largest predicted ln(k) and acetonitrile the 50th. After removing those of the top ten solvent candidates that are either solid at the reaction temperature of 21 °C, nucleophilic and thus reactive, or severely toxic, only dihydrolevoglucosenone (Cyrene™) and N-butyl pyrrolidone remain alongside propylene carbonate (Figure 5c). This output is vindicated by existing studies of the Menschutkin reaction (albeit under different reaction conditions), where Cyrene™ [33] and N-butyl pyrrolidone [34], both considered greener alternatives to conventional dipolar aprotic solvents [14], provided greater rates of reaction than acetonitrile (82% and 31% increases in rate constant under their respective reaction conditions).
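The screening step described above (rank by predicted ln(k), then discard unsuitable candidates from the top of the list) can be sketched as below. The coefficients stand in for those of Equation (3) and are placeholders, as are the anonymous solvents "A" to "F" and their properties; none of these values come from the study.

```python
from dataclasses import dataclass

@dataclass
class Solvent:
    name: str
    pi_star: float
    alpha: float
    solid_at_reaction_temp: bool = False
    nucleophilic: bool = False
    severely_toxic: bool = False

def predicted_ln_k(s, c0=-10.0, p=4.0, a=-2.0):
    """Placeholder free energy relationship: ln(k) = c0 + p*pi* + a*alpha."""
    return c0 + p * s.pi_star + a * s.alpha

def screen(solvents, top_n):
    """Take the top_n solvents by predicted ln(k), then drop unsuitable ones."""
    ranked = sorted(solvents, key=predicted_ln_k, reverse=True)[:top_n]
    return [
        s.name
        for s in ranked
        if not (s.solid_at_reaction_temp or s.nucleophilic or s.severely_toxic)
    ]

# Hypothetical candidates with made-up calculated KAT parameters.
candidates = [
    Solvent("A", 1.00, 0.0, severely_toxic=True),
    Solvent("B", 0.90, 0.0),
    Solvent("C", 0.95, 0.0, solid_at_reaction_temp=True),
    Solvent("D", 0.80, 0.0),
    Solvent("E", 0.85, 0.5),
    Solvent("F", 0.50, 0.0),
]

print(screen(candidates, top_n=4))
```

Filtering after ranking, rather than before, keeps the excluded solvents visible in the ranked list so the reason for each rejection can be recorded.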
The success of this model can be compared to cases where density functional calculations have been employed to predict the rate of the Menschutkin reaction and also identify an optimum solvent [28]. After normalising predicted rate constants using experimental data (a necessary step in common with our approach), a 30% increase to the rate constant was predicted in nitromethane compared to acetonitrile (40% observed experimentally). This result is consistent with Abraham’s earlier experimental work on the Menschutkin reaction [36]. By limiting optimisation opportunities to conventional solvents, substitution is effective from a reaction performance perspective but not necessarily desirable with respect to human health. Nitromethane is a suspected carcinogen and may exhibit reproductive toxicity. By contrast, Cyrene™ and N-butyl pyrrolidone have acceptable toxicity hazards and accordingly are licensed for multi-tonne production (in accordance with EU REACH regulation).
2.4. 2-Methyltetrahydrofuran Identified as a Rate Accelerating Solvent in Michael Addition Chemistry
The Menschutkin reaction is an uncomplicated reaction with many kinetic studies available in the literature and a strong history of empirical and computational analysis that makes solvent substitution reasonably straightforward. To provide a sterner test for this new methodology, a new experimental dataset for a reaction with previously unassessed solvent effects was created and used to identify a superior solvent. For this purpose, the kinetics of a Michael addition catalysed by potassium phosphate were measured in six solvents (Scheme 2). 1,4-Addition reactions are widely used in drug discovery and studied in the development of enantioselective catalysis [37]. It was found that the rate of the reaction favoured high β values and small solvent molar volumes (V_M) according to Equation (4).
In this reaction, a hydrogen bond accepting solvent may interact with the conjugate acid of the catalyst to favour the deprotonation of 3 and consequently the formation of 4. The molar volume term is indicative of a bimolecular reaction in which the cavity occupied by the reactants in solution is reduced in size upon forming a single activated complex.
Calculated KAT parameters and predicted molar volumes obtained in COSMOtherm were used to calculate ln(k) from the free energy relationship in Equation (4), which correlated to experimental values with sufficient accuracy to screen for potential new solvents (Figure 6). To increase the rate of this reaction, the required combination of a strongly hydrogen bond accepting solvent that is also a small molecule indicated 2-methyltetrahydrofuran (2-MeTHF) as a good candidate. It is bio-based and has become a popular replacement for traditional ethers in process chemistry [38]. By extrapolating the empirical relationship in Equation (4), the rate of reaction in 2-MeTHF was predicted to be 131% greater than in the previous best solvent (dimethyl carbonate). When using the corrected calculated β value of 2-MeTHF and its calculated molar volume, a rate enhancement of 150% is predicted. Experiment found the actual rate constant to increase by 180%.
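Because the free energy relationship is written in ln(k), a percentage rate enhancement corresponds to a difference in ln(k) between two solvents. The short sketch below shows the conversion in both directions, assuming that "an increase of X%" means a rate constant ratio of 1 + X/100; the function names are illustrative.

```python
import math

def percent_rate_increase(ln_k_new, ln_k_ref):
    """Percentage increase in k implied by two ln(k) values."""
    return (math.exp(ln_k_new - ln_k_ref) - 1.0) * 100.0

def delta_ln_k(percent_increase):
    """Difference in ln(k) implied by a percentage increase in k."""
    return math.log(1.0 + percent_increase / 100.0)

# The three enhancements quoted in the text, expressed as ln(k) differences.
for label, pct in [("empirical", 131.0), ("calculated", 150.0), ("experiment", 180.0)]:
    print(label, round(delta_ln_k(pct), 2))
```

On this scale the three values are separated by roughly 0.1 to 0.2 ln units, which is a more direct measure of how closely the predictions track the experiment than the percentages themselves.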
2.5. Design of a Novel Solvent for the Synthesis of Tetrahydropyridines
The true value of the new method presented in this work lies in predicting the performance of a solvent that has no experimental KAT parameters, before it has even been synthesised. This significantly reduces the time needed for solvent design and selection, which is critical given the regulation of popular solvents is escalating in many global territories. With this goal in mind, a reaction complicated by multiple solvent effects was chosen to demonstrate the capability of this method. The indium(III) chloride catalysed reaction between benzaldehyde, p-anisidine, and 1 forms a highly substituted tetrahydropyridine (5, Scheme 3). The product was isolated by filtration and then recrystallised. Previous reports of this reaction favour acetonitrile and methanol as yield maximising solvents [39,40].
Product yields from equilibrium-controlled reactions can be modelled as a function of solvent polarity [20]. To do so, an apparent equilibrium constant (K’) must be derived for the free energy relationship, and so a judicious choice of reaction conditions was needed to eliminate kinetic effects (provided in the Supplementary Material). Equation (5) is solved by dividing the moles of 5 (m) isolated at time t (i.e., the conclusion of the reaction) by the molar amount of 1 (n) not incorporated into the product (inferred by subtracting the molar yield from the initial quantity of yield-limiting 1). After fitting ln(K’) to the empirical KAT parameters in the same manner as for other free energy relationships, experiment showed the yield increased with greater solvent dipolarity (Equation (6) and Figure 7a).
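The verbal definition of the apparent equilibrium constant can be turned into a small calculation, along with the reverse step of converting a predicted ln(K’) back into a yield. This sketch assumes that one mole of 1 is consumed per mole of 5 formed; the function names and the amounts in the example are hypothetical.

```python
import math

def apparent_equilibrium_constant(moles_product, initial_moles_limiting):
    """K' = m / n, with n the limiting reagent not incorporated into product."""
    remaining = initial_moles_limiting - moles_product
    if remaining <= 0:
        raise ValueError("K' is undefined at or above quantitative yield")
    return moles_product / remaining

def yield_from_ln_k(ln_k_prime):
    """Fractional yield implied by a value of ln(K'), assuming 1:1 stoichiometry."""
    k = math.exp(ln_k_prime)
    return k / (1.0 + k)

# Hypothetical run: 0.60 mmol of product isolated from 1.00 mmol of 1.
k_prime = apparent_equilibrium_constant(0.60, 1.00)
print(round(k_prime, 3), round(yield_from_ln_k(math.log(k_prime)), 3))
```

Under this assumption ln(K’) is the logit of the fractional yield, so a linear relationship between ln(K’) and π* translates into a sigmoidal dependence of yield on π*.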
For a successful reaction, the diketo-tautomer of 1 is required to react with p-anisidine before a Knoevenagel condensation and the final cycloaddition (Scheme 3) [41]. Therefore, polar solvents that increase diketo-tautomer concentrations (Figure 1b) provide higher yields (Figure 7c). The chelation of indium by the enol-tautomer can be considered as a competing and stoichiometric reaction given the high catalyst loading. Equation (6) implies classical dipolar aprotic solvents such as dimethyl sulphoxide (π* = 1.00) will provide the greatest reaction productivity. However, the product is soluble in this category of solvents, allowing the final step of the reaction to become an equilibrium that favours the intermediates (Scheme 3), otherwise avoided if the product precipitates. Furthermore, solvents featuring a carbonyl functionality will be reactive in this case study (e.g., Cyrene™, which failed to produce any product), as will strong nucleophiles.
To design a stable and dipolar solvent that is unable to dissolve the product, the Hansen solubility parameters of the tetrahydropyridine were calculated from experimental solubility data using the HSPiP software (full data are provided in the Supplementary Material). The results suggested the product is not soluble in aliphatic alcohols (found outside the Hansen sphere in Figure 8). The most dipolar alcohol solvents include glycerol and other polyols, but experimental testing resulted in multiple products caused by the acetalisation and ketalisation of the reactants. The reduction of Cyrene™ created an alternative, novel solvent, levoglucosanol. Its predicted π* value is 0.83, corrected to 0.93, which is greater than that of solvents previously shown to produce high yields. This corresponds to a predicted yield of 69% (using the uncorrected calculated π* value of levoglucosanol) or 72% (from the corrected calculated π*). The synthesis of levoglucosanol from Cyrene™ using sodium borohydride in water permitted the determination of an experimental π* value, which was 0.89. With this value, the empirical free energy relationship suggested a reaction yield of 70% would be achieved experimentally. After isolation and recrystallisation, 73% of the theoretical product yield was obtained, exceeding that observed from any of the initial solvent set and in line with predictions (Figure 7).
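Whether a solvent falls inside or outside a Hansen sphere is decided by the standard Hansen distance, Ra² = 4(ΔδD)² + (ΔδP)² + (ΔδH)², compared with the interaction radius R0 of the solute. The sketch below applies that test; the solute centre, the radius, and both solvent parameter sets are hypothetical placeholders, not the values fitted in HSPiP for the tetrahydropyridine.

```python
import math

def hansen_distance(solute, solvent):
    """Hansen distance Ra between two (dD, dP, dH) triples, in MPa^0.5."""
    d_d = solute[0] - solvent[0]
    d_p = solute[1] - solvent[1]
    d_h = solute[2] - solvent[2]
    return math.sqrt(4.0 * d_d ** 2 + d_p ** 2 + d_h ** 2)

def relative_energy_difference(solute, radius, solvent):
    """RED = Ra / R0; values above 1 lie outside the solubility sphere."""
    return hansen_distance(solute, solvent) / radius

# Hypothetical solute sphere (centre and radius).
product_centre = (18.0, 12.0, 8.0)
product_radius = 8.0

# Hypothetical solvents, labelled by type only.
solvents = {
    "alcohol-like": (15.8, 8.8, 19.4),
    "dipolar aprotic-like": (18.4, 16.4, 10.2),
}

for name, hsp in solvents.items():
    red = relative_energy_difference(product_centre, product_radius, hsp)
    verdict = "outside (poor solvent)" if red > 1.0 else "inside (good solvent)"
    print(name, round(red, 2), verdict)
```

For the design goal described here, the desirable candidates are those with RED above 1 for the product combined with a high predicted π*.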