Tautomerism in Azo and Azomethyne Dyes: When and If Theory Meets Experiment

The performance of 26 hybrid density functionals was tested against a tautomeric dataset (TautData), containing experimental information for the keto-enol tautomeric equilibrium in 16 tautomeric azodyes and Schiff bases in cyclohexane, carbon tetrachloride and acetonitrile. The results have shown that MN12-SX, BHandH and M06-2X can be used to describe the tautomeric state of the core structures in the frame of ~0.5 kcal/mol error and correctly predict the tautomeric state in respect of dominating tautomeric form. Among them MN12-SX is the best performer, although it fails to describe the nonplanarity of some of the enol tautomers. The same experimental dataset was used to develop and test a special DFT functional (TautLYP) aimed at describing the tautomeric state in azo- and azomethyne compounds in solution when nonspecific solvents are used.


Introduction
Prototropic tautomerism [1][2][3][4][5][6][7][8][9][10][11][12] is one of the most important phenomena in organic chemistry despite the relatively small proportion of molecules in which it can occur. Tautomers are the chameleons of chemistry, capable of changing by a simple change of environment from an apparently established structure to another, then back again when the original conditions are restored. The importance of the tautomerism and proton transfer [12] in the life science, drug design and technology makes it vital to know and/or to predict which tautomer is the major one, since not only the structure, but chemical properties are bound up with this.
According to the classification of dyes, more than half of them are azo compounds and a substantial part are tautomeric or potentially tautomeric [13][14][15][16]. Compound 1 (Scheme 1), one of the first tautomeric compounds discovered [17] and studied almost one and half century [18], is a typical example. Experimental and theoretical investigations of the tautomerism have a solid practical reason, because the color and the stability of the azodyes are strongly influenced by their tautomerism [19]. The Schiff bases are another important class of dyes exhibiting interesting photo-and thermochromic properties and bioactivity, again, in most of the cases, related to their tautomerism [20][21][22].
While the potentially tautomeric structures can be easily distinguished either theoretically or experimentally [23], the study of real tautomeric equilibria in solution is a challenge. On one side, the real equilibrium from the viewpoint of the detection limits of analytical chemistry means Gibb's free energy in the range roughly from −2 to +2 kcal/mol, which is a challenge for quantum chemistry even now [24]. On the other side, the exact estimation of the ∆G values (derived from the tautomeric constant K T , defined as keto/enol, [K]/[E] ratio, Scheme 2), requires isolated tautomeric forms, a condition that is impossible in most of the cases. A major breakthrough in the UV-Vis spectroscopy was the development of chemometric methods to obtain the absorption spectra of the individual tautomers, even though these are never present in their pure form, and then to estimate the tautomeric constants as function of a number of external parameters (temperature, irradiation, solvents, pH, and concentration) [25,26].
The affordable accuracy of quantum chemistry in respect of correct prediction of tautomeric equilibria depends on one side on the molecular size and particularly on the total number of atoms, and on the other, on the requirement to take into account the effect of the solvent. Currently molecules with up to 10 atoms can be very accurately studied by coupled cluster theory, similar to 100 atoms with second order Moller-Plesset perturbation theory (MP2), similar to 1000 atoms with DFT and beyond this number with semiempirical quantum chemistry and force-field methods [28]. The previous results obtained by using MP2 for tautomeric azonaphthols have shown that this level of theory very strongly overestimates the enol tautomer irrespective of the used basis set [27]. This along with the computational affordability directs the possible solution into the DFT field. There is a While the potentially tautomeric structures can be easily distinguished either theoretically or experimentally [23], the study of real tautomeric equilibria in solution is a challenge. On one side, the real equilibrium from the viewpoint of the detection limits of analytical chemistry means Gibb's free energy in the range roughly from −2 to +2 kcal/mol, which is a challenge for quantum chemistry even now [24]. On the other side, the exact estimation of the ΔG values (derived from the tautomeric constant KT, defined as keto/enol, [K]/[E] ratio, Scheme 2), requires isolated tautomeric forms, a condition that is impossible in most of the cases. A major breakthrough in the UV-Vis spectroscopy was the development of chemometric methods to obtain the absorption spectra of the individual tautomers, even though these are never present in their pure form, and then to estimate the tautomeric constants as function of a number of external parameters (temperature, irradiation, solvents, pH, and concentration) [25,26]. The availability of such experimental data has given us the opportunity to check the reliability of the quantum-chemical methods in predicting the position of the tautomeric equilibria in selected azonaphthols and related Schiff bases in solution [24,27]. The results for these dyes and for the tautomeric compounds in general, clearly indicate that there is no universal reliable theoretical solution and the predicted tautomeric shift, often wrong, depends strongly on the theoretical background. The affordable accuracy of quantum chemistry in respect of correct prediction of tautomeric equilibria depends on one side on the molecular size and particularly on the total number of atoms, and on the other, on the requirement to take into account the effect of the solvent. Currently molecules with up to 10 atoms can be very accurately studied by coupled cluster theory, similar to 100 atoms with second order Moller-Plesset perturbation theory (MP2), similar to 1000 atoms with DFT and beyond this number with semiempirical quantum chemistry and force-field methods [28]. The previous results obtained by using MP2 for tautomeric azonaphthols have shown that this level of theory very strongly overestimates the enol tautomer irrespective of the used basis set [27]. This along with the computational affordability directs the possible solution into the DFT field. There is a substantial number of datasets used for benchmarking and/or density functional development in the field of thermochemistry, kinetics, and noncovalent interactions [29][30][31][32][33]. To our best knowledge, no The affordable accuracy of quantum chemistry in respect of correct prediction of tautomeric equilibria depends on one side on the molecular size and particularly on the total number of atoms, and on the other, on the requirement to take into account the effect of the solvent. Currently molecules with up to 10 atoms can be very accurately studied by coupled cluster theory, similar to 100 atoms with second order Moller-Plesset perturbation theory (MP2), similar to 1000 atoms with DFT and beyond this number with semiempirical quantum chemistry and force-field methods [28]. The previous results obtained by using MP2 for tautomeric azonaphthols have shown that this level of theory very strongly overestimates the enol tautomer irrespective of the used basis set [27]. This along with the computational affordability directs the possible solution into the DFT field. There is a substantial number of datasets used for benchmarking and/or density functional development in the field of thermochemistry, kinetics, and noncovalent interactions [29][30][31][32][33]. To our best knowledge, no attempts have been made to develop a density functional that describes reasonably well the relative energies of substantial set of tautomeric compounds. The obvious reason is that there is no way for systematic improvement of the DFT functionals performance, especially in the absence of experimental data to test the theoretical development.
Therefore, the aim of the current communication is to introduce a tautomeric dataset (TautData), containing reliable and precise experimental data for the tautomeric equilibrium of basic dye structures in solution (Schemes 1 and 3) and to test how the most promising density functionals, MP2 and coupled cluster methods predict the tautomerism of these compounds. In addition, the available experimental information has been used to develop a cheap solution for correct description of the tautomerism of the dyes listed in Scheme 1. The reliability of the new tautomeric functional (TautLYP) is compared with the others in a large collection of dyes derived from the structures in Scheme 1 (Scheme 3).
(TautData), containing reliable and precise experimental data for the tautomeric equilibrium of basic dye structures in solution (Schemes 1 and 3) and to test how the most promising density functionals, MP2 and coupled cluster methods predict the tautomerism of these compounds. In addition, the available experimental information has been used to develop a cheap solution for correct description of the tautomerism of the dyes listed in Scheme 1. The reliability of the new tautomeric functional (TautLYP) is compared with the others in a large collection of dyes derived from the structures in Scheme 1 (Scheme 3). Scheme 3. Substituted compounds derived from 1, 4 and 5.

Results and Discussion
The compounds used for benchmarking and DFT functional development (Scheme 1) represent core structures for azo-and corresponding azomethyne dyes [15,34]. Compound 1 is one of the first tautomeric dyes discovered. Compounds 2 and 3 are the core structures of the major class (>60%) of the industrial azodyes (neutral and their metal complexes). Compounds 4 and 5 exhibit interesting photochemical and thermochemical properties in solution and solid state [20,35]. Their tautomeric behavior in solution is very different. In 2-5 the existing intramolecular hydrogen bonding limits the effect of the solvent on the shift of the tautomeric equilibrium. As a result, the tautomeric constants can be changed by change of the solvent in a relatively narrow range. The specific effect of the solvent is also different; the strength of the intramolecular hydrogen bonding in 4 and 5 limits the effect of the proton acceptor solvents, while the same bond in 2 and 3 is weak and can be easily broken [36,37]. The proton acceptor/donor sites in the tautomers of 1 are accessible for the solvents, which allows for a large scale of the tautomeric constant change [38]. In 6, due to aromatic reasons, the tautomeric equilibrium is shifted almost fully to the keto form [13,39,40]. The structural effects are also different; substitution on para position of the phenyl ring in the azodyes leads to a shift of the tautomeric equilibrium towards the enol form when electron donor group is implemented and vice versa. The effect of the substitution in the Schiff bases is limited due to the substantial nonplanarity of the phenyl ring in respect of the overall conjugated system [41].
Keeping in mind that as a rule the solvent effect on the tautomeric equilibrium is specific, experimental data obtained in cyclohexane are used for the benchmarking in the current study. In this way it is assumed that the implicit solvation describes the effect of the solvent. The corresponding ΔG values at room temperature for compounds 1-6 are listed in the last row of Table  1. The positive values indicate a more stable enol form and vice versa. It is clearly seen that the set of 1-6 covers the whole range of changes, from 1.42 in 4 to −1.31 in 6, which along with the variety of structural features discussed above, makes the selection of dyes very suitable for benchmarking. The experimental data are compared in Table 1 to the predicted relative stabilities of the tautomers generated by variety of hybrid GGA and meta-GGA density functionals. Their order of appearance depends on the statistical data of their performance presented in Table 2. The selection of the Scheme 3. Substituted compounds derived from 1, 4 and 5.

Results and Discussion
The compounds used for benchmarking and DFT functional development (Scheme 1) represent core structures for azo-and corresponding azomethyne dyes [15,34]. Compound 1 is one of the first tautomeric dyes discovered. Compounds 2 and 3 are the core structures of the major class (>60%) of the industrial azodyes (neutral and their metal complexes). Compounds 4 and 5 exhibit interesting photochemical and thermochemical properties in solution and solid state [20,35]. Their tautomeric behavior in solution is very different. In 2-5 the existing intramolecular hydrogen bonding limits the effect of the solvent on the shift of the tautomeric equilibrium. As a result, the tautomeric constants can be changed by change of the solvent in a relatively narrow range. The specific effect of the solvent is also different; the strength of the intramolecular hydrogen bonding in 4 and 5 limits the effect of the proton acceptor solvents, while the same bond in 2 and 3 is weak and can be easily broken [36,37]. The proton acceptor/donor sites in the tautomers of 1 are accessible for the solvents, which allows for a large scale of the tautomeric constant change [38]. In 6, due to aromatic reasons, the tautomeric equilibrium is shifted almost fully to the keto form [13,39,40]. The structural effects are also different; substitution on para position of the phenyl ring in the azodyes leads to a shift of the tautomeric equilibrium towards the enol form when electron donor group is implemented and vice versa. The effect of the substitution in the Schiff bases is limited due to the substantial nonplanarity of the phenyl ring in respect of the overall conjugated system [41].
Keeping in mind that as a rule the solvent effect on the tautomeric equilibrium is specific, experimental data obtained in cyclohexane are used for the benchmarking in the current study. In this way it is assumed that the implicit solvation describes the effect of the solvent. The corresponding ∆G values at room temperature for compounds 1-6 are listed in the last row of Table 1. The positive values indicate a more stable enol form and vice versa. It is clearly seen that the set of 1-6 covers the whole range of changes, from 1.42 in 4 to −1.31 in 6, which along with the variety of structural features discussed above, makes the selection of dyes very suitable for benchmarking. The experimental data are compared in Table 1 to the predicted relative stabilities of the tautomers generated by variety of hybrid GGA and meta-GGA density functionals. Their order of appearance depends on the statistical data of their performance presented in Table 2. The selection of the functionals is related to their popularity, use in investigation of tautomeric systems and previous benchmarking of 1 [18] and 1-3 [27].    It seems from the data that no one of the existing density functionals fits the experiment perfectly. Linearity, shown in Table 1, can be used for monitoring whether the general tendency of predicting is correct or not. Basically, the perfect situation should have correlation and slope tending to 1 and a zero intercept. Some of the functionals give highly correlated results (M06-2X, BHandHLYP, ωB97X), while in other cases the predictions scatter (M06-HF, TPSS, O3LYP). The description the tautomeric state of the studied compounds is quite unrealistic in the case of HF and MP2. In most of the cases the slope tends to 1, which with the relatively good correlations indicates that corresponding functional correctly describes the relative structural aspects of the tautomerism (SOGGA11-X for instance). The intercept roughly shows if the theory overestimates one or another tautomer in the tautomeric mixture; most of the functionals overestimate the stability (negative slope) of the keto tautomer, some of them very strongly (CAM-B3LYP and below). The statistical data in Table 2 follows the major conclusions up to now. Generally it is very difficult to write a clear conclusion about tendencies; the best behaving density functionals belong to different classes (GH/RSH mGGA/GGA), so there is no clear relation with the HF exchange and even functionals belonging to the same family (BHandH, BHandHLYP, B3LYP) behave in a different way. It is obviously that MN12-SX, BHandH and M06-2X can be used to describe the tautomeric state of 1-6 in the frame of~0.5 kcal/mol error and correctly predict the tautomeric state in respect of dominating tautomeric form (i.e. positive/negative ∆E). In the frame of 1 kcal/mol precision SOGGA11-X, M-11 and BHandHLYP could be also used. However, this is a general statistical conclusion which should be taken with care. If BHandHLYP is considered one example, it predicts molar fraction of the keto tautomer as 33, 75, 90, 29, 61, 99 (in %, calculated from the corresponding ∆E values in Table 1)  It is obvious from Tables 1 and 2 that TautLYP describes the experimental dataset much better compared to the next best performers, which is not surprising since it is fitted against exactly this experiment. Therefore, it is needed to check it and the others' reliability against different datasets. For this purpose, structural data (bond length and dihedral angles) of 1 and 6 will be used along with equilibrium data for the dyes from Scheme 3.
Currently, structural data for the enol and keto tautomers of 1 are available in the literature [18] along with data for the fixed enol tautomer (OMe compound) of 6 [69]. In the case of compounds 2-5 even if crystallographic data are available the structures contain contribution of both tautomers [70,71], which makes the structural information unsuitable for benchmarking.
The structural data along with the predicted results for the density functionals from TautLYP to BHandHLYP are collected in Table S1, Supplementary information. In Table 3 the statistical evaluation of the performance of the functionals is given.
The results indicate that all studied DFT functionals describe the bond lengths reasonably well. The differences are very slight (except in the case of BHandH) to speculate about best performers. In respect of the angles description two distinguished groups are seen, depending on the non-planarity description. MN12-SX, BHandH and M11 predict the enol form planar, which reflects in substantially higher MAE and MAPE. In respect of the total MAPE TautLYP and M06-2X are the best performers.
Experimental data in CCl 4 and acetonitrile for the tautomeric state of the core structures and their substituted compounds are collected in Table 4. As in the case of cyclohexane, the solvents have been selected to avoid specific solute-solvent interactions and to assume implicit solvation model. The performance of the best functionals from Table 2 is compared with that of B3LYP in Table 5. The statistical results are based on the data from Table S2. It is seen that TautLYP predicts the experiment slightly better comparing to MN12-SX, but the information from Table 3 should also be taken into account for the final decision which of them to use. Comparing to B3LYP both have MAPE in orders better.  (6) * according to the crystallographic data the Ph and naphthalene rings in the enol forms of 1 and 6 are twisted in respect of the N=N group (see Table S1 and Scheme S1 for more information). Availability of CCSD(T) data in the gas phase makes possible to compare the results with these of TautLYP. Of course, there is a difference in the linear equations of TautLYP in cyclohexane (Table 1) and in the gas phase ( Table 6) in respect of the offset. The gas phase equation gives systematic predominance of the enol tautomers, which is reasonable because the higher dipole moment of the keto forms, giving difference in the implicit solvation model in cyclohexane, is not accounted. Keeping in mind that for the CCSD(T) calculations MP2 optimized geometry is used, in Table 6 the results for TautLYP and TautLYP//MP2 are presented. Although the linearity parameters are almost the same there is difference in the prediction of the ∆E values for the azo compounds in the frame of~0.2 kcal/mol, which is a result from the increased twisting of the phenyl ring in the MP2 optimization geometry. In the case of 4 and 5 where the angles predicted by TautLYP and MP2 are near, the difference in the ∆E values is small. Comparing MP3//MP2 and MP4//MP2 with TautLYP, it is seen that the former strongly overestimates the keto tautomer, while the latter has relatively good prediction ability. The overall impression for CCSD(T) is that the enol form is overestimated, especially in the case of 6. A comparison of the CCSD(T) and TautLYP//MP2 predicted ∆E values shows that they are almost identical for 1, strongly different in 6 and well correlated in the case of 2-5 (with a difference of 0.9-1.1 kcal/mol in respect of the enol stabilization by CCSD(T)). Further studies of the performance of TautLYP by using the standard databases for DFT development, as well as comparison with CCSD(T) calculations in solution, could shed light on the applicability of this density functional for general purposes and for predicting the equilibrium state of other tautomeric systems in solution. The calculations are in progress and the results will be reported later.

Materials and Methods
In total, 35 compounds (1-6 and derived from them) were synthesized, purified and characterized according to the standard procedures [14,15]. For each of them the protocol for determination of the tautomeric constants as described previously was applied [26,72]. The protocol includes UV-Vis spectra measurements followed by a chemometric data processing [72]. To make the data comparable, all spectra were measured at the same instrumental conditions (scan speed, detector output and spectral slit) on Jasco V570 UV-Vis-NIR double beam spectrometer (JASCO International Co. Ltd., Tokyo, Japan) equipped with a thermostatic cell holder (Huber MPC-K6 thermostat with 1 • C precision (Peter Huber Kältemaschinenbau AG, Offenburg, Germany)) in spectral grade solvents (cyclohexane and acetonitrile, both Multisolvent grade from Sharlau (Scharlab, Barselona, Spain)) at 25 • C. The obtained spectra of each compound were processed using resolution of overlapping bands in order to get the molar fractions of the keto and enol tautomers and their spectral characteristics in the corresponding solvent. The chemometric protocol is described in detail, together with examples in [72]. The tautomeric molar fractions allow for calculation of the tautomeric constant K T (defined as [K]/[E]) and hence, evaluation of the corresponding ∆G value, which is finally included in TautData.
For the aims of benchmarking, in the TautData dataset are finally included only precise ∆G values in cyclohexane and acetonitrile, obtained based on molar fractions of the enol tautomer in the frame ±5%. The data for CCl 4 are taken from the literature and are obtained previous by us using the same protocol [36,41,73,74]. Keeping in mind that they are obtained at different time, the precision might be lower comparing to these in cyclohexane and acetonitrile due to slightly different experimental conditions.
The quantum chemical calculations in current study were performed by using Gaussian 09 Rev. D.01 program suite [75]. In all cases 6-31++G** basis set was applied. The hybrid density functionals are used as implemented. All structures were optimized without restrictions in gas phase under tight optimization conditions by using ultrafine grid in the computation of the two-electron integrals and their derivatives. The optimized structures were then characterized as true minima by vibrational frequency calculations. The solvent effect was then described by applying the polarizable continuum model (PCM) in its integral equation formalism variant (IEFPCM) [76]. In the case of MP2 normal optimization conditions were used. In the CCSD(T) [77] single point calculations the convergence on the energy was set to 10 −6 . For the purpose of benchmarking the ∆E values, representing the energy difference E K −E E in the corresponding solvent, are used.
The TautLYP was optimized using the general equation for exchange-correlation functional: where ∆E Becke X and ∆E LYP C are the generalized gradient approximations-the Becke's 1988 exchange functional [57] and the correlation functional of Lee, Yang and Parr [62,78], respectively; E VWN C is the VWN local-density approximation to the correlation functional [79].
Thus, E XC is governed by 5 parameters, which gives it a high extent of flexibility and possibility to be fitted against experimental data. In this case searching global minimum of the target function S T was performed according to the following equation: where n T is the number of the tautomeric compounds included in the fitting (six compounds, Scheme 1), ∆G exp are the experimentally determined Gibb's free energies 25 • C in cyclohexane (as described above) and ∆E E XC are the relative energies (difference between the energies of the keto and enol tautomeric forms of the same compound optimized in gas phase and calculated in cyclohexane as a single point) calculated with the functional defined by equation (1) at particular values of a − e. The optimization in respect to equation (2) was performed by using advanced simplex optimization of these optimization parameters (m, which means m + 1 dimensional optimization space and m vertices of the simplex). In this case m = 5. The method of Nelder and Mead was applied [80] for the optimization. The process was terminated when the standard deviation σ 2 , defined by Equation (3), became less than 1 × 10 −9 .
where S T j is the value of the optimization function (2), defined by a set of a j − e j parameters; S T is the average value. At that stage there are no substantial differences between the a − e values in the m vertices, because the simplex is too small.

Conclusions
TautData dataset containing precise information for the position of the tautomeric equilibrium of azodyes and Schiff bases was created. It contains information for 6 compounds in cyclohexane, 13-in carbon tetrachloride and 15-acetonitrile and can be used for benchmarking of density functional development.
Using the information for the position of the tautomeric equilibrium in cyclohexane of 6 core structures of industrially important azo-and azomethyne dyes, a new hybrid density functional, namely TautLYP, was fitted. Its reliability was checked by using the data from TautData, showing very good performance.
Benchmarking of existing DFT functionals was performed by using TautData and available structural information for some of the tautomers. The results have shown that MN12-SX, BHandH and M06-2X can be used to describe the tautomeric state of the core structures in the frame of~0.5 kcal/mol error and correctly predict the tautomeric state in respect to a dominating tautomeric form. Among them, MN12-SX is the best performer, although it fails to describe the nonplanarity of some of the enol tautomers.