As explained before, in order to arrive at a consistent data set, we first need to parametrize the simpler classes of molecules such as the n-alkanes and the n-alcohols, even though these have been treated elsewhere before. Following the approach outlined above, one new group is parametrized in each successive step.
3.1. n-Alkanes and Monomethyl Alkanes
For the n-alkanes, we collected data from the NIST database, from the CAPEC database as used in [15] and from Rossini and co-workers [21]. All individual data can be found in the Supplementary Materials as Table S2. The CH3 GC parameter was simply taken as 50% of the heat of formation of ethane from the Rossini paper, i.e., −42.36 kJ/mol. The CAPEC and NIST values for ethane differ slightly from the Rossini value, but by well below 1 kJ/mol, so for any practical use, this makes no difference. When we adopted the CH2 increment from the Rossini data set, −20.63 kJ/mol, we obtained an excellent result for the entire alkane data set with the model equation
Equation (2), as with all other equations that will follow for other classes of molecules, is essentially of the general form of Equation (1), but now only the first-order term $\sum_i N_i C_i$ has been retained. $C_i$ is the contribution of the group i, e.g., CH3, to the heat of formation, and $N_i$ is the number of times this group is present in the molecule. The average absolute difference between the model Equation (2) and experiment (Rossini values, and CAPEC values when not available from Rossini) was found to be 0.53 kJ/mol. We emphasize that this value could have been lower, as the experimental data set probably suffers from some errors, albeit small ones. The deviations become clearly larger as soon as there are no data available from the Rossini group, which is from heneicosane onwards. The variation in the values for the CH2 increments is very small for the Rossini data, but non-negligible for the other data set. For the higher alkanes starting with heneicosane, for which we have only CAPEC data, the increments vary between 19.3 and 21.44 kJ/mol. When we calculate the average absolute difference between the experiment and model for the Rossini data only (ethane up to eicosane), we find a value of just 0.19 kJ/mol. In conclusion, we may state that the model works excellently, as is also known from other works.
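The first-order sum described here can be illustrated with a minimal Python sketch. It assumes that, for an n-alkane, the model takes the form ΔHf = 2·GC(CH3) + N(CH2)·GC(CH2); the function names and the dictionary layout are illustrative choices and are not taken from the original work.

```python
# Group contributions in kJ/mol as quoted in the text (Rossini-based values).
GC = {"CH3": -42.36, "CH2": -20.63}

def heat_of_formation(group_counts, gc=GC):
    """First-order group-contribution sum: sum_i N_i * C_i (kJ/mol)."""
    return sum(n * gc[group] for group, n in group_counts.items())

def n_alkane(n_carbons):
    """Assumed group counts for an n-alkane with at least two carbon atoms."""
    if n_carbons < 2:
        raise ValueError("the CH3/CH2 decomposition needs at least two carbons")
    return {"CH3": 2, "CH2": n_carbons - 2}

ethane = heat_of_formation(n_alkane(2))    # 2 * (-42.36)
decane = heat_of_formation(n_alkane(10))   # 2 * (-42.36) + 8 * (-20.63)
print(f"ethane: {ethane:.2f} kJ/mol, n-decane: {decane:.2f} kJ/mol")
```

By construction, ethane reproduces the experimental value exactly, since the CH3 parameter was defined as half of it; every additional carbon atom lowers the model value by one CH2 increment.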
For the mono-methylalkanes, we needed to introduce one new group, namely, the CH group, for which we found the value −4 kJ/mol to be the best overall solution. Group contribution values for CH3 and CH2 were previously determined in Section 3.1. With the model equation
we obtained an average absolute difference between the model and experiment of 1.91 kJ/mol, and all individual values, except for 2-methylnonane, are within chemical accuracy (1 kcal/mol or 4.2 kJ/mol). Here, we again used Rossini’s data wherever available [21] and otherwise NIST, and if neither was available, we used CAPEC data. All individual data can be found in Table S3 in the Supplementary Material. The value for the group CH was chosen as −4 kJ/mol because that value showed the best agreement between model and experimental values, with the exception of 2-methylnonane, which deviated by 5.34 kJ/mol from the experiment. A more negative value for the CH group would have led to a better overall (averaged) agreement with the experiment; however, in that case, other values, more specifically that of 4-methylheptane, would have been beyond chemical accuracy. Our choice was based on the observation that the CH2 increment associated with 2-methylnonane (−22.35 kJ/mol) is an indicator for an error in the experimental value (−260.2 kJ/mol). Finally, we later found that our choice for the CH group parameter also positively influenced the results for other classes, e.g., “2-Alkenes + substituent at double bond” and “1-Alkenes + substituent NOT at double bond”, avoiding results that would otherwise be beyond chemical accuracy.
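The 2-methylnonane case can be reproduced with a short Python sketch. It assumes that a mono-methylalkane decomposes into three CH3 groups, one CH group and the remaining CH2 groups; the helper names are hypothetical, while the parameter values and the experimental value are those quoted in the text.

```python
# Group contributions (kJ/mol) quoted in the text.
GC = {"CH3": -42.36, "CH2": -20.63, "CH": -4.0}
CHEMICAL_ACCURACY = 4.2  # kJ/mol, i.e. 1 kcal/mol as rounded in the text

def monomethylalkane(n_carbons):
    """Assumed group counts for a mono-methylalkane with n_carbons in total."""
    if n_carbons < 4:
        raise ValueError("a mono-methylalkane needs at least four carbons")
    return {"CH3": 3, "CH": 1, "CH2": n_carbons - 4}

def model_value(groups):
    """First-order group-contribution sum in kJ/mol."""
    return sum(n * GC[g] for g, n in groups.items())

model = model_value(monomethylalkane(10))   # 2-methylnonane, C10H22
experiment = -260.2                          # experimental value quoted in the text
deviation = model - experiment               # model minus experiment
within = abs(deviation) <= CHEMICAL_ACCURACY
print(f"model {model:.2f} kJ/mol, model-exp {deviation:+.2f} kJ/mol, within accuracy: {within}")
```

Under these assumptions the sketch returns the 5.34 kJ/mol deviation reported for 2-methylnonane, the only member of the series outside the 4.2 kJ/mol window.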
3.2. Oxygen-Containing Series: n-Alcohols, n-Aldehydes, 2-Alkanones, Mono- and Dicarboxylic Acids, Ethers
For the new OH group, we found that a group contribution value of −171 kJ/mol gives the best results for the model based on the formula
to evaluate the heats of formation for the n-alcohols and
for the secondary alcohols. The averaged absolute difference between the model and experiment was evaluated as 1.54 kJ/mol. All individual data can be found in Table S4 in the Supplementary Material. The overall good performance, including that for the higher members of the series, 1-tetradecanol and 1-eicosanol, suggests that the model values are more accurate than the experimental values.
When adopting a new group contribution value of −124 kJ/mol for the aldehyde group (terminal C=O), the average absolute difference (model–exp) using CAPEC database experimental data was found to be only 0.31 kJ/mol, our model being
When we adopt the simplest formula possible for the alkanones,
we find a very good account of the heat of formation based on the pure additive contribution using a group contribution parameter of −133 kJ/mol for the keto group, and an averaged absolute difference (model–exp) of 1.10 kJ/mol, whereas all individual deviations are below 2 kJ/mol. Data are collected in Table S6 in the Supplementary Material.
With the formula
the group contribution parameter for the carboxylic group was determined as −391 kJ/mol, leading to an averaged absolute difference (model–exp) of 1.12 kJ/mol. All individual data are collected in Table S7 in the Supplementary Material. Regarding individual values, the values for pentanoic acid and hexanoic acid show deviations of 4.25 and 3.88 kJ/mol, respectively. Whilst these are still basically within chemical accuracy, we observe that (i) the increments in the NIST data set are irregular (see Table S7 in the Supplementary Material) and (ii) when we take the corresponding values from the CAPEC database, the values for (model–exp) go down to 1.25 and 0.78 kJ/mol, respectively. In this case, we thus may conclude there is an issue with some experimental data, whereas otherwise, the model shows very satisfactory performance.
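The additive treatment of the n-alcohols, n-aldehydes, 2-alkanones and monocarboxylic acids can be summarized in a small Python sketch. The decompositions below (one functional group plus CH3 and CH2 groups) are assumptions inferred from the group values quoted in the text, and the keys such as "CHO" and "CO" are illustrative labels only. The secondary alcohols are deliberately left out, because their formula cannot be inferred from the text.

```python
# Group contributions (kJ/mol) quoted in the text; the key names are illustrative.
GC = {
    "CH3": -42.36, "CH2": -20.63,
    "OH": -171.0,     # hydroxyl group
    "CHO": -124.0,    # aldehyde group (terminal C=O)
    "CO": -133.0,     # keto group
    "COOH": -391.0,   # carboxylic group
}

# Assumed decompositions for an unbranched chain with n_ch2 methylene groups.
SERIES = {
    "n-alcohol":       lambda n_ch2: {"CH3": 1, "CH2": n_ch2, "OH": 1},
    "n-aldehyde":      lambda n_ch2: {"CH3": 1, "CH2": n_ch2, "CHO": 1},
    "2-alkanone":      lambda n_ch2: {"CH3": 2, "CH2": n_ch2, "CO": 1},
    "carboxylic acid": lambda n_ch2: {"CH3": 1, "CH2": n_ch2, "COOH": 1},
}

def model_value(series, n_ch2):
    """First-order sum for one member of a homologous series (kJ/mol)."""
    groups = SERIES[series](n_ch2)
    return sum(n * GC[g] for g, n in groups.items())

examples = [
    ("n-alcohol", 1, "ethanol"),
    ("n-aldehyde", 1, "propanal"),
    ("2-alkanone", 1, "2-butanone"),
    ("carboxylic acid", 3, "pentanoic acid"),
]
for series, n_ch2, name in examples:
    print(f"{name}: {model_value(series, n_ch2):.2f} kJ/mol")
```

Keeping the series definitions in one table makes it explicit that each class adds exactly one new parameter to the CH3 and CH2 values fixed for the alkanes.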
For dicarboxylic acids, we found that we can describe the heats of formation well by
with an average absolute difference (exp–model) of 0.89 kJ/mol. All data are shown in Table S7 in the Supplementary Material. These results suggest there is no need for higher-order contributions as suggested in the work of [4]. For MG ICAS23 [4,17], we found that for propanedioic acid, a second-order group had been introduced (HOOC-CHn-COOH (n in 1..2)); for butanedioic acid, yet another second-order group had been introduced (HOOC-CHn-CHm-COOH (n, m in 1..2)); and for pentanedioic and hexanedioic acids, a third-order group was introduced (HOOC-(CHn)m-COOH (m > 2, n in 0..2)), adding up to four additional parameters to be fitted, whereas our current approach has none.
For the methylalkylethers, the group value for the (H3)COC(-) group was determined as −175 kJ/mol. The averaged absolute difference (model–exp) is 2.42 kJ/mol with the model
and the data are to be found in Table S8 in the Supplementary Material. We observed that for dimethylether, the difference (model–experiment) is 9.1 kJ/mol, whereas Hukkerikar et al. [4], using a different GC approach, found a difference of 10.2 kJ/mol. This suggests we should treat dimethylether as a separate, individual species in the GC approach with the experimental value of −184.1 kJ/mol associated with it, assuming this is a correct value.
Initially, we made an attempt to cover all ethers and di-alkylethers with one formula. However, it became clear that many individual values would not comply with chemical accuracy. Consequently, we introduced two distinct values: one for the methyl-alkylethers, (H3)COC(-), and one for the other di-alkylethers, R’-COC-R. The GC contribution for the latter was determined as −168 kJ/mol. For this latter class of di-alkylethers, we report an averaged absolute difference of 3.48 kJ/mol, which is thus within chemical accuracy, whereas the MG ICAS23 approach gives 3.38 kJ/mol. Still, our model value for di-n-pentylether differs by 13.5 kJ/mol from the experimental value. Interestingly, the MG ICAS23 approach [4] also reveals a difference of 13.5 kJ/mol. Also remarkable is the increment, from experimental data, for di-n-pentylether: compared to di-n-butylether, the difference is 56 kJ/mol, which is a lot more than the additive value for two CH2 groups, 2 × 20.63 ≈ 41.3 kJ/mol. An increment of that size would largely account for the deviation found for di-n-pentylether. For these longer alkyl chains, there is no chemical argument why such non-additive behavior would be realistic. One should therefore question this exceptional value for a single species, di-n-pentylether. On the basis of the current data and arguments, we conclude, for the time being, that the models properly predict the heat of formation for all ethers. Both MG ICAS23 and the model proposed here perform appropriately.
3.3. Alkenes
Experimental data for the 1-alkenes were preferably taken from Rossini et al. [22,23], and otherwise from NIST or CAPEC. With a choice of +62.5 kJ/mol for the C=C group and our formula describing the heats of formation for the 1-alkenes,
we arrived at an average absolute difference (model–exp) of only 0.17 kJ/mol. All individual data can be found in Table S9 in the Supplementary Material. These findings once more confirm the correctness of our adopted values for the CH3 and CH2 groups. For ethylene itself, we need a dedicated single value, assuming the NIST value of +52.4 kJ/mol is correct. This is apparent from the increment of −31.49 kJ/mol between ethylene and 1-propene. We may also look at this from a different perspective, namely, by taking the heat of formation of ethylene, +52.4 kJ/mol, as the basic value for C=C (rather than 62.5 kJ/mol for C=C-), and seeing all mono-substituted species as having a nearest neighbor interaction of 10.1 kJ/mol. At the moment, it seems appropriate to consider ethylene as a separate species with an individual value.
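A minimal Python sketch of how a species with an individual value can coexist with the additive model is given here. It assumes that a 1-alkene decomposes into one CH3 group, the remaining CH2 groups and one C=C group that carries the hydrogens of the double-bond carbons; the lookup table `INDIVIDUAL` and the function name are hypothetical.

```python
# Group contributions (kJ/mol) quoted in the text.
GC = {"CH3": -42.36, "CH2": -20.63, "C=C": 62.5}

# Species treated as individual entities; NIST value for ethylene as quoted in the text.
INDIVIDUAL = {"ethylene": 52.4}

def one_alkene(name, n_carbons):
    """Model value (kJ/mol) for a 1-alkene; individual species bypass the sum."""
    if name in INDIVIDUAL:
        return INDIVIDUAL[name]
    if n_carbons < 3:
        raise ValueError("the additive formula needs at least three carbons")
    # Assumed decomposition: one CH3, (n_carbons - 3) CH2 and one C=C group.
    return GC["CH3"] + (n_carbons - 3) * GC["CH2"] + GC["C=C"]

ethylene = one_alkene("ethylene", 2)
propene = one_alkene("1-propene", 3)

# Alternative reading from the text: ethylene as the base value plus a
# nearest neighbor interaction for every mono-substituted double bond.
neighbor_interaction = GC["C=C"] - INDIVIDUAL["ethylene"]
print(f"ethylene {ethylene:.1f}, 1-propene {propene:.2f}, interaction {neighbor_interaction:.1f} kJ/mol")
```

Both readings give the same model values for the 1-alkenes; they differ only in whether the 10.1 kJ/mol is folded into the C=C parameter or listed as a separate interaction term.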
When we now look at the alkenes which have a non-terminal double bond, e.g., 2-pentene or 3-hexene, and adopt the equation based on our choice for defining the groups, $\Delta H_f(\text{non-terminal alkenes}) = 2\,GC_{CH_3} + N_{CH_2}\,GC_{CH_2} + GC_{C=C}$, to evaluate the heats of formation as for the 1-alkenes, we find that the heats of formation deviate clearly, typically by about 10 kJ/mol, from the experimental values. It is to be noted that this is not observed for the alkynes (see later), which obey the same model for the 1-alkynes as well as for the other alkynes listed. This is likely to be the reason why, in the Marrero–Gani model [10], the groups required for the alkynes include CH3, CH2 and C≡C, whereas for the alkenes, including the 1-alkenes, a second-order parameter was included to arrive at sufficiently accurate model values. To find out whether this is indeed true non-additivity for the 2-alkenes, we performed density functional theory (DFT) calculations using the B3LYP functional and the 6-311+G** basis set to evaluate the energies of alkenes and alkynes. The results are shown in Table 2. When we look at the structural differences between 1-ene and 2-ene, the difference regarding groups is one CH3 more and one CH2 less. The same can be said for the ynes. As we see from Table 2, the increment is about 25 kJ/mol for the ynes and 12 kJ/mol for the enes. Therefore, there is, maybe a little surprisingly from a naive chemical point of view, a difference between ynes and enes. It is the enes that pose the non-additive issue: when we take the difference between our GC values for CH3 and CH2, we end up with a value over 20 kJ/mol, similar to what we obtained from the DFT calculations for the ynes, so the ynes behave in a normal additive way. Thus, we need to correct the model values for the 2-enes and 3-enes or, in other words, we need to introduce a nearest neighbor interaction. For trans-R-C=C-R’, this correction was determined as 11 kJ/mol, and thus we introduced a new group, trans-R-C=C-R’, with a GC parameter value of +73.5 kJ/mol. For cis-R-C=C-R’, the correction was slightly larger, and the new group cis-R-C=C-R’ has an associated GC parameter of +78 kJ/mol. Thus, for the enes other than the 1-enes, we have a group contribution of +73.5 kJ/mol for the trans R-C=C-R’ group and +78 kJ/mol for the cis R-C=C-R’ group, compared to +62.5 kJ/mol for the 1-enes. One could also formulate it such that we obtain a nearest neighbor effect contribution of 11 kJ/mol for trans-R-C=C-R’ and, similarly, 15.5 kJ/mol for cis-R-C=C-R’. With these new parameters and model formulae
we obtain good agreement (within chemical accuracy) between model and experimental values for the trans-alkenes, with an average absolute difference between the model and experiment of only 0.55 kJ/mol. All individual data are collected in Table S10 in the Supplementary Material.
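The three double-bond parameters can be compared side by side in a short Python sketch. It assumes that an unbranched alkene decomposes into CH3 groups, CH2 groups and one double-bond group that includes both sp2 carbons with their hydrogens; the labels "terminal", "trans" and "cis" and the function name are illustrative.

```python
# Group contributions (kJ/mol) quoted in the text.
GC = {"CH3": -42.36, "CH2": -20.63}
GC_DOUBLE_BOND = {"terminal": 62.5, "trans": 73.5, "cis": 78.0}

def alkene(n_carbons, kind):
    """Model value (kJ/mol) for an unbranched alkene.

    kind is 'terminal' for a 1-ene, or 'trans' / 'cis' for a non-terminal
    double bond (assumed decomposition, see lead-in).
    """
    n_ch3 = 1 if kind == "terminal" else 2
    n_ch2 = n_carbons - 2 - n_ch3
    if n_ch2 < 0:
        raise ValueError("too few carbon atoms for this kind of alkene")
    return n_ch3 * GC["CH3"] + n_ch2 * GC["CH2"] + GC_DOUBLE_BOND[kind]

# Nearest neighbor effects relative to the 1-ene parameter.
corrections = {k: GC_DOUBLE_BOND[k] - GC_DOUBLE_BOND["terminal"] for k in ("trans", "cis")}

for label, n, kind in [("1-butene", 4, "terminal"),
                       ("trans-2-butene", 4, "trans"),
                       ("cis-2-butene", 4, "cis")]:
    print(f"{label}: {alkene(n, kind):.2f} kJ/mol")
print(corrections)
```

The `corrections` dictionary recovers the 11 and 15.5 kJ/mol nearest neighbor effects, which is simply another way of stating the +73.5 and +78 kJ/mol group values.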
We will now treat various alkylsubstituted alkenes separately; as we will see, nearest neighbor effects are present in some cases and absent in others. For the 1-enes with an alkyl substituent not directly attached to a double-bond carbon atom, we can describe the heats of formation similarly to the non-substituted 1-alkenes, and the formula reads
The averaged absolute difference (model–exp) is 2.54 kJ/mol and all individual model values are within chemical accuracy of the experimental values. As described before, the group CH has a value of −4 kJ/mol attributed to it. Data are collected in Table S11 in the Supplementary Material. In works using second- and third-order groups, e.g., the MG ICAS23 software suite [17], three additional second-order group parameters were involved ((CH3)2CH, CH2-CHm=CHn (m,n in 0..2), CHp-CHm=CHn (m,n in 0..2; p in 0..1)), whereas the presently proposed model did not involve any new parameters.
The next data set comprises 2-alkenes with a substituent at the double bond, and all data are collected in Table S12 in the Supplementary Material. Unfortunately, we have few experimental data available. Applying the formula expected on the basis of the groups that constitute this class of species results in an averaged absolute difference between model and experimental values of 1.0 kJ/mol, with all individual values being within chemical accuracy. By comparison, the best method thus far, the MG approach implemented in the MG ICAS23 software suite, needed two additional second-order group parameters (CH3-CHm=CHn (m,n in 0..2), CH2-CHm=CHn (m, n in 0..2)), whereas no additional parameters were needed for the model we present here, viz., Equation (11). It is interesting to note that this is different from the pure 2-alkenes (see above), where a nearest neighbor parameter, albeit a single one, needed to be added, but at present, we cannot exclude the idea that a fortuitous cancellation of two effects is involved.
For the 1-enes with an alkyl substituent at the double-bond carbon atom, i.e., 2-(m)ethyl-1-alkenes, the formula comprises a new group contribution term, $GC_{C=C(C)-R}$, which includes a neighbor (to the double bond) effect of about 8 kJ/mol in order to achieve the required chemical accuracy. The group contribution for the C=C bond with substituent (C=C(C)-R) therefore is +70 kJ/mol compared to +62.5 kJ/mol for the unsubstituted 1-enes. All individual values are collected in Table S13 in the Supplementary Material, for which we obtained an average absolute difference (model–exp) of 1.21 kJ/mol, whereas all individual differences are below 2 kJ/mol. The MG ICAS23 [4,17] result (we once more compare to the best available results reported until now) for the average absolute difference (exp–model) reads 1.50 kJ/mol. However, to achieve this, the method has four additional second-order group parameters ((CH3)2CH, CH2-CHm=CHn (m,n in 0..2), CH3-CHm=CHn (m,n in 0..2), CHp-CHm=CHn (m,n in 0..2; p in 0..1)) compared to a single nearest neighbor parameter in the presently proposed model.
3.4. Alkynes
As we have more experimental values from the NIST database, we use the Rossini data [24] when available and otherwise NIST experimental data to determine the GC parameter for the C≡C group. This does not pose any issue, as the experimental values from the two data sets that can be compared differ by less than 2 kJ/mol. The main reason to select the larger NIST set is to check that no deviations appear in a larger data set, ensuring that we have a proper parameter value for the C≡C group. The value of this parameter was found to be 229 kJ/mol. The averaged absolute difference (model–exp) was 1.53 kJ/mol whilst using
to evaluate the model values. Data are to be found in Table S14 of the Supplementary Material. The value of 229 kJ/mol was selected such that the individual values are also within chemical accuracy (1 kcal/mol).
For the 2-, 3-, 4- and 5-ynes, we also found good agreement between experimental values (Rossini values, and if they are not available, NIST data) and the model
and calculated an averaged absolute difference (model–exp) of 1.78 kJ/mol. Only 2-pentyne has an error (5.25 kJ/mol) somewhat larger than chemical accuracy (4 kJ/mol). The NIST database provides two values for 2-butyne: 145 and 148 kJ/mol. The former value, 145 kJ/mol, originates from [24], whereas the value of 148 kJ/mol originates from [25]. In their later paper [24], Rossini et al. stated that they had not found the reason for these (small) differences.
For 2-pentyne, there is only one value, and the reference is to the Rossini et al. works, and therefore it is identical to the value in the previous column in Table S14 of the Supplementary Material. At this moment, we cannot conclude whether an experimental deviation or minor interactions are involved. Regarding the latter, a small additional term of 1.5 kJ/mol added to the C≡C group contribution (+229 kJ/mol) would bring the deviations of all non-terminal alkynes within chemical accuracy. However, as this would be added on the basis of, in essence, a single deviation, which still might be a minor experimental error, we do not propose this at this stage. All individual data are collected in Table S14 in the Supplementary Material.
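The effect of the optional 1.5 kJ/mol term on 2-pentyne can be checked with a small Python sketch. It assumes the same kind of decomposition as for the alkenes (CH3 groups, CH2 groups and one C≡C group) and, since only the magnitude of the 2-pentyne error is quoted, it further assumes that the model value lies below the experimental one; the key "C#C" and the argument `internal_term` are hypothetical names.

```python
# Group contributions (kJ/mol) quoted in the text; "C#C" labels the triple bond.
GC = {"CH3": -42.36, "CH2": -20.63, "C#C": 229.0}
CHEMICAL_ACCURACY = 4.2  # kJ/mol (1 kcal/mol)

def alkyne(n_carbons, terminal, internal_term=0.0):
    """Model value (kJ/mol); internal_term is only applied to non-terminal alkynes."""
    n_ch3 = 1 if terminal else 2
    n_ch2 = n_carbons - 2 - n_ch3
    if n_ch2 < 0:
        raise ValueError("too few carbon atoms for this kind of alkyne")
    extra = 0.0 if terminal else internal_term
    return n_ch3 * GC["CH3"] + n_ch2 * GC["CH2"] + GC["C#C"] + extra

propyne = alkyne(3, terminal=True)
two_pentyne = alkyne(5, terminal=False)

quoted_error = 5.25                                  # magnitude quoted in the text
implied_experiment = two_pentyne + quoted_error      # assumption: model below experiment
shifted = alkyne(5, terminal=False, internal_term=1.5)
remaining_error = implied_experiment - shifted
print(f"2-pentyne: {two_pentyne:.2f} -> {shifted:.2f} kJ/mol, remaining error {remaining_error:.2f} kJ/mol")
```

Under these assumptions the remaining error drops to 3.75 kJ/mol, which illustrates why a term of this size would be sufficient, and also how little evidence (a single species) it would rest on.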
3.5. Nitrogen-Containing Species: n-Alkylamines and Nitriles
The availability of experimental data led to their selection from different literature sources: experimental data are based on the Rossini value for ethylamine (Ref. [26], page 623), NIST values when available, and otherwise CAPEC database values. By selecting a value of +13 kJ/mol for the amine group and our formula describing the heats of formation for the alkylamines,
we arrived at an averaged absolute difference (model–exp) of 1.20 kJ/mol. The methylamine species is to be considered separately. This is not only because of the difference between the model and experiment (5.86 kJ/mol) but also because the increment between methylamine and ethylamine, around 26 kJ/mol, is clearly larger than the typical CH2 value of 20.63 kJ/mol. Thus, methylamine should best be considered a separate species. All individual values are collected in Table S15 of the Supplementary Material.
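The increment argument used for methylamine, and earlier for ethylene and di-n-pentylether, amounts to comparing an experimental step in a homologous series with the CH2 increment. A minimal Python sketch of that check follows; it uses only magnitudes quoted in the text ("around 26" is taken as 26.0), and the 4.2 kJ/mol threshold is an illustrative choice rather than a criterion stated by the authors.

```python
CH2_INCREMENT = 20.63      # kJ/mol, magnitude of the CH2 group contribution
THRESHOLD = 4.2            # kJ/mol, illustrative threshold (chemical accuracy)

# (magnitude of the experimental increment in kJ/mol, number of CH2 groups added)
QUOTED_INCREMENTS = {
    "ethylene -> 1-propene": (31.49, 1),
    "methylamine -> ethylamine": (26.0, 1),
    "di-n-butylether -> di-n-pentylether": (56.0, 2),
}

def excess_over_additivity(increment, n_ch2):
    """How much an experimental increment exceeds n_ch2 additive CH2 steps."""
    return increment - n_ch2 * CH2_INCREMENT

flagged = {}
for label, (increment, n_ch2) in QUOTED_INCREMENTS.items():
    excess = excess_over_additivity(increment, n_ch2)
    flagged[label] = abs(excess) > THRESHOLD
    print(f"{label}: excess {excess:+.2f} kJ/mol, flagged: {flagged[label]}")
```

All three steps exceed the additive expectation by more than the threshold, in line with treating ethylene and methylamine as individual species and with questioning the experimental value for di-n-pentylether.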
The nitriles form an interesting class, more specifically, the dinitriles. For the mono-nitriles, the group value for the nitrile group was found to be +116 kJ/mol based on the model
We evaluated an averaged absolute difference (model–exp) of 0.80 kJ/mol. Data can be found in Table 3.
When we now look at the α,ω-dinitriles, for 1,5-pentanedinitrile and 1,6-hexanedinitrile, we also find excellent agreement between the model, viz.,
and the experimental values shown in Table 3. For malononitrile (N≡CCC≡N) and butane-dinitrile (N≡CCCC≡N), also known as succinonitrile, the difference between model and experimental values is around 55 and 20 kJ/mol, respectively. As the other species reveal good agreement between model and experimental values, and because dinitrile species with a long CH2 sequence in between should be considered to be similar in behavior to the mono-nitriles but now with nitrile groups at both ends, we assume these deviations are due to the interaction between the CN groups, which is further supported by the observation that for malononitrile, the deviation is much larger than for butane-dinitrile. We performed ab initio Hartree–Fock and DFT B3LYP calculations to evaluate the energy differences between the successive dinitriles, as presented in Table 4. These confirm that malononitrile and butane-dinitrile are distinct cases, and that higher species follow Equation (15b). This is supported by Beckhaus et al. [27], who reported, in a paper on geminal substituent effects, a synergetic destabilization by two geminal cyano substituents of 11.5 kcal/mol or 48 kJ/mol. In their paper, they also evaluated the heat of formation and found 266.5 kJ/mol for malononitrile, in good agreement with the NIST value in Table 5. For alkylsubstituted malononitriles, this effect decreases with increasing substitution but is still present, viz., Table 5. Thus, the alkylsubstituted malononitriles need to be treated separately, which we will not further discuss here, but [27] provides the first useful data to push this forward.
Now, we need to mention an important difference from the previous MG ICAS23 [17] approach, in which, for 1,6-hexanedinitrile, the groups involved are CH2 (twice), CH2CN (twice) and a third-order group NC-(CHn)m-CN (m > 2). Similarly, the heat of formation of 1,5-pentanedinitrile is accounted for by CH2 (once), CH2CN (twice) and a third-order group NC-(CHn)m-CN (m > 2), whereas 1,4-butanedinitrile is described by two CH2CN groups and a second-order group named NC-CHn-CHm-CN (n, m in 1..2), and, finally, malononitrile is described by one CH2CN group and one CN group. Thus, malononitrile is considered a regular molecule for which the heat of formation is purely additive with first-order contributions, namely, for CH2CN and CN only. Based on the energy differences between successive species in the series, our ab initio and DFT calculations and the work by Beckhaus et al. [27], we have to conclude that the MG ICAS23 results are not based on the correct chemistry.
In summary, because of the presence of (large) geminal effects in malononitrile and succinonitrile, we need to treat these as individual entities in the GC method to avoid introducing additional parameters for these two species only. All other mono- and dinitriles can be described based on the group contributions for CH2, CH3 and CN only.
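The size of the geminal effect can be made concrete with a short Python sketch. It assumes that a mono-nitrile decomposes into CH3, CH2 and CN groups and an α,ω-dinitrile into CH2 groups and two CN groups; the function names are hypothetical, and the only experimental number used is the malononitrile value of Beckhaus et al. quoted in the text.

```python
# Group contributions (kJ/mol) quoted in the text.
GC = {"CH3": -42.36, "CH2": -20.63, "CN": 116.0}

def mononitrile(n_ch2):
    """Assumed additive model for CH3-(CH2)n-CN, in kJ/mol."""
    return GC["CH3"] + n_ch2 * GC["CH2"] + GC["CN"]

def dinitrile(n_ch2):
    """Assumed additive model for NC-(CH2)n-CN, in kJ/mol."""
    return n_ch2 * GC["CH2"] + 2 * GC["CN"]

malononitrile_model = dinitrile(1)      # one CH2 between the two CN groups
beckhaus_value = 266.5                  # kJ/mol, value from [27] as quoted in the text
additivity_gap = beckhaus_value - malononitrile_model

hexanedinitrile_model = dinitrile(4)    # 1,6-hexanedinitrile, four CH2 groups
print(f"malononitrile: model {malononitrile_model:.2f}, gap {additivity_gap:.2f} kJ/mol")
print(f"1,6-hexanedinitrile: model {hexanedinitrile_model:.2f} kJ/mol")
```

The gap of roughly 55 kJ/mol for malononitrile is of the same order as the 48 kJ/mol geminal destabilization reported in [27], which is why a purely additive first-order description of this molecule cannot be expected to work.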
3.6. Benzene/Phenyls
For the mono-substituted benzenes, using experimental data from Rossini et al. [28,29], we initially attempted $\Delta H_f(\text{mono-substituted benzene}) = GC_{Phenyl} + N_{substituent}\,GC_{Substituent}$. When we included benzene and thus only have the contribution $GC_{Phenyl}$, we had deviations between the model and experiment. We observed the same for the di-, tri- and tetra-substituted benzenes that will be discussed later. However, when we add a single parameter, AromMonoalkyl, for the mono-alkylsubstituted benzenes with the numerical value 6 kJ/mol, we arrive at
and we find good agreement between our new model and experimental values. All individual values are collected in Table S16 in the Supplementary Material. The averaged absolute difference (model–exp) is 0.88 kJ/mol, and all individual values are within chemical accuracy; actually, many have a deviation of less than 1 kJ/mol in magnitude. In our approach, we only have the single additional parameter valued at 6 kJ/mol. This additional parameter is not added for unsubstituted benzene, so it could be said that benzene itself is an exception.
It was to be expected that for the substituted benzenes, a simple additive behavior of the energies of the individual groups would not lead to adequate results. The substitution of a conjugated system will lead to interaction energies between the groups, even though these are perhaps small. The MG ICAS23 [4,10,17] approach introduced second-order parameters for substituted benzenes, one for each substitution pattern, e.g., AROMRINGs1s2s for 1,2-substitution, AROMRINGs1s3s5 for 1,3,5-substitution, AROMRINGs1s2s3s4s for 1,2,3,4-tetrasubstitution. By doing that, the quality of the fitting of the experimental data was improved substantially. However, it means that for di-substituted benzenes, we have AROMRINGs1s2s, AROMRINGs1s3s and AROMRINGs1s4s, which implies three parameters; for the tri-substituted benzenes, we have AROMRINGs1s2s3s, AROMRINGs1s2s4s, AROMRINGs1s2s5s, AROMRINGs1s3s4s and AROMRINGs1s3s5s, which implies five additional parameters. For tetra-substituted benzene, we even have potentially three additional parameters: AROMRINGs1s2s3s4s, AROMRINGs1s2s3s5s, AROMRINGs1s2s4s5s.
As we aim at results within chemical accuracy with a minimum number of parameters, i.e., avoiding overfitting, we describe the di-substituted alkylbenzenes by
where the single additional substitution parameter AromDialkyl leads to a good description of the heats of formation of the di-substituted alkylbenzenes when this parameter takes the value 18.5 kJ/mol, viz., Table S16b. The averaged absolute difference between the model and experiment is 1.52 kJ/mol and all individual values are within chemical accuracy.
For the tri-substituted alkylbenzenes, we adopted
with AromTrialkyl = +30 kJ/mol, and the averaged absolute difference (model–exp) found reads 2.33 kJ/mol and all individual values are within chemical accuracy for both models, viz., Table S16c.
Finally, for the tetra-substituted alkylbenzenes, we adopted
with AromTetraalkyl = +40 kJ/mol, and the averaged absolute difference (model–exp) was calculated as 1.15 kJ/mol. For the present model, all individual values are clearly within chemical accuracy, viz., Table S16d.
In summary, we obtained excellent results whilst involving a single, individual substitution parameter for each of the mono-, di-, tri- and tetra-alkylsubstituted benzenes (we already had all other parameters from the previous part of this paper). This seems the best methodology reported thus far, as (see above) the previous best approach involved substantially more parameters with the risk of overfitting.
3.8. A Most Interesting Case: Cycloalkanes
The cycloalkanes, in which ring strain plays a significant role, form an interesting case. We have already seen one case in which taking into account chemical knowledge is crucial, i.e., malononitrile and succinonitrile, to obtain a reliable predictive model also beyond the molecules used for the parametrization. When we apply the best approach up to now, MG ICAS23 [4,10,17], we find that a range of third-order parameters were involved for the cycloalkanes. In addition, for methylcyclopentane and methylcyclohexane, additional third-order 5- and 6-member ring parameters are introduced, whereas these are surprisingly absent in the unsubstituted equivalents cyclopentane and cyclohexane. This is surprising as one would expect that cyclohexane and methylcyclohexane, not suffering from ring strain, would behave quite normally as a collection of CH2 groups, and as we will see below, they do indeed. Furthermore, despite all additional third-order parameters, the performance when applied to the cycloalkanes varies a lot: from chemical accuracy up to some 80 kJ/mol in error for cyclopropane and cyclobutane and more than 20 kJ/mol for cyclodecane. For the very constrained case of bicyclobutane, the deviation from the experiment (MG ICAS23 value −169 kJ/mol) is even around 380 kJ/mol (for references, see the caption of Table 4).
As cyclohexane is a species known to exhibit practically no ring strain (only 0.4 kJ/mol according to Anslyn and Dougherty [30]), one would expect that the heat of formation can be described as the sum of CH2 group contributions. This is confirmed by the results for cyclohexane and all n-alkylsubstituted cyclohexanes in Table 6: the only groups involved were CH2 (the only group for cyclohexane), and CH3 and CH for the n-alkylsubstituted cyclohexanes, for which parameter values were determined at the beginning of the present study, viz., Section 3.1. All other cycloalkanes show very significant differences between such a model, i.e., $\Delta H_f(\text{cycloalkane}) = N_{CH_2}\,GC_{CH_2}$, and the experimental values. The issue here is the ring strain in these cyclic molecules. Ring strain is a result of energy differences introduced by CCC bending and CCCC torsional changes and is not simply a property which is additive in a few additional parameters. With increasing ring size and increasing substitution, these contributions vary, and it is not surprising that many additional parameters are needed, while for several species, the difference between the model and experiment is still huge (see numbers quoted above). As the ring strain cannot be captured by one or a few simple additive parameters covering the series, the solution to this problem is as follows: we need to take into account common chemical knowledge, namely, recognition of and concrete experimental values for the ring strain [29], add the values
and, similarly for the methylsubstituted cycloalkanes,
and we then obtain the value in the column “Model ΔHf” in Table 6, and the difference from experimental values is shown in the column “Model–Exp”. The averaged absolute difference over all cycloalkanes in Table 6 is 2.32 kJ/mol. This is a good result, with only a single value beyond chemical accuracy, viz., methylcyclohexane. We notice that the deviation for the n-alkylsubstituted cyclopentanes is typically 3.6 kJ/mol throughout, which may originate from a different ring strain energy for the substituted cyclopentanes compared to the pure cyclopentane. The same applies to the n-alkylsubstituted cyclohexanes, for which the difference is, on average, approximately 2.4 kJ/mol. If this could be confirmed, the averaged absolute deviation would decrease from 2.32 down to 0.46 kJ/mol.
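The ring strain treatment can be sketched in Python as follows. The sketch assumes that the model value of an unsubstituted cycloalkane is the additive CH2 sum plus an experimentally known ring strain energy; only the cyclohexane strain of 0.4 kJ/mol is quoted in the text, so strain values for other ring sizes have to be supplied by the user, and the function name and table layout are hypothetical.

```python
GC_CH2 = -20.63  # kJ/mol, CH2 group contribution quoted in the text

# Ring strain energies in kJ/mol, keyed by ring size. Only the cyclohexane
# value is quoted in the text; other entries must come from the literature.
RING_STRAIN = {6: 0.4}

def cycloalkane(ring_size, ring_strain=RING_STRAIN):
    """Assumed model: additive CH2 sum plus tabulated ring strain (kJ/mol)."""
    if ring_size not in ring_strain:
        raise KeyError(f"no ring strain value supplied for ring size {ring_size}")
    return ring_size * GC_CH2 + ring_strain[ring_size]

additive_only = 6 * GC_CH2
cyclohexane = cycloalkane(6)
print(f"cyclohexane: additive {additive_only:.2f}, with strain {cyclohexane:.2f} kJ/mol")
```

Raising an error for ring sizes without a tabulated strain value reflects the point made in the text: ring strain is treated as external chemical knowledge for each ring, not as a fitted parameter that could be extrapolated across the series.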
When ring strain is taken into account, the most striking example in this series is bicyclobutane: the model value is within 0.74 kJ/mol of the experiment, where other methods fail completely.
In summary, these results clearly reveal that for the (alkylsubstituted-)cycloalkanes, one should take into account ring strain, a well-known phenomenon in the field of physical organic chemistry. However, as these ring strain energies themselves are evaluated from the differences between the experimental values and the expected, additive values, the straightforward way is to adopt the experimental values for the unsubstituted cycloalkanes; in other words, adopt each unsubstituted cycloalkane as an individual new group.