Article

A Generally Applicable Computer Algorithm Based on the Group Additivity Method for the Calculation of Seven Molecular Descriptors: Heat of Combustion, LogPO/W, LogS, Refractivity, Polarizability, Toxicity and LogBB of Organic Compounds; Scope and Limits of Applicability

Department of Chemistry, University of Basel, Basel 4003, Switzerland
Molecules 2015, 20(10), 18279-18351; https://doi.org/10.3390/molecules201018279
Submission received: 5 August 2015 / Revised: 25 September 2015 / Accepted: 29 September 2015 / Published: 7 October 2015
(This article belongs to the Section Medicinal Chemistry)

Abstract

A generally applicable computer algorithm for the calculation of the seven molecular descriptors heat of combustion, logP (octanol/water), logS (water solubility), molar refractivity, molecular polarizability, aqueous toxicity (protozoan growth inhibition) and logBB (log (cbrain/cblood)) is presented. The method, an extendable form of the group-additivity method, is based on the complete breakdown of the molecules into their constituting atoms and their immediate neighbourhood. The contribution of the resulting atom groups to the descriptor values is calculated using the Gauss-Seidel fitting method, based on experimental data gathered from the literature. The plausibility of the method was tested for each descriptor by means of a k-fold cross-validation procedure, demonstrating good to excellent predictive power for the first six descriptors and low reliability of logBB predictions. The goodness of fit (Q2) and the standard deviation of the 10-fold cross-validation calculation were >0.9999 and 25.2 kJ/mol, respectively (based on N = 1965 test compounds), for the heat of combustion, 0.9451 and 0.51 (N = 2640) for logP, 0.8838 and 0.74 (N = 1419) for logS, 0.9987 and 0.74 (N = 4045) for the molar refractivity, 0.9897 and 0.77 (N = 308) for the molecular polarizability, 0.8404 and 0.42 (N = 810) for the toxicity and 0.4709 and 0.53 (N = 383) for logBB. The latter descriptor, revealing a very low Q2 for the test molecules (R2 was 0.7068 and the standard deviation 0.38 for N = 413 training molecules), is included as an example to show the limits of the group-additivity method. An eighth molecular descriptor, the heat of formation, was indirectly calculated from the heat of combustion data and correlated with published experimental heat of formation data with a correlation coefficient R2 of 0.9974 (N = 2031).

1. Introduction

The published methods for the calculation of a molecular descriptor, if based on a given set of experimental data for known molecules, usually cannot be generalized, be it that they are based on certain molecular fragment parameters such as bond energies [1,2,3], only applicable for thermodynamic properties, be it that they are founded on simple atom contribution methods [4], referring to the atoms’ properties themselves or on substituents [5], which are also of limited viability. Hence, the goal was to find a method which would overcome all of these limitations and, beyond this, would allow the development of a general computer algorithm for the reliable calculation of as many molecular descriptors as possible which utilises the molecular structures and properties as available from a given compounds database.
The most promising approach was described by Ghose and Crippen for the calculation of the logPO/W values [6,7], where the molecules are broken down into a set of up to 110 atom types, for which the hydrophobicity contribution was calculated from experimental data using the group-additivity model and least-squares technique. Analogously, the authors used this approach for the evaluation of the molar refractivity [8]. The standard fitting procedure for the latter, however, was replaced by a quadratic programming algorithm, arguing that the “physical concept of molar refractivity is the volume of the molecule or atom, which cannot have a negative value”, which is not guaranteed if the standard procedure is applied.
Furthermore, K. J. Miller [9,10] applied the group-additivity method for the calculation of the molecular polarizability using atomic hybrid components and atomic hybrid polarizabilities, an approach which differs from the present one in that the type of the neighbourhood atoms is ignored.
Klopman, Wang and Balthasar [11] tried a method similar to Ghose and Crippen’s for the estimation of the aqueous solubility of organic compounds, drawing on their own experience with the applicability of the group-additivity method to the calculation of logP values. Analogously, H. Sun [12] developed a universal group-additivity system for the prediction of logP, solubility logS, logBB (which will be referred to later) and human intestinal absorption.
Earlier methods for the calculation of the heat of combustion have either been derived from the additivity of bond energies as suggested by Pauling [1], Klages [2] and Wheland [3], or are based on various empirical relations between certain features of a series of molecules, such as the percentage of carbon [13] or hydrogen [14], and their heat of combustion. Further attempts [15] have been made using group contributions, which are based on theoretical assumptions and the “heats of atomization”. Another approach has been chosen by Kharasch [16,17] in that his method of calculation depends on the number of electrons in a molecule, multiplied by the combustion value of each electron, with the result corrected for structural and functional features. There are many more publications suggesting various empirical methods for the calculation of the heat of combustion from experimental data (short abstracts of which have been given by Handrick [18]); however, in all these cases they are limited to specific classes of molecules. In 1956, Handrick [18] published a method which is “based on adequate experimental evidence that the molar heat of combustion of any organic homologous series bears a straight-line relation to the number of atoms of oxygen lacking in the molecule which are required to burn the compounds to carbon dioxide, water, nitrogen, HX, and sulfur dioxide.” He called this number “molecular oxygen balance”. For the calculation he used this parameter together with a number of rules for various functional groups, applying paraffin as a base. Evidently, none of the methods described so far provides a straightforward pathway to a simple algorithm for the calculation of the heat of combustion that is generally applicable to molecules of any complexity.
Nevertheless, Handrick’s observation of the rigid relation between starting material and combustion products clearly indicated that a generalizable approach for the calculation of the heat of combustion is achievable.
For the calculation of the heat of formation there are many highly sophisticated quantum-theoretical methods available nowadays (see, e.g., Ohlinger et al. [19]). However, these methods have a few disadvantages in that they usually become progressively time-consuming, and thus expensive for routine evaluations, and are limited to relatively small molecules. Beyond this, the accuracy of their results is by no means better than that achieved by group-additivity methods. Therefore, the latter approach, as described in 1993 by Cohen and Benson [20] for enthalpy-of-formation calculations, is still justified in that it is very fast and its parameters are based on experimental data.
A particularly difficult field in computer chemistry is the prediction of the biological activity of molecules, because in most cases their mode of action is unknown and even varies from molecule to molecule. Therefore, studies dealing with the calculation of bioactivity descriptors based on a series of experimental data usually do not, or only summarily, discuss the reason as to why a certain set of molecular parameters has been applied. Typical examples are the descriptors toxicity and the blood-brain barrier described in the following.
Prediction of the toxicity of organic compounds in water has become another important area for QSAR studies. In most cases the experimental data for a series of commonly used compounds have been determined by their effects on the protozoan Tetrahymena pyriformis. Various methods have been applied to predict this descriptor: recently, Schultz [21] derived the toxicity of a series of substituted benzenes from the hydrophobicity, determined as logPO/W, plus the electrophilic reactivity, quantified by the maximum superdelocalizability Smax; Duchowicz et al. [22] filtered out seven parameters from a set of 1338 topological, geometrical and electronic molecular descriptors, feeding them into an artificial neural network to evaluate the toxicity of 250 phenol derivatives; similarly, Melagraki et al. [23] used the hydrophobicity (logPO/W), the acidity constant (pKa), the HOMO and LUMO orbital energies and the hydrogen bond donor number (Nhdon), applied an ANN method based on the radial basis function architecture for the prediction of the toxicity of 221 phenols, and compared the data to standard multiple linear regression models; Ellison [24] reduced the number of parameters to the hydrophobicity logPO/W itself plus a constant to derive the toxicity of alcohols, esters, ketones and cyanides, defining for each of these groups a structural range of applicability; density functional theory as well as other semiempirical Hamiltonian methods have been used by Pasha [25] to evaluate, besides the molecular weight, the hardness, chemical potential, total energy and electrophilic index, which are then introduced into a multiple linear regression analysis and various other regression calculations for the evaluation of the toxicity of 50 phenol derivatives.
A preliminary attempt, induced by Ellison’s work, to directly correlate logPO/W with toxicology data of 335 compounds for which both experimental data are known and which encompass the whole range of chemical structures mentioned above yielded a correlation coefficient R2 of 0.7043 (the correlation diagram of which is shown further down). This encouraging result gave reason to try to apply the group-contribution method itself for the calculation of a compound’s toxicology value, based on the experimental data of the entire spectrum of chemical structures as far as their experimental data were available.
The blood-brain barrier (BBB) is a very efficient cellular system to protect the brain from unwanted content in the surrounding blood stream. In most cases, this may be desirable to prevent CNS-related side-effects of drugs. Logically, however, this barrier also tries to prevent intrusion of therapeutic chemicals for treatment of cerebral diseases. Fortunately, at least in the therapeutic sense, this barrier is not completely insurmountable, but the experimental determination of the barrier penetration of a new drug is time-consuming and expensive. Therefore, many attempts to predict the degree of BBB penetration, defined as the steady-state brain/blood distribution ratio logBB, have been published: Luco [26] used topological descriptors in partial least-squares analysis for modeling the logBB of 61 compounds; Fu et al. [27] based their model on the molecular volume and polar surface area of 79 compounds; the electrotopological states of the constituting atoms of 106 molecules were used by Rose et al. [28]. Thermodynamic calculations, such as the evaluation of the free solvation energy by Keserü and Molnar [29] as well as molecular dynamics simulations, e.g., by Carpenter et al. [30], have been applied to predict logBB, based on a very limited number of examples. Genetic algorithms have been used by Hou and Xu [31] on a series of 27 descriptors calculated from 96 structurally diverse compounds in order to select the statistically most significant groups of linear models with up to three or four descriptors. They concluded from the best-fitting models that logP and the partial negative solvent-accessible surface area play a crucial role in the BBB permeability. Similarly, Chen et al. [32] also observed the importance of the polar surface area and logP, using an artificial neural network model. On the other hand, P. Garg and J. Verma [33], also based on an ANN model, concluded that the order of importance in the evaluation of the BBB permeability is the molecular weight, followed by the polar surface area, logP, the number of H-bond acceptors and the number of H-bond donors. Quantum chemical descriptors (dipole moment, polarizability, equalized molecular electronegativity, molecular hardness, molecular softness, molecular electrophilicity, charges, charge separations, covalent H-bond acidity and basicity as well as electrostatic potential derived properties), calculated by an ab initio method, have been put together by van Damme et al. [34] with a series of classical descriptors encompassing logP, molecular weight, polar surface area and further structure- and shape-related properties in a model of finally eight parameters. Again, it turned out that logP and the polar surface area, besides the Mulliken charge-related descriptors, seem to be essential attributes of the model to reproduce the logBB data best, which they ascribe to the assumption that “logBB is a function of the lipophilicity and electronic properties of the molecule” [34]. Several further authors carried out logBB calculations based on the two parameters logP and polar surface area of the molecules, either on these parameters alone such as Clark [35] or together with the polarizability (De Sä et al. [36]), or including the number of acidic or basic atoms (Vilar et al. [37]), or only logP together with the molecular mass or the isolated atomic energy (Bujak et al. [38]). Interestingly, however, Lanevskij et al. [39] observed that there is no direct correlation between logPO/W and logBB at all (a fact which is confirmed in the present work), indicating “that logBB is not a measure of lipophilicity-driven BBB permeability” [39]. They found that replacement of the experimental logBB values by the ratios of total brain to unbound plasma concentrations (which amounts to correcting logBB for the amount of protein binding in the plasma) considerably improved correlation with logP. Sun [12] tried a direct approach to evaluate logBB by applying a number of atom type descriptors, which is very similar to the present group-additivity method, characterizing 57 compounds, representing a limited structural diversification set.
In view of the many different (successful but mostly elaborate) attempts to reliably evaluate all the molecular descriptors mentioned above, it seemed unrealistic to propose a general and simple computer algorithm which would be able to calculate all the descriptors at once. However, as will be shown here, the present algorithm lifts all the limitations discussed above. It is suitable for the calculation of thermodynamic (heat of combustion and, indirectly, heat of formation), solubility-related (logP and logS), optical (molar refractivity), electrical (molecular polarizability) as well as biological (toxicology and potentially CNS-related) properties of a molecule at once, and it delivers reliable results. Beyond this, it has the advantage of being easily extendable to compounds with structural features for which as yet no parameters are known, without the need to readjust the computer algorithm.

2. General Procedure

The general algorithm for the calculation of the mentioned molecular descriptors is founded on the principle of atom group contributions in analogy to the method described by Ghose and Crippen [6,7], extended in some cases by a few specific terms which will be outlined later on.

2.1. Definition of the Atom Groups

The present calculation procedure takes advantage of a knowledge database of presently more than 20,000 compounds, stored in geometry-optimized three-dimensional form, wherein—fulfilling the first requirement—for a certain number of molecules the experimental values for the molecular descriptors considered here are known and included in the database, each by a specific term known to the computer algorithm.
The second requirement for the calculation of the contributions of the atom groups is their definition. Since the present approach should be equally applicable for the calculation of various molecular descriptors which have nothing in common but the molecular structure as a whole, no prior assumption was allowed as to the method of partitioning the molecule into its fragments. Therefore, in a potentially naive attempt, the molecular structures are broken down into their smallest possible but still distinguishable fragments, i.e., into the constituting atoms and their immediate neighbourhood, as was suggested by Cohen and Benson [20,40]. Under this prerequisite, in principle, the definition of the group terms and their setup in a table could have been taken over by a computer algorithm, which would make use of the structural information of all the molecules in the database for which the requested experimental data are known. However, in order to maintain a certain logic in the table order, the group terms have been generated manually and set up in a general table, which then serves as a “mother” table for the individual parameters tables.
The above-mentioned fragmentation principle made it easy to define the atom groups in a standardized way that can be cast into a programmable algorithm: each group consists of a central atom and its immediate neighbour atoms. The central atom, called “backbone atom”, is bound to at least two other atoms and is characterized by its atom name, its atom type being defined by either its orbital hybridization or bond type or its number of bonds, where required for distinction, and by its charge, if not zero. The neighbour atoms are collected in a term which lists all the neighbours following the order H > B > C > N > O > S > P > Si > F > Cl > Br > I and for each encompasses, in this order, the bond type of its bond with the backbone atom (if not single), its atom name and its number of occurrences (if >1). (For better readability of a neighbours term containing iodine its symbol is written as J.) Additionally, if the total net charge of the neighbour atoms is non-zero, the charge is appended to the neighbour term by a “(+)” or “(−)”, respectively.
Finally, for N with three single bonds (atom type “N sp3”) and O and S with two single bonds (atom types “O” and “S2”, respectively), where neighbour atoms are part of a conjugated moiety, the neighbour term is further supplemented by the terms “(pi)”, “(2pi)” or “(3pi)”, respectively. This is to take account of the increased strength of a group’s bonds due to the π-orbital conjugation of the backbone atom’s lone-pair electrons with conjugated neighbour moieties.
Hence, an atom group is uniquely defined by the term for the backbone-atom type and the term for its neighbours, which is easily interpretable as shown in the examples of Table 1. For clarity, the backbone atom is highlighted in boldface in the “Meaning” column.
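Table 1. Group examples and their meaning.

| Atom Type | Neighbours | Meaning | Atom Type | Neighbours | Meaning |
|---|---|---|---|---|---|
| C sp3 | H3C | C–CH3 | N sp3 | H2C | C–NH2 |
| C sp3 | H3N | N–CH3 | N sp3 | H2C(pi) | C–N*H2 |
| C sp3 | H2C2 | C–CH2–C | N sp3 | C2N(2pi) | C–N*(N)–C |
| C sp3 | H2CO | C–CH2–O | N sp2 | H=C | C=NH |
| C sp3 | HC3 | C–CH(C)–C | N sp2 | C=N | N=N–C |
| C sp3 | HC2Cl | C–CH(Cl)–C | N sp2 | =CO | C=N–O |
| C sp3 | HCO2 | C–CH(O)–O | N(+) sp3 | H3C | C–NH3+ |
| C sp3 | C3N | C–C(C)2–N | N(+) sp3 | H2C2 | C–NH2+–C |
| C sp3 | C2F2 | C–CF2–C | N(+) sp2 | CO=O(−) | O=N+(O)–C |
| C sp2 | H2=C | C=CH2 | N aromatic | :C2 | C:N:C |
| C sp2 | HC=C | C=CH–C | N(+) sp | =N2(−) | N=N+=N(−) |
| C sp2 | HC=N | N=CH–C | O | HC | C–OH |
| C sp2 | H=CN | C=CH–N | O | HC(pi) | C–O*H |
| C sp2 | HN=O | O=CH–N | O | Si2 | Si–O–Si |
| C sp2 | C2=O | O=C(C)–C | P3 | C3 | C–P(C)–C |
| C sp2 | C=CN | C=C(C)–N | P4 | CO2=O | O=P(O2)–C |
| C sp2 | =CNO | C=C(N)–O | P4 | N2O=O | O=P(O)(N)–N |
| C sp2 | N=NO | N=C(N)–O | S2 | HC(pi) | C–S*H |
| C sp2 | NO=O | O=C(N)–O | S2 | CS | C–S–S |
| C aromatic | H:C2 a | C:CH:C | S4 | CO=O2 | C–S(=O)2–O |
| C aromatic | H:C:N | C:CH:N | S4 | O2=O | O–S(=O)–O |
| C aromatic | :CN:N | C:C(N):N | Si | C2Cl2 | C–SiCl2–C |
| C sp | H#C b | C#CH | Si | OCl3 | O–SiCl3 |
| C sp | C#N | N#C–C | | | |
| C sp | #CN | C#C–N | | | |
| C sp | =C2 | C=C=C | | | |
| C sp | =C=O | C=C=O | | | |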
a: : represents an aromatic bond; b: # represents a triple bond; *: lone-pair electrons form π-orbital conjugated bonds with neighbour atoms.
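The naming rules for the neighbours term can be illustrated with a short sketch. This is not the authors' program: the function name `neighbour_term`, the input format (a list of element and bond-type pairs) and the tie-break that lists single bonds before double, aromatic and triple bonds of the same element are assumptions, chosen so that the Table 1 entries are reproduced. The charge suffixes "(+)" and "(−)" and the "(pi)" suffixes are left out.

```python
from collections import Counter

# Element priority of the neighbours term, as stated in the text.
NEIGHBOUR_ORDER = ["H", "B", "C", "N", "O", "S", "P", "Si", "F", "Cl", "Br", "I"]
# Bond symbols as used in Table 1; single bonds carry no symbol.
BOND_PREFIX = {"single": "", "double": "=", "aromatic": ":", "triple": "#"}
# Assumed tie-break for equal elements with different bond types.
BOND_RANK = {"single": 0, "double": 1, "aromatic": 2, "triple": 3}


def neighbour_term(neighbours):
    """Build the neighbours term from (element, bond_type) pairs."""
    counts = Counter(neighbours)
    ordered = sorted(
        counts, key=lambda k: (NEIGHBOUR_ORDER.index(k[0]), BOND_RANK[k[1]])
    )
    parts = []
    for element, bond in ordered:
        symbol = "J" if element == "I" else element  # iodine is written as J
        n = counts[(element, bond)]
        parts.append(BOND_PREFIX[bond] + symbol + (str(n) if n > 1 else ""))
    return "".join(parts)


# Methyl carbon bound to another carbon: three H and one C neighbour.
methyl = [("H", "single")] * 3 + [("C", "single")]
print("C sp3", neighbour_term(methyl))
```

Keeping the ordering in a single list means that a different element priority would only require changing `NEIGHBOUR_ORDER`.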
It is evident that this radical break down of molecules into the atom groups as shown does not reflect any knowledge about the molecules’ three-dimensional structure. Yet, it is well known that structural peculiarities such as buttressing effects, ring strains, gauche bond interactions or internal hydrogen bonds have a distinct influence on the values of the molecules’ heat of formation and combustion.
In the case of the calculation of logP values, Klopman et al. [41], using a different group-additivity method, found that for pure saturated and unsaturated hydrocarbons inclusion of a correction factor per carbon atom clearly improved conformance with experiments. They also added a correction parameter for non-branched (CH2)n chains on (hetero)aromatics with a polar end group X where n is greater than 1. Although the atom group fragmentation method in the present case is more detailed, the suggested correction factors have been included here as well (and in the case of the non-branched CH2 chains without restrictions). They indeed caused some improvement as will be outlined later.
In order to take account of these specific steric interactions and hydrophobic effects, the table of atom groups has been extended by some groups for which the terms “atom type” and “neighbours” are not rigorously applicable, but which are treated in the calculation of the group contributions in exactly the same way as ordinary atom groups. In Table 2, the definitions of these special groups and their explanation are given.
Table 2. Special Groups and their Meaning.

| Atom Type | Neighbours | Meaning |
|---|---|---|
| H | H Acceptor | Intramolecular H bridge between acidic H (on O, N or S) and basic acceptor (O, N or F) |
| H | H | Intramolecular H–H distance <2 Angstroms |
| H | H | Intramolecular H–H distance 2–2.3 Angstroms |
| Angle60 | | Bond angle <60 deg |
| Angle90 | | Bond angle between 60 and 90 deg |
| Angle102 | | Bond angle between 90 and 102 deg |
| Alkane | No of C atoms | Correction factor per carbon atom in pure alkanes |
| Unsaturated HC | No of C atoms | Correction factor per carbon atom in pure aromatics, olefins and alkynes |
| X(CH2)n | No of CH2 groups | Correction factor per CH2 group in CH2 chains with end group X = CH3, NH2, OH, SH or halogen |
The present detailed fragmentation of the molecules clearly bears positive and negative consequences. On the positive side lies the stronger “individualization” of the atom groups leading to better conformance with experimental data. This is particularly evident when dealing with molecules which can acquire various prototropic forms, e.g., ordinary amino acids, the equilibrium of which usually lies on the zwitterionic side. This paper will show that the differences between the calculated and experimental values of certain properties immediately answer the question concerning these equilibria. A second advantage of the present fragmentation method is the easy extendability of the number of atom groups if required for the inclusion of further molecules with known experimental descriptors data without the need to alter the computer algorithm. In fact, it is the applied parameters table itself instructing the computer program which atomic and special groups are to be taken into account for the calculations of the contributions and subsequently the descriptor data.
The negative side of this detailed molecule break-down, however, already shows up at the time of evaluating the group-contribution values: the number of molecules carrying a specific atom group can decrease to figures, which are no longer representative to confirm the final contribution value. In the extreme case of only one molecule for a given atom group, its calculated contribution value is merely the “last” summand to exactly fit the experimental descriptor value. The present work took account of this in that in all the consecutive calculations of molecular descriptors only atom groups were considered which were represented by at least three independent training molecules.
An obvious consequence of these conditions becomes apparent when entering a new molecule for which not all of the atom groups it contains are found in the parameters table, or are found but represented by fewer than three training molecules. In that case the corresponding molecular descriptor simply cannot be evaluated. Consequently, the first step of an automated calculation algorithm must be to check whether all these conditions are met.

2.2. Calculation of the Group Contributions

The algorithm for the evaluation of the atom group contributions for each of the title descriptors is identical. The only difference is given by the input data: the first step is the extraction from the database of a list of molecules with the known experimental value of the descriptor in question. For each molecule of this list the atom groups are then defined and counted following the rules given above.
The further proceeding is then ruled by the content of the manually set-up “mother”-parameters table of atomic and special groups: this mother table initially covers all possible combinations of “backbone” atom types and neighbourhoods. For a specific descriptor, however, always a certain—and for each descriptor different—surplus number of atom groups remains which is not represented in any molecule of the applied molecules list. These atom groups are removed before proceeding further, thus leaving an individual parameters table for a particular descriptor. This table is finally complemented with those special groups shown in Table 2 as required for this descriptor.
The resulting data set is then translated into an M × (N + 1) matrix where M is the number of molecules and (N + 1) the number of atomic and special groups plus an element for the experimental value. Each matrix element (i,j) then receives the number of occurrences of the jth atomic or special group in the ith molecule. After normalization of this matrix into an Ax = B matrix equation and its equalization by means of the Gauss-Seidel calculus, the resulting group-contribution values are entered into the corresponding parameters table. Additionally, to each atomic and special group the number of its occurrences (its frequency) and the number of molecules containing it are added. Next, the parameters table receives the information about the goodness of fit (R2), the average and standard deviation and the total number of molecules on which the calculation is based.
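The fitting step can be sketched as follows. The sketch assumes that "normalization" means forming the least-squares normal equations (AᵀA)x = Aᵀy from the counts matrix, which are then solved by Gauss-Seidel sweeps; the text does not spell this out, so this reading is an assumption. The function name, the tolerance and the toy data are hypothetical, and the descriptor values are synthetic numbers generated from made-up contributions of −780 and −650, not experimental data.

```python
def fit_group_contributions(counts, values, sweeps=500, tol=1e-9):
    """Least-squares group contributions via Gauss-Seidel on the normal equations.

    counts[i][j] is the number of occurrences of group j in molecule i,
    values[i] is the experimental descriptor value of molecule i.
    """
    n = len(counts[0])
    # Normal equations: (A^T A) x = A^T y
    ata = [[sum(r[i] * r[j] for r in counts) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(counts, values)) for i in range(n)]
    x = [0.0] * n
    for _ in range(sweeps):
        largest_change = 0.0
        for i in range(n):
            # Gauss-Seidel: use the already updated x[j] within the same sweep.
            rest = sum(ata[i][j] * x[j] for j in range(n) if j != i)
            new = (aty[i] - rest) / ata[i][i]
            largest_change = max(largest_change, abs(new - x[i]))
            x[i] = new
        if largest_change < tol:
            break
    return x


# Toy data: columns are two hypothetical groups (CH3, CH2),
# rows are ethane, propane and butane.
counts = [[2, 0], [2, 1], [2, 2]]
values = [-1560.0, -2210.0, -2860.0]  # synthetic, built from -780 and -650
contributions = fit_group_contributions(counts, values)
print([round(c, 3) for c in contributions])
```

The diagonal element `ata[i][i]` is zero only for a group that occurs in no molecule, which is why groups not represented in the molecules list have to be removed before fitting, as described above.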

2.3. Calculation of the Descriptors

Once the group contributions are set up in the corresponding parameters tables, the computation of any of the descriptors’ values Y is a mere summing up of the contributions of the atom groups found in a molecule following the general Equation 1
$$Y = \sum_i a_i A_i + \sum_j b_j B_j + C \qquad (1)$$
wherein a_i and b_j are the contribution values listed in the respective parameters table, A_i is the number of occurrences of the ith atom group, B_j is the number of occurrences of the jth special group and C is a constant. However, as was mentioned earlier, this calculation is limited to molecules in which every atom group (though not every special group) is present in the corresponding parameters table with a value confirmed by at least three training molecules. Hence, a computer algorithm has to start with the definition and counting of all the molecule’s atom groups (applying the same procedure as in the second step for the calculation of the group contributions), then check for any atom group that is missing (or is not confirmed) in the parameters table, and then either continue using the above formula if all groups are found or reject further calculation. Calculation of all the title descriptors at once on a notebook is done in a split second, once the compound’s three-dimensional structure is generated and added to the molecules database (see Appendix).
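The check-then-sum logic can be sketched in a few lines. The function name and the dictionary layout are hypothetical. The three contribution values and molecule counts are taken from rows 3, 11 and 23 of Table 3. The constant C and the special-group terms for the heat of combustion are not given in this part of the text, so they are set to zero here and the printed number is only the atom-group part of the sum for propane, not a full prediction.

```python
def calc_descriptor(atom_groups, special_groups, params, special_params,
                    constant=0.0, min_molecules=3):
    """Evaluate Equation 1, or return None if an atom group is missing
    from the parameters table or backed by too few training molecules."""
    total = constant
    for group, occurrences in atom_groups.items():
        entry = params.get(group)
        if entry is None or entry["molecules"] < min_molecules:
            return None  # reject: descriptor cannot be evaluated
        total += entry["contribution"] * occurrences
    # Special groups are summed if a parameter exists, but never cause rejection.
    for group, occurrences in special_groups.items():
        total += special_params.get(group, 0.0) * occurrences
    return total


# Keys are (atom type, neighbours); values from Table 3 (kJ/mol).
params = {
    ("C sp3", "H3C"): {"contribution": -773.83, "molecules": 1153},
    ("C sp3", "H2C2"): {"contribution": -652.47, "molecules": 912},
    ("C sp3", "H2N2(+)"): {"contribution": -807.51, "molecules": 1},
}

propane = {("C sp3", "H3C"): 2, ("C sp3", "H2C2"): 1}
partial_sum = calc_descriptor(propane, {}, params, {})
print(round(partial_sum, 2))  # atom-group part only: -2200.13

# A group backed by a single training molecule leads to rejection.
print(calc_descriptor({("C sp3", "H2N2(+)"): 1}, {}, params, {}))  # None
```

Returning `None` instead of a partial sum keeps an unsupported group from silently producing a number that looks like a prediction.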

2.4. Cross-Validation Calculations

In order to check the plausibility of the results of the group-additivity method for the prediction of the molecular descriptors, in each case a k-fold cross-validation calculation is carried out, whereby, after a few tentative calculations with various k values, k is in all cases chosen to be 10. Accordingly, the complete list of compounds holding a particular experimental descriptor value is first copied into a training set, wherefrom a test set is extracted by the transfer of every k-th, i.e., every 10th compound, thus producing a training set containing 90% of the molecules of the original list and the remaining 10% as test set. In a next step, the training set is used to calculate the atom groups parameters set and then, by means of these parameters, the prediction value is evaluated for each molecule of the test set and added to its properties list. This procedure is repeated k (=10) times, each time shifting the extraction process for the test set from the re-established training set by the number of the current run, this way making sure that each compound is used exactly once as a test molecule and that no inadvertent clusters of certain structures are extracted from the training sets. Finally, the collected prediction data of all the test molecules are used to evaluate the cross-validated regression coefficient Q2 and the corresponding average and standard deviation. These data are then entered at the end of each parameters table. The number of compounds on which these cross-validation calculations are founded is in general smaller than the number of compounds used for the evaluation of the correlation coefficient R2: because the test compounds are excluded from the atom group parameters calculations, certain atom groups may no longer be represented by enough molecules, and test compounds having these atom groups are therefore excluded from the prediction calculation.
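The extraction scheme described above (every k-th compound, shifted by the run number) can be sketched as below. The function names are hypothetical. The Q2 formula used here is the common definition 1 − PRESS/SS_tot; the text does not state which formula the authors applied, so this is an assumption.

```python
def kfold_splits(compounds, k=10):
    """Yield (training set, test set) pairs; run number `shift` moves every
    k-th compound, starting at position `shift`, into the test set."""
    for shift in range(k):
        test = compounds[shift::k]
        train = [c for i, c in enumerate(compounds) if i % k != shift]
        yield train, test


def q_squared(observed, predicted):
    """Cross-validated coefficient, assumed here as 1 - PRESS / SS_tot."""
    mean = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - press / ss_tot


compounds = list(range(25))  # stand-ins for 25 database entries
tested = []
for train, test in kfold_splits(compounds, k=10):
    # Here the parameters would be fitted on `train` and applied to `test`.
    tested.extend(test)

# Each compound appears in exactly one test set.
print(sorted(tested) == compounds)
```

Taking every k-th entry rather than contiguous blocks spreads each test set over the whole list, which matches the stated aim of avoiding clusters of similar structures when the database is ordered by compound class.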

3. Results

General remark: In all the correlation diagrams of the following chapters cross-validated data, if included, are indicated as red circles.

3.1. Heat of Combustion

In order to achieve reproducibility over all compound classes and literature references, the experimental data have only been accepted for the calculations if the starting material as well as its combustion products are described as relaxed in their thermodynamic standard states, i.e., in their stable form at 25 °C and standard atmospheric pressure. The computation of the atom group contributions listed in Table 3 is based on the experimental data of organic molecules published in several papers, essentially E. S. Domalski’s collection of compounds [42] containing the elements C, H, N, O, P and S, supplemented with data for further nitrogen compounds by Young et al. [43], for a series of amino acids by Ovchinnikov [44], for fluoro and chloro compounds by Cox et al. [45], Smith et al. [46] and Shaub [47], for bromo compounds by Bjellerup [48], for peroxy acids and esters by Swain Jr. et al. [49], for silicon-containing compounds by Tannenbaum et al. [50] and Good et al. [51], and finally by the National Institute of Standards and Technology [52] and their respective literature citations. A number of experimental heat-of-combustion data were indirectly evaluated from experimental heat-of-formation values of compounds for which only these were cited [53], using standard heat-of-formation data for the oxidation products. Where required, the data are converted from kcal/mol to kJ/mol by the factor 4.1868. The calculations excluded compounds containing elements other than H, B, C, N, O, P, S, Si or the halogens. Explanations of the group definitions in Table 3 are given in Table 1.
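The two data-preparation steps mentioned here, unit conversion and the indirect evaluation of a heat of combustion from a heat of formation, can be illustrated for the simplest case of compounds containing only C, H and O. The function names are hypothetical. The standard heats of formation of CO2 (gas) and H2O (liquid) and the value for methane are common textbook figures, not values taken from this paper, and the values actually used by the author may differ slightly.

```python
KCAL_TO_KJ = 4.1868  # conversion factor stated in the text

# Assumed textbook standard heats of formation of the oxidation products (kJ/mol).
DHF_CO2_GAS = -393.51
DHF_H2O_LIQUID = -285.83


def kcal_to_kj(value_kcal):
    """Convert a value from kcal/mol to kJ/mol."""
    return value_kcal * KCAL_TO_KJ


def combustion_from_formation(n_carbon, n_hydrogen, dhf_compound):
    """Heat of combustion (kJ/mol) of a C/H/O compound from its heat of formation.

    Combustion: CaHbOc + x O2 -> a CO2 + (b/2) H2O; O2 contributes zero,
    so oxygen atoms in the compound do not appear in the sum.
    """
    products = n_carbon * DHF_CO2_GAS + (n_hydrogen / 2) * DHF_H2O_LIQUID
    return products - dhf_compound


# Methane, CH4, with an assumed textbook heat of formation of -74.8 kJ/mol.
print(round(combustion_from_formation(1, 4, -74.8), 2))  # -890.37
```

Compounds containing N, S, P, Si or halogens would need the heats of formation of the corresponding oxidation products as well, which this sketch leaves out.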
Table 3. Atom groups and their Contributions (in kJ/mol) for Heat-of-Combustion Calculations.
NrAtom TypeNeighboursContributionOccurrencesMolecules
1BC3−4309.0533
2C sp3H3B439.8831
3C sp3H3C−773.8322941153
4C sp3H3N−1199.1011065
5C sp3H3N(+)−817.9433
6C sp3H3O−1112.98178115
7C sp3H3S−1396.742319
8C sp3H3P−1052.6431
9C sp3H3Si−1008.775116
10C sp3H2BC553.8962
11C sp3H2C2−652.474413912
12C sp3H2CN−1074.20183117
13C sp3H2CN(+)−705.224426
14C sp3H2CO−980.99610374
15C sp3H2CS−1274.7810672
16C sp3H2CP−852.2252
17C sp3H2CF−623.1587
18C sp3H2CCl−617.405142
19C sp3H2CBr−623.392219
20C sp3H2CJ−685.52108
21C sp3H2CSi−932.852213
22C sp3H2N2−1480.5292
23C sp3H2N2(+)−807.5111
24C sp3H2NO−1375.7211
25C sp3H2O2−1279.46119
26C sp3H2OCl−951.9532
27C sp3H2S2−1932.8853
28C sp3HC3−529.62363254
29C sp3HC2N−957.934737
30C sp3HC2N(+)−575.783332
31C sp3HC2O−850.09277138
32C sp3HC2S−1152.312016
33C sp3HC2F−504.4233
34C sp3HC2Cl−497.941010
35C sp3HC2Br−500.7097
36C sp3HC2J−558.9211
37C sp3HCN2−1363.1711
38C sp3HCN2(+)−672.5622
39C sp3HCO2−1153.934030
40C sp3HCF2−433.9487
41C sp3HCFCl−472.9644
42C sp3HCCl2−494.6298
43C sp3HCClBr−518.1811
44C sp3HCBr2−476.3711
45C sp3HN3(+)−870.1911
46C sp3HO3−1433.0844
47C sp3HOF2−729.4822
48C sp3C4−403.8011791
49C sp3C3N−813.971310
50C sp3C3N(+)−426.891312
51C sp3C3O−730.083630
52C sp3C3S−1023.121512
53C sp3C3F−179.9322
54C sp3C3Cl−361.2122
55C sp3C3Br−362.5322
56C sp3C3J−432.3011
57C sp3C2N2(+)−626.5654
58C sp3C2O2−1004.062524
59C sp3C2F2−320.266015
60C sp3C2FCl−318.8421
61C sp3C2Cl2−356.7344
62C sp3CN3(+)−746.4164
63C sp3CO3−1284.9276
64C sp3COF2−649.8311
65C sp3CF3−243.864536
66C sp3CF2Cl−302.7386
67C sp3CF2Br−320.4654
68C sp3CFCl2−323.4355
69C sp3CFClBr−275.6711
70C sp3CCl3−366.351413
71C sp3CBr3−339.3911
72C sp3N4(+)−896.0711
73C sp3O4−1580.1422
74C sp3OF3−531.6522
75C sp2H2=C−702.52164148
76C sp2H2=N−928.8011
77C sp2HC=C−566.63462270
78C sp2HC=N−762.241413
79C sp2HC=O−396.096057
80C sp2H=CN−958.413224
81C sp2H=CN(+)−595.0833
82C sp2H=CO−747.982018
83C sp2H=CS−1161.32119
84C sp2H=CF−546.9822
85C sp2H=CCl−555.3365
86C sp2H=CBr−573.3922
87C sp2H=CSi−833.0533
88C sp2HN=N−1134.461815
89C sp2HN=O−762.261010
90C sp2H=NO−916.5322
91C sp2HO=O−545.941919
92C sp2H=NS−1372.7222
93C sp2C2=C−433.9912597
94C sp2C2=N−630.4065
95C sp2C2=O−248.779478
96C sp2C=CN−825.513326
97C sp2C=CO−602.481616
98C sp2C=CS−1031.7133
99C sp2C=CF−439.0353
100C sp2C=CCl−397.7585
101C sp2CN=N−991.991716
102C sp2CN=O−621.4312895
103C sp2CN=S−1460.2832
104C sp2CO=O−389.60500370
105C sp2CO=O(−)−534.914945
106C sp2C=OS−844.4844
107C sp2C=OF−174.2811
108C sp2C=OCl−205.8087
109C sp2C=OBr−204.2222
110C sp2C=OJ−281.7022
111C sp2=CN2−1249.5188
112C sp2=CNO(+)−678.4222
113C sp2=COF−430.5722
114C sp2=CF2−415.9798
115C sp2=CFCl−359.7511
116C sp2=CCl2−407.9443
117C sp2=CJ2−544.2521
118C sp2N2=N−1416.914035
119C sp2N2=O−1022.835647
120C sp2N2=S−1839.8355
121C sp2N=NO−1202.5211
122C sp2NO=O−772.0877
123C sp2N=OS−1488.4811
124C sp2NS=S−2092.9932
125C sp2O2=O−546.6766
126C sp2O=OCl−338.2522
127C aromaticH:C2−543.643345599
128C aromaticH:C:N−776.864730
129C aromaticH:C:N(+)−497.1032
130C aromaticH:N2−1022.1622
131C aromatic:C3−407.7223572
132C aromaticC:C2−413.58769420
133C aromaticC:C:N−630.623817
134C aromaticC:C:N(+)−361.5411
135C aromatic:C2N−844.05161113
136C aromatic:C2N(+)−494.9714476
137C aromatic:C2:N−644.821913
138C aromatic:C2O−619.1212293
139C aromatic:C2S−1044.732113
140C aromatic:C2F−401.834014
141C aromatic:C2Cl−393.223320
142C aromatic:C2Br−399.8444
143C aromatic:C2J−468.141714
144C aromatic:C2Si−686.6421
145C aromatic:CN:N−1064.4732
146C aromatic:C:NO−835.9853
147C aromaticN:N2−1260.9063
148C aromatic:N3−583.1833
149C aromatic:N2Cl−828.4811
150C spH#C−653.923428
151C spC#C−506.415534
152C spC#N−508.615340
153C sp#CN−1006.6922
154C sp#CCl−512.2111
155C spN#N−912.2022
156C sp#NO−801.8911
157C sp=C2−554.4766
158C sp=C=N−741.1922
159C sp=C=O−323.5511
160C sp=N=O−433.0654
161C sp=N=S−1250.0011
162N sp3H2C144.434944
163N sp3H2C(pi)191.61124102
164N sp3H2N−321.731211
165N sp3H2N(pi)−263.4211
166N sp3H2S−356.5411
167N sp3HC2657.843028
168N sp3HC2(pi)707.925847
169N sp3HC2(2pi)714.3011784
170N sp3HCN209.2132
171N sp3HCN(pi)254.66159
172N sp3HCN(2pi)274.142725
173N sp3HCN(+)(2pi)382.9333
174N sp3C31170.072218
175N sp3C3(pi)1214.782722
176N sp3C3(2pi)1214.872413
177N sp3C3(3pi)1229.1622
178N sp3C2N739.4111
179N sp3C2N(pi)781.0611
180N sp3C2N(+)(pi)919.5864
181N sp3C2N(2pi)771.901613
182N sp3C2N(+)(2pi)879.1043
183N sp3C2N(3pi)787.2555
184N sp3C2Si750.9111
185N sp3C2Cl(2pi)747.4811
186N sp3C2Br(2pi)769.4511
187N sp3CN2(2pi)384.2264
188N sp3CN2(3pi)424.6511
189N sp2H=C−7.7088
190N sp2C=C550.753732
191N sp2C=N310.592814
192N sp2C=N(+)237.351111
193N sp2=CN119.105142
194N sp2=CN(+)302.1411
195N sp2C=O396.9755
196N sp2=CO192.01129
197N sp2N=N−89.716431
198N sp2N=O−43.1222
199N sp2O=O356.3522
200N aromaticH2:C(+)−122.0373
201N aromaticHC:C(+)814.5711
202N aromaticC2:C(+)1314.7411
203N aromatic:C2412.856447
204N aromatic:C:N134.4521
205N(+) sp3H3C259.933635
206N(+) sp3H2C2381.0544
207N(+) sp3HC3531.0363
208N(+) sp2CO=O(−)116.31218116
209N(+) sp2C=NO(−)139.3211
210N(+) sp2NO=O(−)−143.131411
211N(+) sp2O2=O(−)436.53116
212N(+) aromaticH:C2297.1822
213N(+) spC#C(−)−520.5122
214N(+) sp=N2(−)−156.851010
215OHC389.05437219
216OHC(pi)283.41309243
217OHN(pi)−67.4396
218OHO−30.2587
219OHS−2.7365
220OHSi209.4111
221OC2778.41245141
222OC2(pi)676.08299224
223OC2(2pi)540.344341
224OCN(pi)0.0022
225OCN(+)(pi)0.00116
226OCN(2pi)242.7133
227OCO377.06118
228OCO(pi)233.20119
229OCS309.60179
230OCP386.05135
231OCP(pi)225.0931
232OCSi392.68237
233OSi234.7583
234P3C30.0011
235P4C2O=O−128.0311
236P4C3=O−172.0211
237P4O3=O8.5455
238S2HC−110.343935
239S2HC(pi)−117.4433
240S2C2629.974036
241S2C2(pi)613.0477
242S2C2(2pi)652.851211
243S2CS25.47168
244S2CS(pi)13.6563
245S4C2=O764.2844
246S4C2=O21000.311414
247S4CO=O2(−)113.6211
248S4NO=O22.7311
249S4O2=O−121.5244
250S4O2=O289.4466
251S4O=O2F−120.5211
252S4O=O2Cl−114.1011
253SiH3C−1004.6344
254SiH2C2−581.6522
255SiHC3−193.1822
256SiHC2Cl−561.6911
257SiHCCl2−414.4811
258SiHO3−463.1511
259SiC4130.9033
260SiC3N0.0011
261SiC3O−97.1632
262SiC3Cl70.0911
263SiC3Br57.5511
264SiC2O238.9483
265SiC2Cl2−0.6644
266SiCO38.6265
267SiCCl3−133.9011
268HH Acceptor1.2510080
269H.H−1.221623467
270H..H−1.092258595
271Angle60 −38.4512038
272Angle90 −25.2818687
273Angle102 −5.65469184
ABased on 2151
BGoodness of fitR21.00 2031
CDeviationAverage16.00 2031
DDeviationStandard22.93 2031
EK-fold cvK10.00 1965
FGoodness of fitQ20.9999 1965
GDeviationAverage (cv)17.50 1965
HDeviationStandard (cv)25.20 1965
In view of the various approaches to the calculation of the heat of combustion mentioned above, which are mostly restricted to a limited class of compounds, it seems at first glance odd to assume that the present simple group-additivity method should be able to cover the whole spectrum of classes of chemical compounds. However, on second thought this approach resembles the bond-energy addition method as suggested by Pauling [1], Klages [2] and Wheland [3], except that in this case it is not the energies of specific bonds that are summed up but the energies of bond clusters around “backbone” atoms. The contributions of the intramolecular effects are particularly worth mentioning: while intramolecular interactions (lines 268–270) seem negligible, the ring strain effects (lines 271–273) are quite significant and follow the expected order and sign.
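The summation over bond clusters can be illustrated with a minimal Python sketch. It assumes that the descriptor is simply the sum of each atom-group contribution multiplied by its number of occurrences in the molecule; the dictionary layout and function name are hypothetical, only two contributions are copied from Table 3 (rows 3 and 11), and the special groups of rows 268–273 are deliberately ignored, so the number obtained is not the value the full method would report.

```python
# Contributions in kJ/mol copied from Table 3 (rows 3 and 11); a tiny subset only.
CONTRIBUTIONS = {
    ("C sp3", "H3C"): -773.83,   # methyl carbon bound to carbon
    ("C sp3", "H2C2"): -652.47,  # methylene carbon bound to two carbons
}


def sum_group_contributions(group_counts):
    """Sum contribution x occurrences over the atom groups of one molecule.

    A missing atom group raises KeyError: without a parameter for every
    group the molecule cannot be evaluated at all.
    """
    return sum(CONTRIBUTIONS[group] * count for group, count in group_counts.items())


# Propane, CH3-CH2-CH3: two methyl groups and one methylene group.
propane = {("C sp3", "H3C"): 2, ("C sp3", "H2C2"): 1}
propane_estimate = sum_group_contributions(propane)  # special groups not included
```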
In Table 3, row A indicates the total number of molecules on which the calculation of the atom group parameters is based. Rows B to D show the correlation coefficient R2 and the average and standard deviations of the complete training set; rows F to H present the analogous values Q2 and the deviations resulting from the k-fold cross-validation calculation with k = 10 (row E). Together they demonstrate a surprisingly good correlation of the calculated with the experimental data, with a goodness of fit R2 of >0.9999 and a standard deviation of <23 kJ/mol, in view of the large range of heat-of-combustion values of between −42,860 (glyceryl tribrassidate, calc. −42,915) and −217.71 (oxalic acid dihydrate, calc. −235.5) kJ/mol. The cross-validated correlation coefficient Q2 of likewise 0.9999 and the only slightly larger deviation values confirm the quality of the group-additivity method for the prediction of heat-of-combustion data. As was mentioned earlier, in all correlation and deviation calculations only those atom groups are considered which are represented by at least three molecules (last column); as a consequence, the number of molecules used for the evaluation of these data is smaller than the basis set (row A), and the contributions of atom groups that do not fulfil this requirement should only be viewed as indicative.
The deviations are also in good agreement with the variations of experimental data from various sources for several compounds, as exemplified by the compounds listed in Table 4. (A more detailed discussion of the reliability of published data is given in the next chapter.) For the calculations the amino acids are assumed to generally adopt the zwitterionic form (except those where the amino group is bound to a conjugated system as, e.g., in N-phenylglycine or N-formylleucine). However, test calculations applying their neutral forms show only minor differences in the data in comparison with those of the zwitterions as would be expected for this prototropic equilibrium.
Table 4. Heat-of-Combustion: Experiment vs. Calculation (in kJ/mol).
| Compound | Experimental (Domalski [42]) | Experimental (various) | Calculated |
|---|---|---|---|
| Valine | −2921.5 | −2910.7 [44] | −2932.9 |
| Threonine | −2102.6 | −2084.6 [44] | −2090.5 |
| l-Proline | | −2746.2 [44] | −2749.6 |
| dl-Proline | −2729.8 | −2729.6 [44] | −2749.6 |
| Isoleucine | −3586.0 | −3578.3 [44] | −3587.8 |
| l-Serine | −1455.8 | −1448.2 [44] | −1441.4 |
| dl-Serine | | −1441.9 [44] | −1441.4 |
| N-Carboxymethylglycine | −1657.1 | −1641.8 [44] | −1670.5 |
| N-Formylleucine | −3685.6 | −3814.6 [44] | −3852.8 |
| Trimyristin | −27,842 | −27,643.7 [54] | −27,771.8 |
Figure 1 illustrates the close agreement between the calculated and the experimental data for the heat of combustion. The complete set of results is available in a separate document of the Supplementary Material under the name of “Experimental vs Calculated Heat-of-Combustion Data Table.doc”, the associated list of compounds as an SD file named “Compounds List for Heat-of-Combustion Calculations.sdf”.
Figure 1. Correlation diagram of heat-of-combustion data (10-fold cross-validated: N = 2031, Q2 = 0.9999, slope = 1.0).
In the histogram (Figure 2), the distributions of the deviations of the complete training set and of the cross-validation data both follow a nearly perfect Gaussian bell curve; the cross-validation deviations (in red) are typically less populated in the center and more populated in the periphery of the histogram.

3.2. Heat of Formation

The excellent reliability of the predicted heat of combustion data also enabled the indirect calculation of the heat of formation of the molecules making use of the heats of formation of their oxidation products. Consequently, the same limitations concerning the elements as well as the computation constraints were valid. For these evaluations the heat of formation values of CO2, H2O, H3BO3, H2SO4(+115 H2O), H3PO4(c), SiO2 and aqueous hydrogen halides, given by Skinner [55] and Domalski [20] were applied.
For comparison, the predicted heat of formation values were checked against experimental values, the main source of which was again Domalski’s collection of compounds [42], supplemented by data from the table volume “Standard Thermodynamic Properties of Chemical Substances” [53]. Further experimental data were provided for hydrocarbons by Domalski and Hearing [56] and the National Institute of Standards and Technology [52], and for amino acids by V. V. Ovchinnikov [44].
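The indirect evaluation follows Hess's law. The sketch below is restricted to compounds containing only C, H and O and uses common textbook values for the heats of formation of CO2(g) and H2O(l), which are an assumption here and not necessarily the figures taken from [55] and [20]; the function names are hypothetical. The ethylene glycol input is the Domalski value listed in Table 5.

```python
# Standard heats of formation of the combustion products in kJ/mol at 25 °C.
# Common textbook values (assumption); O2 contributes zero by definition.
HF_CO2_GAS = -393.51
HF_H2O_LIQUID = -285.83


def formation_from_combustion(n_c, n_h, dh_combustion):
    """Hess's law for CcHhOo: dHf = c*dHf(CO2) + (h/2)*dHf(H2O) - dHc."""
    return n_c * HF_CO2_GAS + (n_h / 2) * HF_H2O_LIQUID - dh_combustion


def combustion_from_formation(n_c, n_h, dh_formation):
    """Inverse relation, as used when only a heat of formation is published."""
    return n_c * HF_CO2_GAS + (n_h / 2) * HF_H2O_LIQUID - dh_formation


# Ethylene glycol, C2H6O2, with the experimental heat of formation from Table 5.
glycol_dhc = combustion_from_formation(2, 6, -455.1)
glycol_dhf = formation_from_combustion(2, 6, glycol_dhc)  # round trip
```

The oxygen count of the compound does not enter the formula because the oxygen ends up in CO2 and H2O, whose heats of formation already account for it. Compounds with further elements would need the additional products named in the text (e.g., H3BO3, H2SO4, H3PO4, SiO2, aqueous hydrogen halides).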
Figure 2. Histogram of heat-of-combustion data (S = 25.2).
Figure 3. Correlation diagram of heat-of-formation data (N = 2031, R2 = 0.9974, slope = 1.0).
The experimental enthalpy values extended from −7251 (Perfluorohexadecane, calc. −7232.48) to +792 (1,1′-dimethyl-5,5′-azotetrazole, calc. +764.35) kJ/mol. No outlier had to be removed from the enthalpy calculations. With regard to the high correlation coefficient R2 and the regression line having a slope of 1 (shown in Figure 3) the conclusion seems justified that any further prediction in- and outside the given range is reliable.
The average and standard deviations in Table 3 are surprisingly low, and they translate into analogous deviations for the heat of formation because of its indirect evaluation from the heat of combustion (neglecting the increase caused by error propagation). Nevertheless, one should not forget that from the perspective of a kineticist interested in reactivities and equilibria, a “sufficiently accurate” standard deviation should not exceed 4 kJ/mol. Even this value is equivalent to a change of an equilibrium constant at room temperature by a factor of >5, or to the difference between about 90% and 64% yield in a chemical reaction, independent of the magnitude of the enthalpy itself [20].
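The figures quoted for the 4 kJ/mol threshold can be reproduced with a short calculation. The sketch assumes that the energy difference acts directly on the exponent of the equilibrium constant, K = exp(−ΔG/RT), at 298.15 K, and that the yield refers to a simple A ⇌ B equilibrium; the function names are illustrative.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol K)
ROOM_TEMPERATURE = 298.15   # K


def equilibrium_factor(delta_kj_per_mol, temperature=ROOM_TEMPERATURE):
    """Factor by which K changes when the energy difference shifts by delta."""
    return math.exp(delta_kj_per_mol * 1000.0 / (GAS_CONSTANT * temperature))


def equilibrium_yield(k):
    """Fraction of product B at equilibrium for A <=> B with constant k."""
    return k / (1.0 + k)


factor = equilibrium_factor(4.0)                  # slightly above 5
yield_before = equilibrium_yield(9.0)             # K = 9 corresponds to 90%
yield_after = equilibrium_yield(9.0 / factor)     # about 64%
```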
In order to put the deviations into perspective with the uncertainty of the published input data as well, Table 5 compares the experimental data provided by various sources for a number of compounds with the results of the present calculations.
Table 5. Heat of Formation: Experiment vs. Calculation (in kJ/mol).
| Compound | Experimental (Domalski [42]) | Experimental (various) | Calculated |
|---|---|---|---|
| Ethyleneglycol | −455.1 | −460.0 [53] | −461.81 |
| Benzaldehyde | −84.2 | −87.0 [53] | −86.37 |
| Brassidic acid | −896.0 | −960.7 [53] | −913.74 |
| Triphenylene | 141.2 | 151.8 [56] | 173.36 |
| Fluoranthene | 191.6 | 230.3 [56] | 176.03 |
| Pyrene | 114.9 | 125.5 [56] | 152.23 |
| Leucine | −636.3 | −648.0 [44] | −639.07 |
| N-Carboxymethylglycine | −919.0 | −932.6 [44] | −905.86 |
| l-Serine | −726.8 | −732.7 [44] | −741.09 |
| Isoleucine | −635.6 | −640.6 [44] | −634.07 |
Table 4 and Table 5 also shed light on the reliability of the published experimental thermodynamic data. Most authors discuss the probable error margins only summarily, if at all. Domalski [42] addresses the uncertainties in more detail and derives their magnitude from the number of significant figures in the reported heat-of-combustion and heat-of-formation data. Accordingly, a value cited to 0.01 is associated with an error of 0.05 to 0.5, a value cited to 0.1 with an error of 0.5 to 2, and a value cited to 1 with an error of 2 to 20 kcal/mol. Another important point is the state of the compound at room temperature for which the value is given. In some cases the authors provide data for two different standard states; in such cases the present paper applied the values for the normal state. A detailed discussion of the general accuracy of the experimental enthalpy data is given by Cohen and Benson [20].
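Domalski's rule of thumb lends itself to a small lookup. The following sketch assumes that the value is supplied as printed in the source (in kcal/mol), that values cited to more than two decimals are treated like those cited to 0.01, and that the bounds are converted with the factor 4.1868 used elsewhere in this paper; the function names and the example values are hypothetical.

```python
KCAL_TO_KJ = 4.1868  # conversion factor used in the paper

# (lower, upper) error bounds in kcal/mol, keyed by the number of decimals cited.
ERROR_BOUNDS_KCAL = {2: (0.05, 0.5), 1: (0.5, 2.0), 0: (2.0, 20.0)}


def decimals_cited(value_text):
    """Count the digits after the decimal point of a value as printed."""
    _, _, fraction = value_text.strip().partition(".")
    return len(fraction)


def error_bounds_kj(value_text):
    """Error bounds in kJ/mol for a value printed in kcal/mol."""
    decimals = min(decimals_cited(value_text), 2)  # assumption for >2 decimals
    low, high = ERROR_BOUNDS_KCAL[decimals]
    return low * KCAL_TO_KJ, high * KCAL_TO_KJ


bounds_one_decimal = error_bounds_kj("-698.3")  # hypothetical printed value
bounds_integer = error_bounds_kj("-698")        # hypothetical printed value
```

Even the tightest class of published values thus carries an uncertainty of roughly 0.2 to 2 kJ/mol, and values cited without decimals one of roughly 8 to 84 kJ/mol.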

3.3. Applicability and Limitations of the Group-Additivity Method for Thermodynamics Calculations

For the practising chemist the question certainly arises as to whether the present group-additivity method is accurate enough to be applied to the thermodynamics of, e.g., chemical reactions and/or equilibria. A particularly interesting area is the issue of tautomerism, not only because it has been the subject of decades of debate, which is still ongoing, but also because it can be used as a sensitive test for the applicability of the computation method. The present paper takes advantage of the ample literature concerning azo-hydrazone as well as keto-enol tautomerism to assess the quality of the present method. Table 6 presents a list of azo dyes which are known to exhibit an equilibrium between the azo and the hydrazone form. The lower enthalpy values, indicated in boldface, should correspond to the form which dominates the azo-hydrazone equilibrium. This is indeed the case: it is well known that arylazo-substituted anilines only undergo tautomerization in acidic solution, whereas arylazonaphthols generally prefer the hydrazone form, which is accompanied by a large shift of the electronic absorption spectra. 2- and 4-Phenylazophenol, on the other hand, only show a weak tendency to tautomerize to the hydrazone form.
Table 6. Thermodynamic Data (kJ/mol) of Azo Dyes.
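| Compound | Hydrazone form ∆Hf (calc.) | Azo form ∆Hf (calc.) | Conformance a | Ref. |
|---|---|---|---|---|
| 4-Phenylazophenol | 154.32 | 141.92 | + | [57] |
| 2-Phenylazophenol | 150.32 | 141.92 | + | [57] |
| 4-Aminoazobenzene | 400.41 | 315.61 | + | [58] |
| 2-Aminoazobenzene | 397.81 | 318.90 | + | |
| 1-Phenylazo-2-naphthol | 160.21 | 183.51 | + | [59,60] |
| 4-Phenylazo-1-naphthol | 164.21 | 183.51 | + | [61] |
| 1-Phenylazo-2-naphthylamine | 410.30 | 357.30 | + | [59,60] |
| 4-Phenylazo-1-naphthylamine | 410.30 | 359.80 | + | [62] |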
a Conformance with experimental data.
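The comparison made in Table 6 amounts to picking the tautomer with the lower calculated heat of formation. A minimal sketch, using four entries copied from the table and hypothetical function and variable names:

```python
# (hydrazone, azo) calculated heats of formation in kJ/mol, copied from Table 6.
AZO_DYES = {
    "4-Phenylazophenol": (154.32, 141.92),
    "4-Aminoazobenzene": (400.41, 315.61),
    "1-Phenylazo-2-naphthol": (160.21, 183.51),
    "1-Phenylazo-2-naphthylamine": (410.30, 357.30),
}


def preferred_form(hydrazone_dhf, azo_dhf):
    """Return the tautomer with the lower calculated heat of formation."""
    return "hydrazone" if hydrazone_dhf < azo_dhf else "azo"


def enthalpy_gap(hydrazone_dhf, azo_dhf):
    """Absolute difference between the two calculated values in kJ/mol."""
    return abs(hydrazone_dhf - azo_dhf)


predictions = {name: preferred_form(*pair) for name, pair in AZO_DYES.items()}
gaps = {name: enthalpy_gap(*pair) for name, pair in AZO_DYES.items()}
```

The gap is worth reporting alongside the preferred form, since a difference that is small compared with the standard deviation of the method carries little weight on its own.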
The limitations of the group-additivity principle are evident in Table 7. While the calculations for 1-(N-phenylformimidoyl)-2-naphthol are in line with experiment that it essentially exists in the enol form [41] and for acetone the calculated values for the keto and enol forms are at best inconclusive, the data for cyclohexanone and cyclopentanone are in clear contrast with the true dominant stable tautomers proven experimentally by Hine and Arata [63,64].
Experimental findings of the series of β-diketones (as neat liquids) are in conformance with the calculations, with the exception of 1,1-bis(benzoyl)ethane which shows the influence of steric hindrance: Allen and Dwek [65] explained the lack of enolization of this compound with the steric and/or inductive effect of the additional methyl group on the central carbon atom, clearly favouring the +I effect, which seems justified: Figure 4 shows that the additional methyl group on the central carbon atom essentially only twists the phenyl groups out of plane, but has no steric influence on the stability of the H bridge.
Table 7. Thermodynamic Data (kJ/mol) of Tautomeric Ketones and β-Diketones.
| Compound | Keto form ∆Hf (calc.) | Enol form ∆Hf (calc.) | ∆Hf (exp.) | Conformance a | Ref. |
|---|---|---|---|---|---|
| 1-(N-Phenylformimidoyl)-2-naphthol | 70.23 | 46.53 | | + | [66] |
| Acetone | −243.18 | −234.38 | −248.1 | + | [63] |
| Cyclohexanone | −281.24 | −296.04 | −276.1 | | [63] |
| Cyclopentanone | −233.35 | −251.75 | −240.2 | | [64] |
| Phenol | −57.50 | −166.90 | −165.2 | + | [67] |
| 2-Pyridone | −120.62 | −136.32 | −166.3 | | [68,69,70] |
| 4-Pyridone | −98.82 | −120.52 | −148.9 | + | [68,69,70] |
| Carbostyril | −74.43 | −90.53 | −144.9 | | [71,72,73] |
| Acetylacetone | −415.25 | −429.25 | −427.6 | + | [65] |
| Bis(trifluoroacetyl)methane | −1659.98 | −1676.28 | | + | [65] |
| Dibenzoylmethane | −203.72 | −221.02 | | + | [65] |
| 1,1-Bis(benzoyl)ethane | −231.31 | −258.91 | | | [65] |
a Conformance with experimental data.
Figure 4. Energy-minimized enol forms of dibenzoylmethane (left) and of 1,1-bis(benzoyl)ethane showing the steric effect of the additional methyl group on the structure of the latter (graphics by ChemBrain IXL).
The tautomeric equilibria of the pyridones have been studied extensively by many physical methods in the solid state and in solutions of various polarities (see citations in references [68,69,70]) and they indicate that in the condensed phase the equilibrium of 2-pyridone lies on the keto (lactam) side (by an indirectly measured enthalpy difference of 0.4 ± 0.6 kcal/mol [69]) and that 4-pyridone’s equilibrium is shifted to the enol (4-hydroxypyridine) side with an indirectly estimated enthalpy gap of 2.4 ± 0.6 kcal/mol [69]. Theoretical studies [68,69,70,71,72,73] also predicted a preference in the gas phase for the lactam form in the case of 2-pyridone (by ca. 1.7 kJ/mol), while the enol form for 4-pyridone was calculated to be more stable (by ca. 10 kJ/mol). The present calculations evidently only agree with the findings for 4-pyridone. On the other hand, the predicted direction of the equilibrium between the carbon-analogue phenol and its tautomers cyclohexa-2,4-diene-1-one and cyclohexa-2,5-diene-1-one is in line with experimental findings [67].
Then there is carbostyril: for more than a century this compound’s tautomerism has been under investigation [71,72,73]. The first assumption by A. Claus [71] in 1896 that the keto (lactam) form was dominant in solution rested on the analysis of its chemical selectivity towards bromination, an approach which nowadays, in view of today’s theoretical and practical knowledge about the reactivity/selectivity processes and kinetics of proton shifts, seems founded on pure speculation but was nonetheless correct as modern theoretical studies [73] confirmed. These studies, however, calculated an enthalpy difference between the lactam and lactim form of only about 1 kcal/mol. The calculated data of both forms listed in Table 7 deviate too far from the experimental ones to provide support for one or the other.
The deficiencies exhibited in Table 7 point to two principal weaknesses of the group-additivity method: the first is connected with the origin of the values of the group contributions, and the second is attributable to the intended isolation of the atom groups. The failure to correctly predict the keto-enol ratio in the case of acetone, cyclohexanone and cyclopentanone seems to be attributable to the fact that 12 out of the 15 compounds defining the enol moiety in the evaluation of the group contributions are aromatic systems, namely substituted furans, isoxazoles and tropolone, which could imprint the stabilizing effect of their extended conjugation onto the values of the relevant contributions. This deficiency could possibly be overcome provided that reliable experimental data are available for isolated enols (e.g., enol ethers) which could be included in the contribution evaluations.
The second weakness of the group-additivity method shows its effect in the wrong preference for the enol form of 1,1-bis(benzoyl)ethane. This deficiency is in principle insurmountable because steric and electronic effects and other unusual conformational information cannot be taken into account by atom groups that are isolated per se. Even in the particular case of β-diketones, where the hydrogen bridge normally contributes to the stabilization of the enol form, the lack of this effect in 1,1-bis(benzoyl)ethane is too small to change the picture.

3.4. LogPOctanol/Water

The partition coefficient P between octanol and water, or more precisely its logarithm logP, is a standard model for the expression of the lipophilicity of biologically active compounds in medicinal and agrochemistry; reliable methods for its evaluation from the structure of a drug, in particular prior to its synthesis, are therefore very desirable. Various calculation methods have successfully been applied, of which those developed by Ghose and Crippen [6,7], Klopman et al. [41], Visvanadhan et al. [54], Leo [74], Wang et al. [75], Hou and Xu [76] and others may be especially mentioned, because they are also based on the atomic-group additivity method and may therefore serve as benchmarks for the present method. Most experimental logP data for this paper have been extracted from Klopman’s collection [41], some from Lipinski’s [77] and from Sangster’s [78]. Compounds carrying a net charge (not zwitterions) and strong acids are excluded from the present logP evaluations as a matter of principle. Table 8 lists the atom groups and their contributions resulting from the linearization procedure using the experimental data of more than 2700 compounds of a large variety, a list of which is available in the Supplementary Material under the name of “Compounds List for LogP Calculations.sdf”. At the same location the complete set of results is accessible under the name of “Experimental vs Calculated LogP Data Table.doc”.
The only difference to the enthalpy Table A1 lies in the special groups 273–276 in Table 8 which replace the special groups required to factor in intramolecular and ring-strain effects on the heats of combustion and formation. These new special groups were suggested by Klopman et al. [41]. Groups 274 and 275 take account of the particularities of saturated and unsaturated hydrocarbons and are therefore only included in the calculations if no heteroatoms are present in the compound. In that case the contribution is multiplied by the number of carbon atoms in the molecule. The meaning of group 276 has been extended over that of Klopman’s intention in that it is considered in all classes of compounds having CH2 chains ending with CH3, NH2, OH, SH or halogen. Another evidently important contributor is the H-bridge special group (no. 273) which—if found in the compound—increases the lipophilicity by 0.49 units.
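How the alkane special group enters the sum can be illustrated for unbranched alkanes. The sketch below uses four contributions copied from Table 8 (rows 1, 2, 7 and 274) and multiplies the alkane group by the number of carbon atoms, as described above. Group 276 is left out because the text does not spell out how its CH2 count is determined, so the result is only a partial sum under that assumption; the function name is hypothetical.

```python
# Contributions copied from Table 8; a small subset only.
CONST = 0.25          # row 1, present in every molecule
CH3_C = 0.47          # row 2, C sp3 with neighbours H3C
CH2_C2 = 0.35         # row 7, C sp3 with neighbours H2C2
ALKANE_PER_C = 0.16   # row 274, multiplied by the number of C atoms


def logp_unbranched_alkane(n_carbons):
    """Partial logP sum for CH3-(CH2)n-CH3; special group 276 is NOT included."""
    if n_carbons < 2:
        raise ValueError("an unbranched alkane with two methyl ends needs >= 2 carbons")
    n_ch2 = n_carbons - 2
    return CONST + 2 * CH3_C + n_ch2 * CH2_C2 + n_carbons * ALKANE_PER_C


tetracosane_partial = logp_unbranched_alkane(24)
```

For tetracosane (24 carbon atoms) this partial sum is 12.73, against the calculated value of 12.75 quoted in the text for this compound. For a molecule containing any heteroatom the last term would be dropped altogether, since groups 274 and 275 only apply to pure hydrocarbons.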
The resulting goodness of fit R2 of 0.9543 for 2697 training compounds and the cross-validated correlation coefficient Q2 of 0.9448 for 2638 test molecules, covering a logP range of between −4.41 (Ornithine, calc. −3.54) and 12.53 (Tetracosane, calc. 12.75), are in the same range as those published elsewhere, and the average and standard deviations are within the experimental error. For comparison, Klopman et al. [19], using an extended group-contribution approach similar to the present one, achieved an R2 of 0.93, a cross-validated Q2 of 0.926 and a standard deviation of 0.38 (cross-validated 0.404), based on 1663 compounds. R. Wang’s XLOGP model [75] yielded, based on 1831 molecules, an R2 of 0.968 and a standard deviation of 0.37.
An analysis of the error distribution shows that the calculated logP values of 2041 of the 2697 compounds (76%) deviate from the experimental value by no more than the cross-validated standard error (S = 0.51), while only 85 compounds (3%) are outliers with errors of more than twice that standard error. Figure 5 presents the correlation diagram of the logP data, showing that the data points of the cross-validated test set (red circles) in most cases overlap the black crosses of the training set, while the histogram (Figure 6) confirms the evenness of the deviation distribution about the experimental values for both the training and test sets. The slope of the regression line in Figure 5 is slightly below 1, at 0.96.
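The error-distribution analysis can be expressed as a short routine. The sketch assumes paired lists of experimental and calculated values, counts a compound as within the standard error if its absolute deviation is at most S, and as an outlier if it exceeds 2 S; the function name and the toy data are hypothetical, whereas the counts in the last lines are those reported above.

```python
def error_distribution(experimental, calculated, standard_error):
    """Return (fraction within 1 S, fraction beyond 2 S) of absolute deviations."""
    errors = [abs(calc - exp) for exp, calc in zip(experimental, calculated)]
    n = len(errors)
    within = sum(1 for err in errors if err <= standard_error)
    outliers = sum(1 for err in errors if err > 2 * standard_error)
    return within / n, outliers / n


# Hypothetical toy data: deviations of 0.1, 0.5, 0.9 and 1.2 log units.
toy_experimental = [1.0, 2.0, 3.0, 4.0]
toy_calculated = [1.1, 2.5, 3.9, 5.2]
toy_within, toy_outliers = error_distribution(toy_experimental, toy_calculated, 0.51)

# Percentages implied by the counts reported in the text.
reported_within_percent = round(100 * 2041 / 2697)
reported_outlier_percent = round(100 * 85 / 2697)
```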
Table 8. Atom group Contributions for LogP Calculations.
NrAtom TypeNeighboursContributionOccurrencesMolecules
1Const 0.2527802780
2C sp3H3C 0.4719691118
3C sp3H3N0.39435300
4C sp3H3N(+)−0.3111
5C sp3H3O−0.09340250
6C sp3H3S−0.195651
7C sp3H2C20.352064714
8C sp3H2CN0.36701387
9C sp3H2CN(+)−0.342319
10C sp3H2CO−0.24558430
11C sp3H2CS−0.387659
12C sp3H2CF−0.3155
13C sp3H2CCl0.485138
14C sp3H2CBr0.882219
15C sp3H2CJ0.9933
16C sp3H2CP2.8911
17C sp3H2N21.5744
18C sp3H2NO0.1555
19C sp3H2NS0.6433
20C sp3H2O2−0.0677
21C sp3H2S2−1.2344
22C sp3HC30.21388230
23C sp3HC2N0.32210167
24C sp3HC2N(+)−0.362726
25C sp3HC2O−0.18389193
26C sp3HC2S−0.6488
27C sp3HC2F0.2111
28C sp3HC2Cl0.616018
29C sp3HC2Br0.7175
30C sp3HCN21.0065
31C sp3HCNO0.642020
32C sp3HCNS0.523030
33C sp3HCO2−0.474124
34C sp3HCOS0.0033
35C sp3HCOCl0.1131
36C sp3HCOBr1.2511
37C sp3HCOP0.4411
38C sp3HCF20.3522
39C sp3HCCl21.15109
40C sp3HOF2−0.1411
41C sp3C4−0.09131101
42C sp3C3N0.333130
43C sp3C3N(+)−0.7811
44C sp3C3O−0.237159
45C sp3C3S−0.461717
46C sp3C3F0.9433
47C sp3C3Cl0.56288
48C sp3C3Br0.7011
49C sp3C2N2−1.5611
50C sp3C2NO−0.0455
51C sp3C2O20.2266
52C sp3C2F20.6322
53C sp3C2Cl20.731110
54C sp3CNO21.3511
55C sp3CF31.067472
56C sp3CF2Cl1.3432
57C sp3CFCl21.3432
58C sp3CCl31.712018
59C sp3CCl2Br0.0011
60C sp3OF31.0522
61C sp3SF31.2477
62C sp3SFCl21.2011
63C sp3SCl30.9333
64C sp2H2=C0.577465
65C sp2H2=N−0.7711
66C sp2HC=C0.25390249
67C sp2HC=N−0.642424
68C sp2HC=O−0.483232
69C sp2H=CN0.0210490
70C sp2H=CN(+)−0.231717
71C sp2H=CO0.701312
72C sp2H=CS−0.371514
73C sp2H=CCl0.77108
74C sp2H=CBr0.7511
75C sp2HN=N0.197054
76C sp2HN=O−0.381211
77C sp2HO=O−0.0655
78C sp2H=NS−0.3544
79C sp2C2=C0.19150126
80C sp2C2=N−0.068885
81C sp2C2=N(+)1.5911
82C sp2C2=O−0.61209166
83C sp2C=CN0.508673
84C sp2C=CN(+)−0.3633
85C sp2C=CO0.594338
86C sp2C=CS−0.151914
87C sp2C=CF0.0233
88C sp2C=CCl0.973020
89C sp2C=CBr0.9344
90C sp2C=CJ0.9511
91C sp2C=CP0.0011
92C sp2=CN20.982424
93C sp2=CN2(+)0.651111
94C sp2CN=N0.326865
95C sp2CN=N(+)−0.1022
96C sp2CN=O−0.59468376
97C sp2C=NO−0.6011
98C sp2=CNO0.6144
99C sp2=CNO(+)0.0322
100C sp2CN=S−0.2488
101C sp2C=NS−0.4565
102C sp2=CNS−0.5255
103C sp2=CNCl3.1111
104C sp2=CNBr1.0053
105C sp2C=NCl2.4011
106C sp2CO=O0.14522473
107C sp2CO=O(−)−2.324343
108C sp2C=OS−1.3444
109C sp2=COCl1.5411
110C sp2=CSBr−1.6911
111C sp2=CF20.4211
112C sp2=CCl21.401210
113C sp2=CBr21.4811
114C sp2N2=N0.772827
115C sp2N2=N(+)0.9222
116C sp2N2=O0.10141139
117C sp2N=NO0.2011
118C sp2N2=S0.3387
119C sp2N=NS0.152626
120C sp2N=NCl1.7933
121C sp2N=NBr0.7932
122C sp2NO=O0.33116113
123C sp2=NOS−0.0611
124C sp2N=OS−0.1377
125C sp2NO=S0.9311
126C sp2=NS2−1.5222
127C sp2NS=S−0.7953
128C sp2=NSCl0.7111
129C aromaticH:C20.3296602071
130C aromaticH:C:N−0.40277192
131C aromaticH:C:N(+)−0.942423
132C aromaticH:N2−1.081010
133C aromatic:C30.16390171
134C aromaticC:C20.1819821323
135C aromaticC:C:N−0.497363
136C aromaticC:C:N(+)−0.4544
137C aromatic:C2N0.28639526
138C aromatic:C2N(+)0.10188154
139C aromatic:C2:N−0.109372
140C aromatic:C2:N(+)−0.012020
141C aromatic:C2O0.621096749
142C aromatic:C2S−0.15177143
143C aromatic:C2F0.4010372
144C aromatic:C2Cl0.861707556
145C aromatic:C2Br0.97242105
146C aromatic:C2J1.325034
147C aromatic:C2P0.6211
148C aromaticC:N2−1.3188
149C aromatic:C:N2−1.3311
150C aromatic:CN:N0.683632
151C aromatic:C:NO0.572618
152C aromatic:C:NS−0.0755
153C aromatic:C:NF0.4011
154C aromatic:C:NCl0.311816
155C aromatic:C:NBr0.2011
156C aromaticN:N20.365442
157C aromatic:N3−0.4144
158C aromatic:N2O0.5599
159C aromatic:N2S−0.5133
160C aromatic:N2Cl−0.2387
161C spH#C−0.161010
162C spC#C0.281814
163C spC#N−0.189086
164C spN#N0.6822
165C sp#NS−0.6233
166C sp=N=S1.862221
167N sp3H2C−1.375656
168N sp3H2C(pi)−0.84313287
169N sp3H2N−0.581717
170N sp3H2S−1.133636
171N sp3HC2−1.196463
172N sp3HC2(pi)−0.89237213
173N sp3HC2(2pi)−0.40328283
174N sp3HCN−1.1143
175N sp3HCN(pi)−0.41109
176N sp3HCN(2pi)0.774848
177N sp3HCO−2.2211
178N sp3HCO(pi)−1.1488
179N sp3HCS−1.4444
180N sp3HCS(pi)−1.235050
181N sp3HCP−2.0833
182N sp3HCP(pi)−0.6811
183N sp3C3−1.31136120
184N sp3C3(pi)−0.97151136
185N sp3C3(2pi)−0.73153140
186N sp3C3(3pi)−0.842323
187N sp3C2N−1.6611
188N sp3C2N(pi)−1.583128
189N sp3C2N(2pi)−0.755450
190N sp3C2N(3pi)−0.9188
191N sp3C2O(pi)−0.3855
192N sp3C2S−1.1877
193N sp3C2S(pi)0.0776
194N sp3C2S(2pi)0.9922
195N sp3C2P0.0222
196N sp3CN2(2pi)2.1211
197N sp3CS2−0.2811
198N sp3CS2(pi)−0.5511
199N sp2H=C−0.771613
200N sp2C=C−0.71195173
201N sp2C=N−0.151716
202N sp2=CN0.3610081
203N sp2C=N(+)−6.3711
204N sp2=CN(+)−0.8522
205N sp2=CO−0.263429
206N sp2C=O−0.7422
207N sp2=CS−1.5265
208N sp2N=N−0.612922
209N sp2N=O0.254037
210N aromaticH2:C(+)0.5174
211N aromaticHC:C(+)−0.1343
212N aromaticC2:C(+)−0.5511
213N aromatic:C20.43356258
214N aromatic:C:N−0.2742
215N(+) sp3H3C−0.812929
216N(+) sp3H2C20.0855
217N(+) sp3HC31.1511
218N(+) sp2C=CO(−)0.0811
219N(+) sp2CO=O(−)0.14233195
220N(+) sp2NO=O(−)−0.2122
221N(+) sp2O2=O(−)0.5311
222N(+) aromaticH:C20.9244
223N(+) aromatic:C2O(−)−0.582020
224N(+) sp=C=N(−)1.6011
225N(+) sp=N2(−)0.0011
226OHC−0.55424263
227OHC(pi)−0.69587528
228OHN−0.061010
229OHN(pi)0.0266
230OC20.30188114
231OC2(pi)−0.29599478
232OC2(2pi)−0.78298277
233OCN0.1933
234OCN(pi)0.4377
235OCN(+)(pi)−0.0411
236OCN(2pi)0.021413
237OCS0.0542
238OCS(pi)−1.1944
239OCP0.349649
240OCP(pi)−0.704028
241ON2(2pi)0.8044
242S2HC0.8766
243S2HC(pi)0.3233
244S2C21.354746
245S2C2(pi)1.186361
246S2C2(2pi)1.424948
247S2CN0.0033
248S2CN(2pi)2.5111
249S2CS0.9521
250S2CS(pi)1.7242
251S2CP1.111614
252S2CP(pi)0.6332
253S2N2−2.6722
254S2N2(2pi)5.7411
255S4C2=O−0.7199
256S4C2=O2−0.221414
257S4CO=O2−0.3621
258S4CN=O20.019286
259S4C=O2F0.6222
260S4NO=O20.0044
261S4N2=O21.2755
262S4O2=O0.7111
263P4CO2=O−1.8422
264P4CO2=S0.3611
265P4COS=S−2.3911
266P4O3=O−0.952121
267P4O3=S0.981212
268P4O2=OS0.4311
269P4O2S=S0.631110
270P4O=OS2−0.8222
271P4N2O=O−0.3422
272P4NO=OS−1.4722
273HH Acceptor0.49151139
274AlkaneNo of C atoms0.1627430
275Unsaturated HCNo of C atoms0.051473125
276X(CH2)nNo of CH2 groups0.101362579
ABased on 2780
BGoodness of fitR20.9543 2697
CDeviationAverage0.35 2697
DDeviationStandard0.46 2697
EK-fold cvK10.00 2638
FGoodness of fitQ20.9448 2638
GDeviationAverage (cv)0.38 2638
HDeviationStandard (cv)0.51 2638
Figure 5. Correlation diagram of logP data (10-fold cross-validated: N = 2640, Q2 = 0.9451, slope = 0.96).
Figure 6. Histogram of logP data (S = 0.51).
Wang et al. [75] added several special groups as correction factors to their XLOGP program. Among these, the amino acid indicator is worth mentioning because it appears to improve the standard deviation of their program dramatically. The present method does not require this indicator: amino acids, which are generally considered to exist in solution as zwitterions, are included in the contribution calculation in their zwitterionic form. The exceptions are those amino acids in which the amino group is conjugated with a double-bonded or aryl moiety, which lowers its basicity and thus makes the non-ionic form more stable. The experimental values confirm the zwitterionic form in all cases except, as expected, for N-phenylglycine. As shown in Table 9, the difference in logP between the non-ionic and the zwitterionic form (except for N-phenylglycine) amounts to ca. −1.87 units, close to Wang’s amino acid indicator value of −2.27. The calculated logP value of the dominant form is written in boldface.
A less clear picture emerges for compounds that undergo keto-enol tautomerism, as shown in Table 10. The calculated logP data for phenol, carbostyril, the 4-hydroxy form of uracil and acetylacetone and their tautomeric forms agree with the experimental values within the standard deviation. For acetone, cyclohexanone and 2-pyridone, however, the calculated data can only be viewed as indicative, because the logP values of both tautomers deviate from experiment by more than the standard deviation. Beyond this, acetylacetone is a tautomeric chameleon in that its tautomeric equilibrium depends strongly on the solvent: Allen and Dwek [65] showed that the percentage of enol decreases from 95% in cyclohexane to 75% in acetone and to 60% in dimethyl sulfoxide. In water the equilibrium is definitely shifted to the diketo side owing to the strong intermolecular hydrogen bonding with the keto groups, which obstructs the stabilizing effect of the intramolecular H-bridge [79].
Table 9. LogP of Amino acids.
Compound | Zwitterionic form, LogP calc | Experiment, LogP exp | Non-ionic form, LogP calc
Aspartic acid | −2.93 | −3.70 | −1.06
Threonine | −3.48 | −3.50 | −1.61
Glycine | −3.22 | −3.00 | −1.31
Ornithine | −3.54 | −2.89 | −1.67
Alanine | −2.75 | −2.83 | −0.88
Lysine | −3.19 | −2.82 | −0.92
Levodopa | −1.90 | −2.74 | −0.03
Histidine | −3.27 | −2.52 | −1.40
Cysteine | −2.75 | −2.49 | −0.78
Valine | −2.08 | −2.10 | −0.21
Methionine | −2.10 | −1.87 | −0.23
Tyrosine | −1.51 | −1.80 | 0.36
Isoleucine | −1.73 | −1.69 | 0.14
Leucine | −1.73 | −1.57 | 0.14
Phenylalanine | −1.12 | −1.43 | 0.75
Tryptophane | −1.34 | −1.04 | 0.53
2-Amino-5-phenylvaleric acid | −0.42 | −0.36 | 1.45
N-Phenylglycine | −0.66 | 0.62 | 1.02
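The statement that experiment confirms the zwitterionic form everywhere except for N-phenylglycine can be checked directly against the Table 9 values. The following is a minimal sketch under the assumption that "confirmed" means "the calculated value of that form lies closer to the experimental logP"; the helper names `closer_form`, `dominant` and `median_shift` are illustrative and not part of the published algorithm.

```python
from statistics import median

# Rows transcribed from Table 9:
# (compound, zwitterionic logP calc, experimental logP, non-ionic logP calc)
TABLE_9 = [
    ("Aspartic acid", -2.93, -3.70, -1.06),
    ("Threonine", -3.48, -3.50, -1.61),
    ("Glycine", -3.22, -3.00, -1.31),
    ("Ornithine", -3.54, -2.89, -1.67),
    ("Alanine", -2.75, -2.83, -0.88),
    ("Lysine", -3.19, -2.82, -0.92),
    ("Levodopa", -1.90, -2.74, -0.03),
    ("Histidine", -3.27, -2.52, -1.40),
    ("Cysteine", -2.75, -2.49, -0.78),
    ("Valine", -2.08, -2.10, -0.21),
    ("Methionine", -2.10, -1.87, -0.23),
    ("Tyrosine", -1.51, -1.80, 0.36),
    ("Isoleucine", -1.73, -1.69, 0.14),
    ("Leucine", -1.73, -1.57, 0.14),
    ("Phenylalanine", -1.12, -1.43, 0.75),
    ("Tryptophane", -1.34, -1.04, 0.53),
    ("2-Amino-5-phenylvaleric acid", -0.42, -0.36, 1.45),
    ("N-Phenylglycine", -0.66, 0.62, 1.02),
]


def closer_form(zwitterionic, experimental, non_ionic):
    """Return the calculated form whose logP lies closer to experiment."""
    if abs(zwitterionic - experimental) <= abs(non_ionic - experimental):
        return "zwitterionic"
    return "non-ionic"


# Assumed criterion: the form closer to experiment is taken as dominant.
dominant = {name: closer_form(zw, exp, non) for name, zw, exp, non in TABLE_9}
non_ionic_cases = [name for name, form in dominant.items() if form == "non-ionic"]

# Shift between the two calculated forms, excluding N-phenylglycine.
shifts = [zw - non for name, zw, _, non in TABLE_9 if name != "N-Phenylglycine"]
median_shift = median(shifts)

print(non_ionic_cases, round(median_shift, 2))
```

With these inputs only N-phenylglycine is assigned to the non-ionic form, and the typical shift between the two calculated forms is the ca. −1.87 units quoted in the text.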
Table 10. LogP of Ketones and Lactams.
Compound | Keto form, LogP calc | Experiment, LogP exp | Enol form, LogP calc | Conformance a
Acetone | 0.6 | −0.24 | 1.20 | (+)
Cyclohexanone | 1.43 | 0.81 | 1.82 | (+)
Phenol | 0.99 | 1.46 | 1.76 | +
2-Pyridone | 0.02 | −0.58 | 1.09 | (+)
Carbostyril | 1.49 | 1.26 | 2.51 | +
Uracil | −0.77 | −1.07 | −1.25 b | +
Acetylacetone | 0.34 | 0.4 | 1.23 | +
a Conformance with experimental data; b 4-hydroxy form.
Figure 7. Correlation of logP with logS (N = 839, R2 = 0.7817).

3.5. Aqueous Solubility

Solubility in water has been one of the most important properties of organic compounds ever since the first raindrops filled the oceans of this planet; otherwise the astrobiologists’ dictum “where there is water, there is life” would be utterly senseless. Today its importance is evident not only with respect to environmental considerations, e.g., in synthetic processes, but also in view of the biological activity of drugs, where it plays a key role. This has already been expressed indirectly in the descriptor logPO/W. That descriptor defines the relative solubility of a solute between octanol and water, where saturation is not required, whereas the aqueous solubility in mol/L, expressed as logS, i.e., the logarithm of the solubility, is defined as the amount of solute in a saturated aqueous solution. Nevertheless, as Banerjee et al. [80] showed on a selected set of 27 examples, there is a direct inverse correlation between logP and logS with a correlation coefficient of 0.94, resulting in the linear regression equation logP = 5.2 − 0.68 × logS. This compares with a calculation in the present work, in which the two descriptors were correlated for 839 compounds, yielding a correlation coefficient of 0.78 and the regression equation logP = 0.32 − 0.80 × logS (Figure 7). Solubility data were extracted from a database provided by Hou et al. [81] and Wang et al. [82] on the ADME website [83]. As in the atom group calculations for logP, net-charged compounds and strong acids are excluded from the logS calculations. In contrast to Hou’s and Wang’s approach, compounds that normally exist as zwitterions, such as amino acids, are entered in their zwitterionic form in these calculations. Table 11 collects the group contributions resulting from a set of 1487 molecules of great structural variety.
Table 11. Atom group Contributions for LogS Calculations.
Nr | Atom Type | Neighbours | Contribution | Occurrences | Molecules
1Const 0.4414921492
2C sp3H3C−0.311571806
3C sp3H3N−0.87173113
4C sp3H3N(+)−0.0322
5C sp3H3O−0.32157110
6C sp3H3S−0.151210
7C sp3H2C2−0.322091604
8C sp3H2CN−0.85278144
9C sp3H2CN(+)−0.6865
10C sp3H2CO−0.29328248
11C sp3H2CS−0.104330
12C sp3H2CP−5.1511
13C sp3H2CF−0.8311
14C sp3H2CCl−0.624134
15C sp3H2CBr−1.291816
16C sp3H2CJ−1.7555
17C sp3H2N2−1.5822
18C sp3H2NO−0.9999
19C sp3H2NS−1.1122
20C sp3H2O2−0.6766
21C sp3H2S2−0.4755
22C sp3H2SCl−1.0611
23C sp3HC3−0.26531270
24C sp3HC2N−0.817260
25C sp3HC2N(+)−0.712322
26C sp3HC2O−0.38321174
27C sp3HC2S−0.4686
28C sp3HC2F−1.8511
29C sp3HC2Cl−0.892816
30C sp3HC2Br−1.0244
31C sp3HC2J−1.9011
32C sp3HCO2−0.692817
33C sp3HCOBr−4.6511
34C sp3HCCl2−1.241312
35C sp3HCClBr−1.0511
36C sp3HOF2−0.3611
37C sp3C4−0.17234162
38C sp3C3N−0.641616
39C sp3C3O−0.159182
40C sp3C3S0.1822
41C sp3C3F−0.521010
42C sp3C3Cl−0.423412
43C sp3C3Br−0.6911
44C sp3C2O2−1.3588
45C sp3C2Cl2−2.251110
46C sp3CF3−1.092424
47C sp3CF2Cl−1.7832
48C sp3CFCl2−1.7011
49C sp3CCl3−2.121211
50C sp3CCl2Br0.0011
51C sp2H2=C−0.537463
52C sp2HC=C−0.27338204
53C sp2HC=N−2.1799
54C sp2HC=O0.242222
55C sp2H=CN−0.542321
56C sp2H=CO−0.1387
57C sp2H=CS−0.3296
58C sp2H=CCl−1.0065
59C sp2H=CBr−0.8821
60C sp2H=CJ−1.8321
61C sp2HN=N−1.951912
62C sp2HN=O−0.1922
63C sp2H=NO−0.6511
64C sp2HO=O−0.2277
65C sp2H=NS−0.2511
66C sp2C2=C−0.26153128
67C sp2C2=N−0.841110
68C sp2C2=O0.01188132
69C sp2C=CN−1.052522
70C sp2C=CO−0.354432
71C sp2C=CS−0.1455
72C sp2C=CF−0.6222
73C sp2C=CCl−0.914525
74C sp2C=CBr−0.4533
75C sp2CN=N−1.6798
76C sp2CN=O−0.33261201
77C sp2C=NO−1.8755
78C sp2=CNO(+)−1.6822
79C sp2C=NS−0.3422
80C sp2CO=O−0.06306266
81C sp2CO=O(−)0.502323
82C sp2C=OS2.1711
83C sp2=CCl2−1.661411
84C sp2=CBr2−3.0411
85C sp2N2=O−1.469895
86C sp2N2=S−1.931010
87C sp2NO=O−0.554845
88C sp2N=OS−0.8377
89C sp2=NS2−1.0911
90C aromaticH:C2−0.304203812
91C aromaticH:C:N0.519160
92C aromaticH:N20.3777
93C aromatic:C3−0.3628187
94C aromaticC:C2−0.39927556
95C aromaticC:C:N0.652723
96C aromatic:C2N−0.74270216
97C aromatic:C2N(+)−0.726850
98C aromatic:C2:N−0.312922
99C aromatic:C2O−0.25376252
100C aromatic:C2S−0.234226
101C aromatic:C2F−0.613619
102C aromatic:C2Cl−1.10570215
103C aromatic:C2Br−1.533824
104C aromatic:C2J−1.472116
105C aromatic:CN:N−0.913424
106C aromaticC:N20.1022
107C aromatic:C:NO0.131212
108C aromatic:C:NCl−0.8755
109C aromaticN:N2−0.942415
110C aromatic:N2Cl−0.5477
111C spH#C−0.211716
112C spC#C−0.551917
113C spC#N−0.192624
114C sp=N=S−2.9911
115N sp3H2C0.93129
116N sp3H2C(pi)0.6011199
117N sp3H2N0.6144
118N sp3HC22.252017
119N sp3HC2(pi)1.297566
120N sp3HC2(2pi)0.74211158
121N sp3HCN0.7622
122N sp3HCN(pi)0.3476
123N sp3HCN(2pi)−0.4133
124N sp3C33.156457
125N sp3C3(pi)2.206660
126N sp3C3(2pi)1.628075
127N sp3C3(3pi)1.3077
128N sp3C2N1.4411
129N sp3C2N(pi)2.8044
130N sp3C2N(2pi)1.301713
131N sp3C2N(3pi)0.7266
132N sp2C=C1.493532
133N sp2C=N−0.2232
134N sp2=CN1.831513
135N sp2=CO1.5277
136N sp2=CS−0.3721
137N sp2N=N2.0811
138N sp2N=O−0.5444
139N(+) sp3H3C0.502121
140N(+) sp3H2C20.2911
141N(+) sp3HC31.9711
142N(+) sp2CO=O(−)−0.157557
143N(+) sp2O2=O(−)−0.5452
144N aromatic:C2−0.5813889
145N aromatic:C:N0.1121
146OHC0.60377217
147OHC(pi)0.34306240
148OHN(pi)1.0011
149OC20.6910663
150OC2(pi)0.24320249
151OC2(2pi)−0.257672
152OCN(+)(pi)−0.2152
153OCN(2pi)−0.3066
154OCP−0.077836
155OCP(pi)−1.232520
156P4CO2=S5.4411
157P4O3=O2.7977
158P4O3=S0.451615
159P4O2=OS0.6722
160P4O2S=S−1.431413
161S2HC−0.5433
162S2HC(pi)−0.8422
163S2C2−0.531414
164S2C2(pi)−1.031212
165S2C2(2pi)−1.022525
166S2CP0.211615
167S2CS−0.8453
168S2N2(2pi)0.0011
169S4C2=O0.9133
170S4C2=O20.0966
171S4C=OS1.3811
172HH Acceptor−0.488568
173AlkaneNo of C atoms−0.3328239
174Unsaturated HCNo of C atoms−0.101350121
175X(CH2)nNo of CH2 groups−0.121220426
ABased on 0.00 1492
BGoodness of fitR20.9051 1441
CDeviationAverage0.52 1441
DDeviationStandard0.67 1441
EK-fold cvK10.00 1419
FGoodness of fitQ20.8838 1419
GDeviationAverage (cv)0.57 1419
HDeviationStandard (cv)0.74 1419
Hou’s group-additivity method [81], which is based on a 2D molecular topology, included, besides the atom groups in a SMARTS representation, the square of the molecular weight and a term called “hydrophobic carbon” to achieve a better correlation. They achieved a correlation coefficient R of 0.96 (R2 = 0.92) and a standard deviation of 0.61, based on 1290 compounds. Wang’s team [82], on the other hand, based their group-additivity approach on the solvent-accessible surface area (SASA) of each atom type and added the calculated logP value and the square of the molecular weight. Their best results showed a correlation coefficient R2 of 0.886 and a root-mean-square error of 0.705, using 1708 molecules.
The present list of groups includes two groups that can be viewed as replacements for Hou’s “hydrophobic carbon”: the terms “Alkane” and “Unsaturated HC” (nos. 173 and 174). These two groups apply only to pure hydrocarbons. The last term, “X(CH2)n” (no. 175), accounts for the hydrophobicity of alkyl chains. Group 172, on the other hand, accounts for the hydrophobic effect of intramolecular H-bridges. While Hou’s correlation (correlation coefficient R = 0.96, predictive Q = 0.94, mean error 0.57 units) is better than the present one, Wang’s approach, with a best leave-one-out Q2 of 0.886 and a root-mean-square error of 0.705, is in the same range (compare with lines B, F and H in Table 11). Five outliers listed in Table 12 have been omitted from the calculations because their deviations far exceed the expected error range. Figure 8 and Figure 9 illustrate the distribution of the 1441 compounds’ experimental vs. calculated and 10-fold cross-validated logS data around the linear regression line, which exhibits a slope of 0.92 and an intercept of −0.14. The complete list of compounds and logS results is accessible in the supplementary material under “Experimental vs Calculated LogS Data Table.doc” and “Compounds List for LogS Calculations.sdf”.
Table 12. Molecules with extreme LogS Deviations.
Compound Name | logS Exp | logS Calc | Deviation
1-Hexadecanol | −7.26 | −4.04 | −3.22
1-Octadecanol | −8.40 | −4.68 | −3.72
Bromadiolone | −4.45 | −9.33 | 4.88
Eicosane | −8.17 | −12.54 | 4.37
Hexacosane | −8.33 | −16.44 | 8.11
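Because the method is a plain summation of group contributions, the calculated logS values of four of the Table 12 outliers can be reproduced by hand from Table 11. The sketch below does this for the n-alkanes and 1-alkanols; the function names are illustrative, and the decision to leave out the “X(CH2)n” term (no. 175) for the alkanols is an assumption inferred from the fact that the tabulated values are reproduced without it, not a rule stated in the text.

```python
# Contributions transcribed from Table 11 (group number in the comment).
CONST = 0.44          # no. 1, constant term
C_SP3_H3C = -0.31     # no. 2, methyl carbon bound to carbon
C_SP3_H2C2 = -0.32    # no. 7, methylene between two carbons
C_SP3_H2CO = -0.29    # no. 10, methylene bound to carbon and oxygen
O_HC = 0.60           # no. 146, hydroxyl oxygen on a saturated carbon
ALKANE_PER_C = -0.33  # no. 173, counted once per carbon atom of a pure alkane


def logs_n_alkane(n_carbons):
    """Calculated logS of an unbranched alkane with n_carbons >= 2."""
    return (CONST
            + 2 * C_SP3_H3C
            + (n_carbons - 2) * C_SP3_H2C2
            + n_carbons * ALKANE_PER_C)


def logs_1_alkanol(n_carbons):
    """Calculated logS of an unbranched 1-alkanol with n_carbons >= 2.

    Assumption: group no. 175, X(CH2)n, is not applied here.
    """
    return (CONST
            + C_SP3_H3C
            + (n_carbons - 2) * C_SP3_H2C2
            + C_SP3_H2CO
            + O_HC)


for name, calc, exp in [
    ("Eicosane", logs_n_alkane(20), -8.17),
    ("Hexacosane", logs_n_alkane(26), -8.33),
    ("1-Hexadecanol", logs_1_alkanol(16), -7.26),
    ("1-Octadecanol", logs_1_alkanol(18), -8.40),
]:
    # Deviation is taken as experimental minus calculated, as in Table 12.
    print(name, round(calc, 2), round(exp - calc, 2))
```

The sums give −12.54, −16.44, −4.04 and −4.68, matching the “logS Calc” column of Table 12. The arithmetic also shows why long chains drift so far: every additional CH2 lowers the calculated logS of an alkane by 0.65 units.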
Figure 8. Correlation diagram of logS data (10-fold cross-validated: N = 1419, Q2 = 0.8838, slope = 0.92).

3.6. Refractivity

In their very instructive paper, Ghose and Crippen [8] explained in a detailed rationale the physical background of the molar refractivity, relating it to the volume of the molecule and of its constituent atoms and assigning the contributions of the atom groups to the atom volumes. As a consequence, this assignment did not allow the use of the simple least-squares method, because that method cannot guarantee positive-only contribution values. The present paper, however, is interested only in the final result, i.e., the molar refractivity value as such, and is thus not bound by the constraints of the physical arguments, analogous to the total neglect of the chemical background in the calculations of the thermodynamic data. It is therefore free to tentatively apply the same algorithm as used for the calculation of the other descriptors. It follows that the resulting atom group contributions cannot be assigned any physical meaning.
Figure 9. Histogram of logS data (S = 0.74).
The experimental data for the present study were extracted from publications of Ghose and Crippen [8], complemented by V. N. Visvanadhan et al. [54]. Further molar refractivity (MR) values were calculated from the refractive indices (nD) and densities (d) provided by the CRC Handbook of Chemistry and Physics [84], using the equation $MR = \frac{n_D^2 - 1}{n_D^2 + 2} \times \frac{M}{d}$, where M is the molecular weight. The refractivity calculation is limited to net-uncharged molecules that contain no elements other than H, B, C, N, O, S, P, Si and halogens and that are not strong acids. A complete list of the compounds used in the refractivity calculations can be viewed in the supplementary material in “Compounds List for Refractivity Calculations.sdf”, their results in “Experimental vs Calculated Refractivity Data Table.doc”.
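The conversion from refractive index and density to molar refractivity described above can be sketched in a few lines. The function name `molar_refractivity` is illustrative, and the methanol input values (refractive index 1.3288, density 0.7914 g/cm3, molecular weight 32.04 g/mol) are commonly quoted handbook-style figures assumed here for the example; they are not taken from the article.

```python
def molar_refractivity(n_d, density, molar_mass):
    """Molar refractivity from MR = (nD^2 - 1)/(nD^2 + 2) * (M/d).

    n_d        refractive index at the sodium D line (dimensionless)
    density    density d in g/cm^3
    molar_mass molecular weight M in g/mol
    Returns MR in cm^3/mol.
    """
    n_squared = n_d ** 2
    return (n_squared - 1.0) / (n_squared + 2.0) * (molar_mass / density)


# Assumed input values for methanol (not from the article).
mr_methanol = molar_refractivity(n_d=1.3288, density=0.7914, molar_mass=32.04)
print(round(mr_methanol, 2))
```

With these assumed inputs the result is close to 8.23, the experimental methanol value quoted in this section.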
The range of experimental refractivity values lies between 8.23 (methanol, calc. 8.09) and 242.2 (tripalmitin, calc. 243.12). The goodness of fit of the calculated values with experiment is excellent for both the training set and the 10-fold cross-validated data, as shown in Table 13 on lines B and F. Accordingly, the calculated refractivity values of 3388 out of 4122 compounds (82.2%) differ from the experimental data by the cross-validated standard deviation or less. These results compare very well with those presented by Ghose and Crippen [8], which, based on 504 compounds, yielded a correlation coefficient R2 of 0.994 and a standard deviation of 1.269.
Table 13. Atom group Contributions for Refractivity Calculations.
Nr | Atom Type | Neighbours | Contribution | Occurrences | Molecules
1BHO228.1011
2BC343.0544
3BO352.6466
4C sp3H3C5.6856552801
5C sp3H3N12.60200122
6C sp3H3N(+)15.4232
7C sp3H3O13.12418305
8C sp3H3S14.133329
9C sp3H3P12.0965
10C sp3H3Si10.0340088
11C sp3H2BC−8.53124
12C sp3H2C24.6291012185
13C sp3H2CN11.48601317
14C sp3H2CN(+)14.311917
15C sp3H2CO12.081514999
16C sp3H2CS12.86167116
17C sp3H2CP11.2095
18C sp3H2CF5.641918
19C sp3H2CCl10.49203173
20C sp3H2CBr13.49123109
21C sp3H2CJ18.673631
22C sp3H2CSi8.927141
23C sp3H2N218.4422
24C sp3H2NO20.3411
25C sp3H2NS19.7611
26C sp3H2O219.351919
27C sp3H2OCl17.9087
28C sp3H2OBr20.9722
29C sp3H2S220.2622
30C sp3H2SCl18.8522
31C sp3H2SiCl14.7565
32C sp3H2SiBr17.8543
33C sp3H2Si212.2522
34C sp3HC33.53993706
35C sp3HC2N10.448566
36C sp3HC2N(+)13.2366
37C sp3HC2O11.00387326
38C sp3HC2P10.0221
39C sp3HC2S12.082319
40C sp3HC2F4.4311
41C sp3HC2Cl9.415653
42C sp3HC2Br12.456053
43C sp3HC2J17.8677
44C sp3HCN2(+)23.2611
45C sp3HCNCl(+)18.8822
46C sp3HCO218.284337
47C sp3HCOCl17.16108
48C sp3HCOBr21.6211
49C sp3HCS220.1911
50C sp3HCF25.6777
51C sp3HCFCl10.6176
52C sp3HCFBr13.4511
53C sp3HCCl215.352726
54C sp3HCClBr19.0054
55C sp3HCBr221.021312
56C sp3HCJ231.5211
57C sp3HNO224.8122
58C sp3HO325.8244
59C sp3HOF213.7511
60C sp3HOCl223.4511
61C sp3HS328.8111
62C sp3HSiCl219.6854
63C sp3C42.52249215
64C sp3C3N9.312016
65C sp3C3N(+)11.8422
66C sp3C3O10.0210194
67C sp3C3S11.3364
68C sp3C3F3.3322
69C sp3C3Cl8.4466
70C sp3C3Br11.4166
71C sp3C3J17.0622
72C sp3C3Si7.5511
73C sp3C2NCl(+)18.5811
74C sp3C2O217.3366
75C sp3C2OCl16.1811
76C sp3C2F25.077927
77C sp3C2FCl9.1922
78C sp3C2Cl214.191714
79C sp3C2ClBr17.3411
80C sp3C2Br220.2055
81C sp3C2J230.5911
82C sp3CNF211.3462
83C sp3CNF2(+)15.0421
84C sp3CO324.6722
85C sp3CO2Si19.8211
86C sp3COF211.8322
87C sp3CF36.097761
88C sp3CF2Cl10.86107
89C sp3CF2Br13.4154
90C sp3CFCl215.4897
91C sp3CCl320.403331
92C sp3CCl2Br25.7511
93C sp3CBr329.6043
94C sp3O431.5833
95C sp3OCl327.5611
96C sp3SCl334.8611
97C sp2H2=C5.46470408
98C sp2HC=C4.641233735
99C sp2HC=N9.931514
100C sp2HC=N(+)14.9311
101C sp2HC=O6.34113110
102C sp2H=CN11.202820
103C sp2H=CN(+)13.7822
104C sp2H=CO2.277869
105C sp2H=CP10.0111
106C sp2H=CS12.263227
107C sp2H=CF5.1811
108C sp2H=CCl10.192219
109C sp2H=CBr13.12119
110C sp2H=CJ18.2011
111C sp2H=CSi8.811712
112C sp2HN=N16.2387
113C sp2HN=O12.891111
114C sp2H=NO6.5533
115C sp2HO=O4.042322
116C sp2H=NS16.4311
117C sp2C2=C3.52385292
118C sp2C2=N8.852017
119C sp2C2=O5.08330310
120C sp2C2=S11.7211
121C sp2C=CN10.901614
122C sp2C=CN(+)13.3411
123C sp2C=CO1.655651
124C sp2C=CS11.281413
125C sp2C=CF4.4996
126C sp2C=CCl9.424331
127C sp2C=CBr12.051414
128C sp2C=CJ18.2011
129C sp2CN=N16.5311
130C sp2CN=O11.805148
131C sp2C=NO6.7277
132C sp2CO=O2.82919734
133C sp2C=NS15.7633
134C sp2C=OP11.5911
135C sp2C=OS12.5844
136C sp2C=OF5.6111
137C sp2C=OCl11.367364
138C sp2C=OBr14.0133
139C sp2C=OJ20.4711
140C sp2=CNO(+)12.7211
141C sp2=CO2-1.0622
142C sp2=COS8.8711
143C sp2=COCl6.5311
144C sp2=COBr9.2311
145C sp2=COJ14.3911
146C sp2=CSCl17.0064
147C sp2=CSBr19.8743
148C sp2=CSJ24.4211
149C sp2=CF26.7953
150C sp2=CFCl10.3043
151C sp2=CCl215.251311
152C sp2=CBr220.6222
153C sp2N2=O17.8144
154C sp2N2=S24.8422
155C sp2NO=O10.221414
156C sp2NO=S18.8811
157C sp2N=OS20.9022
158C sp2N=OCl17.7611
159C sp2=NOCl10.4211
160C sp2=NS225.9822
161C sp2=NSCl21.3111
162C sp2=NSBr25.2911
163C sp2O2=O0.691312
164C sp2O=OS-13.1111
165C sp2O=OCl8.891312
166C sp2OS=S18.9711
167C sp2S2=S31.9811
168C sp2=OSCl19.3411
169C aromaticH:C24.4555761171
170C aromaticH:C:N6.2814192
171C aromaticH:N28.1911
172C aromatic:C34.4315377
173C aromaticC:C23.551231850
174C aromaticC:C:N5.525244
175C aromaticC:C:N(+)6.4821
176C aromatic:C2N11.36164149
177C aromatic:C2N(+)13.985750
178C aromatic:C2:N6.261514
179C aromatic:C2O1.71341264
180C aromatic:C2S11.963936
181C aromatic:C2F4.4013069
182C aromatic:C2Cl9.1111992
183C aromatic:C2Br11.945953
184C aromatic:C2J17.011918
185C aromatic:C2P10.22107
186C aromatic:C2Si7.664528
187C aromatic:CN:N14.0911
188C aromaticC:N27.2011
189C aromatic:C:NO4.2633
190C aromatic:C:NF5.4511
191C aromatic:C:NCl11.1833
192C aromatic:C:NBr13.8811
193C aromatic:C:NJ20.1411
194C aromaticN:N216.6652
195C aromatic:N2Cl11.8811
196C spH#C4.257367
197C spC#C4.09164111
198C spC#N5.53121104
199C sp#CO1.8255
200C sp#CSi7.3621
201C sp#CCl9.6832
202C sp#CBr12.1522
203C sp#CJ17.2311
204C spN#N11.9422
205C sp#NP−4.4811
206C sp#NS12.6944
207C sp=C24.991010
208C sp=C=O5.8032
209C sp=N215.5911
210C sp=N=O10.161613
211C sp#NO4.5511
212C sp=N=S18.381212
213N sp3H2C−2.38127113
214N sp3H2C(pi)−2.887771
215N sp3H2N4.0588
216N sp3HC2−10.348280
217N sp3HC2(pi)−10.414342
218N sp3HC2(2pi)−10.981313
219N sp3HCN−3.22106
220N sp3HCN(pi)−4.0344
221N sp3HCN(+)(pi)4.1422
222N sp3HCN(2pi)−3.9233
223N sp3HCO−0.7811
224N sp3HSi2−0.1842
225N sp3C3−17.69115101
226N sp3C3(pi)−18.046057
227N sp3C3(2pi)−18.331717
228N sp3C3(3pi)−20.1033
229N sp3C2N−11.1744
230N sp3C2N(pi)−10.9988
231N sp3C2N(2pi)−12.2466
232N sp3C2N(3pi)−13.1611
233N sp3C2N(+)(pi)−3.7022
234N sp3C2N(+)(2pi)−3.9522
235N sp3C2O−8.3311
236N sp3C2P−7.63104
237N sp3C2Si−11.1722
238N sp3CCl2(pi)9.0411
239N sp2H=C−1.8311
240N sp2C=C−9.296056
241N sp2C=N−2.00137
242N sp2C=N(+)0.5666
243N sp2=CN−2.44119
244N sp2=CO−0.501716
245N sp2=CP−7.6711
246N sp2=CS2.8732
247N sp2N=N0.2211
248N sp2N=O5.3966
249N sp2O=O−0.321111
250N(+) sp3HC3−21.2411
251N(+) sp2C=NO(−)−2.1622
252N(+) sp2CO=O(−)−2.949079
253N(+) sp2NO=O(−)0.0066
254N(+) sp2O2=O(−)0.751411
255N aromatic:C2−1.62114101
256N aromatic:C:N0.3563
257N(+) aromatic:C2O(−)0.0011
258N(+) spC#C(−)−3.7433
259N(+) sp=C=N(−)−2.8211
260N(+) sp=N2(−)1.0844
261OHC−5.03516451
262OHC(pi)4.48220210
263OHN0.0022
264OHN(pi)0.721010
265OHO2.6455
266OHS7.5033
267OHP5.6965
268OHSi1.0822
269OBC−22.37186
270OBC(pi)−10.6121
271OC2−13.20392268
272OC2(pi)−3.861009801
273OC2(2pi)5.33104103
274OCN(pi)0.001111
275OCN(+)(pi)0.911411
276OCN(2pi)0.2755
277OCO−5.351510
278OCO(pi)4.6622
279OCP−2.4413457
280OCP(pi)6.373922
281OCS−1.883523
282OCSi−7.238331
283OCSi(pi)1.87178
284OCCl0.5811
285ON2(2pi)−4.2911
286OP27.77106
287OSi2−1.2511429
288P3H2C4.1011
289P3HC20.0011
290P3C3−10.1033
291P3C2Cl−0.6011
292P3CCl213.6733
293P3O3−3.0099
294P3O2Cl5.1911
295P3OCl215.8911
296P4HO2=O1.0955
297P4C2O=O−8.9211
298P4CO2=O−6.4988
299P4CO2=S2.1111
300P4C=OCl220.0411
301P4CNO=O10.0711
302P4N3=O−5.2711
303P4N2O=O−2.9421
304P4N2=OF0.3111
305P4NO2=O4.8911
306P4O3=O−3.702619
307P4O3=O(-)−3.2411
308P4O3=S3.881210
309P4O2=OS−3.4733
310P4O2=OF0.8711
311P4O2=OCl5.3722
312P4O2S=S4.2822
313P4O2=SCl13.2611
314P4O=OCl215.3211
315S2HC0.445646
316S2HC(pi)0.291110
317S2C2−8.615349
318S2C2(pi)−8.172927
319S2C2(2pi)−8.933434
320S2CP3.3655
321S2CS−0.28179
322S2CS(pi)−14.1921
323S2CCl0.0011
324S2N2(2pi)−5.1111
325S2S29.0711
326S4C2=O−7.9433
327S4C2=O2−7.6477
328S4CO=O2−4.381010
329S4C=OCl−8.7611
330S4C=OS1.6011
331S4C=O2F0.8311
332S4C=O2Cl6.9477
333S4N=O2Cl10.2111
334S4O=OCl11.2911
335S4O2=O0.3188
336S4O2=O20.0444
337S4O=O2Cl10.8822
338S4O=O2F5.3511
339SiH3C7.4043
340SiH2C21.3644
341SiH2CCl11.6011
342SiHC3−4.5555
343SiHC2O0.6521
344SiHC2Cl5.6722
345SiHCO25.53196
346SiC4−9.881816
347SiC3N−4.2443
348SiC3O−5.064526
349SiC3F−5.7211
350SiC3Cl−0.461111
351SiC3Br2.6111
352SiC3Si−4.3921
353SiC2N21.4131
354SiC2O2−0.168524
355SiC2SiCl5.0721
356SiC2F2−1.3122
357SiC2Cl29.6099
358SiCO35.011717
359SiCF33.1811
360SiCCl319.471615
361SiCBr328.5211
362SiO410.0055
363SiO3Cl15.1611
364SiOCl325.2811
ABased on 4300
BGoodness of fitR20.9989 4122
CDeviationAverage0.44 4122
DDeviationStandard0.66 4122
EK-fold cvK10.00 4039
FGoodness of fitQ20.9988 4039
GDeviationAverage (cv)0.46 4039
HDeviationStandard (cv)0.70 4039
In view of the large number of experimental data used for the calculation of the atom group contributions, their excellent correlation coefficients R2 and Q2, and the solid physical foundation of the refractivity value itself on the molecular volume [8], it is safe to say that experimental refractivity values deviating from the calculated data by more than 4 times the cross-validated standard deviation (i.e., >2.8 units) are most probably based on incorrectly measured values of the refractive index, the density or both, or on typing errors in the source text. Such deviations, which were also observed and discussed in detail in Ghose and Crippen’s paper [8], can no longer be ascribed to a temperature dependence of the measurements, and the values concerned would therefore require re-examination. The excellent agreement between the experimental and calculated refractivity data of more than 4000 compounds, on the other hand, as visualized in Figure 10 and Figure 11, demonstrates that the present atomic-groups contribution method and the underlying algorithm are appropriate for refractivity calculations, as long as one abstains from the attempt to interpret the group contribution values themselves. These results also show that this group-additivity method is a very reliable tool for the indirect determination of the density of a compound from a simple measurement of its refractive index.
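Both points of this paragraph, the outlier criterion and the indirect density determination, follow from simple arithmetic. The sketch below sums the two Table 13 groups of methanol, rearranges MR = (nD2 − 1)/(nD2 + 2) × (M/d) for the density, and flags experimental values that deviate by more than four cross-validated standard deviations. The function names and the methanol refractive index (1.3288) and molecular weight (32.04 g/mol) are assumptions made for illustration.

```python
# Group contributions transcribed from Table 13.
C_SP3_H3O = 13.12  # no. 7, methyl carbon bound to oxygen
O_HC = -5.03       # no. 261, hydroxyl oxygen on a saturated carbon
S_CV = 0.70        # line H, cross-validated standard deviation


def density_from_refractive_index(n_d, molar_mass, mr_calculated):
    """Solve MR = (nD^2 - 1)/(nD^2 + 2) * (M/d) for the density d (g/cm^3)."""
    n_squared = n_d ** 2
    return (n_squared - 1.0) / (n_squared + 2.0) * molar_mass / mr_calculated


def is_suspect(mr_experimental, mr_calculated, s_cv=S_CV, factor=4.0):
    """True if the deviation exceeds factor * s_cv (2.8 units by default)."""
    return abs(mr_experimental - mr_calculated) > factor * s_cv


# Methanol consists of exactly these two atom groups.
mr_calc_methanol = C_SP3_H3O + O_HC

# Assumed inputs: refractive index 1.3288, molecular weight 32.04 g/mol.
density_estimate = density_from_refractive_index(1.3288, 32.04, mr_calc_methanol)

print(round(mr_calc_methanol, 2), round(density_estimate, 3))
print(is_suspect(8.23, mr_calc_methanol))
```

The group sum reproduces the calculated methanol value of 8.09 quoted earlier, and the density estimate of roughly 0.81 g/cm3 lies within about 2% of the commonly quoted density of methanol. The relative error of the density estimate equals the relative error of the calculated refractivity, so it is largest for small molecules such as this one.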
Figure 10. Correlation diagram of refractivity data (10-fold cross-validated: N = 4039, Q2 = 0.9988, slope = 1.0).
Figure 11. Histogram of refractivity data (S = 0.70).

3.7. Polarizability

Miller and Savchik [9] were the first to apply an atomic-groups contribution method to the calculation of the molecular polarizability. Their method, however, is based only on the atoms and their degree of hybridisation and neglects the nature of the neighbouring atoms. It requires that the sum of the contributions of the atomic hybrid components be squared and then multiplied by 4/N, where N is the total number of electrons, to obtain the molecular polarizability. Although this method is based on only 20 atom group parameters, the deviations between the experimental and calculated molecular polarizabilities are in line with the experimental variances [10].
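The squared-sum rule described above can be written as a one-line function. The sketch below is only an illustration of that rule: the function name is hypothetical, and the two hybrid-component values for a tetrahedral carbon and for hydrogen are assumed example numbers, not parameters quoted in this article; the published parameter set in [9,10] should be consulted before any real use.

```python
def miller_polarizability(hybrid_component_sum, n_electrons):
    """Molecular polarizability as (4/N) * (sum of hybrid components)^2."""
    return 4.0 / n_electrons * hybrid_component_sum ** 2


# Assumed, illustrative hybrid-component values (not taken from this article).
TAU = {
    "C_tetrahedral": 1.294,
    "H": 0.314,
}

# Methane: one tetrahedral carbon, four hydrogens, 6 + 4 * 1 = 10 electrons.
tau_sum_methane = TAU["C_tetrahedral"] + 4 * TAU["H"]
alpha_methane = miller_polarizability(tau_sum_methane, n_electrons=10)
print(round(alpha_methane, 3))
```

Unlike the simple summation used in the present paper, this rule is non-linear in the atom contributions, which is why the two parameter sets cannot be compared value by value.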
In contrast to Miller’s approach the present atom groups include—besides the atomic degree of hybridisation—the central atom’s immediate neighbourhood atoms, which on the one hand has the disadvantage of requiring a larger number of atom groups to enable the calculation of a large number of compounds, but on the other hand is easily extendable to new atom groups if required. As will be shown, the results and standard deviation are comparable to Miller’s work [10].
The experimental data for the evaluation of the group contributions, listed in Table 14, were extracted from the Handbook of Chemistry and Physics [85] and Miller’s publication [10], enabling a direct comparison of the results. A table of these results can be accessed in the supplementary material under “Experimental vs Calculated Polarizability Data Table.doc”, the corresponding list of compounds in an SD file called “Compounds List for Polarizability Calculations.sdf”.
Table 14. Atom group Contributions for Polarizability Calculations.
Nr | Atom Type | Neighbours | Contribution | Occurrences | Molecules
1Const 0.62406406
2C sp3H3C1.92351219
3C sp3H3N4.671612
4C sp3H3O3.503223
5C sp3H3S3.4263
6C sp3H2C21.80410123
7C sp3H2CN4.522516
8C sp3H2CN(+)4.5622
9C sp3H2CO3.357851
10C sp3H2CS3.1432
11C sp3H2CF2.191111
12C sp3H2CCl3.901917
13C sp3H2CBr4.861514
14C sp3H2CJ7.2633
15C sp3H2O25.4911
16C sp3H2OCl5.6911
17C sp3HC31.801613
18C sp3HC2N4.4311
19C sp3HC2O3.1354
20C sp3HC2Cl3.721212
21C sp3HC2Br5.1311
22C sp3HCNCl(+)7.5622
23C sp3HCO26.3753
24C sp3HCF22.0711
25C sp3HCCl25.7144
26C sp3C41.471310
27C sp3C3N(+)4.2611
28C sp3C3Cl6.1111
29C sp3CF32.6543
30C sp3CF2Cl4.0254
31C sp3CCl37.9033
32C sp3O49.2111
33C sp2H2=C1.963931
34C sp2HC=C1.957040
35C sp2HC=N2.3844
36C sp2HC=O2.0588
37C sp2H=CN2.32139
38C sp2H=CO1.6521
39C sp2H=CS3.1542
40C sp2H=CCl3.80108
41C sp2H=CBr4.8844
42C sp2H=CJ6.7211
43C sp2HN=N4.1165
44C sp2HN=O4.3233
45C sp2HO=O3.1744
46C sp2C2=C1.831814
47C sp2C2=N3.6421
48C sp2C2=O2.481914
49C sp2C=CN2.1344
50C sp2C=CO1.8233
51C sp2C=CCl4.4621
52C sp2CN=N4.6622
53C sp2CN=O3.8888
54C sp2CO=O2.633331
55C sp2C=OCl4.0711
56C sp2=CN23.5522
57C sp2=CF20.2011
58C sp2=CCl25.4322
59C sp2N2=N4.2011
60C sp2N2=O3.4633
61C sp2O2=O3.4722
62C sp2O=OCl4.7222
63C aromaticH:C21.68777130
64C aromaticH:C:N2.51179
65C aromaticH:N22.8611
66C aromatic:C31.9112540
67C aromaticC:C21.5211652
68C aromaticC:C:N2.2243
69C aromatic:C2N3.592724
70C aromatic:C2N(+)3.94118
71C aromatic:C2:N2.35178
72C aromatic:C2O2.502112
73C aromatic:C2S3.4563
74C aromatic:C2F1.514215
75C aromatic:C2Cl3.471812
76C aromatic:C2Br4.49109
77C aromatic:C2J6.4855
78C spH#C1.461210
79C spC#C1.59129
80C spC#N1.922219
81C sp#CCl3.9911
82C sp#CBr5.3111
83C sp=C=O1.8211
84N sp3H2C−1.1376
85N sp3H2C(pi)−0.532522
86N sp3H2N1.4154
87N sp3HC2−3.2933
88N sp3HC2(pi)−3.7855
89N sp3HC2(2pi)−1.24117
90N sp3HCN(pi)−1.1211
91N sp3HCN(2pi)−0.0411
92N sp3C3−6.7333
93N sp3C3(pi)−6.7344
94N sp3C3(2pi)−4.2622
95N sp3C2N(pi)−3.8722
96N sp3C2N(2pi)−2.9533
97N sp2H=C−1.7311
98N sp2C=C−0.9486
99N sp2=CN0.0065
100N sp2O=O1.1111
101N aromatic:C2−0.821913
102N aromatic:C:N0.1421
103N(+) sp2CO=O(−)−0.351613
104OHC−0.771918
105OHC(pi)−0.041313
106OHS2.3821
107OC2−2.713121
108OC2(pi)−1.683431
109OC2(2pi)−0.611110
110OCN(pi)0.0011
111OCS0.5642
112OCP−0.04124
113P3O3−0.6111
114P4O3=O−0.6022
115P4O3=S1.8111
116S2HC1.7011
117S2C20.0622
118S2C2(2pi)−0.5433
119S4C2=O0.2522
120S4C2=O20.0822
121S4O2=O20.0033
ABased on 406
BGoodness of fitR20.995 351
CDeviationAverage0.35 351
DDeviationStandard0.51 351
EK-fold cvK10.00 308
FGoodness of fitQ20.9897 308
GDeviationAverage (cv)0.46 308
HDeviationStandard (cv)0.76 308
It can be seen, for example, that while Miller [10] needed only one parameter for a tetrahedral carbon (CTE in his terminology), the present table lists 32 different atom groups for the same type of carbon (C sp3 in this paper’s terminology) to cover a similar number of compounds. At this point it must be stressed again that, for all calculations of the goodness of fit and the cross-validations, only atom groups were considered for which the number of representative molecules (shown in the right column of the group-contribution tables) exceeds 2. Nevertheless, as the present calculation method is a simple summation of the group contributions, a molecular polarizability value can in principle be evaluated manually. The cross-validated standard deviation of 0.76 for the limited number of experimental examples is comparable to the measuring inaccuracies discussed by Miller [10]. (Owing to the relatively small set of compounds for the polarizability calculations, a tentative leave-one-out cross-validation calculation was carried out, which resulted in a Q2 of 0.9901 and a standard deviation of 0.75, based on 312 molecules.) These deviations are also reflected in the dispersion of the data about the regression line in Figure 12 and the relatively wide Gaussian bell shape in Figure 13. Nevertheless, the excellent correlation coefficients R2 and Q2 of the cross-validation demonstrate the feasibility of the group-additivity method. The deviations do not correlate with the size of the molecules and, thus, with the polarizabilities. There is evidence, however (see Figure 12), that the polycyclic aromatic and heteroaromatic compounds generally agree less well with experiment, an observation that is also reflected in Miller’s results. This drift might be reduced if more experimental data for large conjugated molecules were available.
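The manual evaluation mentioned above, together with the rule that only groups represented by more than two molecules enter the statistics, can be sketched as follows. The contributions and molecule counts are transcribed from Table 14; the dictionary layout, the function name and the choice to report weakly supported groups alongside the sum are assumptions made for this illustration.

```python
CONST = 0.62  # no. 1, constant term of Table 14

# (atom type, neighbours) -> (contribution, number of representative molecules)
GROUPS = {
    ("C sp3", "H3C"): (1.92, 219),   # no. 2
    ("C sp3", "H3O"): (3.50, 23),    # no. 4
    ("C sp3", "H2C2"): (1.80, 123),  # no. 6
    ("C sp3", "H2O2"): (5.49, 1),    # no. 15
    ("O", "C2"): (-2.71, 21),        # no. 107
}

MIN_MOLECULES = 3  # "exceeds 2" in the text


def polarizability(group_counts):
    """Sum the group contributions and list groups with too little support."""
    total = CONST
    weak_groups = []
    for group, count in group_counts.items():
        contribution, molecules = GROUPS[group]
        total += count * contribution
        if molecules < MIN_MOLECULES:
            weak_groups.append(group)
    return total, weak_groups


# Propane: two methyl groups and one methylene group.
propane, propane_weak = polarizability({("C sp3", "H3C"): 2, ("C sp3", "H2C2"): 1})

# Dimethoxymethane: relies on the singly represented C sp3 / H2O2 group.
dmm, dmm_weak = polarizability(
    {("C sp3", "H3O"): 2, ("C sp3", "H2O2"): 1, ("O", "C2"): 2}
)

print(round(propane, 2), propane_weak)
print(round(dmm, 2), dmm_weak)
```

A value can thus be computed for dimethoxymethane, but it depends on a group backed by a single molecule and therefore falls outside the range covered by the goodness-of-fit and cross-validation figures.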
Figure 12. Correlation diagram of polarizability data (10-fold cross-validated: N = 308; Q2 = 0.9897; slope = 0.99).
Figure 13. Histogram of polarizability data (S = 0.76).

3.8. Aqueous Toxicity

Owing to its reliability and robustness, the most commonly used method for measuring aqueous toxicity is the growth inhibition of the protozoan ciliate Tetrahymena pyriformis, defined as pIGC50, where IGC50 expresses the aqueous concentration of a molecule in mmol/L causing a 50% growth inhibition under static conditions. Among the many efforts mentioned in the introductory chapter to find reasonable physical or physico-chemical descriptors for the prediction of a molecule’s aqueous toxicity, the most evident descriptors are those which depend on the aqueous solubility, i.e., logPO/W and the molecule’s solubility itself. Ellison et al. [24] presented a plot of experimental toxicity data of 87 saturated alcohols and ketones against their logP (40 of the logP values were calculated), showing for this limited group a correlation coefficient of 0.96. An analogous plot, but on a much larger data basis for which both experimental logP and toxicity data are known, is shown in Figure 14. All experimental toxicity data were taken from the publication of Ellison et al. [24], while the logP and logS data originate from the same sources as in the preceding sections on logP and logS. The linear regression equation pIGC50 = 0.68 × logP − 1.34 in Figure 14 corresponds well with Ellison’s regression formula pIGC50 = 0.78 × logP − 2.01. A direct but inverse correlation between the toxicity and the solubility of molecules is given in Figure 15, with a rather more indicative correlation coefficient of 0.6186 and the linear regression equation pIGC50 = −0.58 × logS − 1.03.
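The two logP regression lines quoted above can be compared numerically. The sketch below evaluates both at a few logP values; the function names are illustrative, and the conversion back to a concentration assumes the usual convention pIGC50 = −log10(IGC50) with IGC50 in mmol/L, which is not spelled out in the text.

```python
def pigc50_present_work(logp):
    """Regression line of Figure 14: pIGC50 = 0.68 * logP - 1.34."""
    return 0.68 * logp - 1.34


def pigc50_ellison(logp):
    """Regression formula attributed to Ellison et al. [24]."""
    return 0.78 * logp - 2.01


def igc50_mmol_per_l(pigc50):
    """Concentration in mmol/L, assuming pIGC50 = -log10(IGC50)."""
    return 10.0 ** (-pigc50)


for logp in (0.0, 2.0, 4.0):
    present = pigc50_present_work(logp)
    ellison = pigc50_ellison(logp)
    print(logp, round(present, 2), round(ellison, 2), round(present - ellison, 2))
```

The gap between the two lines shrinks from 0.67 units at logP = 0 to 0.27 units at logP = 4, i.e., it falls below the experimental error range of about 0.5 assumed for the toxicity data once logP exceeds about 1.7.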
Michałowicz and Duda [86], on the other hand, also ascribed the noxious effect of variously substituted phenols to their dissociation constant pKa. This assumption, however, could not be confirmed in this study, as Figure 16 illustrates: the experimental pKa values of 115 compounds, extracted from the Handbook of Chemistry and Physics [87], are plotted against their experimental toxicity data and evidently exhibit no correlation at all.
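To give a feeling for how closely the two logP-based regression lines quoted above agree, the following minimal sketch evaluates both at a few logP values. The coefficients are those given in the text; the function names and the chosen logP values are illustrative assumptions only.

```python
def pigc50_present(log_p: float) -> float:
    """Regression line of Figure 14: pIGC50 = 0.68 * logP - 1.34."""
    return 0.68 * log_p - 1.34


def pigc50_ellison(log_p: float) -> float:
    """Regression line quoted from Ellison et al. [24]: pIGC50 = 0.78 * logP - 2.01."""
    return 0.78 * log_p - 2.01


# Arbitrary sample logP values, chosen only for illustration.
for log_p in (0.0, 2.0, 4.0):
    a = pigc50_present(log_p)
    b = pigc50_ellison(log_p)
    print(f"logP = {log_p:3.1f}: present {a:6.2f}, Ellison {b:6.2f}, difference {a - b:5.2f}")
```

Algebraically the difference between the two lines is 0.67 − 0.10 × logP, so it shrinks with increasing logP and stays of the order of the experimental error of about 0.5 assumed by Ellison et al. for moderately lipophilic compounds.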
Figure 14. Correlation diagram of logP against toxicity (N = 335, R2 = 0.7043).
Figure 15. Correlation diagram of logS against toxicity (N = 253, R2 = 0.6186).
Figure 16. Correlation diagram of pKa against toxicity (N = 112, R2 = 0.0282).
In view of the promising correlation of the experimental logP and solubility with the toxicity data, and the fact that both of the former are very successfully predictable by means of the well-established group-additivity method, it was an obvious step to try this method for the direct prediction of the toxicity of molecules without the detour via other descriptors. Table 15 shows the result of this attempt. The goodness of fit Q2 of 0.8404 for 810 cross-validated molecules is clearly better than the correlation coefficient R2 for the logP vs. toxicity correlation, and the cross-validated standard deviation S of 0.42 is well within the experimental error range of about 0.5 assumed by Ellison et al. [24]. Taking this standard deviation as a benchmark, 78.5% of the experimental values are correctly predicted for those 836 molecules for which the conditions for the group-additivity calculation based on Table 15 are fulfilled, and for only 3.6% do the predicted values exceed the experimental ones by more than twice this deviation, as can be seen in the table in the supplementary material named “Experimental vs Calculated Toxicity Data Table.doc”. The associated list of compounds is available at the same location as an SD file named “Compounds List for Toxicity Calculations.sdf”.
Table 15. Atom group Contributions for Toxicity Calculations.
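| Nr | Atom Type | Neighbours | Contribution | Occurrences | Molecules |
|---|---|---|---|---|---|
| 1 | Const | | −1.66 | 859 | 859 |
| 2 | C sp3 | H3C | 0.24 | 772 | 469 |
| 3 | C sp3 | H3N | 0.13 | 12 | 5 |
| 4 | C sp3 | H3O | 0.49 | 72 | 67 |
| 5 | C sp3 | H3S | 0.31 | 5 | 3 |
| 6 | C sp3 | H2C2 | 0.34 | 986 | 313 |
| 7 | C sp3 | H2CN | 0.08 | 10 | 7 |
| 8 | C sp3 | H2CN(+) | 0.55 | 4 | 4 |
| 9 | C sp3 | H2CO | 0.58 | 205 | 188 |
| 10 | C sp3 | H2CS | 0.34 | 31 | 18 |
| 11 | C sp3 | H2CCl | 0.31 | 13 | 13 |
| 12 | C sp3 | H2CBr | 0.75 | 15 | 14 |
| 13 | C sp3 | H2CJ | 0.86 | 2 | 2 |
| 14 | C sp3 | HC3 | 0.14 | 63 | 58 |
| 15 | C sp3 | HC2O | 0.45 | 51 | 50 |
| 16 | C sp3 | HC2S | 0.00 | 1 | 1 |
| 17 | C sp3 | HC2Cl | −0.07 | 1 | 1 |
| 18 | C sp3 | HC2Br | 0.75 | 4 | 3 |
| 19 | C sp3 | HCCl2 | 0.35 | 1 | 1 |
| 20 | C sp3 | HCBr2 | 0.88 | 1 | 1 |
| 21 | C sp3 | C4 | 0.20 | 32 | 27 |
| 22 | C sp3 | C3O | 0.42 | 23 | 22 |
| 23 | C sp3 | C3N | 0.21 | 1 | 1 |
| 24 | C sp3 | C2O2 | 1.06 | 1 | 1 |
| 25 | C sp3 | CF3 | 0.82 | 4 | 4 |
| 26 | C sp3 | CCl3 | −0.03 | 1 | 1 |
| 27 | C sp2 | H2=C | 0.09 | 31 | 30 |
| 28 | C sp2 | HC=C | 0.20 | 84 | 57 |
| 29 | C sp2 | HC=N | 0.48 | 2 | 2 |
| 30 | C sp2 | HC=O | 0.05 | 21 | 21 |
| 31 | C sp2 | H=CO | 0.27 | 9 | 8 |
| 32 | C sp2 | H=CS | 0.39 | 18 | 11 |
| 33 | C sp2 | HO=O | −0.11 | 7 | 7 |
| 34 | C sp2 | C2=C | 0.27 | 11 | 10 |
| 35 | C sp2 | C2=N | 0.24 | 4 | 4 |
| 36 | C sp2 | C2=O | −0.51 | 62 | 62 |
| 37 | C sp2 | C=CO | 0.21 | 7 | 6 |
| 38 | C sp2 | C=CS | 0.64 | 5 | 4 |
| 39 | C sp2 | C=CBr | 0.75 | 1 | 1 |
| 40 | C sp2 | CN=O | 0.28 | 25 | 25 |
| 41 | C sp2 | CN=S | 1.23 | 1 | 1 |
| 42 | C sp2 | CO=O | −0.04 | 122 | 116 |
| 43 | C sp2 | =CO2 | 0.18 | 1 | 1 |
| 44 | C sp2 | =CSCl | 0.43 | 1 | 1 |
| 45 | C aromatic | H:C2 | 0.23 | 2322 | 569 |
| 46 | C aromatic | H:C:N | 0.06 | 44 | 27 |
| 47 | C aromatic | C:C2 | 0.29 | 485 | 362 |
| 48 | C aromatic | :C3 | 0.23 | 44 | 22 |
| 49 | C aromatic | C:C:N | 0.00 | 8 | 8 |
| 50 | C aromatic | :C2N | 1.06 | 60 | 58 |
| 51 | C aromatic | :C2N(+) | 1.33 | 135 | 105 |
| 52 | C aromatic | :C2:N | 0.67 | 6 | 4 |
| 53 | C aromatic | :C2O | 0.48 | 360 | 282 |
| 54 | C aromatic | :C2S | 0.43 | 9 | 9 |
| 55 | C aromatic | :C2F | 0.52 | 75 | 39 |
| 56 | C aromatic | :C2Cl | 0.76 | 209 | 114 |
| 57 | C aromatic | :C2Br | 0.80 | 69 | 50 |
| 58 | C aromatic | :C2J | 1.18 | 13 | 11 |
| 59 | C aromatic | :C:NF | 0.02 | 5 | 3 |
| 60 | C aromatic | :C:NCl | 0.47 | 2 | 2 |
| 61 | C aromatic | :C:NBr | 0.67 | 1 | 1 |
| 62 | C sp | H#C | 0.09 | 8 | 8 |
| 63 | C sp | C#C | 0.17 | 14 | 11 |
| 64 | C sp | C#N | −0.33 | 43 | 41 |
| 65 | C sp | =N=S | 0.87 | 1 | 1 |
| 66 | N sp3 | H2C | −0.61 | 3 | 3 |
| 67 | N sp3 | H2C(pi) | −0.90 | 66 | 65 |
| 68 | N sp3 | H2N | −0.05 | 1 | 1 |
| 69 | N sp3 | HC2(pi) | −0.93 | 5 | 5 |
| 70 | N sp3 | HC2(2pi) | −1.71 | 4 | 4 |
| 71 | N sp3 | HCN(pi) | 0.00 | 1 | 1 |
| 72 | N sp3 | HCO(pi) | 0.08 | 1 | 1 |
| 73 | N sp3 | C3 | −0.54 | 3 | 1 |
| 74 | N sp3 | C3(pi) | −1.04 | 3 | 3 |
| 75 | N sp2 | C=C | 0.00 | 1 | 1 |
| 76 | N sp2 | =CO | −0.43 | 6 | 6 |
| 77 | N sp2 | C=O | −0.07 | 1 | 1 |
| 78 | N(+) sp2 | CO=O(−) | −0.50 | 139 | 109 |
| 79 | N aromatic | :C2 | −0.29 | 33 | 30 |
| 80 | O | HC | −1.05 | 163 | 149 |
| 81 | O | HC(pi) | −0.07 | 295 | 254 |
| 82 | O | C2 | −1.12 | 4 | 3 |
| 83 | O | C2(pi) | −0.60 | 182 | 165 |
| 84 | O | HN | 0.07 | 1 | 1 |
| 85 | O | HN(pi) | 0.01 | 6 | 6 |
| 86 | O | C2(2pi) | −0.30 | 15 | 15 |
| 87 | S2 | HC | 0.11 | 6 | 4 |
| 88 | S2 | C2 | 0.02 | 6 | 5 |
| 89 | S2 | C2(pi) | −0.10 | 6 | 6 |
| 90 | S2 | C2(2pi) | −0.15 | 13 | 11 |
| 91 | S4 | C2=O | −1.32 | 3 | 3 |
| 92 | S4 | C2=O2 | −1.22 | 4 | 4 |
| A | Based on | | | | 859 |
| B | Goodness of fit | R2 | 0.8665 | | 836 |
| C | Deviation | Average | 0.29 | | 836 |
| D | Deviation | Standard | 0.39 | | 836 |
| E | K-fold cv | K | 10.00 | | 810 |
| F | Goodness of fit | Q2 | 0.8404 | | 810 |
| G | Deviation | Average (cv) | 0.31 | | 810 |
| H | Deviation | Standard (cv) | 0.42 | | 810 |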
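Since the method is a plain summation of group contributions, a prediction from Table 15 can be reproduced by hand. The following minimal sketch does this for 1-butanol. The five contribution values are copied from Table 15; the decomposition of 1-butanol into atom groups was done manually for this illustration and is an assumption, as are the dictionary layout and the function name (they are not part of ChemBrain IXL).

```python
# Contributions copied from Table 15, keyed by (atom type, neighbours).
CONTRIBUTIONS = {
    ("Const", ""): -1.66,
    ("C sp3", "H3C"): 0.24,
    ("C sp3", "H2C2"): 0.34,
    ("C sp3", "H2CO"): 0.58,
    ("O", "HC"): -1.05,
}


def predict_pigc50(group_counts):
    """Sum the constant and count-weighted group contributions.

    Raises KeyError if a group is missing from the table, mirroring the rule
    that a molecule is only evaluable if all of its atom groups are known.
    """
    total = CONTRIBUTIONS[("Const", "")]
    for group, count in group_counts.items():
        if group not in CONTRIBUTIONS:
            raise KeyError(f"no contribution available for atom group {group}")
        total += count * CONTRIBUTIONS[group]
    return total


# Hand-made (assumed) decomposition of 1-butanol, CH3-CH2-CH2-CH2-OH.
butanol = {
    ("C sp3", "H3C"): 1,   # terminal methyl
    ("C sp3", "H2C2"): 2,  # two inner methylene groups
    ("C sp3", "H2CO"): 1,  # methylene attached to oxygen
    ("O", "HC"): 1,        # hydroxyl oxygen
}

print(round(predict_pigc50(butanol), 2))  # -1.21
```

The result is simply −1.66 + 0.24 + 2 × 0.34 + 0.58 − 1.05 = −1.21; no experimental value is implied by this sketch.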
A comparison of these results with published data is difficult as the latter are either based on only a limited set of structures, on a small basis of compounds or on an entirely different approach. Nevertheless, a few numbers should provide an idea as to how to classify the present result: Schultz [21] calculated an equation for the toxicity based on logP and the superdelocalizability of 197 benzene derivatives, yielding a correlation coefficient R2 of 0.816 and a standard deviation S of 0.34. Melagraki et al. [23] trained an RBF neural network to yield an equation for the toxicity calculation founded on the logP, pKa, ELUMO, EHOMO and Nhdon values of 180 phenols with an R2 of 0.6022 and a root mean square of 0.5352. Duchowicz et al. [22] published the results of the QSAR calculations of 200 phenol derivatives to give a seven-parameter equation with an R2 of 0.7242 (R = 0.851) and an S of 0.442. Finally, Ellison et al. [24], who derived a compound’s toxicity from its logP value only, found an equation for 87 saturated alcohols and ketones which yielded an R2 of 0.96 and an S of 0.20.
Tentatively, a validation test was carried out applying the leave-one-out method, yielding a Q2 of 0.8409 and a standard deviation of again 0.42, based on 816 molecules. A tentative extension of the atom groups in Table 15 by the “pseudo atom” types as used in Table 8 for the calculation of logP (i.e., “H”, “Alkane”, “Unsaturated HC” and “X(CH2)n”), combined or one by one, interestingly either had no effect or even led to a deterioration of the goodness of fit.
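The k-fold and leave-one-out validations mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the group contributions are fitted here by ordinary least squares via NumPy (the paper uses a Gauss-Seidel fitting method), Q2 is taken as 1 − PRESS/SS_total, the standard deviation is taken as the root-mean-square of the cross-validated residuals, and the data are synthetic.

```python
import numpy as np


def fit_contributions(X, y):
    """Least-squares fit of group contributions (stand-in for Gauss-Seidel)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def k_fold_q2(X, y, k=10, seed=0):
    """Return (Q2, cross-validated RMS deviation) for a linear additive model.

    X: matrix of group counts per molecule (first column = 1 for the constant).
    y: experimental descriptor values. k = len(y) gives leave-one-out.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    y_pred = np.empty(len(y), dtype=float)
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False                      # hold out this fold
        coef = fit_contributions(X[train], y[train])
        y_pred[test_idx] = X[test_idx] @ coef        # predict held-out molecules
    press = np.sum((y - y_pred) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_total, np.sqrt(press / len(y))


# Synthetic example: 200 hypothetical molecules, 5 hypothetical atom groups.
rng = np.random.default_rng(1)
counts = rng.integers(0, 4, size=(200, 5)).astype(float)
X = np.hstack([np.ones((200, 1)), counts])
true_coef = np.array([-1.5, 0.2, 0.3, 0.6, -1.0, 0.5])
y = X @ true_coef + rng.normal(0.0, 0.3, size=200)

q2_10, s_10 = k_fold_q2(X, y, k=10)
q2_loo, s_loo = k_fold_q2(X, y, k=len(y))
print(f"10-fold: Q2 = {q2_10:.3f}, S = {s_10:.2f}")
print(f"leave-one-out: Q2 = {q2_loo:.3f}, S = {s_loo:.2f}")
```

A Q2 that falls far below the training R2 indicates that the fitted contributions do not transfer well to molecules outside the training set.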
Figure 17 and Figure 18 illustrate the correlation diagram and histogram of the toxicity calculations. The slope of 0.85 in Figure 17, calculated from the training set, reflects the slightly lower correlation between experimental and predicted values. (An analogous calculation using the cross-validated data yielded a slope of 0.84.)
Figure 17. Correlation diagram of toxicity data (10-fold cross-validated: N = 810, Q2 = 0.8404, slope = 0.85).
Figure 18. Histogram of toxicity data (S = 0.42).

3.9. Blood-Brain Barrier

The blood-brain barrier is a “hard nut” to crack, not only for the molecules which are supposed to penetrate it but also for the theoretician who tries to find a reliable tool for the prediction of their potential to enter the brain tissue. This is evident upon reviewing the many attempts, described in the introductory chapter, to define suitable molecular descriptors to start with. Interestingly, some of the most commonly applied and seemingly logical descriptors such as logPO/W, polar surface area (PSA), solvent-accessible surface area (SASA) or molecular polarizability exhibit no correlation to speak of with the blood-brain distribution ratio logBB, as has already been stated by Lanevskij et al. [39] for logPO/W and as is shown in Figure 19, Figure 20, Figure 21 and Figure 22.
The experimental logBB data are collected from the references [27,28,29,30,31,32,33,34,35,36,37,38,39,40], logP data originate from the same sources as in chapter D, PSA and SASA values are calculated internally using an approximation function (see Appendix), and experimental polarizability data are taken from the Handbook of Chemistry and Physics [85] and Miller’s [10] publication.
Figure 19. Correlation diagram of logP against logBB (N = 198, R2 = 0.2815).
Figure 20. Correlation diagram of polar surface area (PSA) against logBB (N = 438, R2 = 0.3335).
Figure 21. Correlation diagram of solvent-accessible surface area (SASA) against logBB (N = 493, R2 = 0.0334).
Figure 22. Correlation diagram of molecular polarizability against logBB (N = 49, R2 = 0.2717).
It therefore seemed reasonable to abstain from any attempt to base logBB-prediction calculations on other established molecular descriptors and to proceed with the group-additivity method as described earlier, which is very similar to H. Sun’s [12] method. While Sun applied his three-component model to only 57 compounds, yielding a correlation coefficient R2 of 0.897, a 7-fold cross-validated Q2 of 0.504 and a root-mean-square error of 0.259, the present calculation extended over 487 molecules and resulted in a goodness of fit R2 of 0.6991 for the evaluable training set of 413 molecules, and yielded a 10-fold cross-validated Q2 of 0.4786 and a deviation of 0.52 for the test set of 385 molecules. The large difference between R2 and Q2 is ominous and indicates the limits of the present group-additivity method. A leave-one-out cross-validation calculation produced a marginally better Q2 of 0.4825 but left the standard deviation unchanged. Since in general, as Sun [12] stated in his paper, a value of Q2 below 0.5 is regarded as at best statistically meaningful but no longer representative of a good model, the complete list of 176 atom groups and their contributions has been omitted from Table 16 presented below, which therefore only lists the results of the least-squares and 10-fold cross-validation calculations. The complete list is available in the supplementary material under the name of “LogBB Parameters Table.doc”. The associated list of results is viewable at the same location under the name of “Experimental vs Calculated LogBB Data Table.doc” and the corresponding list of compounds as an SD file with the name of “Compounds List for LogBB Calculations.sdf”.
Table 16. Results of the logBB Calculations.
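| Nr | Atom Type | Neighbours | Contribution | Occurrences | Molecules |
|---|---|---|---|---|---|
| 1 | Const | | 0.21 | 486 | 486 |
| 2 | C sp3 | H3C | 0.06 | 519 | 255 |
| ... | ... | ... | ... | ... | ... |
| A | Based on | | | | 486 |
| B | Goodness of fit | R2 | 0.6991 | | 413 |
| C | Deviation | Average | 0.30 | | 413 |
| D | Deviation | Standard | 0.39 | | 413 |
| E | K-fold cv | K | 10.00 | | 385 |
| F | Goodness of fit | Q2 | 0.4786 | | 385 |
| G | Deviation | Average (cv) | 0.40 | | 385 |
| H | Deviation | Standard (cv) | 0.52 | | 385 |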
Figure 23 illustrates the large dispersion of the training and particularly the cross-validated data about the regression line, which exhibits a slope of 0.70. The distribution of the deviations, shown in the histogram (Figure 24), extends over nearly the complete range of experimental values between −2.15 and +1.6. In conclusion, it is evident that the present group-additivity model is too inaccurate for the prediction of logBB for an unlimited scope of molecular structures. On the other hand, reviewing the many publications which base their predictions either on too few examples or on models that are at best useful for only a very limited structural diversity, or which even rest on the inappropriate parameters visualized above, it follows that a universal approach for the prediction of logBB for the complete spectrum of medicinal chemistry is still outstanding.
Figure 23. Correlation diagram of logBB data (10-fold cross-validated: N = 385; Q2 = 0.4786; slope = 0.70).
Figure 24. Histogram of logBB data (S = 0.53).

4. Conclusions

A generally applicable computer algorithm based on the well-established group-additivity method has been presented and has been applied for the calculation of the seven molecular descriptors heat of combustion, logP, logS, molar refractivity, molecular polarizability, aqueous toxicity and logBB. An eighth descriptor, the heat of formation, was calculated indirectly using the calculated value of the heat of combustion. The definition of the atom groups has been set up in a way that allowed straightforward coding of the computer algorithm, except for the special groups, for which, however, code development could take advantage of the information on the 3D molecular structures stored in the molecules database. The complete algorithm, realized in ChemBrain IXL, thus enables the computation of the contributions of all the atom groups as well as all the described special groups for descriptor evaluations; their inclusion, however, is governed by their presence or absence in the respective parameters tables. Within this context it is worth mentioning that for the prediction of the refractivity, molecular polarizability and toxicity a 3D geometry is in principle not required.
The present group-additivity algorithm has shown its versatility in that it is capable of producing results at once that are in good to excellent agreement with experimental data for six of the seven title descriptors. The present study has also shown the limits of the group-additivity method as such in an area where too many unknown or incalculable factors influence the experimental data as has been exemplified for logBB.
The database, which at present holds about 20,700 molecules, encompasses a representative collection of organic and metal-organic compounds of commercial as well as scientific relevance and has all the referenced data stored. The number of its molecules, together with the number of compounds for which the title descriptors could be evaluated under the given constraints, provides an accountable estimate of the scope of applicability of each of the presented tables of group contributions. For the heat of combustion and formation it is ca. 75%, for logP ca. 84%, for logS ca. 73%, for the molecular polarizability ca. 42%, for the refractivity ca. 75% and for the toxicity ca. 41%. These percentages evidently reflect the number of experimental data available at present. There is no doubt, however, that even with a larger database of compounds for the calculation of the group contributions there is a limit to the improvement of the accuracy of the predictions on the basis of this method, not only because there is little hope that the existing experimental databases and their deficiencies will be re-examined in the laboratories, but also because of influences on the results that in principle cannot be dealt with by this method, such as non-neighbouring effects (e.g., gauche or cis), intramolecular charge effects or non-bonded interactions.
In view of these facts there is truth in the closing remarks of Cohen and Benson [20], who stated that the atom group additivity method is “a useful tool for making rapid property estimates or for checking the likely reliability of existing measurements”.

Supplementary Materials

Supplementary materials can be accessed at: https://www.mdpi.com/1420-3049/20/10/18279/s1.

Acknowledgments

The author is indebted to the library of the University of Basel for allowing him full and free access to the electronic literature database.

Conflicts of Interest

The author declares no conflict of interest.

Appendix

The database of ChemBrain IXL has the compounds stored as three-dimensional structures. Therefore each new input undergoes optimization of its three-dimensional form before being stored. New compounds can be added to the database either by importing them from external standard MOL or SD files or manually on a self-explanatory, easy-to-use “three-dimensional” drawing board. 3D-structure optimization is carried out within ChemBrain IXL by means of a force-field method called “steepest descent” [88], where the search direction to the energy minimum is given at each current atomic position by the first derivative of Equation (A1).
$$E = E_{str} + E_{ang} + E_{tor} + E_{vdw} + E_{oop} \qquad \text{(A1)}$$
This function denotes the sum of all bond stretching (Estr), bond angle (Eang), torsional angle (Etor), Van-der-Waals interaction (Evdw) and out-of-plane bending (Eoop) energies. Optionally, the algorithm scans the hyper-surface for the compound’s global energy minimum.
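The idea of following the negative first derivative of the energy can be illustrated with a deliberately reduced sketch. It implements only a harmonic bond-stretching term (the Estr part of Equation (A1)); the force constant, reference bond length, step size and convergence threshold are hypothetical values chosen for illustration, and nothing here reproduces the actual force field or code of ChemBrain IXL.

```python
import numpy as np


def stretch_gradient(coords, bonds, k, r0):
    """Gradient of E_str = sum over bonds of 0.5 * k * (d - r0)**2."""
    grad = np.zeros_like(coords)
    for i, j in bonds:
        diff = coords[i] - coords[j]
        d = np.linalg.norm(diff)
        g = k * (d - r0) * diff / d   # derivative with respect to atom i
        grad[i] += g
        grad[j] -= g                  # equal and opposite for atom j
    return grad


def steepest_descent(coords, bonds, k=1.0, r0=1.5, step=0.1,
                     tol=1e-6, max_iter=10000):
    """Move all atoms against the energy gradient until it nearly vanishes."""
    coords = np.array(coords, dtype=float)
    for _ in range(max_iter):
        grad = stretch_gradient(coords, bonds, k, r0)
        if np.linalg.norm(grad) < tol:
            break
        coords -= step * grad         # search direction = negative gradient
    return coords


# Two bonded atoms placed 2.0 apart relax towards the reference length 1.5.
start = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
final = steepest_descent(start, bonds=[(0, 1)])
print(np.linalg.norm(final[0] - final[1]))
```

A full force field would add the angle, torsion, Van-der-Waals and out-of-plane terms to both the energy and its gradient; the update step itself stays the same.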
At present, besides seven of the eight descriptors presented here, four more descriptors of a compound, which can be derived directly from its structure, are optionally calculated immediately after addition and completion of the 3D-structure optimization and then entered into the molecule’s descriptors list: molecular volume, molecular surface, polar surface area (PSA) and solvent-accessible surface area (SASA). The molecular volume is defined by the Van-der-Waals radii of the atoms and its value is approximated numerically by scanning a small but defined cube through the entire spatial box defined by the total width, length and height of the molecule and adding up those cubes which lie inside the range of any atom’s VdW radius. For the calculation of the molecular surface the approximation Equation (A2) is used, where A is the total molecular surface, rj is the corresponding radius of atom j, Nj is the number of points evenly distributed on atom j’s sphere and nj is the number of those points which are not occluded by the spheres of other atoms.
$$A = 4\pi \sum_{j} r_j^{2}\, \frac{n_j}{N_j} \qquad \text{(A2)}$$
The calculation of SASA is based on the same function but assumes an extended radius for each atom accounting for the radius of the surrounding solvent molecules, which by default is taken as 1.5 Angstroms, approximately the value of water. For the calculation of PSA again the same function is used but the sum is limited to the VdW surfaces of the polar atoms oxygen, nitrogen, sulfur, phosphorus and hydrogen attached to the former atoms as suggested by Ertl et al. [89].
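The surface formula (A2) and the cube-scanning volume estimate described above can be sketched in a few lines. This is a minimal illustration under assumptions: the points are distributed on each sphere with a Fibonacci lattice (the text only says "evenly distributed"), the numbers of points and the cube size are arbitrary, and the function and parameter names are hypothetical. Atoms are given as (x, y, z, radius) tuples in Angstroms.

```python
import math


def sphere_points(n):
    """Roughly even points on the unit sphere (Fibonacci lattice, assumed)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        pts.append((rho * math.cos(golden * i), rho * math.sin(golden * i), z))
    return pts


def surface_area(atoms, probe=0.0, n_points=500, selected=None):
    """Equation (A2): A = 4*pi * sum_j r_j**2 * n_j / N_j.

    probe > 0 extends every radius (SASA-like); `selected` restricts the sum
    to chosen atom indices (e.g. the polar atoms for a PSA-like value).
    """
    unit = sphere_points(n_points)
    indices = range(len(atoms)) if selected is None else selected
    total = 0.0
    for j in indices:
        xj, yj, zj, rj = atoms[j]
        rj += probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xj + rj * ux, yj + rj * uy, zj + rj * uz
            occluded = any(
                (px - xk) ** 2 + (py - yk) ** 2 + (pz - zk) ** 2 < (rk + probe) ** 2
                for k, (xk, yk, zk, rk) in enumerate(atoms) if k != j
            )
            if not occluded:
                exposed += 1
        total += rj * rj * exposed / n_points
    return 4.0 * math.pi * total


def grid_volume(atoms, step=0.1):
    """Add up the small cubes whose centre lies inside any atom's radius."""
    lo = [min(a[d] - a[3] for a in atoms) for d in range(3)]
    hi = [max(a[d] + a[3] for a in atoms) for d in range(3)]
    n = [int(math.ceil((hi[d] - lo[d]) / step)) for d in range(3)]
    inside = 0
    for ix in range(n[0]):
        x = lo[0] + (ix + 0.5) * step
        for iy in range(n[1]):
            y = lo[1] + (iy + 0.5) * step
            for iz in range(n[2]):
                z = lo[2] + (iz + 0.5) * step
                if any((x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2 <= r * r
                       for ax, ay, az, r in atoms):
                    inside += 1
    return inside * step ** 3


# Two overlapping spheres: the shared region is not counted twice.
pair = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]
print(surface_area(pair), surface_area(pair, probe=1.5), grid_volume(pair))
```

For an isolated atom no point is occluded, so the function returns exactly 4πr²; the accuracy for real molecules depends on the number of points per sphere and on the cube size.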
The present work is part of a project called ChemBrain IXL available from Neuronix Software (www.neuronix.ch, Rudolf Naef, Lupsingen, Switzerland).

References

  1. Pauling, L. Nature of the Chemical Bond; Cornell University Press: Ithaca, NY, USA, 1940; pp. 47–58.
  2. Klages, F. Über eine Verbesserung der additiven Berechnung von Verbrennungswärmen und der Berechnung der Mesomerie-Energie aus Verbrennungswärmen. Chem. Ber. 1949, 82, 358–375.
  3. Wheland, G.W. Theory of Resonance; Wiley: New York, NY, USA, 1944; pp. 52–87.
  4. Broto, P.; Moreau, G.; Vandycke, C. Molecular structure: Perception, autocorrelation descriptor and SAR studies: System of atomic contributions for the calculation of the n-octanol/water partition coefficients. Eur. J. Med. Chem. Chim. Ther. 1984, 19, 71–78.
  5. Fujita, T.; Iwasa, J.; Hansch, C. A new substituent constant, π, derived from partition coefficients. J. Am. Chem. Soc. 1964, 86, 5175–5180.
  6. Ghose, A.K.; Crippen, G.M. Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships I. Partition coefficients as a measure of hydrophobicity. J. Comput. Chem. 1986, 7, 565–577.
  7. Ghose, A.K.; Pritchett, A.; Crippen, G.M. Atomic physicochemical parameters for three dimensional structure directed quantitative structure-activity relationships III: Modeling hydrophobic interactions. J. Comput. Chem. 1988, 9, 80–90.
  8. Ghose, A.K.; Crippen, G.M. Atomic Physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships. 2. Modeling dispersive and hydrophobic interactions. J. Chem. Inf. Comput. Sci. 1987, 27, 21–35.
  9. Miller, K.J.; Savchik, J.A. A new empirical Method to calculate Average Molecular Polarizabilities. J. Am. Chem. Soc. 1979, 101, 7206–7213.
  10. Miller, K.J. Additivity methods in molecular polarizability. J. Am. Chem. Soc. 1990, 112, 8533–8542.
  11. Klopman, G.; Wang, S.; Balthasar, D.M. Estimation of aqueous solubility of organic molecules by the group contribution approach, application to the study of biodegradation. J. Chem. Inf. Comput. Sci. 1992, 32, 474–482.
  12. Sun, H. A universal molecular descriptor system for prediction of LogP, LogS, LogBB, and absorption. J. Chem. Inf. Comput. Sci. 2004, 44, 748–757.
  13. Janecke, E. Die Verbrennungs-und bildungswärmen organischer Verbindungen in Beziehung zu ihrer Zusammensetzung. Z. Elektrochem. 1934, 40, 462–469.
  14. Jones, W.H.; Starr, C.E. Determination of heat of combustion of gasolines. Ind. Eng. Chem. Anal. Ed. 1941, 13, 287–290.
  15. Hougen, O.A.; Watson, K.M. Chemical Process Principles Part II; Wiley: New York, NY, USA, 1947; pp. 758–765.
  16. Kharash, M.S. Heats of combustion of organic compounds. J. Res. Bur. Stand. 1929, 2, 359–430.
  17. Kharash, M.S.; Sher, B. The electronic conception of valence and heats of combustion of organic compounds. J. Phys. Chem. 1925, 29, 625–658.
  18. Handrick, G.R. Heats of combustion of organic compounds. Ind. Eng. Chem. 1956, 48, 1366–1374.
  19. Ohlinger, W.S.; Klunzinger, P.E.; Deppmeier, B.J.; Hehre, W.J. Efficient calculation of heats of formation. J. Phys. Chem. A 2009, 113, 2165–2175.
  20. Cohen, N.; Benson, S.W. Estimation of heats of formation of organic compounds by additivity methods. Chem. Rev. 1993, 93, 2419–2438.
  21. Schultz, T.W. Structure-toxicity relationships for benzenes evaluated with tetrahymena pyriformis. Chem. Res. Toxicol. 1999, 12, 1262–1267.
  22. Duchowicz, P.R.; Mercader, A.G.; Fernández, F.M.; Castro, E.A. Prediction of aqueous toxicity for heterogeneous phenol derivatives by QSAR. Chemom. Intell. Lab. Syst. 2008, 90, 97–107.
  23. Melagraki, G.; Afantitis, A.; Makridima, K.; Sarimveis, H.; Igglessi-Markopoulou, O. Prediction of toxicity using a novel RBF neural network training methodology. J. Mol. Model. 2006, 12, 297–305.
  24. Ellison, C.M.; Cronin, M.T.D.; Madden, J.C.; Schultz, T.W. Definition of the structural domain of the baseline non-polar narcosis model for Tetrahymena pyriformis. SAR QSAR Environ. Res. 2008, 19, 751–783.
  25. Pasha, F.A.; Srivastava, H.K.; Singh, P.P. Comparative QSAR study of phenol derivatives with the help of density functional theory. Bioorg. Med. Chem. 2005, 13, 6823–6829.
  26. Luco, J.M. Prediction of the brain-blood distribution of a large set of drugs from structurally derived descriptors using partial least-squares (PLS) modeling. J. Chem. Inf. Comput. Sci. 1999, 39, 396–404.
  27. Fu, X.C.; Song, Z.F.; Fu, C.Y.; Liang, W.Q. A simple predictive model for blood-brain barrier penetration. Pharmazie 2005, 60, 354–358.
  28. Rose, K.; Hall, L.H.; Kier, L.B. Modeling blood-brain barrier partitioning using the electrotopological state. J. Chem. Inf. Comput. Sci. 2002, 42, 651–666.
  29. Keserü, G.M.; Molnar, L. High-throughput prediction of blood-brain partitioning: A thermodynamic approach. J. Chem. Inf. Comput. Sci. 2001, 41, 120–128.
  30. Carpenter, T.S.; Kirshner, D.A.; Lau, E.Y.; Wong, S.E.; Nilmeier, J.P.; Lightstone, F.C. A method to predict blood-brain barrier permeability of drug-like compounds using molecular dynamics simulations. Biophys. J. 2014, 107, 630–641.
  31. Hou, T.; Xu, X. ADME evaluation in drug discovery; 1. Applications of genetic algorithms to the prediction of blood-brain partitioning of a large set of drugs. J. Mol. Model. 2002, 8, 337–349.
  32. Chen, Y.; Zhu, Q.-J.; Pan, J.; Yang, Y.; Wu, X.-P. A prediction model for blood-brain barrier permeation and analysis on its parameter biologically. Comput. Methods Programs Biomed. 2009, 95, 280–287.
  33. Garg, P.; Verma, J. In silico prediction of blood brain barrier permeability: An artificial neural network model. J. Chem. Inf. Model. 2006, 46, 289–297.
  34. Van Damme, S.; Langenaeker, W.; Bultinck, P. Prediction of blood-brain partitioning: A model based on ab initio calculated quantum chemical descriptors. J. Mol. Gr. Model. 2008, 26, 1223–1236.
  35. Clark, D.E. Rapid calculation of polar molecular surface area and its application to the prediction of transport phenomena. 2. Prediction of blood-brain barrier penetration. J. Pharm. Sci. 1999, 88, 815–821.
  36. De Sá, M.M.; Pasqualoto, K.F.M.; Rangel-Yagui, C.O. A 2D-QSPR approach to predict blood-brain barrier penetration of drugs acting on the central nervous system. Braz. J. Pharm. Sci. 2010, 46, 741–751.
  37. Vilar, S.; Chakrabarti, M.; Costanzi, S. Prediction of passive blood-brain partitioning: Straightforward and effective classification models based on in silico derived physicochemical descriptors. J. Mol. Gr. Model. 2010, 28, 899–903.
  38. Bujaka, R.; Struck-Lewicka, W.; Kaliszan, M.; Kaliszan, R.; Markuszewski, M.J. Blood-brain barrier permeability mechanisms in view of quantitative structure-activity relationships (QSAR). J. Pharm. Biomed. Anal. 2015, 108, 29–37.
  39. Lanevskij, K.; Dapkunas, J.; Juska, L.; Japertas, P.; Didziapetris, R. QSAR analysis of blood-brain distribution: The influence of plasma and brain tissue binding. J. Pharm. Sci. 2011, 100, 2147–2160.
  40. Benson, S.W. Thermochemical Kinetics, 2nd ed.; Wiley: New York, NY, USA, 1976.
  41. Klopman, G.; Li, J.-Y.; Wang, S.; Dimayuga, M. Computer automated log P calculations based on an extended group contribution approach. J. Chem. Inf. Comput. Sci. 1994, 34, 752–781.
  42. Domalski, E.S. Selected values of heats of combustion and heats of formation of organic compounds containing the elements C, H, N, O, P and S. J. Phys. Chem. Ref. Data 1972, 1, 221–277.
  43. Young, J.A.; Keith, J.E.; Stehle, P.; Dzombak, W.C.; Hunt, H. Heats of combustion of some organic nitrogen compounds. Ind. Eng. Chem. 1956, 48, 1375–1378.
  44. Ovchinnikov, V.V. Thermochemistry of heteroatomic compounds: Analysis and calculation of thermodynamic functions of organic compounds of V–VII groups of Mendeleev’s Periodic table. Am. J. Phys. Chem. 2013, 2, 60–71.
  45. Cox, J.D.; Gundry, H.A.; Head, A.J. Thermodynamic properties of fluorine compounds part 1—Heats of combustion of p-fluorobenzoic acid, pentafluorobenzoic acid, hexafluorobenzene and decafluorocyclohexene. Trans. Faraday Soc. 1964, 60, 653–665.
  46. Smith, N.K.; Gorin, G.; Good, W.D.; McCullough, J.P. The heats of combustion, sublimation, and formation of four dihalobiphenyls. J. Phys. Chem. 1964, 68, 940–946.
  47. Shaub, W.M. Estimated thermodynamic functions for some chlorinated benzenes, phenols and dioxins. Thermochim. Acta 1982, 58, 11–44.
  48. Bjellerup, L. On the accuracy of heat of combustion data obtained with a precision moving bomb calorimetric method for organic bromine compounds. Acta Chem. Scand. 1961, 15, 121–140.
  49. Swain, H.A., Jr.; Silbert, L.S.; Miller, J.G. The heats of combustion of aliphatic long chain peroxyacids, t-butyl peroxyesters, and related acids and esters. J. Am. Chem. Soc. 1964, 86, 2562–2566.
  50. Tannenbaum, S.; Kaye, S.; Lewenz, G.F. Synthesis and properties of some alkylsilanes. J. Am. Chem. Soc. 1953, 75, 3753–3757.
  51. Good, W.D.; Lacina, J.L.; DePrater, B.L.; McCullough, J.P. A new approach to the combustion calorimetry of silicon and organosilicon compounds: Heats of formation of quartz, fluoro silicic acid, and hexamethyldisiloxane. J. Phys. Chem. 1964, 68, 579–586.
  52. NIST National Institute of Standards and Technology Data Gateway. Available online: http://srdata.nist.gov/gateway/ (accessed on 1 May 2015).
  53. Standard Thermodynamic Properties of Chemical Substances. In CRC Handbook of Chemistry and Physics; Lide, D.R. (Ed.) CRC Press LLC: Boca Raton, FL, USA, 2005; pp. 5-5–5-60.
  54. Visvanadhan, V. N.; Ghose, A.K.; Revankar, G.R.; Robins, R.K. Atomic physicochemical parameters for three dimensional structure directed quantitative structure-activity relationships. 4. Additional parameters for hydrophobic and dispersive interactions and their application for an automated superposition of certain naturally occurring nucleoside antibiotics. J. Chem. Inf. Comput. Sci. 1989, 29, 163–172. [Google Scholar]
  55. Skinner, H.A. Key heat of formation data. Pure Appl. Chem. 1964, 8, 113–130. [Google Scholar] [CrossRef]
  56. Domalski, E.S.; Hearing, E.D. Estimation of the thermodynamic properties of hydrocarbons at 298.15 K. J. Phys. Chem. Ref. Data 1988, 17, 1637. [Google Scholar] [CrossRef]
  57. Rau, H. Über die Fluoreszenz p-substituierter adsorbierter Azoverbindungen. Ber. Bunsenges. Phys. Chem. 1971, 75, 1343–1347. [Google Scholar] [CrossRef]
  58. Yoshihiro, K.; Hohi, L.; Akio, K. Direct evidence for the site of protonation of 4-aminoazobenzene by nitrogen-15 and carbon-13 nuclear magnetic resonance spectroscopy. J. Phys. Chem. 1980, 84, 3417–3423. [Google Scholar]
  59. Kelemen, J.; Moss, S.; Sauter, H.; Winkler, T. Azo-Hydrazone Tautomerism in Azo Dyes. II. Raman, NMR and Mass Spectrometric Investigations of 1-Phenylazo-2-naphthylamine and 1-Phenylazo-2-naphthol Derivatives. Dyes Pigm. 1982, 3, 27–47. [Google Scholar] [CrossRef]
  60. Kelemen, J. Azo-Hydrazone Tautomerism in Azo Dyes. I. A Comparative Study of 1-Phenylazo-2-naphthol and 1-Phenylazo-2-naphthylamine Derivatives by Electronic Spectroscopy. Dyes Pigm. 1981, 2, 73–91. [Google Scholar] [CrossRef]
  61. Reeves, R.L.; Kaiser, R.S. Selective solvation of hydrophobic ions in structured solvents. Azo-hydrazone tautomerism of azo dyes in aqueous organic solvents. J. Org. Chem. 1970, 35, 3670–3675. [Google Scholar] [CrossRef]
  62. Yatsenko, A.V. The structures of organic molecules in crystals: Simulations using the electrostatic potential. Russ. Chem. Rev. 2005, 74, 521.
  63. Hine, J.; Arata, K. Keto-enol-tautomerism. II. The calorimetrical determination of the equilibrium constants for keto-enol tautomerism for cyclohexanone and acetone. Bull. Chem. Soc. Jpn. 1976, 49, 3089–3092.
  64. Hine, J.; Arata, K. Keto-enol-tautomerism. I. The calorimetrical determination of the equilibrium constants for keto-enol tautomerism for cyclopentanone. Bull. Chem. Soc. Jpn. 1976, 49, 3085–3088.
  65. Allen, G.; Dwek, R.A. An n.m.r. study of keto-enol tautomerism in β-diketones. J. Chem. Soc. B 1966, 161–163.
  66. Dudek, G.O.; Dudek, E.P. Spectroscopic Studies of Keto-Enol Equilibria. IX. N15-Substituted Anilides. J. Am. Chem. Soc. 1966, 88, 2407–2412.
  67. Zhu, L.; Bozzelli, J.W. Kinetics and thermochemistry for the gas-phase keto-enol tautomerism of phenol ↔ 2,4-cyclohexadienone. J. Phys. Chem. 2003, 107, 3696–3703.
  68. Katritzky, A.R.; Szafran, M. AM1 study of the tautomerism of 2- and 4-pyridones and their thio-analogs. J. Mol. Struct. THEOCHEM 1989, 184, 179–192.
  69. Schlegel, H.B.; Gund, P.; Fluder, E.M. Tautomerization of Formamide, 2-Pyridone, and 4-Pyridone: An ab Initio Study. J. Am. Chem. Soc. 1982, 104, 5347–5351.
  70. Moreno, M.; Miller, W.H. On the tautomerization reaction 2-pyridone-2-hydroxypyridine: An ab initio study. Chem. Phys. Lett. 1990, 171, 475–479.
  71. Claus, A. CLXIII. Zur Kenntniss des Carbostyrils und seiner Derivate, ein Beitrag zur Lösung der Tautomeriefrage. J. Prakt. Chem. 1896, 53, 325–334.
  72. Hartley, W.N.; Dobbie, F.R.S.; Dobbie, J.J. LXII—A study of the absorption spectra of isatin, carbostyril, and their alkyl derivatives in relation to tautomerism. J. Chem. Soc. Trans. 1899, 75, 640–661.
  73. Fabian, W.M.F.; Niederreiter, K.S.; Uray, G.; Stadlbauer, W. Substituent effects on absorption and fluorescence spectra of carbostyrils. J. Mol. Struct. 1999, 477, 209–220.
  74. Leo, A.J. Calculating log Poct from structures. Chem. Rev. 1993, 93, 1281–1306.
  75. Wang, R.; Fu, Y.; Lai, L. A new atom-additive method for calculating partition coefficients. J. Chem. Inf. Comput. Sci. 1997, 37, 615–621.
  76. Hou, T.J.; Xu, X.J. ADME evaluation in drug discovery. 2. Prediction of partition coefficient by atom-additive approach based on atom-weighted solvent accessible surface areas. J. Chem. Inf. Comput. Sci. 2003, 43, 1058–1067.
  77. Lipinski, C.A.; Lombardo, F.; Dominy, B.W.; Feeney, P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 1997, 23, 3–25.
  78. Sangster, J. Octanol-water partition coefficients of simple organic compounds. J. Phys. Chem. Ref. Data 1989, 18, 1111–1229.
  79. Reichardt, C. Solvents and Solvent Effects in Organic Chemistry, 3rd ed.; Wiley-VCH: New York, NY, USA, 2003; p. 123.
  80. Banerjee, S.; Yalkowsky, S.H.; Valvani, S.C. Water solubility and octanol/water partition coefficients of organics. Limitations of the solubility-partition coefficient correlation. Environ. Sci. Technol. 1980, 14, 1227–1229.
  81. Hou, T.; Xia, K.; Zhang, W.; Xu, X. ADME evaluation in drug discovery. 4. Prediction of aqueous solubility based on atom contribution approach. J. Chem. Inf. Comput. Sci. 2004, 44, 266–275.
  82. Wang, J.; Krudy, G.; Hou, T.; Holland, G.; Xu, X. Development of reliable aqueous solubility models and their application in drug-like analysis. J. Chem. Inf. Model. 2007, 47, 1395–1404.
  83. The ADME databases. Available online: http://modem.ucsd.edu/adme/databases/databases_logS.htm (accessed on 1 May 2015).
  84. Physical Constants of Organic Compounds. In CRC Handbook of Chemistry and Physics; Lide, D.R. (Ed.) CRC Press: Boca Raton, FL, USA, 2005; pp. 3-1–3-740.
  85. Atomic and Molecular Polarizabilities. In CRC Handbook of Chemistry and Physics; Lide, D.R. (Ed.) CRC Press: Boca Raton, FL, USA, 2005; pp. 10-1–10-182.
  86. Michałowicz, J.; Duda, W. Phenols—Sources and toxicity. Pol. J. Environ. Stud. 2007, 16, 347–362.
  87. Dissociation Constants of Organic Acids and Bases. In CRC Handbook of Chemistry and Physics; Lide, D.R. (Ed.) CRC Press: Boca Raton, FL, USA, 2005; pp. 10-1–10-182.
  88. Ermer, O. Calculation of Molecular Properties Using Force Fields. Applications in Organic Chemistry. In Bonding Forces; Series Structure and Bonding; Springer: Berlin/Heidelberg, Germany, 2005; Volume 27, pp. 161–211.
  89. Ertl, P.; Rohde, B.; Selzer, P. Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J. Med. Chem. 2000, 43, 3714–3717.
  • Sample Availability: Samples of the compounds are not available.

MDPI and ACS Style

Naef, R. A Generally Applicable Computer Algorithm Based on the Group Additivity Method for the Calculation of Seven Molecular Descriptors: Heat of Combustion, LogPO/W, LogS, Refractivity, Polarizability, Toxicity and LogBB of Organic Compounds; Scope and Limits of Applicability. Molecules 2015, 20, 18279-18351. https://doi.org/10.3390/molecules201018279
