Calculation of the Three Partition Coefficients logP ow , logK oa and logK aw of Organic Molecules at Standard Conditions at Once by Means of a Generally Applicable Group-Additivity Method

Abstract: Assessment of the environmental impact of organic chemicals has become an important subject in chemical science. Efficient quantitative descriptors of their impact are their partition coefficients logP ow, logK oa and logK aw. We present a group-additivity method, which has already proven its versatility in the reliable prediction of many other molecular descriptors, for the direct calculation of the first two partition coefficients and the indirect calculation of the third, with high dependability. Based on the experimental logP ow data of 3332 molecules and the experimental logK oa data of 1900 molecules at 298.15 K, the respective partition coefficients have been calculated with a cross-validated standard deviation S of only 0.42 and 0.48 log units and a goodness of fit Q² of 0.9599 and 0.9717, respectively, in a range of ca. 17 log units for both descriptors. The third partition coefficient logK aw has been derived from the calculated values of the former two descriptors and compared with the experimentally determined logK aw values of 1937 molecules, yielding a standard deviation σ of 0.67 log units and a correlation coefficient R² of 0.9467. This approach enabled the quick calculation of 29,462 logP ow, 27,069 logK oa and 26,220 logK aw values for the more than 37,100 molecules of ChemBrain's database available to the public.


Introduction
Environmental considerations of organic molecules as potential contaminants have become an important subject in recent years. Several descriptors have been applied to quantify their impact on the natural environment. Among them are the octanol/water partition coefficient logP ow (more recently named logK ow), a standard model for the description of the lipophilicity of drugs in medicinal and agricultural chemistry, whereby octanol is the substitute for the natural organic matter, as well as the octanol/air partition coefficient K oa and the air/water partition coefficient logK aw, both of which indicate the role of the chemicals for air-breathing organisms [1–3]. In view of the time and costs required for their experimental determination, fast mathematical methods for the prediction of their values have been developed. An excellent comprehensive overview of the various methods for the prediction of logK ow (among many other descriptors) is given by Nieto-Draghi et al. [4]. Cappelli et al. [5] analysed a series of free programs based on atom/fragment contributions, hydrophobicity contributions of atoms, the number of carbon atoms and heteroatoms as well as Monte Carlo methods to calculate logP ow and found correlation coefficients R² of between 0.7 and 0.8 and root mean square errors (RMSE) from 0.8 to 1.5. A number of authors [6–13] have successfully carried out logP ow calculations for a wide variety of compounds based on various group-additivity methods. Plante and Werner [14] presented a logP ow prediction method that combines the calculated data of the four open-source group-additivity calculation methods AlogP, XlogP2, SlogP and XlogP3 into a single model, providing a best RMSE of 0.63. Ulrich et al. [15] used deep neural networks (DNNs) for the logP ow calculations, based on ca. 14,000 different SMILES representations of molecules including potential tautomers, whereby, however, a substantial number of compounds might have been presented to the DNNs as duplicates and triplicates. Their best prediction performance yielded an RMSE of 0.47. Recently, an entirely different path was followed by Sun et al. [16]: since logP ow is proportional to the Gibbs free energy of the transfer from one solvent to another, it can be calculated using the free solvation energy in these solvents. Sun et al. used the molecular mechanics-Poisson Boltzmann surface area (MM-PBSA) method for the determination of the free energies of solvation. Their best RMSE for the test set of 707 compounds was 0.91.
Many publications [17–28] dealing with the prediction of the coefficient K oa, based on various QSPR methods, are limited to specific chemical families, thus lacking general applicability. Li et al. [29] used a group-additivity method based on five fragment constants and one structural correction factor for the evaluation of logK oa, limited to halogenated aromatic pollutants. Recently, Ebert et al. [30] suggested a general-purpose fragment model for the calculation of the air/water partition coefficient logK aw resembling the atom group-additivity method presented in one of our earlier papers [13] for the calculation of (among several further descriptors) the octanol/water partition coefficient logP ow.
The goal of the present paper was to extend a simple tool, which has already served well for the prediction of the octanol/water partition coefficient logP ow described in [13], so that it calculates all three mentioned partition coefficients at once by means of a uniform computer algorithm based on the atom group-additivity method detailed in [13]. Under common standard conditions, any third partition coefficient can be calculated directly from the other two, provided that we neglect the effect of the contamination of water in octanol (and vice versa), which influences the determination of the logP ow values and will be addressed later on. It therefore made sense to select the two coefficients whose group parameters could be founded on the most reliable as well as the largest number of experimental data. It turned out that the experimental data for the partition coefficients logP ow and logK oa provided excellent basis sets for the evaluation of their respective tables of atom and special group parameters. Accordingly, from the subsequently calculated values of a molecule's logP ow and logK oa, its air/water partition coefficient logK aw should be easy to evaluate following the equation logK aw ≈ logP ow − logK oa.

Method
The calculation method is based on a regularly updated object-oriented database of more than 37,100 compounds stored in their geometry-optimised 3D structure, encompassing pharmaceuticals, herbicides, pesticides, fungicides, textile and other dyes, ionic liquids, liquid crystals, metal-organics, lab intermediates and many more. Among further experimental and calculated molecular descriptors, it collects a large set of experimental logP ow, logK oa and logK aw data, outlined in the respective sections below. It should be stressed that for the calculation of the partition coefficients, the 3D geometry-optimised form of the compounds is not required, except for the algorithm-based determination of intramolecular hydrogen bridges, the impact of which will be discussed further down. In order to avoid structural ambiguities in the presentation of the chemical structures to the computer algorithm defining the molecules' atom groups, a special algorithm ensured at the time of the input of a new compound that any six-membered aromatic ring system is defined by six aromatic bonds instead of alternating single and double bonds.

Definition of the Atom and Special Groups
The details of the atom group-additivity model applied in the present study have been outlined in [13]. Accordingly, the definition of the atom types and their immediate atomic neighbourhood and meaning are retained as described in Table 1 of [13] and are also valid for both the logP ow and logK oa descriptors. However, since these atom groups are not able to cover certain additional structural effects such as intramolecular hydrogen-bridge bonds and the influence of saturated cyclic compared to saturated noncyclic systems, a number of additional special groups had to be introduced. In a paper applying a different group-additivity method for the calculation of logP ow, Klopman et al. [8] discovered that the inclusion of a correction value per carbon atom in pure saturated and unsaturated hydrocarbons improved compliance with the experiment. This has indeed been confirmed in the present study.
In order to take account of these and further potential structure-related peculiarities, the list of atom groups has been extended by "special groups" for which the column-title terms "atom type" and "neighbours" in the subsequent tables should not be taken literally, but which the computer algorithm treats in the same way as ordinary atom groups. In Table 1, the respective special groups, their nomenclature and meaning are detailed. In order to enable a future comparison of the contributions of the special group parameter sets within this study, the same special groups have been applied for the calculation of both descriptors logP ow and logK oa. At present, the list of elements is limited to H, B, C, N, O, P, S, Si and halogen, but an extension is always possible, provided that corresponding molecules with experimental descriptor data are available.

Calculation of the Atom and Special Group Contributions
Since the algorithm for the evaluation of the parameter values of the atom groups has been outlined in detail in [13], its four steps may just be summarised as follows. The first step encompasses the selection of all the compounds from a database of, at present, more than 37,100 compounds for which the experimental descriptor data in question are known, and their storage in a temporary compound list. In the second step, the molecules in the temporary list are broken down into their constituting atom groups, whereby their central atoms, called "backbone atoms", are characterised in that they are bound to at least two covalently bound neighbour atoms. The atom groups' atom types and neighbour terms are generated according to the rules described in [13] and their occurrence is registered. Any molecule carrying an atom group that is not found in the pre-defined group parameters table is discarded from the temporary compound list. The third step generates an M × (N + 1) matrix, wherein M is the number of molecules, N + 1 is the number of pre-defined atom groups plus the container for the molecule's descriptor value, and the matrix element (i,j) contains the number of registered occurrences of the jth atom group in the ith molecule. Atom groups that are not present in any molecule of the temporary molecule list are removed from the M × (N + 1) matrix together with their related jth column. In the final step, this adjusted matrix is normalised into an Ax = B matrix, followed by its balancing by means of fast Gauss-Seidel calculus [31] to receive the atom and special group parameters x. These parameters are then added to their related atom and special group in the corresponding parameter table assigned to the specific descriptor.
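The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the group labels (g_methyl and so on), the toy molecules and their descriptor values are invented, and forming the normal equations (AᵀA)x = Aᵀb is merely one plausible reading of "normalised into an Ax = B matrix".

```python
# Toy sketch of the four-step parameter fit; all group labels and values are invented.

KNOWN_GROUPS = ["g_methyl", "g_methylene", "g_hydroxyl", "g_amino"]  # hypothetical table


def select_molecules(molecules, known_groups):
    """Step 2: discard molecules carrying a group absent from the predefined table."""
    known = set(known_groups)
    return [(counts, value) for counts, value in molecules if set(counts) <= known]


def build_matrix(molecules, known_groups):
    """Step 3: occurrence counts plus a constant column; unused group columns are dropped."""
    used = [g for g in known_groups if any(c.get(g, 0) for c, _ in molecules)]
    rows = [[float(c.get(g, 0)) for g in used] + [1.0] for c, _ in molecules]
    targets = [value for _, value in molecules]
    return used, rows, targets


def normal_equations(rows, targets):
    """Step 4a (assumed): form the square system (A^T A) x = A^T b."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return ata, atb


def gauss_seidel(a, b, max_sweeps=100_000, tol=1e-12):
    """Step 4b: Gauss-Seidel sweeps; convergent for a symmetric positive definite a."""
    x = [0.0] * len(b)
    for _ in range(max_sweeps):
        largest_change = 0.0
        for i in range(len(b)):
            rest = sum(a[i][j] * x[j] for j in range(len(b)) if j != i)
            new_value = (b[i] - rest) / a[i][i]
            largest_change = max(largest_change, abs(new_value - x[i]))
            x[i] = new_value
        if largest_change < tol:
            break
    return x


# Step 1 stand-in: invented descriptor values, generated from the contributions
# 0.5, 0.3 and -1.2 per group and a constant of 0.1.
toy_molecules = [
    ({"g_methyl": 2}, 1.1),
    ({"g_methyl": 2, "g_methylene": 1}, 1.4),
    ({"g_methyl": 2, "g_methylene": 2}, 1.7),
    ({"g_methyl": 1, "g_methylene": 1, "g_hydroxyl": 1}, -0.3),
    ({"g_methyl": 1, "g_methylene": 2, "g_hydroxyl": 1}, 0.0),
    ({"g_methyl": 3, "g_methylene": 1, "g_hydroxyl": 1}, 0.7),
    ({"g_methyl": 1, "g_unknown": 1}, 9.9),  # carries an unknown group: dropped in step 2
]

kept = select_molecules(toy_molecules, KNOWN_GROUPS)
used_groups, rows, targets = build_matrix(kept, KNOWN_GROUPS)
solution = gauss_seidel(*normal_equations(rows, targets))
parameters = dict(zip(used_groups + ["constant"], solution))
print(parameters)
```

Because the toy values were generated without noise, the fit should recover the generating contributions; with real experimental data the same procedure would return least-squares estimates instead.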
The group parameter calculation is then immediately followed by the computation of each molecule's descriptor value in question, on the basis of these group parameters according to Equation (1) outlined in the next section, and compared with its experimental value to receive the related statistics data, which are finally added at the bottom of the parameter table. Following the above-mentioned procedure resulted in the two parameter sets in Tables 2 and 3, designed for the calculation of the molecules' logP ow and logK oa values, respectively.
Liquids 2024, 4, 234

Calculation of Descriptors logP ow and logK oa
Based on the aforementioned respective atom-group-parameter tables, the descriptors logP ow and logK oa of a molecule can now easily be calculated by simply summing up the contribution of each atom and special group occurring in the molecule, following Equation (1):
$$\log P_{ow}\ \ \text{or}\ \ \log K_{oa} = \sum_i a_i A_i + \sum_j b_j B_j + c \qquad (1)$$
wherein the indices i and j run over the atom groups A_i and special groups B_j occurring in the molecule (each entering with its number of occurrences), a_i is the contribution of atom group A_i, b_j is the contribution of special group B_j and c is the constant listed at the top of the respective parameter tables.
In Table 4, a typical example is presented with endosulfan sulphate (Figure 1), demonstrating the ease of the calculation of logK oa, for which the experimental value was 9.68 [30]. Note that the term "endocyclic bonds" only concerns C-C single bonds.
Evidently, the group-additivity method is limited to the calculation of a molecule's logP ow or logK oa for which a parameter value in the respective Table 2 or Table 3 is known for each atom group that is found in the molecule. Beyond this, since the reliability of these parameter values increases with the number of independent molecules upon which their calculation is based, the lowest reliability limit for these parameters was set to three molecules, which, as a consequence, excluded any atom group based on fewer than three molecules from further calculations. Accordingly, only atom groups for which the number of molecules is three or more, shown in the rightmost columns of Tables 2 and 3, have been accepted as "valid" for descriptor calculations. This explains why the number of molecules for which logP ow and logK oa have been calculated (lines B, C and D in Tables 2 and 3) is lower than the number upon which the evaluation of the complete set of parameters is based (line A).
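The summation of Equation (1) together with the "valid group" rule can be illustrated as follows. This is a hedged sketch: the function name, the group labels, the contribution values and the molecule counts are hypothetical and are not taken from Tables 2 or 3.

```python
def predict_descriptor(counts, parameters, support, constant, min_molecules=3):
    """Sum the group contributions; return None if any group is unknown or
    rests on fewer than `min_molecules` molecules (i.e., is not "valid")."""
    total = constant
    for group, occurrences in counts.items():
        if group not in parameters or support.get(group, 0) < min_molecules:
            return None
        total += occurrences * parameters[group]
    return total


# Hypothetical contributions and the number of molecules each one is based on.
toy_parameters = {"g_methyl": 0.5, "g_hydroxyl": -1.0, "g_rare": 2.0}
toy_support = {"g_methyl": 40, "g_hydroxyl": 12, "g_rare": 2}

calculable = predict_descriptor(
    {"g_methyl": 2, "g_hydroxyl": 1}, toy_parameters, toy_support, constant=0.25
)
not_calculable = predict_descriptor(
    {"g_methyl": 1, "g_rare": 1}, toy_parameters, toy_support, constant=0.25
)
print(calculable, not_calculable)
```

Returning None rather than a partial sum mirrors the text: a molecule with even one missing or under-represented atom group is simply not calculated.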

Cross-Validation Calculations
The plausibility of the group-parameter calculations was immediately checked by applying a 10-fold cross-validation algorithm, which comprises 10 recalculations of the complete set of group parameters, whereby, before each recalculation, a different set comprising every 10th compound of the total compound list is temporarily removed from the calculation and separated into a test list, thus ensuring that each molecule has played the role of a test sample once. The combined test data were then statistically worked up and their results added to Tables 2 and 3 at the bottom in lines E, F, G and H. It may be noticed that the total number of test compounds shown in the rightmost column of the statistics lines is lower than that in the training set in lines B, C and D. This is a consequence of the requirement that only "valid" atom groups are to be used for descriptor calculations: due to the 10% lower number of training samples in each recalculation, the number of "valid" atom groups (as defined in the prior section) tends to decrease to an unpredictable degree. Atom groups that are represented by fewer than three molecules, as shown in the rightmost column, and are thus not "valid" for descriptor calculations, are therefore remnants of the parameter calculation based on the complete compound set (line A in Tables 2 and 3). Nevertheless, they have deliberately been left in Tables 2 and 3 for use in future calculations with additional molecules potentially carrying under-represented atom groups in this ongoing project.
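The fold assignment described above can be sketched as follows. The function names are hypothetical, and the Q² formula used here (one minus the ratio of the squared prediction errors to the total sum of squares) is the common textbook definition, assumed rather than taken from the paper.

```python
def ten_fold_splits(n_compounds, folds=10):
    """Yield (train, test) index lists; fold k tests compounds k, k+10, k+20, ..."""
    for offset in range(folds):
        test = list(range(offset, n_compounds, folds))
        held_out = set(test)
        train = [i for i in range(n_compounds) if i not in held_out]
        yield train, test


def q_squared(experimental, predicted):
    """Assumed definition: Q^2 = 1 - PRESS / total sum of squares."""
    mean = sum(experimental) / len(experimental)
    press = sum((e - p) ** 2 for e, p in zip(experimental, predicted))
    total = sum((e - mean) ** 2 for e in experimental)
    return 1.0 - press / total


# Check on a list of 25 placeholder compounds that every one is tested exactly once.
splits = list(ten_fold_splits(25))
times_tested = [0] * 25
for _, test in splits:
    for index in test:
        times_tested[index] += 1
print(splits[0][1], times_tested)
```

In a full run, the group parameters would be refitted on each training list and the predictions for the ten test lists pooled before Q² and S are computed.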

Sources of logP ow Values
The majority of the experimental logP ow data originates from the comprehensive collection of Klopman et al. [8], supplemented by works of Sangster [32] and Lipinski et al. [33], already cited in [13]. Additional data have been provided for unsubstituted and substituted, saturated and unsaturated hydrocarbons, alcohols and esters in the works of Tewari et al. [34]; for heterocycles, hetarenes and carboxylic acids by Ghose et al. [6,7]; complemented for amines, amides and nitro derivatives by Leo [10]. Further data for the aforementioned compound classes have been found in papers by Abraham et al. [35], for certain drugs by Hou and Xu [12] and Wang et al. [11], for organophosphorus derivatives by Czerwinski et al. [36], for the specific energetic compound 2,4-dinitroanisole by Boddu et al. [37], for a number of fluorobenzenes, -anilines and -phenols by Li et al. [38] and finally for a number of pesticides and oil constituents in a paper by Saranjampour et al. [39].

Sources of logK oa Values
Recently, Ebert et al. [40] published a comprehensive collection of more than 2000 experimental logK oa values upon which the present study is essentially based. This set of data has been complemented with data for 75 chloronaphthalene derivatives by Puzyn et al. [41], for 14 PAHs by Odabasi et al. [42], for some methylsiloxanes and dimethylsilanol by Xu and Kropscott [43] and for ethyl nitrate by Easterbrook et al. [44].

Sources of logK aw Values
Ebert's paper [30], cited in the introductory section, presented in its supplementary information a large collection of experimental logK aw data, which served as reference values for the calculated data. Sander [45] provided an extensive library of Henry's law constants for more than 2600 compounds, which, after translation into logK aw values at 298.15 K, complemented Ebert's data set.
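The translation of a Henry's law solubility constant into logK aw is not spelled out in the text. A minimal sketch, assuming that H s cp is given in mol m⁻³ Pa⁻¹ and that the standard relation K aw = 1/(H s cp · R · T) applies, could look as follows; the function name and the sample inputs are hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def log_kaw_from_hcp(h_cp, temperature=298.15):
    """Convert a Henry's law solubility constant (mol m^-3 Pa^-1) into logK aw.

    Assumes K aw = 1 / (H s cp * R * T), i.e. the dimensionless air/water
    concentration ratio at the given temperature in kelvin.
    """
    if h_cp <= 0:
        raise ValueError("Henry's law solubility constant must be positive")
    return -math.log10(h_cp * GAS_CONSTANT * temperature)


# A compound with H s cp = 1/(R T) partitions equally between air and water.
balanced = log_kaw_from_hcp(1.0 / (GAS_CONSTANT * 298.15))
# A tenfold more soluble compound lies one log unit lower.
more_soluble = log_kaw_from_hcp(10.0 / (GAS_CONSTANT * 298.15))
print(balanced, more_soluble)
```

Constants tabulated at other temperatures or in other units would need to be converted before this function is applied.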

Partition Coefficient logP ow
As shown at the bottom of Table 2, the number of molecules upon which the present group parameter set is based is 3332, substantially larger than the 2780 samples in our earlier paper [13]. Beyond this, the significantly better statistical results in Table 2 (lines B to H) with, e.g., a cross-validated (cv) standard deviation S of 0.42 (line H) vs. the earlier value of 0.51 are the result of the removal of molecules from the parameter computation for which the experimental value deviates by more than three times the value of S. The 122 molecules thus removed (3.5% of the total set) have been collected in an outlier list, available in the Supplementary Materials. The larger number of compounds for the group parameter computation not only significantly improved the statistical results but also enlarged the list of "valid" atom groups from 195 to 214, enabling the calculation of the logP ow value of at present 29,462 molecules (79.4% of the total dataset). The correlation coefficient R² of 0.9648, the (cross-validated) Q² of 0.9599 and the cv standard error S of 0.42, based on 3246 and 3164 molecules, respectively, are significantly better than in our earlier paper [13] and compare very well not only with Klopman's [8] results, which are based on a group-additivity method comparable to ours and have R² and Q² values of 0.93 and 0.926, respectively, but also with the statistical results of more elaborate calculation methods published recently [4,5,14–16]. The correlation diagram and the histogram are shown in Figures 2 and 3, respectively.
It is worth mentioning that the observation discussed in our earlier paper (see Table 9 in [13]) concerning the two forms of amino acids (nonionic or zwitterionic) is not only confirmed by the new and extended group parameter set of Table 2 but also that the logP ow differences in nearly all cases even more clearly distinguish the two forms. On the other hand, the ambiguous results concerning the keto/enol forms of the compounds listed in Table 10 in [13] could not be resolved by the new parameter set, which is not surprising in view of the sometimes strong solvent dependence of the equilibrium, as exemplified with acetylacetone [46]. In view of the discussion of certain particularities concerning the subsequent calculation of the third partition coefficient logK aw in Section 4.3, it should be stressed at this point that the calculated logP ow values for the hydrocarbons do not show any abnormal or systematic deviations from experimental values.
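The outlier criterion used for the logP ow set (an experimental value deviating from the calculated one by more than three times S) can be sketched as a simple filter. The compound names and values below are placeholders, and a single pass is assumed; the text does not state whether the removal was repeated after refitting.

```python
def split_outliers(records, s, factor=3.0):
    """Separate (name, experimental, calculated) records into kept and outlier names."""
    kept, outliers = [], []
    for name, experimental, calculated in records:
        if abs(experimental - calculated) > factor * s:
            outliers.append(name)
        else:
            kept.append(name)
    return kept, outliers


# Placeholder records; with S = 0.42 the exclusion threshold is 1.26 log units.
toy_records = [
    ("compound_a", 2.0, 2.5),   # deviation 0.5: kept
    ("compound_b", 1.0, 2.5),   # deviation 1.5: outlier
    ("compound_c", -0.4, 0.6),  # deviation 1.0: kept
]
kept_names, outlier_names = split_outliers(toy_records, s=0.42)
print(kept_names, outlier_names)
```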

Partition Coefficient logK oa
The calculation of the group parameter set of Table 3 used for the prediction of the logK oa values is essentially based on the curated data set provided in Ebert's paper [40], whereby compounds with just one "backbone atom" such as halomethanes or hydrogen cyanide had to be omitted as they are obviously not calculable using the present method. After the removal of another 129 compounds as outliers (6.36% of the total), following the same exclusion criterion as in the previous section, 1900 samples with their experimental data (line A in Table 3) remained for the computation of the group parameter values. Again, the outliers have been collected in a separate list available in the Supplementary Materials for readers who might want to re-evaluate their logK oa values.

The subsequent calculation of the logK oa values of 1829 training and 1765 test molecules based on 167 "valid" atom and special groups (line A) revealed excellent statistical results with a correlation coefficient R² of 0.9765, a standard deviation s of 0.44 (lines B and D) and a cross-validated Q² of 0.9717 with a corresponding S of 0.48 (lines F and H), visualised in the correlation diagram in Figure 4 and the histogram in Figure 5. These statistical data even outperform those given in Ebert's paper and thus also the competing methods mentioned therein such as COSMOtherm [47] and EPI-Suite KOAWIN [48], confirming not only the versatility but also the reliability of the present group-additivity approach, which allowed the calculation of the logK oa value for 27,044 molecules (72.9% of the entire database). Again, it should be kept in mind that, just as in Section 4.1, no particularly large or systematic deviations between the experimental and calculated logK oa values for the hydrocarbons were observed.

Partition Coefficient logK aw
Once the partition coefficients logP ow and logK oa were calculated by means of the group-additivity method based on Tables 2 and 3, respectively, it was easy to determine the logK aw values, applying Equation (2) to each molecule in the database for which both descriptors had been calculated, adding up to 26,220 molecules.
$$\log K_{aw}(\mathrm{calc}) = \log P_{ow}(\mathrm{calc}) - \log K_{oa}(\mathrm{calc}) \qquad (2)$$
In order to assess the quality of the logK aw values, it is important to recognise the flaws of this approach: while the logP ow values were experimentally measured in a mixture of water-saturated octanol and octanol-saturated water, the logK oa measurements occurred in dry octanol, an aspect that has been discussed in detail by Ebert et al. [40]. Hence, Equation (2) serves only as an approximation. In addition, since both descriptors on the right side of the equation appear with their own standard error, the error-propagation rule stipulates a standard error of logK aw that is clearly larger than that of either of the two constituting descriptors. Entering the standard errors S for the test molecules of 0.42 (for logP ow) and 0.48 (for logK oa) into an error-propagation calculation, the expected standard error S for logK aw is √(0.42² + 0.48²) ≈ 0.638.
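Equation (2) and the error-propagation estimate can be reproduced in a few lines. The sketch assumes that the errors of the two calculated descriptors are independent, so that they add in quadrature; the function names and the sample descriptor values are hypothetical.

```python
import math


def log_kaw(log_pow, log_koa):
    """Equation (2): logK aw (calc) = logP ow (calc) - logK oa (calc)."""
    return log_pow - log_koa


def propagated_error(s_pow, s_koa):
    """Standard error of a difference of two independent quantities."""
    return math.sqrt(s_pow ** 2 + s_koa ** 2)


expected_s = propagated_error(0.42, 0.48)  # cross-validated S values from the text
example = log_kaw(3.0, 5.5)                # placeholder descriptor values
print(round(expected_s, 3), example)
```

If the errors of the two descriptors were correlated, a covariance term would have to be included and the estimate would change accordingly.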

In order to test the reliability of the thus-calculated logK aw values, a representative number of experimentally determined logK aw data, extracted from the comprehensive databases of Ebert et al. [30] and Sander [45], were added to the database. In the latter case, the Henry's law solubility constants H s cp were translated into the corresponding logK aw values at 298.15 K. The comparison of the calculated with the experimental logK aw values is visualised in the correlation diagram of Figure 6 and the histogram in Figure 7.
The complete set of experimental data was separated from the outliers, applying the same exclusion conditions as for the logP ow and logK oa values, and the outliers were collected in a corresponding list, available in the Supplementary Materials. Comparison of the remaining dataset with the calculated values yielded a standard error of 0.67, slightly higher than that predicted by the error-propagation calculation. A detailed analysis of the experimental data revealed two potential explanations for the inordinate scatter: (1) Within a series of substitution isomers, e.g., the tetra- or hexachlorobiphenyls, the tri- or pentachlorodiphenyl ethers or the dichloroanisoles, the experimental logK aw values varied in a range of up to and over 1 unit, which was hard to assign to the specific positioning of the substituents. At any rate, the group-additivity-based calculation of the logP ow and logK oa values was not able to distinguish between these substitution isomers. (2) Sander's comprehensive database of Henry's law constants [45], listing the experimental H s cp values for a compound originating from various authors, showed for many compounds large differences between their H s cp values, in some cases exceeding one unit after translation into logK aw, e.g., for undecane, acetylacetone or anthraquinone.
A thorough analysis of the correlation diagram in Figure 6 and the histogram in Figure 7 revealed an interesting peculiarity, visible as an indentation at the upper end of the correlation diagram and as a weak hump on the right side of the histogram: except for some siloxanes with experimental logK aw values above 1.6 and normal scatter about calculated values, the predicted logK aw values for the remaining compounds with experimental logK aw values above −1.0 were nearly systematically too low by ca. 0.5–1 units. It turned out that they were all pure hydrocarbons, in particular alkanes, alkenes and alkynes. The correlation diagram of the logK aw data in Figure 8, focussing on these hydrocarbons, confirms this observation.
Since, as mentioned in Sections 4.1 and 4.2, no particularly large or systematic deviations between the experimental and calculated logP ow and logK oa data for the hydrocarbons could be detected, a potential explanation for this peculiarity might be based on the experimental conditions for the determination of the logP ow values as mentioned by Ebert et al. [40]: since water-saturated octanol is a more polar solvent than pure octanol, while octanol-saturated water is less polar than pure water, the experimental logP ow values, measured in an octanol/water mixture, tend to be shifted to smaller absolute values than theory would predict. While this is true for all measured solutes, it is possibly most effective for the least polar solutes such as the mentioned hydrocarbons, thus leading to experimental logP ow values that are particularly low for the hydrocarbons. As a consequence, their calculation based on the group-additivity method predicts equally low logP ow values, which again lead to low logK aw data when Equation (2) is applied and when compared with experimental logK aw values that are determined under pure air/water conditions.
While the atom group parameters are descriptor-specific and their comparison between descriptors does not make sense, special groups serve to differentiate molecules that carry these groups from those that do not. Their meaning therefore carries across descriptors; their values, however, must be viewed in the context of the value range of the respective descriptor. In the present case, the value ranges of logP ow and logK oa are similar (ca. 17 log units) and in the same area, and thus a direct comparison of the special group contributions in Tables 2 and 3 is permissible and leads to a few interesting observations: While the groups "(COH)n", "Alkane", "Unsaturated HC" and "Endocyclic bonds" in both tables only contribute to a minor degree (but nevertheless improve the statistical results) and consequently show only minor differences between the two tables, a significant difference was found for the groups "H/H Acceptor" and "(COOH)n".
The former special group, taking account of intramolecular hydrogen bridges, indicates a small but clearly higher tendency of a compound carrying an intramolecular H-bridge to move towards the octanol side in an octanol/water mixture, thus raising the logP ow value. In contrast, the same H-bridge-carrying molecule has its inclination significantly shifted to the air side in an octanol/air environment compared to one without H-bridges, expressed in its lower logK oa value. The reason may be found in the lower solvent-solute interaction caused by the H-bridge being bound intramolecularly, leading in both cases to a preference for the less polar of the respective two media. A typical example is the compound couple 2- and 3-nitroaniline, sampled in Table 5, where the former molecule carries a H-bridge between an amino-H and an oxygen of the nitro group.
An inverse effect can be found with molecules carrying two or more carboxylic acid functions: While the additional contribution of a second or third COOH function shows little effect in an octanol/water environment, with a slightly increased shift towards water leading to a lower logP ow value, in an octanol/air environment each additional COOH group drastically tilts the equilibrium towards the octanol side, thus strongly raising the logK oa value. This may be demonstrated by the couple hexanoic/1,6-hexanedioic acid, where both have the same carbon-chain length but where the second molecule carries two carboxylic acid functions, which tilts the octanol/air equilibrium by a factor of more than 10,000 towards the octanol side, as shown in Table 6. Now, it is well known that monocarboxylic acids usually exist as dimeric associates in all three aggregate states. This association effect on the solubility is inherently taken into account in the atom group parameter evaluation of the COOH function. On the other hand, dicarboxylic acids do not only form dimers but also cyclical and linear oligomeric associates, with drastic consequences for their solubility in the various solvents. It is these additional associations that are considered by the special group "(COOH)n".
As a consequence, solutes with a low tendency to interact with solvents, either inherent or induced by intramolecular hydrogen bridges, show a trend to higher logK aw values; the additional intermolecular association of di- and tricarboxylic acids, on the other hand, results in a significantly lower logK aw value, as exemplified in Table 7, where the respective calculated data in Tables 5 and 6 have been applied in Equation (2). The experimental logK aw values have been extracted from Ebert et al. [30].
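The sign logic of these two special groups can be made explicit with a short Python sketch. It assumes that Equation (2) has the form logK aw = logP ow − logK oa, so that a special-group increment shifts logK aw by the difference of its two contributions. The function names and the increments in the usage lines are hypothetical placeholders chosen only to show the direction of the effect; they are not values from Tables 2, 3, 5 or 6. The only number taken from the text is the factor of 10,000, which corresponds to 4 log units.

```python
def shift_in_log_kaw(delta_log_pow: float, delta_log_koa: float) -> float:
    """Change in logK aw caused by a special group.

    Assumes Equation (2) reads logK aw = logP ow - logK oa, so the
    increments of the two descriptors enter with opposite signs.
    """
    return delta_log_pow - delta_log_koa


def equilibrium_factor(delta_log_k: float) -> float:
    """Factor by which a partition ratio changes for a given log shift."""
    return 10.0 ** delta_log_k


# Hypothetical increments for an intramolecular H-bridge:
# slightly higher logP ow, clearly lower logK oa -> higher logK aw.
h_bridge_shift = shift_in_log_kaw(delta_log_pow=0.3, delta_log_koa=-1.0)

# Hypothetical increments for an additional COOH group:
# slightly lower logP ow, much higher logK oa -> much lower logK aw.
cooh_shift = shift_in_log_kaw(delta_log_pow=-0.2, delta_log_koa=4.0)

print(h_bridge_shift > 0, cooh_shift < 0)  # True True
print(equilibrium_factor(4.0))             # 10000.0
```

Because the logK oa increment enters Equation (2) with a negative sign, the large positive "(COOH)n" contribution to logK oa dominates the resulting drop in logK aw, while the small logP ow increment hardly matters.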

Conclusions
The present study, which is part of an ongoing project, put to use a tool for the simple and reliable calculation of the two partition coefficients logP ow and logK oa that has proven its unmatched versatility in the equally reliable prediction of now up to 19 physical, thermodynamic, solubility-, optics-, charge- and environment-related molecular descriptors [13,49-55], based on a common group-additivity method.The large
the histogram for Figure 3, the experimental logP ow values range from −4.6 to +12.53 with a fairly even Gaussian error distribution.

4.4. Interpretation of the Special Groups' Contributions to logP ow and logK oa, and Ultimately for logK aw

Table 1. Special groups and their meaning.

Table 2. Atom and special groups and their contribution in logP ow calculations.

Table 3. Atom and special groups and their contribution in logK oa calculations.

Table 4. Example calculation of the logK oa of endosulfan sulphate.

Table 7. Experimental and calculated logK aw of some examples.