Prediction of Environmental Properties for Chlorophenols with Posetic Quantitative Super-Structure/Property Relationships (QSSPR)

Due to their widespread use in bactericides, insecticides, herbicides, and fungicides, chlorophenols represent an important source of soil contaminants. The environmental fate of these chemicals depends on their physico-chemical properties. In the absence of experimental values for these physico-chemical properties, one can use predicted values computed with quantitative structure-property relationships (QSPR). As an alternative to correlations to molecular structure we have studied the super-structure of a reaction network, thereby developing three new QSSPR models (poset-average, cluster-expansion, and splinoid poset) that can be applied to chemical compounds which can be hierarchically ordered into a reaction network. In the present work we illustrate these poset QSSPR models for the correlation of the octanol/water partition coefficient (log Kow) and the soil sorption coefficient (log KOC) of chlorophenols. Excellent results are obtained for all QSSPR poset models to yield: log Kow, r = 0.991, s = 0.107, with the cluster-expansion QSSPR; and log KOC, r = 0.938, s = 0.259, with the spline QSSPR. Thus, the poset QSSPR models predict environmentally important properties of chlorophenols.


Introduction
The widespread use of synthetic organic compounds in industry, agriculture, health care, and household is an important source of soil and water contamination. Other sources of contamination are accidental spills, hazardous waste disposal sites, storage tanks, or municipal landfills. To minimize the environmental impact of organic pollutants, the remediation of contaminated soil usually starts with the extraction of the pollutants into an aqueous phase, followed, if necessary, by other chemical or biological treatments. Knowledge of various physico-chemical properties of the organic pollutants is necessary for the design of these remediation processes [1,2]. Whenever the values for these physico-chemical properties are not experimentally available, various quantitative structure-property relationships (QSPR) have often been used to predict these properties.
The success of the soil remediation process for a particular organic compound depends on the distribution of that chemical in soil/water or soil/solvent systems. The partition of an organic pollutant between the water (hydrophilic) and organic (hydrophobic) phases is generally correlated with various properties, such as the water solubility S and the octanol/water partition coefficient Kow. The environmental fate of organic compounds is also correlated with the soil adsorption partition coefficient KOC. The modeling of these properties from structural parameters, with various QSPR models, has been investigated in many papers [3-12].
Phenol and its derivatives are common environmental contaminants [13-18], and most of them are known or suspected to be human carcinogens. Besides giving an unpleasant taste and odor to drinking water, phenols are potent toxicants for various biological processes. Due to their widespread use in industry, households, and forest industries, and as disinfectants, chlorophenols represent an important source of soil contaminants [13,19-21]. The environmental fate (soil adsorption, water solubility, partition between soil and water, reaction rates) of these chemicals depends on their physico-chemical properties.
Unlike the classical QSPR & QSAR (quantitative structure-property & -activity relationships), the reaction poset super-structure QSSPR and QSSAR models do not use conventional molecular descriptors to correlate physical, chemical, or biological properties. In the reaction poset approach, the molecular properties are predicted from a response framework generated by the super-structure of the substitution-reaction network. In the present work we apply these poset QSSPR models to predict the octanol/water partition coefficient (log Kow) and the soil sorption coefficient (log KOC) of chlorophenols. These fittings are here (favorably) validated via a leave-one-out procedure.

The Reaction Poset Diagram for Phenol Substitution
The poset super-structural QSSPR and QSSAR models make special use of the mathematical structure of a partially ordered set induced from a substitution-reaction network, when a molecular skeleton is subjected to successive steps of substitution. Starting from an unsubstituted compound, substituents are progressively introduced one after another, with earlier substituents fixed at their different possible positions.
The special super-structure considered here is the substitution-reaction network that starts with phenol and continues with consecutive formal substitution reactions in which a H atom from the phenyl ring is replaced with a Cl atom (Figure 1). The poset reaction diagram starts with phenol at the top and ends with pentachlorophenol at the bottom, while all the remaining different patterns of substitution occur in between. The arrows indicate the hierarchic generation of the different patterns of more substituted compounds from the different patterns of less substituted ones.
As we present in detail in the following sections, the poset reaction diagram from Figure 1 is subjected to various mathematical treatments to generate poset super-structural QSSPR and QSSAR models to predict the octanol/water partition coefficient and the soil sorption coefficient of chlorophenols. The topology of the chlorophenol reaction poset is the basis for all these models, which is a notable departure from the classical QSAR and QSPR models that use various structural descriptors.

Experimental Data
As can be seen from Figure 1, chlorinated phenols constitute a series of 19 substituted compounds, which can be further classified as three monochlorophenols, six dichlorophenols, six trichlorophenols, three tetrachlorophenols, and one pentachlorophenol. The parent phenol is included in the poset as a 20th member. The United States Environmental Protection Agency (EPA) has classified chlorophenols as priority pollutants owing to their environmental toxicity. Due to their wide use in industry and the household (as bactericides, insecticides, herbicides, fungicides, and wood preservatives) [13-21], chlorophenols are easily released into the environment, either from direct use or accidental spillage. As a consequence, they cause severe environmental problems, being frequently detected in surface water, wastewater, soil, and sediments [17,34,35]. Exposure to chlorophenols can result in irritation of the respiratory tract and of the eyes. Higher doses can induce convulsions, shortness of breath, coma, or even death. The toxicity of chlorophenols is determined by the number and position of the Cl atoms, and by the concentration in a particular environmental compartment.
Because chlorophenols are important environmental pollutants that can pose serious risks to human health, we have developed reaction poset super-structural QSSPR and QSSAR models for their octanol/water partition coefficients (log Kow) and soil sorption partition coefficients (log KOC). All experimental data were collected from the literature: log Kow [36,37]; log KOC [38].

Posetic Applications in General
Partially ordered sets (or posets) have been advocated as being of very general utility in chemistry [22,23], with numerous chemical applications [24]. Brüggemann and co-workers [39-44] have proposed their use as an attractive way of handling complex information within the environmental area. Poset models for ranking or prioritizing chemical pollutants have been proposed [45-48]. A book on the chemical and environmental science applications has appeared [49], and beyond this posets are advocated [50] as being of rather general utility in science, accompanied by numerous mathematical developments.
Formally a poset consists of a set P with a relation ≻ which satisfies two conditions: first, for α, β ∈ P, α ≻ β ⇒ β ⊁ α; and second, for α, β, γ ∈ P, α ≻ β and β ≻ γ ⇒ α ≻ γ. In the particular case of chlorophenols (Figure 1) the set P consists of chemical compounds derived from phenol by substituting aromatic H atoms with Cl atoms, and the ordering α ≻ β means that β is obtainable from α after some (non-zero) number of chlorinations. The relation which allows either α ≻ β or α = β is denoted α ≽ β, and the relation where α ≻ β without any intervening members of P is denoted α → β, in which case one says α covers β. The Hasse diagram H(P) of P displays these covering relations, chosen to be oriented downward.
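As an illustration of these definitions, the following minimal Python sketch enumerates the members of P and their covering relations. It assumes that each compound is encoded by the ring positions (2 to 6) that carry a Cl atom, with OH at position 1, and that equivalent substitution patterns are merged through the mirror symmetry 2↔6, 3↔5. The names `canonical` and `build_poset` are ours, chosen for illustration only.

```python
from itertools import combinations

RING_POSITIONS = (2, 3, 4, 5, 6)  # OH is assumed to sit on position 1


def canonical(positions):
    """Return one representative label for a substitution pattern.

    The mirror through the C1-C4 axis maps position p to 8 - p
    (2<->6, 3<->5, 4<->4); the smaller of the two sorted tuples is kept.
    """
    direct = tuple(sorted(positions))
    mirrored = tuple(sorted(8 - p for p in positions))
    return min(direct, mirrored)


def build_poset():
    """Return (nodes, covers): covers[a] is the set of b with a -> b."""
    nodes = {canonical(c)
             for n in range(6)
             for c in combinations(RING_POSITIONS, n)}
    covers = {node: set() for node in nodes}
    for node in nodes:
        for p in RING_POSITIONS:
            if p not in node:
                # one further chlorination gives a covered member
                covers[node].add(canonical(node + (p,)))
    return nodes, covers


nodes, covers = build_poset()
# number of members at each rank (rank = number of Cl atoms)
rank_sizes = [sum(1 for n in nodes if len(n) == k) for k in range(6)]
```

With this encoding `rank_sizes` reproduces the counts of one phenol, three mono-, six di-, six tri-, three tetra-, and one pentachlorophenol, and `covers[(4,)]` contains the patterns for 2,4- and 3,4-dichlorophenol.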

Reaction Poset Super-Structures
As presented in Figure 1, the chemical basis of our reaction-poset super-structural models is represented by the mathematical structure of a partially ordered set induced from a substitution-reaction network when a molecular skeleton is subjected to successive steps of substitution. The mathematical poset focused on here is represented just by the bare super-structural reaction network (or Hasse diagram), without explicit reference to the molecular structures shown at the different nodes of the network in Figure 1.
These reaction-network posets are of a special type. They always have a unique maximum and a unique minimum, and moreover each is self-dual, mapping into itself under the interchange of substituted and unsubstituted sites. Yet further these posets are ranked (according to the number of substituents), those members at the same rank being isomers. In general these posets are not mathematical lattices (defined as posets for which every pair of elements has a unique least upper bound and a unique greatest lower bound). In particular, our phenol substitution poset is not a lattice, e.g., because members 5 and 7 do not have a unique least upper bound (but rather two: 2 and 3). But still they have an interesting structure, reminiscent of a "finite geometry" on a space of skeletal substitution positions, with the geometric structure mediated by the skeletal group, here C6H5OH for our phenol example.
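The self-duality and the failure of the lattice property can be checked directly on a small encoding of the poset. The sketch below is only illustrative: it assumes compounds are labelled by their chlorinated ring positions (OH at position 1) with mirror-equivalent patterns merged, and the pair examined (2,3- and 2,5-dichlorophenol) is our own choice, which need not coincide with the members numbered 5 and 7 in Figure 1.

```python
from itertools import combinations

POSITIONS = (2, 3, 4, 5, 6)  # OH assumed on position 1


def canonical(positions):
    """Merge mirror-equivalent patterns (2<->6, 3<->5)."""
    s = tuple(sorted(positions))
    return min(s, tuple(sorted(8 - p for p in s)))


NODES = {canonical(c) for n in range(6) for c in combinations(POSITIONS, n)}


def parents(node):
    """Members that cover `node`, i.e. patterns with one Cl removed."""
    return {canonical(tuple(q for q in node if q != p)) for p in node}


def dual(node):
    """Interchange substituted and unsubstituted ring sites."""
    return canonical(tuple(p for p in POSITIONS if p not in node))


def is_self_dual():
    """Check that `dual` is a bijection that reverses every covering relation."""
    bijective = len({dual(n) for n in NODES}) == len(NODES)
    reverses = all(dual(node) in parents(dual(par))
                   for node in NODES for par in parents(node))
    return bijective and reverses


# Two dichlorophenols sharing two incomparable immediate predecessors,
# so that no unique least upper bound exists for this pair.
common_parents = parents((2, 3)) & parents((2, 5))
```

Here `common_parents` contains both the 2-chloro and the 3-chloro pattern, which lie at the same rank and are therefore incomparable.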

Posetic QSSPR and QSSAR Modelling
The reaction poset super-structure QSSPR models considered here are based on the substitution-reaction network that starts with phenol and continues with consecutive formal substitution reactions in which a H atom from the phenyl ring is replaced with a Cl atom. After five steps of successive substitution, all reaction branches converge to pentachlorophenol, which concludes the reaction network (Figure 1). Each vertex in the Hasse diagram may be identified with the property value for the corresponding compound.
The topology of the chlorophenol reaction poset is the basis for all models investigated in the present paper, namely poset-average, cluster-expansion, and splinoid poset. Otherwise, information about the molecular structures is forgone, though the poset has embedded in it much information about molecular structure, and especially about interrelations between molecular structures. Following our previous procedure tested for a number of chemical classes (chlorobenzenes [26,32], methylbenzenes [26], methylcyclobutanes [31], and polychlorinated biphenyls [33]), we evaluate the models by comparing their leave-one-out (LOO) cross-validation statistics, namely the correlation coefficient r and standard deviation s. We next briefly describe the three reaction poset super-structural QSSPR models to be utilized here.
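The leave-one-out evaluation can be sketched generically as follows. This is a minimal sketch under stated assumptions: `predict` stands for any of the three models and is a hypothetical callable, r is taken as the Pearson correlation between observed and LOO-predicted values, and s is computed with an n − 1 denominator, which may differ from the exact convention used for the statistics reported in this paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residual_s(observed, predicted):
    """Standard deviation of the residuals (n - 1 denominator: an assumption)."""
    n = len(observed)
    ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss / (n - 1))


def leave_one_out(values, predict):
    """values: dict compound -> observed property.

    predict(training, compound) must return a prediction for `compound`
    using only the observations in `training`.
    """
    observed, predicted = [], []
    for compound, value in values.items():
        training = {k: v for k, v in values.items() if k != compound}
        observed.append(value)
        predicted.append(predict(training, compound))
    return pearson_r(observed, predicted), residual_s(observed, predicted)


def mean_predictor(training, compound):
    """Toy stand-in model (not a poset model): mean of the training data."""
    return sum(training.values()) / len(training)


# Artificial numbers, used only to exercise the routine.
r_toy, s_toy = leave_one_out({"A": 1.0, "B": 2.0, "C": 3.0}, mean_predictor)
```

Any of the poset models can be plugged in through `predict`, provided it uses only the retained observations.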

Average-Poset Model
Starting from the Hasse diagram (Figure 1) our poset-average method [26] computes a predicted value X(β)pred for a property X of a compound β as the average of two averages, namely the average of experimental values X(α)exp for all compounds α from the previous level that connect by incoming arrows to β, and the average of experimental values X(γ)exp for all compounds γ from the next level that receive outgoing arrows from β. To apply this, the experimental property values must be available for all diagram positions adjacent to β. For example, in Figure 2 we present the reaction poset diagram for chlorophenols, in which each vertex (compound) has attached the experimental value for log Kow [36,37]. Since 4-chlorophenol (4-ClP) is covered by phenol and covers 2,4- and 3,4-dichlorophenol, its poset-average log Kow predicted value is computed with the formula:

$$\log K_{ow}(\text{4-ClP})_{pred} = \tfrac{1}{2}\left\{ \log K_{ow}(\text{phenol})_{exp} + \tfrac{1}{2}\left[ \log K_{ow}(\text{2,4-Cl}_2\text{P})_{exp} + \log K_{ow}(\text{3,4-Cl}_2\text{P})_{exp} \right] \right\}$$

As one can see from this example, the properties computed with the poset-average method are parameter-free predictions, and the statistical indices are obtained via LOO statistics.
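The poset-average rule can be written in a few lines. The sketch below assumes the Hasse diagram is supplied as two dictionaries of immediate predecessors and successors; the function name and the numerical values are placeholders of ours and are not the experimental log Kow data of Figure 2. How the two end members (phenol and pentachlorophenol), which have neighbors on one side only, are treated is not covered by this sketch.

```python
def poset_average(node, predecessors, successors, values):
    """Average of (mean over immediate predecessors) and (mean over successors).

    predecessors[node] / successors[node]: compounds joined to `node` by
    incoming / outgoing arrows; values: known property values of neighbors.
    """
    above = [values[a] for a in predecessors[node]]
    below = [values[g] for g in successors[node]]
    if not above or not below:
        raise ValueError("node needs neighbors on both adjacent levels")
    return 0.5 * (sum(above) / len(above) + sum(below) / len(below))


# Neighborhood of 4-chlorophenol in the reaction poset.
predecessors = {"4-ClP": ["phenol"]}
successors = {"4-ClP": ["2,4-Cl2P", "3,4-Cl2P"]}

# Placeholder values only, NOT experimental data.
placeholder = {"phenol": 1.0, "2,4-Cl2P": 3.0, "3,4-Cl2P": 5.0}

prediction = poset_average("4-ClP", predecessors, successors, placeholder)
```

With these placeholder numbers the prediction is 0.5 × (1.0 + 4.0) = 2.5; no parameter is fitted at any stage.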

Splinoid Poset Model
The chloro-substitution network of phenol is represented here as a Hasse diagram H(P) (Figure 1) which mathematically represents a finite poset P. An oriented edge in the Hasse diagram here represents the transition α→β from a chemical compound α with n chlorine atoms to one β with n+1 chlorine atoms, and we attach a real variable x_{α→β} ranging from 0 to 1, that represents the transformation of α into β. When formulating the splinoid QSSPR model for a property X, one considers cubic spline polynomials (in x_{α→β}) on the oriented edges α→β of the Hasse diagram H(P). Further each vertex α of H(P) or P is identified by a value a_α and a slope b_α for the spline polynomials incident at α. The splinoid poset QSSPR model is generated based on known values of the property X for a subset K ⊆ P of the chemical compounds. Briefly, the splinoid fit consists of the following steps: first, the cubic splines match values a_α at the nodes α ∈ K to the known property values; second, the incoming and outgoing slopes through each node match to the corresponding b_α value; and third, a relevant total "curvature" of the overall spline fit is minimized (subject to the constraints of the first two conditions). With the splinoid QSSPR determined for the vertices from K, one can predict the property values for the remaining chemical compounds that do not have an experimental value for the property X, these being the compounds that form the "unknown" set U of vertices α ∉ K.
A mathematical derivation [27] leads to a closed formula predicting the values of X for the set U of chemical compounds. Let A denote the adjacency matrix of the Hasse diagram H(P), and let S denote the oriented adjacency matrix of H(P), where: The in-degree on vertex α ∈ P is denoted by d_{→α}, and the out-degree on vertex α ∈ P is denoted by d_{α→}.
Then, we introduce two diagonal matrices: Further define the matrices U (the |U|×|P| submatrix of the unity matrix I, with rows indexed by the elements of U), and K (the |K|×|P| submatrix of the unity matrix I, with rows indexed by the elements of K), and the derived matrix: The (column) vector of known property values is denoted by $\vec{k}$. Then, the vector $\vec{u}$ that contains the predictions for the unknown property values a_α is computed from:

$$\vec{u} = -\left(UMU^{t}\right)^{-1} UMK^{t}\,\vec{k}$$
For a few different reaction networks we have studied the matrix UMU^t, which appears in practice to be invertible regardless of how sparse the "known" data are in the network, up to the point that very few (≤ 2) known data are available. The coefficients appearing in the spline polynomials do not explicitly appear in our splinoid formula for $\vec{u}$, but they are implicit in the derivation of this formula.
The present formula gives $\vec{u}$ in terms of the poset structure, and thence completes the splinoid QSSPR algorithm, which turns out to give a robust model in accommodating a diversity of missing values for several compounds (which may possibly even be adjacent). This is a significant advantage of the splinoid model, which uses the topology of the Hasse diagram to generate a response network for the investigated property. To achieve comparison with the results from the other poset QSSPR models, we have used the splinoid model in the leave-one-out cross-validation procedure.
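The linear-algebra step of such a prediction can be sketched as follows, under explicit assumptions. We assume the closed formula has the form that results from minimizing a quadratic form in the node values with the known values held fixed, namely u = −(UMU^t)^{-1} UMK^t k, where UMU^t and UMK^t are simply the unknown-unknown and unknown-known blocks of M. The splinoid matrix M of [27] is not reproduced here; the Laplacian of a three-node path is used purely as a stand-in, and the function names are ours.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions.

    Raises StopIteration if A is singular (no non-zero pivot is found).
    """
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(b[i])]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc
                          for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def predict_unknown(M, known):
    """Return {index: prediction} for every vertex that is not in `known`.

    Solves (M_UU) u = -(M_UK) k, i.e. u = -(U M U^t)^{-1} U M K^t k.
    """
    K = sorted(known)
    U = [i for i in range(len(M)) if i not in known]
    M_UU = [[M[i][j] for j in U] for i in U]
    rhs = [-sum(M[i][j] * known[j] for j in K) for i in U]
    return dict(zip(U, solve(M_UU, rhs)))


# Stand-in M: Laplacian of the path 0 - 1 - 2 (NOT the splinoid matrix).
M_standin = [[1, -1, 0],
             [-1, 2, -1],
             [0, -1, 1]]
u = predict_unknown(M_standin, {0: 0, 2: 2})
```

For this stand-in the middle vertex is predicted as the mean of its two neighbors; with the actual splinoid M the same routine would apply unchanged, as long as the unknown-unknown block is invertible.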

Cluster-Expansion Model
Formal cluster-expansion in general re-expresses a scalar function (or property) for the different members of a poset in terms of related functions focusing more strongly on earlier members of the poset. Much of the formal theory is described by Rota [51] for general posets, and its chemical application in the case that the partial ordering is the subgraph partial ordering is described in [28-31]. Generally, for a scalar property X defined on the members of a poset P (with partial ordering ≻) one may expand X for α ∈ P as

$$X(\alpha) = \sum_{\beta \succcurlyeq \alpha} f(\beta,\alpha)\, X_f(\beta)$$

where the sum goes over all β ≽ α, f(β,α) is a cluster function that maps pairs of members of P onto real numbers with f(β,α) = 0 whenever β ⋡ α, and is such that f(α,α) ≠ 0. Further, X_f(β) is an f-transform property depending on X and the cluster function f. Conveniently, this cluster-expansion may be truncated to a limited sequence of non-zero cluster approximants, and so applied whenever the earlier terms offer a good approximation of the property X.
For our reaction-network posets, we choose [31-33] that f(β,α) be the number of ways in which substitution pattern α occurs as a subset of substitution pattern β. For the poset diagram of chlorophenols, we have truncated the cluster-expansion model to X_f contributions from the chlorine atoms situated through the second and third rows of the poset (Figure 1). The number of parameters (i.e., the X_f(β)) from the third row is reduced from 5 to 3 through the approximation of making them depend solely on the relative positions of the two chlorine atoms (as ortho, meta, and para), denoted d, e, and f, respectively; for example, X_f(2,4-Cl2P) = X_f(3,5-Cl2P) ≡ e and X_f(2,5-Cl2P) ≡ f, where P indicates phenol. The parameters associated with the second row of the poset are abbreviated to a, b, and c. This truncated cluster-expansion model proves to be able to model the properties of chlorophenols. In each series of QSSPR models, phenol was considered as a reference structure, namely, the property values are shifted so that X(phenol) = 0. The set of X_f(β) parameters (a, b, c, d, e, f) can be computed by a least-squares procedure based on a subset of molecules, or by "inversion" from small systems, and here we use the former choice. All models were tested in a leave-one-out cross-validation procedure, in order to obtain results comparable with those from the other poset QSSPR models.
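The counting behind the truncated expansion can be illustrated with a short sketch that builds, for a given chlorophenol, the multiplicities of the six parameters. Several points are assumptions on our part: compounds are encoded by their chlorinated ring positions (OH at position 1); a, b, and c are taken to correspond to Cl at the 2-, 3-, and 4-type positions, in that order; and d, e, and f to Cl pairs that are mutually ortho, meta, and para. The function names are hypothetical, and the parameter values in the example are placeholders, not fitted results.

```python
from itertools import combinations


def ring_distance(i, j):
    """Shortest separation of two positions on the six-membered ring
    (1 = ortho, 2 = meta, 3 = para)."""
    d = abs(i - j)
    return min(d, 6 - d)


def cluster_row(cl_positions):
    """Multiplicities of (a, b, c, d, e, f) for one chlorophenol.

    First three entries: number of Cl atoms ortho/meta/para to OH.
    Last three entries: number of Cl pairs that are ortho/meta/para
    to each other.
    """
    singles = [0, 0, 0]
    for p in cl_positions:
        singles[ring_distance(1, p) - 1] += 1
    pairs = [0, 0, 0]
    for p, q in combinations(cl_positions, 2):
        pairs[ring_distance(p, q) - 1] += 1
    return tuple(singles + pairs)


def cluster_predict(params, cl_positions):
    """Truncated cluster-expansion value, with X(phenol) = 0 as reference."""
    return sum(c * x for c, x in zip(cluster_row(cl_positions), params))


row_24 = cluster_row((2, 4))             # 2,4-dichlorophenol
row_penta = cluster_row((2, 3, 4, 5, 6)) # pentachlorophenol

# Placeholder parameters (a, b, c, d, e, f), for illustration only.
value_24 = cluster_predict((1.0, 2.0, 3.0, 0.5, 0.25, 0.125), (2, 4))
```

Stacking such rows for the compounds with known property values gives the design matrix of the least-squares problem for (a, b, c, d, e, f); the row for phenol is all zeros, consistent with the reference choice X(phenol) = 0.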

Results and Discussions
The first group of poset QSSPR models is developed for the octanol/water partition coefficient Kow of chlorophenols. All 20 values, including that for phenol, were collected from the literature [36,37]. The predictions obtained with the reaction poset super-structure QSSPR models are of very good quality: poset-average, r = 0.987, s = 0.115; cluster-expansion, r = 0.991, s = 0.107; splinoid poset, r = 0.990, s = 0.122. As can be seen from the plots of experimental vs. predicted Kow values (Figure 3), there are no significant outliers or deviations from linearity. The second application, for log KOC, considers the situation when not all 20 experimental values of the chlorophenols are known. We found in the literature only 12 values for the soil sorption coefficient KOC for chlorophenols and phenol [38]. Because a significant number of experimental values are absent, the poset-average method cannot be used. On the other hand, we obtained good statistics for the cluster-expansion (r = 0.912, s = 0.287) and splinoid poset (r = 0.938, s = 0.259) models. The values predicted by these two different methods are given in Table 1. The splinoid scheme reproduces exactly the 12 known experimental values, which are entered in bold-face in Table 1. A comparison of the predictions for the 12 known values, when they are left out one by one, is shown in Figure 4.
Overall the correlation coefficients are very good for such complex property correlations, whence a subsequent natural question concerns the relation to molecular structure and a comparison to more conventional QSPR fittings. There are many hundreds of possible choices of molecular-structure descriptors, so that a definitive comparison to QSPR is elusive, even for the limited case of chlorophenols, though the more fundamental question concerns a more general range. But obviously QSPR schemes focus on molecular structure as the fundamental object of study, whereas our posetic approach focuses on the super-structural reaction network as the fundamental object of study (so that we have used the abbreviation QSSPR). Questions of what QSSPR tells us about molecular structure, though rather incompletely answered, might be compared to the incompletely answered converse question of what ordinary QSPR approaches tell one about the reaction network. Our splinoid QSSPR approach clearly tends to assign similar values for structures which are similar in the sense that they have a large common graphical substructure (since then the two molecular structures are close together in the reaction poset), while the splinoid fit interpolates as smoothly as possible between the nearby known values. Likewise with two molecular structures sharing a large common substructure and so being nearby in the posetic diagram, the cluster expansion we make gives a similar set of predecessors for two such nearby structures, and thence similar numerical values for the fitted property. Both QSPR & QSSPR schemes, then, tend to assign similar property values to "similar" structures. We believe that there is an even tighter formal relationship between our reaction-network cluster expansion and common (QSPR-based) substructural cluster expansions, as is seen in the examples where we have indicated a molecular substructural interpretation of our retained reaction-network-cluster terms. We believe there is a general correspondence between the two types of cluster expansions, though in the structural & super-structural circumstances the terms are ordered differently, and thence different later terms are generally omitted in the two schemes. This surely warrants more formal study, only the beginning of which is described in [31], and is not pursued here. Overall it seems that one might frequently anticipate similar fittings from QSPR & QSSPR schemes, so long as the QSPR is limited to structures occurring within the reaction-network superstructure. As an example comparison, we consider QSPR fittings to two structural indexes: #Cl(ξ), the number of chlorine atoms in the chlorophenol ξ; and χ(ξ), the Randić connectivity index for the H-deleted graph, which also give very good statistics. As expected from the excellence of our earlier cluster-expansion fit, and its close relationship to typical invariants for QSPR fittings, the results for either type of approach are very good, and very similar as to error statistics. Though the results are comparable, what we have done is to show that an alternative novel sort of (QSSPR) approach is also available, and that for the example here, along with a few elsewhere, rather high quality fits are achievable.
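For readers wishing to reproduce the two structural indexes of the comparison QSPR, the following sketch computes #Cl and the Randić connectivity index, taken here in its standard form as the sum over bonds of (d_i d_j)^(-1/2), with d the vertex degree in the H-deleted graph and no distinction made between C, O, and Cl atoms. The encoding of a chlorophenol by its chlorinated ring positions (OH on position 1) and the function names are our assumptions.

```python
import math


def h_deleted_edges(cl_positions):
    """Bonds of the H-deleted graph: six ring bonds, C1-O, and one C-Cl
    bond for each chlorinated ring position."""
    edges = [(i, i % 6 + 1) for i in range(1, 7)]  # ring C1..C6
    edges.append((1, "O"))
    edges += [(p, f"Cl{p}") for p in cl_positions]
    return edges


def randic_index(cl_positions):
    """Randic connectivity index: sum over bonds of 1 / sqrt(d_i * d_j)."""
    edges = h_deleted_edges(cl_positions)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(1.0 / math.sqrt(degree[u] * degree[v]) for u, v in edges)


def n_chlorine(cl_positions):
    """#Cl: the number of chlorine atoms."""
    return len(cl_positions)


chi_phenol = randic_index(())
chi_penta = randic_index((2, 3, 4, 5, 6))
```

For phenol the index evaluates to 2 + 2/√6 + 1/√3 ≈ 3.394. A QSPR of the kind discussed would then regress the property on `n_chlorine` and `randic_index` over the same set of compounds.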

Conclusions
Chlorophenols are widely used as bactericides, insecticides, herbicides, fungicides and wood preservatives [13-21], which makes them frequent environmental pollutants, either from direct use or accidental spillage. Exposure to chlorophenols can result in irritation of the respiratory tract and of the eyes. Commonly detected in surface water, wastewater, soil, and sediments [17,34,35], chlorophenols were classified by the EPA as priority pollutants. The investigation of their sorption behavior is fundamental to simulate and eventually predict their environmental fate. Because the octanol/water partition coefficient Kow and the soil sorption partition coefficient KOC are useful to estimate the mobility of an organic compound in soil, both are important to understand the distribution of chemical compounds in soil, sediments, and water. Because the laboratory methods for the determination of Kow and KOC are time consuming, the reaction poset super-structure QSSPR models demonstrated here can be applied to obtain reliable predictions for these properties.
To predict the octanol/water partition coefficient log Kow and the soil sorption coefficient log KOC of chlorophenols we have compared the predictive power of three reaction poset super-structural QSSPR models developed in our group [22-33], namely poset-average, cluster-expansion, and splinoid poset. The poset super-structural QSSPR models make special use of the mathematical structure of a partially ordered set induced in a substitution-reaction network when a molecular skeleton (such as benzene, naphthalene, or biphenyl) is subjected to successive steps of substitution. Starting from an unsubstituted compound, substituents are progressively introduced one after another, with earlier substituents fixed at their different possible positions. The special super-structure considered here is the substitution-reaction network that starts with phenol and continues with consecutive formal substitution reactions in which a H atom from the phenyl ring is replaced with a Cl atom. The poset reaction diagram starts with phenol at the top and ends with pentachlorophenol at the bottom, while all the different patterns of substitution occur in between. The poset-average is a local non-parametric method, the cluster-expansion is a parametric method, and the splinoid poset method is a global interpolation method.
Based on the poset reaction diagram, all three of these QSSPR models reflect in distinct ways the topology of the network that describes the interconversion of chemical species. All three poset QSSPR methods give very good predictions for the properties investigated here. For log Kow, the cluster-expansion gives slightly better leave-one-out predictions & validations (r = 0.991, s = 0.107), while for log KOC the best LOO predictions & validations are obtained with the splinoid poset method (r = 0.938, s = 0.259). Thus, we have extended the application of the poset QSSPR models to the prediction of environmentally important properties of chlorophenols. Evidently the splinoid and cluster-expansion models in particular are applicable to circumstances where there are missing data, as in the case of the soil sorption coefficient. There seems promise for further similar uses of such posetic reaction-networks for QSSPR and QSSAR modeling. But in addition, it seems to us that it would be of value to further extend our approach with the simultaneous use of two or more reactions, so as to treat in one setting a larger range of structures, thus yielding a "multi-poset". Further, we think that it could be interesting if there were revealed a formal relation between QSSAR (or QSSPR) on one hand and QSAR (or QSPR) on the other. In particular, it would be of interest if features of the present QSSPR (or QSSAR) were identified to engender greater distinction in fittings. Certainly much work remains, both in the general context of partial orderings, and for our currently studied special case of substitution-reaction-network posets.

Figure 3. Plot of experimental vs. predicted octanol/water partition coefficient for chlorophenols with the poset-average, cluster-expansion, and splinoid poset QSSPR models.

Figure 4. Plot of experimental vs. predicted soil sorption coefficient for chlorophenols with the cluster-expansion and splinoid poset QSSPR models.

Figure 1. The posetic phenol super-structural substitution-reaction network. The black enlarged dots indicate the sites on which an aromatic H atom of phenol has been replaced by a Cl atom.

Table 1. Experimental and predicted values for cluster-expansion and splinoid QSSPR models for soil sorption coefficient, log KOC. The experimental values are presented in bold.
not distinguishing C, O, or Cl atoms). That is, one considers a fitting of a molecular property X to