Reproduction is permitted for noncommercial purposes.

The aim of this study was to investigate the characteristic polynomials resulting from the molecular graphs used as molecular descriptors in the characterization of the properties of chemical compounds. A formal calculus method is proposed in order to identify the value of the characteristic polynomial parameters for which the extremum values of the squared correlation coefficient are obtained in univariate regression models. The developed calculation algorithm was applied to a sample of nonane isomers. The obtained results revealed that the proposed method produced an accurate and unique solution for the best relationship between the characteristic polynomial as molecular descriptor and the property of interest.

Polynomials derived from molecular graphs and matrices find applications in chemistry for the construction of structural descriptors and topological indices [

where A(G) is the adjacency matrix (A being a square matrix) of the molecular graph G, and I is the identity matrix.

In 1868, Crum-Brown and Fraser published the observation that the physiological action of ammonium salts is a function of their chemical composition and structure [

Hosoya first reported the use of the absolute values of the coefficients of the characteristic polynomial of a non-cyclic chemical compound in 1971 [

Starting with the characteristic polynomials as molecular descriptors in characterization of structure-property relationships, the aim of the research was to develop a formal calculation algorithm able to identify the value of the characteristic polynomial parameter for which the extremum values of squared correlation coefficients are obtained in univariate regression models.

Let’s consider a sample of _{i}

The characteristic polynomial can be built and calculated based on the compound’s structure by using the following generic functions (where for the simplification all polynomials are of the same degree

where _{ki}_{0i}_{ki}_{i}

Each chemical compound from the sample (_{i}_{i}_{i}

We have compounds (_{i}_{i}_{i}

For characterization of the compounds’ property, the abstract function of the characteristic polynomial is not useful; the value associated with the characteristic polynomial function is necessary:

where _{i}

A problem arises at this point: what is the value of

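The Pearson product-moment correlation coefficient is the most widely used correlation coefficient for quantitative variables. In our example this coefficient indicates the strength and direction of the linear relationship between the property of interest and the characteristic polynomial. Expressed as a formula, the problem becomes:

r(Y, ChP(X)) = M[(Y − μ_{Y})·(ChP(X) − μ_{ChP(X)})] / (σ_{Y}·σ_{ChP(X)})

where μ_{Y} and μ_{ChP(X)} are the means, and σ_{Y} and σ_{ChP(X)} the standard deviations, of Y and ChP(X), respectively.

The above parameters can be written as: μ_{Y} = M(Y), σ_{Y}^{2} = M(Y^{2}) − M^{2}(Y), μ_{ChP(X)} = M(ChP(X)), σ_{ChP(X)}^{2} = M(ChP^{2}(X)) − M^{2}(ChP(X)), where M(·) denotes the arithmetic mean over the sample.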
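As a numerical illustration of the correlation between the property and the characteristic polynomial, the following minimal Python sketch evaluates the squared Pearson correlation coefficient at a chosen value of X. It is not the authors' program: the function names (`evaluate`, `mean`, `r_squared`) are hypothetical, and only five of the thirty-five nonane isomers from the table of nonane isomers are used to keep the example short.

```python
# Property values (k_H, in 1e-5 M/atm) and reduced characteristic polynomials
# for five nonane isomers from the paper's table (c_1, c_3, c_6, c_25, c_30).
# Each polynomial is stored as {power of X: coefficient}.
K_H = [10, 15, 17, 19, 21]
POLYS = [
    {5: 20, 3: -17, 1: 3},   # c_1
    {5: 16, 3: -8},          # c_3
    {5: 21, 3: -20, 1: 5},   # c_6
    {5: 15},                 # c_25
    {5: 20, 3: -16, 1: 2},   # c_30
]


def evaluate(poly, x):
    """Value of a polynomial stored as {power: coefficient} at the point x."""
    return sum(coeff * x ** power for power, coeff in poly.items())


def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def r_squared(ys, polys, x):
    """Squared Pearson correlation between ys and the polynomial values at x."""
    vals = [evaluate(p, x) for p in polys]
    mu_y, mu_v = mean(ys), mean(vals)
    cov = mean((y - mu_y) * (v - mu_v) for y, v in zip(ys, vals))
    var_y = mean((y - mu_y) ** 2 for y in ys)
    var_v = mean((v - mu_v) ** 2 for v in vals)
    # Undefined (division by zero) when all polynomial values coincide, e.g. x = 0.
    return cov ** 2 / (var_y * var_v)


if __name__ == "__main__":
    for x in (0.5, 1.0, 1.5):
        print(f"x = {x}: r^2 = {r_squared(K_H, POLYS, x):.4f}")
```

Because the reduced polynomials contain only odd powers of X, changing the sign of x changes the sign of every polynomial value, so r^{2}(x) = r^{2}(−x).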

To solve the problem it is necessary to find equations of unknown grade in

where ∂_{1}_{j}

Note that it is difficult to work with ^{2}^{2}

So, the roots _{1}_{j}^{2}

where the dot (·) denotes any function (such as r or r^{2} in our case)

In order to find which among the solutions (_{1}_{k}_{k}

Assuming that there is a string of polynomials (as in

the proposed implementation of the model uses the following elementary mathematical operations:

• Multiplication:

• Addition:

• Average:

• Product:

• Derivative:
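The five operations listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: polynomials are assumed to be stored as dictionaries mapping the power of X to its coefficient, "multiplication" is read as multiplication of a polynomial by a number and "product" as the product of two polynomials, and all function names are hypothetical.

```python
from fractions import Fraction


def add(p, q):
    """Addition: sum of two polynomials stored as {power: coefficient}."""
    out = dict(p)
    for power, coeff in q.items():
        out[power] = out.get(power, 0) + coeff
    return {power: c for power, c in out.items() if c != 0}


def scale(p, factor):
    """Multiplication: a polynomial multiplied by a number."""
    return {power: c * factor for power, c in p.items() if c * factor != 0}


def product(p, q):
    """Product: two polynomials multiplied term by term."""
    out = {}
    for pw_p, c_p in p.items():
        for pw_q, c_q in q.items():
            out[pw_p + pw_q] = out.get(pw_p + pw_q, 0) + c_p * c_q
    return {power: c for power, c in out.items() if c != 0}


def average(polys):
    """Average: coefficient-wise mean of a list of polynomials (exact fractions)."""
    total = {}
    for p in polys:
        total = add(total, p)
    return scale(total, Fraction(1, len(polys)))


def derivative(p):
    """Derivative: first derivative with respect to X."""
    return {power - 1: c * power for power, c in p.items() if power != 0}


if __name__ == "__main__":
    c_1 = {5: 20, 3: -17, 1: 3}   # reduced polynomial of 4-methyloctane
    c_3 = {5: 16, 3: -8}          # reduced polynomial of 3,3-diethylpentane
    print(add(c_1, c_3))
    print(average([c_1, c_3]))
    print(derivative(c_1))
```

Exact rational arithmetic (`fractions.Fraction`) is used for the average so that no rounding error enters before the root-finding stage; this is a design choice of the sketch.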

In order to solve

The proposed calculation can be done with pen and paper, but it is time-consuming, especially when many compounds are of interest. A formal computation method therefore helps to find the exact and unique solution for the best relationship between the characteristic polynomial and the property of interest.

Parse polynomials formulas for all given molecules (_{j}_{j}

The polynomials are stored as sums of monomials;

Every monomial is in fact a pair of two values: the power of variable (

A measured data value is assigned with a polynomial through

Search in the polynomial formulas and remove the identical monomials (as in

It is safe to remove the repeated monomials (such for example the X^{9} or − 8·X^{7}, see

It is better to remove the identical monomials in order to reduce the calculation complexity, the magnitude of the numbers, and error propagation.
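The parsing and monomial-removal steps described above can be sketched in Python as follows. The ASCII input format (`X^9 - 8*X^7 + ...`) and the function names `parse` and `remove_common` are assumptions made for this illustration; the sketch treats a monomial as "identical" when the same power and the same coefficient occur in every polynomial of the sample.

```python
def parse(text):
    """Parse a string such as 'X^9 - 8*X^7 + 3*X' into {power: coefficient}."""
    poly = {}
    # Turn every subtraction into the addition of a negative term, then split.
    for term in text.replace(" ", "").replace("-", "+-").split("+"):
        if not term:
            continue
        coeff_part, _, power_part = term.partition("X")
        coeff_part = coeff_part.rstrip("*")
        if coeff_part in ("", "-"):          # implicit coefficient of +1 or -1
            coeff = -1 if coeff_part == "-" else 1
        else:
            coeff = int(coeff_part)
        if power_part:                       # explicit power, e.g. '^7'
            power = int(power_part.lstrip("^"))
        else:                                # bare 'X' has power 1, a constant 0
            power = 1 if "X" in term else 0
        poly[power] = poly.get(power, 0) + coeff
    return poly


def remove_common(polys):
    """Drop the (power, coefficient) pairs shared by every polynomial."""
    common = set(polys[0].items())
    for p in polys[1:]:
        common &= set(p.items())
    reduced = [
        {power: c for power, c in p.items() if (power, c) not in common}
        for p in polys
    ]
    return reduced, common


if __name__ == "__main__":
    formulas = [
        "X^9 - 8*X^7 + 20*X^5 - 17*X^3 + 3*X",   # c_1
        "X^9 - 8*X^7 + 16*X^5 - 8*X^3",          # c_3
        "X^9 - 8*X^7 + 15*X^5",                  # c_25
    ]
    reduced, common = remove_common([parse(f) for f in formulas])
    print(sorted(common))
    print(reduced)
```

Removing a monomial that is shared by all polynomials subtracts the same amount from every polynomial value at a given X, and the Pearson correlation coefficient is unchanged by adding a constant to one of the variables.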

Compute the polynomial of the squared correlation coefficient formula as a pair of two polynomials: numerator and denominator. Comment: the following procedures have been used:

Compute the mean and dispersion of ^{2}^{2}

Compute the average polynomial (as polynomial):

Compute the average of

Construct square polynomials of _{j}^{2}^{2}

Make the product of

Change the sign of ^{2}

Add M2ChP to ^{2}^{2}

Multiply the obtained ^{2}^{2}

Multiply

Add the obtained

Square the obtained ^{2}

Return the pair of polynomials (
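The construction of the squared correlation coefficient as a pair of polynomials can be sketched in Python as follows. The sketch assumes the standard identity r^{2} = [M(Y·ChP) − M(Y)·M(ChP)]^{2} / {[M(Y^{2}) − M^{2}(Y)]·[M(ChP^{2}) − M^{2}(ChP)]}, with M the arithmetic mean; it does not claim to reproduce the exact order of the steps listed above. All function names are hypothetical and the data are five nonane isomers taken from the table of nonane isomers.

```python
from fractions import Fraction

K_H = [10, 15, 17, 19, 21]                       # c_1, c_3, c_6, c_25, c_30
POLYS = [{5: 20, 3: -17, 1: 3}, {5: 16, 3: -8}, {5: 21, 3: -20, 1: 5},
         {5: 15}, {5: 20, 3: -16, 1: 2}]         # {power: coefficient}


def add(p, q):
    out = dict(p)
    for power, coeff in q.items():
        out[power] = out.get(power, 0) + coeff
    return {power: c for power, c in out.items() if c != 0}


def scale(p, factor):
    return {power: c * factor for power, c in p.items() if c * factor != 0}


def product(p, q):
    out = {}
    for pw_p, c_p in p.items():
        for pw_q, c_q in q.items():
            out[pw_p + pw_q] = out.get(pw_p + pw_q, 0) + c_p * c_q
    return {power: c for power, c in out.items() if c != 0}


def average(polys):
    total = {}
    for p in polys:
        total = add(total, p)
    return scale(total, Fraction(1, len(polys)))


def evaluate(p, x):
    return sum(c * x ** power for power, c in p.items())


def r2_as_polynomials(ys, polys):
    """Return (numerator, denominator) polynomials of r^2 as a function of X."""
    n = len(ys)
    m_y = sum(Fraction(y) for y in ys) / n                      # mean of Y
    d_y = sum(Fraction(y) ** 2 for y in ys) / n - m_y ** 2      # dispersion of Y
    m_chp = average(polys)                                      # M(ChP)
    m_chp2 = average([product(p, p) for p in polys])            # M(ChP^2)
    m_ychp = average([scale(p, y) for y, p in zip(ys, polys)])  # M(Y*ChP)
    cov = add(m_ychp, scale(m_chp, -m_y))                       # M(Y*ChP) - M(Y)*M(ChP)
    disp = add(m_chp2, scale(product(m_chp, m_chp), -1))        # M(ChP^2) - M(ChP)^2
    return product(cov, cov), scale(disp, d_y)


if __name__ == "__main__":
    num, den = r2_as_polynomials(K_H, POLYS)
    print("degree of numerator:", max(num), "degree of denominator:", max(den))
    x = Fraction(3, 2)
    print("r^2 at x = 3/2:", float(evaluate(num, x) / evaluate(den, x)))
```

Keeping the numerator and the denominator as separate polynomials with exact rational coefficients avoids any division of polynomials until a numerical value of X is substituted.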

Calculate derivative of the numerator of ^{2}

Calculate derivative of the denominator of ^{2}

Calculate the product between

Calculate the product between

Change the sign of the

Add the

Factorize ^{p}^{p}

Find roots of _{i}_{i}_{i}_{i}

The procedure of finding roots is an approximate one for at least two reasons. First, the

The returning of the ɛ_{i}

The root-finding procedure is recursive: it also calculates and uses all higher-order derivatives of the polynomial in order to find all real roots of the equation.
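One possible reading of such a recursive procedure is sketched below in Python; it is an assumption about the general technique, not the authors' code. Between two consecutive real roots of the derivative a polynomial is monotonic, so the real roots of the derivative (found recursively) split the real line into intervals that each contain at most one root, which bisection can then locate. The names `evaluate`, `derivative`, `bisect` and `real_roots`, and the tolerance value, are hypothetical.

```python
def evaluate(p, x):
    """Horner evaluation; p[i] is the coefficient of x**i."""
    result = 0.0
    for c in reversed(p):
        result = result * x + c
    return result


def derivative(p):
    """Coefficients of the first derivative."""
    return [i * c for i, c in enumerate(p)][1:]


def bisect(p, lo, hi, iterations=200):
    """Bisection on an interval whose end points give opposite signs."""
    f_lo = evaluate(p, lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        f_mid = evaluate(p, mid)
        if f_mid == 0:
            return mid
        if (f_lo < 0) != (f_mid < 0):
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


def real_roots(p, tol=1e-9):
    """All distinct real roots, using the roots of the derivatives recursively."""
    p = list(p)
    while p and p[-1] == 0:          # drop zero leading coefficients
        p.pop()
    if len(p) <= 1:                  # constant polynomial: no isolated roots
        return []
    if len(p) == 2:                  # linear polynomial: one root
        return [-p[0] / p[1]]
    bound = 1 + max(abs(c / p[-1]) for c in p[:-1])   # Cauchy bound on |root|
    critical = real_roots(derivative(p), tol)
    points = [-bound] + critical + [bound]
    # A critical point where p is (numerically) zero is a multiple root.
    roots = [c for c in critical if abs(evaluate(p, c)) <= tol]
    for lo, hi in zip(points, points[1:]):
        f_lo, f_hi = evaluate(p, lo), evaluate(p, hi)
        if abs(f_lo) <= tol or abs(f_hi) <= tol:
            continue                 # end point already counted as a root
        if (f_lo < 0) != (f_hi < 0):
            roots.append(bisect(p, lo, hi))
    return sorted(roots)


if __name__ == "__main__":
    cubic = [6, -5, -2, 1]           # x^3 - 2x^2 - 5x + 6 = (x + 2)(x - 1)(x - 3)
    for root in real_roots(cubic):
        print(f"x = {root:.6f}, residual = {evaluate(cubic, root):.1e}")
```

The residual printed for each root plays the role of a numerical error estimate: it is the value the polynomial actually takes at the approximate root.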

Use the set of roots _{i}^{2}_{i}_{1}_{≤}_{i}_{≤}_{m}^{2}_{i}_{1}_{≤}_{i}_{≤}_{m}

Display the results: _{i}_{i}^{2}_{i}_{1}_{≤}_{i}_{≤}_{m}

The algorithm presented above was implemented in PHP (Hypertext Preprocessor). To illustrate its effectiveness, the program was run on a sample of nonane isomers, with the Henry’s law constant (solubility) as the property of interest.

Nonane isomers are acyclic saturated hydrocarbon structures with the general chemical formula C_{9}H_{20}. There are thirty-five compounds in this class: 4-methyloctane (c_{1}), 3-ethyl-2,3-dimethylpentane (c_{2}), 3,3-diethylpentane (c_{3}), 2,2,3,3-tetramethylpentane (c_{4}), 2,3,3,4-tetramethylpentane (c_{5}), nonane (c_{6}), 2,3,3-trimethylhexane (c_{7}), 3,3,4-trimethylhexane (c_{8}), 3-ethyl-3-methylhexane (c_{9}), 2,2,3,4-tetramethylpentane (c_{10}), 3,4-dimethylheptane (c_{11}), 2,3,4-trimethylhexane (c_{12}), 3-ethyl-4-methylhexane (c_{13}), 3-ethyl-2,2-dimethylpentane (c_{14}), 3-ethyl-2,4-dimethylpentane (c_{15}), 2,3-dimethylheptane (c_{16}), 3,3-dimethylheptane (c_{17}), 4,4-dimethylheptane (c_{18}), 3-ethylheptane (c_{19}), 4-ethylheptane (c_{20}), 2,2,3-trimethylhexane (c_{21}), 2,2,5-trimethylhexane (c_{22}), 2,4,4-trimethylhexane (c_{23}), 3-ethyl-2-methylhexane (c_{24}), 2,2,4,4-tetramethylpentane (c_{25}), 3-methyloctane (c_{26}), 2,5-dimethylheptane (c_{27}), 3,5-dimethylheptane (c_{28}), 2,3,5-trimethylhexane (c_{29}), 2-methyloctane (c_{30}), 2,2-dimethylheptane (c_{31}), 2,4-dimethylheptane (c_{32}), 2,6-dimethylheptane (c_{33}), 2,2,4-trimethylhexane (c_{34}), and 4-ethyl-2-methylhexane (c_{35}). The property of interest was the Henry’s law constant (solubility of a gas in water) of the alkanes, considered as trace gases of potential importance in environmental chemistry. The measured values were taken from previously reported research and are expressed as k_{H} (M/atm = [mol_{aq}/dm^{3}_{aq}]/atm).

In the first step of the calculation algorithm, the polynomial formulas for all thirty-five compounds and the associated measured Henry’s law constants were parsed. In the second step, two monomials common to all polynomials (X^{9} and − 8·X^{7}) were identified and removed (see the last column of the table of nonane isomers).

The polynomial of the squared correlation coefficient resulting from the third step of the calculation algorithm was of the tenth degree:

The derivative of the ^{2}

Note that just the first significant digits were displayed in

The solutions of roots for the squared correlation coefficient obtained by the proposed algorithm for the sample of nonane isomers are presented in _{i}^{2}_{i}_{X}_{i}_{i}

As it can be observed from _{i}

Analyzing the results presented in

Regarding the proposed method one question can arise: why use the proposed method when the Hosoya Z index [

It is well known that the squared correlation coefficients increase with the number of variables used by a linear regression model [

The proposed calculation algorithm yields unique and reproducible solutions. The solutions are unique in the sense that, for a sample of compounds with a property of interest, the maximum value of the squared correlation coefficient between the property and the characteristic polynomials is always given by a single pair of roots. The algorithm is applicable to any class of compounds for which characteristic polynomials are used as descriptors in the analysis of the relationship between the compounds’ structure and their properties.

Nonane isomers: Henry’s law constant and characteristic polynomials.

Comp. Abbrev. | k_{H} (·10^{−5}) [M/atm] | Characteristic polynomial | After second step of calculation algorithm |
---|---|---|---|
c_{1} | 10 | X^{9} − 8·X^{7} + 20·X^{5} − 17·X^{3} + 3·X | 20·X^{5} − 17·X^{3} + 3·X |
c_{2} | 15 | X^{9} − 8·X^{7} + 17·X^{5} − 12·X^{3} + 2·X | 17·X^{5} − 12·X^{3} + 2·X |
c_{3} | 15 | X^{9} − 8·X^{7} + 16·X^{5} − 8·X^{3} | 16·X^{5} − 8·X^{3} |
c_{4} | 16 | X^{9} − 8·X^{7} + 15·X^{5} − 6·X^{3} | 15·X^{5} − 6·X^{3} |
c_{5} | 16 | X^{9} − 8·X^{7} + 18·X^{5} − 16·X^{3} + 5·X | 18·X^{5} − 16·X^{3} + 5·X |
c_{6} | 17 | X^{9} − 8·X^{7} + 21·X^{5} − 20·X^{3} + 5·X | 21·X^{5} − 20·X^{3} + 5·X |
c_{7} | 17 | X^{9} − 8·X^{7} + 17·X^{5} − 10·X^{3} | 17·X^{5} − 10·X^{3} |
c_{8} | 17 | X^{9} − 8·X^{7} + 17·X^{5} − 11·X^{3} + 2·X | 17·X^{5} − 11·X^{3} + 2·X |
c_{9} | 17 | X^{9} − 8·X^{7} + 18·X^{5} − 14·X^{3} + 3·X | 18·X^{5} − 14·X^{3} + 3·X |
c_{10} | 17 | X^{9} − 8·X^{7} + 16·X^{5} − 6·X^{3} | 16·X^{5} − 6·X^{3} |
c_{11} | 18 | X^{9} − 8·X^{7} + 19·X^{5} − 15·X^{3} + 3·X | 19·X^{5} − 15·X^{3} + 3·X |
c_{12} | 18 | X^{9} − 8·X^{7} + 18·X^{5} − 12·X^{3} + 2·X | 18·X^{5} − 12·X^{3} + 2·X |
c_{13} | 18 | X^{9} − 8·X^{7} + 19·X^{5} − 16·X^{3} + 4·X | 19·X^{5} − 16·X^{3} + 4·X |
c_{14} | 18 | X^{9} − 8·X^{7} + 17·X^{5} − 10·X^{3} | 17·X^{5} − 10·X^{3} |
c_{15} | 18 | X^{9} − 8·X^{7} + 18·X^{5} − 12·X^{3} | 18·X^{5} − 12·X^{3} |
c_{16} | 19 | X^{9} − 8·X^{7} + 19·X^{5} − 14·X^{3} + 2·X | 19·X^{5} − 14·X^{3} + 2·X |
c_{17} | 19 | X^{9} − 8·X^{7} + 18·X^{5} − 12·X^{3} + 2·X | 18·X^{5} − 12·X^{3} + 2·X |
c_{18} | 19 | X^{9} − 8·X^{7} + 18·X^{5} − 12·X^{3} | 18·X^{5} − 12·X^{3} |
c_{19} | 19 | X^{9} − 8·X^{7} + 20·X^{5} − 18·X^{3} + 5·X | 20·X^{5} − 18·X^{3} + 5·X |
c_{20} | 19 | X^{9} − 8·X^{7} + 20·X^{5} − 18·X^{3} + 4·X | 20·X^{5} − 18·X^{3} + 4·X |
c_{21} | 19 | X^{9} − 8·X^{7} + 17·X^{5} − 9·X^{3} | 17·X^{5} − 9·X^{3} |
c_{22} | 19 | X^{9} − 8·X^{7} + 17·X^{5} − 6·X^{3} | 17·X^{5} − 6·X^{3} |
c_{23} | 19 | X^{9} − 8·X^{7} + 17·X^{5} − 8·X^{3} | 17·X^{5} − 8·X^{3} |
c_{24} | 19 | X^{9} − 8·X^{7} + 19·X^{5} − 15·X^{3} + 2·X | 19·X^{5} − 15·X^{3} + 2·X |
c_{25} | 19 | X^{9} − 8·X^{7} + 15·X^{5} | 15·X^{5} |
c_{26} | 20 | X^{9} − 8·X^{7} + 20·X^{5} − 17·X^{3} + 4·X | 20·X^{5} − 17·X^{3} + 4·X |
c_{27} | 20 | X^{9} − 8·X^{7} + 19·X^{5} − 13·X^{3} + 2·X | 19·X^{5} − 13·X^{3} + 2·X |
c_{28} | 20 | X^{9} − 8·X^{7} + 19·X^{5} − 14·X^{3} + 3·X | 19·X^{5} − 14·X^{3} + 3·X |
c_{29} | 20 | X^{9} − 8·X^{7} + 18·X^{5} − 10·X^{3} | 18·X^{5} − 10·X^{3} |
c_{30} | 21 | X^{9} − 8·X^{7} + 20·X^{5} − 16·X^{3} + 2·X | 20·X^{5} − 16·X^{3} + 2·X |
c_{31} | 21 | X^{9} − 8·X^{7} + 18·X^{5} − 10·X^{3} | 18·X^{5} − 10·X^{3} |
c_{32} | 21 | X^{9} − 8·X^{7} + 19·X^{5} − 13·X^{3} | 19·X^{5} − 13·X^{3} |
c_{33} | 21 | X^{9} − 8·X^{7} + 19·X^{5} − 12·X^{3} | 19·X^{5} − 12·X^{3} |
c_{34} | 21 | X^{9} − 8·X^{7} + 17·X^{5} − 7·X^{3} | 17·X^{5} − 7·X^{3} |
c_{35} | 21 | X^{9} − 8·X^{7} + 19·X^{5} − 14·X^{3} + 2·X | 19·X^{5} − 14·X^{3} + 2·X |

M/atm = (mol_{aq}/dm^{3}_{aq})/atm

Algorithm of calculation: solutions for nonane isomers.

Solution | x_{i} | ɛ_{i} | r^{2}(x_{i}) |
---|---|---|---|
1.1 | − 1.656… | − 5.5…·10^{−11} | 0.296… |
2.1 | − 0.856… | 1.1…·10^{−13} | 0 |
3.1 | − 0.481… | 2.7…·10^{−13} | 0.055… |
3.2 | 0.481… | 2.7…·10^{−13} | 0.055… |
2.2 | 0.856… | 1.1…·10^{−13} | 0 |
1.2 | 1.656… | − 5.5…·10^{−11} | 0.296… |

x_{i} = root; r^{2}(x_{i}) = squared correlation coefficient; ɛ_{i} = numerical error;

… = for all numbers only first significant digits were presented

The research was partly supported by UEFISCSU Romania through project ET46/2006.