
Total and Local Quadratic Indices of the Molecular Pseudograph's Atom Adjacency Matrix: Applications to the Prediction of Physical Properties of Organic Compounds

Yovani Marrero Ponce
Department of Pharmacy, Faculty of Chemical-Pharmacy and Department of Drug Design, Bioactive Chemical Center. Central University of Las Villas, Santa Clara, 54830, Villa Clara, Cuba
Molecules 2003, 8(9), 687-726;
Submission received: 17 June 2003 / Revised: 29 July 2003 / Accepted: 3 August 2003 / Published: 15 August 2003


A novel topological approach for obtaining a family of new molecular descriptors is proposed. In this connection, a vector space E (molecular vector space), whose elements are organic molecules, is defined as a “direct sum” of different ℜi spaces. In this way we can represent molecules having a total of i atoms as elements (vectors) of the vector spaces ℜi (i=1, 2, 3,..., n; where n is number of atoms in the molecule). In these spaces the components of the vectors are atomic properties that characterize each kind of atom in particular. The total quadratic indices are based on the calculation of mathematical quadratic forms. These forms are functions of the k-th power of the molecular pseudograph’s atom adjacency matrix (M). For simplicity, canonical bases are selected as the quadratic forms’ bases. These indices were generalized to “higher analogues” as number sequences. In addition, this paper also introduces a local approach (local invariant) for molecular quadratic indices. This approach is based mainly on the use of a local matrix [Mk(G, FR)]. This local matrix is obtained from the k-th power (Mk(G)) of the atom adjacency matrix M. Mk(G, FR) includes the elements of the fragment of interest and those that are connected with it, through paths of length k. Finally, total (and local) quadratic indices have been used in QSPR studies of four series of organic compounds. The quantitative models found are significant from a statistical point of view and permit a clear interpretation of the studied properties in terms of the structural features of molecules. External prediction series and cross-validation procedures (leave-one-out and leave-group-out) assessed model predictability. The reported method has shown similar results, compared with other topological approaches. 
The results obtained were the following: a) Seven physical properties of 74 normal and branched alkanes (boiling points, molar volumes, molar refractions, heats of vaporization, critical temperatures, critical pressures and surface tensions) were well modeled (R > 0.98, q² > 0.95) by the total quadratic indices. The overall MAEs of the 5-fold cross-validation were 2.11 °C, 0.53 cm³, 0.032 cm³, 0.32 kJ/mol, 5.34 °C, 0.64 atm and 0.23 dyn/cm for each property, respectively; b) boiling points of 58 alkyl alcohols were also well described by the present approach; two QSPR models were obtained: the first was developed using the complete set of 58 alcohols [R = 0.9938, q² = 0.986, s = 4.006 °C, overall MAE of 5-fold cross-validation = 3.824 °C] and the second was developed using 29 compounds as a training set [R = 0.9979, q² = 0.992, s = 2.97 °C, overall MAE of 5-fold cross-validation = 2.580 °C] and 29 compounds as a test set [R = 0.9938, s = 3.17 °C]; c) good relationships were obtained for the boiling points of cycloalkanes (using 80 and 26 cycloalkanes in the training and test sets, respectively) using 2 and 5 total quadratic indices: [Training set: R = 0.9823 (q² = 0.961 and overall MAE of 5-fold cross-validation = 6.429 °C) and R = 0.9927 (q² = 0.977 and overall MAE of 5-fold cross-validation = 4.801 °C); Test set: R = 0.9726 and R = 0.9927]; and d) the linear model developed to describe the boiling points of 70 organic compounds containing aromatic rings has shown good statistical features, with a squared correlation coefficient (R²) of 0.981 (s = 7.61 °C). Internal validation procedures (q² = 0.9763 and overall MAE of 5-fold cross-validation = 7.34 °C) allowed the predictability and robustness of the model to be assessed. The predictive performance of the obtained QSPR model was also tested on an extra set of 20 aromatic organic compounds (R = 0.9930 and s = 7.8280 °C).
The results obtained are valid to establish that these new indices fulfill some of the ideal requirements proposed by Randić for a new molecular descriptor.


The last decade has witnessed much progress in how chemical structures are characterized and described, how large sets of compounds are synthesized via a combinatorial chemistry approach and how simple and fast in-vitro assays are carried out. In this sense, the method most used for drug discovery is high-throughput screening (HTS), where massive screening of chemicals on a robot-assisted battery of biological assays is carried out [1,2]. Lately, virtual screening has emerged as an interesting alternative to the handling and screening of large databases in order to find a reduced set of potential new drug candidates [3,4,5]. This methodology and in general, molecular biology and drug design, are centered on the relationships between the chemical structures and measured properties of polymers and organic compounds.
In order to obtain structure-property and structure-activity relationships (henceforth abbreviated SPR and SAR) and their quantitative counterparts (abbreviated QSPR and QSAR, respectively), it is necessary to have a structure parameterization. The structure parameterization includes the use of molecular descriptors. Molecular descriptors are “numbers that characterize a specific aspect of the molecule structure” [6]. At present, there are a great number of molecular descriptors that can be used in QSAR and QSPR studies [7]. Among them, the so-called topological indices (TIs) have found major application in medicinal chemistry and molecular modeling [8,9,10,11]. TIs are molecular descriptors derived from graph-theoretical invariants; i.e. they do not depend on the labeling of the vertices or edges of the “molecular graph” [12,13,14,15,16,17,18,19,20,21,22,23,24]. These indices codify structural information contained in ‘molecular connectivities’ and can be considered as structure cryptic descriptors [15,16,17].
The first TI capable of characterizing the ramification of a “graph” was proposed by Wiener [18]. This index was based on the topological concept of distance, understood as the number of bonds between two atoms along the shortest path. Other authors have defined various indices; prominent among them are Balaban’s J index [19], Randić’s molecular connectivity [20], Kier and Hall’s electrotopological state (E-state) index [21], the Harary number [22], and Estrada’s spectral moments [23,24,25], among others. The latter are related to the bond adjacency matrix, while the majority of the remainder are derived from the vertex adjacency or distance matrices.
The proliferation of topological indices can be compared with the effect produced on quantum chemical parameters by changes in the molecular orbital. In this connection, TIs have been classified according to their nature as first, second and third generation [17]. In a recent paper, Randić [26] has proposed a list of desirable attributes for a topological descriptor. Therefore, this list can be considered as a methodological guide for the development of new TIs. One of the most important criteria is the possibility of defining the descriptors locally. This attribute refers to the fact that the index could be calculated for the molecule as a whole but also over certain fragments of the structure itself.
At times, the properties of a group of molecules are more related to a certain zone or fragment than to the molecule as a whole. In such cases, a global definition may not satisfy the structural requirements needed to obtain a good correlation in QSAR and QSPR studies. Local indices can be used in problems such as:
  • Research on drugs, toxics or generally any organic molecules with a common skeleton, which is responsible for the activity or property under study.
  • Study of the reactivity of specific sites of a series of molecules, which can undergo a chemical reaction or enzymatic metabolism.
  • Study of molecular properties such as spectroscopic measurements, which are obtained experimentally in a local fashion.
  • In any general case where it is necessary to study not the molecule as a whole, but rather some local properties of certain fragments, then the definition of local descriptors could be necessary.
Another of Randić’s attributes refers to the generalization of the indices. The description of the molecular structure by a simple number can bring about loss of information. For this reason, in most cases the use of a family of different simple descriptors for obtaining the algebraic models that relate the structure with its physical, chemical and biological properties is needed [27]. The two possibilities to solve the loss of information in the graph theoretical descriptors are: (1) the generalization of a simple descriptor to “higher” analogues or (2) the generation of graph theoretical invariants as a sequence of numbers [26].
Chemical graph theory is continuously evolving, and novel approaches have appeared as solutions to those difficulties. Recently, several molecular descriptors based on the two–dimensional topological structure of molecules have been defined and tested in QSAR models [28,29,30,31,32,33,34,35], showing that definition of novel molecular descriptors is a promising field in medicinal chemistry (see Todeschini, Karelson, Devillers and Estrada [15,16,17] for an exhaustive compilation). In this sense, the author has developed a novel method called TOMO-COMD (acronym of TOpological MOlecular COMputer Design) [36]. It calculates several families of topological molecular descriptors. One of these families has been defined as quadratic indices by analogy with the quadratic mathematical forms.
The main aim of this paper is to propose a total and local definition of quadratic indices of the “molecular pseudograph’s atom adjacency matrix”. In order to test the QSPR applicability of the present approach, we will develop quantitative models towards the prediction of several physical properties from the molecular structure of diverse organic compounds, combining quadratic indices and a multiple linear regression method. Finally, predicting series and a (leave-one-out and leave-group-out) cross-validation procedure will be used to corroborate the predictive power of the models.

Results and Discussion

Computational methods. Mathematical definition of the molecular descriptor

Molecular vector space

Each element of the periodic table has inherent atomic properties, such as electronegativity, density, atomic radius and so on. Each one of these properties numerically characterizes each kind of atom, taking values in the real set (ℜ). For example, the Mulliken electronegativity (XA) [37] of the atom A takes the values XH = 2.2 for hydrogen, XC = 2.63 for carbon, XN = 2.33 for nitrogen, XO = 3.17 for oxygen, XCl = 3.0 for chlorine and so on.
Let there be a molecular vector whose elements are the atomic properties of the atoms in the molecule, for instance XA. Thus, a molecule having 2, 3, 4, ..., n atoms can be “represented” by means of vectors with 2, 3, 4, ..., n components, belonging to the spaces ℜ2, ℜ3, ℜ4, ..., ℜn, respectively, where n is the dimension of the real space ℜn.
This approach allows us to express compounds such as benzene, cyclohexane, hexane and all the constitutional and geometric isomers of hexane through a general kind of vector X = (XC, XC, XC, XC, XC, XC). On the other hand, n-propanol, iso-propanol, propanal and acetone may be represented by (XC, XC, XC, XO) or any permutation of the components of this vector. These vectors belong to the spaces ℜ6 and ℜ4, respectively. It must be noted that the order of the vector components is meaningless here. This fact, not common in classical vector spaces, will be explained elsewhere. In this example the hydrogen atoms were not considered.
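As a minimal sketch of this representation, the snippet below maps a list of element symbols to a molecular vector using the Mulliken electronegativities quoted above. The names `MULLIKEN` and `molecular_vector` are illustrative assumptions and are not part of the original method's software.

```python
# Mulliken electronegativities quoted in the text (H, C, N, O, Cl).
MULLIKEN = {"H": 2.2, "C": 2.63, "N": 2.33, "O": 3.17, "Cl": 3.0}

def molecular_vector(atoms, prop=MULLIKEN):
    """Return the molecular vector X for a list of element symbols.

    Hypothetical helper: each component is the chosen atomic property
    of the corresponding atom (hydrogens are simply left out of `atoms`).
    """
    return [prop[a] for a in atoms]

# Hydrogen-suppressed examples from the text.
hexane = molecular_vector(["C"] * 6)               # a vector of R^6
acetone = molecular_vector(["C", "C", "C", "O"])   # a vector of R^4
```

Any isomer with the same atoms yields the same components up to a permutation, which is why the vector alone cannot distinguish hexane from cyclohexane; the adjacency matrix introduced below carries that information.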
By taking into consideration all the universe of organic molecules, a molecular vector space (E) could be defined:
$$E = \Re \oplus \Re^{2} \oplus \Re^{3} \oplus \cdots \oplus \Re^{n} = \bigoplus_{i=1}^{n} \Re^{i} \qquad (1)$$
where i = 1, 2, 3, ..., n; ℜk ∩ ℜl = {0} for k ≠ l [38,39], and the dimension of E is the sum of the dimensions of each one of the ℜi spaces. Therefore, this dimension is n(n+1)/2.
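The stated dimension follows from adding the dimensions of the summands of the direct sum. Written out as a routine step (not shown in the original text):

```latex
\dim E \;=\; \sum_{i=1}^{n} \dim \Re^{i} \;=\; \sum_{i=1}^{n} i \;=\; \frac{n(n+1)}{2}
```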
This space includes all possible molecules having n atoms as vectors of the ℜn spaces. This mathematical formalism makes it possible to represent any drug or organic molecule as a vector space and then, to use the well-known applications of this algebraic construction to codify molecular structure in a timely but mathematically rigorous way.

Total quadratic indices; [qk(x)].

Mathematically, a quadratic form is defined as follows [39,40,41]: Let H be a K-space of finite dimension n. Then the application q: H → K is a quadratic form (q(x)) if, for X = x1a1 + ... + xnan, where (ai)1≤i≤n is a base of H, it satisfies:
$$q(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} X_i X_j \qquad (2)$$
Therefore, the quadratic indices are calculated from an equation analogous to Eq. 2, as an application on the vector space ℜi of finite dimension i, q: ℜi → K. If a molecule with n atoms is considered (a vector of ℜn), the k-th quadratic index qk(x) is defined as an application q: ℜn → ℜ, where the molecular vector (X) is expressed as a linear combination in a base of the vector space ℜn (X = x1a1 + ... + xnan, where (ai)1≤i≤n is a base of ℜn). Under these conditions, q is a quadratic form if Eq. 3 holds. In this way, the whole form qk(x) is written as the sum of all possible terms kaij xi xj, with i and j independently taking values from 1 to n.
$$q_k(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} {}^{k}a_{ij} X_i X_j \qquad (3)$$
where kaij = kaji and n is the number of atoms of the molecule. The coefficients kaij are the elements of the k-th power of the “molecular pseudograph’s atom adjacency matrix” M(G). Here, M(G) = M = [aij] is a square matrix of order n, where n is the number of vertices, and the elements aij are defined as follows:
$$a_{ij} = \begin{cases} P_{ij} & \text{if } i \neq j \text{ and } \exists\, e_k \in E : e_k \sim v_i, v_j \\ L_{ii} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$
where Pij is the number of edges ek ~ vi,vj between the vertices (atoms) vi and vj, and Lii is the number of loops in vi. Mathematically, a pseudograph can be defined in the following way [38,39]: Let V be a finite non-empty set and E an unordered finite set of pairs of elements of V (equal pairs in E included); the pairs G = <V, E> are called graphs with loops and multiple edges, or pseudographs.
The off-diagonal elements aij (aij = Pij) of this matrix represent the bonds between an atom vi and another atom vj, and the matrix Mk provides the number of walks of length k that link the vertices vi and vj. Each edge represents the 2 electrons of a covalent bond between atoms vi and vj, so for a single bond the entries aij and aji of the M (k=1) matrix are both equal to 1. Under this convention, the benzene molecule can be represented by two different multigraphs, each related to one of the Kekulé structures. A pseudograph is therefore needed to avoid this ambiguity in compounds with more than one canonical structure. This is the case for aromatic compounds such as pyridine, naphthalene, quinoline, etc., where the electrons of the π-orbitals are represented as loops on all ring atoms.
Aromatic rings with only one canonical structure, such as furan, thiophene, pyrrole, etc., are represented as a multigraph. This explanation is illustrated in Scheme 1 and in Table 1. As can be observed, for the benzene molecule the total quadratic indices (without considering hydrogen atoms) calculated using the two multigraph matrices (connectivity matrices) have the same values. However, some molecules such as acetylsalicylic acid show differences in the total and local (heteroatoms and H-bonding heteroatoms) quadratic indices obtained from each multigraph (Scheme 1, MKA and MKB). The number of possible multigraph representations increases with the number of rings having more than one canonical structure.
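The benzene case can be checked numerically. The sketch below builds the two Kekulé multigraphs and the pseudograph (single ring bonds plus one loop per ring atom) and evaluates q_k(x) = XᵗMᵏX for each. The helper names (`adjacency`, `matmul`, `quadratic_index`) are illustrative assumptions, and the atom numbering around the ring is arbitrary.

```python
def adjacency(n, bonds, loops=()):
    """Symmetric adjacency matrix from (i, j, multiplicity) bonds and loop vertices."""
    m = [[0] * n for _ in range(n)]
    for i, j, mult in bonds:
        m[i][j] += mult
        m[j][i] += mult
    for v in loops:
        m[v][v] += 1
    return m

def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def quadratic_index(m, x, k):
    """q_k(x) = X^t M^k X, with M^0 taken as the identity matrix."""
    n = len(m)
    p = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        p = matmul(p, m)
    return sum(p[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

ring = [(i, (i + 1) % 6) for i in range(6)]
# Kekulé structure A: double bonds on ring edges 0, 2, 4; structure B: on edges 1, 3, 5.
kekule_a = adjacency(6, [(i, j, 2 if idx % 2 == 0 else 1)
                         for idx, (i, j) in enumerate(ring)])
kekule_b = adjacency(6, [(i, j, 2 if idx % 2 == 1 else 1)
                         for idx, (i, j) in enumerate(ring)])
# Pseudograph: single ring bonds plus one loop on every ring atom.
pseudo = adjacency(6, [(i, j, 1) for i, j in ring], loops=range(6))

x = [2.63] * 6  # carbon Mulliken electronegativity for all six atoms
indices_a = [quadratic_index(kekule_a, x, k) for k in range(5)]
indices_b = [quadratic_index(kekule_b, x, k) for k in range(5)]
```

In this symmetric case every row of all three matrices sums to 3, so the pseudograph gives the same totals as the two multigraphs. The ambiguity only becomes visible in less symmetric molecules, where the two Kekulé matrices are no longer equivalent.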
On the other hand, from the expression of qk(x) the following considerations arise in a natural way: 1) the coefficients aij form the square matrix M = [aij] of order n, and 2) let X = [x1, x2, x3, ..., xn] be the vector of coordinates of X in the base {a1, ..., an}, a matrix with n rows and a single column; transposing this matrix gives Xt = [x1 x2 ... xn], the row vector of the coordinates of X in the base {a1, ..., an}. Then q(x) can be written as the matrix product q(x) = XtMX. Recently, other descriptors have been expressed through the vector-matrix-vector multiplication procedure [42]. The result of this matrix multiplication is a matrix with one row and one column, that is, a number. If we use the canonical bases, the coordinates of any molecular vector (X) coincide with the components of that vector. For that reason, those coordinates can be considered as weights (atom labels) of the vertices of the molecular pseudograph, since the components of the vector are values of some atomic property that characterizes each kind of atom.
Scheme 1. Graphical representation of some molecules using “multigraphs” and “pseudographs”.
Table 1. Total and Local Quadratic Indices Calculated for Multigraphs (MKA, MKB) and Pseudographs (P).
Acetylsalicylic acid
If we make M the matrix of paths of length k (Mk) among n vertices of the molecular pseudograph and we multiply it by the coordinates of molecular vector (X) in the canonical basis of ℜn, we obtain k values that constitute numeric descriptors of the molecular structure. Therefore we can “define” a molecule as quadratic indices (q(x)’s) in the matrix form XtMkX = qk(x), k ≥ 10.
From the given definitions of M and qk(x) it can be observed that the total quadratic indices are positive integers. The data presented in Table 2 exemplifies the calculation of five quadratic indices for isonicotinic acid.
In any case, if a complete series of indices is considered, a specific characterization of the chemical structure is obtained, which is not repeated in any other molecule. The generalization of the matrices and descriptors to “superior analogues” is necessary for the evaluation of situations where one descriptor is unable to bring a good structural characterization [26].
Table 2. Definition and Calculation of Five (k=0-4) Quadratic Indices of the Molecular Pseudograph’s Atom Adjacency Matrix of the Isonicotinic Acid Molecule.
Isonicotinic acid: molecular pseudograph (G) (hydrogen-suppressed pseudograph)
X = [N1 C2 C3 C4 C5 C6 C7 O8 O9]
Molecular vector: X ∈ ℜ9 and ℜ9 ∈ E; E: molecular vector space

In the definition of X as a molecular vector, the chemical symbol of the element is used to indicate the corresponding electronegativity value. That is, O means χ(O), the Mulliken electronegativity of oxygen, or some other atomic property that characterizes each atom in the molecule. Therefore, if we use the canonical basis of ℜ9, the coordinates of any vector X coincide with the components of that molecular vector.
Xt = [2.33 2.63 2.63 2.63 2.63 2.63 2.63 3.17 3.17]
Xt: transpose of X, i.e. the vector of the coordinates of X in the canonical basis of ℜ9 (a row vector)
X: vector of the coordinates of X in the canonical basis of ℜ9 (a column vector)

M(G): adjacency matrix among the vertices of the molecular pseudograph (G)
$$M(G) = \begin{array}{c|ccccccccc}
 & N1 & C2 & C3 & C4 & C5 & C6 & C7 & O8 & O9 \\ \hline
N1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
C2 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
C3 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
C4 & 0 & 0 & 1 & 1 & 1 & 0 & 1 & 0 & 0 \\
C5 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
C6 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
C7 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 2 & 1 \\
O8 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 \\
O9 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0
\end{array}$$

$$q_0(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} {}^{0}a_{ij} X_i X_j = X^{t} M^{0} X = 67.0281$$
$$q_1(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} {}^{1}a_{ij} X_i X_j = X^{t} M^{1} X = 183.7166$$
$$q_2(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} {}^{2}a_{ij} X_i X_j = X^{t} M^{2} X = 589.963$$
$$q_3(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} {}^{3}a_{ij} X_i X_j = X^{t} M^{3} X = 1784.6905$$
$$q_4(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} {}^{4}a_{ij} X_i X_j = X^{t} M^{4} X = 5707.7232$$
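The calculation in Table 2 can be retraced with a few lines of code. The sketch below enters the isonicotinic acid pseudograph matrix and the electronegativity vector by hand and evaluates q_k(x) = Xᵗ(MᵏX) for k = 0 to 3 by applying M repeatedly to X. The function names are illustrative assumptions and do not come from the TOMO-COMD software.

```python
# Atom order: N1 C2 C3 C4 C5 C6 C7 O8 O9 (hydrogen-suppressed).
X = [2.33, 2.63, 2.63, 2.63, 2.63, 2.63, 2.63, 3.17, 3.17]

# Pseudograph adjacency matrix: loops (diagonal 1) on the six ring atoms,
# a double C7=O8 bond (entry 2) and a single C7-O9 bond.
M = [
    [1, 1, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 2, 1],
    [0, 0, 0, 0, 0, 0, 2, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0],
]

def mat_vec(m, v):
    """Matrix-vector product M v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def total_quadratic_index(m, x, k):
    """q_k(x) = X^t M^k X, computed as X . (M applied k times to X)."""
    y = list(x)
    for _ in range(k):
        y = mat_vec(m, y)
    return sum(a * b for a, b in zip(x, y))

q = [total_quadratic_index(M, X, k) for k in range(4)]
for k, value in enumerate(q):
    print(f"q{k}(x) = {value:.4f}")
```

Applying M to the vector k times avoids forming Mᵏ explicitly, which is a convenience of this sketch rather than something prescribed by the text.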

Local quadratic indices; [qkL(x)]

In the case of quadratic indices it is possible to define analogues of the total quadratic indices that possess similar properties; these are the local quadratic indices of the “molecular pseudograph’s atom adjacency matrix”. The definition of this descriptor, a graph-theoretical invariant for a given fragment FR (connected subgraph) within a specific pseudograph (G), is the following:
$$q_{kL}(x) = \sum_{i=1}^{m} \sum_{j=1}^{m} {}^{k}a_{ijL} X_i X_j$$
where m is the number of atoms of the fragment of interest and kaijL is the element of row “i” and column “j” of the matrix MkL = Mk(G, FR) [qkL(x) = qk(x, FR)]. This matrix is extracted from the Mk matrix and contains the information referring to the vertices of the specific fragment (FR) and also to its molecular environment.
The matrix MkL = [kaijL] with elements kaijL is defined as follows:
$${}^{k}a_{ijL} = \begin{cases} {}^{k}a_{ij} & \text{if both } v_i \text{ and } v_j \text{ are vertices contained in the specific fragment} \\ \tfrac{1}{2}\,{}^{k}a_{ij} & \text{if either } v_i \text{ or } v_j \text{ is contained in the specific fragment, but not both} \\ 0 & \text{otherwise} \end{cases}$$
with kaij being the elements of the k-th power of M. These local analogues can also be expressed in matrix form by the expression:
qkL(x) = Xt MkL X, where MkL is extracted from Mk.
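The three-case rule translates directly into code. The sketch below applies it to the first-order matrix of but-3-en-2-ol listed in Table 3; the function name `local_matrix` and the 0-based atom indices (C1, C2, C3, O4, C5 mapped to 0 to 4) are assumptions made for illustration. The fragment memberships are read off the zero-order local matrices in Table 3.

```python
def local_matrix(mk, fragment):
    """Extract the local matrix Mk(G, F_R) from Mk.

    An entry keeps full weight when both indices lie in the fragment,
    half weight when exactly one does, and is zeroed otherwise.
    """
    n = len(mk)
    weights = {0: 0.0, 1: 0.5, 2: 1.0}
    return [[mk[i][j] * weights[(i in fragment) + (j in fragment)]
             for j in range(n)] for i in range(n)]

# First-order matrix of but-3-en-2-ol from Table 3 (order: C1 C2 C3 O4 C5).
M1 = [
    [0, 2, 0, 0, 0],
    [2, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
F1, F2, F3 = {3}, {2, 4}, {0, 1}   # O4 | C3, C5 | C1, C2

local_F1 = local_matrix(M1, F1)
local_F2 = local_matrix(M1, F2)
local_F3 = local_matrix(M1, F3)
```

Because every entry is either counted once in full or split in two halves between the fragments that share it, the local matrices of a complete partition add back up to the original matrix.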
As can be seen, if a molecule is partitioned into Z molecular fragments, the matrix Mk can be partitioned into Z local matrices MkL, L = 1, ..., Z. The k-th power of the matrix M is exactly the sum of the Z local matrices of order k:
$$M^{k} = \sum_{L=1}^{Z} M^{k}_{L}$$
or, equivalently, Mk = [kaij], where:
$${}^{k}a_{ij} = \sum_{L=1}^{Z} {}^{k}a_{ijL}$$
and consequently, the total quadratic index of order k can be expressed as the sum of the local quadratic indices of the same order for the Z fragments FR:
$$q_k(x) = \sum_{L=1}^{Z} q_{kL}(x)$$
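The additivity of the indices is a one-line consequence of the matrix decomposition and the linearity of the matrix product. Spelled out (a routine step, using the matrix form of the local indices given above):

```latex
q_k(x) \;=\; X^{t} M^{k} X
       \;=\; X^{t}\Bigl(\sum_{L=1}^{Z} M^{k}_{L}\Bigr) X
       \;=\; \sum_{L=1}^{Z} X^{t} M^{k}_{L} X
       \;=\; \sum_{L=1}^{Z} q_{kL}(x)
```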
Any local quadratic index has a particular meaning, especially for the first values of k, where the information about the structure of the fragment FR is contained. High values of k are in relation to the environment information of the fragment FR considered inside the molecular pseudograph (G). A general equation for k order is described as follows:
$$q_{kL}(x) = \sum_{i} {}^{k}a_{iiL} X_i^{2} + 2 \sum_{i<j} {}^{k}a_{ijL} X_i X_j$$
In a similar way to the total analogues, the complete series of indices gives a unique characterization of the chemical structure fragment, which carries information not only about the fragment under study but also about its molecular environment. These local indices can also be used together with total indices as variables of QSAR and QSPR models for properties or activities that depend more on a region or fragment than on the whole molecule.

Calculation of total and local quadratic indices

Let us now consider the molecule of 1-methylallyl alcohol (but-3-en-2-ol) and its labelled molecular “pseudograph” and atom adjacency matrix as a simple example. The zero, first and second powers of this matrix and local matrices of these orders of each one of the three fragments shown in the molecule are given in Table 3.
The quadratic indices of the “molecular pseudograph’s atoms adjacency matrix” are calculated in the following way:
Total and local indices of zero order [q0(x) and q0L(x)]. These indices are obtained when the matrix M is raised to the power 0 (k=0). A matrix raised to the power 0 is the identity matrix (I), whose only non-zero elements are aii = 1 [M0(i, i) = 1]. Since the zero-order matrix is diagonal, its quadratic form contains only the terms with the squares of the coordinates (an atomic property) of the X vector in the canonical basis. In general, we can establish that:
$$q_0(x) = \sum_{i=1}^{n} X_i^{2}$$
$$q_{0L}(x) = \sum_{i=1}^{m} X_i^{2}$$
where n and m are the number of atoms in the molecule or in the fragment FR under study, respectively.
The total quadratic index of zero order is obtained by the matrix product q0(x) = XtM0X, and the local quadratic indices of zero order for each of the three represented fragments are calculated using the three local matrices as the matrix of the quadratic form. Multiplying by the row matrix (Xt) and by the column matrix (X) gives the three local molecular quadratic indices, one for each fragment (see Table 3): q0(x, F1) = 1·(XO4)² = 1·(3.17)² = 10.0489; q0(x, F2) = 1·(XC3)² + 1·(XC5)² = 1·(2.63)² + 1·(2.63)² = 13.8338 and q0(x, F3) = 1·(XC1)² + 1·(XC2)² = 1·(2.63)² + 1·(2.63)² = 13.8338. It should be noted that q0(x, G) = q0(x, F1) + q0(x, F2) + q0(x, F3) = 1·(XC1)² + 1·(XC2)² + 1·(XC3)² + 1·(XO4)² + 1·(XC5)² = 1·(2.63)² + 1·(2.63)² + 1·(2.63)² + 1·(3.17)² + 1·(2.63)² = 37.7165 and that M0(G) = M0(G, F1) + M0(G, F2) + M0(G, F3).
The local quadratic index q0L(x) contains information about the fragment under study, without regard to which atom(s) it is bonded to, since the ones on the main diagonal express that a path of length 0 is a single vertex. That is to say, the sub-graphs of zero order consist of isolated vertices. This index carries information about the molecular size of the fragment and depends on the number and type of atoms contained in the fragment under study.
Total and local quadratic indices of first order [q1(x) and q1L(x)]. These indices are obtained when the matrix M is raised to the unit power (M1= M) and multiplied by the matrices Xt and X. We can write the expression for q1(x) and q1L(x) in the forms:
$$q_1(x) = \sum_{i} a_{ii} X_i^{2} + 2 \sum_{i<j} a_{ij} X_i X_j$$
$$q_{1L}(x) = \sum_{i} a_{iiL} X_i^{2} + 2 \sum_{i<j} a_{ijL} X_i X_j$$
The total quadratic index of first order is: q1(x) = 4·(XC1·XC2) + 2·(XC2·XC3) + 2·(XC3·XO4) + 2·(XC3·XC5) = 4·(2.63·2.63) + 2·(2.63·2.63) + 2·(2.63·3.17) + 2·(2.63·2.63) = 72.0094. To obtain the local analogues, we extract the “partitioned” matrix for each one of the fragments (see Table 3). Carrying out the matrix product we get: q1(x, F1) = 1·(XC3·XO4) = 1·(2.63·3.17) = 8.3371; q1(x, F2) = 1·(XC2·XC3) + 1·(XC3·XO4) + 2·(XC3·XC5) = 1·(2.63·2.63) + 1·(2.63·3.17) + 2·(2.63·2.63) = 29.0878 and q1(x, F3) = 4·(XC1·XC2) + 1·(XC2·XC3) = 4·(2.63·2.63) + 1·(2.63·2.63) = 34.5845. It should be observed that q1(x, G) = q1(x, F1) + q1(x, F2) + q1(x, F3) and that M1(G) = Σ(R=1..3) M1(G, FR).
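The first-order values can also be obtained straight from a bond list, without building any matrix: each bond of multiplicity p between atoms a and b contributes 2·p·Xa·Xb to the total, and that contribution is shared in full, in half, or not at all with a fragment depending on how many of its two atoms belong to the fragment. The sketch below uses this reading of the formulas above; the data structures and function names are illustrative assumptions.

```python
# But-3-en-2-ol, hydrogen-suppressed, with Mulliken electronegativities.
X = {"C1": 2.63, "C2": 2.63, "C3": 2.63, "O4": 3.17, "C5": 2.63}
BONDS = [("C1", "C2", 2), ("C2", "C3", 1), ("C3", "O4", 1), ("C3", "C5", 1)]
LOOPS = {}  # atom -> number of loops; empty here because no atom is aromatic

def q1_total(x, bonds, loops):
    """First-order total index: loop terms plus twice every bond term."""
    diagonal = sum(count * x[a] ** 2 for a, count in loops.items())
    off_diagonal = sum(2 * mult * x[a] * x[b] for a, b, mult in bonds)
    return diagonal + off_diagonal

def q1_local(x, bonds, loops, fragment):
    """First-order local index for a set of atom labels."""
    total = sum(count * x[a] ** 2 for a, count in loops.items() if a in fragment)
    for a, b, mult in bonds:
        share = ((a in fragment) + (b in fragment)) / 2  # 1, 0.5 or 0
        total += share * 2 * mult * x[a] * x[b]
    return total

total = q1_total(X, BONDS, LOOPS)
local = {
    "F1": q1_local(X, BONDS, LOOPS, {"O4"}),
    "F2": q1_local(X, BONDS, LOOPS, {"C3", "C5"}),
    "F3": q1_local(X, BONDS, LOOPS, {"C1", "C2"}),
}
```

The three local values add up to the total, matching the additivity noted in the text.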
As can be seen, this index carries information not only about the fragment FR of interest, but also about the atoms to which this fragment is connected in one step (by means of a walk of length 1). Its formulation shows that this index is capable of differentiating between saturated and unsaturated sub-structures (fragments) inside a molecular pseudograph (molecule). Two sub-graphs will have the same value if and only if both fragments have the same composition, the same topological arrangement among the atoms that constitute them, and are connected by a path of length 1 (in one step) to the same atoms outside the fragment.
Total and local quadratic indices of second order [q2(x) and q2L(x)]. In general, these indices are calculated as:
$$q_2(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} {}^{2}a_{ij} X_i X_j$$
$$q_{2L}(x) = \sum_{i=1}^{m} \sum_{j=1}^{m} {}^{2}a_{ijL} X_i X_j$$
As can be observed, obtaining this index requires the matrices M2, which are given in Table 3. Carrying out the matrix product in the four cases (the total and the three local ones) we obtain:
q2(x, G) = 4·(XC1)² + 5·(XC2)² + 3·(XC3)² + 1·(XO4)² + 1·(XC5)² + 4·(XC1·XC3) + 2·(XC2·XO4) + 2·(XC2·XC5) + 2·(XO4·XC5) = 4·(2.63)² + 5·(2.63)² + 3·(2.63)² + 1·(3.17)² + 1·(2.63)² + 4·(2.63·2.63) + 2·(2.63·3.17) + 2·(2.63·2.63) + 2·(3.17·2.63) = 174.8184;
q2(x, F1) = 1·(XC2·XO4) + 1·(XO4·XC5) + 1·(XO4)² = 1·(2.63·3.17) + 1·(3.17·2.63) + 1·(3.17)² = 26.7231;
q2(x, F2) = 2·(XC1·XC3) + 1·(XC2·XC5) + 1·(XO4·XC5) + 3·(XC3)² + 1·(XC5)² = 2·(2.63·2.63) + 1·(2.63·2.63) + 1·(3.17·2.63) + 3·(2.63)² + 1·(2.63)² = 56.7554, and
q2(x, F3) = 2·(XC1·XC3) + 1·(XC2·XO4) + 1·(XC2·XC5) + 4·(XC1)² + 5·(XC2)² = 2·(2.63·2.63) + 1·(2.63·3.17) + 1·(2.63·2.63) + 4·(2.63)² + 5·(2.63)² = 91.3399.
It is easy to prove that q2(x, G) = q2(x, F1) + q2(x, F2) + q2(x, F3) and that M2(G) = Σ(R=1..3) M2(G, FR).
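The second-order check can be sketched as follows: M² is formed by an ordinary matrix product, and the local indices are obtained by weighting each entry by 1, 1/2 or 0 according to the local-matrix rule, which is equivalent to building the local matrices explicitly. The 0-based indices and the function name `q_weighted` are assumptions made for illustration.

```python
X = [2.63, 2.63, 2.63, 3.17, 2.63]      # C1 C2 C3 O4 C5
M = [
    [0, 2, 0, 0, 0],
    [2, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
FRAGMENTS = {"F1": {3}, "F2": {2, 4}, "F3": {0, 1}}  # O4 | C3, C5 | C1, C2

n = len(M)
# Entry (i, j) of M2 counts the walks of length 2 between atoms i and j.
M2 = [[sum(M[i][t] * M[t][j] for t in range(n)) for j in range(n)]
      for i in range(n)]

def q_weighted(mk, x, fragment=None):
    """X^t Mk X; with a fragment, entries are weighted as in the local matrix."""
    total = 0.0
    for i in range(len(mk)):
        for j in range(len(mk)):
            if fragment is None:
                w = 1.0
            else:
                w = ((i in fragment) + (j in fragment)) / 2
            total += w * mk[i][j] * x[i] * x[j]
    return total

q2_total = q_weighted(M2, X)
q2_local = {name: q_weighted(M2, X, frag) for name, frag in FRAGMENTS.items()}
```

Under these assumptions the computed M² coincides with the matrix listed in Table 3, and the local values for F2 and F3 agree with the figures quoted above.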
Table 3. The Zero, First and Second Powers of the Molecular “pseudograph’s” Atom Adjacency Matrix and Local Matrices for These Order of Each One of 3 Fragments Shown in the Molecule of 1-methylallyl alcohol (but-3-en-2-ol).
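Molecular structure of 1-methylallyl alcohol (but-3-en-2-ol)
X = [C1 C2 C3 O4 C5]; molecular vector: X ∈ ℜ5 and ℜ5 ∈ E; E: molecular vector space
In the definition of X as a molecular vector, the chemical symbol of the element is used to indicate the corresponding electronegativity value. That is, O means χ(O), the Mulliken electronegativity of oxygen, or some other atomic property that characterizes each atom in the molecule. Therefore, if we use the canonical basis of ℜ5, the coordinates of any molecular vector X coincide with the components of that molecular vector.
Xt = [2.63 2.63 2.63 3.17 2.63]
Xt: transpose of X, i.e. the vector of the coordinates of X in the canonical basis of ℜ5 (a row vector)
X: vector of the coordinates of X in the canonical basis of ℜ5 (a column vector)

The zero, first and second powers of the molecular “pseudograph’s” total atom adjacency matrix:
$$M^{0}(G) = I(G) = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \quad
M^{1}(G) = \begin{pmatrix} 0 & 2 & 0 & 0 & 0 \\ 2 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix} \quad
M^{2}(G) = \begin{pmatrix} 4 & 0 & 2 & 0 & 0 \\ 0 & 5 & 0 & 1 & 1 \\ 2 & 0 & 3 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 1 \end{pmatrix}$$

The zero, first and second powers of the molecular “pseudograph’s” local atom adjacency matrix for each of the 3 fragments shown in the molecule of 1-methylallyl alcohol:
$$M^{0}(G, F_1) = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \quad
M^{1}(G, F_1) = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{2} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \quad
M^{2}(G, F_1) = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 & 1 & \tfrac{1}{2} \\ 0 & 0 & 0 & \tfrac{1}{2} & 0 \end{pmatrix}$$
$$M^{0}(G, F_2) = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \quad
M^{1}(G, F_2) = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \tfrac{1}{2} & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 & \tfrac{1}{2} & 1 \\ 0 & 0 & \tfrac{1}{2} & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix} \quad
M^{2}(G, F_2) = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & \tfrac{1}{2} \\ 1 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} & 0 & \tfrac{1}{2} & 1 \end{pmatrix}$$
$$M^{0}(G, F_3) = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \quad
M^{1}(G, F_3) = \begin{pmatrix} 0 & 2 & 0 & 0 & 0 \\ 2 & 0 & \tfrac{1}{2} & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \quad
M^{2}(G, F_3) = \begin{pmatrix} 4 & 0 & 1 & 0 & 0 \\ 0 & 5 & 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ 1 & 0 & 0 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 & 0 & 0 \end{pmatrix}$$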

The TOMO-COMD software

The calculation of total and local quadratic indices for any organic molecule was implemented with the TOMO-COMD software [36]. This software has a graphical interface that makes it user friendly for medicinal chemists. The input of the chemical structure is by directly drawing the molecular pseudograph using the software’s drawing mode. This procedure is carried out by a selection of the active atom symbols belonging to different groups of the periodic table. The multiple edges and loops are edited with a right mouse click. Afterwards, in the calculation mode, one should select the atomic property and the family descriptor before calculating the molecular indices. In this work, we used the Mulliken electronegativity as an example of an atomic property [37]. The descriptors calculated were the following:
qk(x) and qkH(x) are the k-th total quadratic indices calculated using the k-th power of the matrices [Mk(G) or Mk(GH)] of the molecular pseudograph (G), not considering and considering hydrogen atoms, respectively.
EqkL(x) [or EqkLH(x)] and HqkL(x) are the k-th local quadratic indices calculated using the k-th power of the local matrices [MkL(G, FR)] of the molecular pseudograph (G), not considering (or considering) hydrogen atoms, for heteroatoms (S, N, O) and for hydrogen-bonding heteroatoms, respectively.

Physical properties data sets for QSPR studies

To test the ability of the set of the total and local quadratic indices to predict molecular physical properties, the following four series have been investigated (three of which have been previously investigated by other “topological” procedures):
74 alkanes (Table 4) with seven representative physical properties: boiling point (Bp), molar volume at 20 °C (MV), molar refraction at 20 °C (MR), heat of vaporization at 25 °C (HV), critical temperature (TC), critical pressure (PC), and surface tension at 20 °C (ST) [43];
58 alkyl alcohols with Bp data (Table 8, Table 9 and Table 10) [44];
106 cycloalkanes, including polycycles and spiroalkanes with Bp data (Table 12 and Table 13) [25];
Bp data of 95 structurally diverse compounds belonging to several chemical groups, but all containing in their structure some aromatic rings (Table 14 and Table 15) [45,46].
Table 4. Quadratic Indices of the “Molecular Pseudograph’s Atom Adjacency Matrix” for C3-C9 Alkanes.

Data analysis

The statistical analyses were carried out with the STATISTICA software package [47]. Linear multiple regression analysis (LMR) was used to obtain quantitative models that relate the structures and physical properties of organic compounds. The quality of the models was determined by examining the statistical parameters of the multivariable regression and the results of cross-validation procedures [leave-one-out and leave-group (5-fold)-out]. In recent years, the leave-one-out (LOO) PRESS statistics (e.g., q²) have been used as a means of indicating predictive ability. Many authors consider high q² values (for instance, q² > 0.5) as an indicator or even as the ultimate proof of the high predictive power of a QSAR model. In a recent paper, Golbraikh and Tropsha demonstrated that a high value of LOO q² appears to be a necessary but not a sufficient condition for a model to have high predictive power [48]. A more exhaustive cross-validation method can be used in which a fraction of the data (10-20%) is left out and predicted from a model based on the remaining data. This process (leave-group-out, LGO) is repeated until each observation has been left out at least once [49,50]. In the present paper, each investigated data set was split randomly into five groups of approximately the same size (20%). Each group was left out (LGO) and then predicted by a model developed from the remaining observations (80% of the data). This process was carried out five times on five unique subsets. In this way, every observation was left out once, in groups of 20%, and its value predicted. The mean absolute error (MAE) of each of the five groups is used as the criterion for assessing model quality. The overall (average) MAE of the 5-fold cross-validation procedure (20% left out each time) can be taken as good confirmation of the predictive quality of the model.
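The two cross-validation statistics described above can be illustrated with a short sketch. It assumes NumPy and ordinary least squares; it does not reproduce STATISTICA's routines, and the function names (`fit_ols`, `predict`, `loo_q2`, `lgo_mae`) are hypothetical. Here q² is computed as 1 − PRESS/SS, where SS is the sum of squared deviations of the observed values from their mean, and the overall LGO MAE is taken as the unweighted mean of the five group MAEs.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) for y ~ X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Predicted values for the rows of the 2-D array X."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def loo_q2(X, y):
    """Leave-one-out q2 = 1 - PRESS / total sum of squares."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i          # drop observation i
        coef = fit_ols(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def lgo_mae(X, y, n_groups=5, seed=0):
    """Leave-group-out MAE for each random group, plus their mean."""
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(len(y)), n_groups)
    maes = []
    for held_out in groups:
        keep = np.ones(len(y), dtype=bool)
        keep[held_out] = False            # fit on the other groups
        coef = fit_ols(X[keep], y[keep])
        errors = y[held_out] - predict(coef, X[held_out])
        maes.append(float(np.mean(np.abs(errors))))
    return maes, float(np.mean(maes))
```

With `n_groups=5`, each model is fitted on roughly 80% of the observations and evaluated on the remaining 20%, so every observation is predicted exactly once.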
In addition, to assess the robustness and predictive power of the models found, external prediction (test) sets were also used. This type of model validation is very important, considering that the predictive ability of a QSAR model can only be estimated using an external test set of compounds that were not used for building the model [48].

QSPR applications

The objective is to show, as directly as possible, that the total and local quadratic indices described in the previous section can predict molecular physical properties in a QSPR analysis. To this end, we seek a quantitative relation between a property P and the quadratic indices of M having, for instance, the following form:
$$P = a_0\,q_0(x) + a_1\,q_1(x) + a_2\,q_2(x) + \cdots + a_k\,q_k(x) + c$$
where P is the measured property, qk(x) [or qkL(x)] is the k-th total [or local] quadratic index, the ak’s are the coefficients obtained by the linear regression analysis, and c is the constant term.
Taking into consideration another of Randić’s attributes, it is convenient that candidates for molecular descriptors have good correlations with at least one physical property [26]. In the present work we have selected physical properties of several data sets of organic compounds. The first data set is formed by 74 alkanes. The values of the total quadratic indices for such molecules are presented in Table 4. The alkanes represent an especially attractive class of compounds as a starting point for the application of molecular modeling techniques, because many alkane properties vary in a regular manner according to molecular mass and extent of branching. Besides, the alkanes are nonpolar and a number of complexities that arise with more polar compounds are thus avoided [43].
The best linear regression models for seven representative physical properties of alkanes were obtained by a forward stepwise procedure; the equations and their statistical parameters are presented in Table 5. In this Table, R is the multiple correlation coefficient, s is the standard deviation of the regression, q² is the squared multiple correlation coefficient of the LOO cross-validation procedure, MAE is the (average) mean absolute error of the LGO cross-validation procedure, F is the Fisher ratio at the 95% confidence level, and the p-value is the significance level.
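For readers unfamiliar with forward stepwise selection, the idea can be sketched as follows. The text does not state the entry criterion used in STATISTICA, so this NumPy sketch assumes a simple partial F-to-enter threshold; the names `rss` and `forward_stepwise` and the default threshold of 4.0 are hypothetical and purely illustrative.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of the least-squares fit y ~ 1 + X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, f_to_enter=4.0):
    """Add, one at a time, the column that lowers the RSS most,
    stopping when its partial F statistic falls below f_to_enter."""
    n, p = X.shape
    selected = []
    current = rss(X[:, selected], y)      # intercept-only model
    while len(selected) < p:
        best_j, best_rss = None, current
        for j in range(p):
            if j in selected:
                continue
            trial = rss(X[:, selected + [j]], y)
            if trial < best_rss:
                best_j, best_rss = j, trial
        dof = n - len(selected) - 2       # residual d.o.f. after entry
        if best_j is None or dof <= 0 or best_rss <= 0.0:
            break
        f_stat = (current - best_rss) / (best_rss / dof)
        if f_stat < f_to_enter:
            break
        selected.append(best_j)
        current = best_rss
    return selected
```

The returned list contains the column indices of the selected descriptors in order of entry; in a QSPR setting the columns of `X` would be the candidate quadratic indices (and their products) and `y` the physical property.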
Table 5. Multiple Regression Equation for Physical Properties Using the Quadratic Indices of the Molecular Pseudograph’s Atom Adjacency Matrix.
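$$\mathrm{Bp}\ (^{\circ}\mathrm{C}) = -204.184(\pm 3.262) + 1.44048(\pm 0.026)\,q_{1}^{H}(x) - 9.29\times10^{-3}(\pm 0.427\times10^{-3})\,q_{0}(x)\,q_{2}(x) + 2.91\times10^{-7}(\pm 1.75\times10^{-8})\,q_{0}(x)\,q_{13}(x) - 0.11678(\pm 0.028)\,q_{2}(x)$$

N = 74, R = 0.9988, q² = 0.9970, F(4,69) = 7068.1, s = 2.35, MAE = 2.11, p < 0.0000

$$\mathrm{MV}\ (\mathrm{cm}^{3}) = 39.72(\pm 2.441) + 0.7651(\pm 0.031)\,q_{0}^{H}(x) - 4.4\times10^{-7}(\pm 1.08\times10^{-7})\,q_{15}(x) + 4.634\times10^{-3}(\pm 0.214\times10^{-3})\,q_{0}(x)\,q_{2}(x) - 1.74\times10^{-3}(\pm 0.132\times10^{-3})\,q_{0}(x)\,q_{3}(x)$$

N = 69, R = 0.9991, q² = 0.9973, F(4,69) = 8916.5, s = 0.75, MAE = 0.53, p < 0.0000

$$\mathrm{MR}\ (\mathrm{cm}^{3}) = 3.2327(\pm 0.048) + 1.734\times10^{-2}(\pm 4.71\times10^{-5})\,q_{3}^{H}(x) - 0.01012(\pm 0.302\times10^{-3})\,q_{3}(x) + 7.486\times10^{-3}(\pm 0.836\times10^{-3})\,q_{2}(x)$$

N = 69, R = 0.9999, q² = 0.9999, F(3,65) = 2.52×10⁵, s = 0.049, MAE = 0.0322, p < 0.00

$$\mathrm{HV}\ (\mathrm{kJ/mol}) = -1.35607(\pm 0.327) + 0.07648(\pm 0.001)\,q_{2}^{H}(x) - 0.1309(\pm 0.004)\,q_{2}(x) + 1.19\times10^{-5}(\pm 9.3\times10^{-7})\,q_{11}(x)$$

N = 69, R = 0.998, q² = 0.9955, F(3,65) = 5469.5, s = 0.34, MAE = 0.32, p < 0.0000

$$\mathrm{TC}\ (^{\circ}\mathrm{C}) = -71.6809(\pm 6.373) + 0.2399(\pm 0.007)\,q_{3}^{H}(x) - 0.02165(\pm 0.001)\,q_{0}(x)\,q_{2}(x) + 0.83\times10^{-3}(\pm 6.01\times10^{-5})\,q_{0}(x)\,q_{5}(x)$$

N = 74, R = 0.9953, q² = 0.9892, F(3,70) = 2460.1, s = 5.66, MAE = 5.34, p < 0.0000

$$\mathrm{PC}\ (\mathrm{atm}) = 54.7074(\pm 0.786) - 6.998\times10^{-3}(\pm 0.265\times10^{-3})\,q_{4}^{H}(x) + 5.95\times10^{-4}(\pm 3.72\times10^{-5})\,q_{0}(x)\,q_{3}(x)$$

N = 74, R = 0.9803, q² = 0.9575, F(2,71) = 878.64, s = 0.86, MAE = 0.64, p < 0.0000

$$\mathrm{ST}\ (\mathrm{dyn/cm}) = -3.49402(\pm 1.097) + 0.04848(\pm 0.001)\,q_{2}^{H}(x) - 0.00163(\pm 0.122\times10^{-3})\,q_{0}(x)\,q_{2}(x) + 1.21\times10^{-5}(\pm 5.15\times10^{-7})\,q_{0}(x)\,q_{7}(x)$$

N = 68, R = 0.9892, q² = 0.9734, F(4,63) = 722.14, s = 0.29, MAE = 0.23, p < 0.0000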
Table 6. Statistical Parameters for the Models Describing Physical Properties of Alkanes by Using Connectivity Indices, ad hoc Descriptors, Spectral Moments of Edge-Adjacency Matrix and Quadratic Indices of the Molecular Pseudograph’s Atom Adjacency Matrix.
| Connectivity Indices | ad hoc Descriptors | Moments of E Matrix | Quadratic Indices of M Matrix |
a Number of Variables in QSPR Models.
As can be observed from the statistical parameters of the regression equations in Table 5, most of the physical properties are well accounted for by quadratic indices of the “molecular pseudograph’s atom adjacency matrix”. In Table 6 we show the statistical parameters of the best regression equations obtained by Needham et al. [43] using connectivity indices and ad hoc descriptors and by Estrada [23] using spectral moments of edge-adjacency matrix in a molecular graph.
In this sense, the QSPR models obtained by using quadratic indices contain fewer variables (parsimony principle) than the equations obtained by Needham et al. and Estrada with molecular modeling techniques. Nevertheless, Table 6 shows that the statistical parameters of the equations obtained with quadratic indices are similar to those obtained in previous studies [23,43]. For most properties, the accuracies of the models are sufficient for many practical purposes.
Secondly, we chose the group of 58 alkyl alcohols whose Bp were studied by Randić and Basak [51] and later by Krenkel et al. [44]; this data set has been used in several QSPR/QSAR studies [52,53,54,55,56].
Using LMR analysis, two QSPR equations were obtained. Eq. 26 was obtained using the complete set, as in the work of Randić and Basak, and Eq. 27 was obtained using as a training set the same 29 compounds that Krenkel et al. used. Therefore, in the second case the compounds were split into two equivalent subsets: 1) a training set, which consists of molecules 1, 2, 3, 4, 6, 8, 9, 11, 14, 16, 18, 20, 22, 26, 27, 29, 34, 35, 37, 39, 41, 44, 45, 48, 49, 52, 53, 56 and 58 of Table 9, and 2) a test set, which includes the remaining molecules (5, 7, 10, 12, 13, 15, 17, 19, 21, 23, 24, 25, 28, 30, 31, 32, 33, 36, 38, 40, 42, 43, 46, 47, 50, 51, 54, 55 and 57). The models obtained are given below, and the statistical parameters of the regression equations (Eqs. 26 and 27) are given in Table 7. These values have also been included for the equations reported by Randić-Basak and Krenkel et al. (see Table 7 in reference 48 and Table 2 in reference 44). The observed Bp, the values calculated with Eqs. 26 and 27, and their residuals, as well as those obtained in previous studies, are given in Table 8, Table 9 and Table 10.
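$$\mathrm{Bp}\ (^{\circ}\mathrm{C}) = 34.16625(\pm 2.696) + 0.26497(\pm 0.0111)\,q_{2}^{H}(x) - 0.29237(\pm 0.045)\,q_{2}(x) - 78.0818\times10^{-5}(\pm 9.932\times10^{-5})\,{}^{E}q_{9L}^{H}(x) \qquad (26)$$

$$\mathrm{Bp}\ (^{\circ}\mathrm{C}) = 461.7348(\pm 30.20806) + 0.092098(\pm 0.002)\,q_{3}^{H}(x) - 0.0175226(\pm 0.001)\,q_{6}(x) - 10.266162(\pm 0.707)\,{}^{E}q_{2L}^{H}(x) + 10.956280\times10^{-5}(\pm 1.32\times10^{-5})\,{}^{E}q_{14L}(x) \qquad (27)$$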
The squared correlation coefficients (R²) for Eqs. 26 and 27 were 0.9877 and 0.9977, respectively. Therefore, these models explained more than 98% and 99% of the variance of the experimental Bp values, respectively [57,58].
Table 7. Statistical Parameters Corresponding to the Regression Equations.
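| Equation | Set | Correlation Coefficient (R) | Standard Error (S) | Fisher ratio (F) | Average Deviation |
|---|---|---|---|---|---|
| Eq. 26 | Complete | 0.9938 | 4.006 | 1446.9 | 2.82 |
| Randić and Basak /48/ | Complete | 0.9938 | 4.039 | 2193 | 2.90 |
| Eq. 27 | Training | | | | |
| Eq. 11 /44/ | Training | | | | |
| Eq. 12 /44/ | Training | | | | |
| Eq. 13 /44/ | Training | | | | |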
In order to assess the predictability of the models found, a LOO cross-validation was carried out. Using this approach, models 26 and 27 had cross-validation squared correlation coefficients (q²) of 0.986 and 0.992, respectively.
In the LGO cross-validation procedure carried out for a more exhaustive validation of Eq. 26 (Eq. 27), the mean absolute errors for the five groups (used in each case) were as follows: MAE = 3.202, 3.053, 3.461, 4.849 and 4.555 °C (MAE = 1.579, 1.728, 2.674, 3.546 and 3.375 °C). The overall MAE were 3.824 °C and 2.580 °C for models 26 and 27, respectively. For a 20% full leave-out cross-validation procedure, this level of MAE is good confirmation of the predictive quality of the models developed.
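As a quick arithmetic check, and assuming that the overall MAE is the unweighted mean of the five group values (an assumption that reproduces the reported figures), the overall values follow directly from the group MAEs:

$$\mathrm{MAE}_{26} = \frac{3.202 + 3.053 + 3.461 + 4.849 + 4.555}{5} = \frac{19.120}{5} = 3.824\ ^{\circ}\mathrm{C}$$

$$\mathrm{MAE}_{27} = \frac{1.579 + 1.728 + 2.674 + 3.546 + 3.375}{5} = \frac{12.902}{5} \approx 2.580\ ^{\circ}\mathrm{C}$$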
On the other hand, the statistical parameters given in Table 7 demonstrate the statistical quality of the models obtained (Eqs. 26 and 27), which is similar to that of the models obtained previously. For example, for the complete series the multiple correlation coefficient (R) of Eq. 26 is similar to the one obtained by Randić and Basak [48]. However, the standard error (s) and the average deviation obtained by us are smaller.
Similarly, there were no significant differences between the model obtained with the other alternative (Eq. 27, developed from the training set) and previously reported results. No statistically significant difference was found, using Student’s t-test, between both models and those reported previously.
In addition, to assess the ability of quadratic indices to describe adequately the chemical structure of molecules that contain cycles, we selected from the literature the Bp of 106 cycloalkanes [25]. The same training and prediction sets as in the original study were used, to make the studies comparable.
Table 8. Experimental and Calculated Bp of Alkyl Alcohols in full Set.
| Alkyl alcohol | Bp exp. (°C) | Bp calc. (Eq. 26) | Δ* | % Δ | Bp calc. Ref. /48/ |
|---|---|---|---|---|---|
| 1. methanol | 64.70 | 65.50 | -0.80 | -1.24 | 65.24 (-0.54) |
| 2. ethanol | 78.30 | 78.43 | -0.13 | -0.17 | 77.69 (0.61) |
| 3. 1-propanol | 97.20 | 95.63 | 1.57 | 1.62 | 96.42 (0.77) |
| 4. 2-propanol | 82.30 | 85.83 | -3.53 | -4.28 | 84.11 (-1.81) |
| 5. 1-butanol | 117.70 | 113.40 | 4.30 | 3.65 | 115.67 (2.03) |
| 6. 2-butanol | 99.60 | 102.87 | -3.27 | -3.28 | 102.43 (-2.83) |
| 7. 2-methyl-1-propanol | 107.90 | 108.66 | -0.76 | -0.71 | 109.15 (-1.25) |
| 8. 2-methyl-2-propanol | 82.40 | 87.68 | -5.28 | -6.41 | 84.52 (-2.12) |
| 9. 1-pentanol | 137.80 | 133.16 | 4.64 | 3.36 | 134.92 (2.88) |
| 10. 2-pentanol | 119.00 | 120.59 | -1.59 | -1.34 | 121.68 (-2.68) |
| 11. 3-pentanol | 115.30 | 119.90 | -4.60 | -3.99 | 120.75 (-5.45) |
| 12. 2-methyl-1-butanol | 128.70 | 126.39 | 2.31 | 1.80 | 127.97 (0.73) |
| 13. 3-methyl-1-butanol | 131.20 | 127.13 | 4.07 | 3.10 | 128.90 (2.30) |
| 14. 2-methyl-2-butanol | 102.00 | 104.57 | -2.57 | -2.52 | 102.41 (-0.41) |
| 15. 3-methyl-2-butanol | 111.50 | 115.75 | -4.25 | -3.81 | 114.72 (-3.22) |
| 16. 2,2-dimethyl-1-propanol | 113.10 | 117.54 | -4.44 | -3.93 | 115.84 (-2.74) |
| 17. 1-hexanol | 157.13 | 153.12 | 4.01 | 2.55 | 154.17 (2.83) |
| 18. 2-hexanol | 139.90 | 140.35 | -0.45 | -0.32 | 140.92 (-1.02) |
| 19. 3-hexanol | 135.40 | 137.63 | -2.23 | -1.64 | 139.99 (-4.59) |
| 20. 2-methyl-1-pentanol | 148.00 | 146.14 | 1.86 | 1.25 | 147.22 (0.78) |
| 21. 3-methyl-1-pentanol | 152.40 | 146.89 | 5.51 | 3.61 | 147.72 (4.8) |
| 22. 4-methyl-1-pentanol | 151.80 | 148.97 | 2.83 | 1.86 | 148.15 (3.65) |
| 23. 2-methyl-2-pentanol | 121.40 | 122.25 | -0.85 | -0.70 | 121.66 (-0.25) |
| 24. 3-methyl-2-pentanol | 134.20 | 133.42 | 0.78 | 0.58 | 133.55 (0.65) |
| 25. 4-methyl-2-pentanol | 131.70 | 134.27 | -2.57 | -1.95 | 134.90 (-3.20) |
| 26. 2-methyl-3-pentanol | 126.50 | 132.77 | -6.27 | -4.96 | 134.31 (-7.81) |
| 27. 3-methyl-3-pentanol | 122.40 | 121.45 | 0.95 | 0.78 | 120.30 (2.10) |
| 28. 2-ethyl-1-butanol | 146.50 | 144.11 | 2.39 | 1.63 | 146.79 (-0.29) |
| 29. 2,2-dimethyl-1-butanol | 136.80 | 135.21 | 1.59 | 1.16 | 134.37 (2.43) |
| 30. 2,3-dimethyl-1-butanol | 149.00 | 140.07 | 8.93 | 6.00 | 140.77 (8.23) |
| 31. 3,3-dimethyl-1-butanol | 143.00 | 136.82 | 6.18 | 4.32 | 136.11 (6.89) |
| 32. 2,3-dimethyl-2-butanol | 118.60 | 117.30 | 1.30 | 1.10 | 114.28 (4.32) |
| 33. 3,3-dimethyl-2-butanol | 120.00 | 124.47 | -4.47 | -3.72 | 121.00 (-1.00) |
| 34. 1-heptanol | 176.30 | 173.38 | 2.92 | 1.66 | 173.41 (2.87) |
| 35. 3-heptanol | 156.80 | 157.38 | -0.58 | -0.37 | 159.24 (-2.44) |
| 36. 4-heptanol | 155.00 | 155.35 | -0.35 | -0.23 | 159.24 (-4.24) |
| 37. 2-methyl-2-hexanol | 142.50 | 142.00 | 0.50 | 0.35 | 140.90 (1.60) |
| 38. 3-methyl-3-hexanol | 142.40 | 139.13 | 3.27 | 2.30 | 139.55 (2.85) |
| 39. 3-ethyl-3-pentanol | 142.50 | 138.32 | 4.18 | 2.93 | 138.37 (4.13) |
| 40. 2,3-dimethyl-2-pentanol | 139.70 | 134.92 | 4.78 | 3.42 | 133.11 (6.59) |
| 41. 3,3-dimethyl-2-pentanol | 133.00 | 142.09 | -9.09 | -6.83 | 139.67 (-6.57) |
| 42. 2,2-dimethyl-3-pentanol | 136.00 | 141.49 | -5.49 | -4.04 | 139.32 (-3.32) |
| 43. 2,3-dimethyl-3-pentanol | 139.00 | 134.17 | 4.83 | 3.48 | 132.18 (6.82) |
| 44. 2,4-dimethyl-3-pentanol | 138.80 | 145.64 | -6.84 | -4.93 | 145.34 (-6.54) |
| 45. 1-octanol | 195.20 | 193.67 | 1.53 | 0.78 | 192.58 (2.62) |
| 46. 2-octanol | 179.80 | 180.57 | -0.77 | -0.43 | 179.33 (0.47) |
| 47. 2-ethyl-1-hexanol | 184.60 | 183.82 | 0.78 | 0.42 | 185.29 (-0.69) |
| 48. 2,2,3-trimethyl-3-pentanol | 152.20 | 142.73 | 9.47 | 6.22 | 152.78 (-0.57) |
| 49. 1-nonanol | 213.10 | 213.97 | -0.87 | -0.41 | 211.91 (1.19) |
| 50. 2-nonanol | 198.50 | 200.85 | -2.35 | -1.19 | 198.66 (-0.16) |
| 51. 3-nonanol | 194.70 | 197.60 | -2.90 | -1.49 | 197.73 (-3.03) |
| 52. 4-nonanol | 193.00 | 195.07 | -2.07 | -1.07 | 197.73 (-4.73) |
| 53. 5-nonanol | 195.10 | 194.87 | 0.23 | 0.12 | 197.73 (-2.63) |
| 54. 7-methyl-1-octanol | 206.00 | 210.01 | -4.01 | -1.95 | 205.46 (0.54) |
| 55. 2,6-dimethyl-4-heptanol | 178.00 | 182.72 | -4.72 | -2.65 | 185.69 (-7.69) |
| 56. 3,5-dimethyl-4-hexanol | 187.00 | 180.99 | 6.01 | 3.21 | 183.83 (3.17) |
| 57. 3,3,5-trimethyl-1-hexanol | 193.00 | 192.54 | 0.46 | 0.24 | 186.98 (6.02) |
| 58. 1-decanol | 230.20 | 234.27 | -4.07 | -1.77 | 231.15 (-0.95) |

*Residual, defined as [Bp exp. − Bp calc.]; given in brackets for Ref. /48/.
Table 9. Experimental and Calculated Bp of Alkyl Alcohols in Training Set.
| Alkyl alcohol | Bp exp. (°C) | Bp calc. (Eq. 27) | Δ* | % Δ | Bp calc. (Eq. 11) |
|---|---|---|---|---|---|
| 1. methanol | 64.70 | 66.03 | -1.33 | -2.06 | 64.68 (0.02) |
| 2. ethanol | 78.30 | 75.96 | 2.34 | 2.99 | 77.36 (0.94) |
| 3. 1-propanol | 97.20 | 97.44 | -0.24 | -0.24 | 96.80 (0.40) |
| 4. 2-propanol | 82.30 | 80.69 | 1.61 | 1.96 | 78.24 (4.06) |
| 6. 2-butanol | 99.60 | 100.08 | -0.48 | -0.48 | 97.68 (1.92) |
| 8. 2-methyl-2-propanol | 82.40 | 81.63 | 0.77 | 0.93 | 84.97 (-2.57) |
| 9. 1-pentanol | 137.80 | 137.06 | 0.74 | 0.54 | 135.69 (2.11) |
| 11. 3-pentanol | 115.30 | 118.40 | -3.10 | -2.69 | 117.13 (-1.83) |
| 14. 2-methyl-2-butanol | 102.00 | 101.74 | 0.26 | 0.26 | 104.41 (-2.41) |
| 16. 2,2-dimethyl-1-propanol | 113.10 | 116.94 | -3.84 | -3.40 | 117.11 (4.01) |
| 18. 2-hexanol | 139.90 | 138.73 | 1.17 | 0.83 | 136.57 (3.33) |
| 20. 2-methyl-1-pentanol | 148.00 | 147.82 | 0.18 | 0.12 | 148.68 (-0.68) |
| 22. 4-methyl-1-pentanol | 151.80 | 149.11 | 2.69 | 1.77 | 148.68 (3.12) |
| 26. 2-methyl-3-pentanol | 126.50 | 131.41 | -4.91 | -3.88 | 130.11 (-3.61) |
| 27. 3-methyl-3-pentanol | 122.40 | 121.41 | 0.99 | 0.81 | 123.86 (-1.46) |
| 29. 2,2-dimethyl-1-butanol | 136.80 | 132.03 | 4.77 | 3.49 | 136.55 (0.25) |
| 34. 1-heptanol | 176.30 | 175.42 | 0.88 | 0.50 | 174.57 (1.73) |
| 35. 3-heptanol | 156.80 | 156.88 | -0.08 | -0.05 | 156.01 (0.79) |
| 37. 2-methyl-2-hexanol | 142.50 | 140.89 | 1.61 | 1.13 | 143.30 (-0.80) |
| 39. 3-ethyl-3-pentanol | 142.50 | 140.75 | 1.75 | 1.23 | 143.30 (-0.80) |
| 41. 3,3-dimethyl-2-pentanol | 133.00 | 136.16 | -3.16 | -2.37 | 137.43 (-4.43) |
| 44. 2,4-dimethyl-3-pentanol | 138.80 | 143.48 | -4.68 | -3.37 | 143.10 (-4.30) |
| 45. 1-octanol | 195.20 | 194.46 | 0.74 | 0.38 | 194.01 (1.19) |
| 48. 2,2,3-trimethyl-3-pentanol | 152.20 | 154.18 | -1.98 | -1.30 | 144.16 (8.04) |
| 49. 1-nonanol | 213.10 | 213.32 | -0.22 | -0.10 | 213.45 (-0.35) |
| 52. 4-nonanol | 193.00 | 195.49 | -2.49 | -1.29 | 194.89 (-1.89) |
| 53. 5-nonanol | 195.10 | 195.34 | -0.24 | -0.12 | 194.89 (0.21) |
| 56. 3,5-dimethyl-4-hexanol | 187.00 | 178.80 | 8.20 | 4.39 | 181.99 (5.01) |
| 58. 1-decanol | 230.20 | 232.18 | -1.98 | -0.86 | 232.86 (-2.66) |

*Residual, defined as [Bp exp. − Bp calc.]; given in brackets for Eq. 11, Ref. [44].
Table 10. Experimental and Calculated Bp of Alkyl alcohols in Test Set.
| Alkyl alcohol | Bp exp. (°C) | Bp calc. (Eq. 27) | Δ* | % Δ | Bp calc. (Eq. 11) |
|---|---|---|---|---|---|
| 5. 1-butanol | 117.70 | 117.50 | 0.20 | 0.17 | 116.25 (1.45) |
| 7. 2-methyl-1-propanol | 107.90 | 112.68 | -4.78 | -4.43 | 109.79 (-1.89) |
| 10. 2-pentanol | 119.00 | 119.23 | -0.23 | -0.20 | 117.13 (1.87) |
| 12. 2-methyl-1-butanol | 128.70 | 130.00 | -1.30 | -1.01 | 129.34 (-0.64) |
| 13. 3-methyl-1-butanol | 131.20 | 131.11 | 0.09 | 0.07 | 129.23 (1.97) |
| 15. 3-methyl-2-butanol | 111.50 | 114.17 | -2.67 | -2.39 | 110.67 (0.83) |
| 17. 1-hexanol | 157.13 | 156.38 | 0.75 | 0.48 | 155.13 (1.87) |
| 19. 3-hexanol | 135.40 | 137.52 | -2.12 | -1.57 | 136.57 (-1.17) |
| 21. 3-methyl-1-pentanol | 152.40 | 147.35 | 5.05 | 3.31 | 148.68 (3.72) |
| 23. 2-methyl-2-pentanol | 121.40 | 121.16 | 0.24 | 0.20 | 123.86 (-2.46) |
| 24. 3-methyl-2-pentanol | 134.20 | 131.27 | 2.93 | 2.18 | 130.11 (4.09) |
| 25. 4-methyl-2-pentanol | 131.70 | 132.55 | -0.85 | -0.65 | 130.11 (1.59) |
| 28. 2-ethyl-1-butanol | 146.50 | 146.12 | 0.38 | 0.26 | 148.68 (-2.18) |
| 30. 2,3-dimethyl-1-butanol | 149.00 | 141.00 | 8.00 | 5.37 | 142.22 (6.78) |
| 31. 3,3-dimethyl-1-butanol | 143.00 | 133.59 | 9.41 | 6.58 | 136.55 (6.45) |
| 32. 2,3-dimethyl-2-butanol | 118.60 | 119.44 | -0.84 | -0.71 | 117.40 (1.20) |
| 33. 3,3-dimethyl-2-butanol | 120.00 | 120.08 | -0.08 | -0.06 | 117.99 (2.01) |
| 36. 4-heptanol | 155.00 | 156.58 | -1.58 | -1.02 | 156.01 (-1.01) |
| 38. 3-methyl-3-hexanol | 142.40 | 141.12 | 1.28 | 0.90 | 143.30 (-0.90) |
| 40. 2,3-dimethyl-2-pentanol | 139.70 | 138.02 | 1.68 | 1.20 | 136.84 (2.86) |
| 42. 2,2-dimethyl-3-pentanol | 136.00 | 136.45 | -0.45 | -0.33 | 137.43 (-1.43) |
| 43. 2,3-dimethyl-3-pentanol | 139.00 | 138.90 | 0.10 | 0.07 | 136.84 (2.16) |
| 46. 2-octanol | 179.80 | 177.28 | 2.52 | 1.40 | 175.45 (4.35) |
| 47. 2-ethyl-1-hexanol | 184.60 | 182.69 | 1.91 | 1.03 | 187.56 (-2.96) |
| 50. 2-nonanol | 198.50 | 196.41 | 2.09 | 1.05 | 194.89 (3.61) |
| 51. 3-nonanol | 194.70 | 195.53 | -0.83 | -0.43 | 194.89 (-0.19) |
| 54. 7-methyl-1-octanol | 206.00 | 205.50 | 0.50 | 0.24 | 207.00 (1.00) |
| 55. 2,6-dimethyl-4-heptanol | 178.00 | 183.63 | -5.63 | -3.16 | 181.99 (-3.99) |
| 57. 3,3,5-trimethyl-1-hexanol | 193.00 | 190.45 | 2.55 | 1.32 | 188.43 (4.57) |

*Residual, defined as [Bp exp. − Bp calc.]; given in brackets for Eq. 11, Ref. [44].
This data set contains cyclic, mono- and poly-substituted alkanes, as well as spiroalkanes. Using a stepwise procedure, two MLR models that describe the Bp of the compounds in the training and prediction sets, using the quadratic indices as independent variables, were obtained:
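$$\mathrm{Bp}\ (^{\circ}\mathrm{C}) = -105.146(\pm 4.718) + 3.1629(\pm 0.118)\,q_{1}(x) - 0.4933(\pm 0.045)\,q_{2}(x) \qquad (28)$$

$$\mathrm{Bp}\ (^{\circ}\mathrm{C}) = -108.197(\pm 3.635) + 1.6358(\pm 0.361)\,q_{0}(x) + 2.038(\pm 0.103)\,q_{1}(x) - 0.3016(\pm 4.718)\,q_{2}(x) - 1.75\times10^{-5}(\pm 3.75\times10^{-6})\,q_{14}(x) + 6.42\times10^{-6}(\pm 1.34\times10^{-6})\,q_{15}(x) \qquad (29)$$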
The statistical parameters of these two QSPR equations and the values reported by Estrada [25] are presented in Table 11.
Table 11. Statistical Parameters Corresponding to the Regression Equations for 80 Compounds Present in the Training Data Set.
| Equation | Set | Correlation Coefficient (R) | Standard Error (S) | Fisher ratio (F) |
|---|---|---|---|---|
| Eq. (28), two descriptors | Training | | | |
| Eq. (29), five descriptors | Training | | | |
| Eq. (1) /25/, six descriptors | Training | | | |
The statistical parameters show the high statistical quality of the models developed. For example, the correlation coefficient of model 28, with only two variables, is greater than 0.98 and the standard deviation represents less than 8% of the variance of the experimental property. Nevertheless, the statistical parameters of this equation are inferior to those obtained by Estrada [25], although his model includes six molecular descriptors. Furthermore, a model with better statistical quality was obtained (Eq. 29), with a linear correlation coefficient of 0.9927 and a standard deviation that represents less than 5% of the variance in the experimental property.
These statistical parameters are acceptable for the description of the Bp of molecules that contain cycles, considering that generating the best possible equations for the Bp of these compounds is not the principal objective of this work. Nevertheless, our model, with fewer variables (parsimony principle) and only linear terms, presents statistical parameters comparable to those of the original paper [25], which uses six variables (spectral moments of different orders) and a non-linear dependence between the physical property and the spectral moments. The use of non-linear terms significantly influences multivariable equations. For instance, the statistical parameters of the equations obtained for the description of physical properties of alkanes using the spectral moments improved with the introduction of the square root of variables [23]. The improvements were significant, especially for the Bp: including the square root of the spectral moment of order zero in the model reduced the standard deviation by half, and R and F increased from 0.9949 to 0.9984 and from 1650 to 5194, respectively. In the description of the critical pressure (PC, atm) using spectral moments, R increased significantly, from 0.9756 to 0.9854, because of the inclusion of non-linear terms [23].
In Table 12, the experimental and calculated values of the Bp are given for compounds in the training set, for the two equations obtained in this study and for the models obtained by Estrada [25].
Table 12. Experimental and Calculated Bp of Cycloalkanes of the Training Set.
| No. | Cycloalkane | Obsd (°C) | Cald [Eq. 28] | Res. | Cald [Eq. 29] | Res. | Cald [Eq. 1 /25/] | Res. |
Using the LOO cross-validation procedure, models 28 and 29 had q² values of 0.961 and 0.977, respectively. Using the LGO cross-validation method, Eqs. 28 and 29 had overall MAE values of 6.429 °C (7.452, 5.766, 7.070, 7.321 and 4.536 °C) and 4.801 °C (5.472, 5.159, 3.539, 5.426 and 4.41 °C), respectively.
Table 13. Experimental and Calculated Bp of Cycloalkanes of the Test Set.
| No. | Cycloalkane | Obsd (°C) | Cald [Eq. 28] | Res. | Cald [Eq. 29] | Res. | Cald [Eq. 1 /25/] | Res. |
In addition, as a second corroboration of the predictive power of the model, an external prediction set of twenty-six cyclic alkanes was used (external validation). The Bp of the compounds included in the external test set was predicted with the same accuracy as the compounds in the data set. The linear relationship in this series can be supported by the statistical parameters for this set depicted in Table 11.
In Table 13, the experimental and calculated Bp for both equations and for the model obtained by Estrada [25] are given. These statistical parameters are adequate for the description of physical properties and are comparable with those obtained by Estrada for the same series. Considering the whole set (training and test sets), the correlation coefficient and standard deviation were 0.9931 and 4.94 °C, respectively. As can be observed, the predictability and robustness of the theoretical model were demonstrated in both series.
Finally, in order to test the applicability of quadratic indices to structure-property correlations, and with the aim of extending the approach to molecules that contain aromatic rings in their structure, 95 structurally diverse organic compounds were selected. They were randomly split into two subsets: 75 compounds were used as a training set and the other 20 compounds were used as a test set. Using the 75 training compounds, a quantitative model based on total and local quadratic indices was developed. The Bp values were described by multivariate linear regression analysis using a stepwise procedure. The best QSPR model obtained, together with its statistical parameters, is given below:
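$$\mathrm{Bp}\ (^{\circ}\mathrm{C}) = -21.10996(\pm 5.894) + 0.352115(\pm 0.084)\,q_{0}^{H}(x) + 0.2756648(\pm 0.012)\,q_{2}(x) + 5.420964(\pm 0.218)\,{}^{H}q_{1L}(x) + 1.644634(\pm 0.347)\,{}^{E}q_{1L}(x) + 0.041902(\pm 0.012)\,{}^{E}q_{4L}^{H}(x) - 0.025834(\pm 0.004)\,{}^{E}q_{5L}(x) \qquad (30)$$

N = 70, R = 0.9905, q² = 0.9763, F(6,63) = 539.43, s = 7.6115, MAE = 7.34, p < 0.0001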
In the development of the quantitative model for the Bp description of the calibration data set, five compounds were detected as statistical outliers. Outlier detection was carried out using the following standard statistical tests: residuals, standardized residuals, Studentized residuals and Cook’s distance [55]. The five compounds were m-bromophenol, o-anisidine, p-nitroaniline, hexamethylbenzene and furan. As can be observed, there are no distinctive structural relationships among these compounds.
Table 14 lists the experimental and calculated Bp values of the training set. The statistical parameters of Eq. 30 suggest a high quality of the model found. The correlation coefficient R is over 0.99 and the standard deviation is only 7.61 °C. The squared correlation coefficient (R²) for Eq. 30 was 0.981, so this model explained more than 98% of the variance of the experimental Bp values.
In order to assess the predictability and robustness of the model found, internal and external validation procedures were carried out. Using the LOO cross-validation procedure, Eq. 30 had a cross-validation squared correlation coefficient (q²) of 0.976. In the LGO cross-validation approach, model 30 had the following mean absolute errors for the five groups (20%, 14 compounds each): MAE = 9.679, 6.788, 4.262, 7.727 and 8.250 °C. The overall MAE was 7.342 °C. As a more exhaustive corroboration of the predictive power of the model, an external prediction set of 20 aromatic organic compounds was used. The Bp of the compounds included in the external test set was predicted with the same accuracy as that of the compounds in the data set. The statistical parameters for this series were R = 0.9930, F(1,18) = 1274.4 and s = 7.8280 °C. These results demonstrate the good predictive power of the model found. The experimental and calculated Bp of the 20 aromatic compounds are given in Table 15. Considering the full set (training and test sets), the statistical parameters were R = 0.9884, F(1,88) = 3717.5 and s = 8.43 °C.
Table 14. Experimental and Calculated Values of the Bp of Molecules Included in the Training Set, that Contain Aromatic Cycles in Their Molecular Structure, as Well as Residual of Regression and Cross-Validation.
| Compound | Obs. (°C) | Calc. | Res. | R-CV | Compound | Obs. (°C) | Calc. | Res. | R-CV |
|---|---|---|---|---|---|---|---|---|---|
| o-Toluic Acid | 259.00 | 265.28 | -6.28 | -6.68 | p-Cymene | 177.00 | 179.64 | -2.64 | -2.96 |
| m-Toluic Acid | 263.00 | 266.40 | -3.40 | -3.63 | Biphenyl | 255.00 | 257.78 | -2.78 | -3.20 |
| p-Toluic Acid | 275.00 | 267.05 | 7.95 | 8.52 | Diphenylmethane | 263.00 | 271.25 | -8.25 | -9.32 |
| o-Bromophenol | 194.00 | 191.36 | 2.64 | 2.82 | Benzyl Alcohol | 205.00 | 194.72 | 10.28 | 10.72 |
| p-Fluorophenol | 185.00 | 189.05 | -4.05 | -6.64 | α-Phenylethyl Alcohol | 205.00 | 212.19 | -7.19 | -7.54 |
| o-Phenylenediamine | 252.00 | 265.08 | -13.08 | -15.96 | β-Phenylethyl Alcohol | 221.00 | 211.43 | 9.57 | 10.37 |
| p-Toluidine | 200.00 | 208.13 | -8.13 | -8.51 | Phthalic Anhydride | 284.00 | 280.66 | 3.34 | 4.85 |
| Benzoic Acid | 250.00 | 245.95 | 4.05 | 4.28 | Naphthalene | 218.00 | 215.18 | 2.82 | 3.23 |
| Benzoyl Chloride | 197.00 | 200.84 | -3.84 | -4.08 | 9,10-Anthraquinone | 380.00 | 374.99 | 5.01 | 9.29 |
| p-Xylene | 138.00 | 148.89 | -10.89 | -11.40 | Furfuryl Alcohol | 171.00 | 175.81 | -4.81 | -5.42 |
| 1,2,3-Trimethylbenzene | 176.00 | 169.99 | 6.01 | 6.32 | Phenylacetic Acid | 266.00 | 275.84 | -9.84 | -12.57 |

Collinearity between variables and redundancy of information

One of the main problems concerning the application of TIs to QSPR/QSAR studies is that many descriptors are collinear, so that there is much redundancy of information. Problems with redundancy of information and collinearity have been illustrated with the use of TIs such as the molecular connectivities [59,60].
Table 15. Experimental and Calculated Values of the Bp of Molecules, Included in the Test Set, that Contain Aromatic Cycles in their Molecular Structure as Well as Residual of Regression.
| Compound | Obs. (°C) | Cal. | Res. | Compound | Obs. (°C) | Cal. | Res. |
|---|---|---|---|---|---|---|---|
| p-Chlorotoluene | 162.00 | 151.48 | 10.52 | Cinnamylic Alcohol | 257.50 | 239.34 | 18.16 |
*Compound detected as an outlier in the training set.
For a better statistical interpretation of QSPR/QSAR models in which inter-related indices are considered (such as topologic or topographic indices based on the same graph-theoretical invariant), and in order to understand which effects cannot be separated, the inclusion of strongly interrelated variables in the model should be avoided. This criterion is necessary because an interrelation among different descriptors produces a highly unstable correlation coefficient and makes it difficult to know the real contribution of each variable included in the model [58]. An unfortunate illustration of this phenomenon was described recently by Romanelli et al. [61], who reported a QSAR for the toxicity of twelve aliphatic alcohols using nine collinear variables, achieving an R² of 0.9932. To solve this problem, Randić proposed a procedure for the orthogonalization of molecular descriptors that has been applied with much success to QSPR and QSAR studies [62,63,64,65,66]. The orthogonalization of molecular descriptors is an approach in which molecular descriptors are transformed in such a way that they do not mutually correlate. The nonorthogonal descriptors and the derived orthogonal descriptors both contain the same information, which results in the same statistical parameters of the QSAR models [62,63,64,65,66]. However, the coefficients of a QSAR model based on orthogonal descriptors are stable to the inclusion of novel descriptors, which makes it possible to interpret the regression terms and evaluate the role of individual descriptors in the QSAR model.
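The effect of orthogonalization can be illustrated with a small sketch. It assumes NumPy and implements a sequential residual scheme in the spirit of the procedure described above: each descriptor is replaced by the residual of its regression on the descriptors already orthogonalized. This is an illustration under that assumption, not the exact algorithm of refs. [62,63,64,65,66], and the function names `orthogonalize` and `r_squared` are hypothetical.

```python
import numpy as np


def orthogonalize(X):
    """Replace each column by its residual after regression (with
    intercept) on the previously orthogonalized columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Q = np.empty_like(X)
    for j in range(p):
        A = np.column_stack([np.ones(n), Q[:, :j]])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        Q[:, j] = X[:, j] - A @ coef
    return Q


def r_squared(X, y):
    """Squared multiple correlation coefficient of y ~ 1 + X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
```

The orthogonalized columns span the same space as the original descriptors (together with the intercept), so a model fitted on them has the same R² as the original model, while the columns themselves are mutually uncorrelated. The result depends on the order in which the descriptors are processed.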
In the present paper, to alleviate the collinearity between variables in each investigated data set, an interrelation study among the quadratic indices used in the obtained equations was carried out, using correlation matrices of the molecular descriptors used in the QSPRs. The acceptable level of collinearity is a rather subjective issue: acceptable correlation coefficients between variables reported in the literature have ranged from less than 0.4 to 0.9. In the view of Cronin and Schultz, the collinearity of the variables should be as low as possible, but must be significantly lower than the statistical fit of the QSPR/QSAR itself [67]. To illustrate the procedure, the inter-correlation study between the total and local quadratic indices used in the development of Eq. 30 was considered. In Table 16, the correlation matrix for this equation shows that there is low collinearity among these variables. In Table 17, other useful parameters for detecting multicollinear variables (partial correlation and tolerance) are given. The tolerance represents the variability of a variable that is not explained by the other variables, and the partial correlation coefficient measures the correlation between the property and a specific variable once the linear effects of the other independent variables have been eliminated.
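As a rough illustration of these diagnostics, the sketch below computes a squared correlation (r²) between two descriptors and the tolerance of a descriptor, taken here as 1 − R² from regressing that descriptor on all the others. The function names and the descriptor values are hypothetical and synthetic; they are not the values of Tables 16 and 17.

```python
def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def r_squared(u, v):
    """Squared Pearson correlation between two descriptors."""
    cu, cv = center(u), center(v)
    return dot(cu, cv) ** 2 / (dot(cu, cu) * dot(cv, cv))


def tolerance(descriptors, j):
    """Fraction of the variance of descriptor j NOT explained by the others."""
    basis = []
    for i, column in enumerate(descriptors):
        if i == j:
            continue
        b = center(column)
        for q in basis:  # orthogonalize the remaining descriptors
            coef = dot(b, q) / dot(q, q)
            b = [bi - coef * qi for bi, qi in zip(b, q)]
        basis.append(b)
    target = center(descriptors[j])
    residual = list(target)
    for q in basis:  # remove everything the other descriptors explain
        coef = dot(residual, q) / dot(q, q)
        residual = [ri - coef * qi for ri, qi in zip(residual, q)]
    return dot(residual, residual) / dot(target, target)


# Synthetic descriptor values for six hypothetical compounds.
q_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
q_b = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3]   # nearly proportional to q_a
q_c = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]

descriptors = [q_a, q_b, q_c]
for j, name in enumerate(["q_a", "q_b", "q_c"]):
    print(name, "tolerance =", round(tolerance(descriptors, j), 4))
```

A tolerance close to zero, as obtained here for the two nearly proportional descriptors, signals that the variable is almost entirely redundant with the others.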
Table 16. The squared correlation matrix showing covariance (r2) among the topological descriptors (Total and local quadratic indices) used in the regression analysis for 70 compounds.
Table 17. “Redundancy” of total and local quadratic indices used as independent variables.
Descriptors | Multiple R | Multiple R-square | R-square change | Partial Correlation | Tolerance | R-square

Interpretation of QSPR models

At present, it is known that properties are influenced by different kinds of interactions. In Eq. 31, the Bp is represented as a function of several interaction properties.
Bp = f (Molecular Weight, H-Bonding Capacity, Dipole Moment, Molecular Branching)    (31)
Several approaches can be used to extract a structural interpretation from a model obtained with quadratic indices. We used two different ways that permit an easy interpretation of the Bp in terms of molecular structure. The first is the “classical” way, in which we directly analyze the structural information carried by each molecular descriptor and how it contributes to the property under study. The second expresses the total contribution of the different atoms in a specific molecule; in this second approach, a more compact additive scheme is obtained [68]. The first approach permits estimating the relative contribution of different molecular factors (mass, branching, electronic and steric factors) to the physical properties. As can be observed in the obtained regression models, the included variables are related to the factors that influence the Bp values, and these in turn to the structural features of the molecules. Given the structurally diverse organic compounds included in the fourth QSPR example, this dataset was selected to develop a simple analysis. For example, in Eq. 31, the variables Hq1L(x) and Eq1L(x), Eq5L(x), Eq4LH(x) are related to the H-bonding capacity (hydrogen atoms as donors and acceptors, respectively). The coefficients of these variables in Eq. 31 are positive; only the local “heteroatom” quadratic index of fifth order [Eq5L(x)] has a negative contribution to the property. This is a logical result, because when the number of hydrogen atoms bonded to heteroatoms increases, the Bp also increases: the possibility of intermolecular H-bonding grows with the number of H-X groups (X = O, N and S) in the molecule. In this sense, the “protonic” quadratic index of first order [Hq1L(x)] is the sum of all possible products of the electronegativities of the hydrogen atoms and the heteroatoms bonded to them.
If X is an O, N or S atom, the values of this index decrease in that order, because the electronegativity of these atoms decreases from oxygen to sulfur. For this reason, this index is indicative of the number and type of hydrogen atoms linked to heteroatoms.
On the other hand, Eq1L(x), Eq5L(x) and Eq4LH(x) are also related to molecular charge; that is to say, these indices parameterize the molecular dipole moment. Finally, molecular weight is described by the total quadratic indices q2(x) and q0H(x), which suppress and include hydrogen atoms in the molecular pseudograph, respectively. For example, q0H(x) makes a positive contribution to the Bp because this molecular descriptor is the sum of the squared electronegativities of all atoms in the molecule, which is indicative of the molecular size and increases with the number (n) of atoms in the molecule. The other molecular descriptor, q2(x), is related to molecular weight, size and molecular branching. That is, this variable is a good choice to describe the Bp as defined by the combination of molecular weight and branching. This influence is demonstrated by the positive contribution of this index to the studied property.
The second approach makes it possible to obtain the contribution of the atoms in a specific molecule, allowing them to be compared more effectively. In this sense, we can substitute expression (Eq. 10) into the QSPR model (Eq. 18) to obtain the total contribution of the different atoms in a specific molecule. The atoms’ contribution is calculated from this procedure as shown in Eq. 32,
$$P = b_0 + \sum_{k} a_k\, q_k(x) = b_0 + \sum_{k}\sum_{L} a_k\, q_{kL}(x) \qquad (32)$$
where L stands for the corresponding atom.
Considering the QSPR models obtained for describing the Bp of cycloalkanes (Eq. 28 and Eq. 29) and the molecule 1-methyl-1,2-diethylcyclopropane, a simple example of the calculation of these atom contributions to the Bp is given here. This molecule, with its atom numbering and its total and local (atom) quadratic indices, is depicted in Table 18.
Table 18. Molecule of 1-methyl-1,2-diethylcyclopropane with the Following Atom Numbering and Their Total and Local (Atom) Quadratic Indices.
Atom (f) | q0L(x, f) | q1L(x, f) | q2L(x, f) | q14L(x, f) | q15L(x, f) | BpA [°C; (Eq. 28)] | BpB [°C; (Eq. 29)]
Now, if we divide the intercept values of the QSPR models by the number of atoms in the molecule (n=8) and use the atom quadratic indices as molecular descriptors in models A (Eq. 28) and B (Eq. 29), then the contribution of each specific atom is obtained:
BpA (a) = (-105.146/8) + 3.1629·q1L(x, a) - 0.4933·q2L(x, a) = 47.07 °C
BpB (a) = (-108.197/8) + 1.6358·q0L(x, a) + 2.038·q1L(x, a) - 0.3016·q2L(x, a) - 1.75×10⁻⁵·q14L(x, a) + 6.42×10⁻⁶·q15L(x, a) = 34.90 °C
BpA (b) = (-105.146/8) + 3.1629·q1L(x, b) - 0.4933·q2L(x, b) = 25.19 °C
BpB (b) = (-108.197/8) + 1.6358·q0L(x, b) + 2.038·q1L(x, b) - 0.3016·q2L(x, b) - 1.75×10⁻⁵·q14L(x, b) + 6.42×10⁻⁶·q15L(x, b) = 20.34 °C
BpA (c) = (-105.146/8) + 3.1629·q1L(x, c) - 0.4933·q2L(x, c) = 6.73 °C
BpB (c) = (-108.197/8) + 1.6358·q0L(x, c) + 2.038·q1L(x, c) - 0.3016·q2L(x, c) - 1.75×10⁻⁵·q14L(x, c) + 6.42×10⁻⁶·q15L(x, c) = 8.92 °C
BpA (d) = (-105.146/8) + 3.1629·q1L(x, d) - 0.4933·q2L(x, d) = -4.91 °C
BpB (d) = (-108.197/8) + 1.6358·q0L(x, d) + 2.038·q1L(x, d) - 0.3016·q2L(x, d) - 1.75×10⁻⁵·q14L(x, d) + 6.42×10⁻⁶·q15L(x, d) = 13.68 °C
BpA (e) = (-105.146/8) + 3.1629·q1L(x, e) - 0.4933·q2L(x, e) = 13.55 °C
BpB (e) = (-108.197/8) + 1.6358·q0L(x, e) + 2.038·q1L(x, e) - 0.3016·q2L(x, e) - 1.75×10⁻⁵·q14L(x, e) + 6.42×10⁻⁶·q15L(x, e) = 13.68 °C
BpA (f) = (-105.146/8) + 3.1629·q1L(x, f) - 0.4933·q2L(x, f) = 1.91 °C
BpB (f) = (-108.197/8) + 1.6358·q0L(x, f) + 2.038·q1L(x, f) - 0.3016·q2L(x, f) - 1.75×10⁻⁵·q14L(x, f) + 6.42×10⁻⁶·q15L(x, f) = 7.30 °C
BpA (g) = (-105.146/8) + 3.1629·q1L(x, g) - 0.4933·q2L(x, g) = 16.96 °C
BpB (g) = (-108.197/8) + 1.6358·q0L(x, g) + 2.038·q1L(x, g) - 0.3016·q2L(x, g) - 1.75×10⁻⁵·q14L(x, g) + 6.42×10⁻⁶·q15L(x, g) = 16.56 °C
BpA (h) = (-105.146/8) + 3.1629·q1L(x, h) - 0.4933·q2L(x, h) = 1.91 °C
BpB (h) = (-108.197/8) + 1.6358·q0L(x, h) + 2.038·q1L(x, h) - 0.3016·q2L(x, h) - 1.75×10⁻⁵·q14L(x, h) + 6.42×10⁻⁶·q15L(x, h) = 7.09 °C
Now, we can calculate the Bp of the 1-methyl-1,2-diethylcyclopropane molecule using two approaches. The first uses the atom quadratic indices, because the sum of the atom contributions gives the Bp of the molecule (see the right-hand columns in Table 18); the second uses the total quadratic indices (considering the whole molecule). The Bp of the molecule as a function of the total quadratic indices can be obtained as follows:
BpA (Molecule) = -105.146 + 3.1629·q1(x) - 0.4933·q2(x) = 108.42 °C
BpB (Molecule) = -108.197 + 1.6358·q0(x) + 2.038·q1(x) - 0.3016·q2(x) - 1.75×10⁻⁵·q14(x) + 6.42×10⁻⁶·q15(x) = 110.79 °C
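The atom-by-atom calculation for model A can be reproduced with a short sketch. It rests on several assumptions that are not stated explicitly in this section: the atom labelling a–h is inferred from the quoted values (a, b, c as ring carbons, d as the methyl on a, e-f as the ethyl on a, g-h as the ethyl on b); the carbon electronegativity is taken as 2.63, the value implied by q0L = 6.9169; and the atom-level index is computed as x_i·(M^k x)_i, which reproduces the quoted local indices for this hydrogen-suppressed skeleton. Function and variable names are illustrative.

```python
# Carbon skeleton of 1-methyl-1,2-diethylcyclopropane (hydrogens suppressed).
# The labelling a..h is an assumption inferred from the quoted values.
ATOMS = "abcdefgh"
BONDS = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"),
         ("a", "e"), ("e", "f"), ("b", "g"), ("g", "h")]
X_CARBON = 2.63  # assumed electronegativity; 2.63**2 = 6.9169


def adjacency_matrix(atoms, bonds):
    index = {atom: i for i, atom in enumerate(atoms)}
    m = [[0] * len(atoms) for _ in atoms]
    for u, v in bonds:
        m[index[u]][index[v]] = 1
        m[index[v]][index[u]] = 1
    return m


def local_indices(m, x, k):
    """Atom-level k-th order indices, taken here as x_i * (M^k x)_i."""
    v = list(x)
    for _ in range(k):
        v = [sum(row[j] * v[j] for j in range(len(v))) for row in m]
    return [xi * vi for xi, vi in zip(x, v)]


M = adjacency_matrix(ATOMS, BONDS)
X = [X_CARBON] * len(ATOMS)
q1L = dict(zip(ATOMS, local_indices(M, X, 1)))
q2L = dict(zip(ATOMS, local_indices(M, X, 2)))

# Model A (Eq. 28): the intercept is shared equally among the n = 8 atoms.
INTERCEPT_A, A1, A2 = -105.146, 3.1629, -0.4933
bp_a = {atom: INTERCEPT_A / len(ATOMS) + A1 * q1L[atom] + A2 * q2L[atom]
        for atom in ATOMS}

for atom in ATOMS:
    print(f"BpA({atom}) = {bp_a[atom]:7.2f}")
print(f"sum over atoms = {sum(bp_a.values()):.2f}")
```

Under these assumptions the eight atom contributions match the values listed for model A, and their sum agrees with the whole-molecule value to within rounding. Model B is not reproduced here because it also requires the 14th- and 15th-order indices.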
This approach allows topological chemical representations of molecules to be built (using a pseudograph) by combining molecular fragments. In this sense, the k-th total quadratic indices can be expressed as a “linear combination” of the k-th fragment (local) quadratic indices (subgraphs). In this way, several molecular properties can be calculated by combining the contributions (atom contributions) of the smaller fragments present in the molecule. This method is based on the assumption that the contribution of a given molecular fragment to the complete molecular property should be quite similar in different molecules, or in different locations of the same molecule, provided that the molecular environments are similar. That is to say, the atom or fragment contributions to several properties are approximately “transferable”. Now consider the two ethyl fragments (e-f and g-h) present in the 1-methyl-1,2-diethylcyclopropane molecule of the example given above. These fragments have similar, but not identical, contributions. This is a logical result because their molecular environments are similar but not identical. For example, q0L(x, f) [q0L(x, e-f)=6.9169+6.9169 and q0L(x, g-h)=6.9169+6.9169] and q1L(x, f) [q1L(x, e-f)=13.8338+6.9169 and q1L(x, g-h)=13.8338+6.9169] have the same value for both ethyl fragments, but the values of the other molecular descriptors included in the obtained models (Eq. 28 and Eq. 29) are not the same; for example, q2L(x, f) [q2L(x, e-f)=34.5845+13.8338 and q2L(x, g-h)=27.6676+13.8338]. In this case, the difference arises from the different values of the local quadratic indices of the e and g atoms, which is logical because the topological environment (within two steps) is not the same for both atoms.
Notice that the f and h atoms have the same values of the local quadratic indices, so their atom contribution within the ethyl fragment is the same [q2L(x, f)= q2L(x, h)=13.8338]. The magnitude of the local quadratic indices increases as the order of the index increases, as a consequence of the greater amount of structural information contained in higher-order local quadratic indices. For instance, q14; 15L(x, e-f) and q14; 15L(x, g-h) contain more information about both ethyl fragments (on the atoms that constitute the fragment and on their molecular environment) than the previous ones.
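The comparison of the two ethyl fragments amounts to simple sums of atom-level values, which can be checked with a few lines. The sketch below uses only the first- and second-order atom indices quoted in the text; the dictionary layout and the function name are illustrative.

```python
# First- and second-order local (atom) quadratic indices quoted in the text.
ATOM_INDICES = {
    "e": {"q1L": 13.8338, "q2L": 34.5845},
    "f": {"q1L": 6.9169, "q2L": 13.8338},
    "g": {"q1L": 13.8338, "q2L": 27.6676},
    "h": {"q1L": 6.9169, "q2L": 13.8338},
}


def fragment_index(atoms, name):
    """A fragment index is the sum of the indices of its atoms."""
    return sum(ATOM_INDICES[atom][name] for atom in atoms)


for name in ("q1L", "q2L"):
    ef = fragment_index("ef", name)
    gh = fragment_index("gh", name)
    print(f"{name}: e-f = {ef:.4f}, g-h = {gh:.4f}, difference = {ef - gh:.4f}")
```

The first-order values coincide for both fragments, while the second-order values differ by exactly the difference between the e and g atom indices, which is where the different ring environments first become visible.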


A promising topological approach to obtain a family of new molecular descriptors has been proposed. In this connection, a vector space E (molecular vector space), whose elements are organic molecules, was defined as a “direct sum” of different ℜi spaces.
The descriptors were named, in general, quadratic indices, by analogy with mathematical quadratic forms. The k-th power of the atom adjacency matrix (M) of the molecular pseudograph and the canonical bases are selected as the quadratic forms’ matrices and bases, respectively. These molecular TIs have been implemented in the TOMO-COMD software, with the aim of creating a new calculation method. Specifically, the electronegativities of the atoms were used as the atomic property. These indices were generalized to “higher analogues (higher order)” as number sequences, with the aim of creating a family of descriptors that constitutes a tool of great utility for drug design and bioinformatic studies. In addition, this paper introduces a local approach for molecular quadratic indices. The local definition of these indices allows the descriptors to be obtained for an atom or a fragment under study, and they can be used to describe molecular properties that depend strongly on the contribution of that portion of the molecule. For example, these local indices are of great importance in modeling the properties of molecules that contain heteroatoms in their structure.
Finally, total and local quadratic indices and MLR have been used in QSPR studies of organic compounds. The resulting quantitative models are significant from a statistical point of view and permit a clear interpretation of the studied properties in terms of the structural features of molecules. LOO and LGO cross-validation procedures (internal validation) and external prediction series (external validation) revealed that the regression models have fairly good predictability. The physical properties of the test set compounds were predicted with the same accuracy as those of the training set compounds. The comparison with other approaches reveals a good behavior of the proposed method. The obtained results establish that these new indices fulfill several desirable attributes of a new molecular descriptor.
The approach described in this paper appears to be a very promising structural invariant, useful for QSPR/QSAR studies, and was shown to provide an excellent alternative or guide for the discovery and optimization of new lead compounds, reducing the time and cost of traditional procedures.


The author thanks Dr. Ernesto Estrada for sending several reprints of his papers on Chemical Graph Theory. The author also thanks the anonymous referees for their useful comments, which contributed to an improved presentation of these results.


  1. Devlin, J. P. (Ed.) High Throughput Screening; Marcel Dekker: New York, 2000.
  2. Broach, J. R.; Thorner, J. High-Throughput Screening for Drug Discovery. Nature 1996, 384 Suppl., 14–16.
  3. Walters, W. P.; Stahl, M. T.; Murcko, M. A. Virtual Screening-an Overview. Drug Disc Today. 1998, 3, 160–178.
  4. Drie, J. H. V.; Lajiness, M. S. Approaches to Virtual Library Design. Drug Disc Today. 1998, 3, 274–283.
  5. de Julián-Ortiz, J. V.; Gálvez, J.; Muñoz-Collado, C.; García-Domenech, R.; Gimeno-Cardona, C. Virtual Combinatorial Synthesis and Computational Screening of New Potential Anti-Herpes Compounds. J Med Chem. 1999, 42, 3308–3314.
  6. Van de Waterbeemd, H.; Carter, R. E.; Grassy, G.; Kubinyi, H.; Martin, Y. C.; Tute, M. S.; Willett, P. Annu. Rep. Med. Chem. 1998, 33, 397.
  7. Karelson, M. Molecular Descriptors in QSAR/QSPR; John Wiley & Sons: New York, 2000.
  8. Katritzky, A. R.; Gordeeva, E. V. Traditional Topological Indexes vs Electronic, Geometrical, and Combined Molecular Descriptors in QSAR/QSPR Research. J. Chem. Inf. Comput. Sci. 1993, 33, 835.
  9. Kier, L. B.; Hall, L. H. Molecular Structure Description. The Electrotopological State; Academic Press: New York, 1999.
  10. Balaban, A. Topological and Stereochemical Molecular Descriptors for Databases Useful in QSAR, Similarity/Dissimilarity and Drug Design. SAR QSAR Environ. Res. 1998, 8, 1–21.
  11. Estrada, E. On the Topological Sub-Structural Molecular Design (TOSS-MODE) in QSPR/QSAR and Drug Design Research. SAR QSAR Environ. Res. 2000, 11, 55–73.
  12. Randić, M. Encyclopedia of Computational Chemistry; Schleyer, P. V. R., Ed.; John Wiley & Sons: New York, 1998; Vol. 5, pp. 3018–3032.
  13. Rouvray, D. H. Mathematical and Computational Concepts in Chemistry; Trinajstic, N., Ed.; Ellis Horwood: Chichester, 1986; pp. 295–306.
  14. Balaban, A. T. (Ed.) From Chemical Graphs to Three-Dimensional Geometry; Plenum Press: New York, 1997.
  15. Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; Wiley-VCH: Weinheim, Germany, 2000.
  16. Topological Indices and Related Descriptors in QSAR and QSPR; Devillers, J.; Balaban, A. T. (Eds.) Gordon and Breach: Amsterdam, the Netherlands, 1999.
  17. Estrada, E.; Uriarte, E. Recent Advances on the Role of Topological Indices in Drug Discovery Research. Curr. Med. Chem. 2001, 8, 1699–1714.
  18. Wiener, H. Structural Determination of Paraffin Boiling Point. J. Am. Chem. Soc. 1947, 69, 17–20.
  19. Balaban, A. T. Highly Discriminant Distance-Based Topological Index. Chem. Phys. Lett. 1982, 89, 399–404.
  20. Randić, M. Characterization of Molecular Branching. J. Am. Chem. Soc. 1975, 97, 6609–6615.
  21. Kier, L. B.; Hall, L. H. Molecular Structure Description. The Electrotopological State; Academic Press: New York, 1999.
  22. Plavšić, D.; Nikolić, S.; Trinajstić, N.; Mihalić, Z. On the Harary Index for the Characterization of Chemical Graphs. J. Math. Chem. 1993, 12, 235–250.
  23. Estrada, E. Spectral Moment of Edge Adjacency Matrix in Molecular Graphs. 1. Definition and Application to the Prediction of Physical Properties of Alkanes. J. Chem. Inf. Comp. Sci. 1996, 36, 846–849.
  24. Estrada, E. Spectral Moment of Edge Adjacency Matrix in Molecular Graphs. 2. Molecules Containing Heteroatom and QSAR Applications. J. Chem. Inf. Comp. Sci. 1997, 37, 320–328.
  25. Estrada, E. Spectral Moment of Edge Adjacency Matrix in Molecular Graphs. 3. Molecules Containing Cycles. J. Chem. Inf. Comp. Sci. 1998, 38, 123–27.
  26. Randić, M. Generalized Molecular Descriptors. J. Math. Chem. 1991, 7, 155–168.
  27. Mihalic, Z.; Trinajstić, N. A Graph-Theoretical Approach to Structure-Property Relationships. J. Chem. Educ. 1992, 69, 701–712.
  28. Diudea, M. V. (Ed.) QSPR/QSAR Studies by Molecular Descriptors; Nova Science: Huntington, New York, 2001.
  29. Ivanciuc, O.; Ivanciuc, T.; Cabrol–Bass, D.; Balaban, A. T. Evaluation in Quantitative Structure–Property Relationship Models of Structural Descriptors Derived from Information–Theory Operators. J. Chem. Inf. Comput. Sci. 2000, 40, 631–643.
  30. Balaban, T.; Mills, D.; Ivanciuc, O.; Basak, S. C. Reverse Wiener Indices. Croat. Chem. Acta. 2000, 73, 923.
  31. Ivanciuc, O.; Ivanciuc, T.; Klein, D. J.; Seitz, W. A.; Balaban, A. T. Wiener Index Extension by Counting Even/Odd Graph Distances. J. Chem. Inf. Comput. Sci. 2001, 41, 536–549.
  32. Torrens, F. Valence Topological Charge-Transfer Indices for Dipole Moments. Molecules 2003, 8, 169–185.
  33. Rios–Santamarina, I.; García–Domenech, R.; Cortijo, J.; Santamaria, P.; Morcillo, E. J.; Gálvez, J. Natural Compounds with Bronchodilator Activity Selected by Molecular Topology. Internet Electron. J. Mol. Des. 2002, 1, 70–79.
  34. Marino, D. J. G.; Peruzzo, P. J.; Castro, E. A.; Toropov, A. A. QSAR Carcinogenic Study of Methylated Polycyclic Aromatic Hydrocarbons Based on Topological Descriptors Derived from Distance Matrices and Correlation Weights of Local Graph Invariants. Internet Electron. J. Mol. Des. 2002, 1, 115–133.
  35. Ivanciuc, O. QSAR Comparative Study of Wiener Descriptors for Weighted Molecular Graphs. J. Chem. Inf. Comput. Sci. 2000, 40, 1412–1422.
  36. Marrero, Y.; Romero, V. TOMO-COMD software. Central University of Las Villas, 2002; TOMO-COMD (TOpological MOlecular COMputer Design) for Windows, version 1.0 is a preliminary experimental version; in the future a professional version may be obtained upon request to Y. Marrero: [email protected]; [email protected].
  37. Cotton, F. A. Advanced Inorganic Chemistry; Revolucionaria: Havana; p. 103.
  38. Ross, K. A.; Wright, C.R.B. Matemáticas Discretas; Prentice Hall Hispanoamericana: México, 1990.
  39. Noriega, T. Álgebra; Revolucionaria: Havana, Cuba, 1990; pp. 2-10, 43-49.
  40. Maltsev, A. I. Fundamentos del álgebra lineal; Mir: Moscow, 1976; pp. 68, 262.
  41. Garrido, L. G. Introduccion a la Matemáticas Discretas; Revolucionaria: Havana, Cuba, 1990; pp. 237–298.
  42. Estrada, E.; Rodriguez, L. Matrix Algebraic Manipulation of Molecular Graphs. 2. Harary- and MTI-like Molecular Descriptors. Match 1997, 35, 157–167.
  43. Needham, D. E.; Wei, I-C.; Seybold, P. G. Molecular Modeling of the Physical Properties of the Alkanes. J. Am. Chem. Soc. 1988, 110, 4186–4194.
  44. Krenkel, G.; Castro, E. A.; Toropov, A. A. Improved Molecular Descriptors Based on the Optimization of Correlation Weights of Local Graph Invariants. Int. J. Mol. Sci. 2001, 2, 57–65.
  45. Morrison, R. T.; Boyd, R. N. Organic Chemistry; Revolucionaria: Havana, Cuba, 1970.
  46. Solomon, J. W. G. Química Orgánica; Limusa: Mexico, 1987.
  47. STATISTICA ver. 5.5. Statsoft, Inc., 1999.
  48. Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graph. Modell. 2002, 20, 269–276.
  49. Rose, K.; Hall, L. H.; Kier, L. B. Modeling Blood-Brain Barrier Partitioning Using the Electrotopological State. J. Chem. Inf. Comput. Sci. 2002, 42, 651–666.
  50. Wold, S.; Erikson, L. Statistical Validation of QSAR Results. Validation Tools. In Chemometric Methods in Molecular Design; van de Waterbeemd, H., Ed.; VCH Publishers: New York, 1995; pp. 309–318.
  51. Randić, M.; Basak, S. Optimal Molecular Descriptors Based on Weighted Path Numbers. J. Chem. Inf. Comput. Sci. 1999, 39, 261–266.
  52. Chenzhong, C.; Zhiliang, L. Molecular Polarizability. 1. Relationship to Water Solubility of Alkanes and Alcohols. J. Chem. Inf. Comput. Sci. 1998, 38, 1–7.
  53. Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Normal Boiling Points for Organic Compounds: Correlation and Prediction by a Quantitative Structure-Property Relationship. J. Chem. Inf. Comput. Sci. 1998, 38, 28–41.
  54. Estrada, E.; Ivanciuc, O.; Gutman, I.; Gutiérrez, A.; Rodríguez, L. Extended Wiener Indices. A New Set of Descriptors for Quantitative Structure-Property Studies. New J. Chem. 1998, 22, 819–822.
  55. Katritzky, A.; Maran, U.; Lobanov, V. S.; Karelson, M. Structurally Diverse Quantitative Structure-Property Relationship Correlations of Technologically Relevant Physical Properties. J. Chem. Inf. Comput. Sci. 2000, 40, 1–18.
  56. Stanton, D. T. Development of a Quantitative Structure-Property Relationship Model for Estimating Normal Boiling Points of Small Multifunctional Organic Molecules. J. Chem. Inf. Comput. Sci. 2000, 40, 81–90.
  57. Belsley, D. A.; Kuh, E.; Welsch, R. E. Regression Diagnostics; Wiley: New York, 1980.
  58. Alzina, R. B. Introduccion conceptual al análisis multivariable. Un enfoque informatico con los paquetes SPSS-X, BMDP, LISREL Y SPAD; PPU, SA: Barcelona, 1989; Chapter 8; Vol. 1, p. 202.
  59. Basak, S. C.; Balaban, A. T.; Grunwald, G. D.; Gute, B. D. Topological Indices: Their Nature and Mutual Relatedness. J. Chem. Inf. Comput. Sci. 2000, 40, 891–898.
  60. Patel, H.; Cronin, M. T. D. A Novel Index for the Description of Molecular Linearity. J. Chem. Inf. Comput. Sci. 2001, 41, 1228–1236.
  61. Romanelli, G. P.; Cafferata, L. F. R.; Castro, E. A. An Improved QSAR Study of Toxicity of Saturated Alcohols. J. Mol. Struct. (Theochem). 2000, 504, 261–265.
  62. Randić, M. Orthogonal Molecular Descriptors. New J. Chem. 1991, 15, 517–525.
  63. Randić, M. Fitting of Nonlinear Regression by Orthogonalized Power Series. J. Comput. Chem. 1993, 14, 363–370.
  64. Randić, M. Resolution of Ambiguities in Structure-Property Studies by Use of Orthogonal Descriptors. J. Chem. Inf. Comput. Sci. 1991, 31, 311–320.
  65. Randić, M. Correlation of Enthalpy of Octanes with Orthogonal Connectivities Indices. J. Mol. Struct. (Theochem). 1991, 233, 45–59.
  66. Lučić, B.; Nikolić, S.; Trinajstić, N.; Jurić, D. The Structure-Property Models can be Improved Using the Orthogonalized Descriptors. J. Chem. Inf. Comput. Sci. 1995, 35, 532–538.
  67. Cronin, M. T. D.; Schultz, T. W. Pitfalls in QSAR. J. Mol. Struct. (Theochem). 2003, 622, 39–51.
  68. Estrada, E.; Gonzáles, H. What Are the Limits of Applicability for Graph Theoretic Descriptors in QSPR/QSAR? Modeling Dipole Moments of Aromatic Compounds with TOPS-MODE Descriptors. J. Chem. Inf. Comput. Sci. 2003, 43, 75–84.
  • Sample Availability: Not applicable.
