Total and Local Quadratic Indices of the Molecular Pseudograph’s Atom Adjacency Matrix: Applications to the Prediction of Physical Properties of Organic Compounds

A novel topological approach for obtaining a family of new molecular descriptors is proposed. In this connection, a vector space E (molecular vector space), whose elements are organic molecules, is defined as a “direct sum” of different ℜi spaces. In this way we can represent molecules having a total of i atoms as elements (vectors) of the vector spaces ℜi (i=1, 2, 3,..., n; where n is number of atoms in the molecule). In these spaces the components of the vectors are atomic properties that characterize each kind of atom in particular. The total quadratic indices are based on the calculation of mathematical quadratic forms. These forms are functions of the k-th power of the molecular pseudograph’s atom adjacency matrix (M). For simplicity, canonical bases are selected as the quadratic forms’ bases. These indices were generalized to “higher analogues” as number sequences. In addition, this paper also introduces a local approach (local invariant) for molecular quadratic indices. This approach is based mainly on the use of a local matrix [Mk(G, FR)]. This local matrix is obtained from the k-th power (Mk(G)) of the atom adjacency matrix M. Mk(G, FR) includes the elements of the fragment of interest and those that are connected with it, through paths of length k. Finally, total (and local) quadratic indices have been used in QSPR studies of four series of organic compounds. The quantitative models found are significant from a statistical point of view and permit a clear interpretation of the studied properties in terms of the structural features of molecules. External prediction series and cross-validation procedures (leave-one-out and leave-group-out) assessed model predictability. The reported method has shown similar results, compared with other topological approaches. 
The results obtained were the following: a) Seven physical properties of 74 normal and branched alkanes (boiling points, molar volumes, molar refractions, heats of vaporization, critical temperatures, critical pressures and surface tensions) were well modeled (R>0.98, q2>0.95) by the total quadratic indices. The overall MAEs of the 5-fold cross-validation were 2.11 °C, 0.53 cm3, 0.032 cm3, 0.32 kJ/mol, 5.34 °C, 0.64 atm and 0.23 dyn/cm for each property, respectively; b) the boiling points of 58 alkyl alcohols were also well described by the present approach; two QSPR models were obtained: the first was developed using the complete set of 58 alcohols [R=0.9938, q2=0.986, s=4.006 °C, overall MAE of 5-fold cross-validation=3.824 °C] and the second was developed using 29 compounds as a training set [R=0.9979, q2=0.992, s=2.97 °C, overall MAE of 5-fold cross-validation=2.580 °C] and 29 compounds as a test set [R=0.9938, s=3.17 °C]; c) good relationships were obtained for the boiling points of cycloalkanes (80 and 26 compounds in the training and test sets, respectively) using 2 and 5 total quadratic indices [training set: R=0.9823 (q2=0.961 and overall MAE of 5-fold cross-validation=6.429 °C) and R=0.9927 (q2=0.977 and overall MAE of 5-fold cross-validation=4.801 °C); test set: R=0.9726 and R=0.9927]; and d) the linear model developed to describe the boiling points of 70 organic compounds containing aromatic rings showed good statistical features, with a squared correlation coefficient (R2) of 0.981 (s=7.61 °C). Internal validation procedures (q2=0.9763 and overall MAE of 5-fold cross-validation=7.34 °C) allowed the predictability and robustness of the model to be assessed. The predictive performance of the obtained QSPR model was also tested on an extra set of 20 aromatic organic compounds (R=0.9930 and s=7.8280 °C).
The results obtained are valid to establish that these new indices fulfill some of the ideal requirements proposed by Randić for a new molecular descriptor.


Introduction
The last decade has witnessed much progress in how chemical structures are characterized and described, how large sets of compounds are synthesized via a combinatorial chemistry approach and how simple and fast in vitro assays are carried out. In this sense, the method most used for drug discovery is high-throughput screening (HTS), where massive screening of chemicals on a robot-assisted battery of biological assays is carried out [1,2]. Lately, virtual screening has emerged as an interesting alternative for the handling and screening of large databases in order to find a reduced set of potential new drug candidates [3][4][5]. This methodology and, in general, molecular biology and drug design are centered on the relationships between the chemical structures and measured properties of polymers and organic compounds.
In order to obtain structure-property (activity) relationships, henceforth abbreviated SPR and SAR, and their quantitative counterparts (abbreviated QSPR and QSAR, respectively), it is necessary to have a structure parameterization. The structure parameterization includes the use of molecular descriptors. Molecular descriptors are "numbers that characterize a specific aspect of the molecule structure" [6]. At present, there are a great number of molecular descriptors that can be used in QSAR and QSPR studies [7]. Among them, the so-called topological indices (TIs) have found major application in medicinal chemistry and molecular modeling [8][9][10][11]. TIs are molecular descriptors derived from graph-theoretical invariants; i.e., they do not depend on the labeling of the vertices or edges of the "molecular graph" [12][13][14][15][16][17][18][19][20][21][22][23][24]. These indices codify structural information contained in 'molecular connectivities' and can be considered as structure cryptic descriptors [15][16][17].
The first TI capable of characterizing the ramification of a "graph" was proposed by Wiener [18]. This index was based on the topological concept of distance, understood as the number of bonds between two atoms along the shortest path. Other authors have defined various indices; prominent among them are Balaban's J index [19], Randić's molecular connectivity [20], Kier and Hall's electrotopological state (E-state) index [21], the Harary number [22], and Estrada's spectral moments [23][24][25], among others. The latter are related to the bond adjacency matrix, while the majority of the remainder are derived from the vertex adjacency or distance matrices.
The proliferation of topological indices can be compared with the effect produced on quantum chemical parameters by changes in the molecular orbital. In this connection, TIs have been classified according to their nature as first, second and third generation [17]. In a recent paper, Randić [26] has proposed a list of desirable attributes for a topological descriptor. Therefore, this list can be considered as a methodological guide for the development of new TIs. One of the most important criteria is the possibility of defining the descriptors locally. This attribute refers to the fact that the index could be calculated for the molecule as a whole but also over certain fragments of the structure itself.
At times, the properties of a group of molecules are more related to a certain zone or fragment than to the molecule as a whole. In such cases, the global definition does not satisfy the structural requirements needed to obtain a good correlation in QSAR and QSPR studies. The local indices can be used in certain problems such as:
• Research on drugs, toxic compounds or, in general, any organic molecules with a common skeleton, which is responsible for the activity or property under study.
• Study of the reactivity of specific sites of a series of molecules, which can undergo a chemical reaction or enzymatic metabolism.
• Study of molecular properties such as spectroscopic measurements, which are obtained experimentally in a local fashion.
• Any general case where it is necessary to study not the molecule as a whole, but rather some local properties of certain fragments; here the definition of local descriptors could be necessary.
Another of Randić's attributes refers to the generalization of the indices. The description of the molecular structure by a simple number can bring about loss of information. For this reason, in most cases the use of a family of different simple descriptors for obtaining the algebraic models that relate the structure with its physical, chemical and biological properties is needed [27]. The two possibilities to solve the loss of information in the graph theoretical descriptors are: (1) the generalization of a simple descriptor to "higher" analogues or (2) the generation of graph theoretical invariants as a sequence of numbers [26].
Chemical graph theory is continuously evolving, and novel approaches have appeared as solutions to those difficulties. Recently, several molecular descriptors based on the two-dimensional topological structure of molecules have been defined and tested in QSAR models [28][29][30][31][32][33][34][35], showing that definition of novel molecular descriptors is a promising field in medicinal chemistry (see Todeschini, Karelson, Devillers and Estrada [15][16][17] for an exhaustive compilation). In this sense, the author has developed a novel method called TOMO-COMD (acronym of TOpological MOlecular COMputer Design) [36]. It calculates several families of topological molecular descriptors. One of these families has been defined as quadratic indices by analogy with the quadratic mathematical forms.
The main aim of this paper is to propose a total and local definition of quadratic indices of the "molecular pseudograph's atom adjacency matrix". In order to test the QSPR applicability of the present approach, we develop quantitative models for the prediction of several physical properties from the molecular structure of diverse organic compounds, combining quadratic indices and a multiple linear regression method. Finally, external prediction series and cross-validation procedures (leave-one-out and leave-group-out) are used to corroborate the predictive power of the models.

Molecular vector space
Each element of the periodic table has inherent atomic properties, such as electronegativity, density, atomic radius and so on. Each one of these properties numerically characterizes each kind of atom, taking values in the real set ( ℜ ). For example, the Mulliken electronegativity (X A ) [37] of the atom A takes the values X H = 2.2 for Hydrogen, X C = 2.63 for Carbon, X N = 2.33 for Nitrogen, X O = 3.17 for Oxygen, X Cl = 3.0 for Chlorine and so on.
Let there be a molecular vector whose elements are the atomic properties of the atoms in the molecule, for instance X A . Thus, a molecule having 2, 3, 4,…, n atoms can be "represented" by means of vectors with 2, 3, 4,..., n components, belonging to the spaces ℜ 2 , ℜ 3 , ℜ 4 ,..., ℜ n , respectively, where n is the dimension of the real space ℜ n .
This approach allows us to express compounds such as benzene, cyclohexane, hexane and all the constitutional and geometric isomers of hexane through a general kind of vector X= (X C , X C , X C , X C , X C , X C ). On the other hand, n-propanol, iso-propanol, propanal, and acetone may be represented by (X C , X C , X C , X O ) or any permutation of the components of this vector. These vectors belong to the spaces ℜ 6 and ℜ 4 , respectively. It must be noted that the order of the vector components is meaningless here. This fact, not common in classical vector spaces, will be explained elsewhere. In this example the hydrogen atoms were not considered.
By taking into consideration the whole universe of organic molecules, a molecular vector space (E) can be defined as the direct sum of the ℜ i spaces:
$E = \Re^{1} \oplus \Re^{2} \oplus \cdots \oplus \Re^{n} = \bigoplus_{i=1}^{n} \Re^{i}$ (1)
where i=1, 2, 3,…, n; ℜ k ∩ ℜ l = {0} for k ≠ l [38,39], and the dimension of E is the sum of the dimensions of each one of the ℜ i spaces. Therefore, this dimension is 1 + 2 + … + n = n(n+1)/2. This space includes all possible molecules having i atoms as vectors of the corresponding ℜ i space. This mathematical formalism makes it possible to represent any drug or organic molecule as a vector of this space and then to use the well-known applications of this algebraic construction to codify molecular structure in a straightforward but mathematically rigorous way.
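The construction above can be sketched in a few lines of Python. This is only an illustration: the function names `molecular_vector` and `dimension_of_E` are hypothetical, and the property table holds just the Mulliken values quoted in the text.

```python
# Mulliken electronegativities quoted in the text (other elements omitted).
MULLIKEN = {"H": 2.2, "C": 2.63, "N": 2.33, "O": 3.17, "Cl": 3.0}

def molecular_vector(atoms, prop=MULLIKEN):
    """Map a list of element symbols to a vector of atomic-property values."""
    return [prop[symbol] for symbol in atoms]

def dimension_of_E(n):
    """Dimension of the direct sum R^1 + R^2 + ... + R^n, i.e. n(n+1)/2."""
    return sum(range(1, n + 1))

# Hydrogens suppressed, as in the example of the text.
c6 = molecular_vector(["C"] * 6)              # benzene, cyclohexane, hexane isomers
c3o = molecular_vector(["C", "C", "C", "O"])  # propanols, propanal, acetone
print(c6, c3o, dimension_of_E(6))
```

The vector alone cannot distinguish benzene from hexane; that information enters only through the adjacency matrix introduced in the next section.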

Total quadratic indices; [q k (x)].
Mathematically, a quadratic form is defined as follows [39][40][41]: Let H be a K-space of finite dimension n. Then the application q: H→ K is a quadratic form (q(x)) if for X=x 1 a 1 +...+x n a n , where (a i ) 1≤i≤ n is a base of H, it satisfies that:
$q(x) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}\, x_i x_j$ (2)
Therefore, the quadratic indices are calculated from an equation analogous to Eq. 2, as an application on ℜ i , a vector space of finite dimension i: q: ℜ i → ℜ. If a molecule with n atoms is considered (a vector of ℜ n ), the k-th quadratic index q k (x) is defined as an application q: ℜ n → ℜ , where the molecular vector (X) is expressed as a linear combination of a base of the vector space ℜ n (X=x 1 a 1 +...+x n a n , where (a i ) 1≤i≤ n is a base of ℜ n ). Under these conditions, q is a quadratic form when it takes the form of Eq. 3. The whole form q k (x) is then written as a sum of all the possible terms k a ij x i x j , with "i" and "j" taking values from 1 to n independently of one another:
$q_k(x) = \sum_{i=1}^{n}\sum_{j=1}^{n} {}^{k}a_{ij}\, x_i x_j$ (3)
where k a ij = k a ji and n is the number of atoms of the molecule. The coefficients k a ij are the elements of the k-th power of the "molecular pseudograph's atom adjacency matrix" M(G). Here, $M = M(G) = [a_{ij}]_{n \times n}$, where n is the number of vertices and the elements a ij are defined as follows:
$a_{ij} = \begin{cases} P_{ij} & \text{if } i \neq j \text{ and } \exists\, e_k \in E : e_k \sim v_i, v_j \\ L_{ii} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$
where P ij is the number of edges e k incident with both vertices (atoms) v i and v j (e k ∼ v i ,v j ), and L ii is the number of loops in v i . Mathematically, a pseudograph can be defined in the following way [38,39]: Let V be a finite non-empty set and E a finite family of unordered pairs of elements of V (repeated pairs and pairs of equal elements are allowed); the pair G=<V,E> is called a graph with loops and multiple edges, or pseudograph.
The elements a ij (if a ij = P ij ) of this matrix represent the bonds between an atom v i and another atom v j . The matrix M k provides the number of walks of length k that link the vertices v i and v j . Each edge represents the 2 electrons of a covalent bond between atoms v i and v j , so that for a single bond the entries a ij and a ji of the M (k=1) matrix are both equal to 1. In this way, the benzene molecule can be represented by two different multigraphs, where each multigraph is related to one of the Kekulé structures. Taking this into consideration, the use of a pseudograph is necessary to avoid this ambiguity in compounds with more than one canonical structure. This happens for aromatic compounds such as pyridine, naphthalene, quinoline, etc., where the electrons of the π-orbitals are represented as loops on all ring atoms.
Aromatic rings with only one canonical structure, such as furan, thiophene, pyrrole, etc., are represented as a multigraph. This explanation is illustrated in Scheme 1 and in Table 1. As can be observed, for the benzene molecule, the total quadratic indices (without considering hydrogen atoms) calculated using the two multigraph matrices (connectivity matrices) have the same values. However, some molecules such as acetylsalicylic acid show differences in the total and local (heteroatoms and H-bonding heteroatoms) quadratic indices obtained from each multigraph (Scheme 1, MKA and MKB). The number of possible multigraph representations increases with the number of rings that have more than one canonical structure.
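The benzene case can be checked numerically. The sketch below builds the two Kekulé multigraphs and one pseudograph for the carbon ring and evaluates x^T M^k x for each. The helper names are hypothetical, and placing exactly one loop on each ring atom is an assumption made here for illustration; the text states only that the π electrons are represented as loops.

```python
import numpy as np

def adjacency(n, bonds, loops=None):
    """Atom adjacency matrix: bonds are (i, j, multiplicity), loops map atom -> count."""
    M = np.zeros((n, n), dtype=int)
    for i, j, multiplicity in bonds:
        M[i, j] = M[j, i] = multiplicity
    for i, count in (loops or {}).items():
        M[i, i] = count
    return M

def q(M, x, k):
    """k-th total quadratic index x^T M^k x."""
    x = np.asarray(x, dtype=float)
    return float(x @ np.linalg.matrix_power(M, k) @ x)

ring = [(i, (i + 1) % 6) for i in range(6)]
kekule_a = adjacency(6, [(i, j, 2 if i % 2 == 0 else 1) for i, j in ring])
kekule_b = adjacency(6, [(i, j, 1 if i % 2 == 0 else 2) for i, j in ring])
# Assumption: single ring bonds plus one loop per ring atom.
pseudograph = adjacency(6, [(i, j, 1) for i, j in ring], loops={i: 1 for i in range(6)})

x = [2.63] * 6  # carbon only, hydrogens suppressed
for k in range(4):
    print(k, q(kekule_a, x, k), q(kekule_b, x, k), q(pseudograph, x, k))
```

With identical weights on all six atoms, every row of each matrix sums to 3, so the three representations give the same values for every k. For rings with heteroatoms or substituents the two Kekulé matrices generally differ in their indices, which is the situation the pseudograph is meant to avoid.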
On the other hand, from the expression of q k (x) the following considerations arise in a natural way: 1) with the coefficients a ij , the square matrix M=[a ij ] of order n can be formed, and 2) let X = [x 1 , x 2 , x 3 ,..., x n ] be the vector of coordinates of X in the base {a 1 ,...,a n }, a matrix with n rows and a single column; transposing this matrix, X t = [x 1 x 2 ... x n ] is obtained, which is the row vector of the coordinates of X in the base {a 1 ,...,a n }. Then q(x) can be written in the form of a matrix product, q(x) =X t MX. Recently, other descriptors have been expressed through the vector-matrix-vector multiplication procedure [42]. The result of this matrix multiplication is a matrix formed by one row and one column, that is, a number. If we use the canonical bases, the coordinates of any molecular vector (X) coincide with the components of that vector. For that reason, those coordinates can be considered as weights (atom labels) of the vertices of the molecular pseudograph, because the components of the vector are values of some atomic property that characterizes each kind of atom.
Scheme 1. Graphical representation of some molecules using "multigraphs" and "pseudographs".
If we take as M the matrix of walks of length k (M k ) among the n vertices of the molecular pseudograph and multiply it by the coordinates of the molecular vector (X) in the canonical basis of ℜ n , we obtain, one for each value of k, the values that constitute numeric descriptors of the molecular structure. Therefore we can "define" a molecule through its quadratic indices (q(x)'s) in the matrix form X t M k X = q k (x), k ≥ 0.
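From the given definitions of M and q k (x) it can be observed that the entries of M k are non-negative integers, so the total quadratic indices are non-negative real numbers whenever the chosen atomic property is positive, as the Mulliken electronegativity is; they are not integers in general. The data presented in Table 2 exemplify the calculation of five quadratic indices for isonicotinic acid.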
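As a minimal sketch of the matrix form X t M k X, the following Python snippet generates the sequence q 0 , q 1 , ..., q k for the acetone skeleton mentioned earlier. The function name, the atom ordering and the use of multiplicity 2 for the C=O bond are assumptions made for this illustration, not taken from the TOMO-COMD software.

```python
import numpy as np

def quadratic_indices(M, x, k_max):
    """Return [q_0(x), ..., q_kmax(x)] with q_k(x) = x^T M^k x."""
    M = np.asarray(M)
    x = np.asarray(x, dtype=float)
    values = []
    Mk = np.eye(len(x), dtype=M.dtype)  # M^0 is the identity matrix
    for _ in range(k_max + 1):
        values.append(float(x @ Mk @ x))
        Mk = Mk @ M  # next power of the adjacency matrix
    return values

# Acetone skeleton, hydrogens suppressed; assumed atom order: C, C(=O), C, O.
M = np.array([[0, 1, 0, 0],
              [1, 0, 1, 2],
              [0, 1, 0, 0],
              [0, 2, 0, 0]])
x = [2.63, 2.63, 2.63, 3.17]  # Mulliken electronegativities from the text
q = quadratic_indices(M, x, 3)
print(q)
```

Because the atomic weights are real numbers, the resulting indices are real as well; permuting the atom labels (and the rows and columns of M accordingly) leaves every q k unchanged.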
In any case, if a complete series of indices is considered, a specific characterization of the chemical structure is obtained, which is not repeated in any other molecule. The generalization of the matrices and descriptors to "higher analogues" is necessary for the evaluation of situations where one descriptor is unable to provide a good structural characterization [26].

Local quadratic indices; [q kL (x)]
In the case of quadratic indices it is possible to define analogues of the total quadratic indices that possess similar properties; these are the local quadratic indices of the "molecular pseudograph's atom adjacency matrix". The definition of this descriptor, a graph-theoretical invariant for a given fragment F R (connected subgraph) within a specific pseudograph (G), is the following:
$q_{kL}(x) = \sum_{i=1}^{m}\sum_{j=1}^{m} {}^{k}a_{ijL}\, x_i x_j$
where m is the number of atoms of the fragment of interest and k a ijL is the element of row "i" and column "j" of the matrix M k L . This matrix is extracted from the M k matrix and contains the information referring to the vertices of the specific fragment (F R ) and also to the molecular environment.
The matrix M k L =[ k a ijL ] with elements k a ijL is defined as follows:
$^{k}a_{ijL} = {}^{k}a_{ij}$ if both v i and v j are vertices contained in the specific fragment;
$^{k}a_{ijL} = \tfrac{1}{2}\,{}^{k}a_{ij}$ if either v i or v j is contained in the specific fragment, but not both at the same time;
$^{k}a_{ijL} = 0$ otherwise (6)
with k a ij being the elements of the k-th power of M. These local analogues can also be expressed in matrix form by the expression:
q kL (x) =X t M k L X, where M k L is extracted from M k (7)
As can be seen, if a molecule is partitioned into Z molecular fragments, the matrix M k can be partitioned into Z local matrices M k L , L=1,..., Z. The k-th power of the matrix M is exactly the sum of the Z local matrices of order k:
$M^{k} = \sum_{L=1}^{Z} M^{k}_{L}$
and, consequently, the total quadratic index of order k can be expressed as the sum of the local quadratic indices of the Z fragments of the same order:
$q_k(x) = \sum_{L=1}^{Z} q_{kL}(x)$
Any local quadratic index has a particular meaning, especially for the first values of k, where the information about the structure of the fragment F R is contained. High values of k are related to the environment of the fragment F R considered inside the molecular pseudograph (G). A general equation for k order is described as follows: In a similar way to the total analogues, the complete series of indices gives a unique characterization of the chemical structure fragment, which contains information not only about the fragment under study, but also about the molecular environment. These local indices can also be used together with total indices as variables of QSAR and QSPR models for properties or activities that depend more on a region or fragment than on the whole molecule.
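The additivity of the local indices follows directly from the weights in Eq. 6. A short derivation is given here, assuming that the Z fragments form a partition of the atoms (every atom belongs to exactly one fragment); the indicator notation and the symbol w L are introduced only for this sketch.

```latex
% w_L(i,j): weight that Eq. 6 assigns to entry (i,j) for fragment F_L
w_L(i,j) = \tfrac{1}{2}\bigl(\mathbf{1}[v_i \in F_L] + \mathbf{1}[v_j \in F_L]\bigr),
\qquad {}^{k}a_{ijL} = w_L(i,j)\,{}^{k}a_{ij}

% each atom lies in exactly one fragment, so each indicator sums to 1 over L
\sum_{L=1}^{Z} w_L(i,j) = \tfrac{1}{2}(1 + 1) = 1
\;\Longrightarrow\;
\sum_{L=1}^{Z} M^{k}_{L} = M^{k}

\sum_{L=1}^{Z} q_{kL}(x) = X^{t}\Bigl(\sum_{L=1}^{Z} M^{k}_{L}\Bigr)X = X^{t} M^{k} X = q_k(x)
```

The weight is 1 when both atoms are in the fragment, 1/2 when exactly one is, and 0 when neither is, which reproduces the three cases of Eq. 6.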

Calculation of total and local quadratic indices
Let us now consider the molecule of 1-methylallyl alcohol (but-3-en-2-ol) and its labelled molecular "pseudograph" and atom adjacency matrix as a simple example. The zero, first and second powers of this matrix and local matrices of these orders of each one of the three fragments shown in the molecule are given in Table 3.
The quadratic indices of the "molecular pseudograph's atom adjacency matrix" are calculated in the following way: 1) Total and local indices of zero order [q 0 (x) and q 0L (x)]. These indices are obtained when the matrix M is raised to the power 0 (k=0). A matrix raised to the power 0 is the identity matrix (I), whose diagonal elements are a ii =1 [M 0 (i, i)=1]. Since the zero order matrix is diagonal, its quadratic form contains only the terms with the squares of the coordinates (an atomic property) of the X vector in canonical bases. Generally, we can establish that:
$q_0(x) = \sum_{i=1}^{n} x_i^{2}$ and $q_{0L}(x) = \sum_{i=1}^{m} x_i^{2}$
where n and m are the number of atoms in the molecule or in the fragment F R under study, respectively.
The total quadratic index of zero order is obtained by the matrix product q 0 (x)=X t M 0 X, and the local quadratic indices of zero order for each one of the three represented fragments are calculated using the three local matrices as the matrix of the quadratic form. Carrying out the matrix product with the row matrix (X t ) and the column matrix (X), the three local molecular quadratic indices (one for each fragment) are obtained (see Table 3). The local quadratic index q 0L (x) contains information about the fragment under study, without regard to which atom(s) it is bonded to, since the ones in the main diagonal express that a path of length 0 is the succession of a single vertex. That is to say, the sub-graphs of zero order consist of isolated vertices. This index has information about the molecular size of the fragment and it depends on the number and type of atoms that are contained in the fragment under study.
2) Total and local quadratic indices of first order [q 1 (x) and q 1L (x)]. These indices are obtained when the matrix M is raised to the first power (M 1 = M) and multiplied by the matrices X t and X. We can write the expressions for q 1 (x) and q 1L (x) in the forms: $q_1(x) = \sum_{i=1}^{n}\sum_{j=1}^{n} {}^{1}a_{ij}\, x_i x_j = X^{t} M X$ and $q_{1L}(x) = X^{t} M^{1}_{L} X$. The total quadratic index of first order, q 1 (x), follows from the first of these products. To obtain the local analogues, the matrices "partitioned" for each one of the fragments are extracted (see Table 3) and the corresponding matrix products are carried out. As can be seen, this index contains information not only about the fragment F R of interest, but also about the atoms to which this fragment is connected in one step (by means of a walk of length 1). It follows from its formulation that this index is capable of differentiating between saturated and unsaturated sub-structures (fragments) inside a molecular pseudograph (molecule). Two sub-graphs will have the same value if, and only if, both fragments present the same composition, the same topological arrangement of the atoms that constitute them and the same connections, by a path of length 1 (in one step), to atoms that are not part of the fragment.
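As a worked illustration of the zero- and first-order totals for but-3-en-2-ol, the arithmetic can be written out by hand. The atom numbering, the suppression of hydrogens and the use of multiplicity 2 for the C=C bond are assumptions of this sketch, so the values need not coincide with those of Table 3.

```latex
% Assumed labeling (hydrogens suppressed): C1 = CH3, C2 = CH(OH), C3 = CH=, C4 = =CH2, O5
% Assumed non-zero entries: a_{12} = a_{23} = a_{25} = 1, a_{34} = 2
% Mulliken values from the text: x_1 = x_2 = x_3 = x_4 = 2.63, x_5 = 3.17
q_0(x) = \sum_{i=1}^{5} x_i^{2} = 4(2.63)^{2} + (3.17)^{2} = 27.6676 + 10.0489 = 37.7165

q_1(x) = 2\,(x_1 x_2 + x_2 x_3 + 2\,x_3 x_4 + x_2 x_5)
       = 2\,(6.9169 + 6.9169 + 13.8338 + 8.3371) = 72.0094
```

The factor 2 in q 1 (x) appears because M is symmetric, so every bond contributes through both a ij and a ji ; the double bond contributes twice as much as a single bond between the same atoms would.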

(17)
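To obtain the second-order indices q 2 (x) and q 2L (x), the matrices M 2 and M 2 L are needed; they are given in Table 3. Carrying out the matrix product in the four cases (the total index and the three local ones) gives the corresponding values. It is easy to verify that q 2 (x) equals the sum of the three local indices q 2L (x), in agreement with the additivity of the local indices over the fragments.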
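The whole calculation for orders 0 to 2 can be sketched in Python. The atom ordering and, in particular, the choice of the three fragments are hypothetical (the fragments of Table 3 are not reproduced here); the local matrices follow the three cases of Eq. 6.

```python
import numpy as np

# Heavy-atom skeleton of but-3-en-2-ol; the atom order is an assumption:
# 0: CH3, 1: CH(OH), 2: CH=, 3: =CH2, 4: O
bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 2), (1, 4, 1)]  # (i, j, bond multiplicity)
x = np.array([2.63, 2.63, 2.63, 2.63, 3.17])  # Mulliken values quoted in the text

M = np.zeros((5, 5), dtype=int)
for i, j, p in bonds:
    M[i, j] = M[j, i] = p

# Hypothetical partition into three fragments (not necessarily those of Table 3).
fragments = {"methyl": [0], "C-OH": [1, 4], "vinyl": [2, 3]}

def local_matrix(Mk, atoms):
    """Eq. 6: full weight if both atoms are in the fragment, half if only one."""
    inside = np.zeros(Mk.shape[0])
    inside[atoms] = 1.0
    weights = (inside[:, None] + inside[None, :]) / 2.0
    return Mk * weights

def quadratic_index(Mk, x):
    """Quadratic form x^T Mk x."""
    return float(x @ Mk @ x)

total, local = {}, {}
for k in range(3):
    Mk = np.linalg.matrix_power(M, k)
    total[k] = quadratic_index(Mk, x)
    local[k] = {name: quadratic_index(local_matrix(Mk, atoms), x)
                for name, atoms in fragments.items()}
    print(k, round(total[k], 4), {n: round(v, 4) for n, v in local[k].items()})
```

For each order, the three local values add up to the total index, whatever partition is chosen, as long as every atom is assigned to exactly one fragment.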

Table 3. The zero, first and second powers of the molecular pseudograph's local atom adjacency matrix for each one of the 3 fragments shown in the molecule of 1-methylallyl alcohol.
The calculation of total and local quadratic indices for any organic molecule was implemented with the TOMO-COMD software [36]. This software has a graphical interface that makes it user friendly for medicinal chemists. The input of the chemical structure is by directly drawing the molecular pseudograph using the software's drawing mode. This procedure is carried out by a selection of the active atom symbols belonging to different groups of the periodic table. The multiple edges and loops are edited with a right mouse click. Afterwards, in the calculation mode, one should select the atomic property and the family descriptor before calculating the molecular indices. In this work, we used the Mulliken electronegativity as an example of an atomic property [37]. The descriptors calculated were the following:

Physical properties data sets for QSPR studies
To test the ability of the total and local quadratic indices to predict molecular physical properties, the following four series have been investigated (three of which have been previously investigated by other "topological" procedures): a) 74 alkanes (Table 4); b) 58 alkyl alcohols; c) 106 cycloalkanes; and d) organic compounds containing aromatic rings (70 compounds plus an extra set of 20).

Data analysis
The statistical analyses were carried out with the STATISTICA software package [47]. Multiple linear regression (MLR) analysis was used to obtain quantitative models that relate the structures and physical properties of organic compounds. The quality of the models was determined by examining the statistical parameters of the multivariable regression and of the cross-validation procedures [leave-one-out and leave-group (5-fold)-out]. In recent years, the leave-one-out (LOO) PRESS statistics (e.g., q 2 ) have been used as a means of indicating predictive ability. Many authors consider high q 2 values (for instance, q 2 > 0.5) as an indicator, or even as the ultimate proof, of the high predictive power of a QSAR model. In a recent paper, Golbraikh and Tropsha demonstrated that a high value of LOO q 2 appears to be a necessary but not a sufficient condition for a model to have high predictive power [48]. A more exhaustive cross-validation method can be used in which a fraction of the data (10-20%) is left out and predicted from a model based on the remaining data. This process (leave-group-out, LGO) is repeated until each observation has been left out at least once [49,50]. In the present paper, each investigated data set was split randomly into five groups of approximately the same size (20%). Each group was left out (LGO) and then predicted by a model developed from the remaining observations (80% of the data). This process was carried out five times on five unique subsets. In this way, every observation was left out once, in groups of 20%, and its value predicted. The mean absolute errors (MAE) for the five groups are used as the criterion for assessing model quality. The overall (average) MAE of the 5-fold cross-validation procedure (20% left out each time) can be taken as good confirmation of the predictive quality of the model.
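The two cross-validation schemes can be sketched with NumPy alone. This is an illustration, not the STATISTICA procedure: the function names are hypothetical, q 2 is taken here as 1 − PRESS/SS (a common definition that the text does not spell out), and the data are synthetic.

```python
import numpy as np

def fit(X, y):
    """Least-squares coefficients of y = X b + c (intercept stored last)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def loo_q2(X, y):
    """Leave-one-out q2 = 1 - PRESS / total sum of squares (assumed definition)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        press += (y[i] - predict(fit(X[keep], y[keep]), X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def lgo_mae(X, y, groups=5, seed=0):
    """Leave-group-out: MAE of each left-out group and their average."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), groups)
    maes = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        pred = predict(fit(X[train], y[train]), X[test])
        maes.append(float(np.mean(np.abs(y[test] - pred))))
    return maes, float(np.mean(maes))

# Synthetic descriptors and property, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.1, size=40)

q2 = loo_q2(X, y)
maes, overall = lgo_mae(X, y)
print(round(q2, 4), [round(m, 3) for m in maes], round(overall, 3))
```

Each observation falls in exactly one of the five groups, so every value is predicted once by a model that did not see it.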
In addition, to assess the robustness and predictive power of the found models, external prediction (test) sets also were used. This type of model validation is very important, if we take into consideration that the predictive ability of a QSAR model can only be estimated using an external test set of compounds that was not used for building the model [48].

QSPR applications
The objective is to show, as directly as possible, that the total and local quadratic indices defined in the previous sections can predict molecular physical properties in a QSPR analysis. In this sense, we can find a quantitative relation between a property P and the quadratic indices of M having, for instance, the following form:
$P = a_0\,q_0(x) + a_1\,q_1(x) + a_2\,q_2(x) + \cdots + a_k\,q_k(x) + c$ (18)
where P is the measurement of the property, q k (x) [or q kL (x)] is the k-th total [or local] quadratic index, the a k 's are the coefficients obtained by the linear regression analysis, and c is the intercept.
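A fit of the form of Eq. 18 can be sketched with an ordinary least-squares solve. In the snippet below the descriptors are computed for the carbon skeletons of seven small alkanes, but the property values are synthetic, generated from known coefficients so that the fit can be checked; none of the numbers are experimental, and the function names are hypothetical.

```python
import numpy as np

X_C = 2.63  # Mulliken electronegativity of carbon quoted in the text

# Carbon skeletons (hydrogens suppressed) as (number of atoms, list of C-C bonds).
alkanes = {
    "ethane": (2, [(0, 1)]),
    "propane": (3, [(0, 1), (1, 2)]),
    "butane": (4, [(0, 1), (1, 2), (2, 3)]),
    "isobutane": (4, [(0, 1), (1, 2), (1, 3)]),
    "pentane": (5, [(0, 1), (1, 2), (2, 3), (3, 4)]),
    "isopentane": (5, [(0, 1), (1, 2), (2, 3), (1, 4)]),
    "neopentane": (5, [(0, 1), (1, 2), (1, 3), (1, 4)]),
}

def quadratic_index(n, bonds, k):
    """q_k(x) = x^T M^k x for a skeleton with n carbon atoms."""
    M = np.zeros((n, n), dtype=int)
    for i, j in bonds:
        M[i, j] = M[j, i] = 1
    x = np.full(n, X_C)
    return float(x @ np.linalg.matrix_power(M, k) @ x)

# Descriptor table with q0 and q2. q1 is left out because, for acyclic skeletons
# with a single atom type, it is an exact linear function of q0.
Q = np.array([[quadratic_index(n, b, 0), quadratic_index(n, b, 2)]
              for n, b in alkanes.values()])

# Synthetic property generated from known coefficients, NOT experimental data.
true_coef = np.array([1.0, 0.5, 10.0])  # a0, a2, c
design = np.column_stack([Q, np.ones(len(Q))])
P = design @ true_coef

coef, *_ = np.linalg.lstsq(design, P, rcond=None)
pred = design @ coef
R = np.corrcoef(P, pred)[0, 1]
print(np.round(coef, 6), round(float(R), 6))
```

The collinearity noted in the comment is one reason a stepwise variable selection is useful in practice: indices that carry no independent information for a given data set should not enter the model together.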
Taking into consideration another of Randić's attributes, it is convenient that candidates for molecular descriptors have good correlations with at least one physical property [26]. In the present work we have selected physical properties of several data sets of organic compounds. The first data set is formed by 74 alkanes. The values of the total quadratic indices for such molecules are presented in Table 4. The alkanes represent an especially attractive class of compounds as a starting point for the application of molecular modeling techniques, because many alkane properties vary in a regular manner according to molecular mass and extent of branching. Besides, the alkanes are nonpolar and a number of complexities that arise with more polar compounds are thus avoided [43].
The best linear regression models for seven representative physical properties of alkanes were obtained by a forward stepwise procedure; the equations and the statistical parameters are presented in Table 5. In this Table, R is the multiple correlation coefficient, s is the standard deviation of the regression, q 2 is the squared multiple correlation coefficient of the LOO cross-validation procedure, MAE is the (average) mean absolute error of the LGO cross-validation procedure, F is the Fisher ratio at the 95% confidence level, and the p-value is the significance level. As can be observed from the statistical parameters of the regression equations in Table 5, most of the physical properties are well accounted for by quadratic indices of the "molecular pseudograph's atom adjacency matrix". In Table 6 we show the statistical parameters of the best regression equations obtained by Needham et al. [43] using connectivity indices and ad hoc descriptors and by Estrada [23] using spectral moments of the edge-adjacency matrix of a molecular graph.
In this sense, the QSPR models obtained by using quadratic indices contain fewer variables (parsimony principle) than the equations obtained by Needham et al. and Estrada with molecular modeling techniques. Nevertheless, Table 6 shows that the statistical parameters of the equations obtained with quadratic indices are similar to those obtained in previous studies [23,43]. For most properties, the accuracies of the models are sufficient for many practical purposes.
Secondly, we have chosen the group of molecules used by Randić and Basak [51] and later by Krenkel et al. [44], namely the 58 alkyl alcohols whose Bp have been used in several QSPR/QSAR studies [52][53][54][55][56].
In the LGO cross-validation procedure carried out for a more exhaustive validation of Eq. 26 (Eq. 27), the mean absolute errors for the five groups (used in each case) were as follows: MAE=3.202. On the other hand, the statistical parameters presented in Table 7 demonstrate the statistical quality of the obtained models (Eqs. 26 and 27), which are similar to those obtained previously. For example, for the complete series the multiple correlation coefficient (R) of Eq. 26 is similar to the one obtained in the paper of Randić and Basak [51]. However, the standard error (s) and the average deviation obtained by us are smaller.
Similarly, there were no significant differences between the model (Eq. 27) obtained using the other alternative (starting from the training set) and previous theoretical results. In this sense, no statistical difference was found, using a Student's t-test, between both models and those reported previously.
In addition, to assess the utility of quadratic indices to describe adequately the chemical structure of molecules that contain cycles, we have selected from the literature the Bp of 106 cycloalkanes [25]. The same training and prediction sets as in the original study were used, to make the study comparative. This data set contains cyclic, mono- and poly-substituted alkanes, as well as spiroalkanes. Using a stepwise procedure, two MLR models that describe the Bp of compounds in the training and prediction sets, using the quadratic indices as independent variables, were obtained: -0.3016(±4.718)·q 2 (x) - 1.75x10^-5 (±3.75x10^-6)·q 14 (x) + 6.42x10^-6 (±1.34x10^-6)·q 15 (x) (29) The statistical parameters of these two QSPR equations and the values reported by Estrada [25] are presented in Table 11. The statistical parameters show a high statistical quality of the developed models. For example, the correlation coefficient of model 28, with only two variables, is greater than 0.98 and the standard deviation represents less than 8% of the variance of the experimental property. Nevertheless, the statistical parameters of this equation are inferior to those obtained by Estrada [25], although his model includes 6 molecular descriptors. Furthermore, a model of higher statistical quality was obtained (Eq. 29), with a linear correlation coefficient of 0.9927 and a standard deviation that represents less than 5% of the variance in the experimental property.
These statistical parameters are acceptable for describing the Bp of molecules that contain rings, considering that generating the best possible equations for the Bp of these compounds is not the principal objective of this work. Nevertheless, our model, with fewer variables (parsimony principle) and only linear terms, has statistical parameters comparable to those of the original paper [25], which uses six variables (spectral moments of different orders) and a nonlinear dependence between the physical property and the spectral moments. Non-linear terms can significantly influence multivariable equations. For instance, the statistical parameters of the equations obtained for the description of physical properties of alkanes using spectral moments improved with the introduction of the square root of variables [23]. The improvements were significant, especially for the Bp: including the square root of the spectral moment of order zero in the model halved the standard deviation, while R and F increased from 0.9949 to 0.9984 and from 1650 to 5194, respectively. In the description of the critical pressure (PC, atm) using spectral moments, R increased significantly, from 0.9756 to 0.9854, because of the inclusion of non-linear terms [23].
Table 12 gives the experimental and calculated Bp values of the compounds in the training set, for the two equations obtained in this study and for the models obtained by Estrada [25]. Using the LOO cross-validation procedure, models 28 and 29 had $q^2$ values of 0.961 and 0.977, respectively. Using the LGO cross-validation method, Eqs. 28 and 29 had overall MAE values of 6.429 and 4.801 °C, respectively. In addition, as a second corroboration of the predictive power of the models, an external prediction set of twenty-six cyclic alkanes was used (external validation). The Bp of the compounds included in the external test set was predicted with the same accuracy as that of the compounds in the training set. The linear relationship in this series is supported by the statistical parameters for this set given in Table 11.
Table 13 shows the experimental and calculated Bp for both equations and for the model obtained by Estrada [25]. These statistical parameters are adequate for the description of physical properties and are comparable with those obtained by Estrada for the same series. Considering the whole set (training and test sets), the correlation coefficient and standard deviation were 0.9931 and 4.94 °C, respectively. As can be observed, the predictability and robustness of the theoretical model were demonstrated in both series.
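The two internal validation schemes used above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the software used in this work: the helper names (`fit_predict`, `loo_q2`, `lgo_mae`), the random assignment of compounds to five groups and the data themselves are assumptions introduced here. The $q^2$ value is computed as 1 - PRESS/SS_total, a common definition that is assumed rather than taken from this section.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept; returns predictions for X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def loo_q2(X, y):
    """Leave-one-out q2 = 1 - PRESS / SS_total (assumed definition)."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # drop compound i, refit, predict it
        preds[i] = fit_predict(X[keep], y[keep], X[i:i + 1])[0]
    press = np.sum((y - preds) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def lgo_mae(X, y, n_groups=5, seed=0):
    """Leave-group-out: MAE of each held-out group and the overall MAE."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))         # random grouping is an assumption
    group_mae, abs_err = [], []
    for fold in np.array_split(idx, n_groups):
        keep = np.setdiff1d(idx, fold)
        err = np.abs(y[fold] - fit_predict(X[keep], y[keep], X[fold]))
        group_mae.append(err.mean())
        abs_err.append(err)
    return group_mae, np.concatenate(abs_err).mean()

# Synthetic stand-in for a descriptor matrix (two "indices") and a property.
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 2))
y = 100.0 + 20.0 * X[:, 0] - 5.0 * X[:, 1] + rng.normal(scale=2.0, size=80)

q2 = loo_q2(X, y)
group_mae, overall_mae = lgo_mae(X, y)
print(f"LOO q2 = {q2:.3f}")
print("LGO group MAEs:", np.round(group_mae, 3), "overall:", round(overall_mae, 3))
```

With real data, `X` would hold the quadratic indices selected by the stepwise procedure and `y` the experimental Bp values.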
Finally, to test the applicability of quadratic indices to structure-property correlations, and to extend the approach to molecules that contain aromatic rings, 95 structurally diverse organic compounds were selected. They were randomly split into two subsets: 75 compounds were used as a training set and the other 20 compounds as a test set. Using the 75 training compounds, a quantitative model as a function of total and local quadratic indices was developed. The Bp values were described by multivariate linear regression analysis using a stepwise procedure. The best QSPR model obtained, together with its statistical parameters, is given as Eq. 30.
In the development of the quantitative model for the Bp of the calibration data set, five compounds were detected as statistical outliers. Outlier detection was carried out using the following standard statistical tests: residuals, standardized residuals, Studentized residuals and Cook's distance [55]. The five compounds were m-bromophenol, o-anisidine, p-nitroaniline, hexamethylbenzene and furan. As can be observed, there is no distinctive structural relationship among these compounds.
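The outlier diagnostics named above can be computed directly from the hat matrix of a linear regression. The following sketch is illustrative only: the function name `regression_diagnostics`, the synthetic data, the planted outlier and the cut-off of 3 for the Studentized residual are assumptions made here, and the definitions used (standardized residual e/s, internally Studentized residual e/(s·sqrt(1-h)), Cook's distance) are the textbook ones, which may differ in detail from those of the statistical package used in this work.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Residuals, standardized and Studentized residuals, and Cook's distances."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])      # design matrix with intercept
    p = A.shape[1]                            # number of fitted parameters
    H = A @ np.linalg.pinv(A)                 # hat matrix A (A'A)^-1 A'
    h = np.diag(H)                            # leverages
    resid = y - H @ y
    s = np.sqrt(resid @ resid / (n - p))      # residual standard deviation
    standardized = resid / s
    studentized = resid / (s * np.sqrt(1.0 - h))
    cooks = studentized ** 2 * h / (p * (1.0 - h))
    return resid, standardized, studentized, cooks

# Synthetic data with one planted outlier at position 10 (hypothetical).
rng = np.random.default_rng(7)
X = rng.normal(size=(40, 2))
y = 150.0 + 25.0 * X[:, 0] + 10.0 * X[:, 1] + rng.normal(scale=2.0, size=40)
y[10] += 30.0

resid, standardized, studentized, cooks = regression_diagnostics(X, y)
flagged = np.flatnonzero(np.abs(studentized) > 3.0)   # illustrative cut-off
print("flagged compounds:", flagged)
print("largest Cook's distance at index", int(np.argmax(cooks)))
```

In practice the flagged compounds would be inspected chemically before removal, as was done for the five compounds listed above.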
Table 14 lists the experimental and calculated Bp values of the training set. The statistical parameters of Eq. 30 indicate a model of high quality. The correlation coefficient R is over 0.99 and the standard deviation is only 7.61 °C. The squared correlation coefficient ($R^2$) for Eq. 30 was 0.981, so this model explains more than 98% of the variance in the experimental Bp values.
To assess the predictability and robustness of the model, internal and external validation procedures were carried out. Using the LOO cross-validation procedure, Eq. 30 had a cross-validation squared correlation coefficient of 0.976. In the LGO cross-validation approach, model 30 had an overall MAE of 7.34 °C. The Bp values of the external test set of 20 compounds were predicted with R=0.9930 and s=7.8280 °C.

Collinearity between variables and redundancy of information
One of the main problems in applying TIs to QSPR/QSAR studies is that many descriptors are collinear, so that much of the information they carry is redundant. Problems with redundancy of information and collinearity have been illustrated for TIs such as the molecular connectivities [59,60]. For a better statistical interpretation of QSPR/QSAR models in which inter-related indices are considered (such as topologic or topographic indices based on the same graph-theoretical invariant), and in order to understand which effects cannot be separated, the inclusion of strongly interrelated variables in the model should be avoided. This criterion matters because interrelation among descriptors produces highly unstable regression coefficients and makes it difficult to know the real contribution of each variable included in the model [58]. An unfortunate illustration of this phenomenon was described recently by Romanelli et al. [61], who reported a QSAR for the toxicity of twelve aliphatic alcohols using nine collinear variables, achieving an $R^2$ of 0.9932. To solve this problem, Randić proposed a procedure for the orthogonalization of molecular descriptors that has been applied with much success to QSPR and QSAR studies [62-66]. In this approach, molecular descriptors are transformed in such a way that they do not mutually correlate. The nonorthogonal descriptors and the derived orthogonal descriptors contain the same information, which results in the same statistical parameters of the QSAR models [62-66]. However, the coefficients of a QSAR model based on orthogonal descriptors are stable to the inclusion of new descriptors, which makes it possible to interpret the regression terms and to evaluate the role of individual descriptors in the QSAR model.
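The two properties of orthogonalized descriptors mentioned above (same fit statistics, stable coefficients) can be checked numerically. The sketch below implements a sequential orthogonalization in which each descriptor is replaced by its residual after regression on the preceding ones. It is a generic illustration on synthetic data: the function names, the ordering of the descriptors and the data are assumptions made here and are not taken from refs. [62-66].

```python
import numpy as np

def ols(X, y):
    """Least-squares fit with intercept; returns coefficients and R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, r2

def orthogonalize(X):
    """Replace each column by its residual after regressing it on the
    previously orthogonalized columns (the column order is a free choice)."""
    omega = np.empty_like(X, dtype=float)
    omega[:, 0] = X[:, 0]
    for j in range(1, X.shape[1]):
        A = np.column_stack([np.ones(len(X)), omega[:, :j]])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        omega[:, j] = X[:, j] - A @ coef
    return omega

# Synthetic descriptors: x2 is strongly collinear with x1 (hypothetical data).
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.2 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + 0.1 * rng.normal(size=n)

omega = orthogonalize(X)
coef_raw, r2_raw = ols(X, y)
coef_orth, r2_orth = ols(omega, y)
coef_one, _ = ols(omega[:, :1], y)        # model with the first descriptor only

corr = np.corrcoef(omega, rowvar=False)
max_offdiag = np.max(np.abs(corr - np.eye(3)))
print("R2 raw vs orthogonal:", r2_raw, r2_orth)
print("largest inter-correlation after orthogonalization:", max_offdiag)
print("coefficient of first descriptor, 1-variable vs 3-variable model:",
      coef_one[1], coef_orth[1])
```

The coefficient of the first orthogonal descriptor is unchanged when further orthogonal descriptors are added, which is the stability property that allows the regression terms to be interpreted one at a time.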
In the present paper, to alleviate the collinearity between variables in each investigated data set, an interrelation study among the quadratic indices used in the obtained equations was carried out, using correlation matrices of the molecular descriptors used in the QSPRs. The acceptable level of collinearity is a rather subjective issue. Acceptable correlation coefficients between variables reported in the literature have ranged from less than 0.4 to 0.9. In the view of Cronin and Schultz, the collinearity of the variables should be as low as possible, but must be significantly lower than the statistical fit of the QSPR/QSAR itself [67]. To illustrate the procedure, the inter-correlation study between the total and local quadratic indices used in the development of Eq. 30 is considered. The correlation matrix for this equation (Table 16) shows that there is low collinearity among these variables. Table 17 gives other parameters useful for detecting multicollinear variables (partial correlation and tolerance). The tolerance represents the proportion of the variability of a variable that is not explained by the other variables, and the partial correlation coefficient expresses the correlation between the property and a specific variable when the linear effects of the other independent variables have been eliminated.
Table 17. "Redundancy" of total and local quadratic indices used as independent variables.
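The three collinearity diagnostics discussed in this section (correlation matrix, tolerance and partial correlation) can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names and the synthetic descriptors are hypothetical, tolerance is taken as 1 - R² of each descriptor regressed on the others, and the partial correlation is computed as the correlation between the residuals of the property and of the descriptor after both are regressed on the remaining descriptors.

```python
import numpy as np

def residuals(predictors, target):
    """Residuals of target after least-squares regression on predictors."""
    A = np.column_stack([np.ones(len(target)), predictors])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

def tolerance_and_partial(X, y):
    """Tolerance of each descriptor and its partial correlation with y."""
    tol, partial = [], []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        rx = residuals(others, X[:, j])   # part of x_j not explained by others
        ry = residuals(others, y)         # part of y not explained by others
        tol.append(rx @ rx / np.sum((X[:, j] - X[:, j].mean()) ** 2))
        partial.append(np.corrcoef(rx, ry)[0, 1])
    return np.array(tol), np.array(partial)

# Hypothetical descriptors: the second is nearly a copy of the first.
rng = np.random.default_rng(3)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 + 1.0 * x3 + 0.2 * rng.normal(size=n)

corr_matrix = np.corrcoef(X, rowvar=False)
tol, partial = tolerance_and_partial(X, y)
print("correlation matrix:\n", np.round(corr_matrix, 3))
print("tolerance:", np.round(tol, 3))
print("partial correlation with y:", np.round(partial, 3))
```

A tolerance close to zero signals that a descriptor is almost entirely reproduced by the others, whereas values close to one indicate little redundancy.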

Interpretation of QSPR models
At present, it is known that properties are influenced by different kinds of interactions. In Eq. 31, the Bp is represented as a function of several interaction properties.
Several approaches can be used to extract a structural interpretation from a model based on quadratic indices. We used two different ways that permit an easy interpretation of the Bp in terms of molecular structure. The first is the "classical" way, a direct analysis of the structural information carried by each molecular descriptor and of how it contributes to the property under study. The second expresses the total contribution of the different atoms in a specific molecule, which yields a more compact additive scheme [68].
The first approach permits estimating the relative contribution of different molecular factors (mass, branching, electronic and steric factors) to the physical properties. As can be observed in the obtained regression models, the included variables are related to the factors that influence the Bp values, and these in turn to the structural features of the molecules. Given the structurally diverse organic compounds included in the fourth QSPR example, this data set was selected for a simple analysis. The coefficients of the variables in Eq. 31 are positive, with one exception: the local "heteroatom" quadratic index of fifth order [$^{E}q_{5L}(x)$] has a negative contribution to the property. The positive contributions are a logical result: when the number of hydrogen atoms bonded to heteroatoms increases, the Bp also increases, because the possibility of intermolecular H-bonding grows with the number of H-X groups (X = O, N and S) in the molecule. In this sense, the "protonic" quadratic index of first order [$^{H}q_{1L}(x)$] is the sum of all possible products of the electronegativities of the hydrogen atoms and of the heteroatoms bonded to them. If X is an O, N or S atom, the value of this index decreases in that order, because the electronegativity of these atoms decreases from oxygen to sulfur.
For this reason, this index is indicative of the number and type of hydrogen atoms linked to heteroatoms. On the other hand, $^{E}q_{1L}(x)$, $^{E}q_{5L}(x)$ and $^{E}q_{4L}^{H}(x)$ are also related to molecular charge; that is, these indices are variables that parameterize the molecular dipole moment. Finally, molecular weight is described by the total quadratic indices [$q_2(x)$ and $q_0^{H}(x)$], calculated with hydrogen atoms suppressed and included in the molecular pseudograph, respectively. For example, $q_0^{H}(x)$ makes a positive contribution to the Bp because this molecular descriptor is the sum of the squares of the electronegativities of all atoms in the molecule, which is indicative of the molecular size and increases with the number (n) of atoms in the molecule. The other molecular descriptor [$q_2(x)$] is related to molecular weight, size and branching. That is, this variable is a good choice for describing the part of the Bp determined by the combination of molecular weight and branching. This influence is demonstrated by the positive contribution of this index to the studied property.
The second approach gives the contribution of each atom in a specific molecule, which allows atoms to be compared in a more effective way. In this sense, we can substitute the expression in Eq. 10 into the QSPR model (Eq. 18) to obtain the total contribution of the different atoms in a specific molecule. The atom contributions are calculated by this procedure as shown in Eq. 32,
$$P = a_0 + \sum_{k} a_k\, q_k(x) = a_0 + \sum_{L}\sum_{k} a_k\, q_{kL}(x) \qquad (32)$$
where L stands for the corresponding atom.
Considering the QSPR models obtained for describing the Bp of cycloalkanes (Eq. 28 and Eq. 29) and the molecule 1-methyl-1,2-diethylcyclopropane, a simple example of the calculation of these atom contributions to the Bp is given here. This molecule, with its atom numbering and its total and local (atom) quadratic indices, is shown in Table 18. The Bp can be obtained in two ways: in the first, the sum of the atom contributions gives the value of the Bp of the molecule (see the right-hand column in Table 18); the second uses the total quadratic indices (considering the whole molecule). The Bp of the molecule as a function of total quadratic indices is obtained as follows:
$$Bp_A(\text{Molecule}) = -105.146 + 3.1629\,q_1(x) - 0.4933\,q_2(x) = 108.42\ ^{\circ}\mathrm{C}$$
$$Bp_B(\text{Molecule}) = -108.197 + 1.6358\,q_0(x) + 2.038\,q_1(x) - 0.3016\,q_2(x) - 1.75\times10^{-5}\,q_{14}(x) + 6.42\times10^{-6}\,q_{15}(x) = 110.79\ ^{\circ}\mathrm{C}$$
This approach allows topological chemical representations of molecules (using a pseudograph) to be built by combining molecular fragments. In this sense, the k-th total quadratic index can be expressed as a "linear combination" of the k-th fragment (local) quadratic indices (subgraphs). In this way, molecular properties are calculated by combining the contributions (atom contributions) of the smaller fragments present in the molecule. This method is based on the assumption that the contribution of a given molecular fragment to the complete molecular property should be quite similar in different molecules, or in different locations of the same molecule, provided that the molecular environments are similar. That is to say, the atom or fragment contributions to several properties are approximately "transferable". Now consider the two ethyl fragments (e-f and g-h) present in the 1-methyl-1,2-diethylcyclopropane molecule of the example above. These fragments have similar but not identical contributions. This is a logical result, because their molecular environments are similar but not identical.
For example, $q_{0L}(x)$ [$q_{0L}(x, \text{e-f}) = 6.9169 + 6.9169$ and $q_{0L}(x, \text{g-h}) = 6.9169 + 6.9169$] and $q_{1L}(x)$ [$q_{1L}(x, \text{e-f}) = 13.8338 + 6.9169$ and $q_{1L}(x, \text{g-h}) = 13.8338 + 6.9169$] have the same value for both ethyl fragments, but the values of the other molecular descriptors included in the obtained models (Eq. 28 and Eq. 29) are not the same; for example, $q_{2L}(x)$ [$q_{2L}(x, \text{e-f}) = 34.5845 + 13.8338$ and $q_{2L}(x, \text{g-h}) = 27.6676 + 13.8338$]. In this case, the difference arises from the different values of the local quadratic indices of atoms e and g, which is logical because the topological environment (within two steps) is not the same for both atoms. Notice that atoms f and h have the same value of the local quadratic index, so their atom contribution within the ethyl fragment is the same [$q_{2L}(x, \text{f}) = q_{2L}(x, \text{h}) = 13.8338$]. The magnitude of the local quadratic indices increases with the order of the index, as a consequence of the greater amount of structural information contained in higher-order local quadratic indices. For instance, $q_{14L}(x)$ and $q_{15L}(x)$ of the fragments e-f and g-h contain more information about both ethyl fragments (about the atoms that constitute the fragment and about their molecular environment) than the lower-order indices.
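The numbers in this worked example can be reproduced with a short script. The sketch below is an illustration under stated assumptions, not the TOMO-COMD implementation: the carbon electronegativity of 2.63 is inferred from the quoted value 6.9169 = 2.63², the labels a-d for the ring and methyl carbons are hypothetical (only e-f and g-h appear in the text), hydrogens are suppressed, and the local index of atom i is taken as $x_i \sum_j (M^k)_{ij} x_j$, a form that is consistent with the values quoted above. The intercept of Eq. 28 is added once to the summed atom contributions, which is also an assumption about how Table 18 was built.

```python
import numpy as np

X_C = 2.63  # assumed carbon electronegativity (2.63**2 = 6.9169)

# Hydrogen-suppressed graph of 1-methyl-1,2-diethylcyclopropane.
# a, b, c: ring carbons; d: methyl on a; e-f: ethyl on a; g-h: ethyl on b.
atoms = ["a", "b", "c", "d", "e", "f", "g", "h"]
bonds = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"),
         ("a", "e"), ("e", "f"), ("b", "g"), ("g", "h")]

idx = {s: i for i, s in enumerate(atoms)}
M = np.zeros((len(atoms), len(atoms)))
for u, v in bonds:
    M[idx[u], idx[v]] = M[idx[v], idx[u]] = 1.0
x = np.full(len(atoms), X_C)

def total_index(k):
    """k-th total quadratic index: x^T M^k x."""
    return x @ np.linalg.matrix_power(M, k) @ x

def local_index(k, atom):
    """k-th local (atom) index, assumed form x_i * sum_j (M^k)_ij x_j."""
    i = idx[atom]
    return x[i] * (np.linalg.matrix_power(M, k)[i] @ x)

def fragment_index(k, fragment):
    """Fragment index as the sum of its atom indices."""
    return sum(local_index(k, s) for s in fragment)

# Eq. 28 evaluated from total indices and from summed atom contributions.
A0, A1, A2 = -105.146, 3.1629, -0.4933
bp_total = A0 + A1 * total_index(1) + A2 * total_index(2)
contrib = {s: A1 * local_index(1, s) + A2 * local_index(2, s) for s in atoms}
bp_atoms = A0 + sum(contrib.values())

for k in (0, 1, 2):
    print(k, round(fragment_index(k, "ef"), 4), round(fragment_index(k, "gh"), 4))
print("Bp from total indices:", round(bp_total, 2))
print("Bp from atom contributions:", round(bp_atoms, 2))
```

Because the local indices of all atoms add up to the total index of the same order, both routes give the same Bp, and the e-f and g-h fragments differ only from order 2 onwards, where the different substitution of their ring carbons becomes visible.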

Conclusions
A promising topological approach to obtain a family of new molecular descriptors has been proposed. In this connection, a vector space E (molecular vector space), whose elements are organic molecules, was defined as a "direct sum" of different ℜ i spaces.
The descriptors were named, in general, quadratic indices, in analogy to the mathematical quadratic forms. The k-th power of the atom adjacency matrix (M) of the molecular pseudograph and the canonical bases were selected as the quadratic forms' matrices and bases, respectively. These molecular TIs have been implemented in the TOMO-COMD software, with the aim of creating a new calculation method. Specifically, the electronegativities of the atoms were used as the atomic property. These indices were generalized to "higher analogues" (higher order) as number sequences, with the aim of creating a family of descriptors that constitutes a useful tool for drug design and bioinformatic studies. In addition, this paper introduces a local approach for molecular quadratic indices. The local definition of these indices yields descriptors for an atom or fragment under study, which can be used to describe molecular properties that depend strongly on the contribution of that portion of the molecule. For example, these local indices are of great importance in modeling the properties of molecules that contain heteroatoms in their structure.
Finally, total and local quadratic indices and MLR have been used in QSPR studies of organic compounds. The resulting quantitative models are statistically significant and permit a clear interpretation of the studied properties in terms of the structural features of the molecules. LOO and LGO cross-validation procedures (internal validation) and external prediction series (external validation) revealed that the regression models have fairly good predictability. The physical properties of the test set compounds were predicted with the same accuracy as those of the training set compounds. The comparison with other approaches shows that the proposed method performs well. The results obtained indicate that these new indices fulfill several desirable attributes of a new molecular descriptor.
The approach described in this paper appears to provide a very promising structural invariant, useful for QSPR/QSAR studies, and an excellent alternative or guide for the discovery and optimization of new lead compounds, reducing the time and cost of traditional procedures.