Atom, Atom-Type, and Total Linear Indices of the “Molecular Pseudograph’s Atom Adjacency Matrix”: Application to QSPR/QSAR Studies of Organic Compounds

In this paper we describe the application in QSPR/QSAR studies of a new group of molecular descriptors: atom, atom-type, and total linear indices of the molecular pseudograph's atom adjacency matrix. These novel molecular descriptors were used to predict the boiling point of 28 alkyl alcohols and the partition coefficient (log P), specific rate constant (log k), and antibacterial activity of 34 2-furylethylene derivatives. Two quantitative models were obtained to describe the boiling points of the alkyl alcohols. The first includes only two total linear indices and performs well statistically (R² = 0.984, s = 3.78, F = 748.57, q² = 0.981, and s_cv = 3.91). The second includes four variables [three total and one local (heteroatom) linear indices] and improves the description of this physical property (R² = 0.9934, s = 2.48, F = 871.96, q² = 0.990, and s_cv = 2.79). Multiple linear regression analysis was also used to describe log P and log k of the 2-furylethylene derivatives. These models were statistically significant [(R² = 0.968, s = 0.143, and F = 113.38) and (R² = 0.973, s = 0.26, and F = 161.22), respectively] and were very stable to data variation in leave-one-out (LOO) cross-validation [(q² = 0.938 and s_cv = 0.178) and (q² = 0.948 and s_cv = 0.33), respectively]. Finally, a linear discriminant model for classifying the antibacterial activity of these compounds was obtained with the atom and atom-type linear indices. The global percentages of good classification in the training and external test sets were 94.12% and 100.0%, respectively. Comparison with other approaches (connectivity indices, total and local spectral moments, quantum chemical descriptors, topographic indices, and E-state/biomolecular encounter parameters) shows that our method performs well.
The descriptors described in this paper appear to be promising structural invariants, useful for QSPR/QSAR studies and computer-aided "rational" drug design.


Introduction
Graph-theoretical approaches are an important alternative among computer-aided molecular design methods, and they can support the discovery of new lead drugs at minimal cost [1]. The high cost of developing new bioactive molecular entities by traditional methods has drawn the pharmaceutical industry's interest to computer-assisted "rational" drug design. This is reflected in the gradually growing interest these companies show in quantitative Structure-Activity/Property Relationship (QSAR/QSPR) studies aimed at rationalizing the search for new biologically active molecules. In this context, rational combinatorial library design [2] and virtual screening [3] have become important foci of drug discovery research.
An important part of QSAR/QSPR research is the discovery of molecular descriptors applicable to physical, chemical, and biological properties of interest. A great number of molecular descriptors are currently available for QSAR/QSPR studies [4]. The so-called topological indices (TIs) are among the most useful molecular descriptors known today [5][6][7][8][9][10]. TIs can be classified as "global" or "local" according to the way in which they characterize the molecular structure [11]. However, most TIs known today are global molecular descriptors. One exception is the electrotopological state (E-state) index [12][13]. Other "global" molecular descriptors, such as the spectral moments of the edge adjacency matrix, can also be obtained in local form [11]. The great success of the E-state and of the total and local spectral moments in QSPR/QSAR has stimulated us to propose and validate here some novel local descriptors based on a topological characterization of the molecular structure.
In this context, our research group has recently introduced a novel computer-aided molecular design scheme, TOMOCOMD-CARDD (acronym of TOpological MOlecular COMputer Design-Computer Aided "Rational" Drug Design) [14][15][16]. This method generates molecular descriptors based on linear algebra. It has been successfully employed in QSPR/QSAR studies [15][16][17][18], including studies of nucleic acid-drug interactions [19]. The approach describes changes in the electron distribution with time throughout the molecular backbone. The TOMOCOMD-CARDD strategy is useful for selecting novel subsystems of compounds with a desired property/activity, which can then be optimized with some of the many molecular modeling methods available to medicinal chemists. The method has also proven flexible across many different problems. One application involved the prediction of the anthelmintic activity of novel drugs [20]. More recently, the TOMOCOMD-CARDD approach has been applied to the fast-track experimental discovery of novel antimalarial compounds [21]. The codification of chirality and other 3D structural features is another advantage of this method [22]. The significance and interpretation of these descriptors, and their comparison with other molecular descriptors, have also been described [16,23]. The features of the kth total and local linear and quadratic indices were illustrated with examples of various types of molecular structures, including chain lengthening, branching, heteroatom content, and multiple bonds. In addition, the linear independence of the quadratic and linear indices from other 0D, 1D, 2D, and 3D molecular descriptors was demonstrated by principal component analysis for several heterogeneous molecules [16,23].
The main objective of the present paper was to test the QSPR/QSAR applicability of the TOMOCOMD-CARDD approach and, secondly, to assess it by comparing the results with those of other cheminformatic methods. For this purpose, we develop quantitative models to describe the boiling points of alkyl alcohols and the partition coefficient (log P), specific rate constant (log k), and antibacterial activity of 34 2-furylethylene derivatives.

Theoretical Approach
The current approach is based on the calculation of the linear indices of the molecular pseudograph's atom (vertex) adjacency matrix. The general principle of this approach for small-to-medium size organic compounds has been explained in some detail elsewhere [16]; here we give an overview of it.
To calculate the linear indices of a molecule, the molecular vector X is built first; its components are numeric values that represent a certain atomic property. These properties characterize each atom type in the molecule and can be, for example, the electronegativity, density, or atomic radius. For example, the Mulliken electronegativity (X_A) [24] of an atom A takes the values X_H = 2.2 for hydrogen, X_C = 2.63 for carbon, X_N = 2.33 for nitrogen, X_O = 3.17 for oxygen, X_Cl = 3.0 for chlorine, and so on. Therefore, a molecule having 5, 10, 15, ..., n atoms can be represented by vectors with 5, 10, 15, ..., n components, belonging to the spaces ℜ^5, ℜ^10, ℜ^15, ..., ℜ^n, respectively, where n is the dimension of the real space ℜ^n. This approach allows us to code molecules such as acetic acid (with suppressed H atoms) through the molecular vector X = [X_C, X_C, X_O, X_O] = [2.63, 2.63, 3.17, 3.17] in the X_A electronegativity scale [24]. This vector belongs to ℜ^4. The use of other atomic properties defines other vectors. In this context, total (and local) linear indices encode "bulk" and physicochemical properties (such as hydrophobicity [25], molecular polar surface area [26], molar refractivity [27], molecular polarizability [28], and the sum of atomic charges [29]) if atomic physicochemical parameters (atomic log P [25], surface contributions of polar atoms [26], atomic molar refractivity [27], atomic hybrid polarizabilities [28], and Gasteiger-Marsili atomic charges [29], respectively) are used as atom properties (atom labels) to build the n-dimensional molecular vector X.
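As a minimal sketch (not the TOMOCOMD-CARDD implementation), the molecular vector can be built by looking up an atom-property table for each atom in a fixed atom order. The table holds only the Mulliken electronegativity values quoted above, and the function name `molecular_vector` is hypothetical.

```python
# Mulliken electronegativity values quoted in the text (ref. [24]); another atom
# property (atomic log P, molar refractivity, ...) could be swapped in here.
MULLIKEN_X = {"H": 2.2, "C": 2.63, "N": 2.33, "O": 3.17, "Cl": 3.0}


def molecular_vector(atoms, weights=MULLIKEN_X):
    """Return the molecular vector X: one atom-property value per atom, in atom order."""
    return [weights[symbol] for symbol in atoms]


# Acetic acid, CH3-C(=O)-OH, with hydrogen atoms suppressed: C, C, O, O
acetic_acid = ["C", "C", "O", "O"]
X = molecular_vector(acetic_acid)
print(X)  # [2.63, 2.63, 3.17, 3.17]
```

Changing the `weights` table is the only step needed to obtain descriptors weighted by a different atomic property.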

Local (Atom) Linear Indices of the "Molecular Pseudograph's Atom (Vertex) Adjacency Matrix"
If a molecule is composed of n atoms (a vector of ℜ^n), then the kth atom linear indices, f_k(x_i), are calculated as linear maps in the canonical basis of this space, as shown in Eq. 1:

$$ f_k(x_i) = \sum_{j=1}^{n} {}^{k}a_{ij} X_j \qquad (1) $$

where ^k a_ij = ^k a_ji (symmetric square matrix), n is the number of atoms of the molecule, and X_j are the coordinates of the molecular vector X in a set of basis vectors of ℜ^n. If a different basis is chosen, the coordinates of the same vector will be different [30][31][32][33]; the values of the coordinates therefore depend essentially on the choice of basis. In the so-called canonical ('natural') basis, e_j denotes the n-tuple having 1 in the jth position and 0's elsewhere, and the coordinates of any vector X coincide with its components [30][31][32][33]. For this reason, these coordinates can be considered as weights (atom labels) of the vertices of the molecular pseudograph [15][16][17][18][19][20][21][22][23].
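Eq. 1 is an ordinary matrix-vector product. The NumPy sketch below evaluates it for hydrogen-suppressed acetic acid; the atom numbering and the bond-count entries (2 for the C=O double bond) are our own illustrative choices following the matrix definition given below, and `atom_linear_indices` is a hypothetical helper.

```python
import numpy as np

# Hydrogen-suppressed acetic acid; illustrative atom order:
# 0 = CH3 carbon, 1 = carboxyl carbon, 2 = carbonyl O (double bond), 3 = hydroxyl O
M = np.array([
    [0, 1, 0, 0],
    [1, 0, 2, 1],
    [0, 2, 0, 0],
    [0, 1, 0, 0],
])
X = np.array([2.63, 2.63, 3.17, 3.17])  # Mulliken electronegativities


def atom_linear_indices(M, X, k):
    """Eq. 1: f_k(x_i) = sum_j (M^k)_ij * X_j, returned for every atom i."""
    Mk = np.linalg.matrix_power(M, k)  # k = 0 gives the identity matrix
    return Mk @ X


f1 = atom_linear_indices(M, X, 1)
print(f1)  # approximately 2.63, 12.14, 5.26, 2.63
```

For the carboxyl carbon, for instance, f_1 = 2.63 + 2(3.17) + 3.17 = 12.14: the double-bonded oxygen is counted twice.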
The coefficients ^k a_ij are the elements of the kth power of the matrix M(G) of the molecular pseudograph G. The term pseudograph, used here in chemical graph theory, was introduced by Frank Harary [34]. According to him, a pseudograph is a graph that may have multiple edges between the same pair of vertices or loops on the same vertex. Loop-multigraph [35] and general graph [36] are other terms also used in this research area [37].
Here, M(G) = M = [a_ij] denotes the matrix of f_k(x_i) with respect to the natural basis. In this matrix, n is the number of vertices (atoms) of G, and the elements a_ij are defined as follows [15][16][17][18][19][20][21][22][23]:

$$ a_{ij} = \begin{cases} P_{ij} & \text{if } i \neq j \text{ and } \exists\, e_k \in E(G) \\ L_{ii} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \qquad (2) $$

where E(G) represents the set of edges of G. In the adjacency matrix M(G), row i and column i correspond to vertex v_i of G; P_ij is the number of edges between vertices v_i and v_j, and L_ii is the number of loops on v_i.
Given that a_ij = P_ij for i ≠ j, the elements a_ij of this matrix represent the number of bonds between atoms i and j. The matrix M^k gives the number of walks of length k that link vertices v_i and v_j. Each edge in M^1 represents the 2 electrons of the covalent bond between atoms v_i and v_j; e.g., the entries of M^1 are equal to 1, 2, or 3 when a single, double, or triple bond, respectively, links vertices v_i and v_j. On the other hand, molecules containing aromatic rings with more than one canonical structure are represented by a pseudograph. This is the case for aromatic compounds such as pyridine, naphthalene, and quinoline, where the π electrons are accounted for by means of loops on each atom of the aromatic ring. Conversely, aromatic rings having only one canonical structure, such as furan, thiophene, and pyrrole, are represented by a multigraph.
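The sketch below builds M(G) from a bond list (with bond multiplicities) and a list of looped atoms, and checks by brute-force enumeration that the entries of M^k count walks of length k. The six-membered ring with single ring edges and one loop per atom is only an assumed encoding of an aromatic ring with more than one canonical structure; the exact convention used by TOMOCOMD-CARDD for aromatic bonds should be checked against Table 1. The helper names are hypothetical.

```python
import itertools

import numpy as np


def pseudograph_matrix(n, bonds, loops=()):
    """Build M(G): a_ij = P_ij (edges between i and j), a_ii = L_ii (loops on i)."""
    M = np.zeros((n, n), dtype=int)
    for i, j, multiplicity in bonds:
        M[i, j] += multiplicity
        M[j, i] += multiplicity
    for i in loops:
        M[i, i] += 1
    return M


def count_walks(n, bonds, loops, k, start, end):
    """Count walks of length k >= 1 from start to end by listing every step explicitly."""
    steps = []  # one directed step per traversal direction of each edge
    for i, j, multiplicity in bonds:
        for _ in range(multiplicity):
            steps.extend([(i, j), (j, i)])
    steps.extend((i, i) for i in loops)  # a loop is traversed as a single step
    total = 0
    for walk in itertools.product(steps, repeat=k):
        if walk[0][0] != start or walk[-1][1] != end:
            continue
        if all(walk[t][1] == walk[t + 1][0] for t in range(k - 1)):
            total += 1
    return total


# Assumed encoding: six-membered aromatic ring, single ring edges plus one loop per atom
ring_bonds = [(i, (i + 1) % 6, 1) for i in range(6)]
ring_loops = tuple(range(6))
M = pseudograph_matrix(6, ring_bonds, ring_loops)
M3 = np.linalg.matrix_power(M, 3)
print(M[0])  # [1 1 0 0 0 1]
print(M3[0, 0], count_walks(6, ring_bonds, ring_loops, 3, 0, 0))  # 7 7
```

The enumeration is exponential in k and only serves as an independent check; the matrix power is what one would use in practice.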
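It should be noted that the atom linear indices are defined as a linear transformation f_k(x_i) on the molecular vector space ℜ^n. This map is a correspondence that assigns to every vector X in ℜ^n a vector f(X) in such a way that

$$ f(\lambda_1 X_1 + \lambda_2 X_2) = \lambda_1 f(X_1) + \lambda_2 f(X_2) \qquad (3) $$

for any scalars λ_1, λ_2 and any vectors X_1, X_2 in ℜ^n. The defining equation (1) for f_k(x_i) may be written as the single matrix equation

$$ \begin{bmatrix} f_k(x_1) \\ f_k(x_2) \\ \vdots \\ f_k(x_n) \end{bmatrix} = \begin{bmatrix} {}^{k}a_{11} & {}^{k}a_{12} & \cdots & {}^{k}a_{1n} \\ {}^{k}a_{21} & {}^{k}a_{22} & \cdots & {}^{k}a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ {}^{k}a_{n1} & {}^{k}a_{n2} & \cdots & {}^{k}a_{nn} \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} \qquad (4) $$

or, in more compact form,

$$ [f_k(x)] = M^k [X] \qquad (5) $$

where [X] is a column vector (an n×1 matrix) of the coordinates of X in the canonical basis of ℜ^n, and M^k is the kth power of the matrix M of the molecular pseudograph (the map's matrix).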
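Because f_k is multiplication by the fixed matrix M^k (Eq. 5), the linearity property of Eq. 3 can be checked numerically. This sketch reuses an illustrative hydrogen-suppressed acetic acid matrix; the random vectors and scalars are arbitrary test values, not atom properties.

```python
import numpy as np

# Illustrative H-suppressed acetic acid matrix (C, C, O, O; entry 2 = C=O double bond)
M = np.array([[0, 1, 0, 0], [1, 0, 2, 1], [0, 2, 0, 0], [0, 1, 0, 0]])
k = 3
Mk = np.linalg.matrix_power(M, k)


def f_k(X):
    """Eq. 5: column vector of atom linear indices, [f_k(x)] = M^k [X]."""
    return Mk @ X


rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=4), rng.normal(size=4)
lam1, lam2 = 1.5, -0.7

lhs = f_k(lam1 * X1 + lam2 * X2)       # f(lam1*X1 + lam2*X2)
rhs = lam1 * f_k(X1) + lam2 * f_k(X2)  # lam1*f(X1) + lam2*f(X2)
print(np.allclose(lhs, rhs))  # True
```

Since M is symmetric, every power M^k is symmetric too, which is why ^k a_ij = ^k a_ji holds for all k.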
It should also be noted that this approach is rather similar to the LCAO-MO (Linear Combination of Atomic Orbitals-Molecular Orbitals) method. In fact, for k = 1 the approach closely resembles the extended Hückel MO method, in whose formalism each MO ψ_i is composed of the n valence AOs of the atoms in a molecule.
The main idea of the LCAO-MO method is that the electrons in a molecule are accommodated in definite MOs, just as in an atom they are accommodated in definite AOs. MOs are normally built as an LCAO of the atoms composing the system, i.e., they are written in the form

$$ \psi_i = \sum_{j=1}^{n} c_{ij} \phi_j $$

where i is the number of the MO ψ [in our case, f_1(x_i)], j indexes the atomic φ orbitals (in our case, X_j), and c_ij (in our case, ^1 a_ij) are the numerical coefficients defining the contributions of the individual AOs to the given MO. This way of constructing an MO is based on the assumption that an atom represented by a definite set of orbitals remains distinctive in the molecule.
To illustrate the steps of the procedure, it is useful to calculate the local and total linear indices of order 0-5 (k = 0-5) for a specific molecule; here we use 2-aminobenzaldehyde. Table 1 depicts the calculation of the linear indices of the molecular pseudograph's atom adjacency matrix for 2-aminobenzaldehyde. It lists the X values (Mulliken electronegativity) [24] of each atom, from which the molecular vector X encoding the whole molecule is obtained. All valence-bond electrons (σ and π networks) are captured in one step in the M^1 matrix. The first-order local (and total) linear indices, f_1(x_i), are then calculated for each atom. The local and total values for k = 0-5 are shown at the bottom of Table 1.
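The procedure for k = 0-5 can be sketched as below. The encoding of 2-aminobenzaldehyde is our assumption, not a copy of Table 1: hydrogens suppressed, ring atoms numbered 0-5 (atom 0 carries the CHO group, atom 1 the NH2 group), ring bonds entered as single edges, and one loop on each aromatic atom for the π electrons. Values produced this way are illustrative and may differ from Table 1 if the paper uses a different convention.

```python
import numpy as np

MULLIKEN_X = {"C": 2.63, "N": 2.33, "O": 3.17}

# Assumed H-suppressed encoding of 2-aminobenzaldehyde (see lead-in):
# 0-5 aromatic ring carbons, 6 = aldehyde C, 7 = aldehyde O, 8 = amino N
atoms = ["C"] * 6 + ["C", "O", "N"]
bonds = [(i, (i + 1) % 6, 1) for i in range(6)]  # ring edges
bonds += [(0, 6, 1), (6, 7, 2), (1, 8, 1)]       # C-CHO, C=O, C-NH2
loops = range(6)                                 # one loop per aromatic atom

n = len(atoms)
M = np.zeros((n, n), dtype=int)
for i, j, multiplicity in bonds:
    M[i, j] += multiplicity
    M[j, i] += multiplicity
for i in loops:
    M[i, i] += 1

X = np.array([MULLIKEN_X[a] for a in atoms])

local = {}   # k -> vector of atom linear indices f_k(x_i)
totals = {}  # k -> total linear index f_k(x)
for k in range(6):
    local[k] = np.linalg.matrix_power(M, k) @ X
    totals[k] = local[k].sum()

for k in range(6):
    print(k, round(float(totals[k]), 2))
```

Under this encoding the k = 0 total is simply the sum of the atom weights (23.91), and the k = 1 total is 69.16.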

Total (Whole-Molecule) Linear Indices of the "Molecular Pseudograph's Atom (Vertex) Adjacency Matrix"
Total linear indices are linear functionals [30][31][32][33] (also called linear forms) on ℜ^n; that is, the kth total linear index is a linear map f_k(x): ℜ^n → ℜ from ℜ^n to the scalar field ℜ. These molecular descriptors are defined as

$$ f_k(x) = \sum_{i=1}^{n} f_k(x_i) $$

where n is the number of atoms and f_k(x_i) are the atom linear indices (linear maps) obtained by Eq. 1. The linear form f_k(x) can then be written in matrix form,

$$ f_k(x) = [u]^t M^k [X] \qquad (9) $$

for each molecular vector X ∈ ℜ^n, where [u]^t is the n-dimensional row vector whose components are all equal to 1. As can be seen, the kth total linear index is calculated by summing the local (atom) linear indices of all atoms in the molecule.
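Eq. 9 is the sum of the atom indices written with an all-ones row vector [u]^t. The sketch below (illustrative acetic acid encoding; helper names hypothetical) checks that both routes give the same total index.

```python
import numpy as np

M = np.array([[0, 1, 0, 0], [1, 0, 2, 1], [0, 2, 0, 0], [0, 1, 0, 0]])  # H-suppressed acetic acid
X = np.array([2.63, 2.63, 3.17, 3.17])                                   # Mulliken electronegativities


def total_index_by_sum(M, X, k):
    """Sum of the k-th atom linear indices over all atoms."""
    return float((np.linalg.matrix_power(M, k) @ X).sum())


def total_index_by_form(M, X, k):
    """Eq. 9: f_k(x) = [u]^t M^k [X], with [u]^t an all-ones row vector."""
    u = np.ones(M.shape[0])
    return float(u @ np.linalg.matrix_power(M, k) @ X)


for k in range(4):
    print(k, round(total_index_by_sum(M, X, k), 2), round(total_index_by_form(M, X, k), 2))
```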

Local (Atom-type) Linear Indices of the "Molecular Pseudograph's Atom (Vertex) Adjacency Matrix"
In addition to the atom linear indices computed for each atom in the molecule, a local-fragment (atom-type) formalism can be developed. The kth atom-type linear index of the molecular pseudograph's atom adjacency matrix is calculated by summing the kth atom linear indices of all atoms of the same type in the molecule.
Consequently, if a molecule is partitioned into Z molecular fragments, the total linear indices can be partitioned into Z local linear indices f_kL(x), L = 1, ..., Z. That is, the total linear index of order k can be expressed as the sum of the local linear indices of the same order of the Z fragments:

$$ f_k(x) = \sum_{L=1}^{Z} f_{kL}(x) $$

In the atom-type linear indices formalism, each atom in the molecule is classified into an atom type (fragment), such as heteroatoms, H-bond acceptor heteroatoms (O, N, and S), halogens, aliphatic carbon chains, aromatic atoms (aromatic rings), and so on. For all data sets, including those with a common molecular scaffold as well as those with very diverse structures, the kth fragment (atom-type) linear indices can provide useful information.
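The atom-type indices are partial sums of the atom indices. In the sketch below (acetic acid again; the fragment definitions and function name are illustrative), the carbon and heteroatom fragments form a partition of the molecule, so their indices add up to the total index. Overlapping atom types (for example, heteroatoms and H-bond acceptors) are still valid descriptors but do not sum to the total.

```python
import numpy as np

atoms = ["C", "C", "O", "O"]  # H-suppressed acetic acid
M = np.array([[0, 1, 0, 0], [1, 0, 2, 1], [0, 2, 0, 0], [0, 1, 0, 0]])
X = np.array([2.63, 2.63, 3.17, 3.17])


def atom_type_index(M, X, atoms, members, k):
    """k-th atom-type index: sum of the atom indices of atoms whose symbol is in `members`."""
    local = np.linalg.matrix_power(M, k) @ X
    return float(sum(local[i] for i, symbol in enumerate(atoms) if symbol in members))


k = 1
fragments = {"carbon": {"C"}, "heteroatoms": {"N", "O", "S"}}  # a partition of this molecule
parts = {name: atom_type_index(M, X, atoms, members, k) for name, members in fragments.items()}
total = float((np.linalg.matrix_power(M, k) @ X).sum())
print({name: round(value, 2) for name, value in parts.items()}, round(total, 2))
# {'carbon': 14.77, 'heteroatoms': 7.89} 22.66
```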

Data Set for QSPR/QSAR studies
To illustrate the possibilities of the total and local (atom and atom-type) linear indices in QSPR/QSAR studies, we selected the following two series: 1) the boiling points of 28 alkyl alcohols (see Table 2), first studied by Kier and Hall using E-state/biomolecular encounter parameters [13] and more recently by Estrada and Molina [11] using the local spectral moments of the edge adjacency matrix; and 2) a set of 34 2-furylethylene derivatives previously studied with total and local spectral moments, 2D/3D connectivity indices (vertex and edge ones), and quantum chemical descriptors to model their partition coefficient (log P), specific rate constant (log k), and antibacterial activity. These chemicals have different substituents at position 5 of the furan ring and at the β position of the exocyclic double bond [38,39]. The structures of these 34 furylethylene derivatives are given in Table 3.
2-Furylethylene compounds have long been known as antimicrobial, antitumoral, and cytotoxic agents [40][41][42]. The log k (for the nucleophilic addition of mercaptoacetic acid) and n-octanol/water log P values of these compounds have been determined experimentally and reported in the literature [38]; Tables 4 and 5 depict these values, respectively. The antibacterial activity of these compounds was determined as the inverse of the concentration C that produces 50% growth inhibition of E. coli at six different times, and reported as log (1/C) [38]. Estrada and Molina [39] used this antibacterial activity to classify the furylethylenes into two groups. The group of active compounds is composed of those compounds having log (1/C) < 3, while the rest form the group of inactive compounds. Table 6 shows the classification of the 2-furylethylene derivatives as antibacterial according to this experimental cutoff value. This table also depicts the antibacterial activity of a series of nine new 2-furylethylenes used by Estrada and Molina [39] as an external prediction (test) set. These compounds have an NO2 group at position R3 and a Br or I at positions R1 and/or R2. All these compounds were shown to have antibacterial activity in different assays [42,43]. Their structures are given at the bottom of Table 3.

Computational Methods: TOMOCOMD-CARDD Approach
TOMOCOMD is an interactive program for molecular design and bioinformatics research [14]. It is composed of four subprograms, each of which allows one to draw structures (drawing mode) and to calculate 2D and 3D molecular descriptors (calculation mode). The modules are named CARDD (Computer-Aided 'Rational' Drug Design), CAMPS (Computer-Aided Modeling in Protein Science), CANAR (Computer-Aided Nucleic Acid Research), and CABPD (Computer-Aided Bio-Polymers Docking). In this paper we outline the salient features of only one of these subprograms, CARDD, which was developed following a user-friendly philosophy.
The calculation of total and local linear indices for any organic molecule was implemented in the TOMOCOMD-CARDD software [14]. The main steps for applying this method in QSAR/QSPR can be summarized as follows:
1. Draw the molecular pseudograph of each molecule in the data set using the software's drawing mode, by selecting the active atom symbol belonging to the different groups of the periodic table.
2. Use appropriate weights to differentiate the atoms of the molecule. In this work, we used the Mulliken electronegativity [24] of each kind of atom as the atomic property.
3. Compute the total and local linear indices of the molecular pseudograph's atom adjacency matrix. This is done in the software's calculation mode, where the atomic properties and the descriptor family are selected before the molecular indices are calculated. The software generates a table in which rows correspond to compounds and columns correspond to the total and local linear indices (or to other descriptor families implemented in the program).
4. Find a QSPR/QSAR equation using mathematical techniques such as multiple linear regression analysis (MRA), neural networks (NN), linear discriminant analysis (LDA), and so on. That is, we can find a quantitative relation between a property P and the linear indices with, for instance, the following form:

$$ P = a_0 f_0(x) + a_1 f_1(x) + \cdots + a_k f_k(x) + c $$

where P is the measured property, f_k(x) is the kth total linear index, the a_k are the coefficients obtained by linear regression analysis, and c is the intercept.
5. Test the robustness and predictive power of the QSPR/QSAR equation using internal and external cross-validation techniques.
6. Develop a structural interpretation of the QSAR/QSPR model using total and local (atom and atom-type) linear indices as molecular descriptors.
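Steps 1-3 can be sketched outside the software as below, producing the compounds-by-descriptors table described in step 3. The hydrogen-suppressed graphs of four small alcohols are written by hand as bond lists, and the helper names are hypothetical; this is not the TOMOCOMD-CARDD code.

```python
import numpy as np

MULLIKEN_X = {"C": 2.63, "O": 3.17}

# Step 1 (by hand): H-suppressed pseudographs as (atom symbols, bonds as (i, j, multiplicity))
ALCOHOLS = {
    "methanol": (["C", "O"], [(0, 1, 1)]),
    "ethanol": (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]),
    "propan-1-ol": (["C", "C", "C", "O"], [(0, 1, 1), (1, 2, 1), (2, 3, 1)]),
    "propan-2-ol": (["C", "C", "C", "O"], [(0, 1, 1), (1, 2, 1), (1, 3, 1)]),
}


def adjacency(n, bonds):
    """Atom adjacency matrix with bond multiplicities (no loops needed for alcohols)."""
    M = np.zeros((n, n), dtype=int)
    for i, j, multiplicity in bonds:
        M[i, j] += multiplicity
        M[j, i] += multiplicity
    return M


def total_indices(atoms, bonds, max_k=3):
    X = np.array([MULLIKEN_X[a] for a in atoms])  # step 2: atom weights
    M = adjacency(len(atoms), bonds)
    # step 3: total linear indices f_0(x) ... f_max_k(x)
    return [float((np.linalg.matrix_power(M, k) @ X).sum()) for k in range(max_k + 1)]


table = {name: total_indices(*graph) for name, graph in ALCOHOLS.items()}
print("compound      " + "  ".join(f"f{k}(x)" for k in range(4)))
for name, row in table.items():
    print(f"{name:<13} " + "  ".join(f"{value:6.2f}" for value in row))
```

Propan-1-ol and propan-2-ol share the same f_0 and f_1 values here; they are first distinguished at k = 2 (27.38 versus 33.18), which shows why several orders are computed.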

Statistical Analysis
Multiple linear regression analysis was used as the statistical method to describe Bp, log k, and log P, with the STATISTICA software package [44]. The tolerance parameter (proportion of variance that is unique to the respective variable) was set to the default minimum acceptable value of 0.01. Forward stepwise selection was used as the variable selection strategy. The principle of parsimony (Occam's razor) guided model selection: we selected the model with the highest statistical significance but with as few parameters (a_k) as possible. Model quality was assessed from the regression statistics and from cross-validation procedures [45,46], namely the correlation coefficient (R), the determination coefficient or squared correlation coefficient (R²), the p-level of the Fisher ratio [p(F)], the standard deviation of the regression (s), and the leave-one-out (LOO) PRESS statistics (q², s_cv).
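The regression statistics named above can be computed as in the following NumPy sketch of ordinary least squares with leave-one-out cross-validation. It is not STATISTICA's procedure (there is no stepwise selection or tolerance check), and the data are synthetic. We assume q² = 1 - PRESS/SS_total and s_cv = sqrt(PRESS/N); definitions of s_cv vary in the literature.

```python
import numpy as np


def fit_ols(D, y):
    """Least-squares fit of y = c + D a; returns coefficients [c, a_1, ..., a_p]."""
    A = np.column_stack([np.ones(len(y)), D])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, D):
    return coef[0] + D @ coef[1:]


def regression_statistics(D, y):
    n, p = D.shape
    residuals = y - predict(fit_ols(D, y), D)
    rss = float(residuals @ residuals)
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1 - rss / tss
    s = np.sqrt(rss / (n - p - 1))           # standard deviation of the regression
    F = (r2 / p) / ((1 - r2) / (n - p - 1))  # Fisher ratio with (p, n - p - 1) df

    press = 0.0
    for i in range(n):                       # leave-one-out: refit without compound i
        keep = np.arange(n) != i
        coef = fit_ols(D[keep], y[keep])
        press += float((y[i] - predict(coef, D[i:i + 1])[0]) ** 2)
    q2 = 1 - press / tss
    s_cv = np.sqrt(press / n)                # assumed definition (see lead-in)
    return {"R2": r2, "s": s, "F": F, "q2": q2, "s_cv": s_cv}


# Synthetic example: 28 "compounds", 2 descriptors, known linear relation plus noise
rng = np.random.default_rng(42)
D = rng.normal(size=(28, 2))
y = 5.0 + 3.0 * D[:, 0] - 2.0 * D[:, 1] + rng.normal(scale=0.5, size=28)
stats = regression_statistics(D, y)
print({name: round(float(value), 3) for name, value in stats.items()})
```

Because each LOO residual is at least as large as the ordinary residual, q² can never exceed R²; a large gap between the two flags an unstable model.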
Linear discriminant analysis (LDA) was used to classify the 34 2-furylethylene derivatives as antibacterial, also with the STATISTICA software [44]. To test the quality of the derived discriminant function, we used Wilks' λ (U statistic) and the Mahalanobis distance (D²). Wilks' λ measures the overall discrimination and takes values between 0 (perfect discrimination) and 1 (no discrimination). D² indicates the separation between the respective groups. In developing the classification models, the values 1 and -1 were assigned to active and inactive compounds, respectively. To classify compounds into the two groups, we preferred a posteriori (posterior) probabilities to cutoff values. The posterior probability is the probability, based on the values of the other variables, that the respective case belongs to a particular group (active or inactive); it decreases as the case's Mahalanobis distance from that group's centroid increases. The statistical robustness and predictive power of the model were assessed with an external prediction (test) set of nine new compounds.
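For two groups, Wilks' λ, the squared Mahalanobis distance between centroids, and the posterior probabilities can be sketched with NumPy as below. We assume a pooled within-group covariance matrix and equal prior probabilities for both groups; STATISTICA's defaults may differ (e.g., priors proportional to group sizes). The data are synthetic and the function names are hypothetical.

```python
import numpy as np


def two_group_lda(X_act, X_inact):
    """Return Wilks' lambda, squared Mahalanobis distance D^2, centroids and inverse pooled covariance."""
    m_act, m_inact = X_act.mean(axis=0), X_inact.mean(axis=0)
    W = (X_act - m_act).T @ (X_act - m_act) + (X_inact - m_inact).T @ (X_inact - m_inact)
    X_all = np.vstack([X_act, X_inact])
    centered = X_all - X_all.mean(axis=0)
    T = centered.T @ centered                    # total SSCP matrix
    wilks = np.linalg.det(W) / np.linalg.det(T)  # 0 = perfect, 1 = no discrimination
    S_inv = np.linalg.inv(W / (len(X_all) - 2))  # inverse pooled within-group covariance
    diff = m_act - m_inact
    D2 = float(diff @ S_inv @ diff)
    return float(wilks), D2, m_act, m_inact, S_inv


def posterior_active(x, m_act, m_inact, S_inv, prior_active=0.5):
    """Posterior probability of the 'active' group; it falls as the distance to its centroid grows."""
    d_act = (x - m_act) @ S_inv @ (x - m_act)
    d_inact = (x - m_inact) @ S_inv @ (x - m_inact)
    w_act = prior_active * np.exp(-0.5 * d_act)
    w_inact = (1 - prior_active) * np.exp(-0.5 * d_inact)
    return float(w_act / (w_act + w_inact))


rng = np.random.default_rng(7)
X_act = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(14, 2))     # synthetic "actives"
X_inact = rng.normal(loc=[-1.0, 0.5], scale=1.0, size=(20, 2))  # synthetic "inactives"
wilks, D2, m_act, m_inact, S_inv = two_group_lda(X_act, X_inact)
print(round(wilks, 3), round(D2, 3))
print(round(posterior_active(np.array([1.5, 0.0]), m_act, m_inact, S_inv), 3))
```

For two groups the two statistics are tied together: λ = 1 / (1 + n1·n2·D² / (n(n - 2))), so a smaller λ and a larger D² express the same improvement in separation.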

Describing boiling points of 28 alkyl alcohols
The first data set studied here is composed of 28 alkyl alcohols (14 primary, 6 secondary, and 8 tertiary) whose boiling points (Bp) have been reported previously [11]. The best linear regression models obtained to describe the Bp of these compounds are Eq. 12, which includes only two total linear indices, and Eq. 13, which includes three total and one local (heteroatom) linear index. The experimental and calculated Bp values for the data set (both models) are given in Table 2, and the linear relationships between them are illustrated in Figures 1 and 2. These models (Eqs. 12 and 13) explain more than 98% and 99% of the variance of the experimental Bp values, respectively. Similar equations were reported by Estrada and Molina [11] (Eq. 14) and by Kier and Hall [13] (Eq. 15) using spectral moments and E-state/biomolecular encounter parameters as molecular descriptors, respectively. In these equations, n is the number of carbon atoms in the molecule, µ_k(C-O) is the kth spectral moment of the C-O bond [11], and H_H2, H_O, and S(-OH) are biomolecular encounter parameters and the E-state value, respectively [13]. These models (Eqs. 14 and 15) explain more than 98% and 92% of the variance of the experimental Bp values, respectively.
The predictive ability and the stability to data variation of the models based on linear indices (Eqs. 12 and 13) were assessed here by LOO cross-validation. These models gave cross-validated determination coefficients (q²) of 0.981 and 0.990, respectively.
Unfortunately, the previous authors (Estrada and Molina [11], and Kier and Hall [13]) do not report cross-validation results. It is remarkable that one of our models (Eq. 12) uses three fewer variables than the model obtained by Estrada and Molina [11] (Eq. 14) and one fewer variable than the model obtained by Kier and Hall [13] (Eq. 15). Nevertheless, Eq. 12 explains a greater percentage of the variance of the experimental Bp values than the previously developed models [11,13].

Modeling specific rate constants (log k) of 34 2-furylethylenes derivatives
Many topological descriptors are not useful for describing chemical reactions [11]. To test the applicability of this new approach in QSRR (quantitative structure-reactivity relationship) studies, we selected a data set of 34 2-furylethylene derivatives, whose molecular structures are depicted in Table 3. Estrada and Molina [11] studied these compounds to describe the specific rate constant k of the nucleophilic addition of mercaptoacetic acid using total and local spectral moments, connectivity indices, and quantum chemical local descriptors. All the models they developed had seven variables. Their model based on connectivity indices explains 82% of the variance of the experimental log k values, with a standard deviation of 0.681. These researchers obtained similar results using the global spectral moments as molecular descriptors in the QSRR equation (R² = 84% and s = 0.655) [11]. The use of local molecular descriptors, either quantum chemical or graph-theoretical (local spectral moments), significantly improved the statistical quality of the models: both the quantum chemical and the local spectral moment models explain more than 96% (96.8% and 96.4%) of the variance of log k, with standard deviations of 0.288 and 0.320, respectively.
The molecular descriptors included in these equations clearly pointed to the reaction centers involved in the studied chemical interaction: the molecular indices calculated for atoms 2, 6, and 7, or for the bonds defined by these atoms (C2-C6 and C6-C7), were included in the models. These atoms are involved in the exocyclic double bond of the 2-furylethylene and are the "target" of the nucleophilic attack by the thiol (mercapto) group.
Given this logical result, we calculated the kth local linear indices for atoms C2, C6, and C7 (bonds C2-C6 and C6-C7). The best model obtained with these local linear indices as molecular descriptors is given below together with its statistical parameters:

N = 34, R = 0.986, R² = 0.973, s = 0.26, q² = 0.948, s_cv = 0.33, F(6,27) = 161.22

Note that our model (Eq. 16) includes only six variables (one fewer than the models used for comparison) and explains more than 97% of the variance (s = 0.26). These statistics are slightly better than those obtained previously. Table 4 depicts the experimental and calculated values of the reactivity index (log k) from connectivity indices, total and local spectral moments, quantum chemical indices, and local linear indices. Plots of observed versus calculated log k for the data set are shown in Figure 3.

Modeling partition coefficients (log P) of 34 2-furylethylenes derivatives
Structure-activity relationship studies have made clear that the lipophilicity of 2-furylethylene derivatives is critical for their antibacterial activity [38]. The n-octanol/water partition coefficient (log P) plays an important role in understanding the biological behavior of these derivatives [38]. We therefore study this parameter to evaluate the possibilities of molecular linear indices in QSPR and to compare the results with those obtained by Estrada and Molina [39] using 2D and 3D (topographic) connectivity indices (vertex and edge ones) and quantum chemical descriptors.
The best model obtained with total and local linear indices as molecular descriptors is given below together with its statistical parameters:

N = 34, R = 0.984, R² = 0.968, s = 0.143, q² = 0.938, s_cv = 0.176, F(7,26) = 113.38

This equation (Eq. 17) explains 96.8% of the variance of log P, a slightly better result than those obtained previously [39]. The experimental and calculated values of log P obtained with 2D and 3D connectivity indices, quantum chemical descriptors, total and local spectral moments, and molecular linear indices are shown in Table 5. Plots of observed versus calculated log P according to Eq. 17 are shown in Figure 4.
Finally, the LOO cross-validation procedure was used to assess the predictive ability of model 17, which gave a LOO q² of 0.938. A q² value above 0.5 is commonly taken as an indication of good predictive ability [44][45][46][47]. For comparison, the equations obtained with the vertex and edge connectivity indices, with the topographic descriptors, and with the quantum chemical descriptors (Eqs. 10, 11, and 13 in Ref. 39) gave s_cv values of 0.247, 0.176, and 0.370, respectively, whereas Eq. 17, obtained with the total and local linear indices, gave s_cv = 0.176.

Classification of 34 2-furylethylene derivatives as antibacterial
Linear discriminant analysis (LDA) is used here to obtain a classification model of the 2-furylethylene compounds according to their antibacterial activity. The classification model obtained (Eq. 18) is characterized by the LDA statistics Wilks' λ, the squared Mahalanobis distance D², and the Fisher ratio F. The statistical analysis showed adequate discriminatory power for differentiating between the two groups. The model was assessed by calculating the percentages of good classification in the training data set and in the external prediction set. The three models obtained previously with 2D and 3D connectivity indices and quantum chemical descriptors gave quite similar results, with overall accuracies of 91.2%, 94.1%, and 88.2%, respectively [39].
Classifying all compounds in the training data set gives some assessment of the model's goodness of fit, but it does not provide a thorough criterion of how well the model can predict the biological properties of new compounds. To assess such predictive power, the use of an external test set is essential [45][46][47]. Accordingly, the activity of the compounds in this set was predicted with the obtained discriminant function.
The overall accuracy for this set was 100.0%. On this same external test set of nine new 2-furylethylenes, the QSAR models obtained with 2D and 3D connectivity and quantum chemical descriptors also achieved 100.0% global good classification, including one NC (not classified) compound [39]. The global classification results for the training and external prediction sets achieved with all these approaches are shown in Table 6 (see also Table 7). Finally, the improvement in the statistical parameters of our model (Eq. 18) compared with those using 2D and 3D connectivity indices and quantum chemical descriptors is readily seen in the decrease of Wilks' λ and the increase of the squared Mahalanobis distance (see Table 7).
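The training-set percentages reported for Model 18 follow from a 2x2 confusion matrix. The counts below (14 actives with 13 correct, 20 inactives with 19 correct) are not stated explicitly in the text; we inferred them because they reproduce, to within rounding, the reported 94.12%, 92.85%, 95.0%, and 2.94% figures. The function name is hypothetical.

```python
def classification_rates(tp, fn, tn, fp):
    """Percentages of good classification from a 2x2 confusion matrix."""
    n = tp + fn + tn + fp
    return {
        "overall": 100 * (tp + tn) / n,
        "actives": 100 * tp / (tp + fn),    # actives correctly classified
        "inactives": 100 * tn / (tn + fp),  # inactives correctly classified
        "false_actives": 100 * fp / n,      # inactives predicted active, % of all compounds
        "false_inactives": 100 * fn / n,    # actives predicted inactive, % of all compounds
    }


# Inferred training-set counts (see lead-in): 13/14 actives and 19/20 inactives correct
training = classification_rates(tp=13, fn=1, tn=19, fp=1)
print({name: round(value, 2) for name, value in training.items()})
# {'overall': 94.12, 'actives': 92.86, 'inactives': 95.0, 'false_actives': 2.94, 'false_inactives': 2.94}
```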

Concluding Remarks
Although many discoveries have been made in recent years in the field of theoretical drug design, it is necessary to continue developing new molecular descriptors that can explain, through QSAR studies, different pharmacological properties of chemical substances. In this sense, defining molecular descriptors based on graph-theoretical invariants that explicitly contain important information on atoms (or bonds) is not only possible but also necessary [11]. In this context, the total and local (atom and atom-type) linear indices of the molecular pseudograph's atom adjacency matrix are promising total and local molecular descriptors.
We have shown here that total and local linear indices are useful molecular descriptors for modeling physicochemical and biological properties of organic compounds. The obtained models were statistically significant and better than others obtained previously with recognized methods (see Table 7). Given that total and local spectral moments, connectivity indices, quantum chemical descriptors, and the E-state have been applied successfully in QSAR/QSPR studies and drug design, this favorable comparison suggests that the linear indices used here can be a novel chem-bioinformatic tool for computer-aided "rational" drug design (TOMOCOMD-CARDD).

Figure 1. Correlation between experimental and calculated (by Eq. 12) boiling points of the 28 alcohols of the data set.

Figure 2. Correlation between experimental and calculated (by Eq. 13) boiling points of the 28 alcohols of the data set.

Figure 3. Observed versus predicted log k of the specific rate constant for the nucleophilic addition of thiols to the exocyclic double bond of the 2-furylethylene derivatives.

Table 1. Definition and Calculation of Total (Whole-Molecule) and Local (Atom) Linear Indices of the Molecular Pseudograph's Atom Adjacency Matrix of the 2-Aminobenzaldehyde Molecule.

Table 2. Experimental and Predicted Values of the Boiling Point of Alcohols R-OH Used in This Study.
a Experimental values of Bp. b Predicted values using total linear indices (Eq. 12). c Predicted values using total and local linear indices (Eq. 13). d Predicted values using spectral moments (Eq. 14). e Predicted values using E-state (Eq. 15).

Table 3. Chemical structures and numbering of atoms in the 2-furylethylene compounds used in this study.

Table 4. Experimental and calculated values of the specific rate constant for the nucleophilic addition of thiols (log k) to the exocyclic double bond of the studied 2-furylethylenes.

Table 5. Experimental and calculated values of the n-octanol/water partition coefficient (log P) for the furylethylenes studied.

Figure 4. Linear correlations of observed versus calculated log P according to the model obtained from molecular linear indices.

Model 18 correctly classified 94.12% of the compounds in the training data set (92.85% and 95.0% good classification of the active and inactive training compounds, respectively), misclassifying only 2 of the 34 compounds. The percentages of false actives and of false inactives in this data set were each only 2.94%.

Table 6. Classification of 2-furylethylene derivatives as antibacterial according to the four models obtained with molecular linear indices, 2D and 3D connectivity indices, and quantum chemical descriptors.

Table 7. Statistical parameters of the QSPR/QSAR models obtained using different molecular descriptors.
*Values are not reported in the literature