Molecular Sciences Total and Local Quadratic Indices of the "Molecular Pseudograph's Atom Adjacency Matrix". Application to Prediction of Caco-2 Permeability of Drugs

The high interest in predicting the intestinal absorption of New Chemical Entities (NCEs) stems from the increasing rate at which compounds are synthesized by combinatorial chemistry and from the high cost of the traditional evaluation methods. Quantitative Structure–Permeability Relationships (QSPerR) for the intestinal permeability across the Caco-2 cell monolayer ($P_{\text{Caco-2}}$) can be obtained by applying new molecular descriptors. In this sense, quadratic indices of the "molecular pseudograph's atom adjacency matrix" and multiple linear regression analysis were used to obtain good quantitative models for determining $P_{\text{Caco-2}}$. The QSPerR models found are statistically significant. The total and local quadratic indices were calculated with the TOMO-COMD software. A leave-one-out cross-validation procedure (internal validation) and the evaluation of an external test set of 20 drugs (external validation) revealed that the regression models had good predictive power. A comparison with results derived from other theoretical studies showed a quite satisfactory behavior of the present method. The descriptors included in the prediction models permitted the interpretation of the permeability process in structural terms, evidencing the main role of H-bonding and size properties. The models found were used in a virtual screening of drug intestinal permeability, and a relationship between the calculated $P_{\text{Caco-2}}$ and the percentage of human intestinal absorption was established for the 72 compounds. These results suggest that the proposed method is able to predict $P_{\text{Caco-2}}$ and is a good tool for screening the $P_{\text{Caco-2}}$ of large sets of NCEs synthesized via combinatorial chemistry.


Introduction
Oral administration is one of the most important routes of drug delivery because of its convenience, low cost and high patient compliance. The prediction of human oral drug absorption for new drug candidates is of considerable utility in the early stages of the drug discovery process [1,2]. As a rapid way to predict human intestinal absorption during high-throughput screening (HTS) [3], many in vitro cell culture models have been investigated as potential tools for drug absorption and metabolism studies [4][5]. The most widely used in vitro model is the Caco-2 cell line. The permeability coefficient across the Caco-2 cell monolayer ($P_{\text{Caco-2}}$) has been used to estimate the oral absorption of New Chemical Entities (NCEs) [6][7][8][9].
Artursson and Karlsson obtained a good correlation between human oral drug absorption and the permeability coefficient determined through the Caco-2 cell monolayer, which suggests that human absorption can be predicted by this in vitro model [7]. However, inter-laboratory differences in Caco-2 cell permeability have been demonstrated by several researchers [10,11].
The wide use of Caco-2 cell screening for oral absorption is based on the biological membrane properties expressed by these cells, such as the brush border at their apical surface and the expression of carrier-mediated transport systems and typical small-intestinal enzymes [5][6][7][12]. These properties permit the use of this cell culture to understand the mechanism of cellular permeability and to identify which of the drug's properties are responsible for cellular permeation. Nevertheless, this cell line presents several disadvantages: a) the permeability of compounds that are transported via carrier-mediated absorption is lower than that obtained in the human small intestine, and hydrophilic compounds with paracellular transport have a poor permeability [10]; b) goblet, endocrine and M cells are not expressed in this cell line; c) the cancer origin of this cell line produces an overexpression of P-glycoprotein, with consequently lower permeabilities in the absorptive direction [13]; d) the lack of standardization in cell culture and experimental procedures; and e) the long culture periods (21-24 days). The last one is the major practical shortcoming of this approach, with its consequently high cost.
Several molecular interactions have been proposed to explain the oral absorption of a great diversity of substrates. Among them, lipophilicity is considered the most significant driving force for permeability through Caco-2 monolayer cell cultures [6,8,14,15], besides the role of hydrogen-bonding capacity and the net charge of the molecule [2,4,8,14]. Waterbeemd et al. proposed a function in which these interactions are considered [14]:

$$\text{Permeability} = f(\text{lipophilicity},\ \text{molecular size},\ \text{H-bonding capacity},\ \text{charge}) \qquad (1)$$

where each property has a limited range, as established in the Rule-of-5 [1], but none of them is independent of the others [16].
The significant failure rate of drug candidates in the late stages of drug development calls for good predictive tools able to eliminate inappropriate compounds before substantial time and money are invested in testing [26].
Several methods have been developed in order to explain drug-membrane interactions. Among them are computational chemistry and QSAR/QSPR techniques such as linear regression [10,14,15,27,28], partial least squares [14], artificial neural networks [27] and nonlinear relationships [6,8,14]. In some of these papers traditional QSPR analyses were applied to derive quantitative relationships between $P_{\text{Caco-2}}$ and molecular structure. Several kinds of molecular descriptors have been introduced, including size and hydrogen-bonding descriptors [14], polar surface area (PSA) [10,29,30], Molsurf-derived descriptors [31], MO calculations [27] and membrane-interaction analysis [28]. These QSPR models have predicted Caco-2 cell permeability with reasonable accuracy, although the number of compounds used in the data sets is limited.
Recently, several molecular descriptors based on the two-dimensional topological structure of molecules have been defined and tested in QSAR models [32][33][34][35][36][37][38][39][40][41][42][43][44][45][46]. In this sense, two of the present authors have developed a novel method called TOMO-COMD (acronym of TOpological MOlecular COMputer Design) [47]. It calculates several families of topological molecular descriptors. One of these families has been defined as quadratic indices, in analogy to the quadratic forms of mathematics. Several works have been conducted with the use of these topological indices and they will be published elsewhere.
The purpose of this study was, first, to develop a quantitative model that permits the prediction of Caco-2 cell permeability from molecular structure using a combined approach of quadratic indices and the multiple linear regression method and, second, to compare the results obtained with those of other methodologies in order to assess the approach. Furthermore, the relationships between the structures, expressed by the quadratic indices, and the permeability coefficients were evaluated for the data set split into anionic, neutral and cationic compounds. In addition, the predictive power of the models found was corroborated using an external prediction set of 20 drugs and a leave-one-out cross-validation of the original data set. Finally, a virtual screening of drug intestinal permeability was carried out.

Molecular vector space
Each element of the periodic table has inherent atomic properties, such as electronegativity, density, atomic radius and so on. Each of these properties numerically characterizes each kind of atom, taking values in the set of real numbers ($\Re$). For example, the Mulliken electronegativity ($X_A$) [48] of the atom A takes the values $X_H$ = 2.2 for hydrogen, $X_C$ = 2.63 for carbon, $X_N$ = 2.33 for nitrogen, $X_O$ = 3.17 for oxygen, $X_{Cl}$ = 3.0 for chlorine and so on.
Let X be a molecular vector whose elements are the atomic properties of the atoms in the molecule, for instance $X_A$. Thus, a molecule having 2, 3, 4, …, n atoms can be "represented" by means of vectors with 2, 3, 4, …, n components, belonging to the spaces $\Re^2$, $\Re^3$, $\Re^4$, …, $\Re^n$, respectively, where n is the dimension of the real space $\Re^n$.
This approach allows us to express compounds such as benzene, cyclohexane, hexane and all the constitutional and geometric isomers of hexane through a general kind of vector X = ($X_C$, $X_C$, $X_C$, $X_C$, $X_C$, $X_C$). On the other hand, n-propanol, iso-propanol, propanal and acetone may be represented by ($X_C$, $X_C$, $X_C$, $X_O$) or any permutation of the components of this vector. These vectors belong to the spaces $\Re^6$ and $\Re^4$, respectively. It must be noted that the order of the vector components is meaningless here. This fact, not common in classical vector spaces, will be explained elsewhere. Note also that hydrogen atoms were not considered in this example.
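The construction of a molecular vector can be illustrated with a minimal Python sketch. It uses only the Mulliken electronegativities quoted above; the dictionary `MULLIKEN` and the function `molecular_vector` are hypothetical names chosen for this illustration and are not part of TOMO-COMD.

```python
# Mulliken electronegativities quoted in the text (illustrative subset).
MULLIKEN = {"H": 2.2, "C": 2.63, "N": 2.33, "O": 3.17, "Cl": 3.0}


def molecular_vector(atoms):
    """Return the molecular vector: one atomic property value per atom."""
    return [MULLIKEN[symbol] for symbol in atoms]


# Hydrogen-suppressed examples from the text.
hexane_like = molecular_vector(["C"] * 6)                 # benzene, cyclohexane, hexane
propanol_like = molecular_vector(["C", "C", "C", "O"])    # n-propanol, acetone, ...

print(len(hexane_like), hexane_like)
print(len(propanol_like), propanol_like)
```

As the text notes, the vector alone does not distinguish isomers; the structural information enters through the adjacency matrix introduced in the next section.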
Taking into consideration the whole universe of organic molecules, a molecular vector space (E) can be defined:

$$E = \Re^1 \oplus \Re^2 \oplus \dots \oplus \Re^n = \bigoplus_{i=1}^{n} \Re^i$$

where i = 1, 2, 3, …, n; $\Re^k \cap \Re^l = \{0\}$ for $k \neq l$, and the dimension of E is the sum of the dimensions of each of the $\Re^i$ spaces. Therefore, this dimension is n(n+1)/2.
This space includes all the possible molecules having n atoms as vectors of the $\Re^n$ spaces. The present mathematical formalism makes it possible to represent any drug or organic molecule in a vector space and then to use the well-known applications of this algebraic construction to codify molecular structure in a timely but mathematically rigorous way.

Total quadratic indices; [$q_k(x)$]
If a molecule consists of n atoms (a vector of $\Re^n$), then the k-th quadratic index $q_k(x)$ is defined as a mapping q (q: $\Re^n \rightarrow \Re$), where X can be expressed as a linear combination in a basis of $\Re^n$ ($X = x_1 a_1 + \dots + x_n a_n$, where $(a_i)_{1 \le i \le n}$ is a basis of $\Re^n$). Under these conditions q is a quadratic form if Eq. 3 holds:

$$q_k(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} {}^{k}a_{ij}\, x_i x_j \qquad (3)$$

where ${}^{k}a_{ij} = {}^{k}a_{ji}$ and n is the number of atoms of the molecule. The coefficients ${}^{k}a_{ij}$ are the elements of the k-th power of the matrix M of the molecular pseudograph (G). Then M(G) = M = [$a_{ij}$], where n is the number of vertices and the elements $a_{ij}$ are defined as follows:

$$a_{ij} = \begin{cases} P_{ij} & \text{if } i \neq j \text{ and there is an edge } e_k \sim v_i, v_j \\ L_{ii} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

where $P_{ij}$ is the number of edges $e_k \sim v_i, v_j$ between the vertices $v_i$ and $v_j$, and $L_{ii}$ is the number of loops in $v_i$.
The elements $a_{ij}$ (if $a_{ij} = P_{ij}$) of this matrix represent the bonds between an atom i and another atom j. The matrix $M^k$ provides the number of paths of length k that link the vertices $v_i$ and $v_j$. Each edge represents the 2 electrons of the covalent bond between two atoms $v_i$ and $v_j$, and for such a bond the entries $a_{ij}$ and $a_{ji}$ of the M (k = 1) matrix are equal to 1. In this way, the benzene molecule can be represented by two different multigraphs, each related to one of the Kekulé structures. For this reason, the use of a pseudograph is necessary to avoid this ambiguity in compounds with more than one canonical structure. This is the case for aromatic compounds such as pyridine, naphthalene, quinoline, etc., where the electrons of the π orbitals are represented as loops on all ring atoms. Aromatic rings with only one canonical structure, such as furan, thiophene, pyrrole, etc., are represented as a multigraph. This explanation is illustrated in Scheme 1 and in Table 1. As can be observed, for the benzene molecule the total quadratic indices (without considering hydrogen atoms) calculated using the two multigraph matrices (connectivity matrices) have the same values. However, simple molecules like acetylsalicylic acid show differences in the total and local (heteroatoms and H-bonding heteroatoms) quadratic indices obtained from each multigraph (MKA and MKB). The number of possible multigraph representations increases with the number of rings that have more than one canonical structure.
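The difference between the two Kekulé multigraphs of benzene and the single pseudograph can be sketched with NumPy. This is an illustrative construction under stated assumptions, not the TOMO-COMD implementation: the helper `ring_matrix` is a hypothetical name, hydrogen atoms are suppressed, and each ring atom of the pseudograph is assumed to carry exactly one loop for its π electron.

```python
import numpy as np


def ring_matrix(bond_orders, loops=0):
    """Adjacency matrix of a ring; bond_orders[i] joins atom i and atom i+1."""
    n = len(bond_orders)
    m = np.zeros((n, n), dtype=int)
    for i, order in enumerate(bond_orders):
        j = (i + 1) % n
        m[i, j] = m[j, i] = order       # P_ij: number of edges between i and j
    m[np.diag_indices(n)] = loops       # L_ii: number of loops on each atom
    return m


# The two Kekule structures give two different multigraph matrices ...
kekule_a = ring_matrix([2, 1, 2, 1, 2, 1])
kekule_b = ring_matrix([1, 2, 1, 2, 1, 2])
# ... whereas the pseudograph (single bonds plus one loop per atom) is unique.
pseudograph = ring_matrix([1] * 6, loops=1)

# Entries of M^k count the walks of length k between two vertices.
walks_2 = np.linalg.matrix_power(pseudograph, 2)
print(pseudograph)
print(walks_2)
```

The two Kekulé matrices differ entry by entry but are related by a relabeling of the ring atoms, which is why benzene itself gives identical total indices for both, as stated above.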
Scheme 1. Graphical representation of benzene and acetylsalicylic acid using "multigraph (MA and MB) and pseudograph (P)".
[Table 1. Quadratic indices $q_0(x)$ to $q_7(x)$ of benzene and of acetylsalicylic acid (total); the numerical entries are not recoverable.]

On the other hand, $q_k(x)$ can be obtained by means of the matrix expression $q_k(x) = X^t M^k X$ (k≥10), where X is the column vector of the coordinates in the basis $(a_i)$. Because we work with the canonical basis, the coordinates of any vector X coincide with the components of this vector. For that reason, such coordinates can be considered as weights of the vertices of the pseudograph, because the components of the vector are values of some atomic property that characterizes each kind of atom. In Table 2 the calculation of five quadratic indices for acetylsalicylic acid is exemplified.
As can be seen in Eq. 3, the expression contains the pairwise products of the coordinates of X, which gives it its quadratic character. As ${}^{k}a_{ij} = {}^{k}a_{ji}$ (the matrix is symmetric) and $x_i x_j = x_j x_i$, we can rewrite the $q_k(x)$ expression in the form:

$$q_k(x) = \sum_{i=1}^{n} {}^{k}a_{ii}\, x_i^2 + 2 \sum_{i<j} {}^{k}a_{ij}\, x_i x_j$$
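A minimal NumPy sketch of the matrix expression $q_k(x) = X^t M^k X$ is given here for the hydrogen-suppressed benzene pseudograph. It assumes one loop per ring atom and the Mulliken electronegativity of carbon (2.63) as atomic weight; the function names are hypothetical. The second function evaluates the same index by splitting diagonal and off-diagonal terms, which is valid because $M^k$ is symmetric.

```python
import numpy as np


def quadratic_index(m, x, k):
    """k-th total quadratic index q_k(x) = X^t M^k X."""
    x = np.asarray(x, dtype=float)
    mk = np.linalg.matrix_power(np.asarray(m), k)
    return float(x @ mk @ x)


def quadratic_index_split(m, x, k):
    """Same index as diagonal terms plus twice the upper-triangle terms."""
    x = np.asarray(x, dtype=float)
    mk = np.linalg.matrix_power(np.asarray(m), k)
    n = len(x)
    diagonal = sum(mk[i, i] * x[i] ** 2 for i in range(n))
    upper = sum(mk[i, j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return float(diagonal + 2.0 * upper)


# Benzene pseudograph without hydrogens: ring bonds plus one loop per atom (assumed).
n_atoms = 6
benzene = np.eye(n_atoms, dtype=int)
for i in range(n_atoms):
    j = (i + 1) % n_atoms
    benzene[i, j] = benzene[j, i] = 1

x_benzene = [2.63] * n_atoms
for k in range(4):
    print(k, quadratic_index(benzene, x_benzene, k))
```

For k = 0 the matrix power is the identity, so $q_0(x)$ reduces to the sum of the squared atomic weights.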

Local approach (local invariant) of the quadratic indices; [$q_{kL}(x)$]
For the quadratic indices it is possible to define analogs of the total quadratic indices that possess similar properties; these are the local quadratic indices of the "molecular pseudograph's atom adjacency matrix". The definition of this descriptor (a graph-theoretical invariant for a given fragment $F_i$ within a specific pseudograph G) is the following:

$$q_{kL}(x) = \sum_{i=1}^{m} \sum_{j=1}^{m} {}^{k}a_{ijL}\, x_i x_j$$

where m is the number of atoms of the fragment of interest and ${}^{k}a_{ijL}$ is the element of row i and column j of the matrix $M^k_L = M^k(G, F_i)$ [$q_{kL}(x) = q_k(x, F_i)$]. This matrix is extracted from the $M^k$ matrix and contains the information referring to the vertices of the specific fragment ($F_i$) and also to the molecular environment.
The elements ${}^{k}a_{ijL}$ of the matrix $M^k_L = [{}^{k}a_{ijL}]$ are defined as follows:

$$ {}^{k}a_{ijL} = \begin{cases} {}^{k}a_{ij} & \text{if both } v_i \text{ and } v_j \text{ are vertices contained in the specific fragment} \\ \tfrac{1}{2}\, {}^{k}a_{ij} & \text{if either } v_i \text{ or } v_j \text{ is contained in the specific fragment, but not both} \\ 0 & \text{otherwise} \end{cases}$$

where ${}^{k}a_{ij}$ are the elements of the k-th power of M. These local analogs can also be expressed in matrix form by the expression:

$$q_{kL}(x) = X^t M^k_L X$$

If a molecule is partitioned into Z molecular fragments, the matrix $M^k$ can be partitioned into Z local matrices $M^k_L$, L = 1, ..., Z. The k-th power of the matrix M is exactly the sum of the k-th powers of the Z local matrices:

$$M^k = \sum_{L=1}^{Z} M^k_L$$

or, equivalently, since $M^k = [{}^{k}a_{ij}]$:

$$ {}^{k}a_{ij} = \sum_{L=1}^{Z} {}^{k}a_{ijL} \qquad (10)$$

and the total quadratic index is the sum of the quadratic indices of the Z fragments:

$$q_k(x) = \sum_{L=1}^{Z} q_{kL}(x)$$

In any case, when a complete series of indices is considered, a specific characterization of the chemical structure (whole structure or fragment) is obtained, which is not repeated in any other molecule. The generalization of the matrices and descriptors to "superior analogs" is necessary for the evaluation of situations where a single descriptor is unable to give a good structural characterization [49]. These local indices can also be used together with total indices as variables of QSAR and QSPR models for properties or activities that depend more on a region or fragment than on the whole molecule.

[Table 2 notes. Molecular vector: $X \in \Re^{13}$ and $\Re^{13} \subset E$, where E is the molecular vector space. In the definition of X as molecular vector, the chemical symbol of the element is used to indicate the corresponding electronegativity value; that is, O means χ(O), the Mulliken electronegativity of oxygen, or some other atomic property that characterizes each atom in the molecule. With the canonical basis of $\Re^{13}$, the coordinates of any vector X coincide with the components of that molecular vector: $X^t$ = [3.17 3.17 2.63 2.63 2.63 2.63 2.63 2.63 2.63 3.17 2.63 3.17 2.63], where $X^t$ is the transpose of X (a row matrix holding the coordinates of X in the canonical basis of $\Re^{13}$) and X is the corresponding column matrix.]
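The partition of $M^k$ into local matrices can be checked numerically with a short NumPy sketch. It is an illustration under stated assumptions: the molecule is the hydrogen-suppressed C-C-C-O chain of n-propanol, the two fragments (the carbons and the oxygen) are chosen arbitrarily, the function `local_matrix` is a hypothetical name, and the local index is evaluated through the matrix form $X^t M^k_L X$.

```python
import numpy as np


def local_matrix(mk, fragment):
    """Local matrix M^k_L: full weight if both vertices are in the fragment,
    half weight if exactly one is, zero otherwise."""
    n = mk.shape[0]
    inside = np.zeros(n, dtype=bool)
    inside[list(fragment)] = True
    weight = (inside[:, None].astype(float) + inside[None, :].astype(float)) / 2.0
    return mk * weight


# Hydrogen-suppressed n-propanol chain C0-C1-C2-O3 (no loops, single bonds).
m = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    m[i, j] = m[j, i] = 1
x = np.array([2.63, 2.63, 2.63, 3.17])    # Mulliken electronegativities

fragments = [[0, 1, 2], [3]]              # assumed partition: carbons / oxygen
for k in range(4):
    mk = np.linalg.matrix_power(m, k)
    local = [local_matrix(mk, f) for f in fragments]
    total_index = float(x @ mk @ x)
    local_indices = [float(x @ ml @ x) for ml in local]
    print(k, total_index, local_indices)
```

Because each off-fragment entry is shared half and half between the two fragments it connects, the local matrices add up to $M^k$ and the local indices add up to the total index for any partition of the atoms.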

The TOMO-COMD Software
The calculation of total and local quadratic indices for any organic molecule was implemented in the TOMO-COMD software [47]. This software has a graphical interface that makes it user-friendly for medicinal chemists. The main steps for the application of this method to QSAR/QSPR can be briefly summarized as follows:
1. Draw the molecular pseudograph for each molecule of the data set, using the software drawing mode. This is done by selecting the active atom symbol from the different groups of the periodic table. Multiple edges and loops are edited with a right mouse click.
2. Use appropriate atom weights in order to differentiate the atoms of the molecule. In this work, the Mulliken electronegativity [48] of each kind of atom was used as the atomic property.
3. Compute the total and local quadratic indices of the molecular pseudograph's atom adjacency matrix. This is done in the software calculation mode, in which the atomic property and the descriptor family can be selected before the molecular indices are calculated. The software generates a table in which the rows correspond to the compounds and the columns correspond to the total and local quadratic indices or to any other family of molecular descriptors implemented in the program.
4. Find a QSPR/QSAR equation by using statistical techniques such as multilinear regression analysis (MRA), neural networks, linear discriminant analysis, and so on. That is to say, a quantitative relation between a property P and the quadratic indices can be found having, for instance, the following appearance:

$$P = a_0 q_0(x) + a_1 q_1(x) + a_2 q_2(x) + \dots + a_k q_k(x) + c$$

where P is the measured property, $q_k(x)$ [or $q_{kL}(x)$] is the k-th total [or local] quadratic index, the $a_k$'s are the coefficients obtained by the linear regression analysis and c is the intercept.
The descriptors found in the models obtained were the following:
(1) $q_k(x)$ and $q_k^H(x)$ are the k-th total quadratic indices calculated using the k-th power of the matrices [$M^k(G)$] of the molecular pseudograph (G), not considering and considering hydrogen atoms, respectively.
(2) ${}^{E}q_{kL}(x)$ [or ${}^{E}q_{kL}^{H}(x)$] and ${}^{H}q_{kL}(x)$ are the k-th local quadratic indices calculated using the k-th power of the local matrices [$M^k_L(G, F_i)$] of the molecular pseudograph (G), not considering (or considering) hydrogen atoms, for heteroatoms (S, N, O) and hydrogen-bonding heteroatoms (S, N, O), respectively.
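The regression step can be sketched with an ordinary least-squares fit in NumPy. This is only an illustration of the general technique, not the STATISTICA procedure used in this work: the descriptor table and the property values are synthetic, and `fit_mlr` is a hypothetical helper.

```python
import numpy as np


def fit_mlr(descriptors, prop):
    """Least-squares fit of prop = c + sum_k a_k * descriptor_k.

    Returns the coefficient vector [c, a_0, a_1, ...].
    """
    descriptors = np.asarray(descriptors, dtype=float)
    prop = np.asarray(prop, dtype=float)
    design = np.column_stack([np.ones(len(prop)), descriptors])
    coef, *_ = np.linalg.lstsq(design, prop, rcond=None)
    return coef


# Synthetic table: 17 "compounds" (rows) and 2 "quadratic indices" (columns).
rng = np.random.default_rng(0)
table = rng.normal(size=(17, 2))
# Noise-free synthetic property, so the fit should recover the coefficients.
prop = -4.6 - 0.5 * table[:, 0] + 0.3 * table[:, 1]

coef = fit_mlr(table, prop)
print(coef)
```

With real data the fitted coefficients would be accompanied by their standard errors and by the statistics discussed in the following sections.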

Caco-2 Cell Permeation Coefficients
The 17 structurally diverse compounds used in the present study were taken from the literature [14].
The experimental values of Log $P_{\text{Caco-2}}$ (AP→BL) are listed in Table 3. The data set used for the 'in silico' permeability studies included compounds of diverse molecular weight whose net charge at pH 7.4 is variable [14].

Statistical Analysis
The statistical analyses were carried out with the software STATISTICA [50]. Multiple linear regression (MLR) analysis was used to obtain quantitative models relating structure to the Caco-2 cell permeability coefficients. The quality of the models was determined by examining the statistical parameters of the regression and of the leave-one-out cross-validation procedure. In addition, an external prediction set of 20 drugs was used to assess the predictive power of the models [8].

Quantitative Structure Permeability Relationships
In order to test the applicability of quadratic indices to structure-permeability correlations, and with the aim of predicting Caco-2 cell permeability, 17 structurally diverse drugs were selected. Two quantitative models were developed. The values of Log $P_{\text{Caco-2}}$ (AP→BL) were described by multivariate linear regression analysis using a stepwise procedure. The best models obtained, together with their statistical parameters, are given below:

$$\log P_{\text{Caco-2}} = -4.61426\,(\pm 0.486) - 0.00245\,(\pm 0.301 \times 10^{-3}) \cdot {}^{H}q_{5L}(x) + 0.004175\,(\pm 1.618 \times 10^{-3}) \cdot q_0^{H}(x) \qquad (13)$$

where R is the multiple correlation coefficient, $R_{CV}$ is the correlation coefficient of the leave-one-out cross-validation procedure, s is the standard error of the estimate, $RMSE_{CV}$ is the root mean square error of the cross-validation, F is the Fisher ratio at the 95% confidence level and the p-value is the significance level. These regression models are significant at p-value < 0.001 using the F statistic. The p-value is the probability of obtaining a greater F value by chance alone if the model fits no better than the overall response mean.
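As a usage illustration, Eq. 13 can be written as a small Python function. The coefficients are those reported above; the function name and the descriptor values passed to it are hypothetical, since the quadratic indices of a real compound would have to be computed first (for example with TOMO-COMD).

```python
def log_p_caco2_eq13(hq5l, q0h):
    """Evaluate Eq. 13 for given descriptor values.

    hq5l: local quadratic index of order 5 over H-bonding heteroatoms.
    q0h:  total quadratic index of order 0 considering hydrogen atoms.
    """
    return -4.61426 - 0.00245 * hq5l + 0.004175 * q0h


# Made-up descriptor values, used only to show the call.
example_value = log_p_caco2_eq13(100.0, 50.0)
print(example_value)
```

The negative sign of the first coefficient means that, at fixed $q_0^H(x)$, a larger H-bonding index lowers the predicted permeability.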
Table 3 lists the experimental and calculated permeability coefficients for the data set (both models), and Figures 1 and 2 illustrate the linear relationships between them. In the development of the first quantitative model for the description of Log $P_{\text{Caco-2}}$ (AP→BL) (Eq. 13), acetylsalicylic acid was detected as a statistical outlier. Outlier detection was carried out using the following standard statistical tests: residuals, standardized residuals, studentized residuals and Cook's distance [51]. Once the statistical outlier was rejected, Eq. 14 was obtained with better statistical parameters. The outlier of the linear model (Eq. 13; acetylsalicylic acid) has also been detected as an outlier in other work. In this sense, Waterbeemd et al. [14] developed relationships between permeability (in Caco-2) and several physicochemical properties such as lipophilicity, H-bonding capability, etc.
Among the compounds used in that study, acetylsalicylic acid was detected as an outlier that diverges from the curve obtained for the two above-mentioned properties. The quadratic indices included in Eq. 13 carry structural information on molecular features related to lipophilicity and hydrogen-bonding properties. This can explain the outlier behavior of acetylsalicylic acid. Besides, outliers from linear relationships have been explained in terms of active transport, molecular size, diffusion limitation through aqueous stagnant layers at the membrane, or solubility of the drug, which produces a sigmoidal relationship [14].
The squared correlation coefficients ($R^2$) of Eqs. 13 and 14 were 0.86 and 0.92, respectively, so these models explained 86% and 92% of the variance of the experimental values of Log Caco-2 permeability [52,53].
Validation is a crucial aspect of any QSAR/QSPR modeling [54]. One of the most popular validation criteria is the leave-one-out cross-validated $R^2$ (LOO $q^2$; internal validation). For this reason, in order to assess the predictability of the models found, a LOO cross-validation was carried out. This methodology systematically removes one data point at a time from the data set. A QSPerR model is then constructed on the basis of this reduced data set and subsequently used to predict the removed data point. This procedure is repeated until a complete set of predicted values is obtained. Using this approach, models 13 and 14 had a LOO $q^2$ of 0.73 and 0.88, respectively. These values of $q^2$ ($q^2$ > 0.5) can be considered as a proof of the high predictive ability of the models. However, this assumption is generally incorrect: the lack of correlation between a high LOO $q^2$ and a high predictive ability of QSAR/QSPR models has been established and corroborated recently [54]. Thus, a high value of LOO $q^2$ appears to be a necessary but not a sufficient condition for a model to have high predictive power. In this sense, Golbraikh and Tropsha [54] emphasize that the predictive ability of a QSAR/QSPR model can only be estimated using an external test set (external validation) of compounds that were not used for building the model, and they formulated a set of criteria for the evaluation of the predictive ability of QSAR/QSPR models. For this reason, and as a second corroboration of the predictive power of the models, an external prediction set of 20 drugs was used. These compounds were also experimentally studied by Camenisch et al. [8]. Their Caco-2 experiment was designed on the basis of the work of Artursson's group [7], with the objective of combining their data with previously measured compounds in order to obtain a large data set. The comparison of permeability in Caco-2 monolayers between these two laboratories under the same experimental conditions (for mannitol only) showed inter-laboratory variation. Because the compounds used by Waterbeemd et al. in the original paper were taken from the same literature source [7], we selected the Caco-2 cell permeability coefficient data set developed by Camenisch et al. [8] as one way to validate the predictive power of our models. The permeability coefficients of the drugs included in the external test set were predicted with the same accuracy as those of the compounds in the data set, if it is taken into consideration that these compounds were studied in a different laboratory. Considering the full set (data and test sets), the correlation coefficients were 0.80 (s = 0.64) and 0.82 (s = 0.56) for Eqs. 13 and 14, respectively. As can be seen, the predictability and robustness of the theoretical model were demonstrated in both series. From the full data set only 3 compounds were outliers for both equations.
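The leave-one-out procedure described above can be sketched in a few lines of NumPy. The formula $q^2 = 1 - \text{PRESS}/SS_{tot}$ is one common definition and is an assumption here, since the text does not state which expression was used; the data are synthetic and `loo_q2` is a hypothetical helper.

```python
import numpy as np


def loo_q2(descriptors, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_tot for a linear model."""
    descriptors = np.asarray(descriptors, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), descriptors])
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                      # drop compound i
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        press += (y[i] - design[i] @ coef) ** 2       # error on the left-out point
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - press / ss_tot)


# Synthetic example with 17 "compounds" and 2 descriptors.
rng = np.random.default_rng(1)
d = rng.normal(size=(17, 2))
y_exact = 1.0 + 2.0 * d[:, 0] - d[:, 1]
y_noisy = y_exact + rng.normal(scale=0.5, size=17)

q2_exact = loo_q2(d, y_exact)
q2_noisy = loo_q2(d, y_noisy)
print(q2_exact, q2_noisy)
```

An external test set, as used in this work, would be evaluated separately by fitting on the training compounds only and predicting compounds that never entered the fit.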
Waterbeemd et al. obtained, for this data set, a correlation coefficient ($R^2$) of 0.77 (s = 0.52) when two principal components were used as variables in linear regression models [14]. In that study, a principal component analysis was performed to visualize the relationships between the 26 descriptors.
The three principal components used in the regression analysis explained 86.9% of the variance. The first component (43.4%) contains information about the H-bonding potential and the second one (34.2%) encodes molecular size. This correlation coincides with the current structure-permeability paradigm expressed in Eq. 1. Besides, using a representative set of molecular weight and various H-bonding descriptors, these authors obtained 12 models by applying MLR and one equation by using a PLS analysis. The QSPR models developed for the 17 compounds had correlation coefficients lower than 0.89.
In our approach, if the statistical parameters are considered, the obtained models appear to be better than those previously reported [14].
Other researchers have explored QSPerR involving Caco-2 cell permeability. For example, correlation coefficients of 0.74 and 0.76 have been obtained by considering quadratic and interactive terms [27]. In the previous paper, these authors obtained a regression coefficient of 0.79 using a neural network. In another published work, Ren and Lien [15] developed a QSAR analysis for a data set of 51 compounds, in which an adequate regression coefficient (0.79) was obtained. Finally, a recent study on the prediction of Caco-2 cell permeation coefficients was carried out by Kulkarni et al. [28], in which 6 predictive models were obtained using Multidimensional Linear Regression (MLR) with R values between 0.86 and 0.92, although in this case only 74% of the original data set [6] was selected.
In order to understand the individual contribution of several properties, and thus their effect on permeation, the compounds were considered separately according to their net charge. As can be seen in Eq. 1, the charge of a molecule has a special effect on drug permeability, which is related to the negative charge of the biological membrane [55]. In the models obtained (Eqs. 13 and 14), although there is no specific variable for heteroatoms, the charge effect on the lipophilicity of the compounds is already taken into account by the descriptors included (${}^{H}q_{5L}(x)$ and $q_0^{H}(x)$).
However, when the whole data set was correlated with the quadratic index calculated over the heteroatoms (O, N, S), ${}^{E}q_{0L}(x)$, the correlation coefficient was 0.546 (data not shown), which indicates the ability of this index to describe the effect of charge on permeability.
These models explained more than 82, 90 and 98% ($R^2$ = 0.824, 0.904 and 0.986) of the variance of the experimental permeability coefficients for anionic, neutral and cationic compounds, respectively.

Interpretation of QSPerR Models
For a better statistical interpretation of models built from inter-related indices (such as topological indices, or topological and topographic indices based on the same graph-theoretical invariant), the inclusion of strongly interrelated variables in the model should be avoided. This criterion must be considered because the interrelation among different descriptors produces highly unstable regression coefficients and makes it difficult to know the real contribution of each variable included in the model [52]. For this reason, an interrelation study of the quadratic indices used in Eq. 13 was carried out. In Table 5, the correlation matrix for this equation shows that there is no collinearity among these variables. The same table shows other parameters useful for detecting multicollinear variables (partial correlation and tolerance). In this sense, the tolerance represents the variability of a variable that is not explained by the other variables, and the partial correlation coefficient expresses the correlation between the property and a specific variable once the linear effects of the other independent variables have been eliminated. For Eqs. 14 to 17 the tolerance value is 1 and the partial correlation coincides with the correlation coefficient.
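The tolerance of a descriptor can be illustrated with a minimal NumPy sketch. It assumes the usual definition, tolerance = 1 − $R_j^2$, where $R_j^2$ comes from regressing descriptor j on the remaining descriptors; the helper name and the data are hypothetical.

```python
import numpy as np


def tolerance(descriptors, j):
    """Fraction of the variance of descriptor j not explained by the others."""
    descriptors = np.asarray(descriptors, dtype=float)
    target = descriptors[:, j]
    others = np.delete(descriptors, j, axis=1)
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    return float(residual @ residual / np.sum((target - target.mean()) ** 2))


rng = np.random.default_rng(2)
a = rng.normal(size=17)

single = a.reshape(-1, 1)                       # one-descriptor model
collinear = np.column_stack([a, 2.0 * a + 1.0]) # second column is a linear copy

tol_single = tolerance(single, 0)
tol_collinear = tolerance(collinear, 1)
print(tol_single, tol_collinear)
```

A model with a single descriptor has no other variables to explain it, which gives a tolerance of 1, in line with the value reported for Eqs. 14 to 17; a value near 0 flags a descriptor that is almost a linear combination of the others.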
At present, it is known the absorption is influenced by a different kind of interactions.In the equation 1 the permeability is represented like function of several interaction properties.However, Waterbeemd et al. expressed that charge is included in lipophilicity when distribution coefficient (Log D) instead of partition coefficient (Log P) is used; also the molecular size and H-bonding are components of lipophilicity.Thus, these authors also wrote this relationship as: [14] Permeability = f (molecular size, H-bonding capacity) As can be observed in the regression models, the included variables are related with the factors that influence on the permeability values and these one with the structural features of molecules.For example, in the equations 13, the variables H q 5L (x) and q 0 H (x) are in relation with the hydrogen atoms as donors and with the molecular weight (size of molecules), respectively.Both variables are identified with the paradigm of structure-permeability relationship (Eq.1 and 18).Besides, this result coincides with the information contained in the two principal components using by Waterbeemd et al. 
(Eq.4,ref.14).The coefficient of the "protonic" variable in the equation 13 is negative, which is a logical result due to when the number of the hydrogen atom bonding to heteroatoms in the molecules is increased then the permeability across the biological membrane decrease.This effect is also related with the decreasing of molecules lipophilicity and the possibility of ionization and to obtain a charge.On the other hand, the q 0 H (x), with a positive contribution, is related with the possible effect of this variable on lipophilicity of compounds.That is to say, transcellular lipid permeation depends both on molecular size via lipophilicity and the diffusion coefficient through the membrane, while paracellular pore permeation depends on molecular size via the sieving effect and on diffusion in water.In the equation 14, only was included as variable the H-bonding capacity and in this case the model shown a better description that obtained in the Eq. 13.This result coincides with described by Waterbeemd et al. [16], due to the variable H q 5L (x) take into consideration not only the hydrogen atoms bond to heteroatoms but also the molecular environment, being this variable the better choice to describe the physicochemical space defined by the combination of molecular weight and H-bonding capacity.
The results obtained with Eq. 15, 16 and 17 show the role of the total H-bonding capacity (as acceptor in Eq. 15 and as donor of hydrogen atoms in Eq. 16 and 17). The negative contribution of the included variables may result in lower membrane permeability. Oral absorption will nevertheless be limited by a high H-bonding capacity.

Virtual Screening and relationship of human intestinal absorption and Caco-2 cell permeability
Virtual screening has emerged as an interesting alternative for screening large databases in order to find sets of potential new drug candidates [56-58]. In the present study we simulated a virtual search of P Caco-2 values using the regression equations obtained (Eq. 13 and 14). Table 6 summarizes the Caco-2 cell permeability data for 72 structurally diverse compounds, obtained from different sources [2, 7, 24-29], together with the evaluation results for these compounds.
As can be seen in Table 6, there is significant variability in the experimental P Caco-2 values obtained from two or more sources. The differences in the permeability coefficients reported by various laboratories might be due to variations in cell culture conditions, such as passage number, type of medium and day in culture, as well as in the experimental conditions used for the measurement. Taking this inter-laboratory variability into consideration, most of the 72 compounds evaluated are predicted adequately by the models obtained. The quality of these predictions corroborates the predictive power of the models found and justifies their use in the prediction of this important property. Moreover, the 'in silico' estimated intestinal permeability could be used as a predictor of the true fraction of the drug absorbed (Fa) using the theoretical relationship described by Amidon et al. [59]. In this sense, analysis of the literature shows that selecting the range of permeability coefficients in Caco-2 cells is a bottleneck when a correlation with human absorption is sought.
Several classification methods that take the inter-laboratory and experimental variability into account have been described in the literature [6, 60-62]. Nevertheless, if all the approaches reported in the literature are analyzed, we can state that a permeability coefficient greater than $10 \times 10^{-6}$ cm/s identifies well-absorbed compounds (70-100%). A second group (P Caco-2 < $10 \times 10^{-6}$ cm/s), with moderate to poor absorption, can also be considered, although in this range a high variability is observed when human oral absorption is predicted from the P Caco-2 values. From a practical perspective, the established boundary ensures that the compounds classified above it have a good absorption profile.
In general, however, when the absorbed dose fraction from human studies is compared with the predicted AP→BL Caco-2 cell permeability, a good relationship between the theoretical and observed values is obtained. Table 6 shows this relationship for the compounds used in the study.
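The two-class boundary described above can be written as a small helper. This is only a minimal sketch: the function name `absorption_class`, the constant name, and the handling of a value exactly equal to the boundary (assigned here to the lower class) are assumptions, since the text only specifies values greater than or less than 10 x 10^-6 cm/s. The example values are hypothetical and are not taken from Table 6.

```python
# Boundary taken from the text: 10 x 10^-6 cm/s.
WELL_ABSORBED_THRESHOLD_CM_S = 10e-6


def absorption_class(p_caco2_cm_s: float) -> str:
    """Assign a P Caco-2 value (in cm/s) to one of the two absorption classes."""
    if p_caco2_cm_s < 0:
        raise ValueError("permeability coefficient must be non-negative")
    if p_caco2_cm_s > WELL_ABSORBED_THRESHOLD_CM_S:
        return "well absorbed (70-100%)"
    # Assumption: a value exactly at the boundary falls in the lower class.
    return "moderate-poor absorption"


# Hypothetical permeability values, for illustration only.
for value in (25e-6, 3e-6):
    print(f"{value:.1e} cm/s -> {absorption_class(value)}")
```

Because the lower class shows high variability in human absorption, a helper of this kind is better read as a filter for likely well-absorbed compounds than as a quantitative predictor.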

Conclusions
The total and local quadratic indices appear to be very promising structural invariants. Using these molecular indices and multiple regression, two QSPerR models were obtained for the description and determination of AP→BL transport across monolayers of intestinal epithelial (Caco-2) cells. The comparison with other theoretical studies showed a quite satisfactory behavior of the proposed method. The statistical quality of the models was demonstrated by the statistical parameters of the regression and by those obtained in the cross-validation procedure. In addition, a test set of 20 drugs was used to assess the predictive power of these models. Furthermore, this approach permits a meaningful interpretation of the experimental results in terms of the structural features of the molecules. These molecular descriptors are suitable for screening and a priori determination, during early drug discovery, of the cellular permeability coefficient for large sets of new chemical entities synthesized via combinatorial chemistry.

Table 2. Definition and calculation of six (k = 0-5) total quadratic indices of the "molecular pseudograph's atom adjacency matrix" of the acetylsalicylic acid molecule.
Any local quadratic index has a particular meaning, especially for the first values of k, which contain the information about the structure of the fragment Fi. Higher values of k are related to information about the environment of the fragment Fi within the molecular pseudograph (G).
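To make the roles of the order k and of the fragment concrete, the following sketch computes total and local quadratic indices for a toy pseudograph. It assumes that the total index is the quadratic form q_k(x) = x^T M^k x and that the local index uses a fragment matrix in which an entry of M^k is kept whole when both atoms belong to the fragment, halved when only one does, and set to zero otherwise. The three-atom matrix `m_toy`, the weights `x_toy` and all function names are hypothetical; they are not taken from Table 2 or from TOMO-COMD.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, k):
    """Return M**k for a square matrix; M**0 is the identity."""
    n = len(m)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, m)
    return result


def total_quadratic_index(m, x, k):
    """q_k(x) = sum_i sum_j (M^k)_ij * x_i * x_j."""
    mk = mat_power(m, k)
    n = len(x)
    return sum(mk[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def local_quadratic_index(m, x, k, fragment):
    """Same quadratic form, restricted to a fragment (a set of atom indices)."""
    mk = mat_power(m, k)
    n = len(x)
    value = 0.0
    for i in range(n):
        for j in range(n):
            # Weight 1 if both atoms are in the fragment, 1/2 if one, 0 if none.
            inside = (i in fragment) + (j in fragment)
            value += (inside / 2) * mk[i][j] * x[i] * x[j]
    return value


# Toy pseudograph: atom 0 - atom 1 single bond, atom 1 = atom 2 double bond.
m_toy = [[0, 1, 0],
         [1, 0, 2],
         [0, 2, 0]]
x_toy = [1.0, 2.0, 3.0]  # hypothetical atomic weights

for k in range(3):
    total = total_quadratic_index(m_toy, x_toy, k)
    per_atom = [local_quadratic_index(m_toy, x_toy, k, {a}) for a in range(3)]
    print(k, total, per_atom)
```

Under the weighting assumed here, the single-atom local indices add up to the total index for every k, which is a convenient consistency check on any implementation.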

5. Test the robustness and predictive power of the QSPR/QSAR equation by using internal and external cross-validation techniques.
6. Develop a structural interpretation of the obtained QSAR/QSPR model using quadratic indices as molecular descriptors.
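Step 5 can be illustrated with a leave-one-out (LOO) loop. This is a minimal sketch and not the procedure actually used for Eq. 13-17: it assumes a single-descriptor least-squares model and reports the cross-validated q2 = 1 - PRESS/SS, where PRESS is the sum of squared LOO prediction errors and SS is the total sum of squares. The function names are hypothetical, and the data points are synthetic placeholders, not values from Table 3.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_q2(xs, ys):
    """Cross-validated q2: refit without each compound, then predict it."""
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        press += (ys[i] - (a + b * xs[i])) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Synthetic descriptor values and log-permeabilities, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
log_p = [-4.1, -4.6, -4.9, -5.6, -5.8, -6.5]
print(f"LOO q2 = {loo_q2(descriptor, log_p):.3f}")
```

External validation follows the same idea, except that the model is fitted once on the training set and the errors are computed on compounds that were never used in the fit.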

Figure 1. Correlation between experimental and calculated (by Eq. 13) permeability coefficients Log P Caco-2 of 17 compounds of the data set.

Figure 2. Correlation between experimental and calculated (by Eq. 14) permeability coefficients Log P Caco-2 of 16 compounds of the data set.

Table 3. Experimental and calculated values for Caco-2 cell permeability coefficients by equations 13 and 14.
Residual, defined as Log P Caco-2 (obsd) - Log P Caco-2 (calc).
d Residual of the cross-validation.
e Calculated with equation 14.
f Total quadratic index of zero order, calculated considering hydrogen atoms in the pseudograph.
g Local quadratic index of fifth order, calculated considering hydrogen atoms in the pseudograph.

Table 4. Experimental and calculated values for the Log Caco-2 cell permeability coefficients of anionic, neutral and cationic compounds by equations 15, 16 and 17, respectively.

Table 5. The squared correlation matrix and several parameters of the quadratic indices used in the regression analysis (Eq. 13) for 17 compounds.

Table 6. Caco-2 cell permeability coefficients calculated using Eq. 13 or 14; percent absorption in humans and observed Caco-2 cell permeability coefficients from the different reports.