Molecular van der Waals Space and Topological Indices from the Distance Matrix

A comparative study of 36 molecular descriptors derived from the topological distance matrix and van der Waals space is carried out in this paper. They are partitioned into 16 generalized topological distance matrix indices, 11 topological distance indices known in the literature (seven obtained from eigenvalues/eigenvectors of the distance matrix), and 9 van der Waals molecular descriptors. The generalized topological distance indices, (k)delta(lambda) (lambda = 0 - 3, k = 1 - 4), are introduced in this work on the basis of the reciprocal distance matrix. Intercorrelation analysis reveals that topological distance indices mostly contain the same type of information, while van der Waals indices can be related to either the shape or the size of molecules. Furthermore, we found that topological distance indices are good for describing molecular size, and they may be viewed as bulk parameters. The most accurate QSPR models for predicting the boiling points of alkanes are based on some of the generalized and eigenvalue/eigenvector-based topological distance indices and on the van der Waals descriptors of molecular size.


Introduction
The most important problem in QSPR and QSAR analysis is to convert chemical structure into mathematical molecular descriptors that are relevant to physical, chemical or biological properties [1]. Molecular structure is one of the basic concepts of chemistry, since the properties and the chemical and biological behavior of molecules are determined by it. One can distinguish three levels for quantifying molecular structure: topological (based on atomic connectivity) [2], metric (bond lengths, valence and torsion angles) [3] and electronic (quantum-mechanical evaluation of the detailed dynamics of electrons and nuclei) [4]. Within many congeneric series of chemical compounds the variations of molecular geometry (as measured by van der Waals descriptors) and of electronic structure are small [5,6]. Consequently, one can consider that many molecular properties are conditioned only by the topology of molecules, and quantify the structural information contained in their molecular graphs by means of so-called topological indices (TIs). These are numerical quantities based on various invariants or characteristics of the molecular graph. Among them, more detailed topological information is provided by the topological distance matrix D, whose entries $d_{ij}$ represent topological distances between vertices i and j, that is, the number of edges (bonds) along the shortest path between these vertices (atoms). Therefore, many TIs used in QSPR and QSAR studies have been developed on the basis of D.
From their definitions, one may assume that many TIs derived from D encode two structural steric factors, namely the size and shape of the molecule [7]. Although TIs do not have a precise physical meaning, they are measures of topological shape, i.e. the degree of branching or cyclicity, and they correlate well with molecular volume or surface [1]. However, extensive studies on this topic do not yet exist.
On the other hand, the idea that the molecular van der Waals (vdW) space is responsible for molecular properties affords an adequate reason for introducing vdW molecular descriptors (vdWMDs) with a clear physical meaning [3,5]. They have frequently been used as molecular descriptors by themselves [3,6,8] or as a starting point for deriving other parameters, e.g. lipophilicity/hydrophilicity [9], surface tension parameters [10], Weighted Holistic Invariant Molecular (WHIM) descriptors [11] and so on.
In this paper we present our efforts to develop some topological distance indices (TDIs) [5,12-17] and vdWMDs [3,5,6,18,19] and investigate whether there exists a linear relationship between these two groups of structural parameters, situated at the first and second level of molecular structural information, respectively. One type of TDIs, the generalized (global) topological distance indices (GTDIs), denoted by k δ λ, λ = 0, 1, 2, 3 and k = 1, 2, 3, 4, …, is generalized here on the basis of reciprocal distances from a molecular graph Γ (reciprocal distance matrix [20]). The other type was developed with the aid of real-number local vertex invariants (LOVIs) based on the graph eigenvalues [12-14]. Eigenvectors corresponding to the largest negative eigenvalue of the distance matrix, D, can serve as LOVIs. Various TDIs have been obtained from these LOVIs by various operations (simple summation, or application of Randić-type formulas) [12]. All TDIs presented here were tested in correlations against the boiling points of alkanes, with satisfactory results for some of them, also reported in this work. It must be mentioned that Trinajstić et al. [21] compared five TDIs and five topographical (3D) distance indices in order to answer the questions of to what extent the distance indices are intercorrelated and how they perform in a given QSAR for the boiling points of the first 150 alkanes with 2-10 carbon atoms.
Among the calculated vdWMDs [5], we selected here as molecular structural descriptors only the vdW volume $V_W$ and surface $S_W$, the ratio $V_W/S_W$, the vdW volume of the molecule considered as an ellipsoid, $V_{WE}$, the semi-axes (a, b, c) of the ellipsoid which embeds a given molecule (viewed as a collection of atomic spheres distributed in 3D space, each atomic sphere having a radius equal to its vdW radius) and two globularity measures [3,5,6,12].
The results obtained by correlation analysis of all the above-described molecular structural descriptors and a QSPR study of the boiling temperatures of the first alkanes with 2-9 carbon atoms are also reported. They permit some insights into the physical meaning of the investigated TDIs.

Description of Selected Topological Distance Indices
The distance matrix D(Γ) = {$d_{ij}$} of a graph Γ is an important graph invariant. Its entries $d_{ij}$, called distances, are equal to the number of edges connecting the vertices i and j on the shortest path between them. Thus all $d_{ij}$ are integers, with $d_{ij}$ = 1 for nearest neighbors and, by definition, $d_{ii}$ = 0. Therefore, the distance matrix D = D(Γ) of a labeled connected graph Γ is a real symmetric N×N matrix whose elements $d_{ij}$ are defined as [21,22]:

$$d_{ij} = \begin{cases} l_{ij}, & i \neq j \\ 0, & i = j \end{cases} \qquad (1)$$

where $l_{ij}$ is the topological length of the shortest path, i.e. the minimum number of edges between the vertices i and j in Γ. The length of the shortest path $l_{ij}$ is also called [22] the distance between the vertices i and j in Γ, hence the name "distance matrix" for D.
Many TDIs have been developed on the basis of D. We selected some of these for the present study, in which we analyze the relationship between TDIs and molecular vdW space. Among the TDIs that can be derived from D, the most widely investigated and applied is the Wiener number [23]. Besides the Wiener number [24,25] we will briefly present the following TDIs used in our analysis: the polarity number [24-26], the Platt index [26], the Balaban J index [27,28], and TDIs based on graph eigenvalues and eigenvectors [12-14]. We also generalize here the TDIs derived [5,15-17] from the reciprocal distance matrix [20,21], denoted by k δ λ.
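The definition of D translates directly into a breadth-first search from every vertex. The following is a minimal Python sketch, not the authors' implementation; the function name `distance_matrix`, the zero-based vertex numbering and the 2-methylbutane example graph are illustrative assumptions.

```python
from collections import deque


def distance_matrix(n_vertices, edges):
    """Topological distance matrix of a connected graph.

    Each entry is the number of edges on the shortest path, found by a
    breadth-first search started from every vertex in turn.
    """
    adjacency = [[] for _ in range(n_vertices)]
    for i, j in edges:
        adjacency[i].append(j)
        adjacency[j].append(i)

    dist = [[0] * n_vertices for _ in range(n_vertices)]
    for source in range(n_vertices):
        seen = {source}
        queue = deque([(source, 0)])
        while queue:
            vertex, d = queue.popleft()
            dist[source][vertex] = d
            for neighbour in adjacency[vertex]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, d + 1))
    return dist


# Hydrogen-suppressed graph of 2-methylbutane (illustrative numbering):
# main chain 0-1-2-3, methyl branch 4 attached to vertex 1.
ISOPENTANE_EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]
D = distance_matrix(5, ISOPENTANE_EDGES)
```

Breadth-first search is sufficient here because every edge has unit length; a weighted graph would need a different shortest-path algorithm.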

(a) Wiener index
The Wiener index, W [24,25], was defined as the sum of the numbers of bonds separating all pairs of atoms in an acyclic molecule. It is easily shown that this index is equal to the half-sum of the off-diagonal elements of D [29]:

$$W = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} d_{ij} \qquad (2)$$

where N is the total number of vertices (atoms) in Γ.

(b) Polarity number
Wiener has also introduced the so-called polarity number, P. P is the number of pairs of vertices separated by three edges, that is, half of the number of distances of length three:

$$P = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} [d_{ij} = 3] \qquad (3)$$

where the bracket equals 1 if $d_{ij}$ = 3 and 0 otherwise. In relation (3) N represents the total number of vertices in Γ.
The ½ factor before the sums in (3) compensates for the fact that each path of three edges between the vertices i and j in Γ is counted twice (once in each direction). W and P have been applied to correlations with boiling points, heats of formation and vaporization and other physical properties of alkanes [24-26].
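As a worked illustration of the two Wiener descriptors, the sketch below evaluates W and P from a distance matrix. It is only an example: the function names are ours, and the matrix is the distance matrix of 2-methylbutane written out by hand (chain 0-1-2-3 with a methyl group 4 on vertex 1).

```python
def wiener_index(dist):
    """Half-sum of all entries of the distance matrix (the diagonal is zero)."""
    n = len(dist)
    return sum(dist[i][j] for i in range(n) for j in range(n)) // 2


def polarity_number(dist):
    """Number of unordered vertex pairs separated by exactly three edges."""
    n = len(dist)
    return sum(1 for i in range(n) for j in range(i + 1, n) if dist[i][j] == 3)


# Distance matrix of 2-methylbutane (hydrogen-suppressed graph).
D_ISOPENTANE = [
    [0, 1, 2, 3, 2],
    [1, 0, 1, 2, 1],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 3],
    [2, 1, 2, 3, 0],
]

W = wiener_index(D_ISOPENTANE)     # 18
P = polarity_number(D_ISOPENTANE)  # 2
```

Counting only pairs with i < j in `polarity_number` is equivalent to summing over all ordered pairs and halving the result.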

(c) Platt index
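The Platt (nearest-neighbor edges) index F is calculated by summing, for each edge, the number of its adjacent edges [26]:

$$F = \sum_{e=1}^{m} \varepsilon_e \qquad (4)$$

where m is the number of edges in Γ and $\varepsilon_e$ is the number of edges adjacent to edge e.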
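A short sketch of the Platt index follows. It relies on the fact that an edge joining vertices u and v has deg(u) + deg(v) - 2 adjacent edges; the function name and the example graphs are illustrative assumptions rather than part of the original work.

```python
def platt_index(n_vertices, edges):
    """Sum over all edges of the number of edges adjacent to each edge."""
    degree = [0] * n_vertices
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    # An edge (i, j) shares a vertex with deg(i) - 1 edges at i
    # and deg(j) - 1 edges at j.
    return sum(degree[i] + degree[j] - 2 for i, j in edges)


# 2-methylbutane: chain 0-1-2-3 with a methyl group 4 on vertex 1.
F_ISOPENTANE = platt_index(5, [(0, 1), (1, 2), (2, 3), (1, 4)])  # 8
# n-pentane: unbranched chain 0-1-2-3-4.
F_PENTANE = platt_index(5, [(0, 1), (1, 2), (2, 3), (3, 4)])     # 6
```

The branched isomer has the larger value, which reflects the sensitivity of F to branching.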

(d) Balaban index
Balaban [27,28] has proposed a topological index which can be described as the average distance sum connectivity. The Balaban topological index J of a molecular graph Γ is defined as [27]:

$$J = \frac{m}{\mu + 1}\sum_{(i,j)} (d_i d_j)^{-1/2} \qquad (5)$$

where m is the number of edges in Γ, µ is the cyclomatic number, $d_i$ and $d_j$ are the distance sums of vertices i and j, and the sum runs over all pairs of adjacent vertices i and j.
The distance sum $d_k$ for a vertex k in Γ represents the sum of all entries of the k-th row or column of the distance matrix, D [27]:

$$d_k = \sum_{j=1}^{N} d_{kj} \qquad (6)$$

The cyclomatic number µ = µ(Γ), i.e. the number of cycles in Γ, is given by [28]

$$\mu = m - N + 1 \qquad (7)$$

where N is the number of vertices in Γ. Relation (7) is the known Euler equation connecting the number of vertices (N), edges (m) and cycles (µ) in a planar graph. Average distance sums were used in relation (5) instead of distance sums because distance sums increase approximately in parallel with m for the same type of branching. The cyclomatic number µ, defined in (7), was introduced in the definition of J because the presence of cycles markedly reduces the distance sums [7].
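The Balaban index can be evaluated from the distance matrix and the edge list in a few lines. The sketch below assumes the usual form of the index, J = m/(µ + 1) · Σ (d_i d_j)^(-1/2) over adjacent vertices; the function name and the hand-written 2-methylbutane data are illustrative.

```python
import math


def balaban_j(dist, edges):
    """Balaban J from a distance matrix and the list of edges (adjacent pairs)."""
    n = len(dist)
    m = len(edges)
    mu = m - n + 1                              # cyclomatic number
    distance_sums = [sum(row) for row in dist]  # row sums of D
    edge_sum = sum(
        1.0 / math.sqrt(distance_sums[i] * distance_sums[j]) for i, j in edges
    )
    return m / (mu + 1) * edge_sum


# 2-methylbutane: chain 0-1-2-3 with a methyl group 4 on vertex 1.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]
DIST = [
    [0, 1, 2, 3, 2],
    [1, 0, 1, 2, 1],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 3],
    [2, 1, 2, 3, 0],
]
J = balaban_j(DIST, EDGES)  # approximately 2.5395
```

For acyclic alkanes µ = 0, so the prefactor reduces to the number of edges m.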

(e) Graph eigenvalues or eigenvector -based indices
Lowest and highest eigenvalues and the corresponding eigenvectors of matrices A and D have also been used as topological indices and local vertex invariants (LOVIs) [12-14,30]. We present here only TDIs derived by us [12-14] from D of all alkanes with 2-9 carbon atoms. From the largest negative eigenvalues of D, denoted by E(D), and the corresponding eigenvectors we introduced the following TDIs [13]: (10) where $e_i$ are the elements (LOVIs) of the first eigenvector derived from E(D) and N is the number of carbon atoms.

Two kinds of normalization against the number N of carbon atoms of the alkane were carried out. Each of these led to a type of TDI, denoted below by VxDk, distinguished by the final number k = 2 or k = 3: where x = A, E. Up to eight carbon atoms, no degeneracy was found in the TDI values estimated by relations (8)-(12). However, for nine carbon atoms, just one pair of isomers was found to have degenerate values of the VED-type indices [13].
The VxDk (with x=A and E, and k=1,3) and VRD indices were calculated here for the first alkanes with 2-9 carbon atoms by means of our IRS [31] computer package.The values were compared with those obtained with the aid of DRAGON [32].The W, P, F, J, VADk and VEDk (k=1,3), VRD indices for 72 alkanes with N=1-9 carbon atoms are given in Table 1a.

Reciprocal Distance -Based Indices
Another graph invariant is the reciprocal distance matrix RD = {$rd_{ij}$}, with $rd_{ij} = 1/d_{ij}$ for i ≠ j and $rd_{ii}$ = 0 (i, j = 1, 2, …, N), where N is the total number of graph vertices. This is a symmetric matrix whose elements are the reciprocals of the topological distances [5,16,17,20,33]. The first TDIs proposed on the basis of RD were developed by a two-step process as follows [5,16,17].
(i) The LOVI of each vertex in a molecular graph Γ, denoted later by $\mu_i$, was defined from the RD using the following relation [5,16]: In relation (13) $d_{ij}$ is the topological distance between the vertices i and j, N represents the total number of vertices (i.e. non-hydrogen atoms) in Γ, and the summation is made over all possible paths, from $d_{ij}$ = 1 to $d_{ij}$ = max($d_{ij}$). Thus, each vertex is well characterized; it contains global information on the topological structure of Γ, the topological interaction between vertices i and j decreasing as the distance $d_{ij}$ increases. That is, for each vertex i, the quantity $\mu_i$ may be viewed as a measure of the influence of all other vertices in a given graph Γ on the vertex i.
(ii) The LOVIs µ i were condensed into a TDI, h δ, with the aid of the Randić-type formula [34], the generalized molecular connectivity [35], as follows [5,16]: These topological distances connectivity indices (TDCIs) [5,16], also called topological distance measure connectivity indices (TDMCIs) [17], of order higher than three, have not been used in correlation due to the expected small contributions to the molecular properties.
TDCIs of order one ( 1 δ), two ( 2 δ) and three ( 3 δ) have been calculated by the following relations [5,16]: Monoparametric correlations with molecular properties such as boiling temperatures (at normal pressure), gas chromatographic retention indices, atomization enthalpies, and molar refractions for alkanes were performed.The reported results for 2 δ are very good, the correlation coefficients r being in the range 0.983 -0.991 [5,16].
In this paper we extend TDCIs by generalization of relation (13) as follows: Thus, we obtain a set of generalized topological distances indices (GTDIs), k δ λ , where k is the same as in relation (18), which can be calculated with the following formulas: ,...
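The two-step construction (vertex invariants from reciprocal distances, then a Randić-type condensation) can be illustrated with the sketch below. Only the reciprocal distance matrix itself follows directly from the text. The functional forms of the vertex invariant (a sum of d_ij^(-λ) over all other vertices) and of the first-order index (a sum of (µ_i µ_j)^(-1/2) over adjacent vertices) are assumptions made for illustration and should not be read as relations (13)-(18) of this paper; all function names are hypothetical.

```python
import math


def reciprocal_distance_matrix(dist):
    """RD matrix: 1/d_ij off the diagonal, 0 on the diagonal."""
    n = len(dist)
    return [[0.0 if i == j else 1.0 / dist[i][j] for j in range(n)]
            for i in range(n)]


def vertex_invariants(dist, lam=1):
    """ASSUMED LOVI form: mu_i = sum over j != i of d_ij ** (-lam)."""
    n = len(dist)
    return [sum(dist[i][j] ** (-lam) for j in range(n) if j != i)
            for i in range(n)]


def first_order_index(dist, edges, lam=1):
    """ASSUMED Randic-type sum over edges: sum of (mu_i * mu_j) ** (-1/2)."""
    mu = vertex_invariants(dist, lam)
    return sum(1.0 / math.sqrt(mu[i] * mu[j]) for i, j in edges)


# 2-methylbutane: chain 0-1-2-3 with a methyl group 4 on vertex 1.
EDGES_RD = [(0, 1), (1, 2), (2, 3), (1, 4)]
DIST_RD = [
    [0, 1, 2, 3, 2],
    [1, 0, 1, 2, 1],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 3],
    [2, 1, 2, 3, 0],
]
RD = reciprocal_distance_matrix(DIST_RD)
index_lam1 = first_order_index(DIST_RD, EDGES_RD, lam=1)
```

Under this assumed form, λ = 0 gives every vertex the same invariant (N - 1), so all structural discrimination would come from the larger exponents.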

van der Waals Molecular Descriptors
No general theory of the quantitative relationship between molecular structure and molecular properties in organic chemistry (QSPR) or biological activities in medicinal chemistry (QSAR) can reasonably be regarded as satisfactory unless it provides a sound basis for predicting and interpreting linear relationships among molecular quantities.
A satisfactory theoretical model for linear correlations in organic and/or medicinal chemistry should allow reliable predictions to be made as easily as possible concerning both the circumstances in which correlations should occur (e.g., between which properties and for which compounds) and the magnitudes of the regression coefficients.
The concepts used in the model (for example, analysis into electronic (polar and resonance), hydrophobic, and steric effects) should be defined in such a way that knowledge gained through the interpretation of the linear correlations can be readily used in other areas of organic or medicinal chemistry (e.g., in elucidating reaction mechanisms or receptor-drug molecule interactions).
Therefore, the design of molecular descriptors with a very clear physical meaning is an important task in this area of research. An analysis of the informational content of TDIs and of their possible steric nature [7], as described by vdW molecular descriptors, is also presented in this paper. To do this we used a set of van der Waals descriptors [3,5,6], such as the vdW molecular volume ($V_W$) [19,36-38] and surface ($S_W$) [18], and other descriptors of the shape and size of alkane molecules, e.g. the volume of the ellipsoid which embeds the whole molecule extended along the Ox axis of the Cartesian system of coordinates, $V_{EL}$; the semi-axes of this ellipsoid [3,5,6], $E_X$, $E_Y$, $E_Z$, along the Ox, Oy and Oz axes, respectively; two measures of globularity [12], denoted by $G_{LOB}$ and $G_{LEL}$; and a measure of molecular packing, $R_{WV}$.

(a) Molecular van der Waals Volume
The concepts of molecular volume and surface area have found many applications, not only in QSAR, but also in studies of molecular interactions, especially in relating bulk properties of substances, such as crystal packing, to molecular structure [39]. The molecular volume is a measure of the space around atomic nuclei filled by electrons [40,41] and is defined geometrically as the combined volume of overlapping spheres centred on the nuclei, similar in shape to a space-filling molecular model. The van der Waals radii are used for the radii of the atomic spheres. The molecular surface area is the area of the surface that wraps the molecular volume. Exact calculation of the molecular volume and surface area is, however, a formidable task due to the multiple overlap of spheres of different radii.
A molecular van der Waals envelope, ζ, can be defined in the "hard-spheres" approximation as the external surface resulting from the intersection of all vdW spheres corresponding to the atoms of molecule M. The points (x, y, z) inside the envelope satisfy at least one of the following inequalities:

$$(x - X_i)^2 + (y - Y_i)^2 + (z - Z_i)^2 \le R_i^2, \quad i = 1, 2, \ldots, N \qquad (23)$$

where ($X_i$, $Y_i$, $Z_i$) are the coordinates of the centre of atom i, $R_i$ is its vdW radius and N represents the number of atoms of M. Consequently, the total volume embedded by the envelope is the molecular vdW volume ($V_W^M$) of the molecule M.
Molecules 2004, 9, 1062

Table 1b. Generalized Topological Distance Indices for the First 72 Alkanes

The following integral:

$$V_W^M = \iiint_{M} dx\,dy\,dz \qquad (24)$$

can be intuitively justified as a volume [3,5,19]. This assumption is natural because the properties of molecular vdW space can be considered independent of the nature of the atoms, even in the case when the domains of the vdW atomic spheres intersect.
To estimate the integral (24), the molecule M, as defined by (23), is inserted into a bounding parallelepiped of volume $V_p$. Random points are generated inside the parallelepiped, which includes the domain M. If $n_t$ is the total number of generated points and $n_s$ the number of points which satisfy at least one of the inequalities in (23), then the van der Waals volume is:

$$V_W^M = V_p \frac{n_s}{n_t} \qquad (25)$$

In order to avoid multiple computation of the same volume, resulting from the overlap of several atomic spheres, we applied a Monte Carlo technique [42]. The accuracy ε of the estimate (25) for a given maximum probability, δ, is inversely proportional to the square root of the number of trials, N, or

$$\varepsilon = \frac{1}{2\sqrt{\delta N}} \qquad (26)$$

Taking into consideration the precision and the accuracy of chemical and biological experiments, for ε = 0.05 and δ = 0.01 the number of necessary points is N = 10,000. This makes the Monte Carlo method easy to apply, given the performance of present-day computers. In order to increase the accuracy of the method, the calculation must be repeated at least 10 to 20 times for each calculated volume. The final result, i.e. the mean value of these computed volumes, is validated by statistical methods.
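The hit-or-miss Monte Carlo estimate described above can be sketched as follows. This is a minimal illustration, not the IRS implementation: the function names, the representation of atoms as `(x, y, z, radius)` tuples, the default point count and the seeding scheme are all assumptions.

```python
import random


def mc_vdw_volume(atoms, n_points=10_000, seed=0):
    """Estimate the volume of a union of spheres by random sampling.

    atoms: list of (x, y, z, radius) tuples, one per atomic sphere.
    """
    rng = random.Random(seed)
    # Bounding parallelepiped enclosing every sphere.
    lo = [min(a[k] - a[3] for a in atoms) for k in range(3)]
    hi = [max(a[k] + a[3] for a in atoms) for k in range(3)]
    box_volume = (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])

    hits = 0
    for _ in range(n_points):
        p = [rng.uniform(lo[k], hi[k]) for k in range(3)]
        # A point counts once if it lies inside at least one sphere,
        # so overlapping regions are never counted twice.
        if any(sum((p[k] - a[k]) ** 2 for k in range(3)) <= a[3] ** 2
               for a in atoms):
            hits += 1
    return box_volume * hits / n_points


def mean_vdw_volume(atoms, repeats=10, n_points=10_000):
    """Average several independent estimates, each with its own seed."""
    estimates = [mc_vdw_volume(atoms, n_points, seed=s) for s in range(repeats)]
    return sum(estimates) / repeats


# Illustrative input: two overlapping spheres of radius 1.7, centres 1.5 apart.
TWO_SPHERES = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]
volume = mean_vdw_volume(TWO_SPHERES)
```

Because each random point is tested against all spheres with a single `any(...)`, the overlap of several atoms needs no special treatment, which is the main attraction of the method for molecules.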

(b) Molecular van der Waals Surface
The van der Waals volume of the envelope, ζ, defined in the previous paragraph, can be a measure of the molecule's size. Obviously, this envelope is a surface, and methods have been developed to compute the area of this surface [5,18,42,43]. Some of them are based on a Monte Carlo method [5,18], others on an analytical algorithm [44]. The computed surfaces have been used especially to characterize the shape and the similarity of molecules, for their graphical representation [44,45], and so on. A Monte Carlo algorithm [5,18] implies the generation of a uniform grid on each sphere of the molecule, followed by the counting of the points generated on the surface of the sphere ($n_t$) and of those ($n_e$) that do not satisfy the inequalities in (23) for any other sphere. For every "hard sphere" i, one computes the outer part of the sphere's surface:

$$S_i^W = 4\pi R_i^2 \frac{n_e}{n_t} \qquad (27)$$

where $R_i$ is the vdW radius of atom i. The final surface is computed as the sum of the exterior surfaces of all spheres, $S_i^W$:

$$S_W = \sum_{i=1}^{N} S_i^W \qquad (28)$$

See refs. [5,18] for details about how to generate a uniform grid.
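The surface estimate can be sketched in the same spirit. Note one deliberate substitution: instead of the uniform grid of refs. [5,18], the sketch places random points uniformly on each sphere by normalizing Gaussian vectors, which is simpler to write down. The function name and the `(x, y, z, radius)` atom tuples are illustrative assumptions.

```python
import math
import random


def mc_vdw_surface(atoms, n_points=5000, seed=0):
    """Estimate the exposed surface area of a union of spheres.

    atoms: list of (x, y, z, radius) tuples, one per atomic sphere.
    """
    rng = random.Random(seed)
    total = 0.0
    for i, (cx, cy, cz, r) in enumerate(atoms):
        exposed = 0
        for _ in range(n_points):
            # A normalized Gaussian vector is uniformly distributed on the sphere.
            gx, gy, gz = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
            norm = math.sqrt(gx * gx + gy * gy + gz * gz)
            px, py, pz = cx + r * gx / norm, cy + r * gy / norm, cz + r * gz / norm
            # The point is buried if it falls strictly inside any other sphere.
            buried = any(
                (px - ox) ** 2 + (py - oy) ** 2 + (pz - oz) ** 2 < orad ** 2
                for j, (ox, oy, oz, orad) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        # Exposed fraction of sphere i times its full area.
        total += 4.0 * math.pi * r * r * exposed / n_points
    return total


# Illustrative input: two overlapping spheres of radius 1.7, centres 1.5 apart.
surface = mc_vdw_surface([(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)])
```

For an isolated sphere every sampled point is exposed, so the function returns the analytical area 4πR² regardless of the number of points.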

(c) Synthetic van der Waals descriptors of molecular shape
The shape of molecules is doubtlessly the main element of most chemical interactions.Quantitative treatment of molecular shape, that is the development of appropriate molecular descriptors able to synthesize the characteristics of 3D extension of molecules, is a very difficult problem.Most procedures are based either on comparing molecules with a reference structure, or on dividing them and defining the sectors by means of Euclidean distance between certain atoms or with the aid of Cartesian coordinates of those sectors.
Using a hard-spheres model, we developed a series of van der Waals indicators of the molecular shape.This model allowed the introduction of several synthetic descriptors of molecular shape, which are presented as follows.
A first set of indicators was developed starting from the fact that a molecule can be characterized by the surface of molecular envelope described by equations (23) (with the sign "=").
By transformations of coordinates (translation), the general second-degree equation (29) is simplified and reduced to one of the fifteen equations composed of four terms [46]. For obvious physical reasons related to the spatial extension of substituents, we neglected both the singular quadrics and the equations that do not have real solutions and, therefore, do not represent geometrical figures. From the five non-singular surfaces of second degree which remain and represent geometrical figures (the ellipsoid, the elliptic and hyperbolic paraboloids, and the one-sheet and two-sheet hyperboloids), only the ellipsoid fulfils the physical conditions, so that by assimilating the molecule to this geometrical figure the physical meaning of the calculated parameters is maintained.
It is known that the relationship:

$$\frac{x^2}{E_X^2} + \frac{y^2}{E_Y^2} + \frac{z^2}{E_Z^2} = 1 \qquad (30)$$

represents an ellipsoid, namely a spheroid (or conoid). If $E_X$ > $E_Y$ = $E_Z$, equation (30) represents a prolate ellipsoid of revolution. If $E_X$ = $E_Y$ > $E_Z$, relationship (30) represents an oblate ellipsoid of revolution, and if $E_X$ = $E_Y$ = $E_Z$ we have a sphere. The molecules are oriented along the Ox axis of the Cartesian coordinate system, and the volume of the ellipsoid (30) and its vdW centre are estimated by a Monte Carlo algorithm implemented in the IRS computer program [31]. Then, the semi-axes of the ellipsoid are calculated.
Starting from the concept of packing density and from the fact that the experimental determination of the cross-section area of a molecule [47] is performed by assimilating it to a sphere, and assuming a maximal packing of molecular spheres, one may consider as a quantitative measure of the steric characteristics of molecules the descriptor $R_{WV}$, defined as follows: where $V_W$ and $S_W$ are the calculated vdW volume and surface, respectively (see the corresponding sections above).
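Once the semi-axes are known, the ellipsoid volume and the type of ellipsoid follow from elementary geometry. The sketch below uses the standard volume formula (4/3)π·E_X·E_Y·E_Z and the standard geometric convention for prolate and oblate spheroids; the function names and the tolerance argument are illustrative assumptions, and the Monte Carlo fitting step performed by the IRS program is not reproduced.

```python
import math


def ellipsoid_volume(ex, ey, ez):
    """Volume of an ellipsoid with semi-axes ex, ey, ez."""
    return 4.0 / 3.0 * math.pi * ex * ey * ez


def classify_spheroid(ex, ey, ez, rel_tol=1e-9):
    """Classify an ellipsoid by its semi-axes (independent of axis labels)."""
    lo, mid, hi = sorted((ex, ey, ez))
    lo_eq_mid = math.isclose(lo, mid, rel_tol=rel_tol)
    mid_eq_hi = math.isclose(mid, hi, rel_tol=rel_tol)
    if lo_eq_mid and mid_eq_hi:
        return "sphere"
    if lo_eq_mid:
        return "prolate"   # one long axis, two equal short axes
    if mid_eq_hi:
        return "oblate"    # two equal long axes, one short axis
    return "triaxial"


# Hypothetical semi-axes (in angstroms) of an extended chain-like molecule.
shape = classify_spheroid(5.0, 2.0, 2.0)      # "prolate"
v_el = ellipsoid_volume(5.0, 2.0, 2.0)
```

Sorting the semi-axes first makes the classification independent of how the molecule happens to be oriented in the coordinate system.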

(d) Globularity measures
With the help of the computed molecular vdW descriptors, two other parameters can be defined. The first, named the globularity measure ($G_{LOB}$), is defined only for acyclic molecules and is given by the relation [5,12]: where $R_{WV}$ is defined by relation (31) and $R_s$ represents the ratio between the volume and the surface of an equivalent sphere, which surrounds the molecule, with a radius equal to half of the longest dimension of the parallelepiped that embeds the molecule. The above relation cannot be used for cyclic molecules, because the volume of the equivalent sphere includes the internal empty space, which is not included in the van der Waals volume. The second one is defined by the following equation: where $V_{EL}$ is the volume of the ellipsoid surrounding the whole molecule, and $V_S$ is the volume of a sphere with a radius equal to half of the longest ellipsoid axis. This parameter should be more useful for characterizing globularity because it includes the volume of all holes which may appear. These two parameters can be used to describe the shape of acyclic molecules. The globularity measure decreases with the growth of linear chains and increases toward unity when the molecule is highly branched or compact.

Intercorrelation of Topological Distance Indices and van Der Waals Molecular Descriptors
In this section we analyze the extent to which the molecular descriptors presented in this paper are linearly intercorrelated. The correlation analysis was performed on all TDIs and vdWMDs considered in this report for a set of 72 alkanes of up to 9 carbon atoms. For this purpose alkanes are convenient systems because they are structurally rather simple, and skeletal branching is their only complicating structural feature [21]. In this way we can establish to what extent the molecular descriptors from Tables 1a and 1 are orthogonal. This orthogonality is absolutely necessary for molecular descriptors in QSPR relations because it avoids the artificial strengthening of correlations. It also ensures that a quantity of information is independent of the parameters of the obtained linear model, and is thus very useful for physical interpretations of the model. If, on the other hand, the MDs are not orthogonal, it is possible that they predominantly express the same type of structural information, with differences residing in the scaling factors.
We have investigated the linear relationship between the pairs of molecular descriptors presented here, $MD_a$ and $MD_b$, where $MD_a$ and $MD_b$ are TDIs, GTDIs and vdWMDs.
The correlation coefficient, r, is a measure of the linear relationship described in relation (34). If r = 0, no linear relationship exists between $MD_a$ and $MD_b$. If r = 1, there is a direct linear relationship, and if r = -1, there is an inverse linear relationship between $MD_a$ and $MD_b$. The correlation coefficient r ≥ 0.900 was proposed as the criterion for intercorrelated pairs of molecular descriptors [48]. Strongly intercorrelated pairs of molecular descriptors are those with r ≥ 0.980.
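The pairwise analysis can be illustrated with a small sketch that computes Pearson correlation coefficients and flags pairs above the 0.900 and 0.980 thresholds. The function names are ours; the example columns hold the carbon number and Wiener index of the unbranched alkanes C2-C6 plus one made-up column labeled `shape_like`, so the numbers do not come from the tables of this paper. Taking the absolute value of r when applying the thresholds is also an assumption, made so that inverse relationships are flagged as well.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant descriptor has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def intercorrelated_pairs(descriptors, threshold=0.900):
    """Upper-triangle pairs whose |r| reaches the threshold (assumption: |r|)."""
    names = list(descriptors)
    pairs = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            r = pearson_r(descriptors[names[a]], descriptors[names[b]])
            if abs(r) >= threshold:
                pairs.append((names[a], names[b], r))
    return pairs


# Illustrative columns for ethane ... n-hexane; "shape_like" is made up.
EXAMPLE = {
    "N": [2, 3, 4, 5, 6],
    "W": [1, 4, 10, 20, 35],
    "shape_like": [0.9, 0.4, 0.8, 0.3, 0.7],
}
related = intercorrelated_pairs(EXAMPLE, threshold=0.900)
strongly_related = intercorrelated_pairs(EXAMPLE, threshold=0.980)
```

Only the upper triangle is scanned because the intercorrelation matrix is symmetric.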
The results of the correlation analysis are displayed as intercorrelation matrices of the correlation coefficient r. In Table 3a-d we give the intercorrelation matrices reflecting pairwise linear correlation for all molecular descriptors from Table 1 and Table 2: 11 selected TDIs, 16 GTDIs extended here from the reciprocal distance matrix, and 9 vdWMDs. Tables 3a, b and c contain the intercorrelation matrices corresponding to TDIs, GTDIs and vdWMDs, respectively. Since the matrices are symmetric, we give only the upper triangle. In Table 3d we report the intercorrelation matrix of TDIs and GTDIs. From Table 3 we learn several interesting points:

1) The intercorrelation matrix of the selected topological indices presented in Table 3a reveals that these indices are not strongly intercorrelated, that is, their information content about the topological structure of the 72 alkanes from Table 1 is somewhat independent. Only the indices derived from eigenvalues and eigenvectors are better intercorrelated. The TDIs belonging to this group are very poorly correlated with Balaban's J index. Besides, J is also independent when compared to W, and very weakly linked to P. On the other hand, it seems to correlate very well with F (r = 0.933). From this point of view it is necessary to avoid the simultaneous use of these indices for studying physical properties in QSPR relations.

2) Taking r ≥ 0.980 as the criterion for strong correlations, one notices that there exists a strong correlation inside each class λ, which slightly decreases with increasing k. This fact is entirely explainable if we take into consideration the way in which the LOVIs are constructed: as the dimensionality of the space increases, the interaction between atoms that are separated by the same topological distance decreases, and the influence gets smaller as the distance and the dimensionality of the space get larger. The degree of correlation between indices k δ λ of different classes is generally lower, except for those corresponding to λ = 1 and λ = 2, for which r is greater than 0.960.

Table 3b
Intercorrelation Matrix of Generalized Topological Distance Indices for Alkanes with up to 9 Carbon Atoms

3) Van der Waals molecular descriptors, vdWMDs, are much more independent of each other than the GTDIs and TDIs. A strong correlation was observed only between the volume ($V_W$) and the corresponding vdW surface ($S_W$) of the 72 alkanes having 2 - 9 carbon atoms (r = 0.994). A significant correlation was obtained not only between the vdW volume and surface, but also between them and the molecular vdW volume of alkanes treated as molecules with a more or less ellipsoidal shape. The shift of alkanes to an extended, intercalated conformation greatly influences the volume of the ellipsoid and, to a progressively smaller extent, the vdW surface area and the vdW volume. On the other hand, conformational variations in orthogonal directions affect these descriptors to a much smaller extent. Our intercorrelation results suggest the possibility of simultaneously using these indices in QSAR and QSPR relations for global testing of the vdW space occupied by molecules (space-filling), along with bulk steric parameters ($V_W$, $S_W$, $V_{EL}$, $G_{LOB}$, $G_{LEL}$, $R_{WV}$), or certain directions within them ($E_X$, $E_Y$, $E_Z$). The simple and fast calculation for any molecular structure and the possibility of immediately testing the degree of orthogonality ensure their wide applicability to any series of compounds.

4) Generalized topological distance indices derived from the reciprocal distance matrix, GTDIs, present significant correlations with the topological indices derived from eigenvalues and eigenvectors of the distance matrix, D. Again, the strongest are those between k δ λ (k = 1,2,3,4, λ = 0,1) and VRD. In this case a more rigorous statistical analysis is required of the relation between the distance indices, k δ λ, and the VADi and VEDi parameters, i = 1,2,3. The intercorrelation between GTDIs and the first indices defined on the distance matrix decreases in the following order: W, F, P.
Although, generally speaking, the Wiener index, W, correlates well with GTDIs, there are two surprising exceptions for indices 1 δ 3 and 4 δ 3 .
Investigating the physical meaning of GTDIs could yield interesting information on other topological indices. This work is in progress.
5) Are the topological indices steric measures of molecular van der Waals space? Although some authors reported that they correlate well with molecular volume [7] or surface area, extensive studies on this subject have not yet been performed. In Table 4 we present the intercorrelation matrix of the molecular vdW descriptors and the topological indices described in this work. The best results were obtained for the correlations of the van der Waals molecular volume ($V_W$) and surface ($S_W$) against the Wiener index (W), the indices derived from eigenvalues and eigenvectors of the distance matrix, and the GTDIs, k δ λ, except for the indices with λ = 2 and k = 1 (r = 0.886), and λ = 3 and k = 1 (r = 0.869). Except for the P, F and J indices, the others should be viewed as bulk steric parameters, as measured by the vdW volume and surface of the tested alkanes. The steric component of most topological indices is poorly explained by the vdW volumes of ellipsoid-assimilated alkanes (r values of around 0.900). Weak correlations were also obtained for P, F and J. The results suggest that the vector nature of steric effects, which is rather important for modeling biological interactions, cannot be tested by means of topological indices. This is a possible explanation for the lesser utility of topological indices in QSAR studies.

It can be seen from Table 5 that the correlation coefficients are satisfactory for the majority of the generalized topological distance indices (except 1 δ 3) and the eigenvalue- and eigenvector-based indices VxDn (x = A, E; n = 1 - 3), and unsatisfactory for the P, F and especially J topological distance indices and for the van der Waals molecular descriptors that measure globularity ($G_{LOB}$, $G_{LEL}$ and $R_{WV}$) and various directions in the molecular van der Waals space of alkanes ($E_X$, $E_Y$ and $E_Z$). The best results are obtained for VED1, VED3, 1 δ 1, 2 δ 1, 3 δ 1, $V_W$ and $S_W$ (r > 0.980). These topological indices contain to a great extent a bulk component, and there is a strong relation between them and the whole space of alkane molecules. Van der Waals volume and surface seem to be essential for explaining the structural variation of the boiling points of alkanes. This is easy to explain if we consider the nature of the physical interactions which appear between molecules in the liquid phase and in the gas phase. The r values for $E_X$, $E_Y$, $G_{LOB}$ and $R_{WV}$ are lower than those obtained for $V_W$, $S_W$ and $V_{EL}$. The correlation coefficients for the P and F topological indices are also fairly low. Poor values of r are especially observed for $G_{LEL}$ and $E_Z$, although there is no strong linear relation between them (the coefficient of intercorrelation is r = 0.761, see Table 3c). This fact demonstrates that these indices contain little ($E_X$, $E_Y$, $G_{LOB}$ and $R_{WV}$) or no information (J, $E_Z$ and $G_{LEL}$) about the size of alkane molecules.
The most accurate models are (42), ( 43), ( 45), ( 51) -( 53), ( 63) and (64), where r > 0.980, F are in the intervals 2000 -3366 (for VED1) and standard deviations vary from 6.74 to 8.66, that is they are less than 3.6% from the whole domain of boiling points.The correlation equations above explain more than 96% from the variance of the experimentally measured boiling points.
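The statistics quoted for the monoparametric models (r, F, the standard deviation s, and s as a percentage of the property range) can be obtained from an ordinary least-squares fit, as in the sketch below. The function name and the returned dictionary keys are assumptions, and the five data points are invented for illustration; they are not boiling points or descriptor values from this study.

```python
import math


def linear_model_statistics(x, y):
    """Least-squares fit y = intercept + slope * x with r, s and F."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    regression_ss = syy - residual_ss

    r = sxy / math.sqrt(sxx * syy)
    s = math.sqrt(residual_ss / (n - 2))  # standard error of estimate
    # Fisher statistic for one descriptor: 1 and n - 2 degrees of freedom.
    f = math.inf if residual_ss == 0 else regression_ss / (residual_ss / (n - 2))
    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "s": s,
        "F": f,
        "s_percent_of_range": 100.0 * s / (max(y) - min(y)),
        "explained_variance_percent": 100.0 * r * r,
    }


# Invented descriptor/property pairs, for illustration only.
stats = linear_model_statistics([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```

With r > 0.980, as required for the best models, the explained variance 100·r² exceeds 96%, which is the figure quoted in the text.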
The topological distance indices W, P, F and J, and also the van der Waals molecular descriptors (E X , E Y ), G LOB and R WV are less successful in modeling boiling points than the generalized and eigenvalues/eigenvectors distance indices.The worst correlation was obtained with G LEL , probably because this globularity vdW descriptor, which contains information about the shape of molecules, is normalized; its value tends towards 1 when the shape of the molecule gets closer to a sphere [49].
The results obtained here suggest that the shape of the molecules is less important than their size for structure-based modeling of the boiling points of alkanes. Shape is also a more abstract concept than size, and it is therefore more difficult than size to estimate quantitatively through a single number.

Concluding remarks
In this work we carried out a comparative study of 36 molecular descriptors derived from the distance matrix and the molecular van der Waals space. This study is part of our effort to develop molecular descriptors with a clear physical meaning. The studied indices comprise 9 van der Waals descriptors of molecular size and shape and 27 topological distance indices. We also introduced a generalized formula for deriving topological indices from the reciprocal distance matrix.
We analyzed the correlation among some classical distance indices (W, P, F and J), distance indices derived by us from the eigenvalues and eigenvectors of the distance matrix, and the generalized distance indices k δ λ (k = 1 - 4, λ = 1 - 3), and their relation with a series of van der Waals molecular descriptors of molecular size (volume and surface area) and shape (globularity measures and ellipsoidal characteristics). The analysis of the relation between the topological molecular space and the molecular van der Waals space allowed some insight into the physical meaning of topological indices. The results suggest that topological distance indices can be considered descriptors of molecular size; they can be regarded as bulk parameters. In the series under study the informational content decreases progressively in the following approximate order: VRD, k δ λ , VxDn, W, F, P, J.
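As a reminder of how such indices arise from the distance matrix, the following minimal sketch builds the topological distance matrix D of a hydrogen-suppressed carbon skeleton by breadth-first search and evaluates two simple quantities from it: the Wiener index W (the half-sum of the entries of D) and the half-sum of the reciprocal distances 1/d ij , often called the Harary index in the literature. The function names and the atom numbering are illustrative assumptions, and the reciprocal sum is only a stand-in for an index built on the reciprocal distance matrix; it is not the k δ λ formula introduced in this work.

```python
from collections import deque


def distance_matrix(n_atoms, bonds):
    """Topological distance matrix of a connected hydrogen-suppressed graph."""
    adjacency = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)

    D = [[0] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        # Breadth-first search: shortest path lengths counted in bonds.
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            atom, dist = queue.popleft()
            D[start][atom] = dist
            for neighbor in adjacency[atom]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, dist + 1))
    return D


def wiener_index(D):
    """Half-sum of the off-diagonal entries of D."""
    n = len(D)
    return sum(D[i][j] for i in range(n) for j in range(i + 1, n))


def reciprocal_distance_sum(D):
    """Half-sum of 1/d_ij (assumes a connected graph, so d_ij > 0 for i != j)."""
    n = len(D)
    return sum(1.0 / D[i][j] for i in range(n) for j in range(i + 1, n))


# Carbon skeletons of the three C5 isomers (atoms numbered from 0).
skeletons = {
    "n-pentane": [(0, 1), (1, 2), (2, 3), (3, 4)],
    "2-methylbutane": [(0, 1), (1, 2), (2, 3), (1, 4)],
    "2,2-dimethylpropane": [(0, 1), (0, 2), (0, 3), (0, 4)],
}
for name, bonds in skeletons.items():
    D = distance_matrix(5, bonds)
    print(name, wiener_index(D), reciprocal_distance_sum(D))
```

For these three isomers W falls from 20 to 18 to 16 as branching increases, while the reciprocal sum rises, which shows how both quantities respond to the compactness of the carbon skeleton at a fixed number of atoms.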
The correlation analysis on the first 72 alkanes revealed that many of the topological distance indices are strongly intercorrelated, which means that they contain similar structural information about the molecular graph. In contrast, the intercorrelation analysis of the van der Waals molecular descriptors shows that the size descriptors are only weakly linked to the shape descriptors. In other words, although there is some connection between molecular shape and molecular size, this connection is not strong. These results lead to the conclusion that topological distance indices contain similar information about molecular size but less information about molecular shape. A comparison of the performance of the 36 descriptors in structure-boiling point correlations for 74 alkanes with up to 9 carbon atoms showed that the most accurate QSPR models were obtained with the VxDn (x = A, n = 3; x = E, n = 1, 3) and k δ λ (λ = 1; k = 1, 2, 3) topological indices and with the van der Waals descriptors of molecular size, i.e. volume (V W ) and surface (S W ).
Studies concerning the physical meaning of structural descriptors are therefore extremely useful. They allow the informational content of such descriptors to be distilled and help prevent their misuse in QSPR studies.

Table 1a
Topological Distance Indices and Boiling Points of the First 72 Alkanes

Table 2
Van der Waals Molecular Descriptors of the First 72 Alkanes

Table 3a
Intercorrelation Matrix of Topological Distance Indices for Alkanes with up to 9 Carbon Atoms

Table 3c
Intercorrelation Matrix of van der Waals Molecular Descriptors for Alkanes with up to 9 Carbon Atoms

Table 3d
Intercorrelation Matrix of Generalized Topological Distance Indices (GTDIs) against Topological Distance Indices (TDIs) for Alkanes with up to 9 Carbon Atoms