www.mdpi.org/ijms/ Maximum Topological Distances Based Indices as Molecular Descriptors for QSPR. 4. Modeling the Enthalpy of Formation of Hydrocarbons from Elements

The enthalpy of formation of a set of 60 hydrocarbons is calculated on the basis of topological descriptors defined from the distance and detour matrices within the realm of the QSAR/QSPR theory. Linear and non-linear polynomial fittings are made, and the results show the need to resort to higher-order regression equations in order to get better agreement between theoretical results and the available experimental data. Besides, topological indices computed from maximum order distances seem to yield rather satisfactory predictions of heats of formation for hydrocarbons.


Introduction
Graphs have found considerable employment in several chemistry fields, particularly in modeling molecular structure [1][2][3][4][5][6][7][8][9][10]. The application of graphs to the study of structure-property relationships implies the representation of molecules by selected molecular descriptors, often referred to as topological indices [11,12]. These topological indices, which often have a direct structural interpretation, are defined in terms of selected structural parts and hopefully should help one in building molecular models for structure-property relationships. Among hundreds of possible descriptors, a few have arisen again and again as the most useful for the characterization of molecules [13][14][15][16].
The graph theoretical characterization of molecular structure is realized by means of various matrices, polynomials, spectra, spectral moments, and sequences counting distances, paths and walks. The molecular matrices represent an important source of structural descriptors. Usually, a small number of matrices is used to characterize the molecular topology, namely the adjacency, the distance and, sometimes, the Laplacian matrix. Novel matrices have been developed in recent years, encoding the topological information in various ways [17]. However, distance matrices remain the most relevant ones within the realm of the Quantitative Structure Activity (Property) Relationships (QSAR/QSPR) theory.
Any matrix whose elements satisfy the axioms of distance can be referred to as a distance matrix D ≡ {D_ij} [18]. The axioms of distance require that
a) Distance is a non-negative quantity, D_ij ≥ 0, assigned to a pair of elements (points in an n-dimensional vector space).
b) D_ii = 0 ∀ i = 1, 2, ..., N, where N is the number of elements.
c) Distance does not depend on the direction of measurement, i.e. D_ij = D_ji.
d) Distance satisfies the triangular inequality, i.e. D_ij ≤ D_ik + D_kj.
Two distance matrices are particularly important, both of them based on the topological distance for vertices within a graph: the distance matrix and the detour matrix. The detour matrix, together with the distance matrix, was introduced into the mathematical literature in 1969 by Frank Harary [19]. Both matrices were also briefly discussed in 1990 by Buckley and Harary [20]. The detour matrix was introduced into the chemical literature in 1994 under the name "the maximum path matrix of a molecular graph" [21,22]. The graph-theoretical detour matrix has found some interest in chemistry [23], and a valuable variation pertaining to the definition of the diagonal elements was proposed by Rücker and Rücker [24]. They found that this matrix, in combination with the Wiener index W, is more useful than Hosoya's Z index for regression of the boiling points of a large sample of compounds containing all acyclic and cyclic alkanes with known boiling points, from methane up to polycyclic octanes.
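The four distance axioms a) to d) can be checked numerically for any candidate matrix. The following is a minimal sketch, assuming the matrix is supplied as a square NumPy array; the function name `satisfies_distance_axioms` and the tolerance `tol` are illustrative choices and do not come from the original work.

```python
import numpy as np

def satisfies_distance_axioms(D, tol=1e-12):
    """Return a dict stating which of the four distance axioms hold for D."""
    D = np.asarray(D, dtype=float)
    return {
        # a) non-negativity: D_ij >= 0
        "non_negative": bool(np.all(D >= -tol)),
        # b) zero diagonal: D_ii = 0
        "zero_diagonal": bool(np.allclose(np.diag(D), 0.0, atol=tol)),
        # c) symmetry: D_ij = D_ji
        "symmetric": bool(np.allclose(D, D.T, atol=tol)),
        # d) triangle inequality: D_ij <= D_ik + D_kj for all i, k, j
        "triangle": bool(
            np.all(D[:, None, :] <= D[:, :, None] + D[None, :, :] + tol)
        ),
    }

# Path graph on three vertices: a valid topological distance matrix.
good = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
# Same matrix with one entry inflated so that D_13 > D_12 + D_23.
bad = [[0, 1, 5],
       [1, 0, 1],
       [5, 1, 0]]

good_report = satisfies_distance_axioms(good)
bad_report = satisfies_distance_axioms(bad)
print(good_report)
print(bad_report)
```

The triangle test uses broadcasting over a three-index array, which costs memory of order N³; this is acceptable for molecular graphs of the size considered here.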
In three previous papers [25][26][27] we have analyzed the relative merits of these distances when they are used to define molecular descriptors in order to calculate physical chemistry properties within the realm of QSAR/QSPR theory. The aim of this paper is to continue this discussion by calculating heats of formation for a representative set of 60 hydrocarbons.
The paper is organized as follows: the next section deals with some basic mathematical definitions and the computational procedure; then we present the results and discuss them; and finally we state the conclusions of the present study as well as some possible extensions.

Basic Definitions
The adjacency matrix: The adjacency matrix A = A(G) of a graph G with N vertices is the square N x N symmetric matrix whose entry in the i-th row and j-th column is defined as [19]

\[
A_{ij} =
\begin{cases}
1 & \text{if } i \neq j \text{ and } (i,j) \in E(G) \\
0 & \text{otherwise}
\end{cases}
\]

where E(G) represents the set of edges of G. The sum of entries over row i or column i in A(G) is the degree of vertex i, deg_i.
An example of a molecular graph and its adjacency matrix is given below for 1-ethyl-2-methylcyclopropane. The vertices and edges are labeled from 1 to 6 and from a to f, respectively.

The distance matrices
The distance matrix D(G) can be defined for G with elements D_ij, the distance, i.e. the least number of steps from vertex i to vertex j. Similarly, the detour matrix ∆(G) of a labeled connected graph G is a real symmetric N x N matrix whose (i,j) entry is the length of the longest path from vertex i to vertex j (i.e., the maximum number of edges in G between vertices i and j [23]).
For example, for the previous graph corresponding to the 1-ethyl-2-methylcyclopropane molecule, matrices A and ∆ are:
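The matrices for this example can be generated with a short script. The sketch below is illustrative only: the vertex labelling is an assumption (ring carbons 1, 2, 3; methyl carbon 4 on ring carbon 2; ethyl carbons 5 and 6 on ring carbon 1) and may differ from the labelling in the original figure, and the function names are hypothetical. The distance matrix is obtained by breadth-first search and the detour matrix by exhaustive enumeration of simple paths, which is practical only for small graphs.

```python
from collections import deque

# Assumed labelling of the hydrogen-depleted graph (not taken from the source):
# ring 1-2-3, methyl 4 on carbon 2, ethyl 5-6 on carbon 1.
EDGES = [(1, 2), (2, 3), (1, 3), (2, 4), (1, 5), (5, 6)]
N = 6

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix from a list of 1-based edges."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = A[j - 1][i - 1] = 1
    return A

def distance_matrix(A):
    """Shortest-path lengths via breadth-first search from every vertex."""
    n = len(A)
    D = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            D[s][v] = d
            for w in range(n):
                if A[v][w] and w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return D

def detour_matrix(A):
    """Longest simple-path lengths via exhaustive depth-first search."""
    n = len(A)
    delta = [[0] * n for _ in range(n)]

    def extend(start, v, visited, length):
        # Record the longest simple path found so far from start to v.
        if length > delta[start][v]:
            delta[start][v] = length
        for w in range(n):
            if A[v][w] and w not in visited:
                extend(start, w, visited | {w}, length + 1)

    for s in range(n):
        extend(s, s, {s}, 0)
    return delta

A = adjacency(N, EDGES)
D = distance_matrix(A)
DELTA = detour_matrix(A)

for name, M in (("A", A), ("D", D), ("Delta", DELTA)):
    print(name)
    for row in M:
        print(" ".join(str(x) for x in row))
```

Within the three-membered ring the detour entries are 2 while the distance entries are 1, which is where the two matrices start to differ for cyclic structures; for acyclic graphs they coincide.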

Topological molecular descriptors
We present the basic definitions related to the topological descriptors chosen for the present study.
Balaban index [32,33]

\[
J = \frac{q}{\mu + 1} \sum_{\text{edges } i\text{-}j} \left( d_i \, d_j \right)^{-1/2}
\]

where d_i = ∑_j D_ij, q is the number of edges and µ the number of cycles in the graph. The j summations in formulas (2), (3) and (5) are over all edges i-j in the hydrogen-depleted graph.

The Zagreb group indices
The employment of these topological descriptors has proven to be extremely useful in QSAR/QSPR studies, giving simple correlations between the selected properties and the molecular structure [39][40]. Multiple regression analysis is often employed in such studies in the hope that it might point to structural factors that influence a particular property. Naturally, regression analysis results do not establish any sort of causal relationship between structural components and molecular properties. However, they may be helpful in model building and useful in the design of molecules with prescribed desirable properties [41].
An interesting alternative to the previous definitions based upon the distance matrix D is to replace it by the detour matrix ∆, defining the associated topological indices W*, H*, etc. on the basis of this last matrix in a similar fashion as done in Eqs. (2-5).
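The replacement of D by ∆ can be illustrated with a short script. The sketch below relies on commonly used textbook forms of the indices, which are assumptions here because the defining equations are not reproduced in this text: W as half the sum of all matrix entries, H as the sum of reciprocal off-diagonal entries over vertex pairs (other variants of the Harary index use reciprocal squares), J as given for the Balaban index, and the Zagreb indices M1 and M2 as sums of squared vertex degrees and of degree products over edges. The matrices are those of 1-ethyl-2-methylcyclopropane under an assumed 0-based labelling (ring 0, 1, 2; methyl 3 on vertex 1; ethyl 4, 5 on vertex 0).

```python
import numpy as np

# Assumed labelling; edges of the hydrogen-depleted graph.
EDGES = [(0, 1), (1, 2), (0, 2), (1, 3), (0, 4), (4, 5)]

# Distance matrix D (shortest paths) for the assumed labelling.
D = np.array([
    [0, 1, 1, 2, 1, 2],
    [1, 0, 1, 1, 2, 3],
    [1, 1, 0, 2, 2, 3],
    [2, 1, 2, 0, 3, 4],
    [1, 2, 2, 3, 0, 1],
    [2, 3, 3, 4, 1, 0],
])

# Detour matrix (longest simple paths) for the same labelling.
DELTA = np.array([
    [0, 2, 2, 3, 1, 2],
    [2, 0, 2, 1, 3, 4],
    [2, 2, 0, 3, 3, 4],
    [3, 1, 3, 0, 4, 5],
    [1, 3, 3, 4, 0, 1],
    [2, 4, 4, 5, 1, 0],
])

def wiener(M):
    """Half the sum of all entries of a symmetric matrix."""
    return int(M.sum()) // 2

def harary(M):
    """Sum of reciprocal entries over all vertex pairs (assumed variant)."""
    upper = M[np.triu_indices_from(M, k=1)]
    return float(np.sum(1.0 / upper))

def balaban(M, edges):
    """q / (mu + 1) times the sum over edges of (d_i d_j)^(-1/2)."""
    n, q = M.shape[0], len(edges)
    mu = q - n + 1                      # number of cycles, connected graph
    d = M.sum(axis=1)                   # row sums d_i
    total = sum(1.0 / np.sqrt(d[i] * d[j]) for i, j in edges)
    return q / (mu + 1) * total

def zagreb(n, edges):
    """First and second Zagreb indices from vertex degrees."""
    deg = np.zeros(n, dtype=int)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    m1 = int(np.sum(deg ** 2))
    m2 = int(sum(deg[i] * deg[j] for i, j in edges))
    return m1, m2

W, H, J = wiener(D), harary(D), balaban(D, EDGES)
W_star, H_star, J_star = wiener(DELTA), harary(DELTA), balaban(DELTA, EDGES)
M1, M2 = zagreb(D.shape[0], EDGES)

print("W, H, J   :", W, round(H, 4), round(J, 4))
print("W*, H*, J*:", W_star, round(H_star, 4), round(J_star, 4))
print("M1, M2    :", M1, M2)
```

The same functions serve both matrices, so the starred indices differ from the unstarred ones only through the entries affected by the ring.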

Results and Discussion
We have employed two sets of topological descriptors: a) {N_c, ⁰χ, ¹χ, ²χ, M1, M2, W, H, J, MTI} and b) {N_c, ⁰χ, ¹χ, ²χ, M1, M2, W, H, J, MTI, W*, H*, J*, MTI*}. N_c stands for the number of C atoms. This particular choice was made since we want to know the relative merits of topological indices defined on the basis of the two distance matrices (i.e. D and ∆). This choice is one way to assess them, although there are other options.
The molecular set comprises 60 hydrocarbons containing from 1 to 18 carbon atoms, and they are presented in Table 1 together with their corresponding topological parameters. Although this rather specialized set includes molecules with only C and H atoms, this choice does not necessarily imply a lack of molecular variation. As a matter of fact, the hydrocarbon set includes examples of planar, non-planar, alternant and non-alternant aromatic hydrocarbons, alkyl- and alkenyl-substituted benzene derivatives, acyclic and polycyclic alkanes, strained and unstrained olefins and disparate structures combined with aromatics like benzene and naphthalene, which do not require separate parametrizations for different types of C and H atoms. This molecular set has been used in previous studies on QSPR theory [42][43][44].
Enthalpy (or heat) of formation is a fundamental thermodynamic quantity for predicting a compound's thermochemical behavior. Thus, enthalpies of formation are important in the investigation of bond energies, resonance energies, the nature of the chemical bond, the calculation of equilibrium constants of reaction, etc. [45]. Therefore, it is not surprising that considerable effort has been directed towards the determination of heats of formation in the past [46][47][48][49][50][51]. Although a wide variety of procedures to calculate heats of formation theoretically have indeed been introduced, based on such different concepts as isodesmic and homodesmic reactions, atom and group equivalents, transferability and additivity of Fock matrix elements, etc. [52][53][54][55][56][57][58][59][60], the calculation through QSPR theory has not attracted so much attention.
We have performed a complete analysis to find the best one-, two-, ..., five-variable fittings at first, second and third polynomial orders. Statistical results are given in Table 2 for both descriptor sets, while in Table 3 we display some theoretical results together with the experimental data. We have inserted the best five-variable third-order correlations for both topological index sets a) and b). Complete results are available and can be obtained upon request to one of us (E.A.C.) at the above address.
(1) Best five-variable fitting for descriptor set a).
(2) Best five-variable fitting for descriptor set b).
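The search for the best k-variable fitting at a given polynomial order can be sketched as an exhaustive subset search with ordinary least squares. The code below is only an illustration under stated assumptions: the polynomial model is taken to contain an intercept and the powers 1 to `order` of each selected descriptor without cross terms, the subsets are ranked by the residual standard deviation, and the data are synthetic random numbers rather than the descriptors and enthalpies of Table 1. The function names are hypothetical.

```python
import itertools
import numpy as np

def design_matrix(X, cols, order):
    """Intercept plus powers 1..order of each selected column (no cross terms)."""
    cols = list(cols)
    blocks = [np.ones((X.shape[0], 1))]
    blocks += [X[:, cols] ** p for p in range(1, order + 1)]
    return np.hstack(blocks)

def best_subset_fit(X, y, n_vars, order):
    """Try every n_vars-descriptor subset; keep the lowest standard deviation."""
    best = None
    for cols in itertools.combinations(range(X.shape[1]), n_vars):
        M = design_matrix(X, cols, order)
        coef, *_ = np.linalg.lstsq(M, y, rcond=None)
        resid = y - M @ coef
        dof = max(len(y) - M.shape[1], 1)          # degrees of freedom
        s = float(np.sqrt(resid @ resid / dof))    # residual standard deviation
        if best is None or s < best["s"]:
            best = {
                "cols": cols,
                "s": s,
                "aad": float(np.mean(np.abs(resid))),  # average absolute deviation
            }
    return best

# Synthetic stand-in data: 60 "molecules", 6 "descriptors".
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 3.0, size=(60, 6))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 3] ** 2 + 0.01 * rng.standard_normal(60)

first = best_subset_fit(X, y, n_vars=2, order=1)
second = best_subset_fit(X, y, n_vars=2, order=2)
print("first order :", first)
print("second order:", second)
```

With 14 descriptors and five variables there are 2002 subsets per polynomial order, so an exhaustive search of this kind remains inexpensive for a set of 60 molecules.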
Analysis of the results in Tables 2 and 3 shows that the fitting equations derived from set b) of topological descriptors have better predictive power than those corresponding to set a). The statistical parameters (Table 2) and the correlations between experimental data and theoretical predictions (Table 3) make clear the convenience of resorting to the detour matrix instead of the standard distance matrix in order to define the pertinent topological parameters. Besides, it seems advisable to employ higher-order polynomials to get a better degree of prediction. These conclusions are in line with previous ones from studies on the usefulness of the detour matrix [25][26][27].
In order to properly judge the value of the predictions, it is interesting to note the low average absolute deviations (0.76 and 0.62 kcal/mol, respectively). It must be taken into account that theoretical predictions usually give an average absolute deviation of around 2 kcal/mol, which is the degree of uncertainty in the experimental determinations. Furthermore, there is no "pathological" prediction, and the maximum deviation for the results presented in Table 3 is 2.31 kcal/mol.

Conclusions
We have employed some usual topological indices to study the heat of formation of a set of hydrocarbons comprising 60 structurally diverse molecules. In those cases where the definition demands the employment of the distance matrix, we have also defined similar indices on the basis of the detour matrix (i.e. maximum distance) in order to assess the relative merits of both distance definitions. Results show that resorting to the detour matrix for defining the topological indices yields better correlations to predict enthalpies of formation. These results agree with similar ones obtained for other physical chemistry properties, which seems to support the use of detour indices in structure-property modeling [25][26][27][61][62][63]. We conclude that the results obtained for the chosen set are good enough to infer that the detour matrix ∆ is a convenient topological device for QSAR/QSPR analysis and a valuable molecular descriptor that adds to the set of topological matrices. In order to arrive at more significant and definite conclusions on this issue, we deem it necessary to extend the calculations to quite different molecular sets and to other physical chemistry properties and biological activities. Research along this line is being carried out in our laboratories and results will be presented elsewhere in the near future.

Table 2. Statistical results for the best fitting equations.

Table 3. Experimental and theoretical heats of formation (kcal/mol) for a set of 60 hydrocarbons. The numbering of molecules corresponds to the molecular listing of Table 1.