Weighted Wiener Indices of Molecular Graphs with Application to Alkenes and Alkadienes

Many topological indices have been calculated for saturated hydrocarbons, since these molecules can be easily modelled by simple graphs. Investigating topological indices for hydrocarbons with multiple bonds is more challenging. The purpose of this paper is to introduce a simple model that gives good results for predicting physico-chemical properties of alkenes and alkadienes. In particular, we are interested in predicting boiling points of these molecules by using the well-known Wiener index and its weighted versions. The boiling points of alkenes and alkadienes are predicted by non-linear regression analysis.


Introduction
The Wiener index is a graph invariant based on distances in a graph. It is denoted by W(G) and defined as the sum of distances between all pairs of vertices in G:
$$W(G) = \sum_{\{u,v\} \subseteq V(G)} d_G(u,v). \qquad (1)$$
The name Wiener index is usual in the chemical literature, since Harold Wiener [1] in 1947 seems to have been the first to consider it. His original definition was slightly different from, yet equivalent to, (1); the definition of the Wiener index in terms of distances between vertices of a graph, such as in Equation (1), was first given by Hosoya [2].
Wiener stated that the boiling points of organic compounds, as well as their other physical properties, depend on the number and structural arrangement of atoms in a molecule. In chemistry, an organic compound is generally any chemical compound that contains carbon (C) and hydrogen (H) atoms. Molecules composed only of these two types of atoms are called hydrocarbons. The graph representation of a hydrocarbon with only single bonds (a saturated hydrocarbon) is a simple graph where vertices represent carbon atoms, two vertices being adjacent if there is a CC bond between the corresponding C atoms. Since the valency of a carbon atom equals four, the number of hydrogen atoms adjacent to each C atom is determined and can therefore be omitted in the graph representation. A comparative study between the Wiener index and several other topological indices as predictors for the boiling points of saturated hydrocarbons (alkanes) was done in Reference [3], showing that the best results are obtained with 3- or 4-parameter relationships, both of them including the Wiener index.
In the case of hydrocarbons with double or triple bonds (unsaturated hydrocarbons), the graph representation should be a multigraph. Such graphs can be described with the vertex-adjacency matrix A for which $(A)_{ij} = m_{ij}$, where $m_{ij}$ is the multiplicity (i.e., bond order) of the edge between vertices $v_i$ and $v_j$ (see Reference [4]). However, there is some ambiguity in the mathematical modelling of unsaturated hydrocarbons. In one of the seminal papers on the prediction of physico-chemical properties from topological indices, Basak et al. [5] investigated three different clusters of hydrocarbons: alkanes, alkylbenzenes and polycyclic aromatic hydrocarbons, all of them modelled with simple graphs although the last two groups of molecules contain double CC bonds. In the past few years much attention has been given to the unsaturated hydrocarbons called carbon nanotubes, and especially to fullerenes (cubic 3-connected planar graphs with only pentagonal and hexagonal faces), which are modelled with simple graphs. For some recent results on topological indices (parameters) of these graphs see References [6][7][8]. The Wiener index of some alkenes was calculated in Reference [9] with the use of the eccentricity of a graph, and the double bonds were treated in the same way as single bonds. Moreover, the Wiener index of unsaturated hydrocarbons was considered in Reference [10]. It was shown that the Wiener index W of a molecular graph can be written as $W = W_s + W_d + W_t + W_a$, where s, d, t, and a refer to the contributions of single, double, triple, and aromatic bonds, respectively. Then, multiple linear regression analysis was performed and a good correlation between the logarithms of n-octanol-water partition coefficients of the molecules and the mentioned components was established (R = 0.95). In this investigation, the considered collection of molecules included 71 saturated and unsaturated hydrocarbons.
Furthermore, in Reference [11] the authors distinguish between double and single bonds of molecules by using the rather complex charge density of an atom (CMI, Charge-related Molecular Index).
On the other hand, some authors introduced the concept of a weighted graph to account for multiple bonds in unsaturated hydrocarbons. The search for the optimal weight x of a double bond in alkenes was addressed by Randić and Pompe in Reference [12]. They selected a set of 39 alkenes, chose the molar refraction as the property, and then used statistical analysis to optimize x so that the standard error of the regression was the smallest. Together with co-authors, they performed in Reference [13] a QSPR study for predicting gas-phase reaction rate constants of unsaturated organic compounds with the OH radical. Two approaches were taken: the first was an optimized 6-parameter MLR regression model derived from a large pool of topological descriptors, and the second focused on a single topological index in which the weights of the atoms are variables determined by an optimization procedure giving the best fit to the considered physico-chemical property. Grossman et al. [14] used the concept of weighted paths in the characterization of heteroatomic molecules and their activity.
Moreover, in Reference [15] the authors defined a novel distance between two adjacent atoms in a molecular graph which corresponds to the length of the bond. Next, they introduced the so-called valence overall Wiener index and found that it has a good correlation with molecular volume (R = 0.9955), boiling point (R = 0.9906), partition coefficients (R = 0.9280), molecular refractions (R = 0.9946), critical temperature (R = 0.9906), and critical pressures (R = 0.9307) for various hydrocarbons with different types of bonds. Furthermore, a very similar approach was used in References [16,17] for the hyper-Wiener index. More precisely, in Reference [16] a novel type of hyper-Wiener index for unsaturated hydrocarbons was defined by taking into account the relative distances. Then, three physical properties were considered. After applying multiple regression analysis to 41 unsaturated hydrocarbons, good correlations with the following three properties were obtained: boiling point (R = 0.9820), molecular volumes (R = 0.9810), and heats of atomic formation (R = 0.9775 in one case and R = 0.9999 in another case). On the other hand, the authors of Reference [17] used the relative distances to define a novel overall hyper-Wiener index. They performed multiple regression analysis in a QSAR modelling of six physical properties (boiling point, molar volumes, partition coefficient, molecular refractions, critical temperature, and critical pressures) of 42 alkanes and unsaturated hydrocarbons with one to six carbons. It was concluded that the novel overall hyper-Wiener index correlates well with the molecular properties of these hydrocarbons with different types of multiple bonds.
We are interested in the correlation between the weighted Wiener index and the boiling points of alkenes and alkadienes. The aim of this paper is to mathematically model an unsaturated hydrocarbon with an edge-weighted graph, then calculate the corresponding distances in the graph and obtain a new Wiener index. Using QSPR (Quantitative Structure-Property Relationships) analysis, we show that the new Wiener index gives very good predictions of the boiling points of alkenes and alkadienes. This approach can be applied in the calculation of different distance-based topological indices for unsaturated hydrocarbons as well as for organic compounds in general. Finally, for a group of the considered molecules, we compare our method with the method used in References [15][16][17].

Graph Theory Preliminaries
A graph G is an ordered pair G = (V, E) of a set V of vertices and a set E of edges, which are 2-element subsets of V. The edge e = {u, v} between vertices u and v will also be denoted as e = uv. All the basic concepts from graph theory can be found in Reference [18]. Given a molecule, if we represent atoms by vertices and bonds between them by edges, we obtain a molecular graph.
If G is a connected graph, then a function $w : E(G) \to \mathbb{R}^+$ is called an edge-weight of G. The pair (G, w) is known as an edge-weighted graph.
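A path of length n − 1 between vertices $v_1$ and $v_n$ in a connected graph G is a sequence of pairwise distinct vertices $P = v_1, v_2, \ldots, v_n$ such that $v_i v_{i+1} \in E(G)$ for $i = 1, \ldots, n-1$. The weight of P in an edge-weighted graph (G, w) is the sum of the weights of its edges, $w(P) = \sum_{i=1}^{n-1} w(v_i v_{i+1})$. A shortest path between vertices u and v of (G, w) is a path with the minimum weight w(P) among all possible paths between u and v. The distance between u and v, $d_{(G,w)}(u, v)$, is the weight of any shortest path P between u and v, that is, $d_{(G,w)}(u, v) := w(P)$. Obviously, if w(e) = 1 for any $e \in E(G)$, then $d_{(G,w)}(u, v)$ is the standard graph distance, denoted simply as $d_G(u, v)$.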
Let (G, w) be an edge-weighted graph. The Wiener index of (G, w) [19] is defined as
$$W(G, w) = \sum_{\{u,v\} \subseteq V(G)} d_{(G,w)}(u,v).$$
Obviously, if w ≡ 1, then W(G, w) is the standard Wiener index W(G). The Wiener index of a weighted graph can be called the weighted Wiener index.
Let G be a connected graph and $D \subseteq E(G)$ a subset of its edges. If $e \in D$, then e will be called a double edge of G. In this paper, we denote a connected graph G with the set of double edges D as $G_D$. Note that in figures the double edges are depicted with two lines between the corresponding vertices and that $G_D$ can be represented as a multigraph.
If $a \in \mathbb{R}^+$ is a positive constant and $G_D$ a connected graph with double edges, then the weight $w_a : E(G) \to \mathbb{R}^+$ is defined in the following way:
$$w_a(e) = \begin{cases} a, & e \in D, \\ 1, & e \notin D. \end{cases}$$
Consequently, the Wiener index of $G_D$ with respect to a, denoted by $W_a(G_D)$, is defined as the Wiener index of $(G, w_a)$, that is,
$$W_a(G_D) = W(G, w_a) = \sum_{\{u,v\} \subseteq V(G)} d_{(G,w_a)}(u,v).$$
Finally, we define the valence Wiener index [15] by replacing the classical distances with the relative distances. The distances in a molecule are determined by chemical bonds, so the distance between two adjacent vertices in a molecular graph should be correlated with the bond length between the two atoms. Accordingly, the relative distance of the CC single bond ($sp^3$-$sp^3$ type) is defined as 1. The relative distances of other types of chemical bonds are then defined as the ratio of their bond lengths to the bond length of the CC single bond ($sp^3$-$sp^3$ type), which equals 1.544 Å. The bond lengths and relative distances of some types of chemical bonds [20] are shown in Table 1.
Table 1. Bond types with their lengths and relative distances.

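[Table 1 columns: Bond Type, Bond Length, Relative Distance]
If $G_D$ is a connected graph with double edges, then we define the edge-weight $w_v$ so that $w_v(e)$ is the relative distance of the corresponding chemical bond. Hence, the valence Wiener index of $G_D$, denoted by $W_v(G_D)$, is defined as the Wiener index of $(G, w_v)$.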
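As a small illustration of the index $W_a$, consider but-1-ene, chosen here only as an example. Its carbon skeleton is the path $v_1 v_2 v_3 v_4$ with $D = \{v_1 v_2\}$, so the edge $v_1 v_2$ has weight $a$ and the other two edges have weight 1. Since the graph is a tree, every pair of vertices is joined by a unique path, and the six distances can be read off directly:

```latex
\begin{align*}
d(v_1, v_2) &= a, & d(v_1, v_3) &= a + 1, & d(v_1, v_4) &= a + 2, \\
d(v_2, v_3) &= 1, & d(v_2, v_4) &= 2,     & d(v_3, v_4) &= 1, \\
W_a(G_D)    &= 3a + 7.
\end{align*}
```

For $a = 1$ this gives 10, the ordinary Wiener index of the path on four vertices, and for $a = 2$ it gives 13.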

Algorithms for Computing the Wiener Indices
In this section, we present two algorithms that were used for computing the Wiener index of an edge-weighted graph (G, w). Throughout the section, we work with the (i, j) entry $(A)_{ij}$ of a matrix A. Let $V(G) = \{1, \ldots, n\}$ be the vertex set of (G, w). We represent the graph by an n × n array (matrix) such that, if e = {i, j} is an edge, then the (i, j) entry and the (j, i) entry of the array are both equal to w(e), all diagonal elements are set to 0, and all the other entries of the array are initialized to infinity. Then, we perform the well-known Floyd-Warshall algorithm [21] (Algorithm 1) to compute the distance matrix D of (G, w). More precisely, the (i, j) entry of D equals the distance $d_{(G,w)}(i, j)$. In our version of the algorithm, we consider only undirected graphs, but Algorithm 1 can be easily adapted so that it can also be used on directed graphs. It is easy to see that it can be implemented in $O(n^3)$ time.
Once the distance matrix of (G, w) is computed, we can easily calculate the Wiener index W(G, w) by the procedure described in Algorithm 2. This algorithm clearly computes the Wiener index of an edge-weighted graph correctly and has time complexity $O(n^2)$.
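The two steps described above (Algorithm 1 followed by Algorithm 2) can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the function names `floyd_warshall` and `wiener_index`, the 0-based vertex labels, and the edge-list input format are assumptions made for illustration.

```python
import math


def floyd_warshall(n, weighted_edges):
    """Return the n x n distance matrix of an undirected edge-weighted graph.

    weighted_edges is an iterable of (i, j, w) with 0-based vertices i, j
    and a positive weight w (input format assumed for this sketch).
    """
    # Initialise: 0 on the diagonal, w(e) on edges, infinity elsewhere.
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, w in weighted_edges:
        dist[i][j] = dist[j][i] = min(dist[i][j], w)
    # Allow each vertex k in turn as an intermediate vertex: O(n^3) time.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def wiener_index(dist):
    """Sum the distances over all unordered pairs of vertices: O(n^2) time."""
    n = len(dist)
    return sum(dist[i][j] for i in range(n) for j in range(i + 1, n))


# Hypothetical example: but-1-ene as the path 0-1-2-3 whose edge {0, 1}
# is a double edge of weight a = 2; all other edges have weight 1.
but_1_ene = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 1.0)]
print(wiener_index(floyd_warshall(4, but_1_ene)))  # 13.0
```

Keeping the distance matrix as a separate intermediate result mirrors the two-algorithm structure of the text and lets the same matrix be reused for other distance-based indices.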
For clarity, we illustrate the described procedure by computing the index $W_a$ of the graph $G_D$ with label 6D11 from Figure 1. From the obtained distance matrix we get $W_a(G_D) = 5a + 23$.
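When the carbon skeleton is a tree, the result can also be checked without a distance matrix. Every pair of vertices is joined by a unique path, so each edge e contributes w(e) times the product of the numbers of vertices on its two sides. The sketch below uses this standard property of trees; it is not the procedure of Algorithms 1 and 2. The structure used for 6D11 (the skeleton of 3,3-dimethylbut-1-ene, with the double bond on a pendant edge) is an assumption: it is consistent with the value 5a + 23 but is not taken from Figure 1. The function name and the vertex numbering are likewise illustrative.

```python
def tree_wiener_coefficients(n, edges, double_edges):
    """Return (c, s) such that W_a = c * a + s for a tree with double edges.

    edges: list of (u, v) pairs on vertices 0..n-1 forming a tree.
    double_edges: the edges that carry weight a; all others carry weight 1.
    """
    adjacency = {v: [] for v in range(n)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    # Iterative depth-first search from vertex 0, recording parents.
    parent = {0: None}
    order = []
    stack = [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adjacency[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)

    # Subtree sizes, accumulated from the leaves towards the root.
    size = {v: 1 for v in range(n)}
    for u in reversed(order):
        if parent[u] is not None:
            size[parent[u]] += size[u]

    doubled = {frozenset(e) for e in double_edges}
    c = s = 0
    for v in range(1, n):
        # The edge {v, parent[v]} separates size[v] vertices from the rest.
        pairs = size[v] * (n - size[v])
        if frozenset((v, parent[v])) in doubled:
            c += pairs
        else:
            s += pairs
    return c, s


# Assumed skeleton for 6D11: 0=1 double bond, 1-2, and three methyl groups on 2.
edges_6d11 = [(0, 1), (1, 2), (2, 3), (2, 4), (2, 5)]
print(tree_wiener_coefficients(6, edges_6d11, [(0, 1)]))  # (5, 23)
```

Returning the two coefficients separately keeps the dependence on the weight a explicit, so the same call covers every choice of a considered in the paper.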

Experimental Data
In this section we present the data that was used. Firstly, we consider the alkenes (with one double bond). We wanted to obtain the boiling points at normal pressure for unbranched alkenes with at most ten carbon atoms and also all branched alkenes with at most seven carbon atoms (the number of branched alkenes increases very quickly with the number of carbon atoms). Table 2 presents all such alkenes for which the data was available in Reference [22] (only some alkenes, for which we did not find the data, were omitted).
In our model, different stereoisomers are represented by the same graph with double edges. Therefore, it cannot distinguish between E and Z isomers (molecular descriptors that can distinguish between such isomers were considered, for example, in Reference [23]). However, the difference in the boiling points between E and Z isomers is usually small. Moreover, E isomers are commonly more stable and lower in energy, so under normal conditions a compound consists mostly of this type of geometric isomer. As a consequence, we considered only the data for E isomers.
To every alkene we assign a label (see the second column in Table 2) in which the first number represents the number of carbon atoms and the number of double bonds corresponds to the number of times the letter D appears in the label. The last number in the label is simply a consecutive number. The alkenes from Table 2, together with their labels, are depicted in Figure 1. However, considering the alkenes with more than one double bond is very demanding due to their number. Therefore, in Table 3 we included all the alkenes with two double bonds (alkadienes) with up to six carbon atoms and two additional alkadienes with seven carbon atoms for which the data was available (again, a label was assigned to every alkene from Table 3). These alkadienes are depicted in the lower part of Figure 1.

Results
In this section, we present the results obtained by the QSPR analysis. We divided the data into the training and the test set (see "x" in the last column in Tables 2 and 3). The correlation between the boiling points and the weighted Wiener indices $W = W_1$, $W_2$, and $W_{1/2}$ was investigated. The weighted Wiener indices are also listed in Tables 2 and 3.

Alkenes
After performing regression analysis on all three weighted Wiener indices, it was established that the best results are obtained with the index $W_2$. Therefore, we only present the results of the regression between the boiling points and $W_2$. The obtained logarithmic function based on the training set can be seen in Figure 2. The predicted boiling point ($\widehat{BP}$) can now be computed as
$$\widehat{BP} = 63.071 \ln(W_2) + 106.69. \qquad (2)$$
The correlation is very good since $R^2 = 0.9847$ (for comparison, the correlation with the Wiener index W is slightly lower, that is, $R^2 = 0.9725$). By applying Equation (2) we calculate the predicted boiling points and the corresponding residuals on the test set, see Table 4. Moreover, in Table 5 the mentioned data is provided for all considered alkenes. Finally, the graphical representation of boiling points and predicted boiling points is shown in Figure 3.
To see how much the correlation can be improved, we also consider the unbranched alkenes separately. The correlation is slightly better, since $R^2 = 0.9937$ (see Figure 4).
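The logarithmic model for alkenes reported above, with slope 63.071 and intercept 106.69, can be evaluated and re-fitted with a few lines of Python. The sketch below is illustrative only: the function names are assumptions, the fit is an ordinary least-squares fit in ln(W) (the text does not state which fitting procedure was used), and the index values are arbitrary synthetic inputs rather than the data of Tables 2 to 5. The predicted values are in whatever units the tabulated boiling points use.

```python
import math


def predict_bp_alkene(w2):
    """Predicted boiling point from the reported alkene model."""
    return 63.071 * math.log(w2) + 106.69


def fit_log_model(xs, ys):
    """Least-squares fit of y = slope * ln(x) + intercept.

    Returns (slope, intercept, r_squared).
    """
    ts = [math.log(x) for x in xs]
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    ss_res = sum((y - (slope * t + intercept)) ** 2 for t, y in zip(ts, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic check, not the paper's data: points generated from the model
# itself should be fitted back to the same coefficients.
w2_values = [6, 13, 33, 60, 174]  # arbitrary illustrative index values
synthetic_bp = [predict_bp_alkene(w) for w in w2_values]
slope, intercept, r2 = fit_log_model(w2_values, synthetic_bp)
print(round(slope, 3), round(intercept, 2), round(r2, 4))
```

With measured boiling points in place of the synthetic ones, a residual is simply the observed value minus `predict_bp_alkene(w2)`.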

Alkadienes
Again, we only present the results of the regression between the boiling points and $W_2$, since the best results are obtained with this index. Similarly, the logarithmic function, referred to as Equation (3), gives the best fit on the training set (see Figure 5). The correlation is very good also in this case since $R^2 = 0.9464$ (the correlation with the Wiener index W is weaker, that is, $R^2 = 0.9059$). After applying Equation (3) to the test set and to all considered alkadienes, the predicted boiling points and the corresponding residuals are shown in Tables 6 and 7, respectively. The boiling points and the predicted boiling points are shown in Figure 6.
As in the case of alkenes, we separately consider all 14 unbranched alkadienes. Again, the correlation is improved, with $R^2 = 0.9600$ (see Figure 7).
For the unbranched alkadienes we compare our method to the method used in References [15][16][17], where the valence Wiener index was applied. The correlation between the boiling points and $W_v$ is weaker, with $R^2 = 0.9242$ (see Figure 8).

Conclusions
In this paper, we have modelled unsaturated hydrocarbons (alkenes and alkadienes) by edge-weighted graphs. In general, the weight of a double edge can be any positive real number; we have decided to use the weight equal to the number of bonds, that is, 2. On the other hand, it seems reasonable to use a weight less than 1 (for example 1/2), since multiple bonds are shorter than single bonds. Therefore, we have performed statistical analysis for both weights on double edges, that is, 2 and 1/2. We were also interested in whether the weighted Wiener index would outperform the usual Wiener index (which does not distinguish between alkanes and alkenes) in the prediction of boiling points, so we have used weight 1 as well. Finally, we investigated the correlation between the valence Wiener index and the boiling points for the unbranched alkadienes. However, the correlation for these molecules is better with the weighted Wiener index $W_2$.
It also turns out that the correlation coefficient in regression analysis between weighted Wiener indices and boiling points is the best for weight 2, and hence only these results are presented. Consequently, logarithmic models for predicting boiling points of alkenes and alkadienes are deduced. It would be interesting to investigate the physical meaning of the obtained correlations. Moreover, a similar approach can also be applied to unsaturated hydrocarbons with multiple bonds and to other physico-chemical properties. Furthermore, an open problem is to upgrade the described model in such a way that it would be possible to distinguish between different stereoisomers.