1. Introduction
Structure–property modeling uses molecular descriptors [1] to build regression models for the physicochemical, biological, or thermodynamic properties of chemical compounds. Degree-based graphical indices are a class of graph-theoretic molecular descriptors. They have become popular because they correlate efficiently with the physicochemical properties of benzenoid hydrocarbons (BHs). In 1975, Randić introduced the connectivity index, commonly referred to as the Randić index (cf. [2]). Over the years, this index has become the predominant molecular descriptor in Quantitative Structure–Property Relationship (QSPR) and Quantitative Structure–Activity Relationship (QSAR) studies (cf. [2]). Its mathematical properties have been extensively examined, as succinctly outlined in two recent monographs [2,3]. Moreover, various modifications and alternative formulations of this index have been proposed in the literature (cf. [4,5]). In the present paper, we also explore a closely related variant of the connectivity index, the sum-connectivity index [6]. For recent progress on the structure–property modeling of the physicochemical properties of nanostructures and biomolecular networks, we refer to [7,8,9,10].
To test the quality of a class of molecular graphical descriptors, it is customary to conduct comparative testing on suitably chosen test molecules and chemical properties. Gutman and Tošović [11] tested the quality of degree-dependent graphical descriptors for correlating the physicochemical properties of isomeric octanes (representatives of alkanes). Malik et al. [12] extended this study of degree-based molecular indices from octane isomers to BHs. Hayat et al. [13] (resp. Hayat et al. [14]) further extended the work from physicochemical properties to the quantum-theoretical (resp. thermodynamic) properties of BHs.
In their study, Gutman and Tošović [11] selected isomeric octanes as test molecules, whereas other studies [12,14] opted for the lower 20–30 BHs. Moreover, Gutman and Tošović [11] and Malik et al. [12] selected the normal boiling point ($BP$) and the standard enthalpy of formation ($\Delta H_f$) to represent physicochemical characteristics. $BP$ reflects van der Waals and other intermolecular interactions, whereas $\Delta H_f$ reflects the thermal characteristics of a compound. On the other hand, Hayat et al. [13] selected the total $\pi$-electronic energy ($E_\pi$) to represent quantum-theoretical characteristics. Hayat et al. [14] selected the entropy and heat capacity to represent thermodynamic properties.
All of the aforementioned quality tests revealed the strong potential of both the general product-connectivity index $R_\alpha$ and the general sum-connectivity index $\chi_\alpha$ to correlate efficiently with the physicochemical, thermodynamic, and quantum-theoretical characteristics of BHs. For instance, Malik et al. [12] showed that indices from these families, at particular values of $\alpha$, are the top two among all degree-based descriptors in correlating the physicochemical characteristics of BHs. Similarly, Hayat et al. [13] showed that two such indices are the best descriptors for predicting the $E_\pi$ of BHs, whereas Hayat et al. [14] showed that two such indices are the best for correlating the thermodynamic properties of BHs. However, a limitation of these studies is that their comparative testing considers $R_\alpha$ and $\chi_\alpha$ for only a few fixed values of $\alpha$. Since both $R_\alpha$ and $\chi_\alpha$ show strong potential for correlating various properties of BHs, it is natural to study these indices for general real $\alpha$. Some other nonlinear function of the end-vertex degrees, for instance one involving other powers of the degrees, might work even better. However, the present study is restricted to investigating the predictive potential of $R_\alpha$ and $\chi_\alpha$ only.
In summary, existing comparative studies considered $R_\alpha$ and $\chi_\alpha$ for a few fixed values of $\alpha$. They showed that, for some of these values, both indices correlate well with the physicochemical properties and with the total $\pi$-electron energy ($E_\pi$) of BHs. For instance, Malik et al. [12] showed that two of these indices are the best degree-based predictors of the physicochemical properties of BHs. Moreover, Hayat et al. [13] showed that one of them correlates well with the $E_\pi$ of BHs. The only limitation of these studies is that they considered $R_\alpha$ and $\chi_\alpha$ for these fixed values of $\alpha$ only. If $R_\alpha$ and $\chi_\alpha$ are good predictors for these fixed values, they might be even better predictors for other, general values of $\alpha$.
In this paper, we determine the values of $\alpha$ for which $R_\alpha$ and $\chi_\alpha$ have strong predictive potential for the physicochemical properties of BHs. We also carry out multiple correlation and regression analyses to find, for each of $R_\alpha$ and $\chi_\alpha$, the value of $\alpha$ that yields the strongest multiple correlation with both properties simultaneously. Following Gutman and Tošović [11], the physicochemical properties $BP$ and $\Delta H_f$ were selected as the test properties of BHs. The 22 lower BHs were selected as test molecules because experimental values of $BP$ and $\Delta H_f$ are publicly available for them. A computational method was used to calculate $R_\alpha$ and $\chi_\alpha$ for these 22 BHs. A detailed statistical analysis was then conducted to find suitable values of $\alpha$ for which both $R_\alpha$ and $\chi_\alpha$ have strong predictive potential.
2. Mathematical Preliminaries
For a chemical graph $G$, a degree-based graphical index $TI$ takes the general form $TI(G) = \sum_{uv \in E(G)} f(d_u, d_v)$. Here $f$ is a symmetric map and $d_u$ is the degree of the vertex $u \in V(G)$. The product-connectivity index of $G$, proposed by Randić [15] in 1975, is one of the earliest degree-based graphical indices; it was later renamed the Randić index. It corresponds to the choice $f(d_u, d_v) = 1/\sqrt{d_u d_v}$ in $TI(G)$. Thus, the product-connectivity descriptor $R(G)$ is defined as:

$$R(G) = \sum_{uv \in E(G)} \frac{1}{\sqrt{d_u d_v}}. \tag{1}$$
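The general form $TI(G)$ and the Randić index can be made concrete with a short sketch. The Python below is an illustration only and does not reproduce the paper's code (the paper's computations used Octave). The helper names `vertex_degrees`, `degree_based_index`, and `randic_index` are hypothetical, and graphs are assumed to be simple and given as lists of edges.

```python
import math
from collections import Counter

def vertex_degrees(edges):
    """Degree of every vertex of a simple graph given as a list of edges (u, v)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_based_index(edges, f):
    """TI(G) = sum over edges uv of f(d_u, d_v), for a symmetric function f."""
    edges = list(edges)
    deg = vertex_degrees(edges)
    return sum(f(deg[u], deg[v]) for u, v in edges)

def randic_index(edges):
    """Product-connectivity (Randic) index: f(d_u, d_v) = 1 / sqrt(d_u * d_v)."""
    return degree_based_index(edges, lambda du, dv: 1.0 / math.sqrt(du * dv))

# Benzene as a 6-cycle: six edges of type (2, 2), so R = 6 * (1/2) = 3.
benzene = [(i, (i + 1) % 6) for i in range(6)]
# Path on four vertices: edge types (1, 2), (2, 2), (2, 1), so R = sqrt(2) + 1/2.
path4 = [(0, 1), (1, 2), (2, 3)]
print(randic_index(benzene), randic_index(path4))
```

Passing `f` as a function keeps a single summation routine for every degree-based index. Only the edge weight changes from one index to another.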
The diversity of its applications in cheminformatics makes the Randić index one of the most-studied graphical descriptors. For instance, its mathematical and chemical properties were extensively examined in [2,16,17,18,19].
Introduced by Zhou and Trinajstić [6], the sum-connectivity index is another degree-based molecular graphical descriptor. For a graph $G$, it takes $f(d_u, d_v) = 1/\sqrt{d_u + d_v}$ in $TI(G)$. Therefore, the sum-connectivity index $\chi(G)$ of $G$ is defined as:

$$\chi(G) = \sum_{uv \in E(G)} \frac{1}{\sqrt{d_u + d_v}}. \tag{2}$$

The reader is referred to [12,13,20,21] for further studies on both the applied and the mathematical aspects of the sum-connectivity index.
The successful application of the product-connectivity and sum-connectivity indices motivated researchers to consider variants of these descriptors. Perhaps the most well-studied variants are the general product- and sum-connectivity indices. For $\alpha \in \mathbb{R}$, if $f(d_u, d_v) = (d_u d_v)^\alpha$ (resp. $f(d_u, d_v) = (d_u + d_v)^\alpha$), the index is called the general product-connectivity (resp. general sum-connectivity) index. The general product-connectivity index was put forward by Bollobás and Erdős [4] in 1998 as a generalization of the classical Randić index:

$$R_\alpha(G) = \sum_{uv \in E(G)} (d_u d_v)^\alpha, \tag{3}$$

where $\alpha \in \mathbb{R}$. Numerous contributions on the general product-connectivity index have been published in the chemical and mathematical literature; see, for example, [2,22,23,24,25].
Similarly, Zhou and Trinajstić [26] proposed the general sum-connectivity index in 2010:

$$\chi_\alpha(G) = \sum_{uv \in E(G)} (d_u + d_v)^\alpha, \tag{4}$$

where $\alpha \in \mathbb{R}$. A detailed mathematical treatment is reported in [27,28,29,30]. Applications of $\chi_\alpha$ are reported by Gutman and Tošović [11] and Hayat et al. [14]. Obviously, $R_{-1/2}(G) = R(G)$ and $\chi_{-1/2}(G) = \chi(G)$.
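A minimal sketch of (3) and (4) follows, assuming graphs are given as lists of edges. The function names are hypothetical. The assertions check the identities $R_{-1/2} = R$ and $\chi_{-1/2} = \chi$ on a small test graph, a path on four vertices, which is not one of the BHs studied here.

```python
import math
from collections import Counter

def _degrees(edges):
    """Vertex degrees of a simple graph given as a list of edges."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def general_product_connectivity(edges, alpha):
    """R_alpha(G) = sum over edges uv of (d_u * d_v) ** alpha, Eq. (3)."""
    deg = _degrees(edges)
    return sum((deg[u] * deg[v]) ** alpha for u, v in edges)

def general_sum_connectivity(edges, alpha):
    """chi_alpha(G) = sum over edges uv of (d_u + d_v) ** alpha, Eq. (4)."""
    deg = _degrees(edges)
    return sum((deg[u] + deg[v]) ** alpha for u, v in edges)

# Small test graph: a path on four vertices (edges must be a list; it is read twice).
path4 = [(0, 1), (1, 2), (2, 3)]
degree_pairs = [(1, 2), (2, 2), (2, 1)]
randic = sum(1 / math.sqrt(a * b) for a, b in degree_pairs)
sum_connectivity = sum(1 / math.sqrt(a + b) for a, b in degree_pairs)
# alpha = -1/2 recovers the classical indices R and chi.
assert math.isclose(general_product_connectivity(path4, -0.5), randic)
assert math.isclose(general_sum_connectivity(path4, -0.5), sum_connectivity)
```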
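In statistics, the correlation coefficient between two random variables $X$ and $Y$ with finite, nonzero variances is defined as $\rho = \operatorname{cov}(X, Y)/(\sigma_X \sigma_Y)$. Here $\operatorname{cov}$ is the covariance, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively. The correlation coefficient measures both the direction and the strength of the linear relationship between a predictor $X$ and a response variable $Y$. For a series of $k$ measurements of these variables, denoted by $x_1, \ldots, x_k$ and $y_1, \ldots, y_k$, the value $\rho$ is estimated by

$$r = \frac{\sum_{i=1}^{k} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{k} (x_i - \bar{x})^2 \sum_{i=1}^{k} (y_i - \bar{y})^2}},$$

where $\bar{x} = \frac{1}{k}\sum_{i=1}^{k} x_i$ and $\bar{y} = \frac{1}{k}\sum_{i=1}^{k} y_i$. Values of $|r|$ closer to 1 indicate a stronger linear relationship between $X$ and $Y$.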
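The estimator $r$ translates directly into code. A minimal, dependency-free Python sketch follows; the function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired measurements (x_i, y_i), i = 1..k."""
    k = len(xs)
    if k != len(ys) or k < 2:
        raise ValueError("need two sequences of equal length k >= 2")
    x_bar = sum(xs) / k
    y_bar = sum(ys) / k
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0: a perfect increasing linear relationship
```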
The correlation coefficient is closely linked to the linear regression of $Y$ on $X$. This regression assumes the line $Y = a + bX + \varepsilon$, where $\varepsilon$ represents random errors and $a$ and $b$ are coefficients to be estimated. The ordinary least squares method is typically employed, and the closed-form estimators $\hat{a}$ and $\hat{b}$ of $a$ and $b$ are widely known: $\hat{b} = \sum_{i=1}^{k}(x_i - \bar{x})(y_i - \bar{y}) / \sum_{i=1}^{k}(x_i - \bar{x})^2$ and $\hat{a} = \bar{y} - \hat{b}\bar{x}$. In particular, for this simple linear regression model, $r = \hat{b}\, s_X / s_Y$, where $s_X^2$ and $s_Y^2$ are the unbiased estimators of $\sigma_X^2$ and $\sigma_Y^2$, respectively. Hence $r$ has the same sign as the slope $\hat{b}$ of the regression line and is proportional to it.
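The relation $r = \hat{b}\, s_X / s_Y$ can be checked numerically. This sketch uses the closed-form least-squares estimators stated above. The toy data are arbitrary illustrative numbers, not measurements from Table 1, and the function names are hypothetical.

```python
import math

def ols_fit(xs, ys):
    """Closed-form least-squares estimates (a_hat, b_hat) for Y = a + b X + error."""
    k = len(xs)
    x_bar, y_bar = sum(xs) / k, sum(ys) / k
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

def sample_sd(values):
    """Square root of the unbiased variance estimator (divisor k - 1)."""
    k = len(values)
    m = sum(values) / k
    return math.sqrt(sum((v - m) ** 2 for v in values) / (k - 1))

def pearson_r(xs, ys):
    """Sample correlation coefficient r."""
    k = len(xs)
    x_bar, y_bar = sum(xs) / k, sum(ys) / k
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Arbitrary toy data (not from the paper).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a_hat, b_hat = ols_fit(xs, ys)
r = pearson_r(xs, ys)
# The correlation equals the slope rescaled by the ratio of sample standard deviations.
assert math.isclose(r, b_hat * sample_sd(xs) / sample_sd(ys))
print(a_hat, b_hat, r)
```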
The standard error of fit and the correlation coefficient are both key goodness-of-fit measures in regression analysis. The standard error of fit is defined as

$$s = \sqrt{\frac{1}{k-2}\sum_{i=1}^{k} (y_i - \hat{y}_i)^2},$$

where $\hat{y}_i = \hat{a} + \hat{b} x_i$ is the value predicted by the regression line. It quantifies how much the observed values deviate from the values predicted by the model. Both measures can be computed with standard mathematical or statistical software.
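A minimal sketch of the standard error of fit for a simple least-squares line follows. It assumes the conventional divisor $k - 2$, which accounts for the two estimated coefficients. The function name is hypothetical.

```python
import math

def standard_error_of_fit(xs, ys):
    """s = sqrt(sum (y_i - y_hat_i)^2 / (k - 2)) for the least-squares line y = a + b x.

    The divisor k - 2 (two fitted coefficients) is the usual convention, assumed here.
    """
    k = len(xs)
    if k < 3:
        raise ValueError("need at least three points")
    x_bar, y_bar = sum(xs) / k, sum(ys) / k
    b_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a_hat = y_bar - b_hat * x_bar
    residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e * e for e in residuals) / (k - 2))

# Fitted line y = 0.1 + 0.6 x; residuals -0.1, 0.3, -0.3, 0.1, so s = sqrt(0.2 / 2).
print(standard_error_of_fit([0, 1, 2, 3], [0, 1, 1, 2]))
```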
The linear regression model can be extended to include multiple predictors, e.g., $Y = a + b_1 X_1 + b_2 X_2 + \varepsilon$. For two predictors $X_1$ and $X_2$, the multiple correlation between these predictors and a single response variable $Y$ is defined as

$$R = \sqrt{\frac{r_{YX_1}^2 + r_{YX_2}^2 - 2\, r_{YX_1} r_{YX_2} r_{X_1X_2}}{1 - r_{X_1X_2}^2}},$$

where $r_{UV}$ denotes the correlation coefficient between $U$ and $V$. In the context of multiple linear regression, the quantity $R^2$ is usually referred to as the coefficient of determination. It is interpreted as the proportion of the variability in the response variable $Y$ that is accounted for by the predictors $X_1$ and $X_2$. The value $R$ is thus the correlation between the observed values of $Y$ and the values predicted by the multiple linear regression model involving $X_1$ and $X_2$.
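The two-predictor formula for $R$ can be cross-checked against an explicit least-squares fit, because $R^2$ equals the coefficient of determination of that fit. In the sketch below, the function names are hypothetical and the data are arbitrary toy numbers.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r."""
    k = len(xs)
    x_bar, y_bar = sum(xs) / k, sum(ys) / k
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def multiple_correlation(y, x1, x2):
    """Multiple correlation R of y with predictors x1, x2 via the pairwise formula."""
    r1, r2, r12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    r_sq = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(max(r_sq, 0.0))  # guard against a tiny negative rounding error

def r_squared_least_squares(y, x1, x2):
    """Coefficient of determination from an explicit OLS fit y = a + b1*x1 + b2*x2."""
    k = len(y)
    my, m1, m2 = sum(y) / k, sum(x1) / k, sum(x2) / k
    # Centre all variables so that the intercept drops out.
    cy = [v - my for v in y]
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    # Solve the 2x2 normal equations for (b1, b2) by Cramer's rule.
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    sse = sum((c - b1 * a - b2 * b) ** 2 for c, a, b in zip(cy, c1, c2))
    sst = sum(c * c for c in cy)
    return 1 - sse / sst

# Arbitrary toy data (not from the paper).
y = [1, 2, 3, 5, 4]
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
print(multiple_correlation(y, x1, x2) ** 2, r_squared_least_squares(y, x1, x2))
```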
3. Materials and Methods
Every benzenoid hydrocarbon can be represented by a benzenoid system. A benzenoid system is defined as a finite, connected plane graph without cut vertices, in which every interior face is bounded by a regular hexagon with sides of unit length.
The following definitions, as presented in [31], are used. Let $B$ be a benzenoid system with $v$ vertices and $p$ hexagons. For any path $P = w_0 w_1 \cdots w_\ell$ of length $\ell$ in $B$, the associated vertex degree sequence is $(d_{w_0}, d_{w_1}, \ldots, d_{w_\ell})$. A fjord, cove, bay, and fissure are paths with degree sequences (2, 3, 3, 3, 3, 2), (2, 3, 3, 3, 2), (2, 3, 3, 2), and (2, 3, 2), respectively. These paths are traversed along the perimeter of $B$, as depicted in Figure 1. Fjords, coves, bays, and fissures are all considered different types of inlets. The number of inlets, $k$, is defined as the total number of fjords, coves, bays, and fissures.
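Under our reading of these definitions, an inlet is a maximal run of $r$ consecutive degree-3 vertices on the perimeter, flanked by degree-2 vertices. Runs with $r = 1, 2, 3, 4$ correspond to a fissure, bay, cove, and fjord, respectively, so $k$ can be counted from the cyclic perimeter degree sequence. The sketch below is a hypothetical helper, not code from [31]. It assumes that the perimeter degrees are given in traversal order and that at least one perimeter vertex has degree 2.

```python
# Inlet type by the length r of a maximal run of degree-3 perimeter vertices:
# the path (2, 3, ..., 3, 2) with r threes is a fissure, bay, cove or fjord.
RUN_TO_INLET = {1: "fissure", 2: "bay", 3: "cove", 4: "fjord"}

def inlets_from_perimeter(perimeter_degrees):
    """Count fissures, bays, coves and fjords from the cyclic perimeter degree sequence.

    Assumes every perimeter vertex has degree 2 or 3 (true for benzenoid systems)
    and that at least one has degree 2. Runs of more than four degree-3 vertices
    are not covered by the definitions above and are ignored here.
    """
    seq = list(perimeter_degrees)
    start = seq.index(2)              # ValueError if no degree-2 vertex exists
    seq = seq[start:] + seq[:start]   # rotate so the scan starts at a degree-2 vertex
    counts = {name: 0 for name in RUN_TO_INLET.values()}
    run = 0
    for d in seq + [2]:               # trailing 2 closes a run that wraps around
        if d == 3:
            run += 1
        else:
            if run in RUN_TO_INLET:
                counts[RUN_TO_INLET[run]] += 1
            run = 0
    return counts

# Phenanthrene, perimeter walked C1, C2, C3, C4, C4a, C4b, C5, ..., C8, C8a, C9, C10, C10a.
counts = inlets_from_perimeter([2, 2, 2, 2, 3, 3, 2, 2, 2, 2, 3, 2, 2, 3])
print(counts, "k =", sum(counts.values()))
```

For phenanthrene, this reports two fissures and one bay, so $k = 3$. This agrees with the count used for phenanthrene in Section 4.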
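Suppose a benzenoid system $B$ has $p$ hexagons, $k$ inlets, and $v$ vertices. Let $m_{ij}$ ($i \le j$) denote the number of edges $ab$ of $B$ with $d_a = i$ and $d_b = j$, where $d_a$ and $d_b$ are the degrees of the end vertices $a$ and $b$ of the edge. Since every vertex of $B$ has degree 2 or 3, only $m_{22}$, $m_{23}$, and $m_{33}$ can be nonzero. By Lemma 1 in [31], we have

$$m_{22} = v - 2p - k + 2, \qquad m_{23} = 2k, \qquad m_{33} = 3p - k - 3. \tag{7}$$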
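Lemma 1 can be checked against a direct count on phenanthrene ($v = 14$, $p = 3$, $k = 3$). The sketch below encodes phenanthrene with conventional atom locants (1, 2, ..., 4a, 4b, ..., 10a). This edge list and the helper names are ours, introduced only for the check.

```python
from collections import Counter

def edge_partition(v, p, k):
    """Edge counts {(2,2): m22, (2,3): m23, (3,3): m33} from Lemma 1 in [31], Eq. (7)."""
    return {(2, 2): v - 2 * p - k + 2, (2, 3): 2 * k, (3, 3): 3 * p - k - 3}

def edge_partition_from_graph(edges):
    """Direct count of edges by the (sorted) degree pair of their end vertices."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    counts = Counter(tuple(sorted((deg[a], deg[b]))) for a, b in edges)
    return dict(counts)

# Phenanthrene: three fused rings, 14 vertices, 16 edges (hypothetical encoding).
phenanthrene = [
    ("1", "2"), ("2", "3"), ("3", "4"), ("4", "4a"), ("4a", "10a"), ("10a", "1"),
    ("4a", "4b"), ("4b", "8a"), ("8a", "9"), ("9", "10"), ("10", "10a"),
    ("4b", "5"), ("5", "6"), ("6", "7"), ("7", "8"), ("8", "8a"),
]
print(edge_partition(14, 3, 3))
print(edge_partition_from_graph(phenanthrene))
```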
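By (3) and (7), the general product-connectivity index of the benzenoid system $B$ is

$$R_\alpha(B) = 4^\alpha (v - 2p - k + 2) + 2k \cdot 6^\alpha + 9^\alpha (3p - k - 3). \tag{8}$$

By (4) and (7), the general sum-connectivity index of $B$ is

$$\chi_\alpha(B) = 4^\alpha (v - 2p - k + 2) + 2k \cdot 5^\alpha + 6^\alpha (3p - k - 3). \tag{9}$$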
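Formulas (8) and (9) need only the triple $(v, p, k)$. The minimal Python sketch below uses hypothetical function names and is not the authors' Octave code. As a usage example, it evaluates both indices of phenanthrene ($v = 14$, $p = 3$, $k = 3$) at $\alpha = -1/2$.

```python
import math

def edge_counts(v, p, k):
    """(m22, m23, m33) of a benzenoid system, Eq. (7)."""
    return v - 2 * p - k + 2, 2 * k, 3 * p - k - 3

def r_alpha_benzenoid(v, p, k, alpha):
    """General product-connectivity index R_alpha(B), Eq. (8)."""
    m22, m23, m33 = edge_counts(v, p, k)
    return m22 * 4 ** alpha + m23 * 6 ** alpha + m33 * 9 ** alpha

def chi_alpha_benzenoid(v, p, k, alpha):
    """General sum-connectivity index chi_alpha(B), Eq. (9)."""
    m22, m23, m33 = edge_counts(v, p, k)
    return m22 * 4 ** alpha + m23 * 5 ** alpha + m33 * 6 ** alpha

# Phenanthrene: alpha = -1/2 gives the Randic index 9/2 + sqrt(6).
print(r_alpha_benzenoid(14, 3, 3, -0.5), 4.5 + math.sqrt(6))
print(chi_alpha_benzenoid(14, 3, 3, -0.5))
```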
We employ (8) and (9) to compute $R_\alpha$ and $\chi_\alpha$ for the 22 lower BHs given in Table 1. Table 1 lists the molecular structure, normal boiling point ($BP$), and standard enthalpy of formation ($\Delta H_f$) of these polycyclic aromatic hydrocarbons (PAHs). Table 2 presents the general product-connectivity index $R_\alpha$ and the general sum-connectivity index $\chi_\alpha$ of the 22 lower BHs.
4. Results and Discussion
As noted in Section 1, the general product-connectivity and general sum-connectivity indices, for suitable values of $\alpha$, predict the boiling point and the enthalpy of formation of the lower BHs with high accuracy.
First, we used the method described in Section 3 to obtain exact analytical expressions for $R_\alpha$ and $\chi_\alpha$ of the 22 lower BHs listed in Table 1. In particular, we used the expressions for $R_\alpha$ and $\chi_\alpha$ in (8) and (9), respectively, to compute their exact values. Computing the $R_\alpha$ and $\chi_\alpha$ values of a given hexagonal system requires only the number of vertices $v$, the number of inlets $k$, and the number of hexagons $p$. The next example illustrates the methodology of Section 3 for computing the general sum- and product-connectivity indices of a given BH graph.
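Example 1. Consider the graph of phenanthrene, i.e., $P$ from Table 1. Then $P$ comprises two fissures and one bay (three inlets in total), three hexagons, and 14 vertices. Thus, $v = 14$, $k = 3$, and $p = 3$. Using these values in (8) and (9), we obtain

$$R_\alpha(P) = 7 \cdot 4^\alpha + 6 \cdot 6^\alpha + 3 \cdot 9^\alpha, \qquad \chi_\alpha(P) = 7 \cdot 4^\alpha + 6 \cdot 5^\alpha + 3 \cdot 6^\alpha.$$

Applying this method to all the graphs in Table 1 yields the data in Table 2.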
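As a worked instance of Example 1, setting $\alpha = -1/2$ recovers the classical Randić and sum-connectivity indices of phenanthrene, because $R_{-1/2} = R$ and $\chi_{-1/2} = \chi$. Setting $\alpha = 0$ simply counts the edges:

$$R_{-1/2}(P) = \frac{7}{2} + \frac{6}{\sqrt{6}} + \frac{3}{3} = \frac{9}{2} + \sqrt{6} \approx 6.9495, \qquad \chi_{-1/2}(P) = \frac{7}{2} + \frac{6}{\sqrt{5}} + \frac{3}{\sqrt{6}} \approx 7.4080, \qquad R_0(P) = \chi_0(P) = 7 + 6 + 3 = 16 = |E(P)|.$$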
From the data in Table 2, we generated four curves, shown in Figure 2, Figure 3, Figure 4 and Figure 5. For the 22 lower BHs, each figure plots, as solid lines distinguished by color, the correlation coefficient between a physicochemical property and the indices $R_\alpha$ and $\chi_\alpha$ as functions of $\alpha$. The property is $BP$ in Figure 2 and Figure 3, and $\Delta H_f$ in Figure 4 and Figure 5.
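A correlation curve of this kind can be produced by evaluating an index with (8) or (9) on a grid of $\alpha$ values and correlating it with a property. In the sketch below, the names are hypothetical and the property values are SYNTHETIC: they are built to be exactly linear in $R_1$, so the curve must reach $r = 1$ at $\alpha = 1$. Real $BP$ or $\Delta H_f$ values from Table 1 would be substituted in practice. The test molecules are benzene, naphthalene, phenanthrene, and anthracene, given as $(v, p, k)$.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r."""
    k = len(xs)
    x_bar, y_bar = sum(xs) / k, sum(ys) / k
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_alpha(v, p, k, alpha):
    """General product-connectivity index of a benzenoid system, Eq. (8)."""
    return (v - 2 * p - k + 2) * 4 ** alpha + 2 * k * 6 ** alpha + (3 * p - k - 3) * 9 ** alpha

def chi_alpha(v, p, k, alpha):
    """General sum-connectivity index of a benzenoid system, Eq. (9)."""
    return (v - 2 * p - k + 2) * 4 ** alpha + 2 * k * 5 ** alpha + (3 * p - k - 3) * 6 ** alpha

def correlation_curve(molecules, prop, index, alphas):
    """Pairs (alpha, r) with r = corr(index(B, alpha), property) over the test set.

    molecules: (v, p, k) triples; prop: property values in the same order.
    """
    curve = []
    for alpha in alphas:
        xs = [index(v, p, k, alpha) for v, p, k in molecules]
        curve.append((alpha, pearson_r(xs, prop)))
    return curve

molecules = [(6, 1, 0), (10, 2, 2), (14, 3, 3), (14, 3, 4)]
synthetic = [2 * r_alpha(v, p, k, 1) + 5 for v, p, k in molecules]  # not real data
alphas = [-1.0, -0.5, 0.5, 1.0]
curve = correlation_curve(molecules, synthetic, r_alpha, alphas)
for alpha, r in curve:
    print(f"alpha = {alpha:5.2f}   r = {r:.6f}")
```

The same function works for $\chi_\alpha$ by passing `chi_alpha` as `index`. A finer grid of `alphas` gives smoother curves.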
Comparing the two general indices, $R_\alpha$ is the better measure of the boiling point $BP$ of BHs for the values of $\alpha$ indicated in Figure 3, while for all other $\alpha$ the general sum-connectivity index $\chi_\alpha$ is better. Likewise, as a measure of the enthalpy of formation $\Delta H_f$ of BHs, $R_\alpha$ is better for the values of $\alpha$ indicated in Figure 5, while for all other $\alpha$, $\chi_\alpha$ is better.
For each of the four index–property pairs ($R_\alpha$ or $\chi_\alpha$ against $BP$ or $\Delta H_f$), there is an interval of $\alpha$ on which the correlation is strong, as shown in Figure 6. For example, for one of these pairs, the correlation coefficient exceeds 0.996558 throughout such an interval.
For the 22 lower BHs, Figure 3 and Figure 5 identify the member of the $R_\alpha$ family that is most linearly correlated with $BP$ and the one most linearly correlated with $\Delta H_f$. Likewise, they identify the corresponding two members of the $\chi_\alpha$ family. The linear correlations (with 95% confidence intervals) between the physicochemical properties ($BP$ and $\Delta H_f$) and these four optimal indices are given in (10)–(13). There, $s$ and $r$ denote the standard error of fit and the correlation coefficient, respectively.
Figure 7 shows scatter plots of the boiling point $BP$ against its two optimal indices, and of the enthalpy of formation $\Delta H_f$ against its two optimal indices, for the 22 lower benzenoids.
It follows from (10)–(13) that the optimal general product-connectivity indices are the best measures among all the examined indices. This holds for both the boiling point and the enthalpy of formation. All the Octave codes have been made publicly accessible; see the Supplementary Information at the end of the paper.
Recall that Gutman and Tošović [11] considered $BP$ and $\Delta H_f$ as representatives of physicochemical properties and used isomeric octanes as test molecules. We applied our analysis to the 18 isomeric octanes. The preliminary results showed that the values of $\alpha$ yielding good estimates of $BP$ and $\Delta H_f$ for the octanes differ from those for the 22 lower BHs. Thus, the current study and the corresponding intervals and values of $\alpha$ are limited to BHs. However, we expect similar behavior for other BHs beyond the 22 lower BHs considered in this study.
5. Simultaneous Predictive Potential of $R_\alpha$ and $\chi_\alpha$
In this section, we seek the values of $\alpha$ for which the correlation of $R_\alpha$ or $\chi_\alpha$ with both properties $BP$ and $\Delta H_f$ simultaneously is the strongest. To this end, we consider the multiple correlation coefficient of $R_\alpha$ (or $\chi_\alpha$) with $BP$ and $\Delta H_f$, treating the two properties as independent variables. Let $y = R_\alpha$ (or $y = \chi_\alpha$) be the dependent variable, and let $x_1 = BP$ and $x_2 = \Delta H_f$ be the two independent variables. Multiple correlation measures the relationship between one dependent variable and more than one independent variable. There are two representatives of physicochemical properties, $BP$ and $\Delta H_f$. We therefore employ the multiple correlation between one graphical descriptor and the pair $(BP, \Delta H_f)$. This captures the predictive potential of a descriptor with respect to both properties simultaneously, rather than its correlation strength with each property individually.
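The quantity studied in this section can be sketched by combining Eq. (8) with the two-predictor formula for the multiple correlation $R$. This is a hypothetical Python illustration, not the authors' R code, and `multiple_correlation_of_index` is a name introduced here. The property values in the usage are synthetic, not the $BP$ and $\Delta H_f$ data of Table 1. The synthetic $x_1$ is exactly linear in $R_1$, so the multiple correlation must equal 1 at $\alpha = 1$.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r."""
    k = len(xs)
    x_bar, y_bar = sum(xs) / k, sum(ys) / k
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_alpha(v, p, k, alpha):
    """General product-connectivity index of a benzenoid system, Eq. (8)."""
    return (v - 2 * p - k + 2) * 4 ** alpha + 2 * k * 6 ** alpha + (3 * p - k - 3) * 9 ** alpha

def multiple_correlation(y, x1, x2):
    """Multiple correlation of y with two predictors, from pairwise correlations."""
    r1, r2, r12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    r_sq = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(max(r_sq, 0.0))

def multiple_correlation_of_index(alpha, molecules, bp, dhf, index=r_alpha):
    """R as a function of alpha: y = index(B, alpha), x1 = BP, x2 = Delta H_f."""
    y = [index(v, p, k, alpha) for v, p, k in molecules]
    return multiple_correlation(y, bp, dhf)

# Synthetic check: benzene, naphthalene, phenanthrene, anthracene as (v, p, k).
molecules = [(6, 1, 0), (10, 2, 2), (14, 3, 3), (14, 3, 4)]
x1 = [2 * r_alpha(v, p, k, 1) + 5 for v, p, k in molecules]  # synthetic "BP"
x2 = [10.0, 30.0, 20.0, 40.0]                                  # synthetic "Delta H_f"
print(multiple_correlation_of_index(1.0, molecules, x1, x2))
```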
Since the response variable $y = R_\alpha$ depends on the parameter $\alpha$, the multiple correlation $R$ defined above also depends on $\alpha$, i.e., $R = R(\alpha)$ (not to be confused with the index $R_\alpha$). A preliminary plot of $R(\alpha)$ over the range of $\alpha$ considered reveals a unimodal shape with a maximum in this range. A built-in optimizer of the R programming language was then used to find the value $\alpha^*$ that maximizes the multiple correlation $R(\alpha)$. Figure 8 presents the corresponding plot. Figure 9 shows the matrix plot with the distribution of each variable and the bivariate relationships between them, using the optimal value $\alpha^*$.
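The text does not name the R routine used for this maximization; R's base function `optimize()` is one built-in option for a one-dimensional search on an interval. As a language-neutral stand-in, the sketch below implements plain golden-section search. This method is appropriate for a unimodal function on an interval, such as the $R(\alpha)$ curve described above. It is an illustration, not the authors' code; `golden_section_max` is a hypothetical name, and the toy objective is arbitrary.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-8, max_iter=200):
    """Maximize a unimodal function f on [lo, hi] by golden-section search.

    Returns (x, f(x)) with x within tol/2 of the maximizer when f is unimodal.
    """
    invphi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if b - a < tol:
            break
        if fc > fd:
            # Maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            # Maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    x = (a + b) / 2
    return x, f(x)

# Toy unimodal objective with its maximum at 0.3 (not an R(alpha) curve from the paper).
x_star, f_star = golden_section_max(lambda a: -(a - 0.3) ** 2, -2.0, 2.0)
print(x_star, f_star)
```

In practice, `f` would be a function that returns the multiple correlation for a given $\alpha$ on the real data set. Each iteration reuses one function value, so only one new evaluation of $R(\alpha)$ is needed per step.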
Next, we study the multiple correlation $R$ between $\chi_\alpha$ and the two physicochemical properties $(BP, \Delta H_f)$. Since the response variable $y = \chi_\alpha$ depends on $\alpha$, $R$ again depends on $\alpha$, i.e., $R = R(\alpha)$. A preliminary plot of $R(\alpha)$ over the range of $\alpha$ considered again reveals a unimodal shape with a maximum in this range. This time, the built-in R optimizer yielded the maximizer $\alpha^*$ and the corresponding maximal multiple correlation $R(\alpha^*)$. Figure 10 presents the corresponding plot. Figure 11 shows the matrix plot with the distribution of each variable and the bivariate relationships between them, using the optimal value $\alpha^*$.