1. Introduction
Structure–property modeling employs molecular descriptors [1] to generate regression models that relate molecular structure to the physicochemical, biological, or thermodynamic properties of chemical compounds. Degree-based graphical indices are a class of graph-theoretic molecular descriptors that have gained popularity for efficiently correlating the physicochemical properties of benzenoid hydrocarbons (BHs). In 1975, Randić introduced the connectivity index, commonly referred to as the Randić index (cf. [2]). Over the years, this index has become the predominant molecular descriptor in Quantitative Structure–Property Relationship (QSPR) and Quantitative Structure–Activity Relationship (QSAR) studies (cf. [2]). Its mathematical properties have been extensively examined, as succinctly outlined in two recent monographs [2,3]. Moreover, various modifications and alternative formulations of this index have been proposed in the literature (cf. [4,5]). In the present paper, we also explore a closely related variant of the connectivity index, the sum-connectivity index [6]. For some recent progress on the structure–property modeling of the physicochemical properties of nanostructures and bio-molecular networks, we refer to [7,8,9,10].
To test the quality of a class of molecular graphical descriptors, it is customary to conduct comparative testing on suitably selected test molecules and chemical properties. Gutman and Tošović [11] tested the quality of degree-based graphical descriptors for correlating the physicochemical properties of isomeric octanes (representatives of alkanes). Malik et al. [12] extended this study of degree-based molecular indices from octane isomers to benzenoid hydrocarbons (BHs). Hayat et al. [13] (resp. Hayat et al. [14]) further extended the work from physicochemical properties to the quantum-theoretical (resp. thermodynamic) properties of BHs.
In their study, Gutman and Tošović [11] selected isomeric octanes as test molecules, whereas other studies [12,13,14] opted for the lower 20–30 BHs. Moreover, Gutman and Tošović [11] and Malik et al. [12] selected the normal boiling point ($\mathrm{BP}$) and the standard enthalpy of formation ($\Delta H_f$) to represent physicochemical characteristics: $\mathrm{BP}$ reflects van der Waals and other intermolecular interactions, whereas $\Delta H_f$ reflects the thermal characteristics of a compound. On the other hand, Hayat et al. [13] selected the total $\pi$-electronic energy ($E_\pi$) to represent quantum-theoretical characteristics, and Hayat et al. [14] selected the entropy and heat capacity to represent thermodynamic properties.
All of the aforementioned quality tests revealed the strong potential of both the general product-connectivity index $R_\alpha$ and the general sum-connectivity index $\chi_\alpha$ to efficiently correlate the physicochemical, thermodynamic, and quantum-theoretical characteristics of benzenoid hydrocarbons. For instance, Malik et al. [12] showed that, among all degree-based descriptors, two particular members of these families are the top two indices for correlating the physicochemical characteristics of BHs. Similarly, Hayat et al. [13] showed that members of these families are the best descriptors for predicting the $E_\pi$ of BHs, whereas Hayat et al. [14] showed that two such indices are the best for correlating the thermodynamic properties of BHs. The drawback of these studies, however, is that their comparative testing considers $R_\alpha$ and $\chi_\alpha$ only for a finite set of fixed values of $\alpha$. Since both $R_\alpha$ and $\chi_\alpha$ show strong potential in correlating various properties of BHs, it is natural to study these indices for general real $\alpha$. Some other nonlinear function $f$ of the end-vertex degrees might work even better; however, the current study is restricted to investigating the estimation potential of $R_\alpha$ and $\chi_\alpha$ only.
In summary, existing comparative studies considered $R_\alpha$ and $\chi_\alpha$ only for a few fixed values of $\alpha$ and showed that, for some of these values, both indices correlate well with the physicochemical properties as well as with the total $\pi$-electron energy ($E_\pi$) of benzenoid hydrocarbons (BHs). For instance, Malik et al. [12] showed that two such indices are the top two degree-based predictors of the physicochemical properties of BHs, and Hayat et al. [13] showed that one of them correlates well with the $E_\pi$ of BHs. The only limitation of these studies is the restriction to fixed values of $\alpha$. If $R_\alpha$ and $\chi_\alpha$ yield good predictors for these fixed values, they might yield even better predictors when general values of $\alpha$ are considered.
In this paper, we determine the value(s) of $\alpha$ for which $R_\alpha$ and $\chi_\alpha$ have strong predictive potential for the physicochemical properties of BHs. Multiple correlation and regression analyses were also conducted to find, for each of $R_\alpha$ and $\chi_\alpha$, the value of $\alpha$ that yields the strongest multiple correlation with both properties simultaneously. Following Gutman and Tošović [11], the physicochemical properties $\mathrm{BP}$ and $\Delta H_f$ were selected as the test properties of BHs. Moreover, the 22 lower BHs were selected as test molecules because experimental values of $\mathrm{BP}$ and $\Delta H_f$ are publicly available for them. A computational method was used to calculate $R_\alpha$ and $\chi_\alpha$ for these 22 BHs, and a detailed statistical analysis was then conducted to find suitable values of $\alpha$ for which both $R_\alpha$ and $\chi_\alpha$ have strong predictive potential.
  2. Mathematical Preliminaries
For a chemical graph $G$, a degree-based graphical index $TI$ takes the general form
$$TI(G) = \sum_{uv \in E(G)} f(d_u, d_v),$$
where $f$ is a symmetric map and $d_u$ is the degree of the vertex $u$. The product-connectivity index of $G$, proposed by Randić [15] in 1975, is one of the earliest degree-based graphical indices; it was later renamed the Randić index. Mathematically, it takes $f(x, y) = 1/\sqrt{xy}$ in the general form above. Thus, the product-connectivity descriptor $R$ is defined as:
$$R(G) = \sum_{uv \in E(G)} \frac{1}{\sqrt{d_u d_v}}.$$
The diversity of its applications in cheminformatics makes the Randić index one of the most-studied graphical structure descriptors. For instance, its mathematical and chemical properties were extensively examined in [2,16,17,18,19].
Introduced by Zhou and Trinajstić [6], the sum-connectivity index is another degree-based molecular graphical descriptor. For a graph $G$, it takes $f(x, y) = 1/\sqrt{x + y}$ in the general form above. Therefore, the sum-connectivity index $\chi$ of $G$ is defined as:
$$\chi(G) = \sum_{uv \in E(G)} \frac{1}{\sqrt{d_u + d_v}}.$$
The reader is referred to [12,13,20,21] for further results on both the applications and the mathematical properties of the sum-connectivity index.
The successful application of the product-connectivity and sum-connectivity indices motivated researchers to consider variants of these descriptors. Perhaps the most well-studied variants are the generalized product- and sum-connectivity indices. For $\alpha \in \mathbb{R}$, if $f(x, y) = (xy)^\alpha$ (resp. $f(x, y) = (x + y)^\alpha$), the index is called the general product-connectivity index $R_\alpha$ (resp. the general sum-connectivity index $\chi_\alpha$). The general product-connectivity index was put forward by Bollobás and Erdős [4] in 1998 as a generalization of the classical Randić index:
$$R_\alpha(G) = \sum_{uv \in E(G)} (d_u d_v)^\alpha, \tag{3}$$
where $\alpha \in \mathbb{R}$. Numerous contributions on the general product-connectivity index have been published in the chemical and mathematical literature; see, for example, [2,22,23,24,25].
Similarly, Zhou and Trinajstić [26] proposed in 2010 the general sum-connectivity index, defined as
$$\chi_\alpha(G) = \sum_{uv \in E(G)} (d_u + d_v)^\alpha, \tag{4}$$
where $\alpha \in \mathbb{R}$. A detailed mathematical treatment is given in [27,28,29,30], and applications of $\chi_\alpha$ are reported by Gutman and Tošović [11] and Hayat et al. [14]. Obviously,
$$R_{-1/2}(G) = R(G) \quad \text{and} \quad \chi_{-1/2}(G) = \chi(G).$$
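To make the definitions $R_\alpha(G) = \sum_{uv} (d_u d_v)^\alpha$ and $\chi_\alpha(G) = \sum_{uv} (d_u + d_v)^\alpha$ concrete, the following minimal Python sketch evaluates both directly from an edge list. The function names and the vertex labelling of the naphthalene skeleton are illustrative assumptions, not part of the original study; setting $\alpha = -1/2$ gives a quick check that $R_{-1/2} = R$ and $\chi_{-1/2} = \chi$.

```python
# Sketch: general product- and sum-connectivity indices from an edge list.

def vertex_degrees(edges):
    """Map each vertex of a simple graph (given as an edge list) to its degree."""
    deg = {}
    for u, w in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[w] = deg.get(w, 0) + 1
    return deg


def general_product_connectivity(edges, alpha):
    """R_alpha(G) = sum over edges uv of (d_u * d_v) ** alpha."""
    deg = vertex_degrees(edges)
    return sum((deg[u] * deg[w]) ** alpha for u, w in edges)


def general_sum_connectivity(edges, alpha):
    """chi_alpha(G) = sum over edges uv of (d_u + d_v) ** alpha."""
    deg = vertex_degrees(edges)
    return sum((deg[u] + deg[w]) ** alpha for u, w in edges)


# Hypothetical labelling of naphthalene: perimeter cycle 0..9, fused bond 4-9.
naphthalene = [(i, (i + 1) % 10) for i in range(10)] + [(4, 9)]

print(round(general_product_connectivity(naphthalene, -0.5), 4))  # 4.9663 (Randic index)
print(round(general_sum_connectivity(naphthalene, -0.5), 4))      # 5.1971 (sum-connectivity index)
```

At $\alpha = 1$ the same functions return the second and first Zagreb indices (57 and 50 for naphthalene), which is a convenient second sanity check.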
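In statistics, the correlation coefficient between two random variables $X$ and $Y$ with finite, nonzero variances is defined as $\rho = \operatorname{cov}(X, Y)/(\sigma_X \sigma_Y)$, where cov is the covariance function, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively. The correlation coefficient measures both the direction and the strength of the linear relationship between a predictor $X$ and a response variable $Y$. For a series of $k$ measurements of these variables, denoted by $x_1, \ldots, x_k$ and $y_1, \ldots, y_k$, the value $\rho$ is estimated by
$$r = \frac{\sum_{i=1}^{k} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{k} (x_i - \bar{x})^2 \, \sum_{i=1}^{k} (y_i - \bar{y})^2}},$$
where $\bar{x} = \frac{1}{k}\sum_{i=1}^{k} x_i$ and $\bar{y} = \frac{1}{k}\sum_{i=1}^{k} y_i$. Values of $|r|$ closer to 1 indicate a stronger linear relationship between $X$ and $Y$.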
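The estimator $r$ can be computed directly from paired samples. Below is a minimal pure-Python sketch; the function name `pearson_r` is our own, and libraries such as NumPy (`numpy.corrcoef`) or SciPy (`scipy.stats.pearsonr`) provide equivalent routines.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r of paired measurements (x_i, y_i)."""
    k = len(xs)
    if k != len(ys) or k < 2:
        raise ValueError("need two samples of equal length k >= 2")
    x_bar = sum(xs) / k
    y_bar = sum(ys) / k
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: exact positive linear relation
print(pearson_r([1, 2, 3], [3, 2, 1]))        # -1.0: exact negative linear relation
```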
The correlation coefficient is closely linked to the linear regression of $Y$ on $X$, which assumes the regression line $Y = a + bX + \varepsilon$, where $\varepsilon$ represents random errors and $a$ and $b$ are coefficients to be estimated. The ordinary least squares method is typically employed, and the closed-form estimators $\hat{a}$ and $\hat{b}$ of $a$ and $b$, respectively, are readily available and widely known. In particular, for this simple linear regression model, $\hat{b} = r\, s_Y / s_X$, where $s_X$ and $s_Y$ are the sample standard deviations of $X$ and $Y$ (the square roots of the unbiased sample variances), while $\hat{a} = \bar{y} - \hat{b}\,\bar{x}$. Evidently, the correlation coefficient is proportional to the slope of the regression line.
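The following sketch, assuming plain Python lists as input, computes the ordinary least squares estimates and confirms numerically that the slope equals $r\,s_Y/s_X$. The function names and the sample numbers are illustrative only.

```python
import math


def ols_fit(xs, ys):
    """OLS estimates (a_hat, b_hat) for the line y = a + b*x + error."""
    k = len(xs)
    x_bar, y_bar = sum(xs) / k, sum(ys) / k
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


def slope_via_r(xs, ys):
    """The same slope written as b_hat = r * s_Y / s_X."""
    k = len(xs)
    x_bar, y_bar = sum(xs) / k, sum(ys) / k
    # Sample standard deviations (square roots of unbiased sample variances).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (k - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (k - 1))
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((k - 1) * s_x * s_y)
    return r * s_y / s_x


xs, ys = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]  # illustrative numbers only
print(ols_fit(xs, ys))
print(slope_via_r(xs, ys))  # agrees with the slope above up to rounding
```

The $k - 1$ divisors cancel in $r\,s_Y/s_X$, so the identity holds whichever variance convention is used.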
The standard error of fit and the correlation coefficient are both key goodness-of-fit measures in regression analysis. The standard error of fit is defined as
$$s = \sqrt{\frac{\sum_{i=1}^{k} (y_i - \hat{y}_i)^2}{k - 2}},$$
where $\hat{y}_i = \hat{a} + \hat{b}\,x_i$ is the value predicted by the regression line. It quantifies how much the observed values deviate from the values predicted by the model. Both measures can be computed with standard mathematical or statistical software.
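A sketch of the standard error of fit for a simple least squares line. It assumes the divisor $k - 2$ (the residual degrees of freedom of a two-parameter line), which is the usual convention but should be checked against the software actually used; the function name is hypothetical.

```python
import math


def standard_error_of_fit(xs, ys):
    """s = sqrt(sum_i (y_i - y_hat_i)^2 / (k - 2)) for a simple OLS line.

    The divisor k - 2 (residual degrees of freedom) is an assumed convention.
    """
    k = len(xs)
    if k < 3:
        raise ValueError("need at least 3 points to leave residual degrees of freedom")
    x_bar, y_bar = sum(xs) / k, sum(ys) / k
    b_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    a_hat = y_bar - b_hat * x_bar
    # Residuals of observed values around the fitted line y_hat_i = a_hat + b_hat * x_i.
    sse = sum((y - (a_hat + b_hat * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (k - 2))


print(standard_error_of_fit([0, 1, 2, 3], [0, 1, 1, 2]))  # about 0.3162, i.e. sqrt(0.1)
```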
The linear regression model can be extended to include multiple predictors, e.g., $Y = a + b_1 X_1 + b_2 X_2 + \varepsilon$. Given two predictors $X_1$ and $X_2$, the multiple correlation between these predictors and a single response variable $Y$ is defined as follows:
$$R = \sqrt{\frac{r_{YX_1}^2 + r_{YX_2}^2 - 2\, r_{YX_1} r_{YX_2} r_{X_1X_2}}{1 - r_{X_1X_2}^2}},$$
where $r_{UV}$ denotes the correlation coefficient between $U$ and $V$. In the context of multiple linear regression, the quantity $R^2$ is usually referred to as the coefficient of determination. It is interpreted as the proportion of the variability in the response variable $Y$ that is accounted for by the predictor variables $X_1$ and $X_2$. The value $R$ thus measures the correlation between the observed values of $Y$ and the values predicted by the multiple linear regression model involving $X_1$ and $X_2$.
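For two predictors, the closed-form multiple correlation $R^2 = (r_{YX_1}^2 + r_{YX_2}^2 - 2 r_{YX_1} r_{YX_2} r_{X_1X_2})/(1 - r_{X_1X_2}^2)$ can be cross-checked against $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$ of an explicit least squares fit. The pure-Python sketch below (hypothetical function names, illustrative data) does both; it assumes the two predictors are not perfectly correlated, since otherwise the denominator vanishes.

```python
import math


def corr(a, b):
    """Sample correlation coefficient of two equally long sequences."""
    k = len(a)
    ma, mb = sum(a) / k, sum(b) / k
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def multiple_correlation(y, x1, x2):
    """R of y on (x1, x2) from pairwise correlations; needs |r_{x1 x2}| < 1."""
    r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_sq)


def r_squared_by_regression(y, x1, x2):
    """Coefficient of determination of the OLS fit y = b0 + b1*x1 + b2*x2."""
    k = len(y)
    m1, m2, my = sum(x1) / k, sum(x2) / k, sum(y) / k
    c1 = [v - m1 for v in x1]  # centred predictors absorb the intercept b0
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 ** 2  # solve the 2x2 normal equations for (b1, b2)
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    sse = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(c1, c2, cy))
    sst = sum(c * c for c in cy)
    return 1 - sse / sst


y = [1, 2, 3, 5, 4, 6]
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
print(multiple_correlation(y, x1, x2) ** 2, r_squared_by_regression(y, x1, x2))  # equal up to rounding
```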
  3. Materials and Methods
Every benzenoid hydrocarbon can be represented by a benzenoid system, defined as a finite, connected plane graph without cut vertices, in which each internal face is bounded by a regular hexagon with sides of unit length.
The following definitions, as presented in [31], are applicable. Let $B$ be a benzenoid system with $v$ vertices and $p$ hexagons. For any path $P = u_0 u_1 \cdots u_\ell$ of length $\ell$ in $B$, the associated vertex degree sequence is defined as $(d_{u_0}, d_{u_1}, \ldots, d_{u_\ell})$. A fjord, cove, bay, and fissure are paths with degree sequences (2, 3, 3, 3, 3, 2), (2, 3, 3, 3, 2), (2, 3, 3, 2), and (2, 3, 2), respectively, traversed along the perimeter of $B$, as depicted in Figure 1. Fjords, coves, bays, and fissures are all different types of inlets, and the number of inlets $k$ is the total number of fjords, coves, bays, and fissures.
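Since inlets are defined purely by degree sequences along the perimeter, $k$ can be counted mechanically: each maximal run of $r$ consecutive degree-3 perimeter vertices, flanked by degree-2 vertices, is a fissure, bay, cove, or fjord for $r = 1, 2, 3, 4$. The sketch below assumes the perimeter is supplied as a cyclic list of vertex degrees (a hypothetical input format); runs longer than four are rejected because they fall outside the four inlet types defined here.

```python
INLET_NAMES = {1: "fissure", 2: "bay", 3: "cove", 4: "fjord"}


def classify_inlets(perimeter_degrees):
    """Count inlets from the cyclic list of vertex degrees along the perimeter.

    A maximal run of r consecutive degree-3 vertices between two degree-2
    vertices has degree sequence (2, 3, ..., 3, 2) and is an inlet of type
    INLET_NAMES[r]. Returns (counts by type, total number of inlets k).
    """
    if 2 not in perimeter_degrees:
        raise ValueError("the perimeter must contain a degree-2 vertex")
    # Rotate to start at a degree-2 vertex so no run wraps around the end.
    start = perimeter_degrees.index(2)
    seq = perimeter_degrees[start:] + perimeter_degrees[:start]
    counts = {name: 0 for name in INLET_NAMES.values()}
    run = 0
    for d in seq + [2]:  # the appended 2 is seq[0] again, closing a final run
        if d == 3:
            run += 1
            continue
        if run:
            if run not in INLET_NAMES:
                raise ValueError(f"run of {run} degree-3 vertices is not one of the four inlet types")
            counts[INLET_NAMES[run]] += 1
        run = 0
    return counts, sum(counts.values())


# Phenanthrene perimeter (positions 1, 2, 3, 4, 4a, 4b, 5, 6, 7, 8, 8a, 9, 10, 10a).
phen = [2, 2, 2, 2, 3, 3, 2, 2, 2, 2, 3, 2, 2, 3]
print(classify_inlets(phen))
# ({'fissure': 2, 'bay': 1, 'cove': 0, 'fjord': 0}, 3)
```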
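Suppose a benzenoid system $B$ has $p$ hexagons, $k$ inlets, and $v$ vertices. Let $m_{ij}$ denote the number of edges $ab$ of $B$ with $d_a = i$ and $d_b = j$ ($i \le j$), where $d_a$ and $d_b$ are the degrees of the end vertices $a$ and $b$ of the edge. By Lemma 1 in [31], we have
$$m_{22} = v - 2p - k + 2, \qquad m_{23} = 2k, \qquad m_{33} = 3p - k - 3. \tag{7}$$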
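The edge partition can be sanity-checked by brute force: build a molecular graph explicitly, count edges by the degrees of their end vertices, and compare with $m_{22} = v - 2p - k + 2$, $m_{23} = 2k$, $m_{33} = 3p - k - 3$. The vertex labelling of phenanthrene below is our own illustrative choice, and the function names are hypothetical.

```python
from collections import Counter


def degree_pair_counts(edges):
    """Count edges by the (sorted) degrees of their two end vertices."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(Counter(tuple(sorted((deg[a], deg[b]))) for a, b in edges))


def partition_from_lemma(v, p, k):
    """Edge counts m_22, m_23, m_33 predicted for a benzenoid system."""
    return {(2, 2): v - 2 * p - k + 2, (2, 3): 2 * k, (3, 3): 3 * p - k - 3}


# Phenanthrene with hypothetical labels 0..13 following the perimeter
# 1, 2, 3, 4, 4a, 4b, 5, 6, 7, 8, 8a, 9, 10, 10a; the chords 4a-10a and
# 4b-8a (labels 4-13 and 5-10) close the three hexagons.
phenanthrene = [(i, (i + 1) % 14) for i in range(14)] + [(4, 13), (5, 10)]

print(degree_pair_counts(phenanthrene) == partition_from_lemma(14, 3, 3))  # True
```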
By (3) and (7), the general product-connectivity index of the benzenoid system $B$ is as follows:
$$R_\alpha(B) = (v - 2p - k + 2)\,4^\alpha + 2k\,6^\alpha + (3p - k - 3)\,9^\alpha. \tag{8}$$
By (4) and (7), the general sum-connectivity index of the benzenoid system $B$ is as follows:
$$\chi_\alpha(B) = (v - 2p - k + 2)\,4^\alpha + 2k\,5^\alpha + (3p - k - 3)\,6^\alpha. \tag{9}$$
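Because the closed forms need only $v$, $p$, and $k$, both indices reduce to one-line functions. A hedged Python sketch (the function names are hypothetical), checked on phenanthrene ($v = 14$, $p = 3$, $k = 3$); at $\alpha = 1$ the two indices coincide with the second and first Zagreb indices, respectively.

```python
def benzenoid_R_alpha(v, p, k, alpha):
    """General product-connectivity index (8) of a benzenoid system with
    v vertices, p hexagons and k inlets."""
    m22, m23, m33 = v - 2 * p - k + 2, 2 * k, 3 * p - k - 3  # edge partition (7)
    return m22 * 4 ** alpha + m23 * 6 ** alpha + m33 * 9 ** alpha


def benzenoid_chi_alpha(v, p, k, alpha):
    """General sum-connectivity index (9) of the same benzenoid system."""
    m22, m23, m33 = v - 2 * p - k + 2, 2 * k, 3 * p - k - 3  # edge partition (7)
    return m22 * 4 ** alpha + m23 * 5 ** alpha + m33 * 6 ** alpha


# Phenanthrene: v = 14, p = 3, k = 3.
print(benzenoid_R_alpha(14, 3, 3, 1))               # 91 (second Zagreb index)
print(benzenoid_chi_alpha(14, 3, 3, 1))             # 76 (first Zagreb index)
print(round(benzenoid_R_alpha(14, 3, 3, -0.5), 4))  # 6.9495 (Randic index)
```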
We employ (8) and (9) to compute $R_\alpha$ and $\chi_\alpha$ for the 22 lower BHs given in Table 1.
Table 1 provides the molecular structure, normal boiling point ($\mathrm{BP}$), and standard enthalpy of formation ($\Delta H_f$) of various polycyclic aromatic hydrocarbons (PAHs). Additionally, Table 2 presents the general product-connectivity index $R_\alpha$ and the general sum-connectivity index $\chi_\alpha$ of the 22 lower BHs.
   4. Results and Discussion
Recall that the general product-connectivity index $R_\alpha$ and the general sum-connectivity index $\chi_\alpha$, for suitable values of $\alpha$, exhibit a high degree of accuracy in predicting the boiling point and enthalpy of formation of the lower benzenoid hydrocarbons (BHs).
First, we employed the method described in Section 3 to evaluate exact analytical expressions for $R_\alpha$ and $\chi_\alpha$ for the 22 lower BHs listed in Table 1. In particular, we used the expressions for $R_\alpha$ and $\chi_\alpha$ in (8) and (9), respectively, to compute their exact values. Note that only the number of vertices $v$, the number of inlets $k$, and the number of hexagons $p$ of a hexagonal system are needed to compute its $R_\alpha$ and $\chi_\alpha$ values. The next example illustrates the methodology of Section 3 for computing the general sum- and product-connectivity indices of a given BH graph.
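Example 1. Consider the graph of phenanthrene, i.e., P from Table 1. P has two fissures and one bay (three inlets in total), three hexagons, and 14 vertices. Thus, $v = 14$, $k = 3$, and $p = 3$. Substituting these values into (8) and (9), we obtain
$$R_\alpha(P) = 7 \cdot 4^\alpha + 6 \cdot 6^\alpha + 3 \cdot 9^\alpha, \qquad \chi_\alpha(P) = 7 \cdot 4^\alpha + 6 \cdot 5^\alpha + 3 \cdot 6^\alpha.$$
Applying this method to all the graphs in Table 1, we generated the data in Table 2.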
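As a quick numerical check of Example 1, setting $\alpha = -1/2$ for phenanthrene, whose edge counts are $m_{22} = 7$, $m_{23} = 6$, and $m_{33} = 3$, recovers its classical Randić and sum-connectivity indices:

```latex
\begin{aligned}
R_{-1/2}(P) &= \frac{7}{\sqrt{4}} + \frac{6}{\sqrt{6}} + \frac{3}{\sqrt{9}}
             = 3.5 + \sqrt{6} + 1 \approx 6.9495,\\
\chi_{-1/2}(P) &= \frac{7}{\sqrt{4}} + \frac{6}{\sqrt{5}} + \frac{3}{\sqrt{6}}
             \approx 3.5 + 2.6833 + 1.2247 \approx 7.4080.
\end{aligned}
```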
From the data in Table 2, we generated four curves, shown in Figure 2, Figure 3, Figure 4 and Figure 5. For the 22 lower BHs, the curves of the correlation coefficient between each physicochemical property ($\mathrm{BP}$ in Figure 2 and Figure 3; $\Delta H_f$ in Figure 4 and Figure 5) and each index ($R_\alpha$ or $\chi_\alpha$), as functions of $\alpha$, are drawn as solid lines distinguished by color.
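The correlation curves can be generated by sweeping $\alpha$ over a grid and correlating each index with a property. A minimal sketch, assuming each BH is given as a $(v, p, k)$ triple and the property values as a matching list; the grid $[-3, 3]$ with step 0.1 and all function names are hypothetical choices, not the settings used for the figures.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equally long sequences."""
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def index_value(v, p, k, alpha, kind):
    """R_alpha via (8) for kind == "R", chi_alpha via (9) for kind == "chi"."""
    m22, m23, m33 = v - 2 * p - k + 2, 2 * k, 3 * p - k - 3
    if kind == "R":
        return m22 * 4 ** alpha + m23 * 6 ** alpha + m33 * 9 ** alpha
    if kind == "chi":
        return m22 * 4 ** alpha + m23 * 5 ** alpha + m33 * 6 ** alpha
    raise ValueError("kind must be 'R' or 'chi'")


def correlation_curve(molecules, prop, alphas):
    """Return [(alpha, r(R_alpha, prop), r(chi_alpha, prop))] over the grid.

    molecules: list of (v, p, k) triples; prop: matching property values,
    e.g. BP or Delta H_f from Table 1. Assumes the index values are not all
    equal for any alpha on the grid (otherwise r is undefined).
    """
    curve = []
    for a in alphas:
        r_R = pearson_r([index_value(*m, a, "R") for m in molecules], prop)
        r_chi = pearson_r([index_value(*m, a, "chi") for m in molecules], prop)
        curve.append((a, r_R, r_chi))
    return curve


# Hypothetical grid alpha = -3.0, -2.9, ..., 3.0.
ALPHAS = [round(-3 + 0.1 * i, 1) for i in range(61)]
```

Plotting the second and third components of each tuple against $\alpha$ yields curves of the kind shown in Figures 2–5; on a given grid, the $\alpha$ with the largest $|r|$ in each column marks the best member of that family.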
Comparing the two general indices, the general product-connectivity index $R_\alpha$ is the better measure of the boiling point $\mathrm{BP}$ of BHs for $\alpha$ in the range indicated in Figure 3, while for any other $\alpha$ the general sum-connectivity index $\chi_\alpha$ is better. Likewise, as measures of the enthalpy of formation $\Delta H_f$ of benzenoid hydrocarbons, $R_\alpha$ is better for $\alpha$ in the range indicated in Figure 5, while for any other $\alpha$, $\chi_\alpha$ is better.
For $\alpha$ in suitable intervals, each of the two properties correlates well with each of the two indices. For example, for $\alpha$ in one such interval, the correlation coefficient of one property–index pair exceeds 0.996558. Similarly, for $\alpha$ in other intervals, strong correlations also exist for the remaining property–index pairs, as shown in Figure 6.
From Figure 3 and Figure 5, for the 22 lower BHs, the optimal members of the $R_\alpha$ family are the most linearly correlated with $\mathrm{BP}$ and $\Delta H_f$, respectively, among all general product-connectivity indices, and the optimal members of the $\chi_\alpha$ family are the most linearly correlated with $\mathrm{BP}$ and $\Delta H_f$, respectively, among all general sum-connectivity indices. The linear regression equations (with 95% confidence intervals) relating the physicochemical properties ($\mathrm{BP}$ and $\Delta H_f$) to each of these indices are given in (10)–(13).
Here, $s$ and $r$ denote the standard error of fit and the correlation coefficient, respectively. Figure 7 shows scatter plots of the boiling point $\mathrm{BP}$ and of the enthalpy of formation $\Delta H_f$ against the corresponding best-correlated $R_\alpha$ and $\chi_\alpha$ indices for the 22 lower benzenoids.
It is evident from (10)–(13) that the optimal general product-connectivity indices (one for each property) outperform all the other examined indices in predicting the boiling point and the enthalpy of formation, respectively. All the Octave codes have been made publicly accessible; see the Supplementary Information at the end of the paper.
Recall that Gutman and Tošović [11] considered $\mathrm{BP}$ and $\Delta H_f$ as representatives of physicochemical properties and used isomeric octanes as test molecules. We applied our study to the 18 isomeric octanes, and preliminary results showed that the value(s) of $\alpha$ yielding good estimates of $\mathrm{BP}$ and $\Delta H_f$ for the 22 lower BHs differ from those for the isomeric octanes. Thus, the intervals/values of $\alpha$ obtained in the current study are limited to BHs. However, we expect similar behavior for other BHs (beyond the 22 lower BHs considered in this study).
  5. Simultaneous Predictive Potential of $R_\alpha$ and $\chi_\alpha$
In this section, we look for the value(s) of $\alpha$ for which the correlation of $R_\alpha$ (or $\chi_\alpha$) with both properties $\mathrm{BP}$ and $\Delta H_f$ simultaneously is the strongest. To this end, we consider the multiple correlation coefficient of $R_\alpha$ (or $\chi_\alpha$) with $\mathrm{BP}$ and $\Delta H_f$, treating the two properties as independent variables. Let $y \in \{R_\alpha, \chi_\alpha\}$ be the dependent variable, and let $x_1 = \mathrm{BP}$ and $x_2 = \Delta H_f$ be the two independent variables. Recall that the multiple correlation measures the relationship between one dependent variable and more than one independent variable. Since there are two representatives of physicochemical properties, $\mathrm{BP}$ and $\Delta H_f$, we use the multiple correlation between one graphical descriptor $y$ and the pair $(x_1, x_2)$. This quantifies the predictive potential of a descriptor for both properties simultaneously, rather than its correlation strength with each property individually.
Since the response variable $y$ depends on the parameter $\alpha$, the value of the multiple correlation $R$ defined above also depends on $\alpha$, i.e., $R = R(\alpha)$. A preliminary plot of $R(\alpha)$ over the region considered reveals a unimodal shape with a maximum in this region. A built-in optimizer of the R programming language was then used to find the value $\alpha^*$ that maximizes the multiple correlation $R(\alpha)$. Figure 8 presents the corresponding plot. Figure 9 shows the matrix plot of the distributions of the variables and the bivariate relationships between them (using the optimal value $\alpha^*$).
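Only a built-in R optimizer is reported; a bounded one-dimensional routine such as R's `optimize()` (which combines golden-section search with parabolic interpolation) suits the unimodal shape observed. The Python sketch below is an assumption-laden stand-in, not the authors' code: it uses plain golden-section search, takes $y = R_\alpha$ from the benzenoid closed form $R_\alpha = m_{22}4^\alpha + m_{23}6^\alpha + m_{33}9^\alpha$, and all function names are hypothetical.

```python
import math


def corr(a, b):
    """Sample correlation coefficient of two equally long sequences."""
    k = len(a)
    ma, mb = sum(a) / k, sum(b) / k
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def multiple_r(y, x1, x2):
    """Multiple correlation of y with (x1, x2) via pairwise correlations."""
    r1, r2, r12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    r_sq = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(max(r_sq, 0.0))  # guard against tiny negative round-off


def r_alpha(v, p, k, alpha):
    """General product-connectivity index of a benzenoid system (v, p, k)."""
    return ((v - 2 * p - k + 2) * 4 ** alpha + 2 * k * 6 ** alpha
            + (3 * p - k - 3) * 9 ** alpha)


def golden_section_max(f, lo, hi, tol=1e-6):
    """Locate the maximum of a unimodal f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def optimal_alpha(molecules, bp, dhf, lo, hi):
    """alpha in [lo, hi] maximising R(alpha) with y = R_alpha, x1 = BP, x2 = Delta H_f."""
    def R_of(alpha):
        y = [r_alpha(*m, alpha) for m in molecules]
        return multiple_r(y, bp, dhf)
    a_star = golden_section_max(R_of, lo, hi)
    return a_star, R_of(a_star)
```

In practice one would pass the 22 $(v, p, k)$ triples from Table 1 with their $\mathrm{BP}$ and $\Delta H_f$ values, using the bounds of the preliminary plot; replacing `r_alpha` with the sum-connectivity closed form gives the $\chi_\alpha$ case. Golden-section search only guarantees a local maximum, which is why the preliminary unimodality check matters.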
Next, we study the multiple correlation $R$ between $\chi_\alpha$ and the two chosen physicochemical properties $(\mathrm{BP}, \Delta H_f)$. Since the response variable $y = \chi_\alpha$ depends on the parameter $\alpha$, the value of $R$ again depends on $\alpha$, i.e., $R = R(\alpha)$. A preliminary plot of $R(\alpha)$ over the region considered again reveals a unimodal shape with a maximum in this region. This time, the built-in R optimizer yielded the maximizing value $\alpha^*$ and the corresponding maximum $R(\alpha^*)$. Figure 10 presents the corresponding plot. Figure 11 shows the matrix plot of the distributions of the variables and the bivariate relationships between them (using the optimal value $\alpha^*$).