Article

A Computational Approach to Predictive Modeling Using Connection-Based Topological Descriptors: Applications in Coumarin Anti-Cancer Drug Properties

1 Faculty of Science, Universiti Brunei Darussalam, Jln Tungku Link, Gadong BE1410, Brunei
2 Department of Mathematics, Science Faculty, King Abdulaziz University, Jeddah 21589, Saudi Arabia
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2025, 26(5), 1827; https://doi.org/10.3390/ijms26051827
Submission received: 23 January 2025 / Revised: 18 February 2025 / Accepted: 19 February 2025 / Published: 20 February 2025
(This article belongs to the Special Issue From Nature to Medicine: Exploring Natural Products for New Therapies)

Abstract

Cheminformatics bridges chemistry, computer science, and information technology to predict chemical behaviors using quantitative structure–property relationships (QSPRs). This study advances QSPR modeling by introducing novel connection-based graphical invariants, specifically designed to enhance the predictive accuracy for physicochemical properties (PCPs) of benzenoid hydrocarbons (BHs). Employing cutting-edge computational methods, we evaluate these invariants against established descriptors in modeling the normal boiling point and standard heat of formation. The findings reveal superior predictive performance by newly proposed invariants, such as the sum-connectivity connection index, outperforming traditional indices like the Zagreb connection indices. Furthermore, we extend these methods to model the physicochemical properties of coumarin-related anti-cancer drugs, demonstrating their potential in drug development. The statistical analysis suggests that the most appropriate structure–property models are nonlinear. This work not only proposes robust tools for PCP estimation but also advocates for rigorous testing of descriptors to ensure relevance in cheminformatics.

1. Introduction

1.1. Structure–Property Modeling and Topological Descriptors

Quantitative structure–property relationship (QSPR) studies [1] play a vital role in theoretical chemistry, enabling the estimation of thermodynamic and physicochemical properties of molecular structures, particularly organic compounds. These predictive models are built upon advanced mathematical and computational methodologies [2]. The origins of the field can be traced back to Harold Wiener [3], who introduced the concept of the “path number”, representing the sum of pairwise distances, to estimate the boiling points of alkanes. This measure was later formalized as the Wiener index in graph theory. Today, QSPR modeling relies on structure-based molecular descriptors [4], which provide the necessary mathematical frameworks. Among these descriptors, topological indices (also referred to as graph-related molecular invariants) have been widely studied [5]. These invariants transform the molecular structure [6], excluding hydrogen atoms, into numerical values that capture essential chemical characteristics. To predict physicochemical properties such as boiling point and heat of formation [7], graphical invariants are used in regression equations [8] that encode the structural characteristics of a compound and its underlying chemical information. For the diverse applicability of graph theory and network analysis in other areas, such as communication and mobile networks, we refer to [9,10,11,12].
Graphical invariants can be categorized into various types based on the structural properties they incorporate, including degree-related descriptors [13,14], distance-based indices [15], spectral descriptors derived from graph matrices [16], and counting-related polynomials and invariants [17]. With the continuous introduction of new invariants [18], many of them lack substantial chemical relevance [4]. Addressing this issue requires strict criteria for evaluating newly proposed descriptors. Gutman and Tošović [19] highlighted that the proliferation of graphical invariants without rigorous assessment has resulted in an excessive number of descriptors, many of which are unnecessary. This underscores the need to systematically examine emerging families of graphical descriptors for their effectiveness in structure–property modeling, ensuring that only the most efficient ones are advanced while eliminating those that do not contribute meaningfully.
Recent research in mathematical chemistry has increasingly focused on evaluating graphical invariants to determine their effectiveness in predicting physicochemical and thermodynamic properties. This process involves filtering out less reliable descriptors while highlighting those with superior predictive power. The groundwork for this comparative approach was laid by Gutman and Tošović [19], whose study emphasized the importance of systematically assessing molecular descriptors. Their methodology was later extended by Malik et al. [20], who applied it to benzenoid hydrocarbons (BHs), expanding beyond the initial work on isomeric octanes. Additionally, Hayat et al. [21] explored degree-based invariants for their potential to estimate the total $\pi$-electronic energy $E_\pi$ in BHs. This line of research was further developed in the context of distance-based descriptors [22,23], and subsequent investigations have examined the predictive power of eigenvalue-related graphical invariants [24].
This paper studies a novel family of graphical invariants known as connection-based graphical invariants and conducts comparative testing to compare their potential to predict PCPs of BHs. Following Gutman and Tošović [19], the standard heat of formation $\Delta h_f$ and normal boiling point $\rho_{bp}$ were chosen to represent PCPs. Some new connection-based indices are also proposed. A computer-based computational method is presented to calculate all existing connection-based invariants, and contemporary statistical tools such as multiple regression analysis are then employed to rule out insignificant indices while putting forward those that deserve further attention in QSPR models. The experimental analysis shows that the general sum-connectivity connection index outperforms all other connection-based invariants, whereas the two well-studied Zagreb connection indices deliver poor performance. The results in this paper help to curb the proliferation of graphical invariants.

1.2. Coumarin-Related Compounds

Coumarins are crystalline, colorless polyphenolic compounds classified under oxygenated heterocyclic substances. These compounds were first discovered by Vogel in 1820, who isolated them from the Fabaceae plant Dipteryx odorata Willd., commonly known as “coumaroun” [25]. Oxygenated heterocyclic compounds are a group that includes furan derivatives, containing four carbon atoms, and pyran derivatives, which have five carbon atoms. Furan derivatives are uncommon in plants, but pyran derivatives are more widespread, forming the backbone of numerous compounds like ketones, including α-pyrones and γ-pyrones. When pyran derivatives fuse with benzene in plants, they yield secondary metabolites known as benzo-α-pyrones (coumarins) and benzo-γ-pyrones (chromones) [26].
Coumarin (1,2-benzopyrone or 2H-1-benzopyran-2-one) and its derivatives are plant-derived compounds that can be found as glycosides (heterosides) or in their unbound form. To date, nearly 800 naturally occurring coumarin derivatives have been discovered across approximately 600 genera within over 100 plant families [27]. These compounds are commonly found in seeds, roots, and leaves [28], with significant concentrations in plant families like Rutaceae and Apiaceae, which belong to the Dicotyledonae class of the Spermatophyta division. While the majority of natural coumarins are produced by vascular plants, certain types, such as novobiocin, coumermycin, and aflatoxin, are generated by microbial sources [29].
Due to their diverse biological activities, coumarin derivatives have attracted significant attention in recent years. Research has highlighted their potential in various therapeutic areas, including antitumor applications [30], photochemotherapy, anti-HIV activity [31], antibacterial and antifungal properties [32], anti-inflammatory effects [33], and anticoagulant activity through the inhibition of VKOR (vitamin K epoxide reductase) [34]. They also exhibit triglyceride-lowering effects [35] and act as central nervous system stimulants [36]. Hydroxycoumarins, in particular, are noted for their strong antioxidant capabilities, protecting against oxidative stress by neutralizing reactive oxygen species [37]. Furthermore, coumarins with reduced estrogenic activity have been identified, enabling their use in alleviating menopausal symptoms [38]. On the other hand, some coumarin derivatives are employed as flavor enhancers in tobacco products, as noted in [39].
Recently, Timmanaikar et al. [40] employed certain graphical indices in structure–property modeling of coumarin and related compounds. Using several degree-based molecular descriptors, including the Balaban index and the connective eccentric index (CEI), their study modeled various physicochemical properties of these compounds, such as boiling point and vapor pressure. Their findings reveal that the Balaban index and CEI are particularly effective in predicting these properties with high accuracy, highlighting their potential as robust tools in the design of anti-cancer drugs. This work underscores the relevance of topological indices in drug discovery, facilitating efficient and cost-effective molecular analysis. A limitation of their study is its reliance on specific molecular descriptors, such as the Balaban index and CEI, which restricts its applicability to broader descriptor classes or additional pharmacological properties.
In this paper, we extend their work to address this limitation.

2. Results and Discussion

This section delivers a detailed analysis of MLCs in Section 3.3.
The very first unexpected outcome that we observe is that the two multiplicative Zagreb connection descriptors, i.e., $\Pi_c^i$ ($i = 1, 2$), showcase a considerably poor performance, as the MLC for both $\Pi_c^i$ ($i = 1, 2$) is $< 0.7$, which in structure–property studies is considered very poor. Thus, the study by Javaid et al. [41], who proposed the two multiplicative Zagreb connection invariants, has no meaningful applicative potential from a structure–property studies perspective. We discourage authors from further investigating these two descriptors. Furthermore, the MLC value for the 2nd Zagreb connection invariant $M_c^2$ is $< 0.9$, which from the perspective of mathematical investigation is feasible; however, it does not find popularity among researchers.
Notably, the performance of the 1st Zagreb connection invariant $M_c^1$ with MLC $\rho = 0.9138$ seems reasonable.
Secondly, among the newly proposed connection-based invariants, we observe that the Randić R c and the general Randić R c β connection descriptors with β = 1 ,   2 ,   2 are less-efficient. However, the general Randić connection descriptor R c 1 delivers a reasonable performance with ρ = 0.9164 . The only other newly proposed connection invariant is the augmented Zagreb connection invariant, i.e., A Z I c with MLC ρ < 0.9 . Apart from these connection invariants that deliver poor performance, all other newly proposed connection invariants showcase significantly improved efficiency. For instance, the general sum-connectivity S C c β connection index with β = 1 records the strongest MLC of ρ = 0.9336 and this connection index is among the newly proposed graphical connection-dependent invariants.
The strong potential of the newly proposed connection invariants, such as S C c 1 , motivates us to look for other potential connection invariants. Our experiment shows that the general sum-connectivity S C c β connection index with β = 1 , the sum-connectivity connection descriptor S C c , the atom-bond-connectivity A B C c connection index, the geometric-arithmetic G A c connection invariant, and the arithmetic–geometric A G c connection index are among the top five connection-based graphical invariants for estimating the physicochemical characteristics of BHs. It is noteworthy to observe that all of these five best connection descriptors are the newly proposed connection-based invariants, which ultimately justify considering introducing new connection graphical invariants. Table 1 depicts the list of the five best connection-related graphical invariants.
Next, we conduct a detailed statistical analysis of the five best connection-related graphical invariants. First, we put forward appropriate data-fitting multiple linear regression (MLR) models between the PCPs $\rho_{bp}$ and $\Delta h_f$ and the five best connection-related graphical invariants. Table 2 delivers the most appropriate data-fitting MLRs with 95% confidence intervals for the intercept and the two X-variables. Moreover, Table 2 reports the standard error of estimation and the determination coefficient for the top five connection-related graphical invariants.
We performed the leave-one-out cross-validation (LOOCV) method on the data and generated LOOCV root mean squared errors $LOOCV_{RMSE}$ corresponding to the predictions in Table 2. Next, we deliver these cross-validation results.
Table 3 reports $LOOCV_{RMSE}$ for the predictions by the top five connection indices in Table 2. Note that the $LOOCV_{RMSE}$ values in Table 3 are fairly close to the $s$ values in Table 2, which shows the efficiency of our predictive models.
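The comparison between the in-sample standard error and the leave-one-out error can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce Table 3: it uses a single predictor rather than the two X-variables of the MLR models, and the descriptor and property values are hypothetical numbers chosen only to make the script run.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def standard_error(xs, ys):
    """In-sample standard error of estimate, s = sqrt(SSE / (n - 2))."""
    slope, intercept = fit_line(xs, ys)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (len(xs) - 2))

def loocv_rmse(xs, ys):
    """Refit with each observation held out, then score the held-out point."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        squared_errors.append((ys[i] - (slope * xs[i] + intercept)) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical descriptor and property values (not data from this study).
index_values = [3.1, 4.0, 4.8, 5.9, 6.7, 7.5, 8.6, 9.4]
property_values = [218.0, 340.0, 395.0, 448.0, 495.0, 536.0, 590.0, 604.0]

print("s          =", round(standard_error(index_values, property_values), 3))
print("LOOCV RMSE =", round(loocv_rmse(index_values, property_values), 3))
```

A leave-one-out error that stays close to the in-sample standard error suggests that the fit is not driven by a few influential points, which is the reading applied to Tables 2 and 3.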
Figure 1 presents scatter plots of the PCPs $\rho_{bp}$ and $\Delta h_f$ against the five best connection-related graphical invariants.
Note that the general sum-connectivity connection index ( S C c β ) has been identified as the best predictor among the tested topological descriptors. This superior predictive power can be attributed to several key theoretical and structural factors:
  • Balanced Sensitivity to Molecular Connectivity
    Unlike degree-based or distance-based indices, $SC_c^{\beta}$ incorporates both local and global molecular connectivity by summing the inverse square roots of vertex connections. This ensures a smooth variation across different molecular structures, preventing excessive dependence on extreme values (as seen in multiplicative indices like $\Pi_c^1$ and $\Pi_c^2$).
  • Mathematical Robustness and Stability
    The formula for $SC_c^{\beta}$,
    $$SC_c^{\beta} = \sum_{ij \in E_\Omega} \left(\mathrm{con}_{v_i} + \mathrm{con}_{v_j}\right)^{\beta},$$
    allows for flexible tuning via the parameter $\beta$, enabling it to capture nonlinear structure–property relationships more effectively than rigid indices like the Zagreb connection indices. Empirical results indicate that the $\beta = 1$ case provides optimal correlation, likely due to its ability to moderate the influence of high-degree vertices while maintaining significant contributions from lower-degree vertices.
  • Strong Correlation with van der Waals Interactions
    The sum-connectivity framework aligns well with intermolecular interactions, particularly van der Waals forces and dispersion interactions. These weak but cumulative effects influence key physicochemical properties like boiling points and heat of formation, which were the primary test variables in our study.
  • Empirical Evidence from Regression Models
    • $SC_c^{\beta}$ achieved the highest multiple correlation coefficient ($\rho = 0.9336$) across all tested indices.
    • Regression analysis demonstrated that quadratic and cubic models using $SC_c^{\beta}$ provided the best predictive accuracy, suggesting that its mathematical form aligns well with nonlinear structure–property relationships.

2.1. Structure–Property Modeling of Coumarin-Related Anti-Cancer Drugs

2.1.1. Coumarin-Related Compounds as Potential Anti-Cancer Drugs

Coumarin-related compounds, derived from the benzopyrone family, have garnered significant attention in medicinal chemistry for their diverse biological activities [29], particularly their anti-cancer properties. These compounds exhibit a range of pharmacological actions, including anti-proliferative, pro-apoptotic, and anti-angiogenic effects, which make them promising candidates for cancer therapy [30]. The anti-cancer potential of coumarins is attributed to their ability to modulate various molecular targets, such as inhibiting tyrosine kinases, disrupting cell cycle progression, and inducing oxidative stress in cancer cells. Additionally, coumarins can act as chemosensitizers, enhancing the efficacy of conventional chemotherapy drugs and overcoming drug resistance in certain cancer types.
Coumarins also show potential for selective toxicity, targeting cancer cells while sparing healthy tissues. Their versatility allows for structural modifications, enabling the development of derivatives with enhanced potency and specificity against different cancer types [30]. Examples include esculetin and umbelliferone, which exhibit notable anti-cancer activity through the inhibition of cell signaling pathways and the suppression of metastasis. Furthermore, coumarin derivatives have demonstrated synergistic effects when combined with other anti-cancer agents, highlighting their utility in combination therapy strategies. These compounds represent a promising area of research for developing novel and effective treatments for a wide range of malignancies.
Coumarins exhibit a variety of structural types that contribute to their diverse biological activities [26]. Simple coumarins, the most basic form, consist of a benzopyrone core and are widely found in nature. Furanocoumarins, characterized by a fused furan ring, are known for their photoreactive properties and are often studied for their anti-cancer and anti-inflammatory effects. Pyranocoumarins, with an additional pyran ring, exhibit enhanced lipophilicity and improved bioavailability [28], making them suitable for pharmaceutical applications. Pyrone-substituted coumarins, in which the lactone moiety is modified, demonstrate unique biochemical interactions that broaden their therapeutic potential. These structural variations allow for extensive functional diversity, enabling tailored applications in the design of anti-cancer drugs.
In this paper, we consider 25 contemporary anti-cancer drugs and conduct structure–property modeling of their physicochemical properties. These drugs and their transformed molecular graphs are delivered in Table 4, Table 5 and Table 6. Moreover, Table 4 shows simple coumarin-related compounds, whereas Table 5 (resp. Table 6) records furanocoumarins (resp. pyranocoumarins and pyrone-substituted coumarins) considered in this work. The data of these coumarin compounds were taken from Küpeli et al. [42] who conducted a structure–property study on these compounds. For more on the chemistry of these compounds, we refer to [43,44].
For our structure–property predictive modeling, we consider a diverse range of physicochemical properties, including boiling point (BP) in °C at 760 mmHg, molar volume (MV) in cm³/mol, enthalpy of vaporization (E) in kJ/mol, density (D) in g/cm³, surface tension (ST) in dyne/cm, vapor pressure (VP) in mmHg at 25 °C, molar refractivity (MR) in cm³, index of refraction (IR, dimensionless), flash point (FP) in °C, polarizability (P) in 10⁻²⁴ cm³, and polar surface area (PSA) in Å². We retrieved the experimental data of these physicochemical properties from the open source http://www.chemspider.com/ (accessed on 15 February 2025). Table 7 delivers the experimental values of these properties for the selected 25 coumarin-related anti-cancer drugs.

2.1.2. Structure–Property Modeling of Physicochemical Properties

Now, we conduct a detailed correlation and regression analysis of the coumarin-related drugs from Table 4, Table 5 and Table 6 with their physicochemical properties in Table 7. We employ the top five connection-based topological descriptors from Table 1 as our predictors.
With three types of regression models, linear, quadratic, and cubic, we evaluated the relationship between molecular descriptors and their hyper-counterparts with 11 essential physicochemical properties of anti-cancer drugs derived from coumarins. The objective of this study was to determine how well these descriptors could predict properties that are crucial to the effectiveness and stability of anti-cancer drugs. In order to evaluate the quality of each model, two key statistical parameters were used: the correlation coefficient (r), a measure of the strength and direction of linear relationships between variables, and the standard error of estimate (s), a measure of how accurate regression models are. As linear models evolved into cubic models, their complexity and accuracy increased.
The linear regression model provided insight into the relationship between molecular descriptors and drug properties, with molar refractivity (MR) and polarizability (P) showing the highest correlation coefficients. According to Table 8, these properties have a direct and significant linear relationship, suggesting their predictability through linear modeling.
Quadratic regression improves fitting compared to linear models for several properties, such as molar volume (MV) and polarizability (P). According to Table 9, these improvements suggest that the physicochemical properties of these drugs change nonlinearly with changes in molecular descriptors. By including squared terms, the model can capture a wider range of dynamics and variances in data that are not captured by linear models.
As shown in Table 10, the cubic regression model, which incorporates third-degree terms, provides the highest level of precision. It was particularly effective in capturing intricate dependencies on flash point (FP) and molar volume (MV), where higher-order interactions between molecular descriptors are evident. Based on its advanced fit for such properties, the cubic model suggests that some physicochemical traits are influenced by complex interactions that are only captured by higher-order models.
Gradually moving from linear to cubic regressions provides a deeper understanding of drug characteristics. They provide crucial insight into drug efficacy and stability, which is essential for designing optimized anti-cancer drugs. Advances in statistical techniques make it possible to comprehend and predict pharmaceutical behaviors more accurately. This methodology enhances drug development by exploring molecular descriptors and drug properties. It demonstrates the importance of applying advanced statistical tools to pharmaceutical research to improve drug efficacy and patient outcomes by predicting and refining drug properties based on molecular descriptors.
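The three model families compared above can be sketched in a few lines. The snippet below is an illustrative sketch, not the software used in this study: it fits linear, quadratic, and cubic least-squares polynomials via the normal equations and reports r, s, and the F statistic. Two assumptions are made here: r is taken as the square root of the coefficient of determination, and the data are hypothetical values invented for the example.

```python
from math import sqrt

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def poly_fit(xs, ys, degree):
    """Least-squares coefficients [a0, a1, ..., a_degree] from the normal equations."""
    powers = range(degree + 1)
    A = [[sum(x ** (i + j) for x in xs) for j in powers] for i in powers]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in powers]
    return solve(A, b)

def fit_statistics(xs, ys, degree):
    """Return (r, s, F) for the polynomial model of the given degree."""
    coef = poly_fit(xs, ys, degree)
    pred = [sum(a * x ** k for k, a in enumerate(coef)) for x in xs]
    n, p = len(xs), degree
    mean_y = sum(ys) / n
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, pred))
    sst = sum((y - mean_y) ** 2 for y in ys)
    r = sqrt(max(0.0, 1.0 - sse / sst))           # assumed: r = sqrt(R^2)
    s = sqrt(sse / (n - p - 1))                   # standard error of estimate
    f = ((sst - sse) / p) / (sse / (n - p - 1))   # overall F statistic
    return r, s, f

# Hypothetical descriptor and property values (not data from Table 7).
descriptor = [3.1, 4.0, 4.8, 5.9, 6.7, 7.5, 8.6, 9.4]
prop = [61.0, 80.5, 96.0, 121.5, 137.0, 156.5, 178.0, 199.0]

for degree, name in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    r, s, f = fit_statistics(descriptor, prop, degree)
    print(f"{name:9s} r = {r:.4f}  s = {s:.3f}  F = {f:.2f}")
```

Because the three models are nested, r cannot decrease as the degree grows, whereas s and F are penalized by the extra parameters. Comparing all three statistics, rather than r alone, is therefore the safer way to choose between the models.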

2.1.3. Regression Models for FP, MR, MV, and P

The best cubic regression model for the flash point (FP) is obtained with the geometric-arithmetic connection index:
$$FP = 25.112\,(GA_c) - 1.000\,(GA_c)^2 + 0.15\,(GA_c)^3 - 22.874, \quad r = 0.926, \quad s = 25.971, \quad f = 42.250.$$
The following model is based on the arithmetic–geometric connection index:
$$FP = 23.407\,(GA_c) - 0.882\,(GA_c)^2 + 0.013\,(GA_c)^3 - 18.121, \quad r = 0.926, \quad s = 26.029, \quad f = 42.028.$$
The best overall FP models for the coumarin-related anti-cancer drugs are shown in Figure 2.
For molar refractivity (MR), the best cubic regression model is obtained with the sum-connectivity connection index:
$$MR = 28.662\,(SC_c) - 2.105\,(SC_c)^2 + 0.160\,(SC_c)^3 - 12.465, \quad r = 0.984, \quad s = 4.660, \quad f = 208.971.$$
The best linear and quadratic regression models are also obtained with the sum-connectivity connection index:
$$MR = 20.485\,(SC_c) - 2.863, \quad r = 0.983, \quad s = 4.481, \quad f = 677.655.$$
$$MR = 19.069\,(SC_c) + 0.156\,(SC_c)^2 - 0.038, \quad r = 0.984, \quad s = 4.566, \quad f = 326.389.$$
The best overall MR models for the coumarin-related anti-cancer drugs are shown in Figure 3.
For molar volume (MV), the best cubic regression model is obtained with the sum-connectivity connection index:
$$MV = 82.583\,(SC_c) - 6.572\,(SC_c)^2 + 0.495\,(SC_c)^3 - 36.530, \quad r = 0.954, \quad s = 22.237, \quad f = 70.461.$$
The best linear and quadratic regression models are also obtained with the sum-connectivity connection index:
$$MV = 56.763\,(SC_c) - 5.7903, \quad r = 0.954, \quad s = 21.299, \quad f = 230.309.$$
$$MV = 52.884\,(SC_c) + 0.426\,(SC_c)^2 + 1.945, \quad r = 0.954, \quad s = 21.753, \quad f = 110.422.$$
The best overall MV models for the coumarin-related anti-cancer drugs are shown in Figure 4.
For polarizability (P), the best cubic, linear, and quadratic regression models are all obtained with the sum-connectivity connection index, as follows:
$$P = 10.955\,(SC_c) - 0.741\,(SC_c)^2 + 0.057\,(SC_c)^3 - 4.390, \quad r = 0.984, \quad s = 1.852, \quad f = 208.023.$$
$$P = 8.124\,(SC_c) - 1.133, \quad r = 0.983, \quad s = 1.781, \quad f = 674.950.$$
$$P = 7.536\,(SC_c) + 0.065\,(SC_c)^2 + 0.039, \quad r = 0.984, \quad s = 1.814, \quad f = 325.296.$$
The best overall P models for the coumarin-related anti-cancer drugs are shown in Figure 5.
Although applying these results in drug discovery pipelines demands a separate study, we recall a seminal work by Estrada et al. [45], which bridges the gap between structure–property modeling by graphical descriptors and drug discovery research. The reader is referred to this work for an illustration of the role of graphical descriptors in drug discovery research.

3. Materials and Methods

3.1. Mathematical Preliminaries

A graph $\Omega$ is a pair $(V_\Omega, E_\Omega)$ in which $V_\Omega$ is the vertex set and $E_\Omega \subseteq \binom{V_\Omega}{2}$ is the edge set. The valency/degree $\deg_x$ of a vertex $x \in V_\Omega$ is defined as $\deg_x = |\{z \in V_\Omega : xz \in E_\Omega\}|$. The distance/geodesic $\mathrm{dis}(x, z)$ between two vertices $x, z \in V_\Omega$ is defined as $\mathrm{dis}(x, z) := \min\{\ell(P_{x,z})\}$, where $\ell(P_{x,z})$ is the length (number of edges traversed) of a path $P_{x,z}$ (a chain of vertices connecting $x$ to $z$). Based on the geodesic, we define the connection $\mathrm{con}_x$ of a vertex $x \in V_\Omega$ as $\mathrm{con}_x := |\{z \in V_\Omega : \mathrm{dis}(x, z) = 2\}|$, i.e., the number of vertices at distance two from $x$.
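The definition of the connection number translates directly into a breadth-first search that stops at depth two. The sketch below is illustrative only; the function name and the five-vertex path used as input are hypothetical choices, not part of the computational pipeline of this paper.

```python
from collections import deque

def connection_numbers(adj):
    """Map each vertex to the number of vertices at distance exactly two."""
    con = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == 2:          # nothing beyond distance two is needed
                continue
            for w in adj[u]:
                if w not in dist:     # first visit gives the shortest distance
                    dist[w] = dist[u] + 1
                    queue.append(w)
        con[source] = sum(1 for d in dist.values() if d == 2)
    return con

# Hypothetical example: the path a-b-c-d-e as an adjacency list.
path5 = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(connection_numbers(path5))  # {'a': 1, 'b': 1, 'c': 2, 'd': 1, 'e': 1}
```

Only the middle vertex of the path has two vertices at distance two, so it is the only vertex with connection number 2.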
A graphical invariant is said to be connection-based if it is structured on the vertices’ connection. The next subsection surveys all the existing connection invariants. Some new connection-based indices have also been put forward.

Connection-Based Graphical Invariants

Based on the connection of vertices, the first two connection-based graphical descriptors were proposed by Ali and Trinajstić [46]. They defined the first Zagreb connection index as follows:
$$M_c^1 = \sum_{ij \in E_\Omega} \left(\mathrm{con}_{v_i} + \mathrm{con}_{v_j}\right).$$
They also introduced a degree-connection-based descriptor called the modified first Zagreb connection index. It is defined as
$$M_{dc}^1 = \sum_{i \in V_\Omega} \deg_{v_i}\, \mathrm{con}_{v_i}.$$
Ali et al. [47] studied $M_{dc}^1$ of certain T-sum graphs. Ali and Trinajstić [46] also studied the applicability of these descriptors in cheminformatics. Immediately after its conception, Ducoffe et al. [48] derived extremal graphs corresponding to $M_c^1$. Tang et al. [49] in 2019 put forward the second Zagreb connection index as follows:
$$M_c^2 = \sum_{ij \in E_\Omega} \mathrm{con}_{v_i}\, \mathrm{con}_{v_j}.$$
They proved some results corresponding to $M_c^i$ ($i = 1, 2$) for certain derived graphs, such as semi-total point/line graphs. Cao et al. [50] studied molecular graphs with respect to some operations for Zagreb connection indices. Javaid et al. [41] delivered the multiplicative versions of the two Zagreb connection invariants. They are defined as follows:
$$\Pi_c^1 = \prod_{ij \in E_\Omega} \left(\mathrm{con}_{v_i} + \mathrm{con}_{v_j}\right),$$
$$\Pi_c^2 = \prod_{ij \in E_\Omega} \mathrm{con}_{v_i}\, \mathrm{con}_{v_j}.$$
Moreover, these multiplicative Zagreb connection invariants were further investigated for wheel-related graphs.
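The edge-based Zagreb connection indices above can be evaluated with a few lines of code. The following is a minimal sketch under stated assumptions: the helper names and the vertex labelling of the hydrogen-suppressed naphthalene graph are hypothetical choices made for this example, and the script is not the HyperChem/TopoCluj/MATLAB workflow used in this study.

```python
from math import prod

def build_adjacency(edge_list):
    """Turn a list of undirected edges into a dict of neighbour sets."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def connection(adj, x):
    """Number of vertices at distance exactly two from x."""
    two_step = set().union(*(adj[y] for y in adj[x]))
    return len(two_step - adj[x] - {x})   # drop neighbours and x itself

def zagreb_connection_indices(edge_list):
    """First/second Zagreb connection indices and their multiplicative versions."""
    adj = build_adjacency(edge_list)
    con = {v: connection(adj, v) for v in adj}
    sums = [con[u] + con[v] for u, v in edge_list]
    products = [con[u] * con[v] for u, v in edge_list]
    return {"M1": sum(sums), "M2": sum(products),
            "Pi1": prod(sums), "Pi2": prod(products)}

# Hydrogen-suppressed naphthalene; the vertex labelling is an arbitrary choice.
naphthalene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
               (4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]
print(zagreb_connection_indices(naphthalene))
```

For this graph the additive indices evaluate to 64 and 96, while the multiplicative ones are already of order 10^8 and 10^9 for a ten-vertex molecule, which illustrates how quickly the product-based invariants grow with molecular size.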
Note that Diudea et al. ([8], Chapter 4) presented a rationale to be followed while proposing a graphical descriptor. It includes a list of the following desirable attributes for constructing a graphical descriptor:
1. Direct structural interpretation;
2. Good correlation with at least one property;
3. Good discrimination of isomers;
4. Locally defined;
5. Generalizable to higher analogs;
6. Linearly independent;
7. Simplicity;
8. Not trivially related to other indices;
9. Efficiency of construction;
10. Based on familiar structural concepts;
11. Show a correct size dependence;
12. Gradual change with gradual change in structures.
We observed that most of the existing connection-based graphical descriptors fail to comply with these attributes. For instance, the modified first Zagreb connection index $M_{dc}^1$ and the two multiplicative Zagreb connection indices $\Pi_c^i$ ($i = 1, 2$) fail to comply with attributes such as 4, 5, and 6. Building upon this limitation, we introduce novel connection graphical indices that comply with the theoretical foundation delivered by Diudea et al. ([8], Chapter 4).
Note that connection-based descriptors are introduced based on their degree-based counterparts; see [46]. Following this, we further enhanced this study by proposing connection descriptors based on other degree-based graphical invariants. Note that the references cited here are regarding the corresponding degree-based graphical invariants.
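The following expression delivers the Randić [51] connection index $R_c$ of $\Omega$:
$$R_c = \sum_{ij \in E_\Omega} \frac{1}{\sqrt{\mathrm{con}_{v_i} \times \mathrm{con}_{v_j}}}.$$
For $\beta \in \mathbb{R} \setminus \{0\}$, the general Randić [52] connection index $R_c^{\beta}$ of $\Omega$ has the following mathematical formula:
$$R_c^{\beta} = \sum_{ij \in E_\Omega} \left(\mathrm{con}_{v_i} \times \mathrm{con}_{v_j}\right)^{\beta}.$$
Note that $R_c^{-1/2} = R_c$.
Next, we put forward the sum-connectivity [53] connection index $SC_c$,
$$SC_c = \sum_{ij \in E_\Omega} \frac{1}{\sqrt{\mathrm{con}_{v_i} + \mathrm{con}_{v_j}}},$$
and the general sum-connectivity [54] connection index $SC_c^{\beta}$,
$$SC_c^{\beta} = \sum_{ij \in E_\Omega} \left(\mathrm{con}_{v_i} + \mathrm{con}_{v_j}\right)^{\beta},$$
where $\beta \in \mathbb{R} \setminus \{0\}$. Note that $SC_c^{-1/2} = SC_c$.
The atom-bond connectivity [55] connection index $ABC_c$ possesses the defining structure:
$$ABC_c = \sum_{ij \in E_\Omega} \sqrt{\frac{\mathrm{con}_{v_i} + \mathrm{con}_{v_j} - 2}{\mathrm{con}_{v_i} \times \mathrm{con}_{v_j}}}.$$
Next, we introduce the augmented Zagreb [56] connection index $AZI_c$ as follows:
$$AZI_c = \sum_{ij \in E_\Omega} \left(\frac{\mathrm{con}_{v_i} \times \mathrm{con}_{v_j}}{\mathrm{con}_{v_i} + \mathrm{con}_{v_j} - 2}\right)^{3}.$$
The geometric–arithmetic [57] and the arithmetic–geometric [58] connection indices have the mathematical expressions
$$GA_c = \sum_{ij \in E_\Omega} \frac{2\sqrt{\mathrm{con}_{v_i} \times \mathrm{con}_{v_j}}}{\mathrm{con}_{v_i} + \mathrm{con}_{v_j}}$$
and
$$AG_c = \sum_{ij \in E_\Omega} \frac{\mathrm{con}_{v_i} + \mathrm{con}_{v_j}}{2\sqrt{\mathrm{con}_{v_i} \times \mathrm{con}_{v_j}}},$$
respectively.
The reciprocal Randić [59] connection index $RR_c$ and the reduced reciprocal Randić [60] connection index $RRR_c$ are defined as
$$RR_c = \sum_{ij \in E_\Omega} \sqrt{\mathrm{con}_{v_i} \times \mathrm{con}_{v_j}}$$
and
$$RRR_c = \sum_{ij \in E_\Omega} \sqrt{(\mathrm{con}_{v_i} - 1)(\mathrm{con}_{v_j} - 1)},$$
respectively.
Finally, the Sombor [61] connection index $SO_c$ is defined as
$$SO_c = \sum_{ij \in E_\Omega} \sqrt{\mathrm{con}_{v_i}^{2} + \mathrm{con}_{v_j}^{2}}.$$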
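All of the indices defined above are sums of a simple function of the two endpoint connection numbers over the edges, so they can be evaluated together. The snippet below is a hedged sketch rather than the implementation used in this work: the function names are hypothetical, the benzene ring is chosen only because every vertex has connection number 2 and the results are easy to verify by hand, and no guard is included for edges where a denominator vanishes (for example, when the two connection numbers sum to 2 in $AZI_c$).

```python
from math import sqrt

def connection_pairs(edge_list):
    """Return (con_u, con_v) for every edge uv of the graph."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Vertices reachable in two steps, minus neighbours and the vertex itself.
    con = {x: len(set().union(*(adj[y] for y in adj[x])) - adj[x] - {x})
           for x in adj}
    return [(con[u], con[v]) for u, v in edge_list]

def connection_indices(edge_list, beta=1.0):
    """Evaluate several connection indices; beta tunes the general versions."""
    pairs = connection_pairs(edge_list)
    return {
        "R": sum(1 / sqrt(a * b) for a, b in pairs),
        "R_beta": sum((a * b) ** beta for a, b in pairs),
        "SC": sum(1 / sqrt(a + b) for a, b in pairs),
        "SC_beta": sum((a + b) ** beta for a, b in pairs),
        "ABC": sum(sqrt((a + b - 2) / (a * b)) for a, b in pairs),
        "AZI": sum((a * b / (a + b - 2)) ** 3 for a, b in pairs),
        "GA": sum(2 * sqrt(a * b) / (a + b) for a, b in pairs),
        "AG": sum((a + b) / (2 * sqrt(a * b)) for a, b in pairs),
        "SO": sum(sqrt(a * a + b * b) for a, b in pairs),
    }

# Hydrogen-suppressed benzene: a six-cycle, so every connection number is 2.
benzene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
for name, value in connection_indices(benzene).items():
    print(f"{name:8s} {value:.4f}")
```

Calling `connection_indices(benzene, beta=-0.5)` reproduces the identities noted above: the general Randić and general sum-connectivity values then coincide with $R_c$ and $SC_c$.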
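In order to explain these connection-based indices, we consider the example of a chemical graph $H$ in Figure 6.
The graph $H$ in Figure 6 has order 8. The connection $\mathrm{con}_{v_4}$ of the vertex $v_4$, for instance, is calculated as
$$\mathrm{con}_{v_4} = |\{v_2, v_6, v_8\}| = 3.$$
That is, there are three vertices $v_2, v_6, v_8$ which are at distance 2 from $v_4$. Thus, we have $\mathrm{con}_{v_4} = 3$. In a similar fashion, by calculating the connections of all the vertices of $H$, let us calculate the connection index $AZI_c$ as follows:
$$AZI_c(H) = \sum_{ab \in E} \left(\frac{\mathrm{con}_a \times \mathrm{con}_b}{\mathrm{con}_a + \mathrm{con}_b - 2}\right)^{3}.$$
For graph $H$ in Figure 6, its $AZI_c$ index can be calculated as follows:
$$AZI_c(H) = \left(\frac{\mathrm{con}_{v_1} \mathrm{con}_{v_2}}{\mathrm{con}_{v_1} + \mathrm{con}_{v_2} - 2}\right)^{3} + \left(\frac{\mathrm{con}_{v_2} \mathrm{con}_{v_3}}{\mathrm{con}_{v_2} + \mathrm{con}_{v_3} - 2}\right)^{3} + \cdots + \left(\frac{\mathrm{con}_{v_7} \mathrm{con}_{v_8}}{\mathrm{con}_{v_7} + \mathrm{con}_{v_8} - 2}\right)^{3} = \left(\frac{2 \times 2}{2 + 2 - 2}\right)^{3} + \left(\frac{2 \times 3}{2 + 3 - 2}\right)^{3} + \cdots + \left(\frac{1 \times 2}{1 + 2 - 2}\right)^{3} = 59.3906.$$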
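As a quick arithmetic check of the three terms written out in the expansion above (a worked step added for illustration, using only the connection numbers shown there), each of these edges contributes the same amount:

```latex
\left(\frac{2 \times 2}{2 + 2 - 2}\right)^{3}
= \left(\frac{2 \times 3}{2 + 3 - 2}\right)^{3}
= \left(\frac{1 \times 2}{1 + 2 - 2}\right)^{3}
= 2^{3} = 8 .
```

The remaining edges, which are elided in the expansion, account for the rest of the reported total of 59.3906.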
Figure 7 explains the workflow of employing graphical descriptors in structure–property modeling.
Next, we deliver computational details applied in this study.

3.2. Computational Methods

This section presents a computer-based technique for calculating the connection-based graphical indices introduced in Section 3.1.
The method combines three software packages: HyperChem (version 8.0) [62], a computational chemistry package; MATLAB (version R2024b) [63], a mathematical platform for matrix analysis; and TopoCluj (version 1.1) [64], a molecular topology platform. HyperChem is a comprehensive molecular modeling and computational chemistry package widely used in academic and industrial research. Its functionality spans molecular mechanics, quantum chemistry, molecular dynamics, and visualization. For molecular modeling, it supports model building and 3D visualization; for quantum chemistry, it offers Density Functional Theory (DFT) and ab initio methods; for molecular mechanics, it performs energy minimization; and for spectroscopy, it computes UV-Vis, IR, and NMR spectra. TopoCluj is specialized software for calculating topological descriptors from topological matrices and polynomials. These descriptors are essential for studying molecular structures and their characteristics, particularly in computational chemistry and molecular graph theory. MATLAB (short for "Matrix Laboratory") is an interactive environment and high-level programming language from MathWorks, used primarily for algorithm development, data analysis, visualization, and numerical computing.
Next, we provide a rationale for selecting the three platforms, i.e., HyperChem, MATLAB, and TopoCluj:
  • HyperChem: HyperChem is a widely used computational chemistry software known for its robust molecular modeling and visualization tools. Unlike other molecular modeling software (such as Gaussian (version 16) or Spartan (version 9)), HyperChem offers the following:
      • User-friendly interface for building and optimizing molecular structures.
      • Real-time visualization of molecular properties and transformations.
      • Efficient quantum mechanics and molecular mechanics calculations, which are particularly useful for cheminformatics applications.
  • MATLAB: MATLAB was chosen for its advanced numerical computing and matrix operations, which are essential for processing topological descriptors. Compared to alternatives like Python 3.13.2 (NumPy, SciPy) or R 4.4.2, MATLAB R2024b offers the following:
      • Highly optimized built-in matrix operations, crucial for computing large-scale graph-based descriptors.
      • Statistical and regression modeling capabilities, enabling precise correlation analysis.
      • Seamless integration with other scientific tools, ensuring flexibility in extending the analysis.
  • TopoCluj: TopoCluj is a specialized molecular topology software designed for calculating topological indices from molecular graphs. It was preferred over general-purpose tools like ChemOffice or Open Babel due to the following:
      • Dedicated algorithms for computing topological descriptors, reducing computational complexity.
      • Efficient processing of molecular graphs, making it highly suited for cheminformatics applications.
      • Compatibility with standard cheminformatics workflows, ensuring consistency in descriptor computations.
We now present our proposed three-step computational method for computing connection-related descriptors of a given molecular graph $\Omega$:
Step 1
Use the HyperChem drawing module to construct a 3D molecular graph of $\Omega$. This produces a file with the .hin extension.
Step 2
Feed the .hin file to TopoCluj to compute the distance matrix of $\Omega$ and generate the .m file corresponding to the .hin file.
Step 3
Load the .m file into MATLAB and run our MATLAB code to compute all connection-related descriptors from Section 3.1.
Our step-by-step computational method is depicted in Figure 8.
Our MATLAB code is publicly available on GitHub; the repository link is given in the Data Availability Statement.
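For readers without access to the three packages, Steps 2 and 3 can be approximated by the Python sketch below. It is not the authors' MATLAB code: a breadth-first search stands in for the distance matrix that TopoCluj produces, and the function names are hypothetical. The sketch also assumes the edge-sum forms of the first/second Zagreb and multiplicative Zagreb connection indices, i.e., sums and products of $\mathrm{con}_{v_i} + \mathrm{con}_{v_j}$ and $\mathrm{con}_{v_i} \times \mathrm{con}_{v_j}$ over the edges; under that assumption it reproduces the naphthalene row of Table 11.

```python
from collections import deque
from math import prod

def distance_matrix(n, edges):
    """All-pairs shortest-path lengths of an unweighted graph via BFS."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    D = []
    for source in range(n):
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        D.append(dist)
    return D

def zagreb_connection_indices(D):
    """Return (M1, M2, Pi1, Pi2) computed from a distance matrix alone."""
    n = len(D)
    con = [row.count(2) for row in D]  # vertices at distance exactly 2
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if D[i][j] == 1]
    sums = [con[i] + con[j] for i, j in edges]
    prods = [con[i] * con[j] for i, j in edges]
    return sum(sums), sum(prods), prod(sums), prod(prods)

# Naphthalene carbon skeleton: a 10-cycle plus the bond between the two
# ring-fusion atoms (labelled 4 and 9 here; the labelling is arbitrary).
naphthalene_edges = [(k, (k + 1) % 10) for k in range(10)] + [(4, 9)]
D = distance_matrix(10, naphthalene_edges)
M1, M2, Pi1, Pi2 = zagreb_connection_indices(D)
print(M1, M2, Pi1, Pi2)  # 64 96 192080000 6879707136
```

Reading both the edges (entries equal to 1) and the connections (entries equal to 2) from the distance matrix mirrors the workflow above, in which the distance matrix is the only input handed to the final step.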

3.3. Data Analysis

This section presents the data and explains how they are used in structure–property modeling.
The first step is to select test properties that represent the physicochemical properties. Following the seminal work of Gutman and Tošović [19], the normal boiling point $\rho_{bp}$ and the standard heat of formation $\Delta_{hf}$ were selected as test physicochemical characteristics. The choice of $\rho_{bp}$ is justified because it reflects intermolecular (van der Waals-type) interactions, while $\Delta_{hf}$ represents thermochemical characteristics. Note that $\rho_{bp}$ of a substance is the temperature at which its vapor pressure equals one standard atmosphere (1 atm, or 101.3 kPa). The normal boiling point of a substance depends on its molecular properties. Moreover, $\Delta_{hf}$ is the change in enthalpy when one mole of a compound is formed from its constituent elements in their standard states under standard conditions (usually 298 K and 1 atm pressure).
Next, as representatives of benzenoid hydrocarbons (BHs), we select the 22 lower BHs as our test molecules. The same choice was made by Hayat and Khan [65], who tested the quality of eigenvalue-related degree descriptors, and by Hayat et al. [22], who investigated the predictive ability of distance-related graph-theoretic invariants for physicochemical characteristics of BHs. Selecting lower BHs for this kind of testing is motivated by Lučić et al. [66], who considered 30 lower BHs to determine the predictive ability of two degree-based graphical descriptors for the total π-electronic energy of BHs. We restrict ourselves to the lower 22 BHs because publicly available experimental data for $\rho_{bp}$ and $\Delta_{hf}$ are limited. The close correlation values of both $\rho_{bp}$ and $\Delta_{hf}$ for the initial 22 members of BHs strengthen the justification for their selection in this study, and we consider 22 molecules sufficient to validate our statistical inferences. The experimental data of $\rho_{bp}$ and $\Delta_{hf}$ for some BHs were retrieved from NIST's standard data repository [67], while we consulted Allison and Burgess [7], Dias [68], and Nikolić et al. [69] for the remaining BHs.
Figure 9 shows the graphical structures of the 22 initial members of BHs selected as test molecules. Since these are graph representations of the 3D molecular structures of BHs, aromaticity is not depicted. The next step is to compute the numerical values of the connection-related graphical invariants of Section 3.1, for which we employ the computational method explained in Section 3.2. Table 11 lists the experimental values of both $\rho_{bp}$ and $\Delta_{hf}$ for the lower 22 BHs. Applying the method of Section 3.2 to the first/second Zagreb and multiplicative Zagreb connection invariants yields the data in Columns 4–7 of Table 11. Although we omit the data for the other connection invariants, those indices can be computed in the same way.
The final step of this section is to compute the multiple linear correlation (MLC) between a given connection index $CI$ and the two chosen PCPs, i.e., $\rho_{bp}$ and $\Delta_{hf}$. We compute $\rho := \rho(\rho_{bp}, \Delta_{hf}; CI)$, i.e., the multiple linear correlation between $X_1 = \rho_{bp}$, $X_2 = \Delta_{hf}$, and a connection invariant $Y = CI$, using the data in Table 11. The Analysis ToolPak of MS Excel is used for this computation, and Table 12 lists the resulting MLC values. Note that the general sum-connectivity $SC^{c}_{\beta}$ and Randić $R^{c}_{\beta}$ connection indices have a generic parameter $\beta \in \mathbb{R} \setminus \{0\}$. Thus, for a meaningful analysis, we select $\beta \in \{\pm 1, \pm 2\}$ as test values of $\beta$.
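The multiple linear correlation can also be reproduced outside a spreadsheet. The sketch below is a minimal Python illustration, not the authors' procedure: it uses the standard closed form of the multiple correlation coefficient for two predictors, expressed through pairwise Pearson correlations, and the function names are hypothetical. The sample data are only the first six rows of Table 11 (with the first Zagreb connection index as $CI$), so the printed value is not expected to match the 22-molecule entry in Table 12.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def multiple_correlation(y, x1, x2):
    """Multiple correlation of y with the pair (x1, x2).

    Uses R^2 = (r_y1^2 + r_y2^2 - 2 r_y1 r_y2 r_12) / (1 - r_12^2),
    which is valid when x1 and x2 are not perfectly collinear.
    """
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    r_squared = (r1 * r1 + r2 * r2 - 2 * r1 * r2 * r12) / (1 - r12 * r12)
    # Clamp tiny floating-point excursions outside [0, 1] before the root.
    return sqrt(min(1.0, max(0.0, r_squared)))

# First six rows of Table 11: benzene, naphthalene, phenanthrene,
# anthracene, chrysene, benzo[a]anthracene (an illustrative subset only).
bp = [80.1, 218, 338, 340, 431, 425]
hf = [75.2, 141, 202.7, 222.6, 271.1, 277.1]
m1 = [24, 64, 106, 104, 148, 146]

rho = multiple_correlation(m1, bp, hf)
print(round(rho, 4))
```

The closed form gives the same quantity as the square root of the coefficient of determination of a least-squares fit of $CI$ on the two properties, so it can be cross-checked against any regression tool.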
In the next section, we analyze the data in Table 12 in detail, identify the five best connection-based graphical invariants for predicting PCPs of BHs, and point out those that do not warrant further attention from researchers.

4. Conclusions

4.1. Contributions

  • Introduced novel connection-based graphical invariants for predicting physicochemical properties (PCPs).
  • Demonstrated superior performance of the general sum-connectivity index over traditional indices.
  • Applied models to benzenoid hydrocarbons (BHs) and coumarin-related anti-cancer drugs, advancing QSPR modeling methods.

4.2. Study Implications

  • Enhanced predictive accuracy for boiling points, heat-of-formation, and key PCPs.
  • Offered scalable tools for drug development and molecular screening.
  • Provided a foundation for cheminformatics applications in material science and pharmaceuticals.

4.3. Limitations

  • Descriptor Selection Bias: The focus on connection-based descriptors may underrepresent quantum chemical and steric effects, requiring hybrid models for broader applicability.
  • Dataset Constraints: This study is limited to benzenoid hydrocarbons and coumarins, restricting generalizability to other chemical classes.
  • Modeling Assumptions: Standard environmental conditions and smooth regression models may not fully capture real-world molecular behavior.
  • Potential Overfitting: The small dataset may lead to overfitting in statistical models, necessitating regularization techniques.

4.4. Future Study

  • Expand datasets to include diverse molecular structures.
  • Explore additional properties like toxicity, solubility, and pharmacokinetics.
  • Integrate machine learning for enhanced predictive performance.
  • Investigate real-world applications in drug discovery and advanced material design.

Author Contributions

Conceptualization, S.H. and S.W.; methodology, S.H. and S.W.; software, S.W.; validation, S.H.; formal analysis, S.W.; investigation, S.H.; resources, S.W.; data curation, S.H. and S.W.; writing—original draft preparation, S.H.; writing—review and editing, S.W.; visualization, S.W.; supervision, S.W. and S.H.; project administration, S.W. and S.W.; funding acquisition, S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant no. (GPIP: 580-247-2024). The authors, therefore, acknowledge with thanks DSR for their technical and financial support.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Datasets generated or analyzed during the current study are publicly available at https://github.com/Sakander/Connection-based-invariants (accessed on 15 February 2025).

Acknowledgments

The authors are grateful to all the reviewers for their remarks and constructive criticism which significantly improved the submitted version of the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Katritzky, A.R.; Petrukhin, R.; Tatham, D.; Basak, S.; Benfenati, E.; Karelson, M.; Maran, M. Interpretation of quantitative structure-property and -activity relationships. J. Chem. Inf. Comput. Sci. 2001, 41, 679–685.
  2. Basak, S.C.; Mills, D. Quantitative structure-property relationships (QSPRS) for the estimation of vapor pressure: A hierarchical approach using mathematical structural descriptors. J. Chem. Inf. Comput. Sci. 2001, 41, 692–701.
  3. Wiener, H. Structural determination of the paraffin boiling points. J. Am. Chem. Soc. 1947, 69, 17–20.
  4. Gutman, I.; Furtula, B. (Eds.) Novel Molecular Structure Descriptors—Theory and Applications; University of Kragujevac: Kragujevac, Serbia, 2010; Volume 1–2.
  5. Balaban, A.T.; Motoc, I.; Bonchev, D.; Mekenyan, O. Topological indices for structure-activity corrections. Top. Curr. Chem. 1983, 114, 21–55.
  6. Gutman, I.; Polansky, O.E. Mathematical Concepts in Organic Chemistry; Springer: New York, NY, USA, 1986.
  7. Allison, T.C.; Burgess Jr, D.R. First-principles prediction of enthalpies of formation for polycyclic aromatic hydrocarbons and derivatives. J. Phys. Chem. A 2015, 119, 11329–11365.
  8. Diudea, M.V.; Gutman, I.; Lorentz, J. Molecular Topology; Nova, Huntington: Butler, PA, USA, 2001.
  9. Cai, D.; Fan, P.; Zou, Q.; Xu, Y.; Ding, Z.; Liu, Z. Active device detection and performance analysis of massive non-orthogonal transmissions in cellular Internet of Things. Sci. China Inf. Sci. 2022, 65, 182301.
  10. Guo, Y.; Zhao, R.; Lai, S.; Fan, L.; Lei, X.; Karagiannidis, G.K. Distributed machine learning for multiuser mobile edge computing systems. IEEE J. Sel. Top. Signal Process. 2022, 16, 460–473.
  11. Tan, C.; Cai, D.; Fang, F.; Ding, Z.; Fan, P. Federated unfolding learning for CSI feedback in distributed edge networks. IEEE Trans. Commun. 2025, 73, 410–424.
  12. Zheng, S.; Shen, C.; Chen, X. Design and analysis of uplink and downlink communications for federated learning. IEEE J. Sel. Areas Commun. 2021, 39, 2150–2167.
  13. Gutman, I. Degree-based topological indices. Croat. Chem. Acta 2013, 86, 351–361.
  14. Hayat, S.; Imran, M. On topological properties of nanocones CNCk[n]. Stud. UBB Chem. 2014, 59, 113–128.
  15. Xu, K.; Liu, M.; Das, K.C.; Gutman, I.; Furtula, B. A survey on graphs extremal with respect to distance-based topological indices. MATCH Commun. Math. Comput. Chem. 2014, 71, 461–508.
  16. Consonni, V.; Todeschini, R. New spectral indices for molecular description. MATCH Commun. Math. Comput. Chem. 2008, 60, 3–14.
  17. Hosoya, H. On some counting polynomials in chemistry. Discret. Appl. Math. 1988, 19, 239–257.
  18. Todeschini, R.; Consonni, V. Molecular Descriptors for Chemoinformatics; Wiley-VCH: Weinheim, Germany, 2009; Volume 1–2.
  19. Gutman, I.; Tošović, J. Testing the quality of molecular structure descriptors. Vertex-degree-based topological indices. J. Serb. Chem. Soc. 2013, 78, 805–810.
  20. Malik, M.Y.H.; Binyamin, M.A.; Hayat, S. Correlation ability of degree-based topological indices for physicochemical properties of polycyclic aromatic hydrocarbons with applications. Polycycl. Aromat. Compd. 2022, 42, 6267–6281.
  21. Hayat, S.; Khan, S.; Khan, A.; Liu, J.-B. Valency-based molecular descriptors for measuring the π-electronic energy of lower polycyclic aromatic hydrocarbons. Polycycl. Aromat. Compd. 2022, 42, 1113–1129.
  22. Hayat, S.; Khan, S.; Imran, M.; Liu, J.-B. Quality testing of distance-based molecular descriptors for benzenoid hydrocarbons. J. Mol. Struct. 2020, 1222, 128927–128935.
  23. Hayat, S.; Khan, S.; Khan, A.; Imran, M. Distance-based topological descriptors for measuring the π-electronic energy of benzenoid hydrocarbons with applications to carbon nanotubes. Math. Methods Appl. Sci. 2020; early view.
  24. Hayat, S.; Khan, S.; Khan, A.; Imran, M. A computer-based method to determine predictive potential of distance-spectral descriptors for measuring the π-electronic energy of benzenoid hydrocarbons with applications. IEEE Access 2021, 9, 19238–19253.
  25. Bruneton, J. Immunotoxicity of Epicutaneously Applied Anti-Coagulant Rodenticide Warfarin; Intercept Ltd.: Hampshire, UK, 1999; pp. 245–263.
  26. Lacy, A.; O’kennedy, R. Studies on coumarins and coumarin-related compounds to determine their therapeutic role in the treatment of cancer. Curr. Pharm. Des. 2004, 10, 3797–3811.
  27. Murray, R.D.H.; Mendez, J.; Brown, S.A. The Natural Coumarins Occurrence, Chemistry and Biochemistry; John Wiley and Sons Ltd.: New York, NY, USA; Chichester, UK, 1982.
  28. Lake, B. Synthesis & pharmacological investigation of 4-hydroxy coumarin derivatives & shown as anti-coagulant. Food Chem. Tox. 1999, 3, 412–423.
  29. Cooke, D.; Fitzpatrick, B.; O’Kennedy, R.; McCormack, T.; Egan, D. Coumarin: Biochemical Profile and Recent Developments; John Wiley & Sons: New York, NY, USA, 1997; Volume 3, pp. 311–322.
  30. Fylaktakidou, K.C.; Hadipavlou-Litina, D.J.; Litinas, K.E.; Nicolaides, D.N. Natural and synthetic coumarin derivatives with anti-inflammatory/antioxidant activities. Curr. Pharm. Des. 2004, 10, 3813–3833.
  31. Harvey, R.G.; Cortex, C.; Ananthanarayan, T.P.; Schmolka, S. A new coumarin synthesis and its utilization for the synthesis of polycyclic coumarin compounds with anticarcinogenic properties. J. Org. Chem. 1988, 53, 3936–3943.
  32. Al-Haiza, M.A.; Mostafa, M.S.; El-Kady, M.Y. Synthesis and biological evaluation of some new coumarin derivatives. Molecules 2003, 8, 275–286.
  33. Tosun, A.; Kupeli, E.; Yesilada, E. Anti-inflammatory and antinociceptive activity of coumarins from seseli gummiferum subsp. corymbosum (Apiaceae). Z. Nat. C 2009, 64, 56–62.
  34. Hoult, J.R.; Paya, M. Pharmacological and biochemical actions of simple coumarins: Natural products with therapeutic potential. Gen. Pharmacol. 1996, 27, 713–722.
  35. Madhavan, G.R.; Balraju, V.; Malleshasm, B.; Chakrabarti, R.; Lohray, V.B. Novel coumarin derivatives of heterocyclic compounds as lipid-lowering agents. Bioorg. Med. Chem. Lett. 2003, 13, 2547–2551.
  36. Moffet, R.S. Central nervous system depressants. VII. Pyridyl coumarins. J. Med. Chem. 1964, 7, 446–449.
  37. Paya, M.; Halliwell, B.; Hoult, J.R. Interactions of a series of coumarins with reactive oxygen species, Scavenging of superoxide, hypochlorous acid and hydroxyl radicals. Biochem. Pharmacol. 1992, 44, 205–214.
  38. Murray, R.D. The naturally occurring coumarins. Fortschr. Chem. Org. Naturst. 2002, 83, 1.
  39. Fentem, J.H.; Fry, J.R. Species differences in the metabolism and hepatotoxicity of coumarin. Comp. Biochem. Physiol. C 1993, 104, 1–8.
  40. Timmanaikar, S.T.; Hayat, S.; Hosamani, S.M.; Banu, S. Structure-property modeling of coumarins and coumarin-related compounds in pharmacotherapy of cancer by employing graphical topological indices. Eur. Phys. J. E 2024, 47, 31.
  41. Javaid, M.; Ali, U.; Siddiqui, K. Novel connection based Zagreb indices of several wheel-related graphs. Comput. J. Combinat. Math. 2021, 1, 1–28.
  42. Küpeli, A.E.; Genç, Y.; Karpuz, B.; Sobarzo-Sánchez, E.; Capasso, R. Coumarins and coumarin-related compounds in pharmacotherapy of cancer. Cancers 2020, 12, 1959.
  43. Basanagouda, M.; Jambagi, V.B.; Barigidad, N.N.; Laxmeshwar, S.S.; Devaru, V. Synthesis, structure–activity relationship of iodinated-4-aryloxymethyl-coumarins as potential anti-cancer and anti-mycobacterial agents. Eur. J. Med. Chem. 2014, 74, 225–233.
  44. Makandar, S.B.N.; Basanagouda, M.; Kulkarni, M.V.; Pranesha; Rasal, V.P. Synthesis, antimicrobial and DNA cleavage studies of some 4-aryloxymethylcoumarins obtained by reaction of 4-bromomethylcoumarins with bidentate nucleophiles. Med. Chem. Res. 2012, 21, 2603–2614.
  45. Estrada, E.; Uriarte, E. Recent advances on the role of topological indices in drug discovery research. Curr. Med. Chem. 2001, 8, 1573–1588.
  46. Ali, A.; Trinajstić, N. A novel/old modification of the first Zagreb index. Mol. Inform. 2018, 37, 1800008.
  47. Ali, U.; Javaid, M.; Kashif, A. Modified Zagreb connection indices of the T-sum graphs. Main Group Met. Chem. 2020, 43, 43–55.
  48. Ducoffe, G.; Marinescu-Ghemeci, R.; Obreja, C.; Popa, A.; Tache, R.M. Extremal graphs with respect to the modified first Zagreb connection index. In Proceedings of the 2018 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Timisoara, Romania, 20–23 September 2018; pp. 141–148.
  49. Tang, J.H.; Ali, U.; Javaid, M.; Shabbir, K. Zagreb connection indices of subdivision and semi-total point operations on graphs. J. Chem. 2019, 2019, 9846913.
  50. Cao, J.; Ali, U.; Javaid, M.; Huang, C. Zagreb connection indices of molecular graphs based on operations. Complexity 2020, 2020, 7385682.
  51. Randić, M. On characterization of molecular branching. J. Am. Chem. Soc. 1975, 97, 6609–6615.
  52. Amić, D.; Bešlo, D.; Lucić, B.; Nikolić, S.; Trinajstić, N. The vertex-connectivity index revisited. J. Chem. Inf. Comput. Sci. 1998, 38, 819–822.
  53. Zhou, B.; Trinajstić, N. On a novel connectivity index. J. Math. Chem. 2009, 46, 1252–1270.
  54. Zhou, B.; Trinajstić, N. On general sum-connectivity index. J. Math. Chem. 2010, 47, 210–218.
  55. Estrada, E.; Torres, L.; Rodríguez, L.; Gutman, I. An atom-bond connectivity index: Modelling the enthalpy of formation of alkanes. Indian J. Chem. 1998, 37A, 849–855.
  56. Furtula, B.; Graovac, A.; Vukičević, D. Augmented Zagreb index. J. Math. Chem. 2010, 48, 370–380.
  57. Vukičević, D.; Furtula, B. Topological index based on the ratios of geometrical and arithmetical means of end-vertex degrees of edges. J. Math. Chem. 2009, 46, 1369–1376.
  58. Shegehalli, V.S.; Kanabur, R. Arithmetic-geometric indices of path graph. J. Math. Comput. Sci. 2015, 16, 19–24.
  59. Favaron, O.; Mahéo, M.; Saclé, J.F. Some eigenvalue properties of graphs (conjectures of Grafitti-II). Discret. Math. 1993, 111, 197–220.
  60. Manso, F.C.; Júnior, H.S.; Bruns, R.E.; Rubira, A.F.; Muniz, E.C. Development of a new topological index for the prediction of normal boiling point temperatures of hydrocarbons: The Fi index. J. Mol. Liq. 2012, 165, 125–132.
  61. Gutman, I. Geometric approach to degree-based topological indices: Sombor indices. MATCH Commun. Math. Comput. Chem. 2021, 86, 11–16.
  62. HyperChem Package Release 7.5 for Windows; Hypercube Inc.: Gainesville, FL, USA, 2002.
  63. MATLAB 8.0 and Statistics Toolbox 8.1; The MathWorks, Inc.: Natick, MA, USA, 2023.
  64. Diudea, M.V.; Ursu, O.; Nagy, C.L. Topocluj; Babes-Bolyai University: Cluj, Romania, 2002.
  65. Hayat, S.; Khan, S. Quality testing of spectrum-based valency descriptors for polycyclic aromatic hydrocarbons with applications. J. Mol. Struc. 2021, 1228, 129789.
  66. Lučić, B.; Trinajstić, N.; Zhou, B. Comparison between the sum-connectivity index and product-connectivity index for benzenoid hydrocarbons. Chem. Phys. Lett. 2009, 475, 146–148.
  67. NIST Standard Reference Database. Available online: http://webbook.nist.gov/chemistry/ (accessed on 15 February 2025).
  68. Dias, J.R. Handbook of Polycyclic Hydrocarbons. Part A: Benzenoid Hydrocarbons; Elsevier: Amsterdam, The Netherlands, 1987.
  69. Nikolić, S.; Trinajstić, N.; Baučić, I. Comparison between the vertex- and edge–connectivity indices for benzenoid hydrocarbons. J. Chem. Inf. Comput. Sci. 1998, 38, 42–46.
Figure 1. Scatter plots between the PCPs $\rho_{bp}$ and $\Delta_{hf}$ and the five best connection-related graphical invariants.
Figure 2. Regression models for FP with the best fit to the data.
Figure 3. Appropriate regression models for MR with the data.
Figure 4. Data-fitting regression models for MV with the data.
Figure 5. Regression models for P that best fit the data.
Figure 6. A graph $H = (V, E)$ with the vertex set $V = \{v_1, v_2, \ldots, v_8\}$ and edge set $E = \{v_1 v_2, v_2 v_3, v_3 v_4, v_4 v_5, v_5 v_6, v_4 v_7, v_7 v_8\}$.
Figure 7. A framework of workflow for this study.
Figure 8. Step-by-step flow of the computational method.
Figure 9. The 2D graphs of the 22 initial members of BHs selected as test molecules.
Table 1. The top five connection-related graphical invariants.
| Position | Connection-Based Index | MLC Value |
|---|---|---|
| 1 | $SC^{c}_{-1}$, Equation (8) with $\beta = -1$ | 0.9336 |
| 2 | $SC^{c}$, Equation (7) | 0.9318 |
| 3 | $ABC^{c}$, Equation (9) | 0.9310 |
| 4 | $GA^{c}$, Equation (11) | 0.9279 |
| 5 | $AG^{c}$, Equation (12) | 0.9272 |
Table 2. The most appropriate data-fitting MLR models for the top 5 connection-related graphical invariants and the chosen PCPs $\rho_{bp}$ and $\Delta_{hf}$.
| Connection Index | MLR Model | Statistics |
|---|---|---|
| $SC^{c}_{-1}$ | $SC^{c}_{-1} = 0.8551\,(\pm 0.4702) + 0.0018\,(\pm 0.0033)\,\rho_{bp} + 0.0018\,(\pm 0.0094)\,\Delta_{hf}$ | $r^2 = 0.8716$, $s = 0.2644$ |
| $SC^{c}$ | $SC^{c} = 1.3475\,(\pm 1.4110) + 0.0105\,(\pm 0.0098)\,\rho_{bp} + 0.0079\,(\pm 0.0166)\,\Delta_{hf}$ | $r^2 = 0.8683$, $s = 0.7935$ |
| $ABC^{c}$ | $ABC^{c} = 1.5566\,(\pm 2.5411) + 0.0223\,(\pm 0.0177)\,\rho_{bp} + 0.0081\,(\pm 0.0299)\,\Delta_{hf}$ | $r^2 = 0.8667$, $s = 1.4290$ |
| $GA^{c}$ | $GA^{c} = 1.4568\,(\pm 4.2730) + 0.0416\,(\pm 0.0297)\,\rho_{bp} + 0.0044\,(\pm 0.0503)\,\Delta_{hf}$ | $r^2 = 0.8610$, $s = 2.4029$ |
| $AG^{c}$ | $AG^{c} = 1.4305\,(\pm 4.4032) + 0.0425\,(\pm 0.0306)\,\rho_{bp} + 0.0047\,(\pm 0.0519)\,\Delta_{hf}$ | $r^2 = 0.8597$, $s = 2.4761$ |
Table 3. The leave-one-out cross validation (LOOCV) root mean squared errors for the top five connection indices.
| Descriptor | LOOCV RMSE |
|---|---|
| $SC^{c}_{-1}$ | 0.2800 |
| $SC^{c}$ | 0.84530 |
| $ABC^{c}$ | 1.5162 |
| $GA^{c}$ | 2.5931 |
| $AG^{c}$ | 2.6490 |
Table 4. Chemical structures and their corresponding molecular graphs for certain simple coumarins.
Table 5. Some furanocoumarins and their transformed molecular graphs.
Table 6. Chemical structures of some pyranocoumarins and pyrone-substituted coumarins.
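Table 7. The physicochemical properties of the selected coumarin-related anti-cancer drugs.
| Drugs | BP | MV | E | D | ST | VP | MR | IR | FP | P | PSA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Alternariol | 384.6 | 186.7 | 63.3 | 1.2 | 42.7 | 0.9 | 62.5 | 1.584 | 161.9 | 24.8 | 36 |
| Angelicin | 362.6 | 134.0 | 60.9 | 1.4 | 55.5 | 0.8 | 49.9 | 1.667 | 173.1 | 19.8 | 39 |
| Bergapten | 412.4 | 158.0 | 66.5 | 1.4 | 52.0 | 1.0 | 56.6 | 1.635 | 203.2 | 22.4 | 49 |
| Coumestrol | 406.0 | 167.4 | 68.3 | 1.6 | 79.9 | 1.0 | 69.4 | 1.768 | 199.3 | 27.5 | 80 |
| Daphnetin | 430.4 | 114.0 | 71.2 | 1.6 | 75.6 | 1.1 | 43.5 | 1.689 | 184.5 | 17.3 | 67 |
| Daphnin | 670.0 | 202.6 | 103.4 | 1.7 | 92.6 | 2.2 | 77.4 | 1.689 | 252.4 | 30.7 | 146 |
| Dicumarol | 620.7 | 213.8 | 96.7 | 1.6 | 80.7 | 1.9 | 85.4 | 1.731 | 231.9 | 33.9 | 93 |
| Esculetin | 469.7 | 114.0 | 76.0 | 1.6 | 75.6 | 1.2 | 43.5 | 1.689 | 201.5 | 17.3 | 67 |
| Esculin | 697.7 | 202.6 | 107.3 | 1.7 | 92.6 | 2.3 | 77.4 | 1.689 | 262.8 | 30.7 | 146 |
| Gravelliferone | 454.3 | 267.1 | 74.1 | 1.1 | 43.3 | 1.2 | 87.5 | 1.569 | 184.9 | 34.7 | 47 |
| Herniarin | 335.3 | 141.1 | 57.8 | 1.2 | 44.4 | 0.7 | 46.4 | 1.572 | 138.6 | 18.4 | 36 |
| Imperatorin | 448.3 | 217.5 | 70.7 | 1.2 | 46.7 | 1.1 | 75.0 | 1.606 | 224.9 | 29.7 | 49 |
| Isobergapten | 412.4 | 158.0 | 66.5 | 1.4 | 52.0 | 1.0 | 56.6 | 1.635 | 203.2 | 22.4 | 49 |
| Isopimpinellin | 448.7 | 182.0 | 70.7 | 1.4 | 49.6 | 1.1 | 63.3 | 1.612 | 225.1 | 25.1 | 58 |
| Limettin | 388.1 | 165.1 | 63.7 | 1.2 | 43.0 | 0.9 | 53.1 | 1.557 | 176.3 | 21.1 | 45 |
| Novobiocin | 876.2 | 431.0 | 133.4 | 1.4 | 70.6 | 0.0 | 155.3 | 1.640 | 483.7 | 61.6 | 196 |
| Pimpinellin | 441.0 | 182.0 | 69.8 | 1.4 | 49.6 | 1.1 | 63.3 | 1.612 | 220.5 | 25.1 | 58 |
| Psoralen | 362.6 | 134.0 | 60.9 | 1.4 | 55.5 | 0.8 | 49.9 | 1.667 | 173.1 | 19.8 | 39 |
| Seselin | 403.0 | 186.7 | 65.4 | 1.2 | 42.7 | 0.9 | 62.5 | 1.584 | 170.5 | 24.8 | 36 |
| Skimmin | 632.0 | 204.2 | 98.2 | 1.6 | 81.4 | 1.9 | 75.5 | 1.661 | 239.3 | 29.9 | 126 |
| Umbelliferon | 382.1 | 115.5 | 65.5 | 1.4 | 59.5 | 0.9 | 41.6 | 1.640 | 181.2 | 16.5 | 47 |
| Visnadin | 477.7 | 307.2 | 74.2 | 1.3 | 49.5 | 1.2 | 99.3 | 1.560 | 206.9 | 39.4 | 88 |
| Warfarin | 515.2 | 235.8 | 82.9 | 1.3 | 58.7 | 1.4 | 84.4 | 1.635 | 188.8 | 33.5 | 64 |
| Xanthotoxin | 414.8 | 158.0 | 66.8 | 1.4 | 52.0 | 1.0 | 56.6 | 1.635 | 204.7 | 22.4 | 49 |
| Xanthyletin | 340.0 | 133.9 | 58.4 | 1.3 | 46.4 | 0.7 | 45.4 | 1.593 | 159.5 | 18.0 | 29 |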
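Table 8. The $R^2$ values for the linear regression model.
| Descriptor | D | BP | VP | E | FP | IR | MR | PSA | P | ST | MV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $SC^{c}_{-1}$ | 0.004 | 0.648 | 0.000 | 0.638 | 0.707 | 0.003 | 0.967 | 0.594 | 0.967 | 0.079 | 0.909 |
| $SC^{c}$ | 0.016 | 0.670 | 0.000 | 0.662 | 0.712 | 0.014 | 0.950 | 0.631 | 0.950 | 0.112 | 0.870 |
| $ABC^{c}$ | 0.024 | 0.678 | 0.001 | 0.671 | 0.711 | 0.022 | 0.937 | 0.646 | 0.937 | 0.129 | 0.847 |
| $GA^{c}$ | 0.033 | 0.674 | 0.002 | 0.666 | 0.703 | 0.032 | 0.919 | 0.652 | 0.919 | 0.142 | 0.821 |
| $AG^{c}$ | 0.026 | 0.668 | 0.001 | 0.660 | 0.709 | 0.024 | 0.931 | 0.640 | 0.931 | 0.127 | 0.840 |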
Table 9. The $R^2$ values for the quadratic regression model.
| Descriptor | D | BP | VP | E | FP | IR | MR | PSA | P | ST | MV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $SC^{c}_{-1}$ | 0.008 | 0.649 | 0.508 | 0.641 | 0.826 | 0.005 | 0.967 | 0.599 | 0.967 | 0.086 | 0.909 |
| $SC^{c}$ | 0.024 | 0.672 | 0.513 | 0.667 | 0.840 | 0.020 | 0.951 | 0.637 | 0.951 | 0.123 | 0.872 |
| $ABC^{c}$ | 0.035 | 0.681 | 0.514 | 0.677 | 0.844 | 0.030 | 0.939 | 0.652 | 0.939 | 0.142 | 0.850 |
| $GA^{c}$ | 0.043 | 0.678 | 0.508 | 0.675 | 0.843 | 0.040 | 0.924 | 0.660 | 0.924 | 0.154 | 0.827 |
| $AG^{c}$ | 0.036 | 0.673 | 0.503 | 0.668 | 0.843 | 0.031 | 0.935 | 0.648 | 0.935 | 0.137 | 0.844 |
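Table 10. The $R^2$ values for the cubic regression model.
| Descriptor | D | BP | VP | E | FP | IR | MR | PSA | P | ST | MV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $SC^{c}_{-1}$ | 0.050 | 0.649 | 0.537 | 0.642 | 0.849 | 0.035 | 0.968 | 0.604 | 0.967 | 0.151 | 0.910 |
| $SC^{c}$ | 0.134 | 0.675 | 0.572 | 0.672 | 0.854 | 0.108 | 0.952 | 0.661 | 0.952 | 0.274 | 0.875 |
| $ABC^{c}$ | 0.172 | 0.686 | 0.588 | 0.686 | 0.857 | 0.143 | 0.941 | 0.686 | 0.941 | 0.329 | 0.856 |
| $GA^{c}$ | 0.209 | 0.682 | 0.589 | 0.682 | 0.858 | 0.184 | 0.928 | 0.696 | 0.928 | 0.368 | 0.840 |
| $AG^{c}$ | 0.175 | 0.677 | 0.579 | 0.675 | 0.857 | 0.151 | 0.936 | 0.681 | 0.936 | 0.328 | 0.851 |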
Table 11. Experimental values of $\rho_{bp}$ and $\Delta_{hf}$ for the 22 lower BHs, together with their first/second Zagreb and multiplicative Zagreb connection indices.

| Molecule | $\rho_{bp}$ | $\Delta_{hf}$ | $M^c_1$ | $M^c_2$ | $\Pi^c_1$ | $\Pi^c_2$ |
|---|---|---|---|---|---|---|
| Benzene | 80.1 | 75.2 | 24 | 24 | 4096 | 4096 |
| Naphthalene | 218 | 141 | 64 | 96 | 192,080,000 | 6,879,707,136 |
| Phenanthrene | 338 | 202.7 | 106 | 184 | $7.46807 \times 10^{12}$ | $8.70713 \times 10^{15}$ |
| Anthracene | 340 | 222.6 | 104 | 176 | $6.29408 \times 10^{12}$ | $7.2139 \times 10^{15}$ |
| Chrysene | 431 | 271.1 | 148 | 273 | $2.86774 \times 10^{17}$ | $1.102 \times 10^{22}$ |
| Benzo[a]anthracene | 425 | 277.1 | 146 | 265 | $2.4089 \times 10^{17}$ | $9.13009 \times 10^{21}$ |
| Triphenylene | 429 | 275.1 | 150 | 288 | $2.62144 \times 10^{17}$ | $8.30377 \times 10^{21}$ |
| Tetracene | 440 | 310.5 | 144 | 256 | $2.06244 \times 10^{17}$ | $7.56432 \times 10^{21}$ |
| Benzo[a]pyrene | 496 | 296 | 184 | 376 | $5.35488 \times 10^{20}$ | $5.14147 \times 10^{26}$ |
| Benzo[e]pyrene | 493 | 289.9 | 182 | 363 | $5.15237 \times 10^{20}$ | $5.39122 \times 10^{26}$ |
| Perylene | 497 | 319.2 | 184 | 374 | $5.62449 \times 10^{20}$ | $5.14147 \times 10^{26}$ |
| Anthanthrene | 547 | 323 | 104 | 176 | $6.29408 \times 10^{12}$ | $7.2139 \times 10^{15}$ |
| Benzo[ghi]perylene | 542 | 301.2 | 218 | 466 | $1.04142 \times 10^{24}$ | $3.18346 \times 10^{31}$ |
| Dibenzo[a,c]anthracene | 535 | 348 | 190 | 370 | $8.3236 \times 10^{21}$ | $8.70713 \times 10^{27}$ |
| Dibenzo[a,h]anthracene | 535 | 335 | 188 | 354 | $9.21947 \times 10^{21}$ | $1.15553 \times 10^{28}$ |
| Dibenzo[a,j]anthracene | 531 | 336.3 | 188 | 354 | $9.21947 \times 10^{21}$ | $1.15553 \times 10^{28}$ |
| Picene | 519 | 336.9 | 190 | 362 | $1.10121 \times 10^{22}$ | $1.39471 \times 10^{28}$ |
| Coronene | 590 | 296.7 | 252 | 558 | $1.92829 \times 10^{27}$ | $1.97112 \times 10^{36}$ |
| Dibenzo(a,h)pyrene | 596 | 375.6 | 224 | 456 | $1.70556 \times 10^{25}$ | $5.39122 \times 10^{32}$ |
| Dibenzo(a,i)pyrene | 594 | 366 | 224 | 456 | $1.70556 \times 10^{25}$ | $5.39122 \times 10^{32}$ |
| Dibenzo(a,l)pyrene | 595 | 393.3 | 226 | 473 | $1.62857 \times 10^{25}$ | $4.54885 \times 10^{32}$ |
| Pyrene | 393 | 221.3 | 140 | 270 | $1.5565 \times 10^{16}$ | $5.39122 \times 10^{20}$ |
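
The index columns of Table 11 can be checked by hand for the two smallest molecules. The sketch below assumes that the connection number $\tau(v)$ of a vertex is the number of vertices at distance exactly two from it, and that the four indices are the edge-wise sums and products of $\tau(u) + \tau(v)$ and $\tau(u)\tau(v)$. Equations (1) to (4) are not reproduced in this section, so these forms are an assumption; they do reproduce the tabulated values for benzene and naphthalene. The function names and the vertex labelling of the carbon skeletons are illustrative.

```python
# Sketch: Zagreb and multiplicative Zagreb connection indices of a molecular graph.
from math import prod


def connection_numbers(edges):
    """Map each vertex to the number of vertices at distance exactly two."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    tau = {}
    for v, nbrs in neighbors.items():
        within_two = set().union(*(neighbors[u] for u in nbrs))
        tau[v] = len(within_two - nbrs - {v})  # drop v itself and its direct neighbours
    return tau


def zagreb_connection_indices(edges):
    """Return (M1, M2, Pi1, Pi2) under the edge-wise definitions assumed above."""
    tau = connection_numbers(edges)
    sums = [tau[u] + tau[v] for u, v in edges]
    products = [tau[u] * tau[v] for u, v in edges]
    return sum(sums), sum(products), prod(sums), prod(products)


# Hydrogen-suppressed carbon skeletons.
benzene = [(i, (i + 1) % 6) for i in range(6)]                   # a 6-cycle
naphthalene = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5)]  # 10-cycle plus the shared bond

print(zagreb_connection_indices(benzene))      # (24, 24, 4096, 4096)
print(zagreb_connection_indices(naphthalene))  # (64, 96, 192080000, 6879707136)
```

Python integers have arbitrary precision, so the multiplicative indices stay exact even for the larger molecules, where the table reports rounded values in scientific notation.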
Table 12. MLCs $\rho = \rho(\rho_{bp}, \Delta_{hf}; CI)$ of connection-related graphical invariants with both $\rho_{bp}$ and $\Delta_{hf}$ for the 22 lower BHs.

| Connection-Based Index | $\rho = \rho(\rho_{bp}, \Delta_{hf}; CI)$ |
|---|---|
| $M^c_1$, Equation (1) | 0.9138 |
| $M^c_2$, Equation (2) | 0.8938 |
| $\Pi^c_1$, Equation (3) | 0.6783 |
| $\Pi^c_2$, Equation (4) | 0.6793 |
| $R^c$, Equation (5) | 0.7947 |
| $R^c_{1}$, Equation (6) with $\beta = 1$ | 0.8938 |
| $R^c_{-1}$, Equation (6) with $\beta = -1$ | 0.9164 |
| $R^c_{2}$, Equation (6) with $\beta = 2$ | 0.8412 |
| $R^c_{-2}$, Equation (6) with $\beta = -2$ | 0.7950 |
| $SC^c$, Equation (7) | 0.9318 |
| $SC^c_{1}$, Equation (8) with $\beta = 1$ | 0.9138 |
| $SC^c_{-1}$, Equation (8) with $\beta = -1$ | 0.9336 |
| $SC^c_{2}$, Equation (8) with $\beta = 2$ | 0.8927 |
| $SC^c_{-2}$, Equation (8) with $\beta = -2$ | 0.9140 |
| $ABC^c$, Equation (9) | 0.9310 |
| $AZI^c$, Equation (10) | 0.8892 |
| $GA^c$, Equation (11) | 0.9279 |
| $AG^c$, Equation (12) | 0.9272 |
| $RR^c$, Equation (13) | 0.9143 |
| $RRR^c$, Equation (14) | 0.9097 |
| $SO^c$, Equation (15) | 0.9132 |
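
One way to read a correlation of an index with two properties at once is as a multiple correlation coefficient. The sketch below assumes that reading: the index is treated as the variable explained jointly by $\rho_{bp}$ and $\Delta_{hf}$, and the coefficient is obtained from the three pairwise Pearson correlations through the standard two-predictor formula. This is not stated in this section to be the computation behind Table 12, so the printed value need not match the tabulated one. The function names are illustrative, and the data are the $M^c_1$, $\rho_{bp}$, and $\Delta_{hf}$ columns as printed in Table 11.

```python
# Sketch: two-predictor multiple correlation coefficient from pairwise Pearson correlations.
from math import sqrt

mc1 = [24, 64, 106, 104, 148, 146, 150, 144, 184, 182, 184,
       104, 218, 190, 188, 188, 190, 252, 224, 224, 226, 140]
rho_bp = [80.1, 218, 338, 340, 431, 425, 429, 440, 496, 493, 497,
          547, 542, 535, 535, 531, 519, 590, 596, 594, 595, 393]
delta_hf = [75.2, 141, 202.7, 222.6, 271.1, 277.1, 275.1, 310.5, 296, 289.9, 319.2,
            323, 301.2, 348, 335, 336.3, 336.9, 296.7, 375.6, 366, 393.3, 221.3]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def multiple_correlation(target, first, second):
    """Multiple correlation of `target` with the pair (`first`, `second`)."""
    r1 = pearson(target, first)
    r2 = pearson(target, second)
    r12 = pearson(first, second)
    r_squared = (r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 * r12)
    return sqrt(r_squared)


mlc = multiple_correlation(mc1, rho_bp, delta_hf)
print(f"multiple correlation of M^c_1 with (rho_bp, delta_hf): {mlc:.4f}")
```

By construction the multiple correlation is never smaller than the absolute correlation with either property alone, which makes it a convenient single score for ranking indices against two properties.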
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Hayat, S.; Wazzan, S. A Computational Approach to Predictive Modeling Using Connection-Based Topological Descriptors: Applications in Coumarin Anti-Cancer Drug Properties. Int. J. Mol. Sci. 2025, 26, 1827. https://doi.org/10.3390/ijms26051827
