Article

A Graph-Theoretical Approach to Bond Length Prediction in Flavonoids Using a Molecular Graph Model

by Moster Zhangazha 1,*, Alex Somto Arinze Alochukwu 1,2, Elizabeth Jonck 2, Ronald John Maartens 2, Eunice Mphako-Banda 2, Simon Mukwembi 2 and Farai Nyabadza 1,3
1 Department of Mathematics and Applied Mathematics, University of Johannesburg, Johannesburg 2006, South Africa
2 School of Mathematics, University of the Witwatersrand, Johannesburg 2050, South Africa
3 Institute of Applied Research and Technology, Emirates Aviation University, Dubai International Academic City, Dubai 53044, United Arab Emirates
* Author to whom correspondence should be addressed.
Math. Comput. Appl. 2026, 31(1), 9; https://doi.org/10.3390/mca31010009
Submission received: 17 September 2025 / Revised: 13 November 2025 / Accepted: 17 November 2025 / Published: 9 January 2026

Abstract

The accurate determination of bond lengths is fundamental to understanding molecular geometry and the physicochemical behavior of chemical compounds. However, obtaining these measurements is often challenging, as both experimental techniques and advanced quantum-chemical methods are complex, computationally demanding, and costly to apply across diverse molecular systems. In this work, we present a novel graph-theoretical model for predicting bond lengths in flavonoid molecules based on molecular descriptors derived from atomic and topological parameters. By integrating atomic electronegativity with graph-based descriptors, such as the weighted second-order neighborhood, the proposed model predicts the bond lengths of luteolin with a coefficient of determination of $R^2 = 0.990$. This approach offers a computationally efficient and highly accurate alternative to conventional experimental and theoretical methods, providing a practical framework for bond length estimation when experimental data are unavailable.
MSC:
05C12; 05C90; 92E10

1. Introduction

Understanding molecular geometry is essential for elucidating chemical structure and bonding. In particular, the presence of specific bonds, such as N–H and C=O, plays a fundamental role in biological processes [1]. These bonds influence molecular size, shape, and reactivity, making accurate determination of bond lengths crucial.
Bond length, defined as the distance between the nuclei of two covalently bonded atoms, underpins much of modern chemistry. Experimental determination of bond lengths at the atomic scale remains challenging, with historical data being relatively scarce. Before 1987, most experimental bond-length measurements were obtained via X-ray and neutron diffraction, microwave spectroscopy, and electron diffraction; for a comprehensive compilation of these early data, see Allen et al. [2]. More recent technological advances have enabled alternative techniques; see, for instance, Khalaji et al. [3].
The theoretical investigation of bond lengths dates back to the early 20th century. As early as 1920, Wyckoff and Huggins investigated bond length data in relation to covalent radii, highlighting both the effectiveness and the inherent limitations of the approach [4]. A detailed historical overview is provided by Pauling [5], who proposed a quantitative relationship between bond length and bond order. In Pauling’s formulation, bond order is defined as the ratio of the total number of bonds to the number of bond groups, yielding remarkably accurate predictions of bond lengths. The following two models, originally presented by Pauling [6] and Coulson [7], remain widely used:
$$R_{ij}^{P} = R_1 - (R_1 - R_2)\,\frac{F\,P_{ij}^{P}}{(F - 1)\,P_{ij}^{P} + 1}, \tag{1}$$
where $R_1$ and $R_2$ are the single- and double-bond lengths, respectively, $F$ is the force constant ratio, and $P_{ij}^{P}$ is the Pauling bond order. Typical parameter values are $R_1 = 1.540$ Å, $R_2 = 1.330$ Å, and $F = 3$.
$$R_{ij}^{C} = s - \frac{s - d}{1 + \dfrac{f_s}{f_d}\,\dfrac{1 - P_{ij}^{C}}{P_{ij}^{C}}}, \tag{2}$$
where $s$ and $d$ denote single- and double-bond lengths, $f_s$ and $f_d$ their force constants (typically $2.48 \times 10^{5}$ and $4.90 \times 10^{5}$ dyn/cm, respectively), and $P_{ij}^{C}$ the Coulson bond order.
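As a minimal sketch, the two bond-order relations can be evaluated directly. The function names are illustrative, and reusing 1.540 Å and 1.330 Å as the Coulson single- and double-bond lengths is an assumption made here for convenience; the text quotes those values only for the Pauling form.

```python
def pauling_bond_length(p, r1=1.540, r2=1.330, f=3.0):
    """Bond length in angstroms from a Pauling bond order p in [0, 1]."""
    return r1 - (r1 - r2) * (f * p) / ((f - 1.0) * p + 1.0)


def coulson_bond_length(p, s=1.540, d=1.330, fs=2.48e5, fd=4.90e5):
    """Bond length in angstroms from a Coulson bond order p in [0, 1].

    s and d default to the Pauling values above (an assumption);
    fs and fd are the force constants quoted in the text (dyn/cm).
    """
    if p <= 0.0:
        # Limiting case: (1 - p) / p grows without bound, so the
        # correction term vanishes and the pure single-bond length remains.
        return s
    return s - (s - d) / (1.0 + (fs / fd) * (1.0 - p) / p)


# Both relations interpolate between the single-bond length (order 0)
# and the double-bond length (order 1).
for order in (0.0, 0.5, 1.0):
    print(order, pauling_bond_length(order), coulson_bond_length(order))
```

Both functions return the single-bond length at bond order 0 and the double-bond length at bond order 1, which is a quick sanity check on the reconstructed formulas.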
In 1941, Schomaker and Stevenson introduced an empirical formula based on electronegativity differences [8]:
$$d_{AB} = r_A + r_B - C\,|x_A - x_B|, \tag{3}$$
where $r_A$ and $r_B$ are covalent radii in picometers, $x_A$ and $x_B$ their electronegativities, and $C = 9$ pm. While Equation (3) generally agrees with experiments, it relies heavily on a single fluorine radius and does not consider other bonding factors.
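The Schomaker–Stevenson correction is simple enough to sketch in a few lines. The function name is illustrative, and the radii in the example call are placeholder inputs chosen only to show the arithmetic; they are not the values used in this study. The electronegativities are the Pauling values for carbon and oxygen.

```python
def schomaker_stevenson(r_a, r_b, x_a, x_b, c=9.0):
    """Estimated bond length in pm.

    r_a, r_b: covalent radii in pm; x_a, x_b: electronegativities;
    c: empirical constant in pm (9 pm in the text).
    """
    return r_a + r_b - c * abs(x_a - x_b)


# Placeholder radii (pm) with Pauling electronegativities for C and O.
print(schomaker_stevenson(77.0, 73.0, 2.55, 3.44))

# For identical atoms the electronegativity term vanishes and the
# estimate reduces to the plain sum of covalent radii.
print(schomaker_stevenson(77.0, 77.0, 2.55, 2.55))
```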
Numerous experimental studies have investigated bond lengths in biologically and chemically significant molecules. For instance, Harsasius examined the geometry of uracil and its methylated derivatives, while Rossi analyzed the crystal structure of quercetin [9]. Despite this extensive data, several semi-empirical methods (e.g., MINDO/2, CNDO/2, STO-3G) have been criticized for their limited accuracy [1]. As a result, theoretical approaches that correlate bond lengths with molecular properties such as electronegativity, covalent radii, and bond order have gained increasing attention [10,11].
Density Functional Theory (DFT) has emerged as the standard method for predicting molecular bond lengths, typically achieving accuracies within 0.01–0.02 Å of experimental values for organic compounds. Another widely used approach is the Parametric Method 6 (PM6), a semi-empirical quantum-chemical method commonly applied for bond length optimization and molecular geometry evaluation. Although both methods provide high accuracy, they remain computationally intensive for larger molecular systems. Consequently, while DFT is preferred when sufficient computational resources are available, graph-theoretic approaches offer a computationally efficient alternative that retains chemical interpretability.
More recently, the application of graph theory to chemical structures has offered new insights. In this framework, atoms are represented as vertices and covalent bonds as edges, allowing the use of topological indices to predict molecular geometry and reactivity.
The graphical representation of a compound is frequently referred to as a chemical graph or molecular graph. See, for example, Figure 1.
The depiction of chemical compounds through molecular graphs, first proposed by Sir Arthur Cayley in 1874 [13], serves as the primary application of graph theory in the field of chemistry. This depiction connects physical characteristics and biological functions with the analogues.
In this study, we predict the bond lengths of luteolin, a compound in the flavonoid group, using the experimental X-ray bond length data presented by Cox et al. [14]. We use graph theory to build a multi-variable model of the bond lengths between atoms in luteolin. The model is developed from the parameters that correlate most strongly with bond length, and it provides an alternative means of predicting bond lengths without conducting experiments.
This paper is organized as follows: The subsequent section introduces the graph theoretic parameters. Section 3 details the methodology employed, and the effectiveness and applicability of the model are evaluated in Section 4. Finally, Section 5 presents the conclusion and discussion of this paper.

2. Graph and Chemical Parameters

This section outlines the fundamental graph-theoretical and chemical parameters employed in the development of the proposed models.
A graph $G = (V, E)$ is a mathematical structure comprising a finite set of vertices $V$ and a set of edges $E$, where each edge is a two-element subset of $V$. The degree of a vertex $v$, denoted $\deg(v)$, is the number of edges incident to $v$. The distance between two vertices $u$ and $v$, written as $d_G(u, v)$, is defined as the length of the shortest path connecting $u$ and $v$ in $G$. The eccentricity of a vertex $v$, denoted $ec(v)$, is the greatest distance from $v$ to any other vertex in the graph.
A vertex $u$ is said to be a neighbor of vertex $v$ if there exists an edge between them, and the set of all such neighbors is called the neighborhood of $v$, denoted by $N_G(v)$. More generally, the $k$th neighborhood, $N_G^k(v)$, refers to the set of all vertices at a distance of at most $k$ from $v$.
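These definitions translate directly into a breadth-first search. The sketch below uses a small toy graph rather than the luteolin graph, and it assumes a connected graph stored as an adjacency dictionary. Excluding $v$ itself from its own $k$th neighborhood is a choice made here so that the first neighborhood coincides with $N_G(v)$.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def degree(adj, v):
    """Number of edges incident to v."""
    return len(adj[v])


def eccentricity(adj, v):
    """Greatest distance from v to any other vertex (connected graph assumed)."""
    return max(bfs_distances(adj, v).values())


def kth_neighborhood(adj, v, k):
    """Vertices at distance between 1 and k from v (v itself is excluded here)."""
    return {u for u, d in bfs_distances(adj, v).items() if 1 <= d <= k}


# Toy graph: path a-b-c-d with an extra pendant vertex e attached to c.
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c"],
    "e": ["c"],
}
print(degree(adj, "c"))               # 3
print(eccentricity(adj, "a"))         # 3
print(kth_neighborhood(adj, "a", 2))  # the set containing b and c
```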
The atomic radius of an atom $v_i$ is defined as the minimum distance between its nucleus and outermost electron shell. The covalent radius $r_i$ of an atom $v_i$ is defined as half the distance between the nuclei of two identical atoms bonded together covalently. Data for covalent radius was obtained from WebElements, maintained by Mark Winter at the University of Sheffield (UK) [15]. For a bond between atoms $v_i$ and $v_j$, we define $\kappa = r_i + r_j$ as the sum of their covalent radii. Electronegativity, $e_i$, refers to an atom’s tendency to attract bonding electrons in a covalent bond. Pauling’s electronegativity was used in this research, and the data were obtained from the National Center for Biotechnology Information (NCBI) website [16]. For atoms $v_i$ and $v_j$ forming a bond, we define:
(i) $\alpha = e_i \cdot e_j$, the product of their electronegativities,
(ii) $\delta = (e_i \cdot ec(v_i)) + (e_j \cdot ec(v_j))$, the sum of weighted eccentricities,
(iii) $\sigma = (e_i \cdot ec(v_i)) \cdot (e_j \cdot ec(v_j))$, the product of weighted eccentricities.
Here, the term weighted eccentricity is defined as the product of an atom’s eccentricity in the chemical graph and its electronegativity.
Valency describes an atom’s bonding capacity and is the number of electrons it must gain or lose to achieve a stable configuration. For simplicity, a single bond is denoted by 1, a double bond by 2, and a triple bond by 3. We define a valency-based parameter $\beta = 2/\lambda$, which helps identify bond types (single, double, or triple bonds), where $\lambda$ represents the bond order, taking values of 1, 2, or 3.
We also introduce $k$th-neighborhood-based descriptors:
(i) $\nu_k$, the weighted $k$th neighborhood of the edge $\{v_i, v_j\}$ with respect to electronegativity,
(ii) $\gamma_k$, the weighted $k$th neighborhood of the edge $\{v_i, v_j\}$ with respect to covalent radii.
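The edge descriptors above can be sketched as follows. Two points are assumptions rather than statements from the text: the valency parameter is taken as $\beta = 2/\lambda$ (read from the typeset expression and consistent with its later description as varying inversely with bond order), and the weighted neighborhood is computed as a plain sum of atomic weights over a supplied vertex set, since no explicit formula for $\nu_k$ or $\gamma_k$ is given. The eccentricities in the example are hypothetical.

```python
def edge_descriptors(e_i, e_j, ecc_i, ecc_j, bond_order):
    """Descriptors for the bond {v_i, v_j}.

    e_i, e_j: electronegativities; ecc_i, ecc_j: graph eccentricities;
    bond_order: 1, 2, or 3.
    """
    w_i = e_i * ecc_i  # weighted eccentricity of v_i
    w_j = e_j * ecc_j  # weighted eccentricity of v_j
    return {
        "alpha": e_i * e_j,          # product of electronegativities
        "delta": w_i + w_j,          # sum of weighted eccentricities
        "sigma": w_i * w_j,          # product of weighted eccentricities
        "beta": 2.0 / bond_order,    # ASSUMED form of the valency parameter
    }


def weighted_neighborhood(vertices, weight):
    """ASSUMED definition: sum of atomic weights over a set of vertices.

    Pass electronegativities as weights for a nu_k-style descriptor,
    or covalent radii for a gamma_k-style descriptor.
    """
    return sum(weight[v] for v in vertices)


# A C=O bond with Pauling electronegativities and hypothetical eccentricities.
desc = edge_descriptors(e_i=2.55, e_j=3.44, ecc_i=5, ecc_j=6, bond_order=2)
print(desc)

# Hypothetical vertex set standing in for a second neighborhood.
electronegativity = {"C1": 2.55, "O1": 3.44, "H1": 2.20}
print(weighted_neighborhood({"C1", "O1", "H1"}, electronegativity))
```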

3. Methodology and Model Development

3.1. Model Quality Assessment

The reliability and predictive performance of the proposed multiple linear regression (MLR) models are typically assessed using a combination of statistical measures and diagnostic checks. One of the fundamental indicators is the coefficient of determination ($R^2$), which represents the proportion of variance in the dependent variable that is explained by the independent variables in the model. The $R^2$ value ranges from 0 to 1 and is calculated using the following formula:
$$R^2 = 1 - \frac{RSS}{TSS},$$
where $RSS$ represents the residual sum of squares and $TSS$ denotes the total sum of squares. An $R^2$ value close to 1 indicates a strong explanatory power, while values near 0 suggest limited predictive capability. However, since $R^2$ can increase with the addition of more predictors, even if they are not meaningful, the adjusted $R^2$ is also used to penalize model complexity and provide a more robust measure of model fit. Additional metrics such as the root mean square error (RMSE), mean absolute error (MAE), and standard error of the estimate are employed to offer further insight into the model’s accuracy by quantifying the average magnitude of prediction errors. In the model development, multicollinearity among predictors is examined using the correlation matrix and the variance inflation factor (VIF), with high VIF values indicating redundancy that may compromise coefficient estimates.
Residual analysis is also performed to check for homoscedasticity (constant variance of errors), linearity, independence, and normality of residuals. Graphical tools such as residuals versus fitted values plots, residuals versus explanatory variables plots, Q–Q plots, and histograms of the residuals are employed to identify potential violations of assumptions or influential data points that may distort the model.
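The fit metrics named in this subsection can be computed in a few lines. The sketch below uses made-up observed and predicted values purely to illustrate the arithmetic; the function name and dictionary keys are illustrative.

```python
import math


def regression_metrics(y_true, y_pred, n_predictors):
    """R^2, adjusted R^2, RMSE, and MAE for a set of predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    rss = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    r2 = 1.0 - rss / tss
    # The adjustment penalizes each additional predictor.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    rmse = math.sqrt(rss / n)
    mae = sum(abs(y - f) for y, f in zip(y_true, y_pred)) / n
    return {"r2": r2, "adj_r2": adj_r2, "rmse": rmse, "mae": mae}


# Made-up values, not bond-length data.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], n_predictors=1)
print(metrics)
```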

3.2. Model Development

We present two models and compare their predictions. Consider a naturally occurring compound, luteolin, whose molecular structure is shown in Figure 2 with carbon atoms numbered in black, hydrogen atoms numbered in blue, and oxygen atoms numbered in red.
The molecular graph corresponding to the structure of luteolin is illustrated in Figure 3.
The general observation from correlation analysis is that bond length is highly correlated with the sum of the covalent radii of the atoms making up a bond, the weighted second neighborhood, and their respective reciprocals, which is summarized in Table 1 below.
Having first quantified the correlations between the dependent variable and each predictor, we then examine the interrelationships among the independent variables by constructing a correlation matrix and presenting it as a heat map shown in Figure 4.
From the correlation heat map, several predictors exhibit strong correlations, indicating a risk of multicollinearity and potential overfitting if all are retained. To address this, we performed a backward-elimination procedure, iteratively removing the least significant variables until only those with p-values below 0.05 (as reported under “$P > |t|$” in the stepwise regression output) remained in the final model.
The dataset provided by Cox [14] is employed to estimate the constant parameters of the proposed multiple linear regression model. To evaluate the model’s predictive performance, data from the edges C3–C4, C12–C13, O1–C7, O5–H3, C14–H10, and O3–H7 are excluded during the model development phase. The selected data represent bonds located on the molecular rings and bonds located outside the rings in equal numbers. These withheld data points are subsequently used to test the model’s ability to generalize to unseen cases.
Two predictive models were developed to achieve a coefficient of determination ( R 2 ) as close to 1 as possible. Model 1 was constructed using standard linear regression techniques, while Model 2 employed a domain-specific approach by partitioning the compound into two distinct structural regions and applying the Cobb–Douglas production function to each. The performance of both models was assessed by evaluating their predictive accuracy on a subset of edges excluded from the training dataset.

3.2.1. Model 1

We begin by representing the compound as a molecular graph, as illustrated in Figure 3, and we formulate a multi-variable linear regression model of the following form:
$$Y = k_0 + k_1 x_1 + k_2 x_2 + \cdots + k_n x_n + \epsilon,$$
where:
(a) $Y$ denotes the bond length (dependent variable),
(b) $x_1, x_2, \ldots, x_n$ are the independent variables (molecular descriptors),
(c) $k_0, k_1, \ldots, k_n$ are the regression coefficients to be estimated,
(d) $\epsilon$ represents the error term.
To identify the most relevant predictors and mitigate multicollinearity, a backward elimination procedure was applied in conjunction with stepwise regression. This process yielded a reduced and optimized model of the following form:
$$Y = 2.3833 + 0.0761 \cdot \beta - 1.8500 \cdot \frac{1}{\kappa} + 0.0053 \cdot \nu_2,$$
where:
(a) $Y$ is the bond length,
(b) $\beta$ is a valency-based parameter,
(c) $\kappa$ is the sum of the covalent radii of the atoms forming the bond,
(d) $\nu_2$ is the weighted second neighborhood descriptor.
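The fitted Model 1 can be evaluated with a one-line function. The coefficients are those reported for the reduced model; the function name is illustrative, the input values in the example are hypothetical, and taking $\kappa$ in ångströms is an assumption based on the magnitude of the coefficients.

```python
def model1_bond_length(beta, kappa, nu2):
    """Predicted bond length from the reduced Model 1.

    beta: valency-based parameter; kappa: sum of covalent radii
    (ASSUMED to be in angstroms); nu2: weighted second neighborhood.
    """
    return 2.3833 + 0.0761 * beta - 1.8500 / kappa + 0.0053 * nu2


# Hypothetical descriptor values, chosen only to show the calculation.
print(model1_bond_length(beta=2.0, kappa=1.5, nu2=20.0))
```

Because the radius term enters as $-1.8500/\kappa$, the prediction grows as $\kappa$ grows, and a larger $\beta$ (lower bond order) also lengthens the predicted bond.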
After removing statistically insignificant variables, we calculated the variance inflation factors (VIFs) for the remaining predictors. The results, shown in Table 2, indicate low multicollinearity among the variables.
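One common way to obtain variance inflation factors is to read them off the diagonal of the inverse of the predictors' correlation matrix, which is equivalent to computing $1/(1 - R_j^2)$ for each predictor regressed on the others. The sketch below assumes NumPy is available and uses small made-up predictor columns; it is not the authors' own computation.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (rows are observations, columns are predictors)."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)   # predictor correlation matrix
    return np.diag(np.linalg.inv(corr))   # diagonal of its inverse gives the VIFs


# Two uncorrelated made-up predictors: both VIFs equal 1.
X_uncorrelated = np.column_stack([[1, 2, 3, 4, 5], [1, -1, 1, -1, 1]])
print(variance_inflation_factors(X_uncorrelated))

# Two made-up predictors with correlation 0.8: both VIFs equal 1 / (1 - 0.64).
X_correlated = np.column_stack([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]])
print(variance_inflation_factors(X_correlated))
```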
The final regression model demonstrates strong explanatory power with:
$$R^2 = 0.9898, \qquad \text{Adjusted } R^2 = 0.9884.$$
Moreover, the OLS diagnostics presented in Table 3 confirm the model’s overall statistical significance and robustness.
Table 3. Ordinary Least Squares (OLS) regression results.
| Variable | Coefficient | Std. Error | t-Statistic | p-Value | CI Lower (0.025) | CI Upper (0.975) |
|---|---|---|---|---|---|---|
| valency parameter, $\beta$ | 0.0761 | 0.021 | 3.6213 | 0.0014 | 0.0326 | 0.1195 |
| $1/(r_1 + r_2)$ | −1.85 | 0.0715 | −25.8593 | 0.0 | −1.998 | −1.702 |
| weighted 2nd Nbd, $\nu_2$ | 0.0053 | 0.0012 | 4.2483 | 0.0003 | 0.0027 | 0.0078 |
| const | 2.3833 | 0.0782 | 30.4674 | 0.0 | 2.2214 | 2.5451 |

| Model Statistics | Value |
|---|---|
| R-squared | 0.9898 |
| Adj. R-squared | 0.9884 |
| F-statistic | 742.56 |
| Prob (F-statistic) | $5.0439 \times 10^{-23}$ |
| Log-Likelihood | 65.947 |
| AIC | −123.894 |
| BIC | −118.71 |
The model exhibits a high degree of explanatory power, with an $R^2$ value of 0.9898 and an adjusted $R^2$ of 0.9884, indicating that approximately 99% of the variability in bond length is accounted for by the predictors. The F-statistic of 742.6, with a p-value of approximately $5.04 \times 10^{-23}$, confirms the overall significance of the model.
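The reported summary statistics are mutually consistent, which can be checked with standard formulas. The sample size $n = 27$ used below is an inference, not a figure stated in the text: the molecular graph of luteolin (C15H10O6, 31 atoms, three rings) has 33 bonds, and six were withheld for testing.

```python
import math

n = 27             # INFERRED: 33 bonds in luteolin minus the 6 withheld edges
k = 4              # estimated parameters: three slopes plus the intercept
r2 = 0.9898        # reported R-squared
loglik = 65.947    # reported log-likelihood

adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)   # adjusted R-squared
aic = 2 * k - 2 * loglik                        # Akaike information criterion
bic = k * math.log(n) - 2 * loglik              # Bayesian information criterion

print(round(adj_r2, 4), round(aic, 3), round(bic, 2))
```

The AIC reproduces the tabulated value exactly, and the BIC agrees to the reported two decimals; the adjusted $R^2$ matches to within the rounding of the reported $R^2$.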
All three predictors were found to be statistically significant at the 1% level. The coefficient for the valency parameter, $\beta$, which is inversely proportional to the bond order, is positive ($0.0761$, $p = 0.001$). This positive coefficient indicates that as the bond order increases, the valency parameter decreases, leading to a decrease in predicted bond length. This aligns with fundamental chemical principles, where higher bond orders correspond to stronger, shorter bonds. Thus, the model correctly captures the expected inverse relationship between bond order and bond length.
The inverse sum of covalent radii exhibits a strong negative association with bond length ($-1.8500$, $p < 0.001$). Because $1/\kappa$ decreases as $\kappa$ grows, this negative coefficient means that the predicted bond length increases with the sum of covalent radii, in line with the chemical intuition that larger atoms form longer bonds. Additionally, the weighted second neighborhood parameter, which captures topological and electronic effects surrounding the bond, has a positive and significant contribution ($0.0053$, $p < 0.001$).
The regression intercept was estimated at 2.3833, and the model showed favorable information criteria (AIC = −123.9, BIC = −118.7). These results demonstrate that the selected descriptors not only have strong statistical support but also convey meaningful physical interpretations, making the model robust and interpretable for predicting bond lengths in molecular systems.
To assess the generalization of the fitted OLS model, 5-fold cross-validation was performed on the selected feature set. In each fold, the data were randomly partitioned (shuffle seed = 42) into 80% training and 20% testing, and the mean squared error (MSE) and coefficient of determination $R^2$ were recorded.
Across the five folds, the MSE ranged from 0.0003 to 0.0010 (mean = 0.0006, ±0.0002) and $R^2$ ranged from 0.9646 to 0.9940 (mean = 0.9811, ±0.0113). These narrow intervals indicate that the model’s predictive performance is both strong and stable across different data splits. The consistency of the $R^2$ value also suggests that multicollinearity and over-fitting are unlikely to bias the estimates. A full breakdown of fold-level metrics is given in Table 4.
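The cross-validation procedure can be sketched with NumPy alone. The data below are synthetic, and the shuffling uses NumPy's generator, so the splits will not match those of the original analysis even with the same seed value; the function name is illustrative.

```python
import numpy as np


def kfold_ols(X, y, n_splits=5, seed=42):
    """Return a list of (MSE, R^2) pairs, one per held-out fold."""
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept is estimated too.
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Ordinary least squares fit on the training part only.
        coef, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
        resid = y[test_idx] - X[test_idx] @ coef
        mse = float(np.mean(resid ** 2))
        tss = float(np.sum((y[test_idx] - y[test_idx].mean()) ** 2))
        scores.append((mse, 1.0 - float(np.sum(resid ** 2)) / tss))
    return scores


# Synthetic example: 27 observations, 3 predictors, small noise.
demo_rng = np.random.default_rng(0)
X_demo = demo_rng.normal(size=(27, 3))
y_demo = 1.0 + X_demo @ np.array([0.5, -2.0, 0.1]) + demo_rng.normal(scale=0.01, size=27)
for mse, r2 in kfold_ols(X_demo, y_demo):
    print(f"MSE={mse:.6f}  R2={r2:.4f}")
```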
After fitting and validating the final regression model, we proceed to verify that the classical assumptions underlying linear regression are satisfied. In particular, we assess the following:
  • Zero mean of the errors,
  • Errors have constant variance, i.e., homoscedasticity,
  • Normality of errors.
Potential outliers are identified and evaluated for exclusion. There are different ways of checking whether these assumptions are met; in this study, we primarily employ diagnostic plots (e.g., residual vs. fitted, Q–Q plots) to visually confirm that these assumptions hold. Figure 5 shows the residuals plotted against the fitted values.
Figure 6 displays a Q–Q plot comparing the sample quantiles of the residuals with the corresponding theoretical quantiles.
The points in the Q–Q plot lie closely along the 45-degree reference line, indicating that the sample quantiles of the residuals agree closely with those expected under a normal distribution.
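The coordinates behind a normal Q–Q plot can be generated with the standard library alone. This sketch standardizes the residuals and pairs them with theoretical normal quantiles at plotting positions $(i + 0.5)/n$; the choice of plotting position and the residual values are assumptions made for illustration.

```python
from statistics import NormalDist, mean, pstdev


def qq_points(residuals):
    """Pairs of (theoretical normal quantile, standardized sample quantile)."""
    n = len(residuals)
    mu, sd = mean(residuals), pstdev(residuals)
    sample = sorted((r - mu) / sd for r in residuals)
    # ASSUMED plotting positions (i + 0.5) / n for i = 0, ..., n - 1.
    theory = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theory, sample))


# Made-up residuals; points near the line y = x suggest approximate normality.
points = qq_points([-0.02, 0.01, 0.0, 0.03, -0.01])
for t, s in points:
    print(f"{t:+.3f}  {s:+.3f}")
```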
Lastly, a histogram of residuals is plotted to check the distribution of the residuals, as shown in Figure 7.
The residuals exhibit the characteristic bell-shaped curve of a normal distribution, thereby confirming that the model’s assumption of normally distributed errors holds.

3.2.2. Model 2

We examine luteolin by dividing its molecular graph into two main components: ring edges and terminal edges. Ring bonds form closed loops within the molecule and share electrons across the ring, which makes their bond lengths more uniform. In contrast, terminal bonds are found at the ends of the molecule, where the bonding is more localized and mainly depends on the types of atoms involved. In flavonoids like luteolin, rings show strong electron sharing, while terminal parts have more isolated bonds, such as the carbon–oxygen double bond on ring C, which behaves more like an end group than part of the ring system. As illustrated in Figure 2, luteolin consists of two benzene rings (labeled A and B) and an oxygen-containing heterocyclic ring (labeled C) with a carbon–oxygen double bond and two hydroxyl substituents on each benzene ring. Accordingly, we define:
1.
Ring edges: All bonds forming the rings A, B, and C.
2.
Terminal edges: The carbon–oxygen single bonds external to the C ring, the oxygen–hydrogen bonds, the carbon–hydrogen bonds, and the carbon–oxygen double bond located on the C ring.
Each component is analyzed separately under the hypothesis that the graph parameters defined above govern bond lengths. Accordingly, we introduce two multiple-regression models with interaction terms, one for each component, to quantify the influence of these parameters. The constants in our models were estimated using the dataset of Zheng [17]. For the ring edges, we propose a multiple linear regression model that includes a three-way interaction term, allowing each parameter’s effect to depend simultaneously on the other two. The model for the ring edges is specified as follows:
$$f(\beta, \sigma, \nu_2) = k_0 + k_1 \nu_2 + k_2 \beta + k_3 \sigma^2 + k_4\, \beta \sigma \nu_2 \tag{6}$$
where $\beta$ denotes the valency-based parameter, $\sigma$ represents the product of the weighted eccentricities of the atoms forming the bond, and $\nu_2$ corresponds to the weighted second neighborhood of the respective bond edges, calculated with respect to the electronegativities of the atoms involved. The terms $k_i$, for $i = 0, 1, \ldots, 4$, are model constants.
Fitting the model to the data by Cox [14], we obtain the values of the constants as follows:
$$k_0 = 1.0000, \quad k_1 = 0.2177, \quad k_2 = 3.3376, \quad k_3 = 0.0005, \quad k_4 = 0.0001.$$
For the terminal edges, we propose a multiple linear regression model that incorporates a two-way interaction term, allowing the effect of one descriptor to depend on the other. The model is expressed as follows:
$$f(\kappa, \nu_2) = k_0 + k_1 \kappa + k_2 \nu_2 + k_3\, \kappa \nu_2 \tag{7}$$
where $\kappa$ represents the sum of the covalent radii of the atoms forming the bond, and $\nu_2$ corresponds to the weighted second neighborhood of the respective bond edges, calculated with respect to the electronegativities of the atoms involved. The terms $k_i$, for $i = 0, 1, \ldots, 3$, are model constants.
Fitting the model to the data by Cox [14], we obtain the values of the constants as follows:
$$k_0 = 1.0000, \quad k_1 = 7.0034, \quad k_2 = 0.9558, \quad k_3 = 1.0000.$$
Combining the two models (6) and (7) accurately predicts the bond length of all the edges in the compound luteolin, with $R^2 = 0.9913$, adjusted $R^2 = 0.9897$, and root mean square error $RMSE = 0.0195$.
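The piecewise structure of Model 2 can be sketched as two feature maps and a dispatcher on edge type. The coefficient lists in the example are placeholders, not the fitted constants, and the ring feature for $\sigma$ is taken as $\sigma^2$, which is a reading of the typeset expression. The function names and the dictionary layout of an edge are illustrative.

```python
def ring_features(beta, sigma, nu2):
    """Features for ring edges: intercept, nu2, beta, sigma^2, three-way interaction."""
    return [1.0, nu2, beta, sigma ** 2, beta * sigma * nu2]


def terminal_features(kappa, nu2):
    """Features for terminal edges: intercept, kappa, nu2, two-way interaction."""
    return [1.0, kappa, nu2, kappa * nu2]


def linear_predict(coefficients, features):
    """Dot product of a coefficient list with a feature list."""
    return sum(k * x for k, x in zip(coefficients, features))


def model2_bond_length(edge, ring_coef, terminal_coef):
    """Dispatch on edge type and apply the matching sub-model."""
    if edge["kind"] == "ring":
        feats = ring_features(edge["beta"], edge["sigma"], edge["nu2"])
        return linear_predict(ring_coef, feats)
    feats = terminal_features(edge["kappa"], edge["nu2"])
    return linear_predict(terminal_coef, feats)


# Placeholder coefficients and descriptor values, for illustration only.
ring_coef = [1.0, 0.0, 0.0, 0.0, 0.0]
terminal_coef = [0.5, 0.5, 0.0, 0.0]
print(model2_bond_length({"kind": "ring", "beta": 1.0, "sigma": 2.0, "nu2": 3.0},
                         ring_coef, terminal_coef))
print(model2_bond_length({"kind": "terminal", "kappa": 1.0, "nu2": 2.0},
                         ring_coef, terminal_coef))
```

Keeping the feature maps separate from the coefficients makes it straightforward to refit either sub-model by least squares on its own subset of edges.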
The plots of the fit using the least-squares fitting method and the residuals are given in Figure 8 and Figure 9.
The model fitting plot shows a strong linear relationship between predicted and experimental bond lengths, with an $R^2$ value of 0.9913, indicating that over 99% of the variability in the experimental bond lengths is explained by the model. The points closely follow the identity line, confirming high predictive accuracy.
The residual plot, which displays residuals versus predicted values, helps assess model assumptions. Ideally, residuals should be randomly scattered around zero without any clear pattern. In this case, most residuals are tightly clustered around zero, and no clear pattern is observed.

4. Model Testing and Performance

The predictive performance of the proposed models (Model 1 and Model 2) was benchmarked against the established method of Schomaker and Stevenson [8] and quantum-chemical calculations performed at the Parametric Method 6 (PM6) level using MOPAC. This evaluation was conducted on a hold-out set of six bonds, specifically C3–C4, C12–C13, O1–C7, O5–H3, C14–H10, and O3–H7, which were systematically omitted from the training datasets to provide an independent test of predictive performance. The comparative results are summarized in Table 5.
With only a negligible difference between the two models’ $R^2$ values, we evaluated their predictive performance on the six testing edges relative to Schomaker’s model and the Parametric Method 6 (PM6) using adjusted $R^2$, the root mean square error (RMSE), and the mean absolute error (MAE). Table 6 presents the adjusted $R^2$, RMSE, and MAE for all four models. Models 1 and 2 demonstrate predictive accuracy that is competitive with PM6 and slightly superior to that of Schomaker’s model, with Model 1 yielding the lowest root mean square error and mean absolute error, indicating that it provides the best overall performance.
This comparison demonstrates that Models 1 and 2 represent improved formulations over both the Schomaker–Stevenson model [8] and the PM6 computational method. Their superior performance suggests that the models may have captured additional interatomic interactions within the compound that were not accounted for in the earlier approaches.

Model Validation and Benchmarking

Experimental bond-length data obtained from crystallographic studies of three flavonoids were used to provide an informative validation of the proposed models. Accordingly, three comparative tables were developed to evaluate the bond lengths predicted by Model 1, Model 2, the semi-empirical PM6 method, and the Schomaker–Stevenson model against the corresponding experimental measurements. Statistical performance indicators, including the coefficient of determination ($R^2$), root mean square error (RMSE), and mean absolute error (MAE), were employed to assess predictive accuracy quantitatively.
Tables 7, 8 and 9 provide a comparative evaluation of the predictive performance of Model 1, Model 2, the Schomaker–Stevenson model, and the Parametric Method 6 (PM6) based on bond-length data for quercetin, kaempferol and (−)-epicatechin reported in [9,18,19], respectively. The comparison underscores the relative accuracy and internal consistency of the proposed graph-theoretical models when compared to semi-empirical computational approaches. Notably, Models 1 and 2 exhibit predictive accuracies closely aligned with experimental measurements, demonstrating their robustness and suggesting their potential applicability to other flavonoid compounds with comparable structural complexity.
The results indicate that both proposed models reproduce the experimental geometries with accuracy comparable to, and in some instances exceeding, that of the PM6 method. This suggests that the underlying graph-theoretical framework effectively captures essential structural features of flavonoid systems. However, the current validation is limited by the small number of experimentally characterized compounds. A more statistically robust evaluation would require expanding the dataset to include a broader range of flavonoid structures.
The predictive performance of the models was evaluated based on the adjusted $R^2$, the root mean square error, and the mean absolute error, as summarized in Table 10. The comparative evaluation of the four predictive approaches across three flavonoid compounds demonstrates that both graph-theoretical models, particularly Model 2, exhibit strong predictive accuracy and consistency relative to semi-empirical methods. On average, Model 2 achieved the highest adjusted $R^2$ values and the lowest RMSE and MAE across all datasets, indicating superior alignment with experimental bond lengths. Model 1 also performed competitively, often surpassing the PM6 semi-empirical method in both error metrics. In contrast, the Schomaker–Stevenson model consistently displayed the weakest predictive performance. Overall, the findings suggest that the proposed graph-theoretical framework offers a reliable and broadly applicable method for representing and predicting the structural characteristics of flavonoid compounds exhibiting comparable molecular architectures.
Given the limited availability of experimental bond-length data for the compounds under study, the semi-empirical Parametric Method 6 (PM6) was adopted as a computational benchmark for model validation. The performance of the proposed models was evaluated by comparing the predicted bond lengths against PM6 reference values using standard statistical measures, including the coefficient of determination ($R^2$) and the root mean square error (RMSE). Consistently high correlations demonstrate that the graph-theoretical descriptors effectively reproduce PM6-level geometric trends. Nevertheless, further validation against experimental measurements, once such data become available, remains necessary to establish the absolute predictive accuracy of the models. The predictive capability of the proposed models was further evaluated using an independent validation dataset comprising four flavonoid compounds: apigenin, genistein, hesperetin, and catechin. For each compound, the bond lengths predicted by Model 1 and Model 2 were compared against PM6-optimized geometries to assess the models’ generalization performance across structurally related flavonoids. Since the numerical trends can be clearly interpreted from the tabulated results, graphical figures were deemed unnecessary. The complete results are presented in Table 11, Table 12, Table 13, Table 14 and Table 15.

5. Discussion and Conclusions

This study presents a graph-theoretical framework for estimating bond lengths in molecular systems, demonstrated using flavonoid compounds as a case study. The results highlight the capability of mathematical models to predict key structural parameters with high accuracy, offering a computationally efficient alternative to traditional experimental and quantum-chemical methods. While experimental determination of bond lengths remains resource-intensive, the proposed models provide a reliable means of approximation based on topological and atomic descriptors.
The correlation analysis indicated that the sum of covalent radii, $r_1 + r_2$, was the most significant predictor of bond length, exhibiting a correlation coefficient of $r = 0.9874$, consistent with established physical principles. The weighted second-neighborhood parameter, which quantifies the electronegativity-weighted atomic environment around a bond, also contributed meaningfully by capturing local electronic delocalization effects. Together with bond order, these descriptors formed the basis of Model 1, which achieved a high predictive accuracy ($R^2 = 0.9898$). Model 2, incorporating weighted eccentricity and neighborhood-based parameters, likewise demonstrated strong performance. These results suggest that combining atomic-scale geometric and electronic descriptors within a graph-theoretical framework can yield quantitatively accurate bond-length predictions.
In comparing predictive performance, both models achieved results comparable to the Parametric Method 6 (PM6) and superior to the classical Schomaker–Stevenson formulation. This indicates that graph-theoretic descriptors can effectively approximate quantum-derived bond-length data while offering major advantages in computational simplicity and interpretability. However, these findings are presently limited to the flavonoid compounds examined and should not be generalized beyond molecules with similar structural and electronic characteristics without further validation.
Luteolin was selected as a representative case within the subclass of polyphenolic flavonoids due to its well-characterized structure and availability of experimental bond-length data. Nonetheless, the flavonoid family is structurally diverse, encompassing numerous subclasses with varying degrees of conjugation, hydroxylation, and substitution patterns. Thus, luteolin cannot be considered representative of all flavonoids, but rather of compounds sharing analogous molecular topologies. Future work should systematically evaluate the applicability of the proposed model across broader flavonoid subclasses and other organic compound families.
While the present models demonstrate strong internal consistency and agreement with experimental data, their predictive generality remains to be established. The current dataset is limited in chemical scope, and further testing on larger, structurally diverse molecular datasets will be necessary to assess the robustness of the approach. Consequently, the proposed framework should be viewed as an initial step toward a generalizable model, pending validation across multiple compound classes.
Several refinements could enhance the current formulation. Incorporating geometric parameters such as bond angles and torsional strain, as well as energetic descriptors like bond dissociation energy, would likely improve predictive accuracy. Moreover, extending the molecular graph representation to three dimensions could capture spatial effects and conformational flexibility not accounted for in planar models. Future developments may also leverage nonlinear regression and graph-based machine learning algorithms, such as graph neural networks, to model complex, higher-order dependencies among descriptors.
Beyond bond-length prediction, the proposed framework provides a foundation for broader applications in computational chemistry and cheminformatics. The identified descriptors (sum of covalent radii, weighted neighborhood indices, bond order, and weighted eccentricity) can serve as key features for quantitative structure–property or structure–activity models. These descriptors could aid in predicting molecular stability, reactivity, or biological activity when combined with experimental or simulated datasets.
In summary, this work introduces a mathematically rigorous and computationally efficient approach to molecular geometry prediction using graph-theoretical methods. The results demonstrate that topological and atomic descriptors can provide physically meaningful estimates of bond lengths for selected flavonoid compounds. Although the general applicability of the model requires further validation across diverse molecular families, the findings establish a strong conceptual and methodological basis for future exploration of graph-theoretic models in chemical structure prediction.

Author Contributions

Conceptualization, A.S.A.A., E.J., R.J.M., E.M.-B., M.Z., S.M. and F.N.; methodology, S.M., F.N., M.Z. and A.S.A.A.; software, S.M., F.N., M.Z. and A.S.A.A.; formal analysis, S.M., F.N. and M.Z.; writing—original draft preparation, M.Z. and A.S.A.A.; writing—review and editing, S.M., F.N., M.Z. and A.S.A.A.; supervision, E.J., R.J.M., E.M.-B., S.M. and F.N.; project administration, S.M. and F.N.; funding acquisition, S.M. and F.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the DSI-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS), South Africa, and the IMU-CDC project grant.

Data Availability Statement

All crystallographic data used in this study are publicly available from the references cited in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Harsányi, L.; Császár, A.; Császár, P. Equilibrium geometries of uracil and its C- and N-methylated derivatives. J. Mol. Struct. Theochem 1986, 137, 207–215.
  2. Allen, F.; Kennard, O.; Watson, D.; Brammer, L.; Orpen, A.; Taylor, R. Tables of bond lengths determined by X-ray and neutron diffraction. Part 1. Bond lengths in organic compounds. J. Chem. Soc. Perkin Trans. 1987, 2, S1–S19.
  3. Khalaji, A.; Mighani, H.; Kazemnejadi, M.; Gotoh, K.; Ishida, H.; Fejfarova, K.; Dusek, M. Synthesis, characterization, crystal structure and theoretical studies on 4-bromo-2-[(E)-6-methyl-2-pyridyliminomethyl] phenol. Arabian J. Chem. 2017, 10, S1808–S1813.
  4. Slater, J.C. Atomic Radii in Crystals. J. Chem. Phys. 1964, 41, 31–99.
  5. Pauling, L. Bond numbers and bond lengths in tetrabenzo[de,no,st,c1,d1] heptacene and other condensed aromatic hydrocarbons: A valence bond treatment. Struct. Sci. 1980, 36, 1898–1901.
  6. Pauling, L.; Brockway, L.O.; Beach, J.Y. The dependence of interatomic distance on single bond-double bond resonance. J. Am. Chem. Soc. 1935, 57, 2705–2709.
  7. Coulson, C.A. The electronic structure of some polyenes and aromatic molecules. VII. Bonds of fractional orders by the molecular orbital method. Proc. R. Soc. Lond. A 1939, 169, 413–428.
  8. Schomaker, V.; Stevenson, D.P. Some Revisions of the Covalent Radii and the Additivity Rule for the Lengths of Partially Ionic Single Covalent Bonds. J. Am. Chem. Soc. 1941, 63, 37–40.
  9. Rossi, M.; Rickles, L.F.; Halpin, W.A. The crystal and molecular structure of quercetin: A biologically active and naturally occurring flavonoid. Bioorg. Chem. 1986, 14, 55–69.
  10. Hamzah, M.O. A Comparative Study of Bond Order and Bond Length Calculations of Some Conjugated Hydrocarbons using Two Different Methods. Adv. Res. Chem. Sci. 2017, 24, 29–42.
  11. Lang, P.F.; Smith, B. Single Bond Lengths of Organic Molecules in the Solid State. Glob. J. Sci. Front. Res. 2015, 16, 55–60.
  12. Gutman, I.; Polansky, O.E. Chemical Graphs. In Mathematical Concepts in Organic Chemistry; Springer: Berlin/Heidelberg, Germany, 1986.
  13. Cayley, F.R.S. On the mathematical theory of isomers. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1874, 47, 444–447.
  14. Cox, P.; Kumarasamy, Y.; Nahar, L.; Sarker, S.; Shoeb, M. Luteolin. Acta Cryst. 2003, 59, o975–o977.
  15. Winter, M. WebElements: The Periodic Table on the Web; Copyright Mark Winter, University of Sheffield: Sheffield, UK. Available online: https://www.webelements.com/ (accessed on 27 October 2025).
  16. National Center for Biotechnology Information. Electronegativity in the Periodic Table of Elements. Available online: https://pubchem.ncbi.nlm.nih.gov/periodic-table/electronegativity (accessed on 27 October 2025).
  17. Zheng, Y.; Zhou, Y.; Liang, Q.; Chen, D.; Guo, R. Theoretical studies on the hydrogen-bonding interactions between luteolin and water: A DFT approach. J. Mol. Model. 2016, 22, 257.
  18. Milenković, D.; Marković, J.M.D.; Dimić, D.; Jeremić, S.; Amić, D.; Pirković, M.S.; Marković, Z.S. Structural characterization of kaempferol: A spectroscopic and computational study. Maced. J. Chem. Chem. Eng. 2019, 38, 49–62.
  19. Fronczek, F.R.; Gannuch, G.; Mattice, W.L.; Tobiason, F.L.; Broeker, J.L.; Hemingway, R.W. Dipole moment, solution, and solid state structure of (–)-epicatechin, a monomer unit of procyanidin polymers. J. Chem. Soc. Perkin Trans. 1984, 2, 1611–1616.
Figure 1. Example of a molecular graph, source: [12].
Figure 2. Luteolin molecular structure; source: [17].
Figure 3. Luteolin molecular graph.
Figure 4. Correlation heat map for predictors.
Figure 5. Residuals against predicted values plot.
Figure 6. Q-Q plot.
Figure 7. Histogram of residuals.
Figure 8. Model predictions compared with experimental data.
Figure 9. Residual plot.
Table 1. Pearson correlations of model predictors.

| Parameter | Pearson's correlation |
|---|---|
| r1 + r2 | 0.9874 |
| 1/(r1 + r2) | −0.9882 |
| Weighted 2nd Nbd | 0.8309 |
| 1/(weighted 2nd Nbd) | −0.8397 |
| Bond order | −0.3308 |
| Wecc(a) | −0.4793 |
| Wecc(b) | 0.0967 |
| Wecc(a) × Wecc(b) | −0.1963 |
| 1/(Wecc(a) × Wecc(b)) | 0.1646 |
Table 2. Variance Inflation Factors.

| Variable | VIF |
|---|---|
| Valency-based parameter (β) | 1.1951 |
| 1/(r1 + r2) (1/κ) | 2.7701 |
| Weighted 2nd Nbd (ν2) | 2.6376 |
Table 4. Cross-validation metrics.

| Fold | MSE | R² |
|---|---|---|
| Fold 1 | 0.0007 | 0.9716 |
| Fold 2 | 0.0006 | 0.9914 |
| Fold 3 | 0.0003 | 0.9940 |
| Fold 4 | 0.0006 | 0.9838 |
| Fold 5 | 0.0010 | 0.9646 |
| Mean ± Std | 0.0006 ± 0.0002 | 0.9811 ± 0.0113 |
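The summary row of Table 4 can be recomputed from the five fold scores. The sketch below is a small check that assumes the spread is the population standard deviation (divisor n, the NumPy default); the article does not state which convention was used. With the rounded fold values, the population form gives 0.0002 and 0.0113, matching the table, whereas the sample form (divisor n − 1) would give roughly 0.0003 and 0.0127.

```python
import statistics

# Per-fold scores copied from Table 4.
fold_mse = [0.0007, 0.0006, 0.0003, 0.0006, 0.0010]
fold_r2 = [0.9716, 0.9914, 0.9940, 0.9838, 0.9646]


def summarize(values):
    """Return (mean, population standard deviation) of the fold scores."""
    return statistics.fmean(values), statistics.pstdev(values)


mse_mean, mse_std = summarize(fold_mse)
r2_mean, r2_std = summarize(fold_r2)

print(f"MSE: {mse_mean:.4f} ± {mse_std:.4f}")  # MSE: 0.0006 ± 0.0002
print(f"R2:  {r2_mean:.4f} ± {r2_std:.4f}")    # R2:  0.9811 ± 0.0113
```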
Table 5. Detailed comparison of experimental and predicted bond lengths (in Å) across all models.

| Bond | Experimental | Model 1 | Model 2 | Schomaker | PM6 |
|---|---|---|---|---|---|
| C3–C4 | 1.385 | 1.421 | 1.404 | 1.540 | 1.394 |
| C12–C13 | 1.438 | 1.410 | 1.379 | 1.540 | 1.391 |
| C7–O1 | 1.377 | 1.377 | 1.367 | 1.420 | 1.343 |
| C14–H10 | 0.950 | 0.951 | 0.954 | 1.109 | 1.080 |
| O5–H3 | 0.840 | 0.847 | 0.844 | 0.988 | 0.967 |
| O3–H7 | 0.840 | 0.847 | 0.844 | 0.988 | 0.967 |
| R² | | 0.9947 | 0.9925 | 0.9810 | 0.9905 |
Table 6. Comparison of model performance on excluded bonds.

| Metric | Model 1 | Model 2 | Schomaker model | PM6 |
|---|---|---|---|---|
| Adjusted R² | 0.9880 | 0.9813 | 0.9525 | 0.9881 |
| RMSE | 0.0193 | 0.0256 | 0.1325 | 0.0936 |
| MAE | 0.0132 | 0.0167 | 0.1258 | 0.0790 |
Table 7. Comparison of kaempferol bond lengths (in Å) predicted by Model 1, Model 2, the Schomaker–Stevenson model, and PM6 against experimental data from [18].

| Bond | Experimental | Model 1 | Model 2 | Schomaker | PM6 |
|---|---|---|---|---|---|
| C2–O1 | 1.365 | 1.384 | 1.387 | 1.420 | 1.344 |
| C2–C3 | 1.399 | 1.395 | 1.404 | 1.540 | 1.363 |
| C2–C1’ | 1.400 | 1.428 | 1.391 | 1.540 | 1.475 |
| C3–C4 | 1.391 | 1.419 | 1.388 | 1.540 | 1.419 |
| C3–O3 | 1.363 | 1.360 | 1.379 | 1.420 | 1.361 |
| C4–C10 | 1.391 | 1.439 | 1.392 | 1.540 | 1.472 |
| C4–O4 | 1.235 | 1.306 | 1.235 | 1.420 | 1.221 |
| C10–C5 | 1.398 | 1.444 | 1.394 | 1.540 | 1.401 |
| C10–C9 | 1.393 | 1.408 | 1.400 | 1.540 | 1.406 |
| C5–C6 | 1.392 | 1.374 | 1.384 | 1.540 | 1.382 |
| C5–O5 | 1.362 | 1.349 | 1.366 | 1.420 | 1.357 |
| C6–C7 | 1.391 | 1.411 | 1.382 | 1.540 | 1.391 |
| C7–C8 | 1.391 | 1.373 | 1.389 | 1.540 | 1.386 |
| C7–O7 | 1.363 | 1.347 | 1.365 | 1.420 | 1.358 |
| C8–C9 | 1.392 | 1.414 | 1.386 | 1.540 | 1.387 |
| C9–O1 | 1.369 | 1.378 | 1.384 | 1.420 | 1.349 |
| C1’–C2’ | 1.394 | 1.408 | 1.385 | 1.540 | 1.398 |
| C2’–C3’ | 1.393 | 1.356 | 1.390 | 1.540 | 1.377 |
| C3’–C4’ | 1.394 | 1.404 | 1.381 | 1.540 | 1.389 |
| C4’–C5’ | 1.389 | 1.354 | 1.385 | 1.540 | 1.390 |
| C4’–O4’ | 1.357 | 1.347 | 1.365 | 1.420 | 1.358 |
| C5’–C6’ | 1.391 | 1.383 | 1.377 | 1.540 | 1.377 |
| C1’–C6’ | 1.390 | 1.370 | 1.396 | 1.540 | 1.398 |
| O4’–H4’ | 0.950 | 0.848 | 0.950 | 0.988 | 0.967 |
| O3–H3 | 0.950 | 0.848 | 0.950 | 0.988 | 0.967 |
| O5–H5 | 0.950 | 0.848 | 0.950 | 0.988 | 0.967 |
| O7–H7 | 0.950 | 0.848 | 0.950 | 0.988 | 0.967 |
| R² | | 0.9786 | 0.9970 | 0.9596 | 0.9741 |
Table 8. Comparison of quercetin bond lengths (in Å) predicted by Model 1, Model 2, the Schomaker–Stevenson model, and PM6 against experimental data from Rossi [9].

| Bond | Experimental | Model 1 | Model 2 | Schomaker | PM6 |
|---|---|---|---|---|---|
| O1–C2 | 1.365 | 1.384 | 1.412 | 1.420 | 1.344 |
| O1–C9 | 1.370 | 1.378 | 1.398 | 1.420 | 1.349 |
| C2–C3 | 1.357 | 1.395 | 1.389 | 1.540 | 1.363 |
| C2–C11 | 1.479 | 1.440 | 1.443 | 1.540 | 1.475 |
| C3–C4 | 1.449 | 1.419 | 1.415 | 1.540 | 1.419 |
| C4–C10 | 1.418 | 1.439 | 1.421 | 1.540 | 1.472 |
| C5–C6 | 1.355 | 1.374 | 1.384 | 1.540 | 1.383 |
| C5–C10 | 1.420 | 1.444 | 1.411 | 1.540 | 1.401 |
| C6–C7 | 1.403 | 1.411 | 1.384 | 1.540 | 1.391 |
| C7–C8 | 1.386 | 1.373 | 1.376 | 1.540 | 1.386 |
| C8–C9 | 1.397 | 1.414 | 1.403 | 1.540 | 1.386 |
| C9–C10 | 1.392 | 1.408 | 1.408 | 1.540 | 1.406 |
| C11–C12 | 1.388 | 1.381 | 1.377 | 1.540 | 1.398 |
| C11–C16 | 1.397 | 1.426 | 1.421 | 1.540 | 1.398 |
| C12–C13 | 1.391 | 1.394 | 1.389 | 1.540 | 1.378 |
| C13–C14 | 1.393 | 1.373 | 1.376 | 1.540 | 1.388 |
| C14–C15 | 1.376 | 1.422 | 1.399 | 1.540 | 1.394 |
| C15–C16 | 1.396 | 1.374 | 1.373 | 1.540 | 1.382 |
| O18–H18 | 0.948 | 0.848 | 0.936 | 0.988 | 0.967 |
| O19–H19 | 0.914 | 0.848 | 0.936 | 0.988 | 0.967 |
| O20–H20 | 0.906 | 0.848 | 0.936 | 0.988 | 0.967 |
| O21–H21 | 0.975 | 0.848 | 0.936 | 0.988 | 0.967 |
| O22–H22 | 0.990 | 0.848 | 0.936 | 0.988 | 0.967 |
| C6–H6 | 1.006 | 0.952 | 1.014 | 1.109 | 1.080 |
| C8–H8 | 1.022 | 0.952 | 1.014 | 1.109 | 1.080 |
| C12–H12 | 1.005 | 0.941 | 1.001 | 1.109 | 1.080 |
| C13–H13 | 1.008 | 0.946 | 1.006 | 1.109 | 1.080 |
| C16–H16 | 1.006 | 0.948 | 1.008 | 1.109 | 1.080 |
| O19–C7 | 1.358 | 1.347 | 1.348 | 1.420 | 1.358 |
| O18–C5 | 1.375 | 1.349 | 1.351 | 1.420 | 1.357 |
| O4–C4 | 1.267 | 1.306 | 1.343 | 1.420 | 1.221 |
| O20–C3 | 1.351 | 1.360 | 1.370 | 1.420 | 1.361 |
| O22–C15 | 1.373 | 1.353 | 1.359 | 1.420 | 1.360 |
| O21–C14 | 1.396 | 1.353 | 1.359 | 1.420 | 1.359 |
| R² | | 0.9830 | 0.9802 | 0.9537 | 0.9751 |
Table 9. Comparison of (−)-epicatechin bond lengths (in Å) predicted by Model 1, Model 2, the Schomaker–Stevenson model, and PM6 against experimental data from [19].

| Bond | Experimental | Model 1 | Model 2 | Schomaker | PM6 |
|---|---|---|---|---|---|
| C2–C3 | 1.529 | 1.461 | 1.518 | 1.540 | 1.534 |
| C2–C11 | 1.506 | 1.463 | 1.521 | 1.540 | 1.507 |
| C3–C4 | 1.519 | 1.447 | 1.474 | 1.540 | 1.531 |
| C4–C10 | 1.498 | 1.456 | 1.462 | 1.540 | 1.508 |
| C5–C6 | 1.386 | 1.374 | 1.400 | 1.540 | 1.385 |
| C5–C10 | 1.405 | 1.449 | 1.429 | 1.540 | 1.390 |
| C6–C7 | 1.386 | 1.411 | 1.384 | 1.540 | 1.387 |
| C7–C8 | 1.389 | 1.373 | 1.377 | 1.540 | 1.385 |
| C8–C9 | 1.387 | 1.414 | 1.411 | 1.540 | 1.391 |
| C9–C10 | 1.396 | 1.413 | 1.439 | 1.540 | 1.384 |
| C11–C12 | 1.394 | 1.438 | 1.459 | 1.540 | 1.383 |
| C11–C16 | 1.389 | 1.393 | 1.387 | 1.540 | 1.382 |
| C12–C13 | 1.378 | 1.374 | 1.363 | 1.540 | 1.385 |
| C13–C14 | 1.391 | 1.422 | 1.406 | 1.540 | 1.392 |
| C14–C15 | 1.381 | 1.373 | 1.377 | 1.540 | 1.385 |
| C15–C16 | 1.385 | 1.394 | 1.387 | 1.540 | 1.383 |
| O1–C2 | 1.445 | 1.407 | 1.461 | 1.420 | 1.428 |
| O1–C9 | 1.386 | 1.389 | 1.417 | 1.420 | 1.359 |
| O2–C3 | 1.429 | 1.388 | 1.416 | 1.420 | 1.429 |
| O3–C5 | 1.366 | 1.349 | 1.372 | 1.420 | 1.359 |
| O4–C7 | 1.371 | 1.347 | 1.373 | 1.420 | 1.359 |
| O5–C13 | 1.376 | 1.353 | 1.373 | 1.420 | 1.360 |
| O6–C14 | 1.373 | 1.353 | 1.371 | 1.420 | 1.360 |
| R² | | 0.9507 | 0.9562 | 0.9124 | 0.9508 |
Table 10. Comparative performance of Model 1, Model 2, the Schomaker–Stevenson model, and PM6 on three flavonoid datasets. Experimental data were taken from [9,18,19].

| Compound | Metric | Model 1 | Model 2 | Schomaker model | PM6 |
|---|---|---|---|---|---|
| (−)-Epicatechin | Adjusted R² | 0.9443 | 0.9482 | 0.9051 | 0.9488 |
| | RMSE | 0.0552 | 0.0525 | 0.1334 | 0.0880 |
| | MAE | 0.0389 | 0.0337 | 0.1154 | 0.0538 |
| Quercetin | Adjusted R² | 0.9812 | 0.9782 | 0.9490 | 0.9727 |
| | RMSE | 0.0509 | 0.0270 | 0.1131 | 0.0359 |
| | MAE | 0.0391 | 0.0215 | 0.1017 | 0.0271 |
| Kaempferol | Adjusted R² | 0.9758 | 0.9965 | 0.9562 | 0.9731 |
| | RMSE | 0.0466 | 0.0084 | 0.1225 | 0.0254 |
| | MAE | 0.0341 | 0.0063 | 0.1122 | 0.0163 |
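The error metrics reported in Table 10 follow standard definitions, sketched below for reference. The exact formulas used in the article are not given in this section, so the R² form (1 − SS_res/SS_tot) and the adjusted-R² form with p predictors are assumptions. The four-bond sample is a subset of Table 7 (kaempferol, experimental versus Model 2), so its RMSE and MAE will not equal the full-table values. The final line uses the Table 7 R² for Model 1 (0.9786) with n = 27 bonds and an assumed p = 3 predictors, which gives about 0.9758; this agrees with the corresponding Table 10 entry but does not by itself confirm the article's convention.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (one common definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Four kaempferol bonds from Table 7 (C2-O1, C2-C3, C4-O4, O4'-H4'), in angstroms.
experimental = [1.365, 1.399, 1.235, 0.950]
model_2 = [1.387, 1.404, 1.235, 0.950]

print(f"RMSE (subset) = {rmse(experimental, model_2):.4f}")
print(f"MAE  (subset) = {mae(experimental, model_2):.4f}")
print(f"R2   (subset) = {r_squared(experimental, model_2):.4f}")

# Adjusted R2 from a reported R2, assuming n = 27 bonds and p = 3 predictors.
print(f"Adjusted R2   = {adjusted_r_squared(0.9786, 27, 3):.4f}")
```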
Table 11. Comparison of PM6 reference and model-predicted bond lengths (in Å) for apigenin.

| Bond | PM6 | Model 1 | Model 2 |
|---|---|---|---|
| C1–C2 | 1.386 | 1.414 | 1.386 |
| C1–C24 | 1.387 | 1.373 | 1.389 |
| C1–H30 | 1.080 | 0.952 | 1.089 |
| C2–C3 | 1.406 | 1.408 | 1.429 |
| C2–O7 | 1.350 | 1.378 | 1.394 |
| C3–C4 | 1.473 | 1.433 | 1.417 |
| C3–C22 | 1.400 | 1.444 | 1.427 |
| C4–C5 | 1.415 | 1.401 | 1.394 |
| C4–O21 | 1.220 | 1.299 | 1.221 |
| C5–C6 | 1.360 | 1.376 | 1.378 |
| C5–H20 | 1.080 | 0.952 | 1.089 |
| C6–O7 | 1.343 | 1.378 | 1.403 |
| C6–C8 | 1.475 | 1.433 | 1.438 |
| C8–C9 | 1.398 | 1.419 | 1.414 |
| C8–C15 | 1.398 | 1.381 | 1.389 |
| C9–C10 | 1.377 | 1.356 | 1.361 |
| C9–H19 | 1.080 | 0.941 | 1.057 |
| C10–C11 | 1.390 | 1.404 | 1.384 |
| C10–H18 | 1.080 | 0.946 | 1.070 |
| C11–O12 | 1.358 | 1.347 | 1.358 |
| C11–C14 | 1.390 | 1.366 | 1.380 |
| O12–H13 | 0.967 | 0.848 | 0.967 |
| C14–C15 | 1.377 | 1.394 | 1.382 |
| C14–H17 | 1.080 | 0.946 | 1.070 |
| C15–H16 | 1.080 | 0.941 | 1.057 |
| C22–C23 | 1.383 | 1.412 | 1.380 |
| C22–O28 | 1.357 | 1.311 | 1.358 |
| C23–C24 | 1.390 | 1.411 | 1.394 |
| C23–H27 | 1.080 | 0.952 | 1.089 |
| C24–O25 | 1.358 | 1.347 | 1.358 |
| O25–H26 | 0.967 | 0.848 | 0.967 |
| O28–H29 | 0.967 | 0.848 | 0.967 |
| R² | | 0.9688 | 0.9834 |
Table 12. Comparison of PM6 reference and model-predicted bond lengths (in Å) for genistein.

| Bond | PM6 | Model 1 | Model 2 |
|---|---|---|---|
| C1–C2 | 1.388 | 1.414 | 1.390 |
| C1–C24 | 1.385 | 1.373 | 1.400 |
| C1–H30 | 1.080 | 0.952 | 1.097 |
| C2–C3 | 1.406 | 1.408 | 1.433 |
| C2–O19 | 1.347 | 1.376 | 1.380 |
| C3–C4 | 1.473 | 1.434 | 1.449 |
| C3–C22 | 1.400 | 1.444 | 1.437 |
| C4–C5 | 1.476 | 1.428 | 1.448 |
| C4–O21 | 1.219 | 1.301 | 1.221 |
| C5–C6 | 1.483 | 1.451 | 1.472 |
| C5–C18 | 1.358 | 1.376 | 1.358 |
| C6–C7 | 1.394 | 1.415 | 1.407 |
| C6–C13 | 1.394 | 1.377 | 1.376 |
| C7–C8 | 1.379 | 1.356 | 1.357 |
| C7–H17 | 1.080 | 0.941 | 1.064 |
| C8–C9 | 1.388 | 1.404 | 1.369 |
| C8–H16 | 1.080 | 0.946 | 1.077 |
| C9–O10 | 1.359 | 1.347 | 1.354 |
| C9–C12 | 1.388 | 1.366 | 1.391 |
| O10–H11 | 0.967 | 0.848 | 0.973 |
| C12–C13 | 1.379 | 1.394 | 1.377 |
| C12–H15 | 1.080 | 0.946 | 1.077 |
| C13–H14 | 1.080 | 0.941 | 1.064 |
| C18–O19 | 1.333 | 1.351 | 1.373 |
| C18–H20 | 1.080 | 0.934 | 1.045 |
| C22–C23 | 1.382 | 1.374 | 1.386 |
| C22–O28 | 1.357 | 1.349 | 1.362 |
| C23–C24 | 1.391 | 1.411 | 1.371 |
| C23–H27 | 1.080 | 0.952 | 1.097 |
| C24–O25 | 1.357 | 1.347 | 1.354 |
| O25–H26 | 0.967 | 0.848 | 0.973 |
| O28–H29 | 0.967 | 0.848 | 0.973 |
| R² | | 0.9672 | 0.9877 |
Table 13. Comparison of PM6 reference and model-predicted bond lengths (in Å) for hesperetin.

| Bond | PM6 | Model 1 | Model 2 |
|---|---|---|---|
| C1–O2 | 1.428 | 1.387 | 1.430 |
| C1–C7 | 1.536 | 1.392 | 1.467 |
| C1–C20 | 1.507 | 1.443 | 1.460 |
| C1–H36 | 1.090 | 0.971 | 1.101 |
| O2–C3 | 1.354 | 1.389 | 1.398 |
| C3–C4 | 1.407 | 1.433 | 1.398 |
| C3–C17 | 1.383 | 1.376 | 1.391 |
| C4–C5 | 1.467 | 1.394 | 1.413 |
| C4–C10 | 1.404 | 1.393 | 1.383 |
| C5–O6 | 1.214 | 1.260 | 1.222 |
| C7–H8 | 1.090 | 0.930 | 1.042 |
| C7–H9 | 1.090 | 0.930 | 1.042 |
| C10–O11 | 1.356 | 1.349 | 1.375 |
| C10–C13 | 1.384 | 1.412 | 1.362 |
| O11–H12 | 0.967 | 0.848 | 0.995 |
| C13–C14 | 1.389 | 1.373 | 1.389 |
| C13–H19 | 1.080 | 0.952 | 1.072 |
| C14–O15 | 1.357 | 1.347 | 1.380 |
| C14–C17 | 1.389 | 1.411 | 1.393 |
| O15–H16 | 0.967 | 0.848 | 0.995 |
| C17–H18 | 1.080 | 0.952 | 1.072 |
| C20–C21 | 1.382 | 1.400 | 1.396 |
| C20–C32 | 1.382 | 1.431 | 1.453 |
| C21–C22 | 1.386 | 1.412 | 1.439 |
| C21–H35 | 1.080 | 0.948 | 1.065 |
| C22–O23 | 1.360 | 1.353 | 1.413 |
| C22–C25 | 1.391 | 1.386 | 1.389 |
| O23–H24 | 0.967 | 0.848 | 0.995 |
| C25–O26 | 1.360 | 1.390 | 1.324 |
| C25–C31 | 1.386 | 1.412 | 1.416 |
| O26–C27 | 1.429 | 1.333 | 1.410 |
| C27–H28 | 1.090 | 0.917 | 1.027 |
| C27–H29 | 1.090 | 0.917 | 1.027 |
| C27–H30 | 1.090 | 0.917 | 1.027 |
| C31–C32 | 1.382 | 1.356 | 1.394 |
| C31–H34 | 1.080 | 0.946 | 1.062 |
| C32–H33 | 1.080 | 0.941 | 1.056 |
| R² | | 0.9445 | 0.9612 |
Table 14. Comparison of PM6 reference and model-predicted bond lengths (in Å) for catechin.

| Bond | PM6 | Model 1 | Model 2 |
|---|---|---|---|
| C1–C2 | 1.507 | 1.463 | 1.508 |
| C1–O15 | 1.428 | 1.407 | 1.453 |
| C1–C29 | 1.534 | 1.461 | 1.505 |
| C1–H35 | 1.090 | 0.991 | 1.088 |
| C2–C3 | 1.383 | 1.438 | 1.451 |
| C2–C9 | 1.382 | 1.393 | 1.391 |
| C3–C4 | 1.385 | 1.374 | 1.368 |
| C3–H14 | 1.080 | 0.948 | 1.078 |
| C4–C5 | 1.392 | 1.422 | 1.397 |
| C4–O12 | 1.360 | 1.353 | 1.363 |
| C5–O6 | 1.360 | 1.353 | 1.354 |
| C5–C8 | 1.385 | 1.373 | 1.376 |
| O6–H7 | 0.967 | 0.848 | 0.963 |
| C8–C9 | 1.383 | 1.394 | 1.388 |
| C8–H11 | 1.080 | 0.946 | 1.078 |
| C9–H10 | 1.080 | 0.941 | 1.077 |
| O12–H13 | 0.967 | 0.848 | 0.963 |
| O15–C16 | 1.359 | 1.389 | 1.411 |
| C16–C17 | 1.384 | 1.451 | 1.443 |
| C16–C21 | 1.391 | 1.376 | 1.371 |
| C17–C18 | 1.390 | 1.411 | 1.433 |
| C17–C28 | 1.508 | 1.456 | 1.448 |
| C18–C19 | 1.385 | 1.412 | 1.374 |
| C18–O26 | 1.359 | 1.349 | 1.356 |
| C19–C20 | 1.387 | 1.373 | 1.388 |
| C19–H25 | 1.080 | 0.952 | 1.079 |
| C20–C21 | 1.385 | 1.411 | 1.389 |
| C20–O23 | 1.359 | 1.347 | 1.357 |
| C21–H22 | 1.080 | 0.952 | 1.079 |
| O23–H24 | 0.967 | 0.848 | 0.963 |
| O26–H27 | 0.967 | 0.848 | 0.963 |
| C28–C29 | 1.531 | 1.447 | 1.463 |
| C28–H33 | 1.090 | 0.971 | 1.083 |
| C28–H34 | 1.090 | 0.971 | 1.083 |
| C29–O30 | 1.429 | 1.388 | 1.410 |
| C29–H32 | 1.090 | 0.999 | 1.090 |
| O30–H31 | 0.967 | 0.860 | 0.964 |
| R² | | 0.9697 | 0.9813 |
Table 15. Summary of validation statistics for Model 1 and Model 2 using PM6 bond lengths as reference values.

| Compound | Metric | Model 1 | Model 2 |
|---|---|---|---|
| Apigenin | R² | 0.9688 | 0.9340 |
| | RMSE | 0.0765 | 0.0212 |
| Genistein | R² | 0.9672 | 0.9877 |
| | RMSE | 0.077 | 0.0183 |
| Hesperetin | R² | 0.9445 | 0.9612 |
| | RMSE | 0.0987 | 0.0347 |
| Catechin | R² | 0.9697 | 0.9813 |
| | RMSE | 0.0816 | 0.0237 |
| Average | R² | 0.9626 | 0.9661 |
| | RMSE | 0.0835 | 0.0245 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhangazha, M.; Alochukwu, A.S.A.; Jonck, E.; Maartens, R.J.; Mphako-Banda, E.; Mukwembi, S.; Nyabadza, F. A Graph-Theoretical Approach to Bond Length Prediction in Flavonoids Using a Molecular Graph Model. Math. Comput. Appl. 2026, 31, 9. https://doi.org/10.3390/mca31010009
