Article

Predicting Cardiovascular Aging Risk Based on Clinical Data Through the Integration of Mathematical Modeling and Machine Learning

1 Department of Internal Medicine, Faculty of Medicine and Healthcare, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan
2 Department of Big Data and Artificial Intelligence, Faculty of Information Technology, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan
3 School of Data Science, Fudan University, Yangpu District, 220 Handan Rd., Shanghai 200437, China
4 Center for Scientific Research and Competence, Civil Aviation Academy, Zakarpatskaya st., 44, Almaty 050039, Kazakhstan
5 Medical Director Smart Health University City, Al-Farabi av., 71, Almaty 050040, Kazakhstan
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(9), 5077; https://doi.org/10.3390/app15095077
Submission received: 19 March 2025 / Revised: 18 April 2025 / Accepted: 22 April 2025 / Published: 2 May 2025
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Background: The aging population is increasing rapidly, with individuals aged 65 and older now representing more than 15% of the global population. This demographic shift is associated with a rising incidence of age-related cardiovascular diseases (CVDs). Early prediction and prevention of cardiovascular aging are essential to improve health outcomes among elderly patients. Objective: This study aimed to develop and externally validate a mathematical model for predicting cardiovascular aging in individuals aged 65 and older, based on general clinical and behavioral data. Methods: The model was built using data from 800 individuals aged 65+ from Almaty, Kazakhstan. Predictors included sex, marital status, education, smoking, alcohol use, disability, physical activity, total cholesterol, hypertension, BMI, coronary artery disease (CAD), myocardial infarction, diabetes mellitus, and chronic heart failure. A system of ordinary differential equations was used to simulate the dynamic interactions of these factors. Numerical integration was performed using the Runge–Kutta, Adams–Bashforth, and backward Euler methods. The model was verified statistically using Pearson correlation analysis and externally validated on independent age cohorts. In addition, we applied k-means clustering to identify hidden patterns and risk profiles within the dataset. A Random Forest classifier was trained to distinguish between high-risk and low-risk individuals using the same feature set. These machine learning approaches were used as complementary tools to enhance the robustness and interpretability of the modeling results. Results: The model trained on the 65–74 age group achieved an external validation accuracy of 98.8% and an AUC of 0.989 when applied to the 75–89 group. Risk modeling showed that in the 65–74 group, smoking and alcohol increased the risk of myocardial infarction, hypertension, and obesity by up to 53%. 
In the 75–89 group, these factors increased the likelihood of hypertension by 21%, chronic heart failure by 16%, and CAD by 14%. Among individuals aged 90+, hypercholesterolemia increased the risk of chronic heart failure by 17%, while hypertension increased myocardial infarction risk by 16%. Conclusions: The proposed model demonstrated high accuracy in predicting cardiovascular aging and identifying high-risk individuals across elderly subgroups. The integration of clustering and classification methods (k-means and Random Forest) provided additional insights and confirmed the consistency of the findings. This multi-method approach may serve as a valuable tool for developing personalized prevention strategies in geriatric care and improving healthy life expectancy.

1. Introduction

Cardiovascular disease (CVD) is one of the leading causes of morbidity and mortality among older people worldwide. The risk of CVD increases significantly with age, which makes early detection and prediction of the progression of cardiovascular aging particularly important [1,2]. Modern methods of diagnosis and treatment can significantly improve the quality of life of patients, but in order to optimize treatment strategies, it is necessary to understand how various physiological and behavioral factors influence the process of cardiovascular aging. According to worldwide studies, a correlation between aging and many factors, such as lifestyle, nutrition, gene polymorphisms, immune inflammation, endothelial dysfunction, and others, has been identified [3]. However, the identification of the relationship between these factors in individuals subjected to accelerated cardiovascular aging in the population of Kazakhstan still requires additional scientific research.
The preventive direction in cardiology is based on the strategy of early detection of persons at high risk of cardiovascular diseases. Mathematical modelling is a powerful tool for analyzing complex biological systems and predicting their dynamics. In recent years, mathematical modeling methods have been actively used in medicine to study various diseases and develop individualized approaches to treatment. In particular, models based on differential equations and probabilistic methods allow taking into account multiple interacting factors and making long-term predictions.
Existing prediction approaches include the use of biomarkers. To build the mathematical model in this work, the following social/behavioral factors and clinical data were selected as biomarkers of cardiovascular aging in the respondents: gender, marital status, education, smoking, alcohol consumption, disability, physical activity, total cholesterol, arterial hypertension, body mass index, coronary heart disease (CHD), myocardial infarction, diabetes mellitus, and chronic heart failure.
The scales used to determine the risk of cardiovascular events are aimed at preventing and predicting cardiovascular mortality and non-fatal events. The SCORE (Systematic COronary Risk Evaluation) scale assesses the risk of death from cardiovascular disease over the next 10 years, and its updated versions predict the 10-year risk of fatal and non-fatal cardiovascular events in people aged 40–69 (SCORE2) and older than 70 years (SCORE2-OP). The input indicators are age, sex, smoking, blood pressure, and cholesterol. However, these risk scales have several limitations, including limited accuracy for the individual patient and the presence of “residual risk” [4].
Polo Y La Borda et al. [5] evaluated the prognostic value of four cardiovascular (CV) risk algorithms for identifying high-risk patients in a group of individuals with psoriatic arthritis (PsA): SCORE; its modified version (mSCORE), following the European Alliance of Associations for Rheumatology (EULAR) 2015/2016 recommendations; SCORE2, the updated and improved version of SCORE; and the QRESEARCH risk estimator version 3 (QRISK3). The four cardiovascular risk scales were strongly correlated with one another, and all of them showed a significant association with cardiovascular events (p < 0.001). The risk chart algorithms were very useful for distinguishing PsA patients at low and high cardiovascular risk. An integrated model combining QRISK3 and SCORE2 gave an optimal synergy between the discriminative power of QRISK3 and the calibration accuracy of SCORE2.
The present work aimed to develop and validate a mathematical model to predict cardiovascular aging in people aged 65 years and older.
The use of our model will allow us not only to predict the progression of cardiovascular aging but also to identify patients at increased risk of CVD, which will open new opportunities for prevention and early intervention.
The development of mathematical models in medicine and in experimental research is actively considered in the modern scientific community. Marokov’s study [6] is devoted to a simple mathematical model of aging based on a system of autonomous first-order ODEs. The key assumption of the model is that the organism has no internal clock counting down chronological time on the scale of decades. Instead, the organism uses internal biological factors denoted by variables $g_i(t)$, each of which counts down its own biological time $t_i$. Overall, the proposed model provides a simple and efficient framework for systematizing experimental data on age-related changes in different biological subsystems of an organism. Even in the linear approximation, the parameters $\{a_i, b_{ij}\}$ allow the significant cause-and-effect relationships of the aging process to be quantitatively identified and structured.
Murase et al. represent the aging process using a one-dimensional model of a multicellular system that allows the analysis of local and global modes of cell behavior [7]. In the early stages, simple interactions provide coordinated patterns, whereas in later stages, complex interactions result in chaotic changes reflecting an abnormal state.
In the study [8], the complex cellular mechanisms that characterize aging are reviewed with a focus on two metabolic centers: mTOR and NAD+-dependent deacetylase SIRT1. Experimental evidence points to an interaction between these pathways, but the mechanisms of their interaction remain unclear. The authors propose to use computational modelling combined with experiments to elucidate these mechanisms. The basic models are discussed and a reduced reaction pathway for modelling is presented. Finally, the limitations of computational modelling and opportunities for future research in this area are described. The main limitation of the approach is the need for a precise set of parameters. Therefore, well-studied areas should be modelled first to gradually understand the mechanisms and develop new therapies to increase healthy life expectancy or slow aging. Moreover, in [9], a mathematical model describing the influence of the environment on the aging of living systems is presented. It is based on the competition between the destruction and restoration of a biological system regulated by the kinetics of autocatalytic reactions. The influence of the environment is considered through time-dependent parameters corresponding to thermodynamic and gerontological principles, as well as medical observations.
Hibs [10] discusses mathematical models of physiological behavior that describe a system returning to the initial state after a perturbation through homeostasis. These models include the concept of a “fatal limit” of system deviation. The author concludes that the numerical values of the limits depend on other parameters, such as recovery times and coupling coefficients, and can be experimentally measured. Age-related changes in parameters and the derivation of mortality parameters, such as Gompertz parameters, from experimental data are discussed.
The process of proliferative cell senescence in culture has been described using a mathematical model [11]. Based on the hypothesis of DNA damage as a cause of cell senescence, the model can explain both limited and unlimited proliferative potential of both normal and transformed cells in vitro. According to the model, the fate of a cell population depends on two counteracting factors: the rate of proliferation of dividing cells and the rate of accumulation of gene damage. Computer simulations demonstrate agreement with experimental data in general terms.
Based on Rifin’s mathematical model, a 10-year cardiovascular disease (CVD) risk estimate using the WHO scale was presented in [12] using data from 5503 adults in Malaysia. It was found that 4.9% of participants had a high risk (≥20%), which was more common in males (7.3%) compared to females (2.5%). Major risk factors included low education level, unemployment, and obesity combined with physical inactivity (aOR = 2.19). The recommended interventions to reduce risk included health promotion, screenings and information campaigns. These results highlight the importance of targeted prevention programs for high-risk groups.
A large-scale study [13] used data from 10,432 individuals aged 40 to 69 years to compare 17 measures of adiposity for predicting arterial hypertension and dyslipidemia, which are major predictors of cardiovascular disease, type 2 diabetes mellitus (T2DM), and multimorbidity. Receiver operating characteristic (ROC) curves and logistic regression showed that the Chinese visceral adiposity index (CVAI) and the triglyceride-glucose index (TyG) better predicted arterial hypertension and multimorbidity, whereas body mass index (BMI) better predicted dyslipidemia.
Personalized prediction of cardiovascular disease using multi-omics technologies and machine learning techniques was conducted in a study [14] that combined RNA sequencing data, single nucleotide polymorphisms (SNPs), and clinical information to create personalized risk profiles. Using feature selection methods, the researchers identified 27 key transcriptomic markers and SNPs that distinguished patients with cardiovascular disease. An XGBoost model with hyperparameters tuned by Bayesian optimization demonstrated high accuracy. Risk assessment using SHapley Additive exPlanations (SHAP) helped to explain the importance of individual biomarkers (RPL36AP37 and HBA1), which was supported by the literature and demonstrated the potential of the framework to predict other diseases.
An epidemiological study of cardiovascular disease (CVD) in Kashgar Prefecture, Xinjiang, northwest China was conducted to identify key risk factors. Data from 1,887,710 adults (baseline 2017) from the Kashgar prospective cohort study were analyzed, including 16 potential factors (7 demographic, 4 lifestyle, and 5 clinical) collected through questionnaires and medical examinations. Logistic regression models showed that all factors were significantly associated with CVD (odds ratio 1.03 to 2.99, p < 0.05). Machine learning methods (Random Forest, Random Ferns, Extreme Gradient Boosting) ranked age, occupation, hypertension, exercise frequency, and dietary patterns as the five major risk factors for CVD. Stratification analyses showed consistent rankings across subgroups. This study considered cardiovascular disease as a major problem in Kashgar, and these five factors were crucial for future preventive measures [15].
A recent study [16] explored the interplay between SARS-CoV-2 (SC-2) transmission and cardiovascular complications, particularly heart attacks, using advanced mathematical modeling. A fractional-order system incorporating a fractal fractional operator (FFO) was employed to analyze local and global stability through Lyapunov functions and flip bifurcation testing. Key epidemic properties, including existence, boundedness, and positivity, were examined to ensure reliable findings. Simulations highlighted symptomatic and asymptomatic effects, offering insights into the combined impact of SC-2 and cardiovascular conditions. These approaches contributed to a deeper understanding of disease dynamics and informed future prediction and control strategies.
Another study [17] focused on fractional-order modeling of cholera transmission, incorporating both asymptomatic cases and treatment interventions. The Atangana–Toufik scheme was applied to examine the system’s stability, boundedness, and bifurcation properties, confirming the absence of flip bifurcation. The reproductive number ($R_0$) was computed to evaluate the transmission potential, while a sensitivity analysis highlighted key factors affecting disease spread. MATLAB R2024a simulations illustrated the role of asymptomatic measures and treatment in controlling the outbreak. These insights enhanced the understanding of cholera dynamics, supported early detection efforts, and contributed to the development of effective disease control strategies.

2. Materials and Methods

Problem statement: The biomarkers for the mathematical model of cardiovascular aging prediction were chosen on the basis of their clinical significance and confirmed role in the development of cardiovascular diseases (CVD). The study included data from 800 people in the age groups 65–74, 75–89, and 90+ (see Figure 1), defined according to the WHO classification. Data were collected with four standard international questionnaires, which ensured comprehensive collection of information on risk factors.
The weights of individual biomarkers were assigned based on their clinical relevance and alignment with modifiable or non-modifiable cardiovascular risk factors, in accordance with international recommendations. Modifiable factors included dyslipoproteinemia, arterial hypertension, diabetes mellitus, smoking, physical activity, and obesity, as they were directly related to the progression of vascular aging and can be corrected by preventive strategies. Non-modifiable factors, such as gender and age, were included in the model because they determined the baseline risk level and influenced the body’s response to external influences.
Factor weights were determined using statistical analysis and machine learning methods. A correlation analysis (Pearson’s coefficients) allowed us to identify the degree of association of each factor with the processes of cardiovascular aging. Numerical modelling using Runge–Kutta, Adams–Bashforth and backward Euler methods allowed us to describe the dynamic interaction of risk factors over time.
Thus, the included biomarkers reflected the key mechanisms of cardiovascular aging, and their weights in the model were determined taking into account evidence and a complex statistical analysis, providing high accuracy of prediction and the possibility of individualized prevention.
To create the mathematical model, the following parameters were selected as biomarkers of cardiovascular ageing and social status of the respondents, which had high Pearson correlation values:
ch is cholesterol (total cholesterol);
Pr is blood pressure (hypertension);
W is body mass index (BMI);
Px is postinfarction cardiosclerosis;
X is chronic heart failure (CHF);
G is gender;
S is smoking;
E is education;
A is alcohol;
I is ischemic heart disease (IHD).
$a_1, b_1, c_1, d_1, a_2, b_2, c_2, d_2, a_3, b_3, c_3$, and $d_3$ are constants. These markers were selected because they gave the highest values of the Pearson correlation coefficient calculated from the biomarker statistics of the surveyed patients:
$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2}} \tag{1}$$
Here, $r$ is the correlation coefficient, $x_i$ are clinical biomarkers, $y_i$ are social parameters, $\bar{x}$ is the mean value of the clinical biomarkers, $\bar{y}$ is the mean value of the social parameters, and $N$ is the number of matched biomarker and parameter values.
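As an illustration of Formula (1), the following minimal Python sketch computes the Pearson coefficient for two equal-length sequences. The function name `pearson_r` and the sample values are hypothetical and are not taken from the study database.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences (Formula (1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator terms: sums of squared deviations.
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical normalized values, used only to exercise the function.
cholesterol = [0.52, 0.61, 0.70, 0.83, 0.95]
smoking = [0.0, 0.0, 1.0, 1.0, 1.0]
r = pearson_r(cholesterol, smoking)
```

Pairs of markers with the largest values of r would then be the candidates for the product terms of the model.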

3. Results and Discussion

All values in the database were made dimensionless by dividing each value in a column by the maximum value of that column. This normalization maps all values to the interval from zero to one, makes the equations less stiff, and simplifies the computational process. All values reported in this paper are dimensionless.
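The normalization step can be sketched as follows. This is a minimal example that assumes non-negative values and a positive maximum in every column; the function name `normalize_by_column_max`, the column meanings, and the sample records are hypothetical.

```python
def normalize_by_column_max(rows):
    """Divide every column by its maximum so that non-negative values fall in [0, 1]."""
    n_cols = len(rows[0])
    col_max = [max(row[j] for row in rows) for j in range(n_cols)]
    if any(m <= 0 for m in col_max):
        raise ValueError("each column needs a positive maximum")
    return [[row[j] / col_max[j] for j in range(n_cols)] for row in rows]

# Hypothetical records: total cholesterol, systolic blood pressure, BMI.
records = [
    [4.8, 130.0, 24.5],
    [6.2, 160.0, 31.0],
    [5.5, 145.0, 27.9],
]
scaled = normalize_by_column_max(records)
```

The row that holds the column maxima maps to 1.0 in every column, and all other entries are expressed as fractions of those maxima.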
Pearson’s correlation coefficients, which characterize the strength of the relationship between the parameters ($|r| \le 1$), were calculated according to Formula (1). The results of calculating the correlation between the biomarkers and social characteristics of respondents are shown in Figure 2, Figure 3 and Figure 4:
From the correlation plots (see Figure 2, Figure 3 and Figure 4), we selected the most strongly correlated pairs of biomarkers and social characteristics of respondents. These pairs form the products of parameters in Equations (2)–(4), as shown in Table 1.
We developed a mathematical model in the form of a nonlinear dynamic (nonstationary) equation that could provide a prediction of changes in premature aging. For patients in the age range of 65–74 years, we mathematically assumed that their cardiovascular system aged as follows:
$$\frac{dx}{dt} = a_1 \cdot \mathrm{Px}(x,t) \cdot S(x,t) + b_1 \cdot \mathrm{Px}(x,t) \cdot A(x,t) + c_1 \cdot I(x,t) \cdot S(x,t) + d_1 \cdot \mathrm{Pr}(x,t) \cdot S(x,t), \tag{2}$$
where the total derivative $dx/dt$ on the left-hand side of Equation (2) represents the process of premature aging over time $t$. The first term is the interaction of postinfarction cardiosclerosis with smoking, the second that of postinfarction cardiosclerosis with alcohol, the third that of ischemic heart disease with smoking, and the fourth that of blood pressure with smoking, with coefficients $a_1$, $b_1$, $c_1$, and $d_1$, respectively.
Cardiovascular aging for patients in the age range 75–89 could be described as the following differential equation:
$$\frac{dx}{dt} = a_2 \cdot \mathrm{Pr}(x,t) \cdot S(x,t) + b_2 \cdot \mathrm{Pr}(x,t) \cdot A(x,t) + c_2 \cdot X(x,t) \cdot A(x,t) + d_2 \cdot I(x,t) \cdot S(x,t), \tag{3}$$
where $dx/dt$ represents changes in premature aging over time $t$, the first and second terms are the interactions of blood pressure with smoking and with alcohol, the third term is that of chronic heart failure with alcohol, and the fourth is that of ischemic heart disease with smoking, with coefficients $a_2$, $b_2$, $c_2$, and $d_2$, respectively.
$a_1, b_1, c_1, d_1, a_2, b_2, c_2$, and $d_2$ are the interaction coefficients of the indicated markers. The products of biomarkers in Equations (2) and (3) were chosen so that they corresponded to the strongest correlations.
Thus, in Equations (2) and (3) of the mathematical model of premature aging for the age intervals 65–74 and 75–89, smoking or alcohol is present in every product of biomarkers, which reflects their high correlation with the clinical markers. Smoking and alcohol consumption in these age intervals can lead to postinfarction cardiosclerosis, coronary heart disease, variation in blood pressure, and changes in blood cholesterol levels. It can be concluded that people in this age range should give up these habits in order to avoid accelerating their biological aging.
Cardiovascular aging for patients aged 90+ was described as follows:
$$\frac{dx}{dt} = a_3 \cdot G(x,t) \cdot \mathrm{ch}(x,t) + b_3 \cdot S(x,t) \cdot X(x,t) + c_3 \cdot \mathrm{Px}(x,t) \cdot E(x,t) + d_3 \cdot \mathrm{Pr}(x,t) \cdot S(x,t), \tag{4}$$
where $dx/dt$ is the time derivative of cardiovascular aging, that is, the change in premature aging over time $t$; $a_3$ is the value of the correlation coefficient of the interaction between sex and cholesterol, $b_3$ is that of the interaction between smoking and chronic heart failure, $c_3$ is that of the interaction between postinfarction cardiosclerosis and education, and $d_3$ is that of the interaction between blood pressure and smoking.
The products of parameters in Equation (4) were chosen for their high correlation values. The first product shows the dependence of cholesterol level on sex (women in this age group are more likely to have elevated cholesterol levels). From the second product, we can conclude that smoking can lead to chronic heart failure. The third product, selected from the correlation results, shows that a person’s social status may influence the development of myocardial infarction. The assumption here is that education can foster psychological resilience, which may help prevent postinfarction cardiosclerosis. The fourth product shows that smoking leads to variation in blood pressure. Moreover, the summation of the products means that the interactions of the other pairs of biomarkers also showed good correlation results, and their level of interaction is described by the coefficients $a_3$, $b_3$, $c_3$, and $d_3$.
The parameters $\mathrm{Px}(x,t)$, $A(x,t)$, $I(x,t)$, $\mathrm{Pr}(x,t)$, $S(x,t)$, $X(x,t)$, $G(x,t)$, $\mathrm{ch}(x,t)$, and $E(x,t)$ used for calculating the correlation coefficient were read from the statistical database. The initial conditions were as follows:
$$\mathrm{Px}(x,0) = 0,\ A(x,0) = 0,\ I(x,0) = 0,\ \mathrm{Pr}(x,0) = 0,\ S(x,0) = 0,\ X(x,0) = 0,\ G(x,0) = 0,\ \mathrm{ch}(x,0) = 0,\ E(x,0) = 0.$$
The first boundary conditions were as follows:
$$\mathrm{Px}(0,t) = 1,\ A(0,t) = 1,\ I(0,t) = 1,\ \mathrm{Pr}(0,t) = 1,\ S(0,t) = 1,\ X(0,t) = 1,\ G(0,t) = 1,\ \mathrm{ch}(0,t) = 1,\ E(0,t) = 1.$$
For the numerical solution of Equations (2)–(4), these parameters have the following expressions:
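$$\mathrm{Px}(x,t) = \mathrm{Px}(x_1,t) + i\,\Delta x,\quad A(x,t) = A(x_1,t) + i\,\Delta x,\quad I(x,t) = I(x_1,t) + i\,\Delta x,$$
$$\mathrm{Pr}(x,t) = \mathrm{Pr}(x_1,t) + i\,\Delta x,\quad S(x,t) = S(x_1,t) + i\,\Delta x,\quad X(x,t) = X(x_1,t) + i\,\Delta x,$$
$$G(x,t) = G(x_1,t) + i\,\Delta x,\quad \mathrm{ch}(x,t) = \mathrm{ch}(x_1,t) + i\,\Delta x,\quad E(x,t) = E(x_1,t) + i\,\Delta x,$$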
with $\Delta x = \frac{b - a}{N}$ the grid spacing. The preceding values of the parameters, however, were taken from the statistical database.
For numerical modeling, the fourth-order Runge–Kutta method [18], the Adams–Bashforth method [19], and the backward Euler method [20] were used to solve the ordinary differential Equation (4). First, we considered the fourth-order Runge–Kutta method:
$$x' = f(x,t) = a_3 \cdot G(x,t) \cdot \mathrm{ch}(x,t) + b_3 \cdot S(x,t) \cdot X(x,t) + c_3 \cdot \mathrm{Px}(x,t) \cdot E(x,t) + d_3 \cdot \mathrm{Pr}(x,t) \cdot S(x,t),$$
where $f(x,t)$ is the right-hand side of Equation (4).
The finite solution of the ODE by the Runge–Kutta method is:
x n + 1 = x n + x 6 · ( k 1 + 2 k 2 + 2 k 3 + k 4 ) ,
where k 1 , k 2 , k 3 , and k 4 are
k 1 = f x n ,   t n ,
k 1 = a 3 · G x n ,   t n · c h x n ,   t n + b 3 · S x n ,   t n · X x n ,   t n +                     + c 3 · P x ( x n ,   t n ) · E ( x n ,   t n ) + d 3 · P r ( x n ,   t n ) · S ( x n ,   t n )
k 2 = f t n + t 2 ,   x n + x 2 k 1 ,
k 2 = a 3 · G t n + t 2 ,   x n + x 2 k 1 · c h t n + t 2 ,   x n + x 2 k 1 + + b 3 · S t n + t 2 ,   x n + x 2 k 1 · X t n + t 2 ,   x n + x 2 k 1 + + c 3 · P x t n + t 2 ,   x n + x 2 k 1 · E t n + t 2 ,   x n + x 2 k 1 + d 3 · P r t n + t 2 ,   x n + x 2 k 1 · S t n + t 2 ,   x n + x 2 k 1 ,
k 3 = f t n + t 2 ,   x n + x 2 k 2 ,
k 3 = a 3 · G t n + t 2 ,   x n + x 2 k 2 · c h t n + t 2 ,   x n + x 2 k 2 + + b 3 · S t n + t 2 ,   x n + x 2 k 2 · X t n + t 2 ,   x n + x 2 k 2 + + c 3 · P x t n + t 2 ,   x n + x 2 k 2 · E t n + t 2 ,   x n + x 2 k 2 + + d 3 · P r t n + t 2 ,   x n + x 2 k 2 · S t n + t 2 ,   x n + x 2 k 2 ,
k 4 = f t n + t ,   x n + x k 3 ,
k 4 = a 3 · G t n + t ,   x n + x k 3 · c h t n + t ,   x n + x k 3 + + b 3 · S t n + t ,   x n + x k 3 · X t n + t ,   x n + x k 3 + + c 3 · P x t n + t ,   x n + x k 3 · E t n + t ,   x n + x k 3 +                                 + d 3 · P r t n + t ,   x n + x k 3 · S t n + t ,   x n + x k 3 .
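The fourth-order Runge–Kutta update for Equation (4) can be sketched in Python as follows. This is a minimal illustration, not the program used in the study: the markers are assumed to be supplied as functions of time only, a single step size `dt` is used, and the coefficients and marker values are hypothetical placeholders rather than the values of Table 2.

```python
def make_rhs_eq4(coef, markers):
    """Build f(t, x) for Equation (4) from coefficients and marker functions of t."""
    a3, b3, c3, d3 = coef

    def f(t, x):
        # x is unused here: in this sketch the right-hand side depends on the
        # markers only (an assumption), not on the aging variable itself.
        return (a3 * markers["G"](t) * markers["ch"](t)
                + b3 * markers["S"](t) * markers["X"](t)
                + c3 * markers["Px"](t) * markers["E"](t)
                + d3 * markers["Pr"](t) * markers["S"](t))

    return f

def rk4_step(f, t, x, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + dt / 2, x + dt / 2 * k1)
    k3 = f(t + dt / 2, x + dt / 2 * k2)
    k4 = f(t + dt, x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Hypothetical coefficients and constant normalized marker values.
coef = (0.30, 0.20, 0.10, 0.40)
values = {"G": 1.0, "ch": 0.6, "S": 0.5, "X": 0.4, "Px": 0.3, "E": 0.7, "Pr": 0.8}
markers = {name: (lambda t, v=v: v) for name, v in values.items()}

f = make_rhs_eq4(coef, markers)
dt, n_steps = 0.01, 100
t, x = 0.0, 0.0
for _ in range(n_steps):
    x = rk4_step(f, t, x, dt)
    t += dt
```

With constant markers the right-hand side is a constant, so the computed value at the final time should agree with that constant multiplied by the elapsed time, which gives a quick sanity check of the implementation.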
We considered the application of the following second-order Adams–Bashforth method to compare numerical results:
$$x_{i+1} = x_i + \frac{3}{2} \cdot \Delta t \cdot \mathrm{RHS}_i - \frac{1}{2} \cdot \Delta t \cdot \mathrm{RHS}_{i-1},$$
where $x_i$ is the current aging value, $\mathrm{RHS}_i$ is the right-hand side of Equations (2)–(4), $\mathrm{RHS}_{i-1}$ is the right-hand side of Equations (2)–(4) at the previous point, and $\Delta t$ is the time step. Third, the backward Euler method was used, which can shorten the calculation time and is a less costly method:
$$\frac{dx}{dt} = \mathrm{RHS}, \qquad x_{i+1} = x_i + \Delta t \cdot \mathrm{RHS}_{i+1},$$
where $\Delta t$ is the time step, and $\mathrm{RHS}_{i+1}$ is the right-hand side of Equations (2)–(4) at the new point.
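The two update rules above can be sketched in Python as follows. The sketch assumes that the right-hand side is available as a function of time only, so the backward Euler step needs no nonlinear solve; the start-up of the Adams–Bashforth scheme with one forward Euler step is also an assumption, since the text does not state how the first step was taken. The test right-hand side `2 * t` is hypothetical.

```python
def adams_bashforth2(rhs, x0, dt, n_steps):
    """Second-order Adams-Bashforth for dx/dt = rhs(t), starting from x(0) = x0."""
    xs = [x0]
    # Start-up (assumption): one forward Euler step, because the scheme
    # needs the right-hand side at two consecutive points.
    xs.append(x0 + dt * rhs(0.0))
    for i in range(1, n_steps):
        t_i, t_prev = i * dt, (i - 1) * dt
        xs.append(xs[i] + 1.5 * dt * rhs(t_i) - 0.5 * dt * rhs(t_prev))
    return xs

def backward_euler(rhs, x0, dt, n_steps):
    """Backward Euler for dx/dt = rhs(t): the right-hand side is taken at the new point."""
    xs = [x0]
    for i in range(n_steps):
        xs.append(xs[i] + dt * rhs((i + 1) * dt))
    return xs

# Hypothetical right-hand side with a known exact solution x(t) = t**2.
def rhs(t):
    return 2.0 * t

ab2 = adams_bashforth2(rhs, 0.0, 0.01, 100)
be = backward_euler(rhs, 0.0, 0.01, 100)
```

Both functions return the whole trajectory, so the result can be compared point by point with an exact solution when one is available.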
To verify the numerical methods, it was necessary to obtain the analytical (exact) solutions of Equations (2)–(4). For example, consider the nonlinear dynamic Equation (4). To eliminate the derivative, let us integrate both sides of Equation (4) over time, treating the biomarker values as constant over the integration interval and taking $x(0) = 0$:
$$x = a_3 \cdot G \cdot \mathrm{ch} \cdot t + b_3 \cdot S \cdot X \cdot t + c_3 \cdot \mathrm{Px} \cdot E \cdot t + d_3 \cdot \mathrm{Pr} \cdot S \cdot t.$$
The resulting expression is an exact (analytical) solution of Equation (4). Let us write it in finite-difference form for numerical implementation:
$$x_{n+1} = \Delta t \cdot n \cdot (a_3 \cdot G_i \cdot \mathrm{ch}_i + b_3 \cdot S_i \cdot X_i + c_3 \cdot \mathrm{Px}_i \cdot E_i + d_3 \cdot \mathrm{Pr}_i \cdot S_i).$$
All these methods were used for Equations (2) and (3) in the same way as for Equation (4).
We selected the time step in the interval $0 < t < 1$, with $\Delta t = 0.01$. The grid step was $\Delta x = \frac{1 - 0}{100} = 0.01$.
With the above, numerical results could be described and real-time analyses could be conducted with $N \cdot \Delta t = t$, where $N = 100$ was the number of iterations in the program code.
The created mathematical models from Equations (2)–(4) can be considered as experiments for future similar studies.
The data from Table 2 were used to simulate the numerical solution.
The results obtained through various mathematical modeling approaches are visually presented in the following figures. Figure 5, Figure 6 and Figure 7 demonstrate the behaviors of premature aging change over time t for different age categories with the proposed methods.
Figure 5, Figure 6 and Figure 7 demonstrate a strong agreement between the numerical modeling results for all categories using the fourth-order Runge–Kutta method and the exact (analytical) solution. This consistency helped optimize the process of selecting the most suitable method for further analysis.
According to Table 1 and Figure 5, Figure 6 and Figure 7, the correlation coefficients differed between age groups: they were higher in the 65–74 category and significantly lower in the 75–89 and 90+ categories. These differences may indicate different rates of aging processes depending on age.
Thus, the analysis conducted using various methodologies provided a comprehensive understanding of age-related changes in the cardiovascular system. The selected biomarkers formed interactions that reflected the degree of influence of various factors on the aging processes in each age category. In the 65–74-year-old group, the correlation coefficients were the highest, which may indicate a more pronounced influence of biomarkers on age-related changes. In the 75–89-year-old category, the values of the coefficients decreased, and in the 90+-year-old group, they became even more balanced, which may indicate that the correlations between biomarkers decrease with age.
This implies that in the early stages of aging, the influence of biomarkers is more pronounced, whereas at later ages, other factors such as accumulated age-related changes, chronic diseases, or adaptive mechanisms may reduce the importance of individual biomarkers in modelling the rate of aging.
Differences between the numerical and exact solutions were found (see Table 3).
Comparison of the maximum error of the selected numerical methods with the exact solution for each age group is presented in Table 3. The Runge–Kutta method demonstrated the highest order of accuracy among the methods considered. Maximum errors $\max_i \left| x_i^{\text{exact}} - x_i^{\text{Runge-Kutta}} \right|$ of 0.00002677413, 0.0004091325, and 0.00004571414 were found for the age groups 65–74, 75–89, and 90+, respectively. These values were the smallest among all methods, which indicated the high accuracy of the numerical solution. The Runge–Kutta method provided the best agreement with the analytical (exact) solution.
The Adams–Bashforth method had a lower order of accuracy compared to the Runge–Kutta method. Its maximum errors $\max_i \left| x_i^{\text{exact}} - x_i^{\text{Adams-Bashforth}} \right|$ were 0.007965958, 0.03635441, and 0.01277912 for the age categories 65–74, 75–89, and 90+, respectively. These values were significantly higher, especially in the 75–89 age category, indicating a lower accuracy of the method.
The backward Euler method, which has the lowest formal order of accuracy (first order), gave maximum errors $\max_i \left| x_i^{\text{exact}} - x_i^{\text{Backward Euler}} \right|$ of 0.005310639, 0.02423627, and 0.008519411 for the age groups 65–74, 75–89, and 90+, respectively. Although this method is simple to implement, its errors were much larger than those of the Runge–Kutta method, which makes it less favorable for our application, especially in age groups with large data variation.
Based on these results, the fourth-order Runge–Kutta method, $O(\Delta x^4)$, was the optimal choice for the numerical solution of the problem considered, as it gave the smallest errors and the best agreement with the analytical calculations.
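As a minimal sketch of how such a comparison can be set up, the Python snippet below applies the three methods to a scalar test problem with a known exact solution and reports the maximum absolute error of each. The test equation x' = -a·x with x(0) = 1, the value a = 0.66 (borrowed from coefficient a1 in Table 1 purely for illustration), the two-step form of Adams–Bashforth, and the choice of 100 steps of size 0.01 on [0, 1] are all assumptions made here. This is not the system of equations or the code used in the study, so the printed errors will not reproduce Table 3.

```python
import math

A = 0.66   # illustrative rate constant (assumption; value of a1 in Table 1)
H = 0.01   # step size, as in Table 2
N = 100    # number of steps needed to cover [0, 1] with H = 0.01 (assumption)


def f(t, x):
    """Right-hand side of the assumed test equation x' = -A * x."""
    return -A * x


def exact(t):
    """Exact solution of the test equation with x(0) = 1."""
    return math.exp(-A * t)


def runge_kutta4():
    xs = [1.0]
    for i in range(N):
        t, x = i * H, xs[-1]
        k1 = f(t, x)
        k2 = f(t + H / 2, x + H * k1 / 2)
        k3 = f(t + H / 2, x + H * k2 / 2)
        k4 = f(t + H, x + H * k3)
        xs.append(x + H * (k1 + 2 * k2 + 2 * k3 + k4) / 6)
    return xs


def adams_bashforth2():
    # A two-step method needs two starting values; the second one is
    # taken from the exact solution here to keep the sketch short.
    xs = [1.0, exact(H)]
    for i in range(1, N):
        t = i * H
        xs.append(xs[-1] + H * (1.5 * f(t, xs[-1]) - 0.5 * f(t - H, xs[-2])))
    return xs


def backward_euler():
    # For the linear test equation the implicit update
    # x_new = x_old + H * f(t_new, x_new) has a closed form.
    xs = [1.0]
    for _ in range(N):
        xs.append(xs[-1] / (1 + A * H))
    return xs


def max_error(xs):
    """Maximum absolute deviation from the exact solution on the grid."""
    return max(abs(x - exact(i * H)) for i, x in enumerate(xs))


errors = {
    "Runge-Kutta": max_error(runge_kutta4()),
    "Adams-Bashforth": max_error(adams_bashforth2()),
    "Backward Euler": max_error(backward_euler()),
}

for name, err in errors.items():
    print(f"{name}: max error = {err:.3e}")
```

On this toy problem the errors follow the nominal orders of the methods (fourth, second, and first order). For a nonlinear system, the backward Euler step would require an iterative solver instead of the closed-form update used above.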
Next, let us consider our problem using the already selected fourth-order Runge–Kutta method. Figure 8 presents data illustrating changes in the ageing process for the three age groups: 65–74 years old, 75–89 years old, and 90+ years old. The abscissa axis (horizontal axis) denotes time, and the ordinate axis (vertical axis) shows the changes in aging according to Figure 2, Figure 3 and Figure 4.
In the 65–74 age group, moderate fluctuations of relatively low amplitude were observed, which may indicate a more stable aging process in that interval. In that group, bad habits such as alcohol consumption and smoking were noted to influence the development of postinfarction cardiosclerosis (correlation coefficient b1), increased blood pressure, and changes in body weight, all of which may accelerate the aging process.
In the 75–89 age category, the amplitude of the fluctuations was higher than in 65–74-year-olds but lower than in persons over 90 years of age, indicating the intermediate nature of the changes associated with aging. For that group, we can assume that the aging process is associated with the a2 coefficient, which reflects the interaction with increased blood pressure; the c2 coefficient, which reflects the risk of developing chronic heart failure; and the d2 coefficient, which characterizes the probability of coronary heart disease.
The analysis of the graphs showed that the peaks in each age category may correspond to the time intervals during which the ageing processes were most intensive.
The group of 65–74-year-olds showed the most stable dynamics of changes, probably due to the influence of bad habits, which may indicate a more predictable nature of ageing at that age. On the contrary, the group of 90+-year-olds showed the greatest fluctuations, which may indicate a decrease in the predictability of the ageing process or its increased sensitivity to various factors, such as the respondent’s gender, level of education, and smoking.
Thus, with the numerical implementation of the model using the Runge–Kutta, Adams–Bashforth, and backward Euler methods, we observed a high degree of consistency between the approximate and exact solutions. This agreement confirmed both the mathematical validity and computational stability of the proposed differential equation system.
Building upon this foundation of numerical reliability, the second phase of the study focused on enhancing predictive capabilities through machine learning. We conducted an external validation using a Random Forest classifier to test the generalizability of the model. The model, trained on the 65–74 cohort, was applied to independent subgroups (75–89 and 90+ years) using the same features.
Results showed that the model maintained high predictive accuracy without recalibration, achieving AUC = 0.989 in the 75–89 group and 1.000 in the 90+ group, indicating strong extrapolation across age categories.
The validation results demonstrated high predictive performance across key evaluation metrics (Table 4).
The model achieved an overall accuracy of 98.8% (see Table 4), indicating a high proportion of correct predictions across both classes. The F1-scores of 0.98 for class 0 (absence of condition) and 0.99 for class 1 (presence of condition) reflected strong precision–recall balance and robustness across class distributions.
Furthermore, the area under the ROC curve (AUC) was 0.989, demonstrating excellent discrimination between patients with and without the target condition (e.g., ischemic heart disease). These results confirmed that the model retained its predictive capability when applied to an independent population of older individuals.
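The metrics reported in Table 4 can be computed from true labels and predicted scores as in the following sketch. The eight labels and scores, the 0.5 decision threshold, and the helper names are hypothetical; they only illustrate the definitions of accuracy, per-class F1-score, and ROC AUC (computed here as the fraction of positive-negative pairs that are ranked correctly) and are not the study data.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, cls):
    """F1-score for one class, treating `cls` as the positive label."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical validation set: class 1 = condition present, class 0 = absent.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]   # predicted probability of class 1
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed threshold of 0.5

print("accuracy:", accuracy(y_true, y_pred))
print("F1 (class 0):", f1_score(y_true, y_pred, 0))
print("F1 (class 1):", f1_score(y_true, y_pred, 1))
print("ROC AUC:", roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, whereas AUC is computed from the scores alone, which is why the two can differ for the same classifier.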
These results support the conclusion that the developed ODE-based mathematical model, when solved using standard numerical methods and trained on representative clinical data, can accurately predict cardiovascular aging trajectories across elderly subpopulations. The consistency of performance across age groups reinforces the model’s potential for application in real-world preventive cardiology and geriatric risk stratification.
Having validated the model using numerical methods and supervised machine learning approaches, we then proceeded to explore the potential of unsupervised learning for pattern discovery in the data.
To complement the supervised analysis, a clustering model based on the k-means algorithm was developed and tested for automatic segmentation of clinical data from patients with suspected cardiovascular disease (CVD). The main objective was to test the hypothesis that, without predefined labels, a model based on unsupervised learning could categorize patients into groups with different cardiovascular status, that is, with and without CVD.
Despite the presence of classes in the original medical records, they were not included in the model, as the task was specifically to independently identify the structure of the data using the k-means algorithm.
To ensure robustness of the results and clarity of visualization, the data were first normalized using standardization and then transformed into a two-dimensional space using the principal component analysis (PCA) method.
After applying the principal component method (PCA), two-dimensional projections of the training data were visualized on a plane, where color differentiation marked the membership of objects to the corresponding clusters identified by the k-means algorithm. The coordinates of the centroids of the clusters were also marked on the plot, which allowed us to analyze the spatial distribution of the groups (Figure 9). The observed clear separation into two distinct areas confirmed the correct segmentation of the sample and demonstrated the ability of the model to identify the internal structure of the data without using class labels.
Similar visualization was performed for the independent test dataset (df1), which was not used in the model training phase. The first four observations corresponded to patients without cardiovascular disease (class 0), while the last four patients belonged to the group with an established diagnosis of CVD (class 1). However, this information was deliberately not provided to the model during the prediction phase, as the aim of the study was to test the ability of the clustering algorithm to independently identify this separation solely based on internal patterns in the original features. This dataset was processed using the same normalization methods and projection into PCA space, after which cluster labels were predicted using the pre-trained k-means model.
The resulting visual pattern (Figure 10) showed stable and homogeneous clustering similar to that of the training sample, indicating the high robustness of the model and its ability to generalize to new data. The “Cluster Label” color scale in the graph indicates the membership of each point in one of the two clusters extracted by the k-means algorithm, where zero and one are the numbers of the corresponding clusters.
Despite the fact that the model did not use class labels, it was known that the test sample df1 contained a hidden (implicit) structure corresponding to the true distribution of patients according to the presence of CVD. A comparison of the predicted clusters with the known but hidden information showed complete correspondence (Figure 11)—all patients were correctly assigned to the corresponding groups. This allowed the external validation of the clustering results, confirming the high accuracy and significance of the extracted features.
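The workflow described above (standardize, fit k-means on the training data, assign unseen patients to the nearest centroid, then compare with the withheld labels) can be sketched as follows. The two features, all numeric values, the initial centroids, and the helper names are hypothetical and chosen only so that the example is deterministic. The PCA projection is omitted because the sketch has only two features, and the code is not the implementation used in the study.

```python
import math


def fit_scaler(rows):
    """Column means and standard deviations of the training data."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def standardize(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def kmeans(rows, init, iters=50):
    """Lloyd's algorithm started from the given initial centroids."""
    centroids = [list(c) for c in init]
    for _ in range(iters):
        labels = [nearest(r, centroids) for r in rows]
        updated = []
        for k in range(len(centroids)):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[k])
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical features: total cholesterol (mmol/L), systolic pressure (mmHg).
train = [[4.5, 120], [4.8, 125], [5.0, 118], [4.6, 122],
         [6.8, 160], [7.1, 165], [6.5, 158], [7.0, 170]]
# Hypothetical test set laid out like df1: four class-0 rows, then four class-1 rows.
test = [[4.7, 121], [4.9, 119], [4.4, 124], [5.1, 126],
        [6.9, 162], [6.6, 159], [7.2, 168], [6.7, 164]]
hidden_labels = [0, 0, 0, 0, 1, 1, 1, 1]   # never shown to the clustering step

means, stds = fit_scaler(train)
train_z = standardize(train, means, stds)
test_z = standardize(test, means, stds)     # reuse the training statistics

centroids = kmeans(train_z, init=[train_z[0], train_z[-1]])
predicted = [nearest(r, centroids) for r in test_z]

# Cluster numbers are arbitrary, so agreement is checked up to a label swap.
match = sum(p == h for p, h in zip(predicted, hidden_labels)) / len(predicted)
agreement = max(match, 1 - match)
print("predicted clusters:", predicted)
print("agreement with hidden labels:", agreement)
```

Reusing the training means and standard deviations for the test set keeps both samples in the same feature space, which is what allows centroids fitted on the training data to be applied to new patients.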
Therefore, the analysis showed that the developed clustering model based on the k-means algorithm was highly effective when working with clinical data that did not contain explicit class labels. The model successfully identified the latent structure in medical indicators and demonstrated the ability to detect latent patterns specific to different patient groups. The external validation performed on an independent dataset showed the full agreement of the clustering results with the actual distribution of patients by cardiovascular health status, which confirmed the high accuracy of the model. Taking this into account, the proposed approach can be recommended for use in primary screening and preliminary patient segmentation tasks, especially in cases when the data do not contain annotated classes or are at the stage of primary analysis.
The model has a number of limitations. It depends on the quality and completeness of clinical data, which may affect its generalizability. The selected biomarkers represent only part of the complex aging process. The model was trained on a specific cohort; therefore, further testing on broader populations is needed.

4. Conclusions

In conclusion, the proposed mathematical model demonstrated a high level of consistency and accuracy when solving cardiovascular aging dynamics using numerical methods such as Runge–Kutta, Adams–Bashforth, and backward Euler. The close match between numerical and exact solutions confirmed the correctness and stability of the model. The originality and scientific significance of this model were supported by its registration in the State Copyright Register (No. 46336, dated 23 May 2024).
To evaluate its real-world applicability, we performed external validation using clinical data from elderly patients. The model was trained and tested on older age groups using a Random Forest classifier. The results showed high predictive performance, with an accuracy of 98.8% and an AUC of 0.989.
In addition, we applied k-means clustering to identify subgroups of patients with similar clinical profiles. This allowed us to uncover meaningful patterns in the data and better understand the distribution of cardiovascular risk across different patient clusters.
Thus, risk modelling showed that in the elderly, behavioral and metabolic factors significantly influenced cardiovascular health. In 65–74-year-olds, smoking and alcohol increased the risk of infarction and hypertension by up to 53%; in the 75–89-year-old group, the risk of hypertension and CVD increased by up to 21%; and in the 90+ age group, hypercholesterolemia increased the risk of CVD by 17%. These data confirmed the importance of an age-specific approach to the prevention of cardiovascular ageing.
Overall, the integration of mathematical modeling, machine learning classification, and unsupervised clustering provides a comprehensive framework for predicting cardiovascular aging. The model may serve as a valuable tool in geriatric healthcare for developing personalized prevention strategies and improving the quality of life in the elderly population.

Author Contributions

Conceptualization: K.A.; methodology: A.K.; software: M.S.; validation: A.M.; formal analysis: A.B., M.A. and U.S.; resources: D.S., R.B. and N.B.; data curation: S.C. and M.M.; writing—original draft: M.S. and A.M.; writing—review and editing: M.S.; visualization: M.S.; supervision: S.A.; project administration: K.A. All authors have read and agreed to the published version of the manuscript.

Funding

The work was carried out within the framework of the scientific project No. AP19677754 “Development of markers and diagnostic algorithm for detection and prevention of early cardiovascular aging”. State Institution: “Ministry of Science and Higher Education of the RK”.

Institutional Review Board Statement

This study was approved by the Local Ethics Committee of Al-Farabi Kazakh National University (grant No.: IRN AR19677754), approval number: IRB-A515, 9 November 2023.

Informed Consent Statement

Written informed consent has been obtained from the patient(s) to publish this paper.

Data Availability Statement

The presented data in this study are not publicly available due to ongoing research, ethical restrictions, and the need to protect the confidentiality of study participants.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kholyavka, M.G.; Rakhmanova, T.I. Biomarkers of Aging and New Targets for Anti-Aging Therapy. Vestn. VSU Ser. Chem. Biol. Pharm. 2020, 3, 127. Available online: http://www.vestnik.vsu.ru/pdf/chembio/2020/03/2020-03-17.pdf (accessed on 20 April 2025).
  2. World Health Organization. Cardiovascular Diseases (CVDs). 2019. Available online: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds) (accessed on 20 April 2025).
  3. Franceschi, C.; Capri, M.; Monti, D.; Giunta, S.; Olivieri, F.; Sevini, F.; Panourgia, M.P.; Invidia, L.; Celani, L.; Scurti, M.; et al. Inflammaging and anti-inflammaging: A systemic perspective on aging and longevity emerged from studies in humans. Mech. Ageing Dev. 2007, 128, 92–105.
  4. Drapkina, O.M.; Kontseva, A.V. Zaklyuchenie Soveta ekspertov: Novye vozmozhnosti biomarkerov v stratifikatsii riska serdechno-sosudistykh zabolevaniy. Ross. Kardiol. Zhurnal 2021, 26, 4700.
  5. Polo YLa Borda, J.; Castañeda, S.; Heras-Recuero, E.; Sánchez-Alonso, F.; Plaza, Z.; García Gómez, C.; Ferraz-Amaro, I.; Sanchez-Costa, J.T.; Sánchez-González, O.C.; Turrión-Nieves, A.I.; et al. Use of risk chart algorithms for the identification of psoriatic arthritis patients at high risk for cardiovascular disease: Findings derived from the project CARMA cohort after a 7.5-year follow-up period. RMD Open 2024, 10, e004207.
  6. Morokov, Y. Simple Mathematical Model of Aging. arXiv 2019, arXiv:1912.08163.
  7. Murase, M.; Matsuo, M. Mathematical Modeling for the Aging Process: Normal, Abnormal and Self-Terminating Phenomena in Spatio-Temporal Organization. Mech. Ageing Dev. 1991, 60, 99–112.
  8. Mc Auley, M.T.; Mooney, K.M.; Angell, P.J.; Wilkinson, S.J. Mathematical Modelling of Metabolic Regulation in Aging. Metabolites 2015, 5, 232–251.
  9. Viktorov, A.A.; Kholodnov, V.A.; Gladkikh, V.D.; Alekhnovich, A.V. Influence of Environment on Aging of Living Systems: A Mathematical Model. Adv. Gerontol. 2013, 3, 255–260.
  10. Hibbs, A.R.; Walford, R.L. A Mathematical Model of Physiological Processes and Its Application to the Study of Aging. Mech. Ageing Dev. 1989, 50, 193–214.
  11. Zheng, T. A Mathematical Model of Proliferation and Aging of Cells in Culture. J. Theor. Biol. 1991, 149, 287–315.
  12. Rifin, H.M.; Omar, M.A.; Wan, K.S.; Hasani, W.S.R. 10-Year Risk of Cardiovascular Diseases According to the WHO Predictive Chart: Results from the 2019 National Health and Morbidity Survey (NHMS). BMC Public Health 2024, 24, 2513.
  13. Feng, X.; Zhu, J.; Hua, Z.; Yao, S.; Tong, H. Comparison of Obesity Indicators for Predicting Cardiovascular Disease Risk Factors and Multimorbidity Among the Chinese Population Based on ROC Analysis. Sci. Rep. 2024, 14, 20942.
  14. DeGroat, W.; Abdelhalim, H.; Peker, E.; Sheth, N.; Narayanan, R.; Zeeshan, S.; Liang, B.T.; Ahmed, Z. Multimodal AI/ML for Discovering Novel Biomarkers and Disease Prediction Using Multi-Omic Profiles of Patients with Cardiovascular Diseases. Sci. Rep. 2024, 14, 26503.
  15. Li, J.-X.; Li, L.; Zhong, X.; Fan, S.-J.; Cen, T.; Wang, J.; He, C.; Zhang, Z.; Luo, Y.-N.; Liu, X.-X.; et al. Machine learning identifies prominent factors associated with cardiovascular disease: Findings from two million adults in the Kashgar Prospective Cohort Study (KPCS). Glob. Health Res. Policy 2022, 7, 48.
  16. Ahmad, A.; Abbas, S.; Inc, M.; Ghaffar, A. Stability Analysis of SARS-CoV-2 with Heart Attack Effected Patients and Bifurcation. Adv. Biol. 2024, 8, 2300540.
  17. Ahmad, A.; Abbas, F.; Farman, M.; Hincal, E.; Ghaffar, A.; Akgül, A.; Hassani, M.K. Flip Bifurcation Analysis and Mathematical Modeling of Cholera Disease by Taking Control Measures. Sci. Rep. 2024, 14, 10927.
  18. Musa, H.; Saidu, I.; Waziri, M.Y. A Simplified Derivation and Analysis of Fourth Order Runge Kutta Method. Int. J. Comput. Appl. 2010, 9, 51–55.
  19. Peinado, J.; Ibáñez, J.; Arias, E.; Hernández, V. Adams–Bashforth and Adams–Moulton Methods for Solving Differential Riccati Equations. Comput. Math. Appl. 2010, 60, 3032–3045.
  20. Rapp, B.E. Microfluidics: Modelling, Mechanics and Mathematics; Elsevier: Amsterdam, The Netherlands, 2017; Available online: www.sciencedirect.com/book/9781455731411/microfluidics-modeling-mechanics-and-mathematics (accessed on 20 April 2025).
Figure 1. Age scale from 65 years and above by gender.
Figure 2. Pearson correlation coefficients for respondents aged 65–74.
Figure 3. Pearson correlation coefficients for respondents aged 75–89.
Figure 4. Pearson correlation coefficients for respondents aged 90 and above.
Figure 5. Comparison of solution methods for ages 65–74.
Figure 6. Comparison of solution methods for ages 75–89.
Figure 7. Comparison of solution methods for the 90+ age group.
Figure 8. Numerical interpretation of the change in cardiovascular aging by the fourth-order Runge–Kutta method.
Figure 9. Two-dimensional visualization of clustering in clinical patient data.
Figure 10. Predictive clustering of an independent medical dataset based on a pre-trained k-means model.
Figure 11. Test set clustering results: full coincidence with the real class structure.
Table 1. Coefficients of the mathematical model.
| Age Category | Coefficient | Value |
| --- | --- | --- |
| 65–74 | a1 | 0.66 |
| 65–74 | b1 | 0.53 |
| 65–74 | c1 | 0.5 |
| 65–74 | d1 | 0.43 |
| 75–89 | a2 | 0.21 |
| 75–89 | b2 | 0.15 |
| 75–89 | c2 | 0.16 |
| 75–89 | d2 | 0.14 |
| 90+ | a3 | 0.17 |
| 90+ | b3 | 0.17 |
| 90+ | c3 | 0.16 |
| 90+ | d3 | 0.16 |
Table 2. Parameters of modeling.
| Parameter | Value |
| --- | --- |
| Time step, Δt | 0.01 |
| Grid step, Δx | 0.01 |
| Number of iterations, N | 1000 |
| Calculation interval, [a; b] | [0; 1] |
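Table 3. Values of accuracy orders of numerical methods.
| Solution Method | Order of Method | Age Category | Maximum Error Relative to Exact Solution |
| --- | --- | --- | --- |
| Runge–Kutta | O(Δx^4) | 65–74 | 0.00002677413 |
| Runge–Kutta | O(Δx^4) | 75–89 | 0.0004091325 |
| Runge–Kutta | O(Δx^4) | 90+ | 0.00004571414 |
| Adams–Bashforth | O(Δx^2) | 65–74 | 0.007965958 |
| Adams–Bashforth | O(Δx^2) | 75–89 | 0.03635441 |
| Adams–Bashforth | O(Δx^2) | 90+ | 0.01277912 |
| Backward Euler | O(Δx) | 65–74 | 0.005310639 |
| Backward Euler | O(Δx) | 75–89 | 0.02423627 |
| Backward Euler | O(Δx) | 90+ | 0.008519411 |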
Table 4. External validation results.
| Metric | Value |
| --- | --- |
| Accuracy | 98.8% |
| F1-score (class 0) | 0.98 |
| F1-score (class 1) | 0.99 |
| ROC AUC | 0.989 |
