1. Introduction
The early prediction of diseases through preventive measures results in increased patient survival rates. This requires the computerization of the healthcare system, which, through the Internet of Things (IoT) and cloud computing, provides cyclical access to accurate patient data for artificial intelligence (AI) systems, including machine learning (ML), that predict patient health status and the risk of various conditions [1,2].
The issue of preventive medicine in heart attack has so far been addressed in 111 papers published between 1979 and 2023, with some intensification since 2015, but papers using AI have been rare. Research to date has tended to look at incidence rates and seasonal variances in specific populations [
3,
4] and at prevention guidelines [
5,
6] rather than prevention models [
7,
8]. More attention has been paid to technical solutions, such as a remote cardiac monitoring system with an automatic warning of acute cardiac episodes, and less to the integration of different design, measurement, and optimization logics in complex adaptive healthcare systems, including for cardiology [
9,
10]. A part of this interest stems from the development of telerehabilitation systems for patients undergoing home rehabilitation, sometimes at a considerable distance from the nearest hospital with an interventional cardiology unit.
Incorrect, incomplete, or uncertain medical data can result in longer waiting times for patient prognosis, failure to ensure best outcomes, or even false prognoses. Similar results can be produced by a targeted action, i.e., an attack on data transfer or processing systems due to insufficient security. In this article, we will deal with the first of the aforementioned situations, ensuring the appropriate level of processing, inference, and prediction from data using AI. Preventive medicine, i.e., the medicine of healthy people aimed at predicting the health of patients and preventing their diseases, still poses challenges for engineers and medical specialists to accurately predict diseases for a specific patient.
Cardiovascular diseases, as lifestyle diseases, pose a special challenge: the epidemic of lifestyle diseases has been stopped, but the number of cases is not decreasing. A decrease can perhaps be achieved with the use of AI-based preventive cardiology. The future paradigm of early cardiac diagnosis is shifting the focus towards heart attack prevention based on non-invasive medical imaging with the support of artificial intelligence. It is necessary to detect increased risk early and to use preventive drugs before moving on to more effective, but also more invasive, forms of therapy.
The main motivation of our study was to improve existing and develop new AI-based solutions for cardiac preventive medicine, with particular emphasis on the prevention of heart attacks. This is due to the fact that the epidemic of lifestyle diseases (including cardiologic ones) has been stopped but not reversed; hence, automatically supervised prevention using AI seems to be a key opportunity to introduce progress in the above-mentioned areas. This can have major effects not only scientific and clinical in nature, but also economic and social.
The aim of this article is to develop and test an artificial intelligence-based tool for predicting the occurrence of a heart attack for the purposes of preventive medicine. It uses a combination and comparison of multiple artificial intelligence methods and techniques to determine a personalized probability of a heart attack based on a wide range of patient characteristics and, from a computational point of view, to determine the minimum set of characteristics necessary to do so. When applied to a specific patient, this represents progress in this field of research, resulting in improvements in preclinical care and diagnosis, as well as predictive accuracy in preventive medicine.
The novelty and contribution of the described system lies in the use of AI for widely available, cheap, and quick predictive analysis of circulatory system functions in a group of patients classified as at risk, and over time in all patients in the form of a standard periodic examination qualifying them for further tests and more advanced diagnosis of heart diseases.
Recent studies have shown that deep learning (DL) is worth using for the quick, automated segmentation of coronary plaques based on angiography, allowing for the prediction of a future heart attack, for automating the assessment of calcium concentration in coronary arteries during non-gated computed tomography (CT) of the heart and the chest, and for automating the measurement of epicardial fat tissue. This requires predictive models based on artificial intelligence that integrate clinical and diagnostic imaging parameters [
11,
12,
13,
14]. So far, AI is used to improve the effectiveness of cardiovascular CT: for image acquisition, reconstruction and denoising, image segmentation, quantitative analysis, decision support, and data integration. The further development of AI systems may provide an opportunity to improve current diagnostic procedures as precision medicine tools [
11,
15]. Furthermore, even freely available data can provide a reading of risk at the population level. The routine assessment of biomarkers (including hematological ones) for inclusion in clinical risk models is sometimes carried out on the basis of data that are already available to most patients. This allows for a trade-off between data availability, analysis cost, and the ease of implementing such a system in the existing healthcare system. The ideal input data for risk assessment in population health initiatives will be those that are already available to most patients [
15].
Recent research focuses on three areas, i.e., prediction, classification, and detection, including, e.g., atrial fibrillation. This creates a certain way of proceeding. Most of the research to date concerns analysis based on AI (40% is based on DL and 60% on conventional ML), with a strong increase in the number of publications after 2015 [
16]. Additionally, lipid-based methods currently used to predict the risk of coronary heart disease (CHD) have limitations that can be eliminated by incorporating epigenetic information (including the history of diseases in the patient’s ancestors) into AI-based risk prediction algorithms. The identification of genetic and epigenetic biomarkers in people with coronary artery disease already outperforms other risk assessment methods [
17].
This is even more important because current guidelines may underestimate the risk of some cardiovascular diseases, including atherosclerotic cardiovascular disease (CVD), in high-risk people. An ML-based risk calculator using support vector machines (SVM) addresses this error: on data from a 13-year multi-ethnic atherosclerosis study (6459 participants), it achieved a sensitivity of 0.86, a specificity of 0.95, and an AUC of 0.92, even for all CVDs, but it still requires validation [
18]. Researchers are encouraged to explore alternative ML-based diagnostic methods that use non-invasive clinical data to diagnose the disease and assess its severity [
19]. It is also important to have the support of medical experts in assessing the predicted risk of diseases in order to validate AI-based software [
20].
The opportunities created by AI-based preventive medicine systems primarily relate to the early detection of disease through monitoring, enhanced surveillance, and early warning based on disease prediction. This applies to both individual patients and population health risks, facilitating timely and targeted interventions. Personalized risk assessment allows the creation of personalized health plans, i.e., personalized risk profiles and prevention strategies based on them. This includes the integration of genomic data, allowing genetic predispositions to be understood and taken into account when formulating individualized disease prevention recommendations. Lifestyle data and health-seeking behaviors offer insights into habits (beneficial and harmful) that influence health status, and are used indirectly to design interventions and to support individuals in making informed, healthier choices. An important element of preventive medicine is individually optimized vaccination schedules to ensure timely and effective vaccination against infectious diseases, the elimination of vaccination gaps, etc. As a part of preventive medicine, early interventions for chronic diseases are viable, and AI supports them by identifying people at risk of developing chronic diseases, confirming their risks/symptoms, and implementing early interventions to avoid or mitigate these conditions. This integrated approach allows individuals with chronic diseases and groups of such individuals to be monitored remotely, improving the effectiveness of therapy and reducing the risk of complications. Environmental health monitoring, which includes monitoring the quality of air, water, and soil, makes it possible to take into account the impact of air or water quality on public health, contributing to the introduction of preventive measures against threats resulting from periodic or permanent environmental pollution.
For these reasons, preventive medicine systems play a key role in global health surveillance, increasing preparedness for pandemics and poisonings. Through international cooperation, environmental data and health information should be shared, contributing to a more coordinated global response to health challenges. In the long run, this will result in reduced healthcare costs by avoiding some of the high expenses associated with the treatment (e.g., in hospital) of advanced diseases and by optimizing the allocation of resources, moving saved financial resources to where they are needed more. To this end, targeted health education and mental health programs in the workplace become key areas of impact. Targeted health education campaigns that build awareness and encourage healthy habits can be actively supported by AI-powered digital health platforms, making reliable knowledge more accessible to society. Government, organizational, and corporate health initiatives targeting chronic fatigue syndrome, prolonged stress, burnout, and depression can proactively support workplace well-being programs, with AI providing insight into objectively measured employee health, guiding the development of targeted interventions, and promoting healthier habits among employees. Their successful implementation requires addressing technological, social, and ethical challenges, such as privacy issues, data security, and ensuring equal access to preventive interventions. Collaboration between the patient community, healthcare providers, policy makers, and technology developers is essential to achieve the full potential of preventive medicine systems, especially for cardiovascular diseases.
3. Results
More than one-third of the study population (35.82%, Figure 1) was assessed as being at risk of a heart attack (“1”), a very high proportion that indicates the need for immediate preventive action, as therapeutic measures alone may not be sufficient in a few years’ time.
Age and smoking are moderately correlated with the heart attack risk (Figure 2 and Figure 3).
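As an illustration of how such a correlation can be quantified, the following minimal sketch computes the Pearson correlation between a numeric feature and a binary risk label (the point-biserial case). The `pearson` helper and the `age` and `risk` values are hypothetical and are not taken from the study data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical toy values (assumption): age in years, risk label 0/1.
age = [34, 45, 52, 61, 48, 70, 39, 66]
risk = [0, 0, 1, 1, 0, 1, 0, 1]

r_age = pearson(age, risk)
print(f"correlation(age, risk) = {r_age:.2f}")
```

With a binary label, the Pearson coefficient coincides with the point-biserial correlation, so no separate formula is needed for the 0/1 risk column.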
Age, sex, and smoking were the most significant risk factors for a heart attack, and smoking is a modifiable factor enabling prevention strategies.
The model for classification was selected on the basis of accuracy (Figure 4) from the following candidates:
Linear Support Vector Classifier (LinearSVC);
Logistic Regression (LogisticRegression);
K-Nearest Neighbors (KNeighborsClassifier);
Random Forest (RandomForestClassifier).
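A comparison of this kind can be sketched as follows with scikit-learn. This is a minimal illustration only: the data are generated synthetically with `make_classification` (an assumption, since the study dataset is not reproduced here), and the split ratio, class weights, and default hyperparameters are hypothetical choices rather than the settings used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for the patient table (assumption: not the study data).
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           weights=[0.64, 0.36], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "LinearSVC": LinearSVC(max_iter=10000, random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
}

# Fit each candidate and record its accuracy on the held-out split.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}

best_name = max(scores, key=scores.get)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.4f}")
print("selected:", best_name)
```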
Hyperparameter Tuning
A study was performed to assess the impact of tuning on different models using hyperparameters specific to each model.
For the KNeighborsClassifier, manual tuning was performed. The maximum KNN score achieved was 62.86% (Figure 5).
The test score changed with an increasing number of neighbors but stabilized at the value of k = 20, and no further improvement was seen.
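Manual tuning of this kind typically amounts to a loop over candidate values of k. The sketch below assumes synthetic data and a search range of 1 to 30 neighbors. Both are illustrative assumptions, so the resulting scores will not match those reported above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption: not the study data).
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

neighbors = range(1, 31)  # hypothetical search range for k
test_scores = []
for k in neighbors:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    test_scores.append(knn.score(X_test, y_test))

# First k that reaches the maximum test accuracy.
best_k = neighbors[test_scores.index(max(test_scores))]
print(f"best k = {best_k}, test accuracy = {max(test_scores):.4f}")
```

Plotting `test_scores` against `neighbors` gives a curve of the kind used to judge where the score stabilizes.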
For the RandomForestClassifier, manual tuning was performed. The maximum score achieved was 63.32%. Sample train scores are shown in Table 2. Sample test scores are shown in Table 3.
For the LogisticRegression and RandomForestClassifier, tuning using RandomizedSearchCV was performed. The maximum scores achieved were 63.52% and 64.18%, respectively.
For the LogisticRegression, tuning using GridSearchCV was performed. The maximum score achieved was 64.18%.
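The two search strategies can be sketched as follows. The parameter grids, the number of folds, and the synthetic data are assumptions made for illustration. The study's actual search spaces are not given in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in data (assumption: not the study data).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)

# Exhaustive search over a small, hypothetical grid for LogisticRegression.
log_reg_grid = {"C": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]}
grid_search = GridSearchCV(LogisticRegression(max_iter=1000),
                           param_grid=log_reg_grid, cv=5, scoring="accuracy")
grid_search.fit(X, y)

# Random sampling of 5 candidates from a hypothetical RandomForest space.
rf_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [None, 3, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
random_search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                                   param_distributions=rf_distributions,
                                   n_iter=5, cv=3, scoring="accuracy",
                                   random_state=0)
random_search.fit(X, y)

print("GridSearchCV:", grid_search.best_params_, grid_search.best_score_)
print("RandomizedSearchCV:", random_search.best_params_,
      random_search.best_score_)
```

GridSearchCV evaluates every combination in the grid, whereas RandomizedSearchCV evaluates only `n_iter` sampled combinations, which keeps the cost bounded for larger search spaces.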
Results obtained for the 27 models compared with the lazy classifier are shown in Table 4.
The Receiver Operating Characteristic (ROC) curve is a plot of the true positive rate versus the false positive rate. A false positive in this case occurs when the person tests positive but does not actually have the disease. A false negative occurs when the person tests negative, suggesting they are healthy, when they actually do have the disease.
True positive = model predicts 1 when truth is 1;
False positive = model predicts 1 when truth is 0;
True negative = model predicts 0 when truth is 0;
False negative = model predicts 0 when truth is 1 (Figure 6).
Figure 6.
AUC (Area Under Curve): (a) DummyClassifier, (b) ExtraTreeClassifier.
Area Under Curve (AUC) is the area underneath the ROC curve. A perfect model achieves a score of 1.0.
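The four outcome types and the AUC can be computed directly from labels and scores, as in the minimal sketch below. The labels, scores, and the 0.5 decision threshold are hypothetical. The AUC is computed through its standard equivalent interpretation as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as one half).

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores (assumption: illustrative only).
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")   # TP=2 FP=0 TN=3 FN=1
print(f"AUC={roc_auc(y_true, scores):.3f}")  # AUC=0.889
```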
The confusion matrix shows where the model made correct predictions and where it made incorrect ones (Figure 7).
Accuracy is reported in decimal form; a perfect model achieves an accuracy of 1.0. Precision indicates the proportion of positive identifications (cases where the model predicted class 1) that were actually correct; a model that produces no false positives has a precision of 1.0. Recall indicates the proportion of actual positives that were correctly classified; a model that produces no false negatives has a recall of 1.0. The F1 score is the harmonic mean of precision and recall; a perfect model achieves an F1 score of 1.0. Support is the number of samples on which each metric was calculated (Table 5).
Additional metrics are the macro average and the weighted average. The macro average is the unweighted mean of precision, recall, and F1 score across classes. It does not account for class imbalance, so it deserves particular attention when the classes are imbalanced. The weighted average is the mean of precision, recall, and F1 score across classes, with each class weighted by its number of samples (support).
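The difference between the two averages is easiest to see on a small example. The sketch below computes per-class precision, recall, F1 score, and support, and then both averages of the F1 score. The labels and the helper name `per_class_metrics` are hypothetical.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, F1 and support, treating `label` as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    support = sum(1 for t in y_true if t == label)
    return precision, recall, f1, support


# Hypothetical imbalanced example: six samples of class 0, four of class 1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

metrics = {label: per_class_metrics(y_true, y_pred, label)
           for label in (0, 1)}
total = sum(m[3] for m in metrics.values())

# Macro: plain mean over classes. Weighted: mean weighted by support.
macro_f1 = sum(m[2] for m in metrics.values()) / len(metrics)
weighted_f1 = sum(m[2] * m[3] for m in metrics.values()) / total

print(f"macro F1 = {macro_f1:.4f}")        # macro F1 = 0.5833
print(f"weighted F1 = {weighted_f1:.4f}")  # weighted F1 = 0.6000
```

Here the larger class 0 has the higher F1 score (0.667 versus 0.5), so the weighted average is pulled above the macro average.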
The key features (risk factors) in the heart attack risk assessment are, in decreasing order of importance, heart rate, age, BMI, and cholesterol. BMI and cholesterol are modifiable factors, e.g., through diet. Although specific risk factors allow the rapid diagnosis of heart attack, it is advisable to combine biomarkers to improve discrimination (Figure 8).
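One common way to rank features is permutation importance, sketched below. The text does not state which importance measure was used in the study, so this technique is an assumption. The data are synthetic, and the column names and weights are chosen by hand purely for illustration, so the ranking printed here is not evidence for the ordering reported above.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["heart_rate", "age", "bmi", "cholesterol"]  # hypothetical

# Synthetic standardized features and labels (assumption: not study data).
X = rng.normal(size=(400, 4))
logits = 1.5 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] + 0.25 * X[:, 3]
y = (logits + rng.normal(scale=0.5, size=400) > 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Importance = mean drop in accuracy when one column is shuffled.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranking = sorted(zip(feature_names, result.importances_mean),
                 key=lambda kv: kv[1], reverse=True)

for name, importance in ranking:
    print(f"{name}: {importance:.4f}")
```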
Moreover, AI can potentially improve the prevention and self-treatment of several chronic diseases simultaneously, within one preventive medicine system. However, we are aware that the perspectives of patients, their families, and several medical specialists are combined here, which may affect both the implementation prospects and the preventive/therapeutic success of these systems. This also applies to the secondary prevention of heart attack due to polypharmacy and the coexistence of other diseases.
4. Discussion
A comparison of the models developed in the study shows that models based on logistic regression proved to be the most accurate. Their predictive value is moderate but sufficient for an initial screening diagnosis, i.e., selecting patients who require further, more accurate testing. In addition, this can be performed based on a reduced set of parameters, particularly heart rate, age, BMI, and cholesterol. This allows the development of a prevention strategy based on modifiable factors (e.g., in the form of diet, activity modification, or a hybrid combining different factors) combined with the monitoring of heart attack risk by the proposed system.
Guidelines for the diagnosis and treatment of acute and chronic heart failure are constantly being updated and clinically researched in the search for more effective methods of prevention, which seems to be the only way to reverse the epidemic of lifestyle-related cardiovascular conditions. This requires not only harnessing the analytical and predictive power of artificial intelligence but also developing a framework for combining health, ecological, mechanistic, and social models to design, track, and evaluate complex interventions while translating knowledge. Once tested in a variety of healthcare settings, both procedural and technological innovations can have their impact measured and optimized within the complex, dynamically changing interactions of the healthcare system. A review of the six largest bibliometric databases yielded 134,012 publications (1857–2024) with the keyword “preventive medicine”, including only 110 publications with the keywords “preventive medicine” + “heart attack” (1979–2023) and only 3 with “preventive medicine” + “heart attack” + “artificial intelligence” or “preventive medicine” + “heart attack” + “machine learning” (2003–2023) [
22,
23,
24]. The results of research to date show that AI-based heart attack prediction systems offer a number of benefits that can significantly improve healthcare outcomes, but more studies are needed. AI algorithms analyzing large datasets (patient records, medical images, and genetic information) identify rules, mechanisms, patterns, and risk factors associated with heart attacks, which allows for the early detection of potential problems, enabling quick intervention and preventive action. AI provides a personalized risk assessment by taking into account a wide range of factors, including medical history, lifestyle, genetics, and environmental variables, allowing for more accurate predictions tailored to an individual’s specific risk profile [
1,
22,
23,
24,
25]. Continuous patient monitoring provides the real-time analysis of health parameters and can detect subtle changes in health that may indicate an increased risk of heart attack. AI algorithms help medical specialists interpret complex medical data, reducing the likelihood of diagnostic errors and also providing additional information (computational biomarkers) and signaling potential problems. This can automate and accelerate the analysis of large datasets collected from patients and increase the accuracy of analyses. This relieves some of the burden of manual data analysis, allowing focus on patient care and clinical decision-making. By supporting decision-making, AI helps in developing personalized prevention, treatment, rehabilitation, and care plans [
22,
23,
24,
25,
26,
27,
28]. AI ensures data integration and analyzes data from various sources, which enables a more comprehensive understanding of the patient’s health condition and increases the accuracy of predictions and the effectiveness of preventive/therapeutic actions taken. Conclusions from the above data may contribute to the identification of new risk factors, mechanisms, decision rules, and potential treatment options (ultimately leading to the innovative forms of prevention and therapy). Preventive medicine systems based on AI can strengthen patients’ motivation by providing them with personalized support in the area of health status and desired activities, which can encourage them to adopt a healthier lifestyle, follow treatment plans and cardiac rehabilitation (including in a hybrid form: intertwining home rehabilitation and outpatient), and actively engage with their own healthcare. Moreover, from a systems perspective, the early detection and prevention of heart attacks can lead to savings in the healthcare system, because by identifying people in high-risk groups and implementing preventive measures, the overall cost of treating heart disease can be reduced and scarce resources can be allocated more efficiently (
Figure 9).
This group of solutions offers significant benefits, but challenges include data privacy, algorithm transparency, and the need for continuous validation and improvement of artificial intelligence models for use in heart attack prevention [
1,
22,
28,
29,
30,
31]. It is necessary to establish a common platform of views. The study by Pelly et al. [
32] identified as many as 21 concepts resulting from the views of patients and medical specialists in the area of preventive medicine. They covered five categories:
Trust, information reliability, and security;
Expected features, tailored feedback, and personalized advice;
Adoption, usability, and general interest in artificial intelligence;
Concerns and previous negative experiences with artificial intelligence;
Perceived benefits and the usefulness of artificial intelligence in providing advice when regular contact with healthcare is not possible.
Positive public perceptions of preventive medicine systems may help increase the likelihood of successful implementation and the adoption of AI-enabled systems in the context of heart attack, as an example of broader applications in the management of chronic diseases.
4.1. Limitations
The number and severity of the aforementioned limitations will change as research progresses and further problems encountered in the daily operation of AI-based preventive medicine systems, including those dedicated to cardiovascular conditions, are solved. The limitations observed today are summarized in
Table 6.
Continued research, collaboration between computer scientists and medical specialists, and a commitment to validating predictive models in real-world settings will help overcome the above limitations.
In the technology area, existing prediction models based on multilayer perceptron with extreme gradient boosting provided better predictions than models based on logistic regression when analyzing diverse datasets, with different algorithm classification lists. Hence, a beneficial trend may be the hybridization of preventive medicine systems based on the use of more than one algorithm and then the aggregation of their results, e.g., based on a separate algorithm with trend extraction (based on a fuzzy system and/or multifractal analysis) [
33].
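The idea of combining several algorithms and aggregating their outputs can be sketched with a soft-voting ensemble, which averages the predicted class probabilities of its members. This is a deliberately simple stand-in for the aggregation step: it does not implement the fuzzy or multifractal trend extraction mentioned above. The choice of member models and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the study data).
X, y = make_classification(n_samples=400, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Soft voting averages the members' predicted probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("log_reg", LogisticRegression(max_iter=1000)),
        ("gboost", GradientBoostingClassifier(random_state=0)),
        ("forest", RandomForestClassifier(random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

proba = ensemble.predict_proba(X_test)  # columns: P(class 0), P(class 1)
risk = proba[:, 1]                      # aggregated probability of class 1
accuracy = ensemble.score(X_test, y_test)
print(f"ensemble accuracy = {accuracy:.4f}")
```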
The problem is much broader. Current cardiovascular mortality rates vary significantly across countries, and systems of care play a major role in determining outcomes also through shortcomings in pre-hospital and in-hospital systems and improvements in the quality of care. The time to first medical contact is critical here. It can be shortened by the preventive medicine system that we have described and by improving patients’ awareness of the importance of individual symptoms and the need to call emergency medical services [
34,
35]. However, this requires the coordination of preventive medicine systems with other systems used in healthcare. The algorithmic (thanks to AI) identification of patients at risk of heart attack or even with a high-risk heart attack must be synchronized with the transmission of pre-hospital electrocardiograms and appropriate referral of such patients to hospitals, e.g., for percutaneous coronary intervention [
35]. Optimizing primary and secondary prevention, including patient adherence to medications, also requires data collection to facilitate the auditing and assessment of clinical improvement, but also to facilitate the detection of non-adherent patients through prediction systems that produce unreliable outcomes (e.g., variability over time, jumps or gaps in data, etc.) [
36,
37].
4.2. Directions for Further Research
Potential areas of further interest and progress that can be explored in the future are included in
Table 7.
When thinking about the future of such preventive medicine systems, it is worth noting that these are long-term projects spanning decades or even centuries. The collection and comparison of subsequent predictions with the patient’s results should be repeated no fewer than three times (to obtain a prediction), and ultimately cyclically throughout the patient’s life, yielding increasingly accurate, personalized predictions and thereby increasing the efficiency of the system.
From the point of view of the entire system, collecting data from the patient’s entire life (not only the history of their illnesses, but also, e.g., environmental factors) will make it possible to take into account a number of new factors, the existence of which we may not be aware of today, i.e., personal (mental hygiene), local (environmental pollution), and computational (virtual indicators of quality of life or mental health) factors.
A long-term approach can create a system for counteracting lifestyle diseases, and in particular, the excessive occurrence of cardiovascular diseases in the population. However, to fully achieve this, such a system of preventive medicine must be technologically appropriately developed and, moreover, harmonize with the entire healthcare system in several areas simultaneously:
Legally and ethically (i.e., the universal acceptance of preventive medicine);
In terms of systematicity and the scope of periodic examinations of healthy and sick people;
Automatic availability of data collected for analysis;
The scope of notification and alerting about deviations from the norm.
This is already a part of periodic examination programs related to the profession or sport practiced, as well as age (children and elderly people).
It is worth noting that in this case, possible delays in transmission and processing are not critical parameters: even delays of the order of seconds will not significantly reduce the system’s efficiency. The strength of the system lies in the quantity and quality of collected data and the individual approach to the user throughout the history of their examinations. For this reason, there is a growing chance of wider use of wearable, i-wear, and smart home devices in preventive medicine systems, including those connected occasionally (such as today’s blood pressure meter or thermometer). This makes transferring the processing part of the system to the cloud increasingly feasible and relieves the burden on the traditional system based on servers located throughout the country.
By reducing the time it takes for preventive medicine tools to process biomedical data (including as part of the initial feature extraction by edge computing sensors), it is possible not only to improve diagnosis, help more patients, and speed up research, but also to improve solutions hitherto only considered to be promising, such as brain–computer interfaces (BCIs) based on EEG and fNIRS [
38,
39,
40,
41,
42,
43].
Our study partly meets the requirements for AI-based tools recently discussed by the American Heart Association (AHA), in which cardiac CT scans of patients with chest pain are used to predict the risk of a fatal heart attack within 10 years. This technology could transform the diagnosis and treatment of heart disease in the future, saving thousands of patients’ lives. Our study aims to develop a simpler, less invasive, and less expensive system suitable for mass use, including in cardiac home care [44].