The proposed system primarily integrates machine learning (ML) algorithms to identify risk factors for heart conditions and support treatment decisions, aiming to enhance healthcare outcomes through data-driven insights. It builds upon an existing foundation that includes Attribute-Based Encryption (ABE) and blockchain technology for securing electronic health records (EHRs) [12]. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is a type of ABE that ensures secure EHRs by restricting access based on defined attributes, preventing unauthorized tampering [13]. These technologies ensure authorized access and protect against unauthorized modifications, while core functionalities such as user registration, doctor and patient login, and EHR management are designed to maintain data integrity, privacy, and accountability in healthcare practices. This approach underscores the system’s commitment to leveraging advanced ML analytics to optimize clinical workflows and improve patient care.
3.1. Combination of Blockchain and Machine Learning for Privacy Prediction
The proposed technique runs machine learning models in combination with encryption and de-identification of patient data. Electronic health records are deployed on the blockchain and encrypted following the combined blockchain and machine learning approach for privacy-preserving prediction [14]. In this design, only smart contracts generate and distribute the encryption key that authorizes access to patient data.
The design provides two ways for the machine learning method to work with sensitive data: (1) the data are anonymized before uploading; (2) a two-tier procedure enables machine learning without directly exposing sensitive data. The anonymized datasets, which are kept in the blockchain’s off-chain storage and accessed through secure APIs managed by smart contracts, are used to train or run the machine learning model [15].
3.1.1. Machine Learning Algorithms
The proposed system utilizes several ML algorithms to predict the risk factors associated with heart conditions. These include the following:
Random Forest: A versatile and robust ensemble learning method that constructs multiple decision trees during training and outputs the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random Forest is known for its high accuracy, ability to handle large datasets with higher dimensionality, and robustness against overfitting.
Support Vector Machine (SVM): A powerful supervised learning model used for classification and regression tasks. SVM works by finding the hyperplane that best divides a dataset into classes. It is effective in high-dimensional spaces and is particularly useful for cases where the number of dimensions exceeds the number of samples.
Naive Bayes (Gaussian): A probabilistic classifier based on applying Bayes’ theorem with strong (naive) independence assumptions. The Gaussian Naive Bayes variant assumes that the continuous values associated with each feature are distributed according to a Gaussian (normal) distribution. It is particularly suited for high-dimensional datasets and provides a baseline performance in many applications.
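The aggregation step described for Random Forest (mode of the tree votes for classification, mean of the tree outputs for regression) can be sketched as follows. This is a minimal illustration only: the function names `forest_classify` and `forest_regress` and the vote values are hypothetical, and the per-tree predictions are assumed to be already available.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_votes):
    """Return the most common class label among the per-tree votes."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Return the mean of the per-tree numeric predictions."""
    return mean(tree_outputs)


# Hypothetical votes from five trees (1 = disease, 0 = no disease).
votes = [1, 0, 1, 1, 0]
predicted_class = forest_classify(votes)

# Hypothetical numeric outputs from three regression trees.
predicted_value = forest_regress([0.8, 0.6, 0.7])
```

With an even number of trees a tie is possible; `Counter.most_common` then returns the label that was encountered first, so a real implementation would need an explicit tie-breaking rule.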
3.1.2. Implementation and Impact
The integration of these ML algorithms enables the system to analyze historical health data, identify potential risk factors for heart conditions, and assist medical professionals in making data-driven treatment decisions. Using a dataset of patient health records, the system predicts the likelihood of heart conditions, which helps doctors understand the underlying issues, reach a diagnosis, and prescribe appropriate medication.
3.1.3. System Architecture
Figure 2 depicts a system that uses blockchain for secure EHR storage and machine learning algorithms to analyze patient data and predict the risk of heart conditions.
3.1.4. Security Analysis
The integrated electronic health records (EHRs) and blockchain system ensures confidentiality, integrity, and availability of sensitive health data. Key to the approach is Attribute-Based Encryption (ABE), specifically Ciphertext-Policy ABE (CP-ABE).
Confidentiality: CP-ABE provides fine-grained access control over EHR data, allowing for access based on attributes such as user roles and patient demographics. This ensures that only authorized users can decrypt and access specific EHR information.
Integrity: Blockchain technology ensures data integrity by maintaining an immutable ledger of all EHR transactions. This guarantees that EHR data cannot be altered without detection, preserving its authenticity and reliability.
Availability: The system leverages blockchain’s decentralized architecture and redundant data storage to ensure continuous access to EHRs, even during network disruptions or node failures.
3.1.5. Experimental Results
The methodology encompasses data preprocessing, machine learning model training, blockchain integration, user authentication, and EHR management.
Data Preprocessing: The Cleveland, Switzerland, and Long Beach datasets are preprocessed to standardize features such as age, blood pressure, cholesterol levels, maximum heart rate, and serum blood sugar. This enhances data consistency and comparability for accurate model training. Data cleaning is performed to handle missing values and outliers, ensuring the quality of the dataset. Feature scaling techniques, such as normalization and standardization, are applied to bring all features onto a common scale, which is crucial for the performance of machine learning algorithms.
Model Training: Machine learning algorithms (Random Forest, Support Vector Machine, Naïve Bayes) predict heart disease risk factors based on standardized datasets. Performance metrics such as accuracy, precision, recall, and F1 score validate model efficacy in risk assessment.
Random Forest: Random Forest is an ensemble learning method that constructs multiple decision trees during training. It aggregates the predictions of individual trees to improve accuracy and reduce overfitting. This model is robust for both classification and regression tasks. It is shown in Figure 3.
The Random Forest model is trained on the preprocessed heart disease dataset. It learns the relationship between the input features (e.g., age, gender, clinical indicators) and the target variable, so that heart disease risk can be predicted from patients’ digital records.
Support Vector Machine (SVM): SVM can be used for both classification and regression. It is efficient in high-dimensional feature spaces and handles nonlinear classification through kernel functions. This is shown in Figure 4. In this case, the SVM classifier is trained using the preprocessed heart disease dataset. It learns to identify an optimal hyperplane that separates different classes of heart disease severity based on features like age, gender, and clinical indicators, aiding in the prediction of heart disease risk.
Naïve Bayes (Gaussian NB): Naïve Bayes is a probabilistic classifier based on Bayes’ theorem, assuming independence among features. Gaussian Naïve Bayes specifically models the distribution of features as Gaussian. It is simple, computationally efficient, and effective for various classification tasks. This is shown in Figure 5.
In this case, Gaussian Naïve Bayes is trained using the preprocessed heart disease dataset. It estimates the mean and variance of each feature for each class of heart disease severity, allowing it to predict the likelihood of new data points belonging to different classes, thereby assisting in heart disease risk assessment.
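The estimation step described above (a mean and a variance per feature and per class, then a likelihood comparison for a new data point) can be illustrated with a small from-scratch sketch. It is not the implementation used in this work: the function names, the variance floor `var_floor`, and the toy values for age and maximum heart rate are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor avoids division by zero for constant features (assumption).
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one sample."""

    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    return max(model, key=lambda label: log_posterior(*model[label]))


# Toy samples, illustrative only: [age, maximum heart rate].
X = [[63, 120], [67, 110], [58, 125], [41, 172], [37, 180], [45, 165]]
y = [1, 1, 1, 0, 0, 0]  # 1 = disease, 0 = no disease
model = fit_gaussian_nb(X, y)
```

Working in log space keeps the product of many small densities from underflowing, which matters once all 14 attributes are included.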
Blockchain Integration: Blockchain technology makes EHR transactions transparent and verifiable for every party involved. Each time a new record is added to an EHR, the transaction is hashed with a one-way function and stored in a new block linked to the previous one, forming an immutable ledger. Schemes such as CP-ABE are implemented for EHR security on the blockchain. ABE provides access control by encrypting each piece of data together with a specific access policy, so that only users whose attributes satisfy the policy can decrypt and access the data. The Ethereum blockchain is used to register users securely, create new EHRs, enforce access control, and trace the activities performed. In addition to data security, the transparency of EHR transactions is guaranteed because blockchain transactions can be neither reversed nor altered, which is a significant advantage of the system. Moreover, the system allows decentralized storage and management of EHRs, so the data remain accessible and resistant to corruption. Every action is recorded on the blockchain, creating an audit trail that cannot be changed.
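The hash-linking idea behind the ledger can be shown with a minimal sketch. It assumes SHA-256 as the one-way function and a simplified block layout (`index`, `record`, `prev_hash`); both are illustrative choices, and the sketch does not model Ethereum, consensus, or smart contracts.

```python
import hashlib
import json


def block_hash(block):
    """Hash a block's full contents with SHA-256 (assumed one-way function)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, record):
    """Append a block that stores the record and the previous block's hash."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "record": record, "prev_hash": prev_hash})


def chain_is_valid(chain):
    """Check that every block still points at the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


ledger = []
append_block(ledger, "encrypted EHR entry 1")
append_block(ledger, "encrypted EHR entry 2")
append_block(ledger, "encrypted EHR entry 3")
```

Changing the record in any earlier block changes its hash, so the link stored in the following block no longer matches and `chain_is_valid` returns `False`. The sketch only checks the links; on a real network the most recent block is protected by consensus among nodes rather than by a successor.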
User Authentication: Strong authentication methods verify users before granting access, keeping patient information confidential. User authentication and session handling are carried out by the Django framework, which provides a secure basis for interaction with the system. Role-based access control (RBAC) restricts each user to the data relevant to his or her role. CP-ABE is used to implement fine-grained access control policies, which guarantee that only authorized entities can decrypt and access sensitive health data.
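The two access-control layers can be illustrated with a short sketch of the decision logic only. No cryptography is involved here: in real CP-ABE the policy is enforced inside decryption rather than by a separate check. The role names, permission names, attribute strings, and the nested-tuple policy format are all hypothetical.

```python
# Hypothetical role-to-permission mapping for the RBAC layer.
ROLE_PERMISSIONS = {
    "doctor": {"read_ehr", "create_ehr", "update_ehr"},
    "patient": {"read_ehr"},
}


def role_allows(role, action):
    """RBAC check: is the action among the permissions of the role?"""
    return action in ROLE_PERMISSIONS.get(role, set())


def satisfies_policy(policy, attributes):
    """Evaluate an attribute policy.

    A policy is either an attribute string or a tuple
    ("AND" | "OR", sub_policy, sub_policy, ...).
    """
    if isinstance(policy, str):
        return policy in attributes
    op, *clauses = policy
    results = (satisfies_policy(clause, attributes) for clause in clauses)
    return all(results) if op == "AND" else any(results)


# Hypothetical policy attached to a ciphertext.
policy = ("AND", "role:doctor", ("OR", "dept:cardiology", "dept:emergency"))
cardiologist = {"role:doctor", "dept:cardiology"}
radiologist = {"role:doctor", "dept:radiology"}
```

Here `satisfies_policy(policy, cardiologist)` evaluates to `True` and `satisfies_policy(policy, radiologist)` to `False`, mirroring which attribute sets would be able to decrypt under such a policy.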
EHR Management: This part covers how EHRs are managed within the system. Doctors can create new patient records, update existing ones, and review stored data, all in an environment secured by blockchain and encryption. All medical records are kept on the blockchain, and doctors cannot access them unless the patient grants permission. Every piece of medical information on the blockchain is encrypted with CP-ABE, so that only authorized users with the required attributes can obtain the key and read the data. In other words, the records entered by a doctor are encrypted and stored on the blockchain as a new block, so they remain unaltered. Patients can access their own EHRs securely, while any authorized healthcare professional can access the stored EHR data for treatment purposes, decrypt it if necessary, and then proceed with patient care.
3.1.6. Dataset Description and Preprocessing Details
We used three benchmark heart disease datasets (Cleveland, Switzerland, and Long Beach) obtained from the UCI Machine Learning Repository. The integrated dataset comprises 920 patient records; each record contains 14 clinical attributes, including age, gender, resting blood pressure, cholesterol level, fasting blood sugar, resting ECG results, maximum heart rate, exercise-induced angina, and ST depression. The target variable is the presence or absence of heart disease, encoded as a binary class (1 = disease, 0 = no disease). The data exhibit a slight class imbalance, with 54% positive cases and 46% negative cases. Data cleaning was performed to remove incomplete or inconsistent entries. Missing values (less than 3% of all records) are imputed with the mean for numerical features and the median for categorical features. Outliers are identified using the interquartile range (IQR) method, with threshold values applied to prevent skewed distributions.
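Mean imputation and IQR-based outlier detection for a numerical feature can be sketched as below. The fence multiplier `k=1.5` is the conventional choice and is an assumption here, since the thresholds actually used are not stated; the quartile method is the default of Python's `statistics.quantiles`, and the cholesterol values are made up for illustration.

```python
from statistics import mean, quantiles


def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return the lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]


# Illustrative cholesterol readings with one missing value and one extreme value.
cholesterol = [210, 230, None, 250, 240, 220, 600]
filled = impute_with_mean(cholesterol)
outliers = flag_outliers(filled)
```

In this toy example the extreme reading of 600 is the only flagged value. Because the mean is sensitive to such extremes, the order of imputation and outlier handling affects the fill value.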
3.1.7. Blockchain System Performance
On the Ethereum test network, we measured transaction latency, throughput, scalability, and operational cost to evaluate the effectiveness of the blockchain component. An encrypted electronic health record (EHR) can be retrieved with a minimum average transaction delay of 2.3 s, providing near-real-time updates without sacrificing security. Throughput was moderate at 25 transactions per second, which is sufficient for a private network setting, and the average gas cost per transaction was ETH 0.00042, which is somewhat expensive. The complete framework was tested using 10,000 simulated patient records, and the results showed stable performance with little loss in transaction speed. These results illustrate the effectiveness, affordability, and scalability of the blockchain layer.
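As a rough back-of-the-envelope reading of these figures, the sketch below scales the reported per-transaction cost and throughput to the 10,000-record workload. It assumes exactly one transaction per record and a sustained rate of 25 transactions per second; neither assumption is stated in the measurements, so the outputs are illustrative estimates rather than reported results.

```python
AVG_GAS_COST_ETH = 0.00042  # reported average cost per transaction
THROUGHPUT_TPS = 25         # reported throughput
NUM_RECORDS = 10_000        # reported size of the simulated workload

# Assumption: one transaction per record.
total_cost_eth = NUM_RECORDS * AVG_GAS_COST_ETH

# Assumption: throughput is sustained for the whole workload.
min_duration_s = NUM_RECORDS / THROUGHPUT_TPS
```

Under these assumptions the workload costs about ETH 4.2 in total and takes at least 400 s to submit.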
3.1.8. System Evaluation
To identify the most common causes of heart conditions from patients’ digital data, this study integrates machine learning models such as RF, SVM, and Naive Bayes. Blockchain provides the decentralized framework for private digital health data and preserves the confidentiality and integrity of the records.
Prediction quality was assessed with four main evaluation metrics: accuracy, precision, recall, and F1 score. These metrics indicate how trustworthy the models are and how well they generalize to new data.
Moreover, a confusion matrix was used to offer an in-depth understanding of the performance of the models. It summarizes the prediction results with the following counts:
True positives (TP): positive cases correctly predicted as positive.
True negatives (TN): negative cases correctly predicted as negative.
False positives (FP): negative cases incorrectly predicted as positive.
False negatives (FN): positive cases incorrectly predicted as negative.
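Step 1: Accuracy:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Step 2: Precision [2]:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Step 3: Recall [3]:
$$\text{Recall} = \frac{TP}{TP + FN}$$
Step 4: F1 Score:
$$\text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$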
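The four metrics can be computed directly from the confusion-matrix counts TP, TN, FP, and FN, as in the following sketch. The function name `classification_metrics` and the example counts are hypothetical and are not results from this study; returning 0.0 when a denominator is zero is a convention chosen for the sketch.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (convention assumed here: return 0.0).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

A model that makes no errors (FP = FN = 0) scores 1 on all four metrics.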
The EHR management portal for hospital operations is built with the Django framework. The portal allows doctors and patients to manage electronic health records (EHRs) conveniently and integrates user-friendly authentication and access control mechanisms. For security, user accounts are created securely through blockchain technology, and EHRs are stored via smart contracts on Ethereum so that they are tamper-proof and transparent.
This study compares machine learning models for the accurate detection of heart disease. RF outperformed the other algorithms on accuracy, precision, recall, and F1 score. It scored 1 on all four metrics, meaning that it correctly identified the positive cases with essentially no false positives or false negatives. SVM performed nearly as well as Random Forest, with slightly lower precision and recall. Naive Bayes performed worst of the three, with the lowest scores for predicting heart disease risk factors.
Numerical features are standardized with the z-score so that they contribute equally to model training, and categorical variables are encoded with one-hot encoding. The dataset was divided into an 80% training subset and a 20% testing subset.
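These three preprocessing steps can be sketched with the standard library as follows. The helper names, the use of the population standard deviation, the fixed random seed, and the example values are assumptions for illustration; a shuffled split is assumed because the splitting procedure is not specified.

```python
import random
from statistics import mean, pstdev


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def one_hot(values):
    """Encode each categorical value as a 0/1 vector over the sorted categories."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and split them into training and testing subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


ages = [29, 45, 54, 61, 77]                              # illustrative values
scaled_ages = z_score(ages)
encoded = one_hot(["typical", "atypical", "typical"])    # hypothetical categories
train_rows, test_rows = train_test_split(list(range(920)))  # 920 record indices
```

With 920 records, an 80/20 split yields 736 training and 184 testing records. In practice the mean and standard deviation would be estimated on the training subset only and then applied to the test subset, to avoid leaking test information into training.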