A Novel Approach for Fraud Detection in Blockchain-Based Healthcare Networks Using Machine Learning

Mohammed, Mohammed A.; Boujelben, Manel; Abid, Mohamed

doi:10.3390/fi15080250



by Mohammed A. Mohammed ¹,*, Manel Boujelben ² and Mohamed Abid ¹

¹ Computer & Embedded Systems Laboratory CES-ENIS, University of Sfax, Sfax 3000, Tunisia
² National School of Electronics and Telecoms of Sfax ENET’Com, University of Sfax, Sfax 3000, Tunisia
* Author to whom correspondence should be addressed.

Future Internet 2023, 15(8), 250; https://doi.org/10.3390/fi15080250

Submission received: 1 July 2023 / Revised: 16 July 2023 / Accepted: 21 July 2023 / Published: 26 July 2023

(This article belongs to the Section Cybersecurity)


Abstract:

Recently, the advent of blockchain (BC) has sparked a digital revolution in different fields, such as finance, healthcare, and supply chain. It is used by smart healthcare systems to provide transparency and control for personal medical records. However, BC and healthcare integration still faces many challenges, such as storing patient data and privacy and security issues. In the context of security, new attacks target different parts of the BC network, such as nodes, consensus algorithms, Smart Contracts (SCs), and wallets. Fraudulent data insertion can have serious consequences for the integrity and reliability of the BC, as it can compromise the trustworthiness of the information stored on it and lead to incorrect or misleading transactions. Detecting and preventing fraudulent data insertion is crucial for maintaining the credibility of the BC as a secure and transparent system for recording and verifying transactions. SCs control the transfer of assets, which is why they may be subject to several adversarial attacks. Therefore, many approaches have been proposed to detect vulnerabilities and attacks in SCs, such as utilizing programming tools. However, these proposals are inadequate against newly emerging vulnerabilities and attacks. Artificial Intelligence technology is robust in analyzing and detecting new attacks in every part of the BC network. Therefore, this article proposes a system architecture for detecting fraudulent transactions and attacks in the BC network based on Machine Learning (ML). It is composed of two stages: (1) using ML to check medical data from sensors and block abnormal data from entering the blockchain network; (2) using the same ML algorithms to check transactions in the blockchain, storing normal transactions and marking abnormal ones as novel attacks in the attacks database. To build our system, we utilized two datasets and six machine learning algorithms (Logistic Regression, Decision Tree, KNN, Naive Bayes, SVM, and Random Forest). The results demonstrate that the Random Forest algorithm outperformed the others, achieving the best results for accuracy, execution time, and scalability. It was therefore considered the best of the evaluated algorithms for tackling the research problem. Moreover, the security analysis of the proposed system shows its robustness against several attacks that threaten the functioning of blockchain-based healthcare applications.

Keywords:

healthcare; blockchain; SC; machine learning; security

1. Introduction

Recently, digital transformation has become more than just a world trend. It is a revolution that is changing the way we live and work. This transformation uses AI, cloud computing, and blockchain to create smart systems based on traditional systems in different fields such as education, healthcare, and industry [1].

Smart healthcare is the use of digital technologies to improve the delivery of healthcare services. It comprises patient remote monitoring and the gathering, storing, and analysis of patient data using electronic health records, mobile internet, wearable technology, and other linked health technologies. This data can then be used to improve the diagnosis, treatment, and prevention of diseases. Therefore, patients will benefit from a better quality of services in the health field [2,3].

Despite the benefits of smart healthcare, it also has many challenges that need viable solutions. These challenges include quality of service (QoS), standard protocol support, delay and bandwidth limitations, and security and privacy. Smart healthcare system devices are vulnerable to many attacks, such as spoofing, cloud polling, RF jamming, and denial of service (DoS). Additionally, transmitted data pass through highly heterogeneous networks and are frequently managed by third parties. These issues can also affect healthcare systems and patient safety [4]. As electronic health records contain sensitive and important information, they are susceptible to various security risks. If these data are compromised, the consequences for patient safety could be serious. For example, attackers could use the data to impersonate patients, gain access to their medical records, or even tamper with their treatment, thereby leading to potential medical accidents [5]. In 2017, hackers stole the personal data of over 15 million patients from Anthem, a large health insurance company. These data included names, addresses, social security numbers, and medical information. The attack was a major breach of privacy with serious consequences for the affected patients. In addition, the financial losses amounted to $115 million.

Blockchain technology has been deployed in the healthcare environment to address security and privacy challenges. Blockchain is a distributed, decentralized, efficient, and secure technology. It is an ordered chain of blocks, meaning each block is connected to the one before it. In smart healthcare systems, blockchain ensures immutability, transparency, and control of personal medical records for all participants. It also guarantees the anonymity of patient medical data. Therefore, it can motivate patients to share their data for various ongoing clinical studies [6].

Certainly, applying blockchain in the healthcare field provides a solution for the security and privacy of personal records; however, blockchain and healthcare integration still faces several challenges. These include storing patient data, high energy consumption, and attacks on blockchain networks. Blockchain suffers from several attacks that target the network of nodes, the consensus algorithm, the SCs, and the wallets [7]. Different types of attacks have been detected, such as DDoS attacks, 51% attacks, and Sybil attacks. The presence of attacks and vulnerabilities in blockchain applications causes financial losses and exploitation of sensitive data, and may disrupt the medical services these applications provide.

An SC, one of the main components of a BC network, is a program that handles and transfers assets of significant value. Once an SC is deployed to the blockchain, it cannot be modified or updated. Security patches, a common remedy for traditional programs, are therefore not a viable solution. This has motivated scholars to propose robust security strategies that are applied before deployment. Despite these strategies, adversaries can still tamper with SC execution. Therefore, the functioning of SCs after deployment needs to be monitored.

Many solutions are proposed in the literature to detect SC attacks. They are based either on SC auditing tools or on AI technology. SC auditing tools, such as Mythril, Securify, and Slither, perform static and dynamic analyses of SC codes to identify vulnerabilities [8]. Nevertheless, these strategies are quite weak at finding vulnerabilities in SC code. There are many reasons for these weaknesses, including the fact that no single auditing tool can find all the vulnerabilities in an SC. Additionally, the complexity of SCs makes it difficult for auditing tools to fully understand the code and identify all potential vulnerabilities. They can only identify vulnerabilities that are already known. Moreover, producing incorrect results, finding different problems in the same set of SCs, or identifying a vulnerability that is not actually present in the code means wasting time and resources [9]. As the number of SCs grows and the complexity of these contracts increases, using these SC auditing tools will become increasingly impractical.

AI and ML offer a promising solution to this challenge. ML models can be used to protect the blockchain from malicious activities and attacks. These models can be trained on large datasets of known vulnerabilities and then used to identify patterns and anomalies in SC code. In this context, ML can leverage vast volumes of data to create more robust and accurate models for vulnerability detection. This is because machine learning models can learn from the data, and adapt their predictions as new data becomes available. Moreover, machine learning can also be used to protect against attacks, as the ML model could be used to monitor the behavior of SCs in real-time. If the model detects any suspicious activity, it could trigger an alarm or take other steps to protect the contract from attack [10]. ML algorithms are categorized into three types:

I. Supervised Learning: learns from labeled examples in order to sort new data into key groups/categories;
II. Unsupervised Learning: works on data sets that have no particular labels and recognizes patterns in the data, for example by clustering;
III. Reinforcement Learning: the key characteristic of this type is collecting and enhancing knowledge by interacting with an external environment, which assigns a penalty or reward according to the action taken [9].

ML can solve problems using different methods, such as clustering, classification, and regression. Classification methods are preferred to manage security and privacy problems in blockchain networks.

This paper addresses security and privacy issues in a blockchain-based smart healthcare system, motivated by the sensitivity of medical data and its vulnerability to cyber-attacks. We propose a system architecture capable of detecting attacks and fraudulent transactions in blockchain networks based on ML algorithms. This system is composed of two stages. The first stage checks the medical data gathered by sensors before inserting it into the BC, so that abnormal data are blocked from being processed by the BC nodes. The second stage verifies the legitimacy of transactions and eliminates suspicious ones. The proposed system is implemented using six different ML algorithms. A performance evaluation of these algorithms with respect to many metrics is conducted to choose the best solution for tackling BC security challenges. We demonstrate that the proposed ML-based system performs well in terms of accuracy, recall, F1-score, precision, execution time, ROC-AUC, and scalability, suggesting its effectiveness in detecting and preventing fraud within the blockchain network. According to the results of experiments conducted on two datasets, after pre-processing and feature selection, Random Forest achieved the best results for accuracy, execution time, and scalability. It was therefore considered the best of the evaluated algorithms for tackling the research problem. After a security analysis phase, we demonstrate that the proposed system is able to enhance the security and reliability of blockchain-based healthcare applications and networks by detecting and preventing several known attacks.

The rest of the paper is organized as follows. In Section 2, a brief review of related works on detecting intrusion and malicious activities in blockchain networks is presented. Section 3 details the proposed system, which can detect and protect the blockchain from attacks and fraudulent transactions. Section 4 evaluates the proposed system using various ML algorithms and with respect to many metrics. Section 5 analyses the security of the proposed system. Finally, the conclusion and some future work directions are drawn in Section 6.

2. Related Works

This section reviews existing works related to blockchain security issues. Many solutions have been proposed to address security issues in blockchain networks, such as SC auditing tools and machine learning. Table 1 summarizes the related work in terms of main contribution, pros and cons, ML algorithm, dataset, and results.

Regarding SC auditing tools, Mohamed et al. [11] present an innovative approach to detecting malicious SCs in blockchain systems. Their DOORchain framework has the potential to improve the security of blockchain systems and protect users from malicious behavior. Haozhe et al. [12] explored the current state of SC security, prevalent vulnerabilities, and security-analysis tool support. The authors studied thirteen crucial vulnerabilities in Ethereum SCs and their countermeasures, and investigated nine security-analysis tools for detecting such vulnerabilities. Han Liu et al. [13] proposed a novel semantic-aware security auditing technique for Ethereum named S-gram. It combines lightweight static semantic labeling with N-gram language modeling. S-gram can predict potential vulnerabilities by identifying irregular token sequences and can optimize existing in-depth analyzers. Sarwar et al. [8] provide an overview of attacks on SCs, analyze 10 security tools for detecting vulnerabilities in SCs, and propose a set of countermeasures to mitigate these attacks. Christof et al. [14] introduce ÆGIS, a dynamic analysis tool that protects SCs from being exploited during runtime. Its capability of detecting new vulnerabilities can easily be extended through so-called attack patterns.

Regarding machine learning technology, Yasser et al. [15] proposed a KNN-MLSC approach to secure authentication using ML. In their design, the authors used the K-Nearest Neighbor (KNN) algorithm together with an SC to detect dynamic time attacks and perform authentication in an Internet of Medical Things (IoMT) environment. Syed Badruddoja et al. [16] developed a system based on the Naive Bayes algorithm to perform prediction and protect SCs from cyber-attacks. The system has been implemented in the field of decentralized applications. It blends blockchain and AI technologies to provide a more secure environment for exchanging sensitive data in many applications, such as healthcare and insurance.

Deebak et al. [17] designed a framework for privacy preservation in SCs using a blend of blockchain and AI technologies. The proposed system simplifies human interaction, identifies security risks and service alerts, and detects fraudulent claims. Yingjie Xu et al. [18] presented a machine learning model for analyzing the vulnerabilities of SCs in blockchain networks. The proposed model builds Abstract Syntax Trees (ASTs) for SCs. Then, it extracts feature vectors from the ASTs and uses them to train the ML model. The proposed model can detect many kinds of vulnerabilities in SCs written in the Solidity language on the Ethereum blockchain platform. Kruthika et al. [19] suggested a new architecture for creating a tamperproof and transparent healthcare system using Ethereum SCs. The proposed system can ensure the integrity of sensitive patient data. It also presents a solution for detecting and eliminating fraudulent insurance claims using blockchain with ML. Wesley et al. [20] present a new sequential learning model that identifies weaknesses of SCs using a long short-term memory (LSTM) network. The proposed model can detect new attack trends relatively quickly, making SCs safer.

Rajesh et al. [21] surveyed security vulnerabilities in SC software code, which a malicious user can exploit, thereby compromising the entire blockchain network. Additionally, the authors explored various AI techniques and tools for SC privacy protection. Lastly, they analyzed open issues and challenges for AI-based SCs. Soumya Ray et al. [22] suggested a novel algorithm for early detection of DDoS attacks in the healthcare system. The DDoS detection algorithm works to prevent attackers’ access to the system. Additionally, the authors illustrated the effect of different DDoS attacks on the system.

Table 1. Summary of related works.

Author and Ref.	Year	Main Idea	Pros	Cons	ML Algorithm	Dataset	Result
Sarwar et al. [8]	2016	Proposes a set of countermeasures to mitigate SCs attacks	Analyzes 10 security tools to detect vulnerabilities in SCs	More research and evaluation are needed for the proposed countermeasures	No	No	Not all vulnerabilities were detected
Haozhe et al. [12]	2022	Analysis of 13 vulnerabilities in Ethereum SC	Investigates security tools	Detects pre-defined vulnerabilities	No	No	Smart Check is the best
Han Liu et al. [13]	2018	Proposes S-gram semantic-aware security auditing technique	Investigates different types of potential vulnerabilities	Used only with solidity SCs	No	SCs from the Etherscan repository	Accuracy 90%
Christof et al. [14]	2020	ÆGIS tool that protects and detects new vulnerabilities in SCs	Shields vulnerable SCs against attacks	Limited evaluation and lack of real-world testing	No	No	Accuracy 93%
Mohamed et al. [11]	2019	The DOORchain framework to improve the security of blockchain systems	Detects malicious SCs	Does not discuss the potential challenges of implementation	CNN	Etherscan platform, (SWC) registry, (SCSA)	Accuracy over 90%
Yasser et al. [15]	2022	Secure authentication approach	Improves security and Reduces latency	Less transmission rate	KNN	Transmitting request medical data	Accuracy 96%
Syed et al. [16]	2021	A system based on AI algorithms to protect SCs	Predicts and protects SCs from cyber-attacks	Needs to be implemented in real and complex environments	Naive Bayes	Iris flowers, Pima Diabetes, and heart disease	Accuracy 94%, 87%, 62%
Deebak et al. [17]	2021	Privacy-preserving in SCs	Risk assessment	Less accuracy	Decision-tree, Naive Bayes, KNN	No	Accuracy 79%
Yingjie et al. [18]	2021	Analyzes the vulnerabilities of SCs based on ML	Detects many kinds of SC vulnerabilities	Cannot locate the line of code in the SC where the vulnerability occurs	KNN	SmartBugs, SolidiFI-benchmark, SmartBugs-wild	Accuracy 91%
Kruthika et al. [19]	2021	A tamperproof and transparent healthcare system	Eliminates fraudulent insurance claims	Less accuracy	KNN Random-Forest	No	Accuracy 80%
Wesley et al. [20]	2019	Model to identify weaknesses of SCs using ML	Detects new attack trends relatively quickly	Cannot provide additional insights for analysis	long-short term memory (LSTM)	Ethereum blockchain dataset	Accuracy 92%
Rajesh et al. [21]	2020	Investigates various tools and AI techniques for SC privacy protection	Computational Intelligence to create robust cipher hashes	Not implemented to evaluate the performance parameters	No	No	No
Soumya Ray et al. [22]	2022	A novel algorithm to detect DDoS attacks in the healthcare system	Works to prevent the access of attackers to the system	Detects one type of DDoS attack	No	No	Prevents the DDoS attack

The related works use solutions based either on SC auditing tools (the first four rows of Table 1) or on AI technology (the remaining rows) to secure SCs in blockchain networks. SC auditing tools are weak at finding vulnerabilities in SC code: they produce erroneous results and report different problems for the same set of SCs. Consequently, they provide a dangerously false sense of security that attackers can abuse. As for the AI-based solutions, they are evaluated with few metrics, their results remain weak, and they detect only specific types of attacks and vulnerabilities. Therefore, our proposed system consists of two security stages and uses several algorithms and datasets to enhance the results. In addition, the performance evaluation uses many metrics to ensure security and efficiency, with the aim of producing a system with a high level of accuracy, security, and scalability compared to other solutions.

3. Proposed System

This section presents the proposed system for protecting the blockchain from vulnerabilities and attacks. It is based on ML algorithms and consists of two stages, as illustrated in Figure 1. The first stage is executed outside the blockchain network to check the validity of medical data and detect abnormal data before they reach the blockchain. The second stage is implemented inside the blockchain network, specifically in the blockchain nodes, to avoid storing abnormal transactions in the blockchain. The next subsection gives an overview of the typical architecture of a blockchain-based smart healthcare system. The following subsections then present the proposed system in more detail.

3.1. Architectural Layers of BC-Based Healthcare System

A blockchain-based smart healthcare system typically consists of several layers [23], each of which serves a distinct purpose as follows:

A. Sensor layer: The sensor layer in a smart healthcare system refers to the layer of sensors and devices that collect health data from patients. This layer is critical for enabling real-time monitoring of patient health and providing personalized care;
B. Application layer: This layer contains the applications and interfaces that allow users to interact with the system. This includes applications for patients, healthcare providers, and other stakeholders;
C. Blockchain layer: The blockchain layer in a smart healthcare system refers to the layer of distributed ledger technology that underpins the system. This layer provides a secure and transparent way to store and share healthcare data, while ensuring privacy and confidentiality;
D. Access layer: In a smart healthcare system, the access layer refers to the layer of the system that provides access to various components and services of the system, such as sensors, devices, networks, databases, and applications. The access layer plays a critical role in enabling users to interact with the system, retrieve and analyze data, and control various aspects of the system.

3.2. First Stage: Classifying Data outside the BC

In the first stage, ML models are used to classify the collected data as normal or abnormal based on pre-defined parameters such as heart rate, respiration rate, and temperature. The models are trained on the human vital signs dataset [24]. If the data are classified as normal, they are passed to the blockchain network. Otherwise, they are blocked from passing to the blockchain (see Figure 2). The following steps are involved, ending with sending the classified data to the application:

1. Data collection: collect data (Electronic Health Records (EHRs)) from various sources such as wearables and medical sensors;
2. Data processing: pre-process and transform the collected data into a format suitable for analysis;
3. AI model training: train an AI model using the pre-processed data to classify the collected data into normal or abnormal based on pre-defined parameters;
4. AI model evaluation: evaluate the performance of the AI model using a test dataset to ensure its accuracy;
5. Data classification: use the trained AI model to classify the collected data into normal or abnormal;
6. Data transmission: transmit the classified data to the healthcare application, typically through an application programming interface (API) or other data integration tools.
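
As an illustration only, the gating behavior of the first stage can be sketched in Python as follows. The sketch replaces the trained ML model with fixed reference ranges so that it stays self-contained; the `VitalSigns` type, the values in `NORMAL_RANGES`, and the function names are assumptions introduced here and are not taken from the proposed system or from the dataset in [24].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VitalSigns:
    heart_rate: float        # beats per minute
    respiration_rate: float  # breaths per minute
    temperature: float       # degrees Celsius


# Hypothetical reference ranges standing in for a trained classifier.
NORMAL_RANGES = {
    "heart_rate": (60.0, 100.0),
    "respiration_rate": (12.0, 20.0),
    "temperature": (36.1, 37.2),
}


def classify(reading: VitalSigns) -> str:
    """Return "normal" if every vital sign lies inside its range, else "abnormal"."""
    for name, (low, high) in NORMAL_RANGES.items():
        if not (low <= getattr(reading, name) <= high):
            return "abnormal"
    return "normal"


def first_stage_gate(readings):
    """Split readings into those forwarded to the blockchain and those blocked."""
    forwarded, blocked = [], []
    for reading in readings:
        target = forwarded if classify(reading) == "normal" else blocked
        target.append(reading)
    return forwarded, blocked


readings = [
    VitalSigns(heart_rate=72, respiration_rate=16, temperature=36.8),
    VitalSigns(heart_rate=140, respiration_rate=28, temperature=39.5),
]
forwarded, blocked = first_stage_gate(readings)
print(len(forwarded), "forwarded;", len(blocked), "blocked")
```

In a real deployment, `classify` would call the trained model's prediction routine; only the forward-or-block decision around it is the point of the sketch.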

3.3. Second Stage: Classifying Transactions inside the BC

To train the model, the second stage uses the same ML algorithms and the Ethereum fraud detection dataset [25]. This stage checks each transaction to determine whether it is normal or malicious. If it is a normal transaction, it is stored in the blockchain. Otherwise (i.e., it is an attack), it is blocked, marked as a novel attack, and sent to the attacks database (see Figure 3). The steps start with receiving data from the application layer and end with storing the transaction in the blockchain, as follows:

7. Receive data: The blockchain node acts as a mediator between the application layer and the blockchain network. When the application layer sends data to the blockchain node, the node receives the data and verifies its validity;
8. Analyze transactions: To validate a transaction in a blockchain network, a digital signature is created using the sender’s private key, and each receiving node verifies it using the sender’s public key. The NODEID of each node in the network is compared to that of the broadcasting node to ensure authenticity, and timestamps may be used as well. Transactions that fail any of these checks are rejected and not added to the blockchain;
9. Transaction classification: it consists of two parts as follows:
   (a) Transaction classification using a marked attacks database: a catalog of known attack patterns is created to identify and classify transactions, preventing fraudulent transactions in the blockchain. Characteristics such as transaction type and timestamp are used to check transactions against the database and flag them as high-risk (blocked) or low-risk;
   (b) Transaction classification using machine learning: the proposed system automates the transaction classification process, allowing for fraud detection, transaction monitoring, and optimization of the blockchain network. It collects and processes transaction data, extracts relevant features, trains and evaluates the model, and deploys it in real time.
10. Transactions are stored in the blockchain depending on their classification as normal or abnormal. Normal transactions are added to the blockchain by nodes after validation;
11. Abnormal transactions flagged as high-risk are blocked and sent to the database of marked attacks. The database is continually updated to protect against new threats, maintaining the integrity of the blockchain system.
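
The control flow of these steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Transaction` and `Node` types are hypothetical, the signature and NODEID checks are reduced to a precomputed boolean, the attacks database is an in-memory set keyed by (sender, transaction type), and `model_flags` is a placeholder threshold rule standing in for the trained ML classifier.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transaction:
    tx_id: str
    sender: str
    tx_type: str
    value: float
    signature_valid: bool  # outcome of the signature/NODEID checks, assumed done upstream


@dataclass
class Node:
    known_attacks: set = field(default_factory=set)  # marked attacks database
    ledger: list = field(default_factory=list)       # stand-in for the blockchain

    def model_flags(self, tx: Transaction) -> bool:
        """Placeholder for the trained ML classifier (hypothetical rule)."""
        return tx.value > 1_000.0

    def process(self, tx: Transaction) -> str:
        # Step 8: reject transactions that fail the validity checks.
        if not tx.signature_valid:
            return "rejected"
        # Step 9(a): look the transaction up in the marked attacks database.
        pattern = (tx.sender, tx.tx_type)
        if pattern in self.known_attacks:
            return "blocked"
        # Step 9(b) and step 11: classify; record newly detected attacks.
        if self.model_flags(tx):
            self.known_attacks.add(pattern)
            return "blocked"
        # Step 10: store normal transactions.
        self.ledger.append(tx)
        return "stored"


node = Node()
results = [
    node.process(Transaction("t1", "alice", "record_update", 10.0, True)),
    node.process(Transaction("t2", "bob", "record_update", 10.0, False)),
    node.process(Transaction("t3", "mallory", "transfer", 5_000.0, True)),
    node.process(Transaction("t4", "mallory", "transfer", 5.0, True)),
]
print(results)
```

The last transaction is blocked by the database lookup alone, which illustrates why newly flagged patterns are written back to the attacks database.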

In summary, ML models are used in the first stage to classify the data collected from patients as normal or abnormal. The second stage checks whether transactions are normal or malicious. Normal transactions are stored in the blockchain, while abnormal (malicious) transactions are blocked and marked as new attacks. The process begins with receiving data from patients and ends with storing transactions in the blockchain. Figure 4 and Figure 5 describe the architecture and workflow of the proposed system and illustrate all of the stages mentioned above.

Ultimately, the use of ML in blockchain-based smart healthcare can detect user behavior anomalies and prevent attacks by analyzing activity patterns and monitoring data integrity. ML can also develop predictive models using historical data to identify potential threats before they occur, thereby enhancing security and protecting sensitive patient data.

4. Performance Evaluation

The design of the proposed system passes through three stages: data preparation, machine learning modeling, and feature engineering. Data preparation, or pre-processing, transforms the raw data into a form suitable for analysis and for running through machine learning algorithms to uncover insights and make predictions [26]. Data preparation has several steps, such as data formatting, filtering, and data validation and cleansing.
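
To make these preparation steps concrete, the following sketch applies formatting, validation, filtering, and cleansing to a handful of raw sensor records. The field names in `FIELDS`, the specific rules (drop non-numeric, missing, non-positive, and duplicate rows), and the sample records are illustrative assumptions, not the pre-processing actually used in the experiments.

```python
import math

# Assumed column names for a vital-signs record.
FIELDS = ("heart_rate", "respiration_rate", "temperature")


def prepare(records):
    """Format, validate, filter, and de-duplicate raw records (list of dicts)."""
    cleaned, seen = [], set()
    for raw in records:
        try:
            # Formatting: coerce every field to float.
            row = {name: float(raw[name]) for name in FIELDS}
        except (KeyError, TypeError, ValueError):
            continue  # Validation: drop rows with missing or non-numeric fields.
        if any(math.isnan(v) or v <= 0 for v in row.values()):
            continue  # Filtering: drop physically meaningless values.
        key = tuple(row[name] for name in FIELDS)
        if key in seen:
            continue  # Cleansing: drop exact duplicates.
        seen.add(key)
        cleaned.append(row)
    return cleaned


raw_records = [
    {"heart_rate": "72", "respiration_rate": 16, "temperature": "36.8"},
    {"heart_rate": "n/a", "respiration_rate": 18, "temperature": 37.0},
    {"heart_rate": 72.0, "respiration_rate": "16", "temperature": 36.8},
    {"heart_rate": 80, "temperature": 36.5},
    {"heart_rate": 65, "respiration_rate": -4, "temperature": 36.4},
]
clean = prepare(raw_records)
print(clean)
```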

After the data preparation stage, it is time to develop the model. This stage involves three substages:

Model selection and assessment: the process of selecting the model type (regression model, classification model, etc.) and assessing which one performs better;
Model training: ML models tackle two kinds of problems, classification and regression. Some algorithms are specific to one kind, while others can be used for both (for example, Decision Tree and KNN). In designing an ML solution, training is the most important step; it consists of passing the prepared data to the ML model so that it finds patterns and makes predictions. With more training data, the model generally gets better at predicting;
Model evaluation: after training, the model’s performance is evaluated in terms of different metrics, such as accuracy and precision [22].

Feature engineering is an additional step that derives extra informative features from existing ones, which are then used for better data modeling. The proposed system architecture consists of two parts. In the first part, the medical data are checked before passing to the blockchain network. In the second part, transactions are checked after the data have passed to the blockchain. The datasets are split into 80% and 20% sets for training and testing, respectively. The training set is used to fit the model, and the test set is used to evaluate its predictions. In the model, we adopted the following algorithms:

Logistic regression: It is a supervised ML algorithm used for classification problems. It aims to map a function from the dataset’s features to the targets to predict the probability that a new example belongs to one of the target classes [27];
K Nearest Neighbors (KNN): It classifies a sample by finding the k closest training points, with closeness measured here by the Euclidean distance, and assigning the majority class among those neighbors;
Support Vector Machine (SVM): SVM mitigates overfitting by applying the principle of structural risk minimization. It searches for the optimal separating hyperplane between the two classes [28];
Naive Bayes: It is called naive because it assumes that the features are conditionally independent given the class, which simplifies the probability calculations for each class and makes them tractable. It is a classification algorithm for binary (two-class) and multiclass classification problems;
Decision tree: It combines a series of basic tests efficiently and cohesively, where each test compares a numeric feature to a threshold value [29];
Random forest: It is used for classification and regression and works by aggregating the predictions of several decision trees. The final output is the mode of the predicted classes (classification) or the mean prediction (regression) [30].
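
As a small worked illustration of the 80/20 hold-out split and of one of the listed classifiers, the sketch below implements a KNN vote with Euclidean distance in plain Python and evaluates it on the held-out 20%. Everything in it is an assumption made for illustration: the data are synthetic, `holdout_split` and `knn_predict` are hand-written helpers rather than library calls, and no feature scaling is applied, which a real pipeline would normally add.

```python
import math
import random
from collections import Counter


def holdout_split(samples, test_ratio=0.2, seed=42):
    """Shuffle once, then hold out `test_ratio` of the samples for testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Synthetic, hypothetical readings: (heart rate, respiration rate, temperature).
rng = random.Random(0)
data = []
for _ in range(50):
    data.append(((rng.gauss(75, 5), rng.gauss(16, 1), rng.gauss(36.8, 0.2)), "normal"))
    data.append(((rng.gauss(135, 5), rng.gauss(28, 1), rng.gauss(39.2, 0.2)), "abnormal"))

train, test = holdout_split(data)  # 80 training samples, 20 test samples
correct = sum(knn_predict(train, features) == label for features, label in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Fixing the shuffle seed keeps the split reproducible, which matters when several algorithms are compared on the same test set.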

The evaluation metrics are accuracy, precision, recall, F1-score, execution time, Receiver Operating Characteristic-Area Under Curve (ROC-AUC), and scalability. The classification scores (accuracy, precision, recall, and F1-score) are derived from the confusion matrix [31]. A confusion matrix is used to evaluate a classification model’s performance. It compares the values predicted by the machine learning model with the actual target values. It thus provides a holistic view of how the classification model performs and of the errors it makes [32].

As shown in Figure 6, the confusion matrix comprises different combinations of the predicted and actual values of a classifier (i.e., algorithm):

True Positive (TP): The number of times the actual value was positive and the model predicted a positive value;
False Positive (FP): The number of times the actual value was negative but the model predicted a positive value;
True Negative (TN): The number of times the actual value was negative and the model predicted a negative value;
False Negative (FN): The number of times the actual value was positive but the model predicted a negative value.
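
The four counts can be obtained by a single pass over the actual and predicted labels, as in the following sketch. The function name `confusion_counts`, the choice of `1` as the positive class, and the example labels are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, and FN for a binary classifier."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)
```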

1. Accuracy: It refers to the percentage of correct classifications a trained machine learning model achieves. It is the number of correct predictions divided by the total number of predictions across all classes, as in the following equation:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{1}$$

2. Precision: We used precision to calculate the model’s ability to classify positive values correctly. It represents the true positives divided by the total number of predicted positive values, as in the following equation:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

3.: Recall: It refers to how many actual positive cases the model could predict correctly. The recall is the true positives divided by the total number of actual positive values.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

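4.: F1-Score: It is the harmonic mean of recall and precision and is useful when the two need to be combined into a single score. For a given sum of precision and recall it is largest when the two are equal, and it reaches its maximum of 1 only when both equal 1. It is computed as in the following equation: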

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$

5.: Execution time: execution time is the amount of time it takes for an algorithm to complete its operation and produce its output, including data pre-processing, data splitting, and model evaluation [33]. In general, shorter execution times are desirable since they indicate faster performance and better efficiency of machine learning models or algorithms.
6.: Receiver Operating Characteristic-Area Under Curve (ROC-AUC): It measures the ability of a binary classifier to distinguish between positive and negative classes across all possible threshold settings. It is widely used in machine learning to evaluate the performance of binary classification models, particularly in cases where the classes are imbalanced or the costs of false positives and false negatives differ [34]. It ranges from 0 to 1, with higher values indicating better performance. A value of 0.5 indicates that the classifier performs no better than random guessing, while a value of 1 indicates perfect classification.
7.: Scalability: Scalability in machine learning refers to a model’s ability to handle increasing amounts of data without requiring significant increases in processing time and resources [35]. It is crucial in applications involving large datasets, such as big data analytics, image recognition, natural language processing, and speech recognition. A scalable machine learning model can efficiently handle large datasets, making it practical for real-world use.
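
Equations (1)–(4) and the ROC-AUC definition can be made concrete with a short sketch. It is illustrative only: the function names and toy inputs are assumptions, and the ROC-AUC is computed through its standard pairwise interpretation (the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half) rather than by integrating a plotted curve.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts (Eqs. 1-4)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
metrics = classification_metrics(tp=3, fp=1, tn=3, fn=1)
print(metrics)  # every metric equals 0.75 for these counts

labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 8 of 9 pairs are ranked correctly -> 0.889
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; a sort-based rank computation would be the usual choice for larger datasets.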

Scenario 1: Dataset 1

Table 2 lists the evaluation results of the proposed system for each algorithm on the human vital signs dataset. A human vital signs dataset typically includes information about a person’s key physiological indicators, which help assess basic body functions [24]. These vital signs are essential for determining the overall health status of an individual. The most common vital signs include:

Heart rate (HR): The number of times a person’s heart beats per minute;
Blood pressure (BP): The force exerted by circulating blood on the walls of blood vessels, usually measured as systolic and diastolic pressures;
Respiratory rate (RR): The number of breaths a person takes per minute;
Body temperature (Temp): The internal temperature of the body, usually measured in degrees Celsius or Fahrenheit;
Oxygen saturation (SpO2): The percentage of oxygen-bound hemoglobin in the blood, indicating how well oxygen is being transported to various parts of the body.

We obtained the best results after applying feature selection, because it improves the model’s performance. Feature selection is a machine learning technique that keeps the most important and relevant features of the input data and discards the less important ones to improve the model’s accuracy and efficiency. It also helps remove noise from the data, improving the model’s ability to identify patterns and make accurate predictions. This technique is particularly important when dealing with large and complex datasets [36]. The accuracy achieved by each algorithm is illustrated in Figure 7. Table 2 and Figure 7 show that Random Forest achieved the highest accuracy (99.82%), which makes it the best method for this part of the proposed system.
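
The text does not name the feature selection method that was used. As one simple illustration of the idea, the sketch below applies a filter approach that ranks features by the absolute Pearson correlation between each feature and the label and keeps the top k. The method, the function names, and the toy vital-sign values are all assumptions for illustration, not the authors' procedure or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(columns, labels, k):
    """Keep the k feature names most correlated (in absolute value) with the label."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], labels)), reverse=True)
    return ranked[:k]


# Hypothetical toy data: label 1 = abnormal reading, 0 = normal reading.
columns = {
    "heart_rate": [72, 75, 130, 140, 70, 135],
    "spo2": [98, 97, 88, 85, 99, 86],
    "sensor_id": [1, 2, 1, 2, 1, 2],  # carries little information about the label
}
labels = [0, 0, 1, 1, 0, 1]
selected = select_top_k(columns, labels, k=2)
print(selected)
```

A correlation filter only captures linear, per-feature relationships; wrapper or model-based selection may behave differently on the same data.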

Scenario 2: Dataset 2

The proposed system was evaluated using the same machine learning algorithms on the Ethereum Fraud Detection dataset. The best-known blockchain networks are Ethereum, HyperLedger, and Corda. After a detailed study, we chose to work with the Ethereum network because it is used in most healthcare applications and because the available datasets include only Ethereum transactions. The Ethereum Fraud Detection dataset on Kaggle contains both fraudulent and legitimate transactions. It has 10,000 records with 51 features representing transactions on the Ethereum network. Fraudulent transactions are labeled based on their characteristics, such as high-value transfers between addresses that have no real-world identity associated with them or multiple low-value transfers to obscure addresses. This dataset can be used to build predictive models with supervised learning techniques that classify new incoming transactions as either fraudulent or legitimate based on patterns observed in historical data [25].

This dataset represents transactions inside the blockchain network. We obtained the results in terms of accuracy, precision, recall, and F1-score shown in Table 3. Because accuracy is an important parameter for evaluating the system, Figure 8 compares the accuracy of the algorithms to determine the best method for this part of the proposed system.
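
As a quick sanity check, the F1-scores in Table 3 can be recomputed from the reported precision and recall using Equation (4). The sketch below does this; the values are copied from Table 3 (in percent), the function name `f1_score` is an assumption, and small differences are expected because the table entries are rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (Equation 4)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from Table 3.
table3 = {
    "Logistic Regression": (88.48, 32.26, 47.2),
    "KNN": (92.6, 87.51, 89.97),
    "SVM": (80.55, 39.88, 53.34),
    "Gaussian NB": (23.99, 97.88, 38.54),
    "Decision Tree": (93.2, 93.39, 93.28),
    "Random Forest": (98.64, 93.16, 95.82),
}

recomputed = {model: f1_score(p, r) for model, (p, r, _) in table3.items()}
for model, (p, r, reported) in table3.items():
    print(f"{model:20s} reported={reported:6.2f} recomputed={recomputed[model]:6.2f}")
```

The recomputed values agree with the reported ones to within 0.1 percentage points. The check also shows why F1 is informative here: Gaussian NB has very high recall but low precision, and Logistic Regression the reverse, so both end up with a low F1-score.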

Figure 9 compares the execution time of the machine learning algorithms. The x-axis represents the name of the algorithm, while the y-axis represents the time in seconds. The figure shows that the Naive Bayes, Decision Tree, and Random Forest algorithms achieved short execution times, which means they run faster than the other algorithms.
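
Execution time of this kind is typically measured by wrapping the whole pipeline in a wall-clock timer. The following is a minimal sketch of such a measurement; the helper `timed` and the stand-in `toy_pipeline` are hypothetical and do not reproduce the authors' pipeline or their measured times.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def toy_pipeline(n):
    """Hypothetical stand-in for pre-processing, splitting and evaluation."""
    data = [(i % 7, i % 2) for i in range(n)]          # fake (feature, label) rows
    split = int(0.8 * n)                               # 80/20 train/test split
    train, test = data[:split], data[split:]
    ones = sum(label for _, label in train)
    majority = 1 if 2 * ones > len(train) else 0       # majority-class "model"
    return sum(1 for _, label in test if label == majority) / len(test)


accuracy, seconds = timed(toy_pipeline, 10_000)
print(f"accuracy={accuracy:.2f} elapsed={seconds:.4f} s")
```

A single run is sensitive to background load, so repeating the measurement and reporting the median or minimum is usually more reliable.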

As mentioned above, the ROC-AUC measures the ability of a binary classifier to distinguish between positive and negative classes across all possible threshold settings. Figure 10 displays the ROC-AUC of the machine learning algorithms. The x-axis represents the name of the algorithm, while the y-axis represents the ROC-AUC value (0–1). The figure shows that the Random Forest algorithm achieved the best value (0.99), which means the best performance among the evaluated algorithms.

Figure 11 displays the scalability of the machine learning algorithms in the proposed system. The x-axis represents the training set size, while the y-axis represents the training time (ms). The colored lines represent the algorithms of the proposed system. The model that handles more data in the shortest amount of time is considered the best. Figure 11 shows that the training time of every algorithm increases as the training set size increases, but the rate of increase differs between algorithms. KNN has the slowest rate of increase, followed by Naive Bayes, Decision Tree, Logistic Regression, SVM, and Random Forest. KNN is therefore the most scalable algorithm, as it can be trained on large training sets without a significant increase in training time. Naive Bayes is also relatively scalable, but less so than KNN. Logistic Regression, Decision Tree, SVM, and Random Forest are all less scalable than KNN and Naive Bayes. Overall, all algorithms achieved acceptable results, and all are scalable because the differences in training time between them are on the order of milliseconds.
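
A scalability curve like the one described for Figure 11 can be produced by timing only the training step for increasing training set sizes. The sketch below illustrates the measurement with a deliberately simple nearest-centroid classifier on random data; the classifier, the function names, and the sizes are assumptions chosen to keep the example self-contained, not the models or data used in the paper.

```python
import random
import time


def fit_centroids(X, y):
    """Train a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def training_time_curve(sizes, n_features=10, seed=0):
    """Return (training_set_size, training_time_ms) pairs for each size."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        X = [[rng.random() for _ in range(n_features)] for _ in range(n)]
        y = [rng.randint(0, 1) for _ in range(n)]
        start = time.perf_counter()          # time the training step only
        fit_centroids(X, y)
        curve.append((n, (time.perf_counter() - start) * 1000.0))
    return curve


curve = training_time_curve([1_000, 2_000, 4_000, 8_000])
for n, ms in curve:
    print(f"n={n:5d} training_time={ms:.2f} ms")
```

Plotting one such curve per algorithm gives the comparison described above. Note that for a lazy learner such as KNN the training step mostly stores the data, so a training-time curve alone does not capture its prediction-time cost.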

In conclusion, experiments were conducted on two datasets. Although all the algorithms performed well after pre-processing and feature selection, Random Forest and Decision Tree achieved the best performance. Random Forest achieved high accuracy, precision, recall, F1-score, and ROC-AUC with a short execution time, while Decision Tree achieved good results in terms of scalability and execution time. Overall, Random Forest is considered the best of the evaluated algorithms for tackling the research problem.

5. Security Analysis

The proposed approach leverages the power of machine learning and blockchain features to identify attacks and their fraudulent behavior and prevent them from occurring. By addressing these concerns, our approach can provide a secure and reliable platform for healthcare providers and patients to interact and share information. We discuss the security analysis which we performed on the proposed BC-based healthcare system as follows:

I.: Identity theft: Identity theft in a blockchain network is the fraudulent acquisition of a user’s identity to manipulate their assets or data. The attacker typically steals the user’s private key to impersonate them and perform unauthorized transactions. Machine learning algorithms in the proposed system can help detect identity theft attacks on a blockchain network by analyzing user behavior patterns and detecting anomalies that may indicate fraudulent activity;
II.: Distributed Denial of Service (DDoS): DDoS attacks in a blockchain network involve overwhelming the network with traffic, rendering it unavailable to users. Blockchain networks can use consensus mechanisms that require participants to perform computational work before contributing to the network, making it more difficult for attackers to mount a DDoS attack. The proposed approach can detect attempts to flood the network with traffic to disrupt its normal functioning;
III.: Sybil attacks: Sybil attacks in a blockchain network involve creating multiple identities to control a significant portion of the network’s resources. The proposed system can detect any attempts to create multiple fake identities by using identity verification processes to prevent users from creating multiple identities. Additionally, it can detect fake nodes to manipulate the network’s consensus mechanism by analyzing network traffic and identifying clusters of nodes that behave abnormally or have suspicious communication patterns;
IV.: Routing attacks: Routing attacks in a blockchain network involve an attacker manipulating the network’s routing protocol to redirect traffic to malicious nodes, allowing the attacker to intercept, modify, or block network traffic. The proposed approach is designed to counter this attack by analyzing network traffic patterns and identifying potential attacks or suspicious behavior;
V.: Decentralized Autonomous Organization (DAO) attacks: A DAO attack occurs when the attacker exploits a vulnerability in the code of the DAO SC, allowing them to siphon off a large amount of digital currency. The proposed system can detect this type of attack because it analyzes transactions to detect abnormal or fraudulent behavior, such as repeated refund requests made from different accounts;
VI.: Billing fraud: Billing fraud is when someone submits false or misleading information to receive payment for services that were not provided or were misrepresented. In healthcare blockchain applications, it can occur when a healthcare provider submits false claims for reimbursement. Our system can detect billing fraud by analyzing data and identifying abnormal billing patterns, such as unusually high rates of certain procedures or services, or frequent billing for services that are not typically provided;
VII.: Reentrancy attack: A reentrancy attack exploits a type of SC vulnerability in which an exploiter contract leverages a loophole in the victim contract to withdraw from it continuously until the victim contract goes bankrupt. By using machine learning algorithms that analyze the transactions, the proposed system can detect and identify the patterns and behaviors of known reentrancy attacks.
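
To make the idea of an "abnormal billing pattern" from item VI concrete, the sketch below flags providers whose claim count is far above the typical level. It is a simple median-based threshold rule used purely for illustration, not the machine learning classifiers evaluated in this article; the function name, the ratio of 3, and the provider data are all hypothetical.

```python
from statistics import median


def flag_abnormal_billing(claims_per_provider, ratio=3.0):
    """Return providers whose claim count exceeds `ratio` times the median count."""
    typical = median(claims_per_provider.values())
    return sorted(
        provider
        for provider, count in claims_per_provider.items()
        if count > ratio * typical
    )


# Hypothetical monthly claim counts for one procedure, for illustration only.
claims = {
    "provider_a": 12,
    "provider_b": 9,
    "provider_c": 11,
    "provider_d": 58,
    "provider_e": 10,
}
flagged = flag_abnormal_billing(claims)
print(flagged)  # ['provider_d'] because 58 > 3 * 11
```

A trained classifier replaces the fixed ratio with a decision boundary learned from labeled transactions, but the output plays the same role: flagged records are held back for review instead of being stored as normal transactions.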

6. Conclusions

Smart healthcare is a health service that uses technology and data to improve the quality of healthcare services, increase efficiency, and reduce costs. Blockchain technology plays an important role in smart healthcare by providing secure storage for sensitive patient information, such as medical records and personal identification details. Despite the benefits of integrating smart healthcare with blockchain, many challenges remain. The major challenges are privacy and security issues caused by the attacks on and vulnerabilities of the blockchain network. This article proposes a novel approach for detecting and protecting against fraud in a blockchain-based smart healthcare network using machine learning techniques. The proposed system uses six machine learning algorithms to identify abnormal transactions within the blockchain network, enabling timely detection and prevention of fraudulent activities.

The proposed system achieved high accuracy, precision, recall, F1-score, and ROC-AUC, together with short execution times and good scalability, suggesting its effectiveness in detecting and preventing fraud within the blockchain network. The article presents the results of experiments conducted on two datasets. Although all the algorithms performed well after pre-processing and feature selection, Random Forest achieved the highest accuracy with a short execution time. Moreover, it is scalable.

The proposed system has some limitations. It can currently be used only in the healthcare field and needs many adaptations before it can be implemented in other fields. Moreover, many novel attacks are emerging, and we cannot predict how our proposal will react to them. Overall, the proposed system has the potential to enhance the security and reliability of blockchain-based healthcare applications and networks by detecting and preventing fraudulent activities using machine learning. Future work includes investigating the use of deep learning algorithms to further improve the proposed system’s performance. Additionally, we plan to investigate using the proposed system in other BC-based applications to provide solutions for security issues in the blockchain network.

Finally, the article presented a promising solution for secure and efficient data transmission in IoT healthcare applications using AI and blockchain technologies. It also provided insights into the potential of these technologies to address security and privacy concerns in various domains beyond healthcare.

Author Contributions

M.A.M.: Conception and design of this study, Acquisition of data, Analysis and/or interpretation of data, writing—original draft, Writing—review and editing. M.B.: Conception and design of this study, analysis and/or interpretation of data, writing—review, and editing. M.A.: Conception and design of this study; writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Zhang, J.; Long, J.; von Schaewen, A.M.E. How Does Digital Transformation Improve Organizational Resilience? —Findings from PLS-SEM and fsQCA. Sustainability 2021, 13, 11487.
2. Tian, S.; Yang, W.; Le Grange, J.M.; Wang, P.; Huang, W.; Ye, Z. Smart healthcare: Making medical care more intelligent. Glob. Health J. 2019, 3, 62–65.
3. Mohanty, S.P.; Choppali, U.; Kougianos, E. Everything you wanted to know about smart cities: The Internet of things is the backbone. IEEE Consum. Electron. Mag. 2016, 5, 60–70.
4. Zeadally, S.; Siddiqui, F.; Baig, Z.; Ibrahim, A. Smart healthcare: Challenges and potential solutions using internet of things (IoT) and big data analytics. PSU Res. Rev. 2020, 4, 149–168.
5. Al Omar, A.; Jamil, A.K.; Khandakar, A.; Uzzal, A.R.; Bosri, R.; Mansoor, N.; Rahman, M.S. A Transparent and Privacy-Preserving Healthcare Platform with Novel SC for Smart Cities. IEEE Access 2021, 9, 90738–90749.
6. Bishta, S.; Bishta, N.; Singha, P.; Dasilaa, S.; Nisar, K.S. Smart healthcare using blockchain technologies: The importance, applications, and challenges. Blockchain Appl. Healthc. Inform. 2022, 163–180.
7. Sodhro, A.H.; Sennersten, C.; Ahmad, A. Towards Cognitive Authentication for Smart Healthcare Applications. Sensors 2022, 22, 2101.
8. Sayeed, S.; Marco-Gisbert, H.; Caira, T. SC: Attacks and Protections. IEEE Access 2020, 8, 1.
9. Available online: https://www.h-x.technology/blog/top-3-smart-contract-audit-tools (accessed on 23 March 2023).
10. Truong, T.C.; Diep, Q.B.; Zelinka, I. Artificial Intelligence in the Cyber Domain: Offense and Defense. Symmetry 2020, 12, 410.
11. El-Dosuky, M.A.; Eladl, G.H. DOORchain: Deep Ontology-Based Operation Research to Detect Malicious SCs. In New Knowledge in Information Systems and Technologies; Springer Nature: Basel, Switzerland, 2019.
12. Zhou, H.; Fard, A.M.; Makanju, A. The State of Ethereum SC Security: Vulnerabilities, Countermeasures, and Tool Support. J. Cybersecur. Priv. 2022, 2, 358–378.
13. Liu, H.; Liu, C.; Zhao, W.; Jiang, Y.; Sun, J. S-gram: Towards Semantic-Aware Security Auditing for Ethereum SCs. In Proceedings of the 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE18), Montpellier, France, 3–7 September 2018.
14. Torres, C.F.; Baden, M.; Norvill, R.; Pontiveros, B.B.F.; Jonker, H.; Mauw, S. ÆGIS: Shielding Vulnerable SCs Against Attacks. arXiv 2020, arXiv:2003.05987.
15. Al-Otaibi, Y.D. K-nearest neighbour-based SC for internet of medical things security using blockchain. Comput. Electr. Eng. 2022, 101, 108129.
16. Badruddoja, S.; Dantu, R.; He, Y.; Upadhayay, K.; Thompson, M. Making SCs Smarter. In Proceedings of the 2021 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), Virtual, 3–6 May 2021.
17. Deebak, B.D.; AL-Turjman, F. Privacy-preserving in SCs using blockchain and artificial intelligence for cyber risk measurements. J. Inf. Secur. Appl. 2021, 58, 102749.
18. Xu, Y.; Hu, G.; You, L.; Cao, C. A Novel Machine Learning-Based Analysis Model for SC Vulnerability. Secur. Commun. Netw. 2021, 2021, 5798033.
19. Alnavar, K.; Babu, D.C.N. Blockchain-based SC with Machine Learning for Insurance Claim Verification. In Proceedings of the 2021 5th International Conference on Electrical, Electronics, Communication, Computer Technologies and Optimization Techniques, Mysuru, India, 10–11 December 2021.
20. Tann, W.J.-W.; Han, X.J.; Gupta, S.S.; Ong, Y.-S. Towards Safer SCs: A Sequence Learning Approach to Detecting Security Threats. arXiv 2019, arXiv:1811.06632.
21. Gupta, R.; Tanwer, S.; AL-Turjman, F.; Italiya, P.; Nauman, A.; Kim, S.W. Smart Contract Privacy Protection Using AI in Cyber-Physical Systems: Tools, Techniques and Challenges. IEEE Access 2020, 8, 24746–24772.
22. Ray, S.; Mishra, K.N.; Dutta, S. Detection and prevention of DDoS attacks on M-healthcare sensitive data: A novel approach. Int. J. Inf. Technol. 2022, 14, 1333–1341.
23. Udupa, P. Smart home for elder care using wireless sensor. Circuit World 2018, 44, 69–77.
24. Available online: https://www.kaggle.com/datasets/engrarri21/human-vital-signs (accessed on 27 March 2023).
25. Available online: https://www.kaggle.com/datasets/rupakroy/ethereum-fraud-detection (accessed on 29 March 2023).
26. Zhang, S.; Zhang, C.; Yang, Q. Data Preparation for Data Mining. Appl. Artif. Intell. 2010, 17, 375–381.
27. Thabtah, F.; Abdelhamid, N.; Peebles, D. A machine learning autism classification based on logistic regression analysis. Health Inf. Sci. Syst. 2019, 7, 12.
28. Available online: https://machinelearningmastery.com/method-of-lagrange-multipliers-the-theory-behind-support-vector-machines-part-3-implementing-an-svm-from-scratch-in-python/ (accessed on 7 April 2023).
29. Jijo, B.T.; Abdulazeez, A.M. Classification Based on Decision Tree Algorithm for Machine Learning. J. Appl. Sci. Technol. Trends 2021, 2, 20–28.
30. Kurdi, F.T.; Amakhchan, W.; Gharineiat, Z. Random Forest Machine Learning Technique for Automatic Vegetation Detection and Modelling in LiDAR Data. Int. J. Environ. Sci. Nat. Resour. 2021, 28, 556234.
31. Yuvalı, M.; Yaman, B.; Tosun, Ö. Classification Comparison of Machine Learning Algorithms Using Two Independent CAD Datasets. Mathematics 2022, 10, 311.
32. Available online: https://www.simplilearn.com/tutorials/machine-learning-tutorial/confusion-matrix-machine-learning#:~:text=A%20confusion%20matrix%20presents%20a,actual%20values%20of%20a%20classifier (accessed on 18 April 2023).
33. AlZoman, R.M.; Alenazi, M.J.F. A Comparative Study of Traffic Classification Techniques for Smart City Networks. Sensors 2021, 21, 4677.
34. Mandrekar, J.N. Receiver Operating Characteristic Curve in Diagnostic Test Assessment. J. Thorac. Oncol. 2010, 5, 1315–1316.
35. Cheng, D.; Zhang, H.; Xia, F.; Li, S.; Zhang, Y. The Scalability for Parallel Machine Learning Training Algorithm: Dataset Matters. arXiv 2020, arXiv:1910.11510.
36. Aziz, R.M.; Baluch, M.F.; Patel, S.; Ganie, A.H. LGBM: A machine learning approach for Ethereum fraud detection. Int. J. Inf. Technol. 2022, 14, 3321–3331.

Figure 1. Block diagram of the proposed system.

Figure 2. The first stage of the proposed system.

Figure 3. The second stage of the proposed system.

Figure 4. The architecture of the proposed system.

Figure 5. The workflow of the proposed system.

Figure 6. Confusion matrix.

Figure 7. Accuracy of the algorithms using dataset 1.

Figure 8. Accuracy of the algorithms using dataset 2.

Figure 9. Execution time of machine learning algorithms in the proposed system.

Figure 10. Receiver Operating Characteristic-Area Under Curve (ROC-AUC).

Figure 11. Scalability of machine learning algorithms in the proposed system.

Table 2. Performance evaluation of the proposed system using dataset 1.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
Logistic Regression	97.79	98.55	98.63	98.59
KNN	99.66	99.81	99.84	99.83
SVM	99.76	99.81	99.83	99.88
Gaussian NB	97.5	98.65	97.92	98.28
Decision Tree	99.76	99.89	99.79	99.84
Random Forest	99.82	99.89	99.87	99.88

Table 3. Performance evaluation of the proposed system using dataset 2.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
Logistic Regression	84.05	88.48	32.26	47.2
KNN	95.68	92.6	87.51	89.97
SVM	84.55	80.55	39.88	53.34
Gaussian NB	30.88	23.99	97.88	38.54
Decision Tree	97.02	93.2	93.39	93.28
Random Forest	98.2	98.64	93.16	95.82

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
