High Performance Classiﬁcation Model to Identify Ransomware Payments for Heterogeneous Bitcoin Networks

: The Bitcoin cryptocurrency is a worldwide prevalent virtualized digital currency conceptualized in 2008 as a distributed transactions system. Bitcoin transactions make use of peer-to-peer network nodes without a third-party intermediary, and the transactions can be veriﬁed by the node. Although Bitcoin networks have exhibited high efﬁciency in the ﬁnancial transaction systems, their payment transactions are vulnerable to several ransomware attacks. For that reason, investigators have been working on developing ransomware payment identiﬁcation techniques for bitcoin transac-tions’ networks to prevent such harmful cyberattacks. In this paper, we propose a high performance Bitcoin transaction predictive system that investigates the Bitcoin payment transactions to learn data patterns that can recognize and classify ransomware payments for heterogeneous bitcoin networks. Speciﬁcally, our system makes use of two supervised machine learning methods to learn the distinguishing patterns in Bitcoin payment transactions, namely, shallow neural networks (SNN) and optimizable decision trees (ODT). To validate the effectiveness of our solution approach, we evaluate our machine learning based predictive models on a recent Bitcoin transactions dataset in terms of classiﬁcation accuracy as a key performance indicator and other key evaluation metrics such as the confusion matrix, positive predictive value, true positive rate, and the corresponding prediction errors. As a result, our superlative experimental result was registered to the model-based decision trees scoring 99.9% and 99.4% classiﬁcation detection (two-class classiﬁer) and accuracy (multiclass classiﬁer), respectively. Hence, the obtained model accuracy results are superior as they surpassed many state-of-the-art models developed to identify ransomware payments in bitcoin transactions.


Introduction
The evolution of digitalization has resulted in massive users of the cryptocurrency market space [1]. Many cryptocurrencies exist in the market, and Bitcoin has become the most popular and the most valuable digital currency [2]. Bitcoin is a decentralized virtual system that uses a peer-to-peer network and was firstly introduced in 2008 by Satoshi Nakamoto [3]. In Bitcoin, the digital money is stored in virtual wallets and not owned or administered by a central authority [4]. In general, cryptocurrency users can effortlessly request payment transactions without user verification, and the payment addresses are generated anonymously [5]. Therefore, Bitcoin payment transactions have been intensively used around the globe [6]. In addition, Bitcoin transactions can be made between users through network nodes without a third-party intermediary, and the transactions can be verified by the nodes [7].
Bitcoin was built based on Blockchain technology, and this has brought several benefits to the network communication of Bitcoin, such as improving security, decentralization, and establishing trusted peer-to-peer networks [8]. However, the vast growth of Bitcoin users requires more research and investigation on cybercrime threats. Furthermore, Bitcoin is Ransomware is malware software that can lock users' data or screens; therefore, the users are blocked from accessing their own data. Ransomware decrypts the user data; consequently, the user is no longer able to decrypt those data. In order to decrypt the data, the user needs to pay ransom [1]. Ransomware is relatively a novel intrusion attack targeting the cryptocurrency data of Bitcoin users [12]. It was reported that more than 500 existing ransomware families can target Bitcoin data; therefore, more research is needed to identify and classify ransomware in order to prevent it [3]. Primarily, there are three genders of ransomware based on their functionality, as represented in Figure 2. The first type is crypto-ransomware, and in this type, the attacker encrypts the victim files and asks the victim to pay a ransom. Crypto-ransomware is considered the most threatening ransomware, and therefore it should be detected early [12]. The second type is locker-ransomware, where the attacker locks the victim screen and asks for a ransom. Finally, the third type is scareware, and in this type, the attacker only scares the victim and asks for a ransom to be paid [12].
Encryption is a powerful technique widely used in network security to prevent user data from unauthorized access. Basically, the message is encoded before sending and decoded once it is received. Encryption has enhanced the security of Bitcoin, but unfortunately, this technique can be misused by attackers. Therefore, attackers can block victim's files and extort ransom using ransomware [11]. Ransomware is malware software that can lock users' data or screens; therefore, the users are blocked from accessing their own data. Ransomware decrypts the user data; consequently, the user is no longer able to decrypt those data. In order to decrypt the data, the user needs to pay ransom [1]. Ransomware is relatively a novel intrusion attack targeting the cryptocurrency data of Bitcoin users [12]. It was reported that more than 500 existing ransomware families can target Bitcoin data; therefore, more research is needed to identify and classify ransomware in order to prevent it [3]. Primarily, there are three genders of ransomware based on their functionality, as represented in Figure 2. The first type is crypto-ransomware, and in this type, the attacker encrypts the victim files and asks the victim to pay a ransom. Crypto-ransomware is considered the most threatening ransomware, and therefore it should be detected early [12]. The second type is lockerransomware, where the attacker locks the victim screen and asks for a ransom. Finally, the third type is scareware, and in this type, the attacker only scares the victim and asks for a ransom to be paid [12].

Integrity
Encryption is a powerful technique widely used in network security to prevent user data from unauthorized access. Basically, the message is encoded before sending and decoded once it is received. Encryption has enhanced the security of Bitcoin, but unfortunately, this technique can be misused by attackers. Therefore, attackers can block victim's files and extort ransom using ransomware [11]. Therefore, it is essential to detect and identify the payment ransomware attack early before its encryption function occurs to help minimize its risk. However, late detection of ransomware increases the level of challenge of tackling the attack [13]. This paper proposes a novel model to detect ransomware payments early for heterogeneous Bitcoin networks. Basically, the Optimizable Decision Trees (Bootstrap Aggregation DT ensemble) method and Shallow Neural Network based on multilayer Perceptron (MLP) are used to decide whether the ransomware exists at the detection stage, and after that, they are used to classify ransomware attacks into three ransomware attacks family, including Ransomware-Montreal Family, Ransomware-Padua Family, and Ransomware-Princeton Family. To validate the proposed model, we compared our proposed model with other models, and the comparison result confirmed that our model competed with the existing models.
The rest of the paper is organized as follows: in Section 2, a literature review is represented. In Section 3, system modeling and configurations are discussed. Section 4 presents the results and discussion. Finally, in Section 5, we provide a conclusion of the research work.

Literature Review
This section discusses the latest technologies used by researchers to mitigate the impacts of ransomware attacks in cryptocurrency sectors. For instance, in [14], the authors proposed a pre-encryption detection algorithm (PEDA) to detect crypto-ransomware early. Detecting ransomware at an early stage means that the attack is detected before the beginning of its encryption function. When the attack is detected by the PEDA, the victim will be notified; therefore, sensitive data and files can be transferred to a new location while the ransomware is cleared. The PEDA contains two phases. In Phase I, the learning algorithm (LA) analyzes the Windows application programming interface (API) to identify the ransomware attack. When the attack is identified, then, in Phase II, a signature is generated and stored in the signature repository to be used later to detect ransomware in Phase I. To verify the performance of the proposed LA, the authors compared the false positive rate (FPR) of the LA with the other LAs, which are: EldeRan, Ensemble (NB and RF), Random Forest (RF), and Naive Bayes (NB). According to the authors, the proposed LA scored the lowest FPR result.
Deep learning approaches have been proposed by researchers to detect malware in the Bitcoin system [8]. In this reference [15], the authors used Long-Short Term Memory (LSTM) to detect ransomware. The LSTM was used to classify API sources to identify ransomware families based on their behavior. LSTM is a deep learning algorithm that provides an accurate classification result. Furthermore, it overcomes the vanishing gradient problem of classical Recurrent Neural Networks (RNN) [15]. Abbas Yazdinejad et al. [8] also used LSTM to detect cryptocurrency malware on Windows Operating System. The dataset used in this research consists of 200 legal cryptocurrency samples and 500 cryptocurrency malware samples. To evaluate the work, the authors compare the performance of the proposed method with K-Nearest Neighbor, Decision Tree, Random Forest, SVM, Naïve Bayes, and Ada-Boost using the 10-fold cross-validation (CV) metric. According to the authors, the LSTM detection accuracy reached 98%, exceeding the other detection algorithms.
In this paper [16], the authors proposed NetConver, a machine learning approach to analyze Windows ransomware in network traffic. NetConverse was used with Decision Tree (J48), and according to the authors, it achieved a 97.1% accuracy rate based on the True Positive Detection Rate (TPR) metric. The research experiment consists of three phases. In phase I, the dataset was collected by capturing the malicious network traffic. In phase 1, the authors focused only on Windows ransomware. The dataset has 9 ransomware families divided into two ransomware classes: crypto-ransomware and locker-ransomware. In phase II, the TShark software was used to extract features of the collected network traffic, and there were five different features. Finally, in phase III, the authors compared the proposed algorithm with five different classifier algorithms: Bayes Network, Random Forest, Multilayer Perceptron, k-nearest neighbors (KNN), and Least Mean Squares Filter (LMS) using TPR and FPR. According to the comparison result, the proposed algorithm defeated the five classifier algorithms.
In [17], the authors proposed a novel detection technique to detect ransomware attacks at an early stage based on dynamic pre-encryption boundary instead of fixed time pre-encryption boundary. In a fixed time pre-encryption boundary, the pre-encryption phase starts at a fixed time. However, in the dynamic pre-encryption boundary, the preencryption phase is determined by DPBD-FE, which refers to Dynamic Pre-Encryption Boundary Delineation and Feature Extraction. The fixed time pre-encryption boundary involves that the encryption time for all samples starts simultaneously; however, some malware could change its behavior which could decrease the detection accuracy. On the other hand, the dynamic pre-encryption boundary takes the encryption time for all samples at different times based on the cryptography-related APIs that occurs first. Therefore, the dynamic pre-encryption boundary is considered more accurate than the fixed time preencryption boundary.
Convolutional Neural Network (CNN) is a powerful deep learning tool that can be used to classify malware samples. However, the malware samples need to be first converted to grayscale images before the training process. Authors in [18] proposed a CNN model to classify malware samples. Two datasets were used in this research, Microsoft and Malimg. The Malimg dataset consists of 9339 malware samples that belong to 25 malware families. The Microsoft dataset comprises 21,741 malware samples that belong to 9 malware families. The CNN classification performed higher on the Microsoft dataset than the Malimg dataset. Authors in [19] also used CNN to classify malware samples. However, the malware samples were converted to color images this time. The authors used two datasets, Malimg, which was also used in [18], and IoT-android mobile. The accuracy of the classification in the Malimg dataset reached 98.82%, slightly better than [18], which was 98.52%. However, the accuracy of CNN in the IoT-android mobile dataset was 97.35%.
In order to increase the accuracy of the CNN, authors in [20] used Markov images with the CNN instead of gray images. Therefore, the malware samples were converted to Markov images before training the CNN. This classification technique was applied to two datasets, Drebin and Microsoft. The Microsoft dataset consists of 10,868 malware samples that belong to 9 malware families, and the Drebin datasets consist of 4020 malware samples that belong to 10 malware families. According to the authors, the average accuracy of the proposed method achieved 97.364% in the Drebin dataset. Therefore, we can say that processing the dataset in different ways while using the same classifier method result in various classification accuracy. This is evident since the authors in [18,19] and [20] used the same method, which was CNN; however, different techniques of processing datasets were applied. In [18], the malware samples were converted to grayscale images. In [19], they were converted to color images, and in [20], converted to Markov images. Most of the state-of-art solutions are usually developed to assess security triad of Blockchain technology in public sector applications [21].

System Development and Specifications
This section provides an explanation of the employed dataset for the bitcoin payment transactions over a heterogeneous network, a detailed description of the development stages of the proposed machine learning-based classification system, and finally, it mentions the simulation and experimental setup configurations for the system development and validation processes.

Dataset of Bitcoin Transactions
Bitcoin is a distributed digital currency system that records transactions in a distributed archive called a Blockchain [22]. The Bitcoin transactions dataset [23] is used to evaluate the performance of our system. This dataset contains address features on the heterogeneous Bitcoin network to identify ransomware payments. The authors have traced, downloaded, and analyzed the whole graph of Bitcoin transactions for ten consecutive years (January 2009 to December 2018) [23]. They have extracted the daily transactions on the network and formed the Bitcoin graph using a time interval of 24 hours. Since ransomware values are usually above the B0.3, the authors have filtered any network edge that transfers less than this threshold. Ransomware addresses are taken from three widely adopted studies: Montreal, Princeton, and Padua. For full information about the creation and preparation of this dataset, we refer the reader to the BitcoinHeist article [23]. However, a sample Bitcoin transactions' graph which has been used by the authors is provided in Figure 3 below. In this figure, the authors consider a toy network of 10 addresses and 7 transactions where dashed edges indicate transaction outputs from earlier windows; t1, t3, t4, and t5 are starter transactions. Coin amounts are shown on the edges, and the transaction outputs are equal to transaction inputs, i.e., transaction fees are 0.
10, x FOR PEER REVIEW 16 of 18 of the proposed method achieved 97.364% in the Drebin dataset. Therefore, we can say that processing the dataset in different ways while using the same classifier method result in various classification accuracy. This is evident since the authors in [18,19] and [20] used the same method, which was CNN; however, different techniques of processing datasets were applied. In [18], the malware samples were converted to grayscale images. In [19], they were converted to color images, and in [20], converted to Markov images. Most of the state-of-art solutions are usually developed to assess security triad of Blockchain technology in public sector applications [21].

System Development and Specifications
This section provides an explanation of the employed dataset for the bitcoin payment transactions over a heterogeneous network, a detailed description of the development stages of the proposed machine learning-based classification system, and finally, it mentions the simulation and experimental setup configurations for the system development and validation processes.

Dataset of Bitcoin Transactions
Bitcoin is a distributed digital currency system that records transactions in a distributed archive called a Blockchain [22]. The Bitcoin transactions dataset [23] is used to evaluate the performance of our system. This dataset contains address features on the heterogeneous Bitcoin network to identify ransomware payments. The authors have traced, downloaded, and analyzed the whole graph of Bitcoin transactions for ten consecutive years (January 2009 to December 2018) [23]. They have extracted the daily transactions on the network and formed the Bitcoin graph using a time interval of 24 hours. Since ransomware values are usually above the B0.3, the authors have filtered any network edge that transfers less than this threshold. Ransomware addresses are taken from three widely adopted studies: Montreal, Princeton, and Padua. For full information about the creation and preparation of this dataset, we refer the reader to the article [23]. However, a sample Bitcoin transactions' graph which has been used by the authors is provided in Figure 3 below. In this figure, the authors consider a toy network of 10 addresses and 7 transactions where dashed edges indicate transaction outputs from earlier windows; t1, t3, t4, and t5 are starter transactions. Coin amounts are shown on the edges, and the transaction outputs are equal to transaction inputs, i.e., transaction fees are 0. As a result of their tracing model, they end up with a dataset of 2,916,697 records (around 3 million records). Each record is attributed 10 features, including Bitcoin address  As a result of their tracing model, they end up with a dataset of 2,916,697 records (around 3 million records). Each record is attributed 10 features, including Bitcoin address (categorical data), year (numerical data), day of the year (numerical data), the length (numerical data), weight (numerical data), count (numerical data), looped (numerical data), neighbors (numerical data), income (numerical data), Satoshi amount (numerical data), and the label feature (numerical data), Category String. For the class label, the dataset considers either the white transaction (normal) accumulating 2,875,284 of the total number of records, or the ransomware transaction (anomaly) accumulating 41,413 records distributed into three different ransomware families of 28 attacks. The statistical specifications of the dataset are provided in Table 1, considering all class labels provided in the dataset. Based on the table, the statistical distribution of records between families seems to be almost balanced, with 13,163 Montreal Family, 12,402 Padua Family, and 15,848 Princeton Family.

System Modeling
The classification process is an intelligent task that predicts the class label of a given data record by utilizing machine learning algorithms [24]. Machine learning algorithms used to build predictive modeling to approximate the mapping for the target output based on a number of features are called supervised machine learning models [25]. Bitcoin transactions identification is a typical example of classification tasks that requires the engagement of supervised machine learning algorithms to build a predictive model to uncover anomalies (i.e., ransomware payments) and classify the bitcoin payments over heterogeneous bitcoin networks. Like any financial system, bitcoin data is temporal in nature, as transactions are time-stamped. Thus, several labeled historical market data can be collected and afforded to train the bitcoin predictive model and test the accuracy of the machine learning models in the bitcoin transactions system.
In this paper, we consider bitcoin transaction recognition as a transaction classification problem to identify and classify the ransomware payments for heterogeneous bitcoin networks. Specifically, we are concerned in experimentally examining and designing a learning-based self-reliant classification scheme that learns and investigates the prescribed features of bitcoin payment transactions to recognize trustworthy and ransomware payments for better-quality cryptocurrency practices and transactions. More precisely, our system development is composed of five stages, as illustrated in Figure 4. (categorical data), year (numerical data), day of the year (numerical data), the length (numerical data), weight (numerical data), count (numerical data), looped (numerical data), neighbors (numerical data), income (numerical data), Satoshi amount (numerical data), and the label feature (numerical data), Category String. For the class label, the dataset considers either the white transaction (normal) accumulating 2,875,284 of the total number of records, or the ransomware transaction (anomaly) accumulating 41,413 records distributed into three different ransomware families of 28 attacks. The statistical specifications of the dataset are provided in Table 1, considering all class labels provided in the dataset. Based on the table, the statistical distribution of records between families seems to be almost balanced, with 13,163 Montreal Family, 12,402 Padua Family, and 15,848 Princeton Family.

System Modeling
The classification process is an intelligent task that predicts the class label of a given data record by utilizing machine learning algorithms [24]. Machine learning algorithms used to build predictive modeling to approximate the mapping for the target output based on a number of features are called supervised machine learning models [25]. Bitcoin transactions identification is a typical example of classification tasks that requires the engagement of supervised machine learning algorithms to build a predictive model to uncover anomalies (i.e., ransomware payments) and classify the bitcoin payments over heterogeneous bitcoin networks. Like any financial system, bitcoin data is temporal in nature, as transactions are time-stamped. Thus, several labeled historical market data can be collected and afforded to train the bitcoin predictive model and test the accuracy of the machine learning models in the bitcoin transactions system.
In this paper, we consider bitcoin transaction recognition as a transaction classification problem to identify and classify the ransomware payments for heterogeneous bitcoin networks. Specifically, we are concerned in experimentally examining and designing a learning-based self-reliant classification scheme that learns and investigates the prescribed features of bitcoin payment transactions to recognize trustworthy and ransomware payments for better-quality cryptocurrency practices and transactions. More precisely, our system development is composed of five stages, as illustrated in Figure 4.   Data collection is a systematic process of gathering accurate observations in the forms of records from a diversity of adequate sources to develop data analytics models that can be used to address research questions, validate experimental outcomes, and develop prediction models as well as draw insights to help decision-makers. Managing and analyzing data have always offered the greatest benefits and the greatest challenges for organizations of all sizes and across all industries.
In this paper, the data records for the bitcoin transactions dataset have been originally assimilated and collected from the internet environment of several heterogeneous bitcoin payments networks before getting arranged into systematically structured datasets. Specif-ically, the bitcoin transaction dataset has been downloaded and parsed from the entire bitcoin transaction networks on a daily basis for 10 consecutive years (from 2009 January to 2018 December). As a result, ransomware addresses are taken from three widely adopted studies, namely: the Montreal family, Princeton family, and Padua family, in addition to the normal transactions. To gain more insight into the single bitcoin transaction, Figure 5 shows the bitcoin payment transaction life-cycle from the transaction request to the transaction response.

Data Preprocessing Stage
This stage concerns applying a consecutive set of transformation processes over the data records to bring them into a form that can be easily interpreted by the machine learning techniques [26]. In other words, the features of the data can now be easily interpreted by the algorithm. Specifically, Figure 6 illustrates the preprocessing stages performed at this stage. • Data Import: this operation is the first operation of the predictive model. It is responsible for reading the original dataset of the CSV (Comma-Separated Values) file format into MAT (MATLAB data units) as a matrix of the double data-type.
• Feature selection: this operation concerns selecting the most adequate attributes (i.e., variables or columns) and eliminating any inadequate attributes from the dataset of the classification task at hand.
• Class Labeling: this operation concerns transforming the categorical (text) data records of the class feature into numerical data records that can be fed and manipulated by machine learning techniques. In the ODT model, the integer encoding technique was used to encode the labels for the two-class model as (1) for the normal transaction and (2) for an anomaly transaction, as well as for the multiclass model as (1) for the normal transaction, (2) for Montreal ransomware, (3) for Princeton ransomware, and (4) for Padua

Data Preprocessing Stage
This stage concerns applying a consecutive set of transformation processes over the data records to bring them into a form that can be easily interpreted by the machine learning techniques [26]. In other words, the features of the data can now be easily interpreted by the algorithm. Specifically, Figure 6 illustrates the preprocessing stages performed at this stage.

Data Preprocessing Stage
This stage concerns applying a consecutive set of transformation processes over the data records to bring them into a form that can be easily interpreted by the machine learning techniques [26]. In other words, the features of the data can now be easily interpreted by the algorithm. Specifically, Figure 6 illustrates the preprocessing stages performed at this stage. • Data Import: this operation is the first operation of the predictive model. It is responsible for reading the original dataset of the CSV (Comma-Separated Values) file format into MAT (MATLAB data units) as a matrix of the double data-type.
• Feature selection: this operation concerns selecting the most adequate attributes (i.e., variables or columns) and eliminating any inadequate attributes from the dataset of the classification task at hand.
• Class Labeling: this operation concerns transforming the categorical (text) data records of the class feature into numerical data records that can be fed and manipulated by machine learning techniques. In the ODT model, the integer encoding technique was used to encode the labels for the two-class model as (1) for the normal transaction and (2) for an anomaly transaction, as well as for the multiclass model as (1) for the normal transaction, (2) for Montreal ransomware, (3) for Princeton ransomware, and (4) for Padua • Data Import: this operation is the first operation of the predictive model. It is responsible for reading the original dataset of the CSV (Comma-Separated Values) file format into MAT (MATLAB data units) as a matrix of the double data-type. • Feature selection: this operation concerns selecting the most adequate attributes (i.e., variables or columns) and eliminating any inadequate attributes from the dataset of the classification task at hand. • Class Labeling: this operation concerns transforming the categorical (text) data records of the class feature into numerical data records that can be fed and manipulated by machine learning techniques. In the ODT model, the integer encoding technique was used to encode the labels for the two-class model as (1) for the normal transaction and (2) for an anomaly transaction, as well as for the multiclass model as (1) for the normal transaction, (2) for Montreal ransomware, (3) for Princeton ransomware, and (4) for Padua ransomware. In the SNN model, one-hot encoding was used to encode the labels for the two-class model as (01) for a normal transaction and (10) for an anomaly transaction, as well as for the multiclass model as (0001) for normal transaction, (0010) for Montreal ransomware, (0100) for Princeton ransomware, and (1000) for Padua ransomware. • Data Normalization: this operation concerns normalizing all integer quantities of the dataset matrix into a range between 0 and 1 using min-max normalization [26]. Min-max normalization changes the values of numerical data in the dataset to be on a common scale without losing any information. • Records Shuffling: this operation concerns mixing up the dataset records while preserving the logical relationships between dataset features. The shuffling algorithm is performed randomly, and it helps in enhancing the classifier classification by avoiding any biasing toward specific data labels into the dataset [24]. • Dataset Distribution: this operation concerns randomly dividing dataset targets into three datasets as follows: Training dataset (70% of the original dataset) used for model learning (training), Validation dataset (5% of the original dataset) used to validate the model during the learning process, and Testing dataset (25% of the original dataset) used to test the model prediction and calculate prediction accuracy (for detection and classification).

Machine Learning Stage
The machine learning stage is the principal stage of this predictive model, where the whole learning process for the data pattern takes place. Two supervised machine learning mechanisms were utilized in this work to construct the classification models, namely, shallow neural networks (SNNs) and optimizable decision trees (ODT).
Shallow Neural Network (SNN) is a feedforward multilayer Perceptron (MLP) neural network that is widely used in pattern recognition and classification tasks in different artificial intelligence and machine learning applications [27]. In SNN, data records acquainted with the neural network pass through an individual hidden (processor) layer to process the pattern recognition task and provide the output proration for each class label at the output layer. Figure 7 illustrates the SNN model architecture developed in this classification task. In this figure, we show only the model for the multiclass classification. For the detection model, the only difference is to replace the output layer with two neurons instead of four neurons to provide the probabilistic values for the two classes (normal vs. anomaly). According to the figure, our SNN has 8 features that are provided by the dataset to be fed at the input layer of the SNN and then processed at 50 neurons of the hidden/processor layer in advance to compute the numerical probabilities for the 4 classes at the output layer represented by the four neurons corresponding to the four labels.
Decision trees are a well-known supervised machine learning technique that are widely used to perform classification tasks with high-performance classification accuracy. The bagging classifier algorithm (also known as Bootstrap Aggregation Decision trees) [28] is used mainly to decrease the discrepancy of a decision tree by generating a number of subgroups of data records from training datasets that are selected randomly with replacement. Then, every subgroup of data records is employed to train their decision trees. Consequently, the process will result in an ensemble of different models. Finally, the mean value of all the predictions from all decision trees is computed to produce one overall robust decision. Since the ODT model has a vast amount of tree splits, it is not feasible to be shown in a normal figure. Alternatively, we are illustrating the process flow in Figure 8 to show how the bagging decision tree classifier works in the classification tasks [29]. the detection model, the only difference is to replace the output layer with two neurons instead of four neurons to provide the probabilistic values for the two classes (normal vs. anomaly). According to the figure, our SNN has 8 features that are provided by the dataset to be fed at the input layer of the SNN and then processed at 50 neurons of the hidden/processor layer in advance to compute the numerical probabilities for the 4 classes at the output layer represented by the four neurons corresponding to the four labels. Decision trees are a well-known supervised machine learning technique that are widely used to perform classification tasks with high-performance classification accuracy. The bagging classifier algorithm (also known as Bootstrap Aggregation Decision trees) 10, x FOR PEER REVIEW 16 of 18 [28] is used mainly to decrease the discrepancy of a decision tree by generating a number of subgroups of data records from training datasets that are selected randomly with replacement. Then, every subgroup of data records is employed to train their decision trees. Consequently, the process will result in an ensemble of different models. Finally, the mean value of all the predictions from all decision trees is computed to produce one overall robust decision. Since the ODT model has a vast amount of tree splits, it is not feasible to be shown in a normal figure. Alternatively, we are illustrating the process flow in Figure  8 to show how the bagging decision tree classifier works in the classification tasks [29].

Detection and Classification Stages
This stage concerns the final output layer of our predictive model, where two modes of operations are used at the output layer of our predictive model, including: • Detection Mode: Produce the output using a two-class classifier as a normal transaction or anomaly transaction, using either an ensemble classifier for the ODT model or a Sigmoid classifier for the SNN model.

Detection and Classification Stages
This stage concerns the final output layer of our predictive model, where two modes of operations are used at the output layer of our predictive model, including: • Detection Mode: Produce the output using a two-class classifier as a normal transaction or anomaly transaction, using either an ensemble classifier for the ODT model or a Sigmoid classifier for the SNN model. • Classification Mode: Produce the output using a four-class classifier as a normal transaction, Montreal ransomware, Padua ransomware, or Princeton ransomware, using either an ensemble classifier for the ODT model or Softmax classifier for the SNN model.

Development and Validation Environment
To implement and evaluate the proposed Bitcoin attacks detection and classification models, the training and testing phases were performed on the Bitcoin Transactions 2020 dataset comprising the main ransomware attacks against heterogeneous bitcoin networks. In addition to the two aforementioned machine learning models (i.e., SNN and ODT), two classifier schemes were implemented; two classes (ransomware detection) or four classes (ransomware classification). The models mentioned above have been developed using MATLAB 2020b computing platform utilizing our high-performance commodity machine operating with multiprocessing CPU system with multicore NVIDIA GPUs system. Finally, to sum up, Table 2 provides a brief description of the experimental environment configurations and considerations.
Validation Strategy 5-Validation Checks and 5-Fold Cross Validation [34]. Data records Shuffling process is executed at each Epoch.

Results and Discussion
In this paper, we propose machine learning-based predictive models to automate the detection and classification for bitcoin payment transactions in heterogeneous bitcoin networks. The models have been trained and tested using a comprehensive, up-to-date, and large dataset comprising 29 different types of bitcoin payment transactions grouped into two categories (normal vs. anomaly) used for the detection model or four categories (normal, Montreal, Padua, Princeton) for the classification model. In order to analyze the effectiveness of the proposed predictive models, we have evaluated their performance using several evaluation metrics to provide more insights about the performance of proposed predictive models.
To begin, Figure 9 illustrates the performance analysis trajectories for Figure 9a, the SNN based detection system and Figure 9b, the SNN based classification system. The crossentropy loss/cost function has been investigated in both classifiers to track the status for misclassification error for the validation/testing phases. Cross-entropy loss/cost function functions are used to optimize the classification models during training, aiming to minimize the loss function. Normally, the lower the loss, the better the model. Consequently, the best validation performance for the detection system has been recorded at epoch 145 with a 0.0325 value of cross-entropy loss, while the best validation performance for the classification system has been recorded at epoch 97 with a 0.0225 value of cross-entropy loss. Therefore, both models performed as near-perfect models, since their cross-entropy loss values are approaching 0 (≤0.1). two categories (normal vs. anomaly) used for the detection model or four categories (normal, Montreal, Padua, Princeton) for the classification model. In order to analyze the effectiveness of the proposed predictive models, we have evaluated their performance using several evaluation metrics to provide more insights about the performance of proposed predictive models.
To begin, Figure 9 illustrates the performance analysis trajectories for Figure 9a, the SNN based detection system and Figure 9b, the SNN based classification system. The cross-entropy loss/cost function has been investigated in both classifiers to track the status for misclassification error for the validation/testing phases. Cross-entropy loss/cost function functions are used to optimize the classification models during training, aiming to minimize the loss function. Normally, the lower the loss, the better the model. Consequently, the best validation performance for the detection system has been recorded at epoch 145 with a 0.0325 value of cross-entropy loss, while the best validation performance for the classification system has been recorded at epoch 97 with a 0.0225 value of crossentropy loss. Therefore, both models performed as near-perfect models, since their crossentropy loss values are approaching 0 (≤0.1).
(a) (b) Figure 9. Performance analysis trajectories using cross-entropy cost for (a) SNN based detection system and (b) SNN based classification system.
In addition, Figure 10 illustrates the performance analysis trajectories: Figure 10a, the ODT-based detection system and Figure 10b, the ODT-based classification system. The mean square loss/cost function has been investigated in ODT classifiers to track the status for minimum classification error during the 30 iterations of the learning process by tracking: (a) The estimated minimum classification error, (b) The observed minimum classification error, (c) The best point hyperparameters, and (d) The minimum error hyperparameters. Consequently, the best validation performance for the detection system has been recorded at iteration 19 with a 0.0012 value of minimum classification error, while the best validation performance for the classification system has been recorded at iteration 29 with a 0.0055 value of minimum classification error. Therefore, both models performed as almost perfect models since their minimum classification error values are almost 0 (≤0.01).
Moreover, Figure 11 illustrates the performance analysis using confusion matrix records for (a) the ODT-based detection system and (b) the ODT-based classification system (since ODT performed better, we show only the confusion matrix for ODT models). The confusion matrix is a summarized table of the number of correct and incorrect predictions yielded by the classification model for binary/multi-classification tasks [35]. Specifically, the confusion matrix provides records for four performance indicator metrics for every predicted class label: namely, the true positive record, which counts the number of samples the model correctly predicts for the positive class; the true negative record, which counts the number of samples the model correctly predicts for the negative class; a false positive record, which counts the number of samples the model incorrectly predicts for the positive class when the actual class is negative; and the false negative record, which counts the number of samples the model incorrectly predicts for the negative class when the actual class is positive [35]. Since the good classification model will normally generate confusion matrix results with large values across the diagonal and small values off the diagonal, this shows that our predictive models are high performant models for both detection and classification, especially for the models build using optimizable decision trees. Besides, Figure 12 shows the organization we followed in our confusion matrices along with the formulas used to calculate the corresponding metrics (Precision, Recall, Accuracy). Please note that TP and FP correspond to the ransomware records (the minority class) while TN and FN correspond to the normal records (the majority class). In addition, Figure 10 illustrates the performance analysis trajectories: Figure 10a, the ODT-based detection system and Figure 10b, the ODT-based classification system. The mean square loss/cost function has been investigated in ODT classifiers to track the status for minimum classification error during the 30 iterations of the learning process by tracking: (a) The estimated minimum classification error, (b) The observed minimum classification error, (c) The best point hyperparameters, and (d) The minimum error hyperparameters. Consequently, the best validation performance for the detection system has been recorded at iteration 19 with a 0.0012 value of minimum classification error, while the best validation performance for the classification system has been recorded at iteration 29 with a 0.0055 value of minimum classification error. Therefore, both models performed as almost perfect models since their minimum classification error values are almost 0 (≤0.01). Moreover, Figure 11 illustrates the performance analysis using confusion matrix records for (a) the ODT-based detection system and (b) the ODT-based classification system (since ODT performed better, we show only the confusion matrix for ODT models). The confusion matrix is a summarized table of the number of correct and incorrect predictions yielded by the classification model for binary/multi-classification tasks [35]. Specifically, the confusion matrix provides records for four performance indicator metrics for every predicted class label: namely, the true positive record, which counts the number of samples the model correctly predicts for the positive class; the true negative record, which counts the number of samples the model correctly predicts for the negative class; a false positive record, which counts the number of samples the model incorrectly predicts for the positive class when the actual class is negative; and the false negative record, which counts the number of samples the model incorrectly predicts for the negative class when the actual class is positive [35]. Since the good classification model will normally generate confusion matrix results with large values across the diagonal and small values off the diagonal, this shows that our predictive models are high performant models for both detection and classification, especially for the models build using optimizable decision trees. Besides, Figure 12 shows the organization we followed in our confusion matrices along with the formulas used to calculate the corresponding metrics (Precision, Recall, Accuracy). Please note that TP and FP correspond to the ransomware records (the minority class) while TN and FN correspond to the normal records (the majority class).   Furthermore, based on the results obtained for the confusion matrix, Table 3 rizes the model evaluation metrics [35] for (a) the ODT based detection system ODT based classification system, in terms of:  Furthermore, based on the results obtained for the confusion matrix, Table 3 summarizes the model evaluation metrics [35] for (a) the ODT based detection system and (b) ODT based classification system, in terms of:  The obtained results exhibit the extraordinary performance indicators of our predictive models with greater figures obtained for the predictive models based ODT of both detection (two-class) and classification (multiclass) systems scoring accuracy values of 99.9% and 99.4%, respectively, compared to the accuracy of the detection and classification systems based SNN, which scored 98.6% and 98.4%, respectively.
Finally, to realize advanced observations of the advantage of the proposed ODT based predictive model for bitcoin payment transactions, we contrast the classification accuracy of our detection/classification models employing ODT with several other existing up-to-date machine learning-based bitcoin payment transactions detection/classification models engaging different machine learning methods to detect and/or classify the ransomware transactions in the bitcoin payments applications. The comparison is provided in Table 4 below. Since classification accuracy is the vital performance evaluator to indicate the robustness of machine learning-based models, we have centered our comparison with the model's classification accuracy values that are reported in the literature. According to the comparison table, the proposed model is competent and superior to provide detection and classification for the ransomware payment transactions in heterogeneous bitcoin networks.

Conclusions and Future Directions
A self-reliant and intelligent ransomware detection and predictive classification system for bitcoin transactions in heterogeneous bitcoin networks has been developed, investigated, and evaluated in this paper. The proposed system employs two supervised machine learning methods to recognize data patterns in bitcoin payment transactions, namely, shallow neural networks (SNN) and optimizable decision trees (ODT). The proposed predictive models have been evaluated using a recent, up-to-date and comprehensive bitcoin transactions dataset (BitcoonHeist2020) via several performance evaluation metrics such as classification accuracy, precision, and recall. Consequently, the validation testing of model experimentation recorded 98.6% and 99.9% for the bitcoin transaction detection accuracy (two-class classifier) as well as 98.4% and 99.4% bitcoin transaction classification accuracy (multiclass classifier), using SNN and ODT, respectively. Our best-achieved accuracy results have transcended the accuracy results for several existing bitcoin transactions predictive models. In the future, we will consider carrying out other important analysis of the model quality measures, such as the area under Precision/Recall curve (Average Precision), and precision/recall curve as a function of the threshold in our future ML development, especially when we deal with an imbalanced dataset. Moreover, we will consider analyzing the impact of hyper-parameters tuning on the predictive model results. Funding: This research received no external funding.

Data Availability Statement:
The BitcoinHeist 2020 dataset employed in this research can be retrieved from from UCI Machine Learning Repository at: https://archive.ics.uci.edu/ml/datasets/ BitcoinHeistRansomwareAddressDataset.