Efficient Deep Learning-Based Cyber-Attack Detection for Internet of Medical Things Devices †

Abstract: The usage of IoT in the medical field, often referred to as IoMT, plays a vital role in facilitating the exchange of sensitive data among medical devices. This capability significantly contributes to enhancing the quality of patient care. However, it comes with privacy issues that compromise the security of the data collected by medical sensors, making them vulnerable to cyber threats such as data modification and replay attacks. These attacks can lead to significant data loss or unauthorized alterations. Machine learning, particularly in cyber-attack detection systems, is crucial for identifying and classifying such attacks. Yet, the main challenge lies in adapting to the dynamic and unpredictable nature of malicious attacks and creating scalable solutions to combat them. The objective of this paper is to detect cybersecurity threats, with a particular focus on man-in-the-middle attacks that occur within the IoMT communication network. The study utilizes principal component analysis (PCA) for feature reduction and employs a multi-layer perceptron to classify unforeseen cyber-attacks on IoT-based healthcare devices. The study evaluates the effectiveness of the proposed strategy using real-time data from the St. Louis Enhanced Healthcare Monitoring System (WUSTL-EHMS). The findings indicate that the multi-layer perceptron outperforms the other tested classifiers, achieving an accuracy score of 96.39%, while also reducing the time complexity.


Introduction
The usage of Internet of Medical Things (IoMT) devices has been rapidly increasing in the health industry, from patient monitoring to creating and safeguarding digital health records. Medical equipment such as implanted medical devices is one example of IoMT. These devices are used both for monitoring and for actuating treatment. For example, in the case of pacemakers, the data collected from the sensors are sensitive and need to be sent to a server via the Internet. These data are then accessed by other parties in the hospital, including the physician and the caregiver. However, these devices are vulnerable to data spoofing attacks and man-in-the-middle attacks.
In order to provide an efficient solution, deep learning algorithms can be used for the detection of intrusions and anomalies over the communication network in IoT-based healthcare devices [1]. These algorithms can be used to observe real-time network traffic and detect any deviations from normal behavior that may indicate a threat to security [2]. Machine learning is capable of detecting various cyber-attacks such as man-in-the-middle (MITM) attacks, data injection, and spoofing by analyzing network traffic for abnormal patterns. Man-in-the-middle attacks occur when an unauthorized party intercepts the communication between a user and a server, allowing them to capture messages and inject false data. This can result in inaccurate information being transmitted and poses a risk to the security of the system. Data spoofing in an IoMT system, by contrast, occurs when the attacker compromises the network by manipulating routing information. This can lead to security breaches and compromise the integrity of the entire system [3].
Various techniques have been proposed to develop an effective attack detection model using machine learning. The usage of machine learning algorithms enables these systems to adapt to new threats and enhance the security of IoMT systems while preserving patient privacy [3]. The principal aim of this paper is to address particular cybersecurity threats, specifically those related to man-in-the-middle attacks. These attacks encompass spoofing and data injection, which are increasingly prevalent today in IoT-based healthcare devices. The approach presented demonstrates better performance than the approach outlined in reference [4]. The designed model comprises the following key stages, spanning pre-processing, dimensionality reduction, and detection:

1. The first step, label encoding, involves converting categorical data into numerical values. By converting the required information into a numerical value, it can be more easily utilized in subsequent steps of the model;
2. Following this, the data must be normalized using a standard scaler method to ensure that it is on a consistent scale. As IoMT data often contain multiple dimensions, PCA is used for dimensionality reduction and further helps to increase the performance. By reducing the dimensions, the computational complexity is also reduced, making the model more efficient;
3. To analyze the variability in the training sets, a K-fold cross-validation technique is used. Using this technique, the accuracy and generalizability of the model are also improved;
4. In the final stage, a multi-layer perceptron is used for the detection process. The algorithm is trained on the pre-processed data;
5. The proposed strategy is then evaluated against the existing algorithms to demonstrate its effectiveness through a comparative analysis.
Finally, the objective of our study is met by combining PCA with deep learning, which is effective in detecting potential security threats.
The remainder of this paper is organized as follows. An overview of "Related Works" on the topic is given in Section 2, while Section 3 outlines the "Materials and Methods", including the dataset used, the pre-processing techniques, and the deep learning algorithm employed. The "Results and Discussions" of the research are presented and analyzed in Section 4.

Related Works
Using machine learning techniques for detecting intrusions involves leveraging automated algorithms to identify potential security breaches in a network or system. A variety of approaches have been suggested for constructing health monitoring systems, and the following examples illustrate this trend. The authors in [4] gathered real-time data from the Enhanced Healthcare Monitoring System (WUSTL-EHMS) dataset and analyzed it with several machine learning models, including Support Vector Machine (SVM), Random Forest (RF), and k Nearest Neighbor (kNN). They achieved an accuracy rate of 90.04% using Artificial Neural Networks (ANN). In [5], the authors describe a network intrusion detection model that employs a tree classifier. The system is designed to decrease the dimensionality of input data in order to accelerate the detection of anomalies, with a reported accuracy of 94.23%.
An attacker in IoMT may remotely alter the settings of a device, posing a risk to patients' lives. To mitigate such risks, a specification-based misbehavior detection system, known as SMDAps, was proposed in [6]. This system monitors events in the artificial pancreas system (APS) to identify misbehaving components based on behavior rules. The performance of SMDAps, as well as kNN and SVM, was evaluated and found to achieve AUROC values of 99.98%, 99.96%, and 99.95%, respectively. The authors of [7] presented an IDS named HEKA that monitors traffic on personal medical devices to identify attacks. HEKA is capable of detecting various attacks on personal medical devices with an F1-score and accuracy of 98% and 98.4%, respectively, using the SVM classifier. According to [8], the KDDCup-'99' dataset was used to develop an intrusion detection system that employed principal component analysis (PCA) for feature reduction and ensemble-based classifiers for predicting intrusion attacks on networks. The system achieved an accuracy of 93.2% using the bagged decision trees of the bagging algorithm.
The authors of [9] proposed a hierarchical federated learning (HFL) approach named the Dew-Cloud-based model, which features a hierarchical long-term memory (HLSTM) model. The HLSTM model can have a backend supported by cloud computing and can be deployed on distributed Dew servers. The proposed model utilized the NSL-KDD dataset and achieved a training accuracy of 99.31% with a training loss of 0.034. In [10], a fog-cloud architecture-based cyberattack detection framework with ensemble learning for IoMT networks is proposed. The framework uses Random Forest, Naive Bayes, and Decision Tree as first-level individual learners, while XGBoost is used for identifying normal and abnormal instances. The proposed model utilizes the ToN-IoT dataset and XGBoost to achieve a 99.98% detection rate and 96.35% accuracy. It also reduces the false alarm rate by up to 5.59%. The authors of [11] proposed an Empirical Intelligent Agent (EIA) in which a unique Swarm-Neural Network (Swarm-NN) method is used for identifying intruders in the edge-centric IoMT framework. The proposed Swarm-NN strategy achieved an accuracy of 99.5% on the ToN-IoT dataset. The authors in [12] proposed a deep neural network (DNN) for intrusion detection systems (IDS) in the IoMT environment. To enhance its effectiveness and efficiency, the network parameters were pre-processed and optimized using a hybrid approach of Principal Component Analysis (PCA) and Grey Wolf optimizer (GWO) on the NSL-KDD dataset. Overall, 100% accuracy was achieved by the proposed model, which outperformed the existing machine learning approaches.
Based on the analysis of related work summarized in Table 1, and to address the issue of reducing the computational time, a comprehensive pre-processing methodology is performed to shrink the dimension of the dataset. The proposed system, shown in Figure 1, attempts to fulfill these requirements and contains the following:
1. The dataset has been obtained from the St. Louis Enhanced Healthcare Monitoring System (WUSTL-EHMS);
2. The given attributes are encoded using label encoding to transform all the categorical values into numerical data;
3. Normalization is then performed, where the data values range between 0 and 1;
4. K-Fold: fivefold cross-validation is applied to the dataset used for training to show how performance varies among the folds;
5. Furthermore, to reduce the dimensionality of the dataset, the PCA algorithm is applied to the intrusion data;
6. Finally, different machine learning models are trained on the resulting dataset to identify the best and most effective one to proceed with.

Data
The WUSTL-EHMS-2020 dataset was gathered from a testbed called the Enhanced Healthcare Monitoring System (EHMS), which provides real-time monitoring of patients' biometrics and network flow metrics [4]. The dataset contains 16,318 data samples in total, of which 14,272 are classified as normal and 2046 are attack instances. The dataset encompasses 44 distinct features categorized into three groups: 35 features pertain to network flow metrics, 8 features capture patients' biometric data, and 1 feature serves as the label. The labeling strategy is based on the source MAC address: samples linked to the attacker's laptop MAC addresses are labeled as 1, while those without such addresses are labeled as 0. This dataset focuses mainly on man-in-the-middle attacks, particularly spoofing and data injection, which have been observed in an IoMT environment. This aligns well with the objective of the study. However, it does not include other conceivable attacks such as DoS, probe, and R2L, which are also prevalent in IoT-based healthcare devices.

Label Encoding
The original dataset contains categorical data that cannot be used directly for analysis; hence, label encoding is used to convert categorical values to numerical values [13].
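As a minimal sketch of the idea, label encoding can be written in a few lines of plain Python. The helper `label_encode` and the example protocol names are hypothetical and only illustrate the mapping; they are not taken from the dataset or from the authors' code.

```python
def label_encode(values):
    """Map each distinct category to an integer code, using sorted category order."""
    classes = sorted(set(values))
    mapping = {cls: idx for idx, cls in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# Hypothetical categorical column (the protocol names are illustrative only).
protocols = ["tcp", "udp", "tcp", "icmp"]
codes, mapping = label_encode(protocols)
print(codes)    # integer code for each entry
print(mapping)  # category-to-code lookup table
```

Assigning codes in sorted order keeps the encoding deterministic, so the same category always receives the same integer across runs.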

Normalization
The StandardScaler from the scikit-learn library in Python is used to rescale the data so that each feature has a mean of 0 and a standard deviation of 1 [14].
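The transformation behind standard scaling is z = (x − μ) / σ, applied to each feature separately. The NumPy sketch below reproduces that computation on a small made-up matrix; the function name `standardize` and the sample values are assumptions for illustration.

```python
import numpy as np


def standardize(X):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return (X - mean) / std


# Made-up feature matrix: 3 samples, 2 features on very different scales.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = standardize(X)
print(Z.mean(axis=0), Z.std(axis=0))
```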

Dimensionality Reduction and Feature Selection
PCA is a widely used method for decreasing the number of variables in a dataset without losing significant information [15,16]. The proposed approach employs PCA to reduce the dimensionality of the dataset, which originally contained 35 features of network flow metrics, to 14 essential features while retaining all 16,318 instances, as seen in Figure 2.
The key steps involved in PCA are as follows:
1. The first step in PCA involves taking a complete dataset with d-dimensional samples and no class labels. This dataset is represented as a matrix of size p × q and becomes an N-dimensional vector after conversion, with input data represented as $Y_0$, $Y_{0,1}$, and so on;
2. Calculate the mean vector of N dimensions;
3. Find the covariance matrix;
4. Analyze and determine the eigenvectors and eigenvalues of the covariance matrix;
5. Selection of feature vector formation and components;
6. After calculating the eigenvalues and eigenvectors, the eigenvectors are sorted by their matching eigenvalues in descending order. A subset of k eigenvectors is chosen from the sorted eigenvectors to create a matrix W with dimensions of d × k;
7. Principal component formation: The matrix W, which consists of eigenvectors with a size of d × k, is used to perform a transformation on the samples.
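The steps above can be sketched in NumPy as follows. The formulas for the mean and covariance did not survive in this text, so the sketch assumes the standard definitions: the sample mean $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ and the sample covariance $\Sigma = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar{Y})(Y_i-\bar{Y})^{T}$, where n is the number of samples. The function name `pca_transform` and the random input are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def pca_transform(X, k):
    """Project the samples in X (n x d) onto the top-k principal components."""
    mean = X.mean(axis=0)                    # mean vector (step 2)
    centered = X - mean
    cov = np.cov(centered, rowvar=False)     # d x d covariance matrix (step 3)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigen-decomposition (step 4)
    order = np.argsort(eigvals)[::-1]        # sort by descending eigenvalue (step 6)
    W = eigvecs[:, order[:k]]                # d x k matrix of leading eigenvectors
    return centered @ W, W, eigvals[order]   # transformed samples (step 7)


# Random stand-in data with the 35 network-flow dimensions mentioned in the text.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 35))
Z, W, eigvals = pca_transform(X, k=14)
print(Z.shape)  # (200, 14)
```

`np.linalg.eigh` is used because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.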

Cross Validation
The training dataset was subjected to a K-Fold cross-validation technique with 5 folds in order to determine the range of performances across different folds.
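A minimal sketch of how a fivefold split can be generated is shown below. Shuffling before splitting and the fixed seed are assumptions; the paper does not state whether the folds were shuffled or stratified, and the helper `k_fold_indices` is hypothetical.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold is the validation set exactly once."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


# 16,318 is the number of samples reported for the dataset.
splits = list(k_fold_indices(n_samples=16318, k=5))
for fold, (train_idx, val_idx) in enumerate(splits):
    print(fold, len(train_idx), len(val_idx))
```

Training and scoring the model once per split gives five performance values, whose spread indicates how sensitive the model is to the choice of training data.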

Multi-Layer Perceptron
Multi-Layer Perceptron (MLP) is a deep learning algorithm that comprises interconnected layers of neurons, enabling it to learn complex patterns and solve various machine learning tasks effectively [17]. To improve results, the study makes use of a fully connected feed-forward network that consists of an input layer, four hidden layers, and an output layer. The instances from the dataset are delivered to the input layer. The neurons then propagate the instances from the input layer through the hidden layers, applying weights and biases at each layer. ReLU and Sigmoid functions are used to activate the hidden layers and the output layer, respectively.
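The forward pass of such a network can be sketched in NumPy as below. The input width of 14 matches the number of features kept after PCA and the hidden width of 50 is the value reported in the results section; the weight initialization, the random input, and the function names are assumptions made for illustration, and no training is performed here.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(layer_sizes, seed=0):
    """Create one (weights, bias) pair per layer; the initialization scheme is an assumption."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def forward(X, params):
    """ReLU on every hidden layer, Sigmoid on the single output unit."""
    a = X
    for W, b in params[:-1]:
        a = relu(a @ W + b)
    W, b = params[-1]
    return sigmoid(a @ W + b).ravel()


layer_sizes = [14, 50, 50, 50, 50, 1]  # input, four hidden layers, output
params = init_params(layer_sizes)
X = np.random.default_rng(1).normal(size=(8, 14))
probs = forward(X, params)
print(probs.shape)  # one attack probability per sample
```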

Classification Models
The effect of the suggested dimensionality reduction is then compared across a number of well-known classifiers, including Multi-Layer Perceptron (MLP), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Naive Bayes (NB) [18,19]. Metrics such as accuracy, specificity, and sensitivity are employed to assess the effectiveness of the various classifiers.
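For a binary attack/normal label, the three metrics follow directly from the confusion-matrix counts: accuracy = (TP + TN) / total, sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The sketch below computes them in plain Python; the function name and the toy labels are illustrative assumptions, with 1 denoting an attack as in the dataset's labeling.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for 0/1 labels (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy labels for illustration only.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because the dataset is imbalanced (far more normal samples than attacks), sensitivity and specificity are more informative than accuracy alone.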

K-Nearest Neighbor (KNN)
The classification method used here makes decisions based on a majority vote from the nearest neighbors of a test instance.It assigns the sample to the class that is most common among these k-nearest neighbors.As for the hyperparameters, a value of 10 neighbors was found to be the most effective within a range of 1 to 20.
The findings in Figure 3 show that the accuracy, sensitivity, and specificity of the KNN classifier are 92.3%, 91%, and 96.7%, respectively, without dimensionality reduction. The findings also demonstrate an improvement in performance, with 94.39%, 92.5%, and 96.9%, when classification is performed using the same KNN classifier after the dimension is reduced by PCA [20]. Although KNN is beneficial in identifying anomalies, it is computationally intensive when handling large volumes of data.
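The majority-vote rule can be sketched in NumPy as follows, using k = 10 as reported above. Euclidean distance, the tie-breaking rule, and the synthetic two-cluster data are assumptions for illustration; the paper does not specify the distance metric.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=10):
    """Classify each test sample by majority vote among its k nearest training samples."""
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]  # 0/1 labels of the k neighbors, shape (n_test, k)
    # Predict 1 only on a strict majority; ties fall back to class 0.
    return (votes.sum(axis=1) * 2 > k).astype(int)


# Two well-separated synthetic clusters standing in for normal (0) and attack (1) traffic.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
])
y_train = np.array([0] * 50 + [1] * 50)
X_test = np.array([[0.1, -0.2], [5.2, 4.9]])
pred = knn_predict(X_train, y_train, X_test, k=10)
print(pred)
```

The distance matrix grows with the product of the test and training set sizes, which illustrates why KNN becomes expensive on large volumes of data.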

Gaussian Naïve Bayes (GNB)
The classification method employed here is the Gaussian Naïve Bayes algorithm. In GNB, the hyperparameter 'priors' (the prior probability of each class) is not fixed in advance but is estimated from the training data. These prior probabilities represent the likelihood of various types of network intrusions or attacks occurring before any specific network traffic data are analyzed. According to the results in Figure 4, the NB classifier's accuracy, sensitivity, and specificity are 80.30%, 73.5%, and 96.1%. The findings demonstrate an improvement in performance, with 86.30%, 78.8%, and 98.5%, when the dimension is reduced by PCA and the same NB classifier is used for classification.
Although GNB is computationally efficient and requires only minimal resources, it might not perform well if the data deviate significantly from its underlying assumption.

Support Vector Machine (SVM)
The method used here is an SVM with a linear kernel function. This method achieves classification by creating a hyperplane to distinguish the attack instances.
The findings in Figure 5 show that the accuracy, sensitivity, and specificity of the SVM classifier are 93.9%, 81.2%, and 96.6%, respectively, without dimensionality reduction. The findings demonstrate an improvement in performance, with 94.45%, 86%, and 98%, when the dimension is reduced by PCA and the same SVM classifier is used for classification.
Choosing SVM depends on data size and linearity; however, the challenge here is to carefully select the kernel function in the case of larger volumes of data.

Multi-Layer Perceptron (MLP)
The experimental MLP neural network consists of an input layer, four hidden layers of size (50, 50, 50, 50), and an output layer. It uses ReLU activation for the hidden layers and Sigmoid activation for the output layer. The learning rate is determined through GridSearchCV, while the Adam optimizer is used for optimization. Results are obtained with a batch size of 64 over 20 epochs. The performance results of the MLP classifier are depicted in Figure 6, showing accuracy, sensitivity, and specificity of 94.7%, 93.3%, and 98.3%, respectively, when utilized without any dimensionality reduction step. The findings demonstrate an improvement in performance, with 96.39%, 95.4%, and 100% for accuracy, sensitivity, and specificity, respectively, using the same classifier with PCA.
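One possible way to express this configuration with scikit-learn is sketched below. It is not the authors' code: the synthetic data from `make_classification`, the candidate learning rates in `param_grid`, and the fixed random seeds are assumptions. For binary problems, `MLPClassifier` applies a logistic (sigmoid) output unit, and with the Adam solver `max_iter` counts epochs.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Twenty epochs may stop before the solver's tolerance is met, so silence that warning.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic stand-in for the 14 PCA-reduced features (not the WUSTL-EHMS data).
X, y = make_classification(n_samples=400, n_features=14, n_informative=8, random_state=0)

mlp = MLPClassifier(
    hidden_layer_sizes=(50, 50, 50, 50),  # four hidden layers of 50 units
    activation="relu",                    # hidden-layer activation
    solver="adam",
    batch_size=64,
    max_iter=20,                          # epochs for the Adam solver
    random_state=0,
)

# Candidate learning rates are assumed; the paper does not list the searched values.
param_grid = {"learning_rate_init": [0.001, 0.01]}
search = GridSearchCV(mlp, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

preds = search.predict(X)
print(search.best_params_)
```

Using `cv=5` inside the grid search ties the learning-rate selection to the same fivefold scheme described earlier.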
The study shows that using Principal Component Analysis (PCA) to select features for categorizing network attacks is effective in obtaining better performance. The results also highlight that MLP (Multi-Layer Perceptron) is the best performer and is particularly effective in detecting attacks in the IoMT environment when compared to other machine learning techniques.
The training time for the proposed deep learning-based model has been shortened because the sample space size has been reduced in dimension when compared to traditional methods. This reduction in training time is depicted in Figure 7.

Conclusions
This study uses the network data of the Enhanced Healthcare Monitoring System (WUSTL-EHMS) from Washington University in St. Louis to present a PCA-based deep learning model for identifying attacks by an adversary. The suggested approach is intended to function in an IoMT network to which smart devices connect. The proposed strategy can effectively manage massive amounts of produced data. Principal Component Analysis (PCA) was employed in order to minimize the dataset attributes and pick out only the most important elements. After reducing the dataset using PCA, a number of classification methods, namely Support Vector Machine, K-Nearest Neighbor, Naïve Bayes, and Multi-Layer Perceptron, are used. Among them, the deep learning-based model exhibits the best performance, achieving an accuracy of 96.39%. Overall, our findings suggest that the performance of intrusion detection systems is greatly improved when feature selection methods like PCA are applied. This method is considered suitable for the IoMT environment as it can alert healthcare officials more quickly when an intrusion occurs, leading to improved efficiency in the healthcare industry. This approach helps healthcare providers to deliver better services to patients with less concern about network intrusions.

Eng. Proc. 2023, 59, 139

Figure 1. Framework of the proposed model.

Figure 2. Reduction in features after PCA.

Figure 5. Performance evaluation of the SVM classifier.

Table 1. Comparative study of the previous models.