Security of Things Intrusion Detection System for Smart Healthcare

Abstract: Web security plays a crucial role in the Security of Things (SoT) paradigm for smart healthcare and will continue to be impactful in medical infrastructures in the near future. This paper addressed a key component of security, the intrusion detection system, motivated by the number of web security attacks in healthcare, which have increased dramatically in recent years, as well as by privacy issues. Various intrusion detection systems have been proposed in different works to detect cyber threats in smart healthcare and to identify network-based attacks and privacy violations. This study was carried out as a result of the limitations of existing intrusion detection systems in responding to attacks and in implementing privacy controls in the smart healthcare industry. The research proposed a machine learning support system that combined a Random Forest (RF) and a genetic algorithm: a feature optimization method that built a new intrusion detection system with a high detection rate and a low false alarm rate. To optimize our approach, a weighted genetic algorithm and RF were combined to generate the best subset of features that achieved a high detection rate and a low false alarm rate. This study used the NSL-KDD dataset to evaluate the RF, Naive Bayes (NB) and logistic regression classifiers. The results confirmed the importance of feature optimization, which gave better results in terms of the false alarm rate, precision, detection rate, recall and F1 metrics. The combination of our genetic algorithm and RF models achieved a detection rate of 98.81% and a false alarm rate of 0.8%. This research raised awareness of privacy and authentication in the smart healthcare domain, wireless communications and privacy control and developed the necessary intelligent and efficient web system.
Furthermore, the proposed algorithm was applied to examine the F1-score and precision performance on the NSL-KDD and CSE-CIC-IDS2018 datasets using different scaling factors. The results showed that the proposed GA achieved considerable optimization: the average precision improved by 5.65% and the average F1-score by 8.2%.


Introduction
The origin of smart healthcare can be traced back to the idea of the smart Earth proposed by the International Business Machines Corporation (IBM) in 2009, which referred to the unified utilization of artificial intelligence, big data, cloud computing and, mostly, the Internet of Things (IoT) to develop an interactive framework for distributing medical-related data and allowing communication among medical equipment (such as smart devices, applications and local networks), medical staff/institutions and patients (https://www.ibm.com/midmarket/us/en/article_Industries_1209.html (accessed on 12 January 2021)). Some of the most promising technologies that enable the implementation of smart healthcare systems include Wireless Sensor Networks (WSNs) and Ultra-High-Frequency (UHF) Radio Frequency Identification (RFID). However, since the inception of the smart healthcare concept, securing smart devices and confidential hospital/patient data has been one of the major challenges facing the industry. On the other hand, Machine Learning (ML), which is closely associated with computational statistics and is one major aspect of the Security of Things (SoT), has been applied to several cybersecurity problems in the analysis of hybrid networks, comprising both anomaly detection and the detection of data mismanagement [1]. The ML technique is apparently becoming the most promising approach to deal with security glitches and several hidden (otherwise known as zero-day) attacks in healthcare systems [2]. The technique can recognise attacks merely by monitoring modifications of the data or simply by detecting alterations in the features of the network's traffic [3]. Even though ML may not be appropriate for issues that necessitate a prescribed descriptive solution, the technique can achieve robust outcomes in problems that are too ambiguous for human formalization.
In a smart healthcare system (just as in other smart systems), most Internet-based security frameworks work by compiling a list of malicious features in order to block them. Nevertheless, intruders repeatedly use their ingenuity to refine and alter their procedures, which makes it extremely difficult to ensure that their malicious features are captured in the security blacklist [4]. One small change in the security protocol can permit an intruder to inject unwanted packets or gain access to confidential information unnoticed. Frameworks that enumerate all hypothetically destructive packets and incessantly update the ruleset are unrealistic and enormously capital intensive [5]. Therefore, at the moment, the ML technique can play a substantial part in learning what good packets look like, producing a model of them such that data packets that do not match it are treated as glitches, which are probably attacks or intrusions [6]. Thus, ML is thriving in the fields of data clustering and classification, which are both key in data security applications. A survey of the latest intrusion detection systems used in IoT models showed that improving ML's efficiency and reliability is an important task [7].
The Intrusion Detection System (IDS) is one of the most popular SoT paradigms for detecting anomalies or intrusions in live network traffic [8]. Several scholars have highlighted different issues relating to the Network Intrusion Detection System (NIDS) in the past due to its increasing importance in the current era of intelligent cyber-attacks. Recently, various Machine Learning (ML) and feature-selection methods have been widely implemented to develop the NIDS. One of the first attempts to evaluate the detection rate and false alarm rate was performed on the 1998 DARPA dataset in the study of [9]. The algorithm utilized Principal Component Analysis (PCA) to select 22 features and neural networks for classification. Although PCA provides an optimal feature set, it compromises the training efficiency [10]. Some experiments employed multiple techniques for feature selection; Hee-su et al. [11] utilized four feature-selection techniques. The placement of the IDS is also a significant challenge: many organizations implement or purchase an IDS, but they do not know where it should be placed so that it can detect anomalies with precision. According to Abhijit Sarmah [12], the IDS's success depends on how it is deployed. Furthermore, great effort is required to design and implement a perfect IDS and to build a new state-of-the-art network-based IDS dataset. Another issue is the lack of efficient detection algorithms that have a very high Detection Rate (DR) and a low False Alarm Rate (FAR).
The primary motivation of this paper was centred on designing a feature optimization method and developing a new IDS with a high DR and a low FAR.
The main objective of this research stemmed from the observation that, when we inspect a network packet in network traffic to determine whether it is normal or an anomaly, it takes a huge amount of time to examine a significant number of attributes. Therefore, through optimized attribute selection, we identified and proposed only those features that play an essential role in an intrusion detection system and ignored all other irrelevant features. This has the practical implication of using fewer system resources with an increased DR and a reduced FAR.
The contributions of this study can be summarized as follows:
1. In this research, after preprocessing the NSL-KDD and CSE-CIC-IDS2018 datasets, all the features of both datasets were passed into the genetic algorithm fitness function. The Random Forest (RF) entropy function was utilized in the genetic algorithm fitness function to calculate the fitness values of the features. After the selection of optimal features, these features were again passed to an RF classifier to predict the network attacks. The newly proposed approach had the ability to use RF inside the genetic algorithm. Our main goal was to find the optimal feature subset using the genetic algorithm, and for this, we proposed a new fitness function based on RF;
2. This study also proposed the following weights for the genetic algorithm to obtain the optimal features from the NSL-KDD and CSE-CIC-IDS2018 datasets used in wireless communications systems. The parameters we updated were: SPX-crossover with a crossover probability of 0.7 and random-init, implemented as the initialization operator for the genetic algorithm in this study. Moreover, bit-flip was employed as the mutation operator with a mutation probability of 0.5 and a population size of 200. Generational replacement was used as the replacement operator; the report frequency was set at 200; and tournament selection was employed as the selection operator;
3. The experimental results upheld the importance of feature optimization, which yielded better results considering the precision, recall, F1-measure, DR and FAR;
4. The study of J. Ren et al. [13] achieved a 92.8% accuracy and a 33% false alarm rate with the DO-IDS method. Their findings also indicated an 86.5% accuracy and a 12.4% false alarm rate using the traditional genetic algorithm and RF classifier. However, our proposed model, a combination of the genetic algorithm and RF (GA-RF), outperformed these results with a 98.81% DR and a 0.8% FAR;
5. The results showed that the proposed GA achieved considerable optimization: the average precision improved by 5.65%, and the average F1-score improved by 8.2%.

Research Organization
The remaining part of this research is structured as follows: In Section 2, several related literature works are collected, reviewed and analysed in comparison to the current study. The proposed models coupled with the selected datasets are analysed and established in Section 3. Several numerical experiments to prove the performance of the proposed model are performed and discussed in Section 4. The results of the performances are also extensively analysed in this section. Finally, conclusions and findings are outlined in Section 5.

Literature Review
Several scholars highlighted the NIDS's issues in the past due to its increased importance in the current era of intelligent cyber-attacks. A mobile agent-based IDS was designed to secure the network of interconnected medical gadgets [14]. Numerous solutions regarding privacy, security, authorization and authentication, making use of blockchain technology for secure data sharing in smart healthcare, have been discussed [15]. Smart health solutions have great market potential considering that the expenditure made on them by the EU makes up about 10% of GDP [16]. In the past, various ML methods such as nearest neighbour, RF, logistic regression, Support Vector Machines (SVMs) and k-means were widely used to develop the NIDS. Manjula C. Belavagi and Balachandra Muniyal [17] used a supervised learning approach to measure the accuracy of different models on the NSL-KDD dataset.
They used SVM, RF, logistic regression and Gaussian Naive Bayes (NB). An accuracy of 84% was achieved using logistic regression, an accuracy of 79% using Gaussian NB and accuracies of 75% and 99% by the SVM and RF classifiers, respectively.
Almeida [18] compiled and made available the NSL-KDD dataset to be used for future research in the intrusion detection domain. Chen [19] combined deep learning and the IDS model and proposed a recurrent neural network model for the NIDS. Eduardo DelaHoz et al. [20] detected anomalies using statistical and self-organizing-map methods. Optimal features were selected using PCA and Fisher's discriminant ratio method; after removing noise and using probabilistic self-organizing maps, traffic was classified as normal or anomalous. S. K. Wagh [21] proposed various machine learning techniques for the NIDS. C. Qiu [22] improved the NIDS's performance by using various feature extraction techniques. Taher et al. [23] used two feature-selection techniques: one chi-squared based and the other correlation based. Seventeen features were selected using the correlation technique, and thirty-five features were selected using the chi-squared method. After optimal feature selection, SVM and ANN were applied to the selected optimal feature subset for classification. The overall accuracy with SVM was 82.34%, and with Artificial Neural Networks (ANNs), they achieved a 94.02% classification accuracy. The detection accuracies with SVM and ANN using the correlation method were 81.78% and 94.02%, respectively. With chi-squared, using SVM and ANN, they achieved accuracies of 82.34% and 83.68%, respectively.
Mukherjee et al. [24] used the Feature Vitality-Based Reduction Method (FVBRM) for feature selection from the NSL-KDD dataset, and twenty-four optimal features were selected out of 41. The NB classifier was used for classification with an accuracy of 97.78%. However, they did not report extensive results for the precision, recall, F1-score, DR and FAR. Although NB produces good results on a structured dataset, it makes a strong feature-independence assumption: the NB classifier assumes that the attributes in the dataset are not correlated with each other. The feature removal technique was used by Yinhui and Jingbo [25] for optimal feature selection. Different feature subsets were selected, and the experiment was performed using the KDD99 dataset. For classification, they used SVM and achieved a 98.62% classification accuracy. They did not report the confusion matrix or other results in the form of the DR and FAR. They also failed to mention the computational cost of the proposed method.
The Cuttlefish Optimization Algorithm (CFA) was used to select an optimal feature subset by Adel Sabry Eesa [26]. The KDD99 dataset was used to perform all the experiments. The detection rate and false positive rate were used as the evaluation measures, and they achieved a DR of 91% and an FPR of 3.91% with five features. Kumar et al. [27] used three hybrid feature-selection models: CFS + best-first, gain ratio + ranker and info-gain + ranker. They used a modified version of the NB classifier, which had fewer feature-independence assumptions. They selected fifteen features from the dataset using the three hybrid feature-selection techniques and classified the data using the modified NB classifier. In their research, they achieved a good accuracy of around 94% to 98% for different attacks, but they did not mention the false alarm rate or the detection rate. The PSO algorithm was used by Syarif et al. [28] to select optimal features from the KDDCUP99 dataset; after the selection of the 25 best features, the k-nearest-neighbour classifier was used for detection. The ant colony optimization technique was also used for optimal feature selection, with different feature subsets selected per class: for the normal class, five features were selected; for the DoS attack, four features; four and three features were selected for U2R and R2L, respectively; and eight optimal features were selected for the probe attack. Two datasets, KDD99 and NSL-KDD, were used in that research. An accuracy of 98.80% and a 2.59% false-positive rate were achieved. Accuracies were reported for DoS of 99.78%, U2R of 93.51%, R2L of 99.17%, probe of 74.65% and the normal class of 97.41% [29]. Likewise, many early studies reported the results of several traditional ML approaches.
The main objective of this research was that, when we inspect a network packet in network traffic to determine whether it is normal or an anomaly, it takes much time to inspect a significant number of attributes. Therefore, through optimized attribute selection, we identified and proposed only those features that played an important role in the intrusion detection system and ignored all irrelevant features. A survey of some of these features and classifications is shown in Table 1. This had the practical implication of using fewer system resources with an increased DR and a reduced FAR.

Proposed Methodology
This section outlines the methodology used in realizing our proposed optimization method. The workflow of the proposed scheme is shown in Figure 1. For feature optimization, a weighted genetic algorithm and RF were integrated to produce an optimal subset of features that could be used to achieve a high detection rate and a low false alarm rate. Classification was performed with RF, NB and logistic regression. The NSL-KDD dataset was used in this research, which is a benchmark dataset for the IDS, mostly used in recent studies. This research had 6 phases. Phase 1 was the data collection. Phase 2 was the data pre-processing: missing and duplicate values were replaced and removed, respectively, and outliers were also checked. To bring the dataset down to one standard scale, data normalization was performed using min-max normalization. Data encoding was also performed. Phase 3 was the feature optimization step, where the weighted genetic algorithm was used with RF. Phase 4 was the splitting of the data into two sets for training and testing. In Phase 5, different machine learning classifiers were utilized for classification, which included RF, NB and logistic regression. Phase 6 was a comparison of our proposed model with existing research, as shown in Figure 1.

NSL-KDD Dataset
Intrusion detection systems are systems that can detect abnormal traffic over the Internet and are trained on information from Internet traffic records. The most famous dataset for the IDS is NSL-KDD (https://www.unb.ca/cic/datasets/vpn.html (accessed 12 January 2021)), and it is used as a benchmark for modern-day network traffic. This dataset contains 43 attributes per record; out of the 43 attributes, forty-one features describe the traffic input, while two indicate whether the packet is normal or malicious. The malicious class contains 4 different types of attacks: probe, Remote to Local (R2L), Denial-of-Service (DoS) and User to Root (U2R). In a DoS attack, the hacker tries to stop the flow of network traffic to and from the target system. A considerable number of abnormal packets is sent towards the victim system, and the IDS is flooded with these malicious packets. As a result, the system fails to handle the vast number of packets and shuts down to protect itself. When the system shuts down, normal traffic is also disturbed, and a user cannot access anything over the network. In the probe attack, information is gathered from the web; the purpose of a probe attack is to act as a thief and collect useful information about a person or bank before launching an attack. In the U2R attack, the attacker gains access as a normal user and then tries to access the system or network as an administrator, taking advantage of vulnerabilities inside the system to gain root access. In the R2L type of attack, the attacker tries to gain local access to a remote machine. The attacker does not have the right to access the local system/network, so he/she hacks the system to enter it and perform the desired operations. The total size of the dataset is 148,517 records with 43 features.
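To make the four attack categories concrete, the sketch below (ours, not from the paper; only a representative subset of the raw NSL-KDD attack labels is shown) maps raw labels to the broad classes described above:

```python
# Hypothetical helper: map raw NSL-KDD attack labels to the four broad
# categories plus "normal". Only a representative subset of labels is listed.
ATTACK_CATEGORY = {
    "normal": "normal",
    # Denial-of-Service
    "neptune": "DoS", "smurf": "DoS", "back": "DoS", "teardrop": "DoS",
    # Probe
    "satan": "probe", "ipsweep": "probe", "portsweep": "probe", "nmap": "probe",
    # Remote to Local
    "guess_passwd": "R2L", "ftp_write": "R2L", "imap": "R2L", "warezmaster": "R2L",
    # User to Root
    "buffer_overflow": "U2R", "rootkit": "U2R", "loadmodule": "U2R", "perl": "U2R",
}

def categorize(label: str) -> str:
    """Return the broad class for a raw attack label ('unknown' if unmapped)."""
    return ATTACK_CATEGORY.get(label, "unknown")
```

Grouping raw labels this way is what turns the 148,517 records into the five-class prediction problem used later in the experiments.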
There are 4 different types of features inside the dataset. Among the categorical features, there are 3 possible values for the protocol type, 60 possible values for service and 11 possible values for the flag. In this research, different pre-processing methods were applied to the NSL-KDD dataset to clean it and obtain optimal results. For this purpose, the dataset was first checked for missing values. Then, the features were brought onto one scale (normalized using the min-max technique). Outliers were also tested for in this study. The last step of pre-processing was feature encoding, to convert the categorical variables into continuous values. Next came the optimal feature selection using the genetic algorithm, and then the optimal feature set was passed to the machine learning model to obtain the results. The dataset was divided into a 70:30 training and testing ratio.
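The categorical encoding and 70:30 split described above can be sketched as follows (a minimal illustration; the helper names and the tiny protocol sample are ours, and the paper's actual implementation may differ):

```python
import random

def label_encode(values):
    """Map each distinct categorical value to an integer code."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

def train_test_split(rows, train_ratio=0.7, seed=42):
    """Shuffle and split rows into the 70:30 ratio used in the paper."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

# Toy example: the protocol-type feature has 3 possible values.
protocols = ["tcp", "udp", "icmp", "tcp", "udp"]
encoded, codes = label_encode(protocols)   # codes == {"icmp": 0, "tcp": 1, "udp": 2}
```

The same `label_encode` call would apply to the service (60 values) and flag (11 values) columns before normalization.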

Data Normalization (Min-Max Method)
Min-max normalization is the most popular method to scale down the features between 0 and 1. For every attribute, the minimum and maximum values are mapped to 0 and 1, respectively. The min-max transformation is given below in Equation (1):

Z_i = new_min + ((x_i − min(X)) / (max(X) − min(X))) × (new_max − new_min),  (1)

where X = (x_1, . . . , x_n) is the original data, Z_i is the normalized value and [new_min, new_max] is the target scaling range (M); in this research, new_min was 0 and new_max was 1, so Z_i = (x_i − min(X)) / (max(X) − min(X)).
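As a runnable sketch of Equation (1) (our own helper, with a guard for constant features that the equation leaves undefined):

```python
def min_max_normalize(xs, new_min=0.0, new_max=1.0):
    """Scale values in xs linearly into [new_min, new_max] (Equation (1))."""
    lo, hi = min(xs), max(xs)
    if hi == lo:                       # constant feature: map everything to new_min
        return [new_min for _ in xs]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (x - lo) * scale for x in xs]
```

For example, `min_max_normalize([10, 20, 30])` maps the minimum to 0, the midpoint to 0.5 and the maximum to 1.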

Optimal Feature Selection Using Evolutionary Search
Genetic Algorithms (GAs) are part of evolutionary computation, a rapidly growing field of artificial intelligence. Genetic algorithms are optimization and search algorithms based on the principles of natural selection and genetics [40]. In a GA, a population of chromosomes represents candidate solutions to the problem. Every chromosome has a fixed number of bits, and the initial chromosome population is formed by randomly assigning 1s and 0s. In this encoding scheme, chromosomes are bit strings (1s and 0s) whose length is determined by the number of principal components in the main space. Each chromosome represents a candidate solution, i.e., a subset of significant elements. The population evolves by applying genetic operators to search for the optimal solution. The flow of the GA for feature selection is shown in Figure 2. In the genetic algorithm below (Algorithm 1), we calculated the fitness function by using the RF conditional entropy, as shown in the equation:

H(S | a) = Σ_x (|G_a(x)| / |S|) · H(G_a(x)),

where |G_a(x)|/|S| is the proportion of entities in the dataset S for which vector a has the value x and H(G_a(x)) is the entropy of the group of entities where the variable has the value x. The other parameters we updated were: SPX-crossover with a crossover probability of 0.7 and random-init, used as the initialization operator for the genetic algorithm in this study. Moreover, bit-flip was used as the mutation operator; the mutation probability was 0.5; the population size was 200; generational replacement was used as the replacement operator; the report frequency was 200; and the selection operator was tournament selection.
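As an illustration of this fitness computation, the following sketch (our simplification; the paper's weighted GA-RF fitness may differ in detail) scores a chromosome by the information gain H(S) − H(S | a) summed over its selected features:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def conditional_entropy(feature_values, labels):
    """H(S | a): weighted entropy of the label groups G_a(x)."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    return sum((len(g) / n) * entropy(g) for g in groups.values())

def fitness(chromosome, feature_columns, labels):
    """Higher total information gain over the selected features = fitter subset."""
    selected = [f for bit, f in zip(chromosome, feature_columns) if bit]
    if not selected:
        return 0.0                     # empty subsets are worthless
    return sum(entropy(labels) - conditional_entropy(f, labels) for f in selected)
```

A feature that perfectly separates the classes contributes the full class entropy to the score, while an uninformative feature contributes nothing.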
Algorithm 1: Pseudocode of the genetic algorithm.
Begin
    Set parameters
    Generate the initial population
    while i < max iteration and optimal fitness < max fitness do
        Calculate fitness using the RF entropy function
        Selection
        Crossover
        Mutation
    end while
    Return the optimal feature subset
End
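Algorithm 1 can be turned into a runnable sketch as follows, using the operators stated above (single-point crossover with probability 0.7, bit-flip mutation with probability 0.5, tournament selection, generational replacement); the toy fitness function and small population at the bottom are ours, for illustration only:

```python
import random

def genetic_feature_search(n_features, fitness, pop_size=200, max_iter=50,
                           p_cross=0.7, p_mut=0.5, tournament_k=3, seed=0):
    """GA feature search: random-init, tournament selection, single-point
    crossover, bit-flip mutation, generational replacement."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament():
        # pick the fittest of k randomly sampled chromosomes
        return max(rng.sample(pop, tournament_k), key=fitness)

    for _ in range(max_iter):
        nxt = []
        while len(nxt) < pop_size:
            a, b = tournament()[:], tournament()[:]
            if rng.random() < p_cross:                 # single-point (SPX) crossover
                pt = rng.randrange(1, n_features)
                a, b = a[:pt] + b[pt:], b[:pt] + a[pt:]
            for child in (a, b):
                if rng.random() < p_mut:               # bit-flip mutation
                    i = rng.randrange(n_features)
                    child[i] ^= 1
                nxt.append(child)
        pop = nxt[:pop_size]                           # generational replacement
    return max(pop, key=fitness)

# Toy fitness: reward chromosomes close to a known-good feature subset.
target = [1, 0, 1, 0, 0]
best = genetic_feature_search(
    5, lambda c: -sum(x != t for x, t in zip(c, target)),
    pop_size=30, max_iter=40)
```

In the paper's setting, the lambda would be replaced by the RF-entropy fitness and `n_features` would be the 41 NSL-KDD traffic features.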

Machine Learning Algorithms
The advantages of machine learning algorithms are numerous. They have been used by several researchers in domains such as healthcare, banking, finance, the stock market, intrusion detection, etc. Several scholars highlighted the NIDS's issues in the past due to its increased importance in the current era of intelligent cyber-attacks. In the past, various ML methods such as nearest neighbour, RF, logistic regression, SVMs and k-means were widely used to develop the NIDS. In this study, the following machine learning classifiers were used.

• Random forest: Multiple decision trees are combined to build an RF classifier. The purpose of the RF classifier is to assemble numerous decision trees to give more meaningful and precise results. For regression, it calculates the mean of every decision tree and assigns the mean value to the predicted variable. For classification, the RF employs a majority-vote approach: for example, if 3 trees predicted yes and 2 trees predicted no, then it will assign yes to the predicted variable. Entropy and information gain are used for deciding the root or parent node of the tree [41], via the following well-known equation:

H(x_1, x_2, . . . , x_n) = −Σ_i x_i log2(x_i),

where x_1, x_2, . . . , x_n represent the probabilities of the class labels.
• Logistic regression: Another popular and well-known machine learning algorithm used for classification is logistic regression. Typically, logistic regression produces dichotomous results. Logistic regression finds a correlation between the final output values and the input characteristics; the log-odds function is used for prediction.
• Naive Bayes: NB is a supervised, probabilistic machine learning technique used for classification. In this research, we used a traditional NB classifier for the prediction of various attacks.
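The majority-vote and averaging behaviour described for RF can be illustrated with a tiny sketch (ours, not the paper's code):

```python
from collections import Counter
from statistics import mean

def rf_classify(tree_votes):
    """Classification: majority vote across the ensemble's per-tree predictions."""
    return Counter(tree_votes).most_common(1)[0][0]

def rf_regress(tree_outputs):
    """Regression: average the per-tree numeric predictions."""
    return mean(tree_outputs)
```

With the 3-yes/2-no example from the text, `rf_classify(["yes", "yes", "yes", "no", "no"])` returns `"yes"`.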

Evaluation Metrics
Various performance metrics can be applied to evaluate the proposed solution, including the accuracy, precision, recall, F1-measure, False Alarm Rate (FAR) and Detection Rate (DR). These performance measurements are based on True Positives (TPs), False Positives (FPs), False Negatives (FNs) and True Negatives (TNs): the DR (recall) is TP/(TP + FN), precision is TP/(TP + FP), and the F1-measure is the harmonic mean of precision and recall. The FAR = FP/(FP + TN) is the proportion of instances that are normal but classified as the attack class.
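These metrics can be computed directly from the four confusion-matrix counts; the helper below is our illustrative sketch:

```python
def ids_metrics(tp, fp, fn, tn):
    """Standard IDS evaluation metrics from the confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also the Detection Rate (DR)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    far = fp / (fp + tn)               # normal traffic misclassified as attack
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "DR": recall, "FAR": far}
```

For instance, with 90 attacks caught, 10 missed, and 5 of 100 normal packets flagged, the DR is 0.90 and the FAR is 0.05.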

Random Forest Attack Experiment Results
The red curve represents the training and testing scores, while the green curve represents the cross-validation scores. Figure 3 depicts the training learning curve for the RF, while Figure 4 represents the testing curve for the RF. From Figure 3, we can see that the training score for the RF classifier was 99.5%, and the validation score was 98.75%. Figure 4 shows that the testing score for the RF classifier was approximately 99%, while the validation score for testing was 98.5%. Figure 5 represents the confusion matrix for RF. The prediction was made on five classes: four classes contained attacks, and one class was normal. From Figure 5, we can see that the detection rate for the DoS attack was 99.7%, which means that out of 16,039 DoS attacks, 15,984 attacks were detected correctly by our proposed model, which is quite high. Ninety-nine percent of probe attacks were identified correctly; a total of 4209 packets contained probe attacks, and 4166 packets were correctly detected as probe attacks. Similarly, for the R2L and U2R attacks, out of 1192 packets, 1026 packets were detected correctly for the R2L attack, and 8 packets were identified correctly out of 28 for the U2R attack. For the normal class, out of 23,087 packets, 22,804 packets were identified correctly, a percentage of 98.8%. The reason for the low detection rate for U2R was that it contained very few packets, and machine learning performs best when it has more data for training. Comparing DoS and U2R, DoS had 16,039 packets, while U2R only had 28 packets, which explains the latter's low performance. Figure 6 represents the classification report for the various attacks mentioned in this research.
For normal packets, the precision was 98.8%; for U2R and R2L, the precision scores were 64.30% and 86.50%, respectively. The probe attack precision score was 98.70%, and the DoS attack had the highest precision score, 99.6%. The recall and F1-measure for the normal packets were 98.90% and 98.80%, respectively. The recall scores for U2R and R2L were 32.1% and 85.30%, respectively. The F1-measures for U2R and R2L were 42.90% and 85.90%, respectively. The probe attack had a 98.90% recall and a 98.80% F1-measure, and the DoS attack had a 99.6% recall and a 99.6% F1-measure.

Naive Bayes Classifier Experimental Results
The training curve for NB is represented in Figure 7. The training score for NB started from 83.31% and went to 83.45%, while the cross-validation score started from 83.41% and went to 83.44%. In Figure 8, the red line represents the testing score, while the green line represents the cross-validation score for the NB algorithm. The testing score of NB was 83.45%, and the validation score was 83.44%. Figure 9 represents the confusion matrix for NB. From Figure 9, we can see that the detection rate for the DoS attack was 86.3%, which means that out of 16,039 DoS attacks, 13,849 attacks were detected correctly. Fifty-seven-point-one percent of probe attacks were detected correctly using the NB classifier: a total of 4209 packets contained probe attacks, and 2408 packets were correctly identified as probe attacks. Similarly, for the R2L and U2R attacks, out of 1192 packets, no packet was detected successfully for the R2L attack, and 14 packets were detected correctly out of 28 packets for the U2R attack. Ninety-point-seven percent of packets were identified correctly as normal packets with no anomaly in them, which means that 20,933 out of 23,087 packets were detected successfully for the normal class. NB achieved very low per-class accuracy for R2L and U2R because both had little data in the training and testing sets compared to the DoS, probe and normal classes, which affected the testing accuracy for R2L and U2R. Figure 10 represents the classification report for NB. For normal packets, the precision was 82.20%. Similarly, for U2R and R2L, the precision scores were 32.60% and 0%, respectively. The probe attack precision score was 62.40%, and the DoS attack had a precision score of 91.10%. The recall and F1-measure for the normal packets were 90.70% and 86.20%, respectively. The recall scores for U2R and R2L were 50.00% and 0%, respectively.
The F1-measures for U2R and R2L were 39.40% and 0.0%, respectively. The probe attack had a 57.20% recall and a 59.70% F1-measure, and the DoS attack had an 86.30% recall and an 88.70% F1-measure.

Logistic Regression Classifier Experiment Results for Multiple Attacks
In Figures 11 and 12, the red lines represent the training and testing scores for logistic regression, while the green curves represent the cross-validation scores. Figure 11 depicts the training learning curve for logistic regression, while Figure 12 represents the testing curve. From Figure 11, we can see that the training score for the logistic regression classifier was 88.70%, and the validation score was also 88.70%. Figure 12 shows that the testing score for the logistic regression classifier was approximately 88.47%, while the validation score for testing was 88.42%. Figure 13 represents the confusion matrix for logistic regression. From Figure 13, we can see that the detection rate for the DoS attack was 90.10%, which means that out of 16,039 DoS attacks, 14,456 attacks were detected correctly. Seventy-seven-point-six percent of probe attacks were identified correctly; a total of 4209 packets contained probe attacks, and 3265 packets were correctly identified as probe attacks. Similarly, for R2L attacks, out of 1192 packets, 201 packets were detected successfully. Ninety-three-point-seven percent of packets were detected correctly as normal packets with no anomaly in them: 21,622 packets were detected correctly out of 23,087, which is a high detection ratio. Figure 14 represents the classification report for logistic regression. For normal packets, the precision was 87.5%, and for U2R and R2L, the precision scores were 33.3% and 64.2%, respectively. The probe attack precision score was 77.4%, and the DoS attack had a precision score of 94.5%. The recall and F1-measure for the normal packets were 93.7% and 90.5%, respectively. The recall scores for U2R and R2L were 7.10% and 16.9%, respectively. The F1-measures for U2R and R2L were 11.80% and 26.70%, respectively. The probe attack had a 77.60% recall and a 77.50% F1-measure.
The DoS attack had a 90.10% recall and a 92.30% F1-measure. Tables 2-4 show the average detection rate and the training and testing accuracies of the different proposed models.

Random Forest Experiment Results for the Binary Class
From Figure 15, we can see that the training score for the RF classifier was 99.70%, and the validation score was 99.00%. Figure 16 shows that the testing score for the RF classifier was approximately 99.75%, while the corresponding validation score was 99.00%. From Figure 17, we can see that 99.10% of attacks were detected correctly: out of 21,468 packets, 21,274 were correctly identified as anomaly packets, while 194 anomaly packets were misclassified as normal. Conversely, 256 normal packets were misclassified as anomalies. In total, 98.9% of normal packets were correctly identified: of 23,087 normal packets, 22,831 were detected successfully. Figure 18 presents the RF classification report for the binary class. For the normal class, the precision, recall and F1-measure were all 99.10%; for the attack class, they were all 99.00%.
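The binary RF pipeline above can be sketched with scikit-learn. This is a minimal stand-in: the real experiments used the NSL-KDD dataset, which is not bundled here, so a synthetic binary dataset takes its place and the hyperparameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary stand-in (0 = normal, 1 = anomaly) for NSL-KDD.
X, y = make_classification(n_samples=4000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Random forest (a bagging ensemble) fitted on the training split.
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
y_pred = rf.predict(X_te)

print(confusion_matrix(y_te, y_pred))
print(classification_report(y_te, y_pred, target_names=["normal", "anomaly"]))
```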

Naive Bayes Experiment Results for the Binary Class
The training curve for the binary class with NB is shown in Figure 19; the training and cross-validation scores for NB were both above 88%. Figure 20 shows the binary class testing and validation curves for NB; both had the same score of 86.18%. Figure 21 presents the binary class confusion matrix for the NB classifier. It shows that 80.8% of attacks were detected correctly: out of 21,468 anomaly packets, 17,345 were detected correctly, while 4123 were misclassified as normal. Conversely, 1887 normal packets were misclassified as anomalies. Overall, 91.8% of normal packets were correctly recognized: of 23,087 normal packets, 21,200 were identified correctly.
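The NB confusion-matrix breakdown above can be reproduced in miniature with scikit-learn's Gaussian Naive Bayes; again a synthetic binary dataset stands in for NSL-KDD, so the counts are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary stand-in (0 = normal, 1 = anomaly); NSL-KDD not bundled.
X, y = make_classification(n_samples=3000, n_features=20, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7)

nb = GaussianNB().fit(X_tr, y_tr)

# Unpack the binary confusion matrix: rows = true class, cols = predicted.
tn, fp, fn, tp = confusion_matrix(y_te, nb.predict(X_te)).ravel()
print(f"anomalies caught: {tp}, missed: {fn}; normals flagged: {fp}")
```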

Logistic Regression Experiment Results for the Binary Class
In Figure 23, the green curve represents the validation curve for logistic regression, with a score of 89.45%, while the red curve represents the training score of 89.50%. From Figure 24, the testing score for logistic regression started at 89.60% and decreased to 89.55%, while the cross-validation score began at 89.45% and rose to 89.55%. Figure 25 presents the binary class confusion matrix for the logistic regression classifier: 87.3% of attacks were detected correctly, with 18,732 of 21,468 anomaly packets correctly classified; 2736 anomaly packets were misclassified as normal, and 1793 normal packets were misclassified as anomalies. Of the normal packets, 92.2% were correctly detected: 21,294 out of 23,087. Figure 26 presents the logistic regression classification report for the binary class. For the normal class, the precision was 88.10%, the recall 92.40% and the F1-measure 90.20%; for the attack class, the precision, recall and F1-measure were 91.40%, 86.60% and 88.90%, respectively. Table 5 compares the average classification reports of the proposed models, while Table 6 compares their average confusion matrices. From Table 7, we can conclude that our proposed GA-RF model achieved a high detection rate and a low false alarm rate, outperforming the other two proposed models: its DR and FAR were 98.81% and 0.8%, respectively.
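The DR and FAR metrics used throughout the comparison reduce to two ratios over the binary confusion matrix. A short sketch, using the logistic regression binary counts reported above:

```python
def detection_rate(tp, fn):
    """DR = attack packets correctly flagged / all attack packets."""
    return tp / (tp + fn)

def false_alarm_rate(fp, tn):
    """FAR = normal packets wrongly flagged / all normal packets."""
    return fp / (fp + tn)

# Counts from the binary logistic regression confusion matrix.
tp, fn = 18732, 2736   # anomalies caught / anomalies missed
fp, tn = 1793, 21294   # normals flagged as attacks / normals passed

print(f"DR  = {detection_rate(tp, fn):.1%}")
print(f"FAR = {false_alarm_rate(fp, tn):.1%}")
```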
From Table 8, the strong performance of RF was evident, so we applied the weighted genetic algorithm to further optimize our feature selection and then passed the best feature subsets to RF for classification. As a result, our proposed model outperformed other similar IDS methods, achieving a high DR of 98.81% and a low FAR of 0.8%.
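The GA-RF combination can be sketched as a simple wrapper-style search: chromosomes are feature bit-masks, and each mask's fitness is the cross-validated RF accuracy on the selected columns. This is a minimal sketch on synthetic data, not the paper's implementation: the population size, generation count, selection scheme and mutation rate below are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for NSL-KDD; all GA hyperparameters are illustrative.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)
n_feat = X.shape[1]

def fitness(mask):
    # Fitness = cross-validated RF accuracy on the selected feature subset.
    if not mask.any():
        return 0.0
    clf = RandomForestClassifier(n_estimators=20, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, size=(8, n_feat)).astype(bool)  # bit-mask chromosomes
for generation in range(4):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:4]]          # truncation selection
    children = []
    for _ in range(4):
        a, b = parents[rng.integers(4)], parents[rng.integers(4)]
        cut = rng.integers(1, n_feat)                    # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        children.append(child ^ (rng.random(n_feat) < 0.05))  # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```

The best surviving mask would then be used to train the final RF classifier on the full training set.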

Discussion
The experiments were performed on the NSL-KDD dataset separately for a two-class and a five-class training set. Three classifiers were used for classification, and a genetic algorithm was used for optimal feature selection. The DoS and normal classes contained a large number of packets, while R2L, U2R and probe contained far fewer. Because the DoS attack had more packets, it outperformed the other attack classes; the scarcity of R2L, U2R and probe packets made achieving a high detection rate for these attacks challenging. To address this, we selected optimal features to achieve high detection accuracy. Our proposed model not only achieved high accuracy, but also a very promising FAR. Table 7 shows that our proposed GA-RF model achieved a high detection rate and a low false alarm rate, outperforming the other two proposed models: its DR and FAR were 98.81% and 0.8%, respectively. The reasons for this high accuracy and low FAR were the optimal feature selection and RF, a bagging technique that outperforms other traditional machine learning algorithms.
Our investigation also analysed classifier performance with regard to the training set, which was sampled with diverse values of a scaling factor denoted M. This paper examined F1-score and precision performance on the NSL-KDD and CSE-CIC-IDS2018 datasets. We first explored performance on the NSL-KDD dataset, confirming that the data selection was valuable and free of extreme noise. The results indicated that the classifiers attained outstanding average performance at M = 60 on the NSL-KDD dataset; on the CSE-CIC-IDS2018 dataset, they attained exceptional average performance at M = 10. Consequently, with respect to the average F1-score, M = 60 was used as the scaling factor on the NSL-KDD dataset, where the dominant normal, DoS and probe classes were flattened and the sparse R2L and U2R classes were amplified with the available data; M = 20 was used as the scaling factor on the CSE-CIC-IDS2018 dataset, with a comparable treatment applied. After this step, we generated the new training set and present the results in Table 9. Furthermore, we applied diverse scaling factor values to the training sets of both the CSE-CIC-IDS2018 and NSL-KDD datasets and ran tests on the six proposed classifiers, estimating their performance by the average F1-score of each classifier; the results are presented in Figures 27 and 28. As illustrated in Figure 29, for each sampling technique we calculated the F1-score and average precision of the classifier. The sampling algorithms applied to the NSL-KDD dataset were RUS, ROS and SMOTE, and all performed better than the initial algorithm; for the F1-score and precision, the improvement was slightly higher.
The proposed GA was greatly optimized: the average precision improved by 5.65% and the average F1-score by 8.2%. On the other hand, the performance improvements were insignificant, or performance even worsened, after applying the RUS, ROS and SMOTE sampling algorithms to the CSE-CIC-IDS2018 dataset. After treating the training set with the genetic algorithm sampling proposed in this research, the average precision improved by 3.65% and the average F1-score by 4.24%. Since the F1-score is the harmonic mean of precision and recall, it is a sound indicator of classification performance; accordingly, the average precision and F1-score were adopted in this research as the metrics for comparing the approaches proposed by previous researchers for handling disparities in network traffic. In summary, conventional sampling approaches decreased the disparity in the training set and synthesized data close to the actual distribution, but they did not produce a result that fit the actual data. The RUS algorithm lost valid data, while the ROS algorithm caused overfitting and data redundancy. SMOTE interpolation produced data intersection and noise traffic and increased the number of complex samples in the training set. Overall, the main goal of our proposed genetic algorithm was to compress and enhance complex information from a disproportionate training set, allowing the classifier to attain a better distribution of the data and thereby enhancing classification performance.
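The RUS and ROS baselines discussed above amount to index resampling. A minimal NumPy sketch on a toy imbalanced label set (the class sizes are invented for illustration) makes their trade-offs concrete: ROS duplicates minority rows, RUS discards majority rows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy imbalanced task: 95 "normal" packets vs. 5 minority-attack packets.
y = np.array([0] * 95 + [1] * 5)
X = rng.normal(size=(100, 4))

minority = np.flatnonzero(y == 1)
majority = np.flatnonzero(y == 0)

# ROS: duplicate minority rows until classes balance (risks overfitting
# and data redundancy, as noted above).
ros_idx = np.concatenate(
    [majority, rng.choice(minority, size=len(majority), replace=True)])

# RUS: drop majority rows down to the minority size (risks losing valid data).
rus_idx = np.concatenate(
    [rng.choice(majority, size=len(minority), replace=False), minority])

print("ROS set size:", len(ros_idx), "- RUS set size:", len(rus_idx))
```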

Conclusions
Considering the recent increase in security threats in smart systems, and particularly in the smart healthcare industry, machine learning security approaches are a very promising and viable technique for combating these unrelenting intrusion vulnerabilities. This study therefore performed an intrusion detection analysis for false alarm rate detection using machine learning feature selection, integrating a genetic algorithm with random forest in an SoT model, and further proposed a secured communication system suitable for smart healthcare. The research applied various machine learning algorithms to the NSL-KDD dataset for the experiments. First, the dataset was transformed into a binary classification task with an attack class and a normal class; during pre-processing, non-numeric values were replaced with numeric encodings. Second, the data were normalized using min-max normalization. The study then performed feature selection using a Genetic Algorithm (GA), retaining the features with optimal performance. After feature selection, the research applied different machine learning algorithms to the selected dataset. Of all the selected ML algorithms, Random Forest (RF) performed best in terms of accuracy, detection rate and false alarm rate. Furthermore, the proposed technique was compared with other recent work, and the experimental results showed that our method performed better in terms of the detection rate, false alarm rate and accuracy. With respect to the FAR, the proposed model achieved 0.8% on the NSL-KDD dataset, compared with the base case's FAR of 0.6% on the same dataset. Similarly, we delivered, on average, a 98.81% detection rate. In future work, attack classification accuracy for both high-volume and low-volume attack classes will be improved.
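The min-max normalization step mentioned above maps each feature column to [0, 1] via (x - min) / (max - min). A minimal sketch with toy values (the input matrix is invented for illustration):

```python
import numpy as np

def min_max(X):
    """Scale each column to [0, 1]: (x - min) / (max - min).
    Columns with zero range divide by 1 instead, yielding all zeros."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.where(mx > mn, mx - mn, 1)

# Toy feature matrix: e.g. packet count and byte count on different scales.
X = np.array([[2.0, 100.0],
              [4.0, 300.0],
              [6.0, 500.0]])
print(min_max(X))
```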
For example, there is a vast difference between the Neptune attack, with 45,871 packets, and the SQL attack, which has only two packets and almost zero correct classifications. We intend to use SMOTE, a data-balancing technique, to address this imbalance. Future work will also employ deep neural network models such as CNNs and RNNs to test the model; another area of interest is auto-learning the pattern of a packet and then detecting anomalies by self-learning. Furthermore, the proposed algorithm was applied to examine F1-score and precision performance on the NSL-KDD and CSE-CIC-IDS2018 datasets using different scaling factors; the results showed that the proposed GA was greatly optimized, with the average precision improving by 5.65% and the average F1-score by 8.2%. Finally, the proposed GA-LG model also performed well, with an 88.10% precision, 92.40% recall and 90.20% F1-measure for the normal class, and a 91.40% precision, 86.60% recall and 88.90% F1-measure for the attack class.