Towards a Hybrid Machine Learning Model for Intelligent Cyber Threat Identification in Smart City Environments

Abstract: The concept of a smart city requires the integration of information and communication technologies and devices over a network for the better provision of services to citizens. As a result, the quality of living is improved by continuous analyses of data to improve service delivery by governments and other organizations. Due to the presence of extensive devices and data flow over networks, the probability of cyber attacks and intrusions has increased. Monitoring this huge amount of data traffic is very difficult, though machine learning algorithms have huge potential to support this task. In this study, we compared different machine learning models used for cyber threat classification. Our comparison focused on the analyzed cyber threats, algorithms, and performance of these models. We identified that real-time classification, accuracy, and the false-positive rate (FPR) are still the major issues in the performance of existing models. Accordingly, we propose a hybrid deep learning (DL) model for cyber threat intelligence (CTI) to improve threat classification performance. Our model is based on a convolutional neural network (CNN) and a quasi-recurrent neural network (QRNN). The use of the QRNN not only resulted in improved accuracy but also enabled real-time classification. The model was tested on the BoT-IoT and TON_IoT datasets, and the results showed that the proposed model outperformed the other models. Due to this improved performance, we emphasize that the application of this model in the real-time environment of a smart system network will help in reducing threats in a reasonable time.


Introduction
The transformation of cities into smart cities is on the rise, where technologies such as the Internet of Things (IoT) and cyber-physical systems (CPS) are connected through networks for the better provision of quality services to citizens [1]. The smart city concept refers to urban systems that are integrated with information and communication technologies (ICTs) to improve city services in terms of monitoring, management, and control to be more efficient and effective [2]. A smart city contains a huge number of sensors that continuously generate a tremendous amount of sensitive data such as location coordinates, credit card numbers, and medical records [3]. These data are transmitted through a network to data centers for processing and analysis so that appropriate decisions, such as managing traffic and energy, can be made in a smart city [4]. The resource limitations of technological infrastructure expose smart cities to cyber attacks [5]. For instance, sensors that generate data and devices that handle the data in a smart city have vulnerabilities that can be exploited by cybercriminals. Consequently, citizens' privacy and lives can be at risk when collected data for analysis and decision making are manipulated, which makes people intimidated by smart cities [1].

• We propose a hybrid DL model that consists of QRNN and CNN to improve cyber threat analysis accuracy, lower FPR, and provide real-time analysis.

• We evaluated our proposed model on two datasets that were simulated to represent a realistic IoT environment.
The rest of this paper is structured as follows. In Section 2, we discuss related work by comparing and analyzing different threat classification schemes that have been proposed in the literature. The proposed model is presented in Section 3. The implementation of the proposed model is discussed in Section 4, the experiment results and analysis are presented in Section 5, and conclusions are presented in Section 6.

Related Work
In recent years, different studies have proposed mechanisms to predict and analyze cyber attacks in smart city environments. The authors of [24] proposed an ML-based detection mechanism that focused on classifying DDoS patterns to protect a smart city from them. In [25], the authors studied how IoT devices can affect smart city cyber security; the authors proposed a detection mechanism that depends on the selected features to improve the threat detection for IoT. The results of the proposed system showed high accuracy, but the dataset, KDD CUP 99, did not represent the behavior of IoT network attacks. Soe et al. [21] proposed an algorithm to improve prediction accuracy by selecting the optimal features for each type of attack in an IoT environment. The authors used ML models to evaluate the proposed feature selection algorithm, which was able to accurately predict the threats. However, the proposed algorithm selected a static set of features for each type of attack, which could be easily bypassed if exposed to the threat environment. In [26], the authors used a DL model to select the best features for threat prediction to improve the detection time in an IoT environment. The proposed model selects a set of features that are fed into feed-forward neural networks (FFNNs) to detect cyber threats and classify threat types. However, the proposed model showed limited accuracy in predicting information theft data.
In [19], the authors discussed how to use ML models to rapidly and efficiently detect and classify IoT network attacks. The authors performed an experimental study by implementing various ML models and evaluating their performance. In [27], the authors proposed a hybrid ML model to detect IoT network attacks, including zero-day attacks. The proposed model mainly consists of two stages: the first stage classifies the traffic into two categories (normal or attack), and the second stage classifies the type of attack using an SVM. Similarly, in [28], the authors proposed a hybrid ML model to detect and classify IoT network attacks in real time. The first layer of the proposed model uses a decision tree classifier to detect malicious behavior, and the second layer classifies the type of attack using random forest (RF). In [29], the authors investigated the remote-control threat of connected cars and used an ML model to predict threats. The authors proposed a proactive anomaly detection mechanism that profiled the behavior of autonomous connected cars using a recursive Bayesian estimator. To evaluate the effectiveness of the proposed method, the authors designed a dataset for connected cars using hypothetical events routes and global positioning system coordinates, and they then modeled the data to predict anomalous behavior. Lee et al. [30] proposed a technique, based on DL models, that transforms the multitude of security events into individual event profiles. The authors discussed how anomaly-based detection can be costly since it can trigger many false alerts. Therefore, they focused on improving security information and event management systems by using DL to reduce the cost of differentiating between true and false alerts. In [31], the authors proposed a hybrid ML method to detect cyber threats. The authors focused on improving detection accuracy to counter the methods that attackers use to evade detection tools. To evaluate the proposed method, the authors used different datasets including KDD Cup and UNSW-NB15. In [32], the authors discussed how to improve threat analysis and classification, including for novel attacks. The authors proposed a model based on a stacked autoencoder to enhance and automate feature selection to classify the threats.
Various scientific studies have proposed hybrid DL models to improve threat analysis and classification. In [33], the authors proposed an improved version of grey wolf optimization (GWO) and a CNN. In the proposed hybrid model, the first GWO model is used to select the features and the second CNN model is used for threat classification. Other studies have used hybrid DL models based on CNNs and RNNs for spatial and temporal feature extraction to improve attack classification. In [34], the authors used a CNN for feature selection since it could provide fast feature selection to support real-time analysis. For threat classification, the authors used one of the variants of the LSTM model: weight-dropped LSTM (WDLSTM). The proposed hybrid model showed good performance in terms of execution time. Vinayakumar et al. [35] studied the effect of CNNs in threat classification and intrusion detection systems (IDSs). The authors investigated different hybrid DL models with CNNs including CNN-LSTM, CNN-GRU, and CNN-RNN, and the model implementing CNN-LSTM outperformed the other models. Moreover, the authors highlighted that selecting a minimum set of features for threat classification degraded the performance of the classification. Therefore, DL models can perform well in terms of feature selection. In [36], the authors proposed a hierarchical model based on CNN-LSTM. The authors used stacked CNN layers for spatial feature learning using image classification and then stacked LSTM layers for temporal feature learning. Similarly, in [20], the authors proposed the LuNet model based on CNN-LSTM. The authors discussed how stacking LSTM layers after CNN layers could drop some of the temporal features. Thus, the authors proposed the LuNet block, which consists of an LSTM layer stacked after a CNN layer, and they then stacked the LuNet block in multiple layers to improve classification performance and lower the FPR.
As shown in Table 1, different network traffic benchmark datasets, such as UNSW-NB15, NSL-KDD, and KDD CUP 99, have been used to analyze low-level IoCs. For IoT attack classification, the BoT-IoT dataset has been used in multiple studies to evaluate the performance of proposed models. Different ML and DL models, such as the SVM, CNN, and LSTM, have been used to analyze threats and provide accurate results, and the CNN-LSTM hybrid model has been used in multiple studies to improve threat classification performance. In terms of CTI for smart cities, multiple papers, including [24,25], have analyzed threat patterns based on network traffic. Additionally, in [37], the authors proposed a trustworthy privacy-preserving secured framework (TP2SF) for smart cities; the authors used the optimized gradient tree boosting system (XGBoost) and blockchain, and they evaluated the proposed framework on two datasets: BoT-IoT and TON_IoT. DDoS is one of the challenging threats in a smart city that has been studied by different researchers, who have proposed methods to analyze IP addresses and track the sources to prevent this attack or to identify the behavior of the network when traffic is overloaded. Data theft, which can be described as privacy and identity theft, is another threat that has been studied by various researchers. Data theft threats include reconnaissance, information theft, probe, R2L, and U2R, which may lead to the exposure of various vulnerabilities that can help in launching data theft attacks such as sniffing passwords and unauthorized access. Some of the proposed models for smart cities set a fixed threshold to detect attacks, which is not effective and can raise many false alarms that affect the power consumption of the connected systems. In smart cities, the normal behavior of a system can change due to the increasing number of connected devices, so some researchers have achieved high accuracy but poor performance in terms of FPR.
Even though different researchers have proposed models to enhance threat classification for IoT environments, many aspects still require improvement. One of the limitations that is common between different methods is performance time. Low-level IoCs that are collected from network traffic have been used to analyze the threats in various papers to provide timely information to the CTI knowledge base and update the detection and prevention information for all systems connected to the CTI. However, to enhance classification performance, various models have multiple stacked ML model layers. Therefore, it may take time to train a model and classify threats while not taking advantage of these IoCs. Secondly, when some models are not provided with enough data for each type of threat, threat traffic cannot be profiled and modeled well enough. Consequently, ML models can have high FPRs. Furthermore, some models only provide accurate results when their system has precise details of threats. Consequently, the system is not able to recognize threats that do not have enough data for model training, which affects classification accuracy.
Moreover, we observed that few papers have addressed diverse patterns for threat analysis while considering time, accuracy, and FPR. Several works have proposed hybrid models based on the CNN and LSTM to learn spatial and temporal data. However, LSTM is computationally complex and requires a long time for analysis [38]. The QRNN model is a type of RNN that allows for sequence modeling by implementing computation in parallel while maintaining the data's long- and short-term sequence dependencies [23]. We could not find a work that used the QRNN model to improve cyber threat classification time while demonstrating high accuracy. Thus, in this work, we propose a hybrid DL model for CTI for smart cities that addresses the abovementioned challenges and uses the QRNN model. The proposed hybrid model can improve threat classification accuracy and lower the FPR in a reasonable time. Therefore, it can predict different attacks to protect citizens' data and enhance the security of smart cities.

Proposed Model
In this section, we discuss the proposed hybrid DL model in terms of its structure, the selected DL algorithms, and relevant theoretical concepts. The selected DL models (CNN and QRNN) can be used to classify a threat type in real time while providing a low FPR. The architecture of the proposed model is presented in Figure 1. A CNN is an extension of a neural network [39] and it is effective at extracting features at a low level from the source data, especially spatial features [40].
CNNs are used widely in image processing due to their ability to automate feature extraction [41]. Additionally, CNNs have demonstrated their effectiveness in many fields such as biomedical text analysis and malware classification [30]. Based on the shape of the input data, a CNN can be classified into different types including a two-dimensional (2D) CNN, which uses data such as images, and a one-dimensional (1D) CNN, which uses data such as text. A CNN consists of a convolution layer, pooling layer, fully connected (FC) layer, and activation function [42]. The convolution layer is a fundamental building block in CNNs that takes two sets of information as inputs and performs a mathematical operation with these inputs. The two sets of information are the data and a filter, which can be referred to as a kernel. The filter is applied to an entire dataset to produce a feature map [41]. Each CNN filter extracts a set of features that are aggregated to a new feature map as output [30]. The pooling layer is implemented to reduce feature map dimensions and to remove irrelevant data to improve learning [20]. The output of the pooling layer is fed into the FC layer to classify the data [43].
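The convolution, activation, and pooling steps described above can be illustrated with a minimal NumPy sketch. The input vector, the filter values, and the helper names (`conv1d_valid`, `relu`, `max_pool1d`) are illustrative assumptions rather than values from this work; in a trained CNN, the filter weights are learned from the data.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Slide `kernel` over `x` (no padding) and return the feature map."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def relu(x):
    """Rectified Linear Unit: negative values become zero."""
    return np.maximum(x, 0.0)

def max_pool1d(x, pool_size=2):
    """Keep the maximum of each non-overlapping window of `pool_size` values."""
    usable = len(x) - len(x) % pool_size  # drop a trailing partial window
    return x[:usable].reshape(-1, pool_size).max(axis=1)

# Toy input and a hand-picked filter of size 3 (both assumptions).
x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0])
kernel = np.array([1.0, 0.0, -1.0])

feature_map = conv1d_valid(x, kernel)  # [-1., -2., -2., 5.]
activated = relu(feature_map)          # [0., 0., 0., 5.]
pooled = max_pool1d(activated)         # [0., 5.]
```

The pooled vector is half the length of the feature map, which is the dimensionality reduction that the pooling layer provides.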
The LSTM-RNN is one of the most powerful neural network models used in cyber security due to its ability to accurately model temporal sequences and their long-term dependencies [44]. However, LSTM usually requires a longer training time and incurs a high computation cost [45]. The QRNN model [23] was designed to overcome the limitation of RNNs whereby each timestep's computation depends on the previous timestep, which limits parallelism. The QRNN combines the benefits of the CNN and RNN by using convolutional filters on the input data while preserving long-term sequence dependencies by storing information from previous timesteps [23]. The computation structure of the QRNN is presented in Figure 2. The QRNN consists of convolutional layers and a recurrent pooling function, which allow the QRNN to work faster than LSTM, with a reported 16-fold increase in speed while achieving the same accuracy as LSTM [46]. The convolutional and pooling layers allow for the parallel computation of the batch and feature dimensions [23]. The QRNN has been used in different applications such as video classification [45], speech synthesis [46], and natural language processing [47].
Our hybrid DL model consists of a 1D convolutional layer, a 1D max-pooling layer, a QRNN, and FC layers. The first 1D convolutional layer selects the spatial features and produces a feature map that is processed by the activation function. The Rectified Linear Unit (ReLU) activation function is used in the convolutional layers because it enables rapid convergence of gradient descent, which made it a good choice for our proposed model [41]. Then, the feature map is processed by the second layer, which uses the max-pooling operation. The max-pooling operation selects the maximum value in each pooling window [41]. The pooling layer reduces dimensionality and removes irrelevant features. The output of the CNN layers retains the temporal features, which are then extracted by the QRNN layers.
Figure 3 provides details of our proposed model and shows that we used two QRNN layers to extract the temporal features. In the two layers of the QRNN, the hidden size represents the number of hidden units and the output dimension. The number of hidden units can be selected based on the number of features [45]. One of the problems of a neural network is overfitting, which means that a model learns the training data too well. Consequently, the model is not able to identify variants in new data [22]. We added a dropout layer to prevent overfitting. Then, a 1D convolutional layer and a max-pooling layer are used to extract more spatial-temporal features. The output of the CNN model is passed to the Flatten layer, which is a fully connected input layer that transforms the output of the pooling layer into one vector to be an input for the next layer [48]. Finally, the dense layer, which is also a fully connected layer, with the SoftMax activation function is used to classify the threats by calculating the probabilities for each class [34].
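To make the QRNN computation more concrete, the following NumPy sketch shows one QRNN layer with the fo-pooling recurrence described in [23]: three causal convolutions produce the candidate, forget, and output gates for all timesteps at once, and only a lightweight element-wise recurrence runs sequentially. The function names, the random weights, and the absence of bias terms are assumptions made for illustration; this is not the implementation used in this work.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def causal_conv1d(x, w):
    """x: (T, d_in), w: (k, d_in, d_out) -> (T, d_out).

    The past is zero-padded so that the output at timestep t depends
    only on inputs at timesteps t-k+1 .. t.
    """
    k = w.shape[0]
    padded = np.vstack([np.zeros((k - 1, x.shape[1])), x])
    return np.stack(
        [np.einsum("kd,kdo->o", padded[t:t + k], w) for t in range(x.shape[0])]
    )

def qrnn_layer(x, w_z, w_f, w_o):
    """One QRNN layer with fo-pooling; returns hidden states of shape (T, d_out)."""
    # Gate computations have no dependency between timesteps,
    # so they can be evaluated in parallel.
    z = np.tanh(causal_conv1d(x, w_z))   # candidate vectors
    f = sigmoid(causal_conv1d(x, w_f))   # forget gates
    o = sigmoid(causal_conv1d(x, w_o))   # output gates

    # Recurrent pooling: the only sequential part, and purely element-wise.
    c = np.zeros(z.shape[1])
    h = np.empty_like(z)
    for t in range(z.shape[0]):
        c = f[t] * c + (1.0 - f[t]) * z[t]
        h[t] = o[t] * c
    return h
```

Because the sequential loop contains no matrix multiplication, most of the work sits in the convolutions, which is the property that lets a QRNN run faster than an LSTM of similar size.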

Implementation
In this section, we describe the datasets that we selected to evaluate the proposed model. Additionally, we discuss the data preprocessing steps, model parameter selection process, and selected evaluation metrics.

Datasets
In this work, we selected the BoT-IoT and TON_IoT datasets because they were simulated to represent realistic IoT environments such as smart homes and cities. The datasets cover a heterogeneous set of simulated IoT devices, including weather-monitoring systems, smart lights, and smart thermostats, as well as a variety of cyber threats.

BoT-IoT Dataset
In previous studies, different datasets, such as KDD99, ISCX, and CICIDS2017, have been used to evaluate ML models; however, few datasets have been produced to reflect realistic IoT network traffic. These datasets were either not diverse enough in terms of attacks or not realistic in terms of the testbed [19]. Therefore, Koroniotis et al. [49] designed the BoT-IoT dataset to address these limitations. The BoT-IoT dataset is used in forensic analysis and to evaluate IDSs. The dataset contains normal IoT traffic and different types of attack traffic with subcategories for each type, which are listed in Table 2. Reconnaissance is one of the privacy threats, and it allows a threat actor to collect data about a victim via port scanning and OS fingerprinting, among other ways. Information theft includes data theft by unauthorized access and keylogging. On the other hand, a DoS threat affects the availability of services and can damage systems, which makes it one of the biggest threats to smart cities. In this dataset, the UDP, TCP, and HTTP protocols were used to perform both DoS and DDoS attacks.

TON_IoT Dataset
The TON_IoT dataset [50] is one of the newest cyber security datasets; it was collected from a testbed network for Industry 4.0 IoT and Industrial IoT (IIoT), which makes it suitable for evaluating CTI for a smart city. We used the TON_IoT train-test dataset, which is in the CSV format. The dataset contains a total of 461,043 instances and 9 types of attacks, which are presented in Table 3 along with the number of instances for each type.

Data Preprocessing
Since we were interested in evaluating CTI for threat classification, we deleted the normal traffic from the datasets. Additionally, in the BoT-IoT dataset, we omitted the pkSeqID feature since it represented an identifier for the traffic records. The datasets contain some categorical features that could not be processed by the neural network. Thus, we converted the nominal values into numeric values using sklearn LabelEncoder, which maps each categorical value to a numerical value [22]. We used sklearn StandardScaler to scale the data. For training and evaluation, several papers have split the dataset into training and testing sets, with a ratio of 20% for testing as in [19] and 30% for testing as in [21]. However, due to the size of the BoT-IoT dataset and the resource constraints of our device, we divided the data into training and testing sets with a ratio of 35% for testing, while keeping the same ratio of classes in both parts by using the stratify parameter.
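The preprocessing steps above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions: the function name `preprocess`, the label column name `category`, and the label value `Normal` are hypothetical placeholders, and fitting the scaler on the training part only is a design choice of this sketch, since the text does not state whether scaling was applied before or after the split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

def preprocess(df, label_col="category", id_col="pkSeqID",
               normal_label="Normal", test_size=0.35, seed=0):
    """Return (X_train, X_test, y_train, y_test) as NumPy arrays."""
    # Keep attack traffic only and drop the record identifier if present.
    df = df[df[label_col] != normal_label]
    df = df.drop(columns=[id_col], errors="ignore")

    # Encode the threat type as integer class labels.
    y = LabelEncoder().fit_transform(df[label_col].to_numpy())

    # Encode every non-numeric feature column as integers.
    X = df.drop(columns=[label_col]).copy()
    for col in X.columns:
        if not pd.api.types.is_numeric_dtype(X[col]):
            X[col] = LabelEncoder().fit_transform(X[col].to_numpy())

    # Stratified split: both parts keep the same class proportions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    # Assumption: fit the scaler on the training part only, then apply to both.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test
```

Fitting the scaler after the split keeps test-set statistics out of the training data; scaling the full dataset before splitting would be an equally short variant if that order is preferred.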

Model Implementation
The parameters of the hybrid model, including the number of CNN filters, the number of QRNN hidden units, and the dropout rate, were obtained during the training phase by trial and error. As mentioned in different studies [35], kernel size values of 3 and 5 are the most common, so we used a kernel size of 3 with both datasets in our experiment. Increasing the number of filters can help in extracting more details from a dataset [51]. Thus, for the first CNN layer, we used 64 filters, and for the other CNN layer, we used 128 filters. Additionally, we set the batch size for training to 128 and the number of epochs to 10. The details and the selected parameters of the hybrid DL model are presented in Figure 3.

Evaluation Tools and Metrics
Different evaluation metrics were used in this work to evaluate the performance of the proposed model, including accuracy, FPR, TPR, precision, recall, and F-Score. Accuracy represents the ratio of correctly classified threats to the total number of classified threats, so it demonstrates how accurate a model is in classifying threats [52]. The FPR represents the proportion of data misclassified as a different type of threat, and the TPR represents a model's ability to correctly classify threats. A low FPR and a high TPR demonstrate the ability of a model to correctly classify cyber threats [53]. Precision, recall, and F-Score were used to evaluate the overall performance of the proposed model; a high value of precision indicates a low FPR, and recall represents a model's ability to correctly classify threats. Equations (1)-(6) represent the evaluation metrics, where FP is false positive, TP is true positive, TN is true negative, and FN is false negative.
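As an illustration of how these metrics follow from the TP, TN, FP, and FN counts, the sketch below computes them per class from a multi-class confusion matrix using their standard definitions. Treating each class one-vs-rest is an assumption of this sketch, since the text does not state how the counts were aggregated across classes, and the function name `per_class_metrics` is hypothetical.

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class metrics from a confusion matrix.

    cm[i, j] = number of samples whose true class is i and predicted class is j.
    Each class is treated one-vs-rest (an assumption of this sketch).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as the class, but another class
    fn = cm.sum(axis=1) - tp          # the class, but predicted as another class
    tn = cm.sum() - tp - fp - fn

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)           # identical to the TPR
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "fpr": fp / (fp + tn),
        "tpr": recall,
        "precision": precision,
        "recall": recall,
        "f_score": 2 * precision * recall / (precision + recall),
    }

# Toy two-class example (values are illustrative only).
metrics = per_class_metrics([[8, 2],
                             [1, 9]])
```

A class that is never predicted makes the precision denominator zero, so a production version would need to guard that division.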

Results and Analysis
This section presents the results and analysis of the model implementation. We used Jupyter Notebook with the Python programming language, and we used the Keras and scikit-learn packages for data preprocessing and for implementing the proposed model. We trained the proposed model on a MacBook Air with a 1.6 GHz Intel Core i5 CPU and 8 GB RAM. Additionally, we implemented different state-of-the-art ML models on the datasets to compare their performance with that of our proposed model.
Figure 4 presents the confusion matrix of our proposed model on the BoT-IoT dataset. The results show that the model correctly classified most of the cyber threat categories. Furthermore, to illustrate the quality of the proposed model, the receiver operating characteristic (ROC) curve is plotted in Figure 5 for the BoT-IoT dataset. The results of our proposed model on the testing datasets are presented in Table 4. As shown in Table 4, the proposed model achieved high accuracy, with an average of 99.99% on both datasets. The TPR reached averages of 99.92% with the BoT-IoT dataset and 99.99% with the TON_IoT dataset. The proposed model achieved a low FPR of 0.0003 with the BoT-IoT dataset and 0.001 with the TON_IoT dataset. Thus, the proposed model showed good performance in classifying the threats with both datasets.
Moreover, to demonstrate the effectiveness of the QRNN, we implemented our proposed model with LSTM instead of the QRNN to compare performance. Cybersecurity threats are very critical [54-56], and the results shown in Tables 5 and 6 highlight that our proposed approach could be very effective in dealing with them. According to the results in Tables 5 and 6, our proposed model with the QRNN showed the same performance as our proposed model with LSTM in terms of accuracy, precision, recall, and F-Score. In terms of time, the proposed model with the QRNN showed better performance for both training and testing.
The average training time per epoch demonstrated that the QRNN trained faster than LSTM on both datasets, with a 418.3 s difference on the BoT-IoT dataset and a 19.8 s difference on the TON_IoT dataset. Additionally, for the classification time on the test dataset, the QRNN model performed faster than LSTM, with a 75 s difference on the BoT-IoT dataset and a 3 s difference on the TON_IoT dataset. The QRNN showed its effectiveness in increasing the speed of the model while providing high accuracy and a low FPR. Therefore, the model can be used for real-time CTI.
We further compared the performance of our proposed model on the BoT-IoT and TON_IoT datasets against the state-of-the-art models for the multi-class classification of threats. The results of these comparisons are shown in Tables 7 and 8. As shown in Tables 7 and 8, though K-NN [19] and RF [28] showed good performance for recall and F-score on the BoT-IoT dataset, our proposed model outperformed the state-of-the-art models on both datasets.
Additionally, we implemented different ML models to compare their performance with that of our model. The accuracy, TPR, and FPR values of each model are given in Tables 9 and 10. Our model performed better than the other four models, with accuracy measured as 99.99% on both datasets and low FPR values of 0.0003 on the BoT-IoT dataset and 0.001 on the TON_IoT dataset. The LSTM model showed good performance in terms of accuracy and FPR, while the GRU showed a high TPR compared to the LSTM on the BoT-IoT dataset. On the TON_IoT dataset, the GRU performed poorly compared to the other models.

Theoretical and Practical Implications
This work describes a model that can correctly classify cyber threats with a low FPR while considering time performance. Thus, the proposed model can improve decision making for risk mitigation so that appropriate protection measures against cyber attacks in smart cities can be taken [57,58]. Additionally, this model will benefit organizations and service providers in smart cities because of the high costs of implementing and maintaining cyber security solutions [59]. Organizations and service providers in smart cities can take accurate proactive measures against detected cyber attacks such as data breaches, which will help in saving costs [60]. Furthermore, our proposed model can be implemented in the cloud to monitor cyber security and to collect and update cyber threat data from the connected systems in smart cities.

Conclusions
A smart city facilitates the life of its citizens by providing better services than non-smart cities. Due to the extensive presence of digital data, smart cities are also vulnerable to various types of attacks. Machine-learning-based cyber threat intelligence can secure smart city environments by monitoring attacks and analyzing data threats in order to take prevention measures. In this paper, we have proposed a hybrid deep learning model to classify threats. The proposed model uses a CNN and a QRNN to improve feature extraction, increase classification accuracy, and lower the FPR. We evaluated our model on the BoT-IoT and TON_IoT datasets, and our results showed the effectiveness of our model in improving classification accuracy and lowering the FPR. In addition, the results showed that the QRNN model could improve classification time performance while providing high accuracy and a lower FPR than LSTM. Thus, the proposed model for CTI for smart cities can accurately analyze and classify data in real time.
One of the limitations of this work is its reliance on simulated datasets: due to security and privacy concerns for smart city citizens, it was difficult to evaluate the proposed model on real-time data. Additionally, for implementation, we evaluated the model as a centralized system. In future work, we can implement the proposed model in a distributed environment with parallel training to improve classification performance.

Funding: The authors would like to thank SAUDI ARAMCO Cybersecurity Chair, Imam Abdulrahman Bin Faisal University for funding this project.