Cyberattacks Detection in IoT-Based Smart City Applications Using Machine Learning Techniques

In recent years, the widespread deployment of the Internet of Things (IoT) applications has contributed to the development of smart cities. A smart city utilizes IoT-enabled technologies, communications and applications to maximize operational efficiency and enhance both the service providers’ quality of services and people’s wellbeing and quality of life. With the growth of smart city networks, however, comes the increased risk of cybersecurity threats and attacks. IoT devices within a smart city network are connected to sensors linked to large cloud servers and are exposed to malicious attacks and threats. Thus, it is important to devise approaches to prevent such attacks and protect IoT devices from failure. In this paper, we explore an attack and anomaly detection technique based on machine learning algorithms (LR, SVM, DT, RF, ANN and KNN) to defend against and mitigate IoT cybersecurity threats in a smart city. Contrary to existing works that have focused on single classifiers, we also explore ensemble methods such as bagging, boosting and stacking to enhance the performance of the detection system. Additionally, we consider an integration of feature selection, cross-validation and multi-class classification for the discussed domain, which has not been well considered in the existing literature. Experimental results with the recent attack dataset demonstrate that the proposed technique can effectively identify cyberattacks and the stacking ensemble model outperforms comparable models in terms of accuracy, precision, recall and F1-Score, implying the promise of stacking in this domain.


Introduction
Internet of things (IoT) is an interconnected scheme which promotes seamless information exchange between devices (e.g., smart home sensors, environmental sensors, automotive and road-side sensors, medical devices, industrial robots and surveillance devices) [1]. Recently, the emergence of the IoT has significantly increased its use in communities and services around the world, with the number of the linked IoT devices reaching 27 billion in 2017, and the number is projected to hit about 125 billion in 2030 [2]. IoT devices use different types of services, technologies and protocols. As a result, huge complexity will arise to maintain the future IoT infrastructures, which consequently leads to undesirable vulnerability to the system [3,4].
Since IoT devices are used in smart city applications, attackers can gain unauthorized access to the details of citizens' everyday activities without the knowledge of the user or administrator, or reconfigure devices to an unsecured setting (e.g., the Mirai botnet attack [5,6], in which malware turns networked Linux devices into remotely controlled bots). In 2019, Symantec recorded a 600% rise in attacks on the IoT platform [7], where attackers tried to exploit the connected nature of those devices.
Smart city applications pose several security challenges. Firstly, zero-day attacks can occur by exploiting vulnerabilities in different protocols in smart city applications. Secondly, is it possible to identify cyber-attacks from the network intelligently before they disrupt smart city operations? Thirdly, the IoT devices used in smart cities are typically resource (e.g., memory) constrained, have limited onboard functionality for security operations and send captured data to cloud servers for processing. Existing intrusion detection systems (IDS) do not take IoT devices into account. Combining all these issues, is it possible to design an IDS that is tailored for IoT networks?
The data collected from the IoT system is stored in the cloud computing environment, which has progressively advanced processors and adequate memory assets. However, the volume of data transmitted from the IoT terminal layer to the cloud has increased rapidly with the recent increases in IoT devices, and this causes delay and congestion problems in the cloud. Fog computing is designed as a possible solution to these problems [8]. The fog layer devices can share a greater amount of the computing load originally transferred to the cloud. This reduces energy consumption, network traffic and latency and eases the data storage and transmission problem. It also aims to push the computation process near the edge device, enabling a quick response to the IoT-based smart city applications. The benefits of cyber-attack detection in the fog layer are twofold [9]. Firstly, the ISP or network administrator can take necessary steps to prevent large damage if attacks (e.g., infected devices) are identified early in the fog layer. Secondly, it will not interrupt the normal flow of urban life.
In the literature, some techniques (e.g., signature-based techniques) have been proposed to resolve the above-mentioned issues. In the signature-based technique, a collection of previously produced signatures (attacks) is checked against the current suspicious samples [10]. If the signature extraction method is not fully able to capture the distinct features of attacks or attack families, it may lead to misdetection of an attack or produce a false alarm [11]. This technique is not suitable for identifying unknown attacks and suffers from high processing overhead. Machine learning techniques can detect attacks during runtime and take less processing time compared to other techniques.
In this paper, we explore a machine learning-based attack and anomaly detection technique in IoT-based smart city applications. This technique is able to identify infected IoT devices, which is a major challenge in the cloud computing environment [12,13]. The technique is based on the implementation of a training model in the distributed fog networks that can learn intelligently from training near the IoT layer devices and detect attacks and anomalies.
A single classifier is often insufficient to develop an effective IDS, motivating researchers to build an ensemble model of classifiers. Taking a multitude of models into account, ensemble methods combine those models to generate one final model. Research has demonstrated that the ensemble model produces better performance compared to the single classifier [14]. However, there are many factors (e.g., feature selection and base classifier) that need to be considered carefully to ensure enhanced performance by the ensemble method. The most suitable ensemble techniques are bagging [15], boosting [16] and stacking [17]. In this paper, we use individual classifiers as well as ensemble techniques to achieve better IDS performance in terms of different evaluation metrics such as accuracy, precision, recall and F1-Score.
The contributions of this paper are summarized as follows:
• We explore a machine learning based attack and anomaly detection technique through analyzing network traffic in distributed fog networks over the IoT-based systems.
• Existing works have generally used signature-based techniques to detect attacks and anomalies. These techniques suffer from high overheads and are vulnerable to unknown threats. In this paper, we explore the feasibility of ensemble-based learning as compared to single-model classifiers for identifying cyberattacks in IoT-based smart city applications. Further, we consider a multi-class classification setting as compared to the binary class prediction considered in most relevant works. On top of these, we consider an integration of feature selection and cross-validation, which, even for common machine learning approaches, has not been well addressed in the existing literature for this domain.
• Extensive evaluation incorporating the above integrations shows that the ensemble of machine learning-based classifiers works better in accurately identifying attacks and their types than single classifiers.
The paper is organized in the following sections. Section 2 discusses the related works. Section 3 discusses the IoT-based smart city framework. Section 4 presents the proposed anomaly detection model. Section 5 presents the experimental results. Section 6 gives concluding remarks.

Related Works
In the literature, many studies have been introduced to enhance the IDS performance. In this section, we highlight the recent notable works that have used machine learning techniques as well as ensemble methods.

IDS Based on Machine Learning Techniques
In [18], Pahl and Aubet introduced a machine learning based technique that can predict IoT service behavior by only observing the communication between distributed multi-dimensional microservices in an IoT site. This technique continually learns microservice models inside an IoT site, where K-means and BIRCH based clustering techniques [19] are applied. In this case, if cluster centers are within three standard deviations of each other, the clusters are merged into one. The model revises cluster formation using an online learning communication model. The overall accuracy for anomaly detection by this technique is 96.5% with a 0.2% false positive rate.
In [20], a joint trust light probe based defense (TLPD) mechanism was introduced to detect On and Off attack in an industrial IoT site, originated from malicious network nodes. Here, the On and Off attack meant a malicious node might target the IoT network when it is in an On or Off state. The framework was designed for the identification of anomalies using a light probe routing mechanism with the measurement of confidence estimation for each neighbor node.
Diro and Chilamkurti [10] proposed a deep learning model to detect distributed attacks in a social IoT network where they compared the performance of the deep model with a shallow neural network using the NSL-KDD [21] open source dataset that captures attack data in the distributed and centralized system. They evaluated the performance of the deep and shallow models with two-class (normal and attack) and four-class (normal, DoS, Probe, R2L and U2R) categories. For binary-class and multi-class identification, their model achieved accuracies of 99.2% and 98.27% as well as 95.22% and 96.75%, respectively, for the deep and shallow models.
In [22], Pajouh et al. proposed a two-stage dimension reduction and classification technique to detect anomalies in IoT backbone networks, where they detected low-frequency attacks such as user to root (U2R) and remote to local (R2L) attacks from the NSL-KDD dataset because of their detrimental consequences. They used principal component analysis (PCA) and linear discriminant analysis (LDA) as feature extraction methods to reduce the features of the dataset and then used Naive Bayes and K-nearest Neighbor (KNN) to identify anomalies, achieving an 84.82% identification rate.
In [23], Kozik et al. introduced an attack detection technique that used the extreme learning machine (ELM) [24] method in the Apache Spark cloud architecture. The ELM architecture and properties allow for efficient computation and analysis of the Netflow formatted data that are collected from the fog computing environment. This work concentrated on three main cases in IoT systems (scanning, command and control, and infected host) and attained accuracy levels of 99%, 76% and 95%, respectively.
In [25], Hasan et al. proposed a data analysis-based method to detect attacks on IoT infrastructure which overcomes the data processing overhead of signature based techniques. Their proposed solution is able to identify and prevent the systems from attacks when it faces any irregular behavior. They performed their experiment on the publicly available IoT dataset [18]. They explored several machine learning techniques such as DT, RF, LR, SVM and ANN, among which RF classifier yielded the best results.
In [26], a random forest-based anomaly detection model was proposed that can detect infected IoT devices at distributed fog nodes. Experimenting with the UNSW-NB15 dataset [27], their binary (normal and attack) random forest (RF) classifier considered only 12 out of 49 features from the dataset. These 12 features were extracted by using ExtraTreeClassifier [28]. Performance analysis showed that they achieved 99.34% accuracy with a 0.02% false positive rate.
In [29], a deep learning model was studied on NSL-KDD, UNSW-NB15, WSN-DS [30] and CICIDS 2017 [31] datasets to identify cyberattacks. They concluded that the deep learning model performs better compared to the other machine learning techniques.

IDS Based on Ensemble Techniques
In the literature, several ensemble-based IDSs have been proposed to enhance accuracy over base classifiers. In [32], an ANN and Bayesian net based ensemble method was proposed, which used the gain ratio (GR) feature selection technique. Its performance was evaluated on the KDD'99 [33] and NSL-KDD datasets, where the ensemble achieved 99.42% and 98.07% accuracy, respectively.
In [34], Haq et al. proposed an ensemble method that combines Naive Bayes, Bayesian Net and decision tree classifiers. They extracted the common features by using Best First Search, Genetic and Rank Search feature selection techniques. The ensemble technique produced a 98% true positive rate when tested with the 10-fold cross-validation method. Gaikwad et al. [35] introduced a bagging ensemble method where they used REPTree as a base classifier. Their model achieved 81.29% accuracy on the NSL-KDD dataset. In [36], Jabbar et al. proposed an ensemble method comprising alternating decision tree (ADTree) and KNN, and the performance evaluation demonstrated that the proposed ensemble achieved a better detection rate (99.8%) compared to the existing techniques.
In [37], Zhou et al. proposed a feature selection and ensemble method based IDS model where a combination of correlation-based feature selection (CFS) and the Bat algorithm [38] was used for optimal feature selection, followed by an ensemble method comprising DT, RF and Forest by Penalizing Attributes (Forest PA) algorithms. Experiments were performed on the NSL-KDD, AWID [39] and CIC-IDS2017 datasets, achieving 99.8%, 99.5% and 99.8% accuracy, respectively.
In [40], a hybrid intrusion detection system was introduced comprising a C5 classifier and a one-class support vector machine. The main focus of this work was to identify both common intrusions and zero-day attacks by using the Bot-IoT dataset [41], which contains IoT network traffic with several types of attacks. Performance analysis demonstrated that the proposed hybrid model attained higher intrusion detection accuracy compared to a Signature Intrusion Detection System (SIDS) and an Anomaly-based Intrusion Detection System (AIDS).
In [42], bagging and boosting ensemble methods were proposed where the authors used decision tree and random forest as the base classifiers. Experiments were performed on the NSL-KDD dataset and it was found that bagging with decision trees gives better results. Table 1 summarizes the notable works addressing intrusion and anomaly detection in networks using machine learning techniques and, in some works, their ensembles. Despite such a wide exploration, different works have used different data and achieved different performance outcomes, which is not surprising because machine learning algorithms often depend on the data, and differing contexts may result in different outcomes. However, UNSW-NB15, the latest version of data covering intrusion detection in IoT devices, has seen relatively little exploration. In this research, we used this dataset especially considering its recency. Contrary to Alrashdi et al. [26], our work is not limited to binary classification or the RF classifier alone. Further, in this paper, we explore the multi-class problem. In other words, our focus is not limited to identifying the normal/abnormal state of data but extends to detecting the exact type of attack in fog nodes within smart city infrastructure. Our work also differs in analyzing the performance of base as well as ensemble classifiers.

IoT-Based Smart City Framework
Smart city is an integrated framework where IoT technology, smart systems and information and communication technology (ICT) are collectively used to enhance the quality and performance of the different city services such as transportation, health systems, pollution control and energy distribution.
A smart city framework, as based on existing literature [26], is shown in Figure 1 and consists of the following three layers: terminal layer, fog layer and cloud layer. The cloud layer contains storage resources (e.g., servers and virtual machines) to store as well as maintain a large amount of data. The fog layer acts as a bridge between the terminal layer devices and the cloud layer and is responsible for computation and management at the edges of the network. The fog layer is more effective at identifying the different cyber-attacks than the centralized cloud layer. The terminal layer consists of a set of IoT devices (sensors) that are installed within the city to collect data.
For several reasons, IoT networks and applications are vulnerable to attacks. Firstly, most IoT devices have limited resources (e.g., small processing power and memory) and as a result suffer from limited processing capability. Secondly, IoT devices are interconnected through different protocols and the increasing number of IoT devices further causes latency in cloud centers. Thirdly, sometimes IoT devices are unattended, which makes it possible for an intruder to physically access them. Fourthly, the greater part of the data communication is wireless, exposing it to eavesdropping.
As a consequence, conventional IDS systems often fail to detect IoT attacks accurately [43]. Thus, an attacker can successfully compromise vulnerable IoT devices to connect to smart city routers and devices located at various places such as homes, shopping malls, restaurants, hotels and airports. By doing so, an attacker who compromises these IoT devices may obtain sensitive data such as credit card information, video streams and similar personal information.
One of the key issues that a smart city framework and infrastructure must ensure is its ability to deliver services in a sustainable manner to meet the needs of the current and future generations of citizens [44,45]. Some ongoing smart city projects such as those initiated in Hong Kong and Masdar city in Abu Dhabi [46] have already been criticized because of the vulnerable urban development plan and consequently doubts about sustainability of the services. The management of several facets of sustainability programs inside smart cities is facilitated by IoT, and this exposes organizations to the risks of failure from network unavailability, security breaches and damage to IoT infrastructure from natural disasters [47]. Further, sustainable operation of services such as intelligent transportation systems and smart buildings, and sustainable usage of resources such as water and energy supply, garbage disposal, etc., are highly dependent on IoT and related cyber-physical systems [48]. Machine learning techniques are used to better manage those smart city services and resources in an autonomous manner [45,49]. In addition, machine learning techniques can detect intrusion and cyberattacks in industrial IoT well [50], and can therefore enhance sustainability and ensure uninterrupted services in smart cities by thwarting attacks and intrusion on the respective IoT systems. However, there is a need for more research on machine learning implementation and model verification in terms of security and privacy [51].
In this paper, we hence explore the feasibility of both ensemble-based learning and single-model classifiers for identifying cyberattacks in IoT-based smart city applications.

Proposed Anomaly Detection Model
Our proposed model is shown in Figure 2. The model tracks the network traffic that goes through each fog node. Since fog nodes are closest to IoT sensors, it is more effective to identify cyber-attacks at the fog nodes than at the cloud center. In this way, an attack can be quickly detected, and the IoT and network administrators can be notified of such attacks, which will then assist them to evaluate and upgrade their systems.
Notably, IDS can be categorized as host-based IDS (HIDS) and network-based IDS (NIDS). In this work, we choose anomaly-based NIDS. HIDS requires the installation of software on each network-connected device to track and identify the malicious activity focused solely on that device and is not suitable for most IoT devices, which are resource constrained and support limited functionality (e.g., smart lamps, watches and door locks). Moreover, signature-based NIDS suffers from higher computational cost in storing attacks in a database and fails to detect a new attack in potential network traffic [52], which makes anomaly-based NIDS most suitable in our case. Data collected from this NIDS are used to build an ensemble of ML models to identify abnormal activities in the IoT fog networks.

Description of Used Datasets
We used the UNSW-NB15 [27] and CICIDS2017 [31] datasets. The reasons for using these datasets are twofold: firstly, they are relevant to the proposed smart city infrastructure concept of this paper, and, secondly, both contain samples of the recent types of attacks observed in IoT infrastructure.

UNSW-NB15 Dataset
The UNSW-NB15 dataset [27] is a recent and highly useful IDS dataset containing modern attacks. The dataset was developed in 2015 to track and identify normal and attack network traffic, and the raw network packets were generated by the IXIA PerfectStorm tool in the Australian Centre for Cyber Security (ACCS) cyber range lab [53]. The dataset has been preprocessed through cleaning, visualization, feature engineering and vectorization. The original dataset contains over 2.54 million samples, of which a random portion (175,341 samples) is used in our work. The considered dataset contains 56,000 and 119,341 samples, respectively, representing the benign and attack conditions. We divided our dataset into a training set (140,272 samples) and a test set (35,069 samples), each set containing attack and benign samples in the same ratio as the original dataset. The distribution of the different attack and anomaly samples across the dataset is shown in Table 2.
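The class-preserving split described above can be sketched with scikit-learn's `train_test_split` and its `stratify` argument. This is only an illustration on a small synthetic array: the data, the 0/1 label coding and the 20% test fraction (inferred from the sample counts given in the text) are assumptions, not the authors' code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labelled samples (assumption: 0 = benign, 1 = attack).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.array([0] * 320 + [1] * 680)  # roughly the benign/attack ratio in the text

# Hold out 20% for testing; stratify=y keeps the class ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

train_attack_ratio = y_train.mean()
test_attack_ratio = y_test.mean()
```

With stratification, the attack fraction in the training and test parts matches that of the full sample up to rounding.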

Data Pre-Processing
Feature selection is one of the key steps that greatly impacts the model's efficacy: by selecting only those features that are most relevant, it reduces over-fitting, improves accuracy and reduces training time. We used information gain ratio, which is the ratio of information gain to the intrinsic information proposed by Quinlan et al. [54], to select the top 25 features which are highly relevant to the prediction for both datasets. The information gain score of the features of UNSW-NB15 dataset is shown in Tables 3 and 4. Out of 42 (UNSW-NB15) and 78 (CICIDS2017) features, the top 25 were selected based on their information gain ratio. A feature with a higher ratio contributes more to distinguishing benign from attack samples. We only consider the features whose information gain was greater than a predetermined threshold of 0.5 for the UNSW-NB15 dataset and 0.85 for the CICIDS2017 dataset.
In the feature engineering phase, we first identify the types of features in the datasets. In the UNSW-NB15 dataset, among the above-mentioned 25 features, "proto" and "service" are categorical features and the rest are numerical. These categorical data are converted into vectors. While categorical data can be translated to vectors in different ways such as 'Label Encoding' and 'One Hot Encoding', the 'Label Encoding' [55] technique was used in this research.
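To make the gain-ratio criterion concrete, the following minimal sketch computes it for discrete features as the information gain about the class label divided by the entropy of the feature itself, and then ranks features by that score. The function names and the toy columns are hypothetical, and continuous features would first need to be discretized, a step the text does not describe.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the feature's own entropy."""
    total = len(labels)
    # Conditional entropy H(labels | feature): weighted entropy within each feature value.
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    intrinsic = entropy(feature)  # intrinsic (split) information
    return info_gain / intrinsic if intrinsic > 0 else 0.0

def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest gain ratio."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data with hypothetical values; all features are already discrete.
labels = ["attack", "attack", "benign", "benign", "attack", "benign"]
columns = {
    "proto": ["tcp", "tcp", "udp", "udp", "tcp", "udp"],      # separates the classes
    "service": ["dns", "http", "dns", "http", "dns", "http"],  # weakly informative
    "const": ["x"] * 6,                                        # carries no information
}
top = select_top_k(columns, labels, k=2)
```

In this toy case `proto` determines the label exactly, so it ranks first, while the constant column scores zero and is dropped.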
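As an illustration of label encoding, the sketch below maps the two categorical features named above to integer codes with scikit-learn's `LabelEncoder`. The example values are made up, and using one encoder per column is an assumption about how the step might be organized.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical values for the two categorical features named in the text.
proto = ["tcp", "udp", "tcp", "arp", "udp"]
service = ["http", "dns", "-", "http", "dns"]

proto_encoder = LabelEncoder()
service_encoder = LabelEncoder()

# fit_transform learns the sorted set of categories and maps each one to an integer.
proto_codes = proto_encoder.fit_transform(proto)
service_codes = service_encoder.fit_transform(service)

# The learned mapping can be inverted, e.g., when reporting predictions.
proto_roundtrip = list(proto_encoder.inverse_transform(proto_codes))
```

Label encoding keeps one column per feature but imposes an arbitrary integer order on the categories, which tree-based models tolerate better than distance-based ones.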

Theoretical Consideration
Several machine learning techniques and ensemble methods were used for model building and performance evaluation. We used LR [56], SVM [57], DT [58], RF [59], KNN [60] and ANN [10] machine learning algorithms, which are widely used in the literature to design IDS scheme.
Ensemble methods are a widely used approach in machine learning that combines several base models to generate one optimal predictive model [14]. The approach is based on the principle that a group of weak learners (models) comes together to form a strong learner, thereby increasing the model's accuracy. Three types of ensemble techniques are used in the literature. Bagging [15] is a parallel ensemble technique where the base learners are generated in parallel to improve the strength and accuracy of machine learning algorithms. Boosting [16] is a sequential ensemble technique where the base learners are generated in sequence to reduce the bias and variance of supervised machine learning techniques. Stacking [17] is an ensemble learning technique in which the predictions of several base classification models are collected into a new dataset that serves as the input for another classifier (the meta-classifier), which then produces the final prediction.
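The three ensemble types can be sketched with scikit-learn as follows. The choice of base learners, the logistic regression meta-classifier, the hyperparameters and the synthetic data are all assumptions made for illustration; the text does not specify the configuration actually used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the encoded traffic features.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    # Bagging: independent trees fitted on bootstrap samples, combined by voting.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                                 random_state=0),
    # Boosting: weak learners fitted in sequence, each reweighting earlier errors.
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Stacking: out-of-fold predictions of the base models feed a meta-classifier.
    "stacking": StackingClassifier(
        estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                    ("rf", RandomForestClassifier(n_estimators=50, random_state=0))],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5),
}

# Test-set accuracy of each ensemble.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}
```

`StackingClassifier` builds the meta-classifier's training set from cross-validated predictions of the base models, which corresponds to the "new dataset" described above.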

Evaluation Criteria
In this subsection, we describe some performance metrics such as accuracy, precision, recall, F1-Score and ROC curves which are widely used in evaluating the model performance in anomaly detection applications.
These performance metrics are defined by using the following parameters:
• tp = true positive
• tn = true negative
• fp = false positive
• fn = false negative
• p = total positive = tp + fn
• n = total negative = tn + fp
Accuracy indicates the overall performance of the model with respect to both benign and attack classes and is defined as follows:
\[ \text{Accuracy} = \frac{tp + tn}{p + n} \quad (1) \]
Precision gives the information about how many selected items are relevant among the retrieved items and can be defined as follows:
\[ \text{Precision} = \frac{tp}{tp + fp} \quad (2) \]
Recall gives the information about how many relevant items are selected from the total number of relevant items and is defined as follows:
\[ \text{Recall} = \frac{tp}{tp + fn} \quad (3) \]
F1-Score can be derived from both precision and recall as follows:
\[ \text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4) \]
The receiver operating characteristic (ROC) curve is utilized to summarize a classifier's performance over all possible decision thresholds in a graph, and it is generated by plotting the true positive rate (tpr) against the false positive rate (fpr). Equations (5) and (6) show the calculation of true positive rate and false positive rate, respectively.
\[ tpr = \frac{tp}{tp + fn} \quad (5) \]
\[ fpr = \frac{fp}{fp + tn} \quad (6) \]
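As a worked example, the standard definitions of these metrics can be computed directly from the four counts. The helper name `binary_metrics` and the example counts are hypothetical.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts."""
    p = tp + fn  # total positives
    n = tn + fp  # total negatives
    precision = tp / (tp + fp)
    recall = tp / p  # identical to the true positive rate
    return {
        "accuracy": (tp + tn) / (p + n),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "tpr": recall,
        "fpr": fp / n,
    }

# Example: 80 attacks caught, 20 missed, 10 false alarms, 90 benign passed.
m = binary_metrics(tp=80, tn=90, fp=10, fn=20)
```

For these counts the accuracy is 170/200 = 0.85, the recall is 0.80 and the false positive rate is 0.10.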

Experimental Results
Experiments were implemented using Python programming language and several libraries such as Pandas, Numpy, Matplotlib, sklearn and Keras on a HP (ELITEBOOK) laptop where the operating system was Windows 10 Education 64-bit and the processor was Intel(R) Core(TM) i5-8350U CPU @ 1.70 GHz 1.9 GHz with 16 GB RAM.
To test the performance of the base classifier as well as the ensemble classifier, 10-fold cross-validation (CV) was used where the provided dataset was randomly divided into 10 equal size subsets. Out of these 10 subsets, nine were used to build the model classifier and the remaining one was used as a test set. The same procedure was repeated ten times to ensure that each subset was used once as the test dataset. Finally, the mean accuracy summarized from each classifier in each fold was noted. Figure 3 represents different evaluation metrics for different classifiers on the training and test datasets.
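The 10-fold procedure can be sketched with scikit-learn's `KFold` and `cross_val_score`. The decision tree, the synthetic data and the random seed are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the selected features and labels.
X, y = make_classification(n_samples=500, n_features=25, n_informative=10,
                           random_state=0)

# Randomly partition the data into 10 folds; each fold is the test set exactly once.
folds = KFold(n_splits=10, shuffle=True, random_state=0)
fold_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                              cv=folds, scoring="accuracy")

# The reported figure is the mean accuracy over the ten folds.
mean_accuracy = fold_scores.mean()
```

For imbalanced attack classes, `StratifiedKFold` is a common alternative that keeps the class ratio similar in every fold.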
We first show the performance of the different classifiers in terms of accuracy, which is presented in Figure 3a. Here, the task was to classify an unknown sample into one of the ten categories for the UNSW-NB15 dataset and eight categories for the CICIDS2017 dataset, as shown in Table 3. Among the algorithms, SVM shows the poorest performance while DT and RF show better results compared to others. On the other hand, the stacking ensemble, constructed from base- and meta-classifiers, shows better performance compared to others. We show the performance measure in terms of precision in Figure 3b. The precision values for LR, SVM, DT, RF, ANN and KNN on the test datasets are 72% and 92%, 70% and 94%, 81% and 99.8%, 82% and 99.8%, 78% and 94.5% and 79% and 99.7%, respectively. The precision values of the ensemble methods bagging, boosting and stacking are 82% and 99.7%, 83% and 99.8% and 83% and 99.9%, respectively. Similar to the accuracy metric, SVM shows the lowest precision for the UNSW-NB15 dataset. However, LR shows the lowest precision for CICIDS2017 and RF shows better results compared to others. On the other hand, the stacking ensemble method performs better compared to others.
The performance in terms of recall is shown in Figure 3c. The recall for LR, SVM, DT, RF, ANN and KNN on test dataset are 72% and 94%, 71% and 92%, 81% and 98%, 82% and 99.8%, 79% and 94.3% and 78% and 99.7%, respectively. The recall values of the ensemble methods bagging, boosting and stacking are 82% and 99.8%, 83% and 99.9% and 83% and 99.9%, respectively. Once again, ensemble techniques yield better performance compared to the base classifier and the stacking ensemble method outperforms others. Finally, we demonstrate the performance measure in terms of F1-score in Figure 3d. The F1-score for LR, SVM, DT, RF, ANN and KNN on test dataset are 71% and 92%, 70% and 94%, 80% and 99.7%, 81% and 99.7%, 78% and 94% and 78% and 99.7%, respectively. The F1-score of ensemble methods bagging, boosting and stacking are 81% and 99.8%, 81% and 99.9% and 83% and 99.9%, respectively. Once again, ensemble techniques show better performance than base classifier and the stacking ensemble method outperforms other classifiers considered in this research.
The results show that the ensemble of learning models provides better performance than the single model classifiers on both test datasets. This implies, while existing works on the data have focused on single learning model, ensemble classifiers such as stacking represent a promising approach for application in this domain.
We also experimented with how the classifier performs when applied in a multi-class classification context. More precisely, we considered each type of attack as a separate class and then assessed the classifiers' ability in identifying the attack from a normal situation. The results are shown in Tables 5 and 6 for UNSW-NB15 and CICIDS2017 datasets, respectively. The results illustrate that, for various types of attack, DT and RF perform better in comparison to other algorithms. On the other hand, the stacking ensemble technique shows significant improvement compared to bagging and boosting in some cases. For example, on the UNSW-NB15 dataset, in the DoS attack, stacking yields an F1-score of 0.45 vs. 0.24 for bagging and boosting, while, in the Worm attack, these scores are 0.57, 0.37 and 0.33, respectively. In most of the other types of attacks, stacking attains an F1-score of above 0.75. On the other hand, on CICIDS2017, in the Bot attack, stacking ensemble achieved 0.950 vs. 0.898 and 0.942 for bagging and boosting.
Table 6. Detection of various classes in multi-class scenario on CICIDS2017 dataset.

Table 6 column layout: Algorithm, followed by TPR, FPR and F1-Score for each of the Benign, DDoS, DoS and Web classes.
Figure 4 shows the Receiver Operating Characteristic Curves for base and ensemble classifiers on the UNSW-NB15 dataset. We found that, among the base classifiers, ANN shows better performance. On the other hand, among the ensemble techniques, boosting and stacking demonstrate almost the same results. Tables 7 and 8 show a comparison of the multi-class classification performance attained by ensemble approaches to a recent work [29] on the UNSW-NB15 and CICIDS2017 datasets, respectively. The accuracies attained in our work using LR, SVM, DT, RF and KNN are 72.32% and 93.6%, 71.49% and 92%, 80.69% and 99.7%, 81.77% and 99.7% and 78.23% and 99.6%, respectively, while those in [29] are, respectively, 53.8% and 87%, 58.1% and 79.9%, 73.3% and 94%, 75.5% and 94.4% and 62.2% and 90.0%, albeit with some differences in the way the datasets were used. In [29], the researchers experimented with the boosting ensemble technique and achieved accuracies of 60.8% and 64.1%, which are significantly lower than the accuracies attained in our work (83.3% and 99.9%) using the stacking ensemble.
The results of individual classifiers as well as ensemble methods for the binary class classification are shown in Tables 9 and 10 on the UNSW-NB15 and CICIDS2017 datasets, respectively. We used the same metrics as in the multi-class classification. The highest accuracies of 95.45% and 99.7% and F1-scores of 95% and 99.8% by an individual classifier were achieved with the RF classifier, and the values of those metrics rose to 96.83% and 99.9% and 97% and 99.9%, respectively, when the stacking ensemble technique was used.
A possible reason for the proposed model's significantly better performance compared to Vinayakumar et al. [29] is that they did not consider any feature selection: the existing work experimented with all features of both datasets. In contrast, our proposed model applies an information gain-based feature selection technique and uses only the 25 most important features, ranked by their information gain ratio.

Notably, a question may be raised as to the complexity of ensemble models compared to a single classifier. With technological advances, however, processing units such as mobile devices are becoming increasingly fast and memory resources increasingly cheap, which is one reason fog computing has potentially seen the application of a wide range of algorithms, including ensemble techniques [61,62]. There is also active investigation into the efficient allocation of resources in fog computing [63]. Further, research has devised a fog system architecture that can exploit ensemble learning without substantially increasing the latency of the system [62]. Arguably, the stacking approach considered in this article can be rolled out using such an architecture and an efficient resource allocation mechanism. Thus, despite some increase in complexity, the finding that stacking can outperform single classifiers for cyberattack detection in IoT smart city applications has notable value, especially given the high cost of missing a cyberattack.
For example, the model building times over ten runs for each of the base classifiers (DT, RF and ANN) and the stacking ensemble technique on both datasets are shown in Table 11. The model building times for DT, RF, ANN and stacking are 1.4, 2.17, 6.8 and 25.6 s for the UNSW-NB15 dataset and 5.3, 4.35, 7.4 and 27.09 s for the CICIDS2017 dataset, respectively. Thus, DT takes the least time to build the model for UNSW-NB15, while RF is the fastest for CICIDS2017. Since the stacking ensemble model combines several base classifiers, it takes longer to build on both datasets. The time taken to test the model on a single sample is, respectively, 0.48, 2.53, 1.91 and 5.70 µs for UNSW-NB15 and 0.42, 1.57, 1.80 and 4.19 µs for CICIDS2017, so DT is the fastest at test time on both datasets and stacking the slowest. Even so, every model, including the stacking ensemble, takes only a few µs to test whether an activity is malicious or not.
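Build and test times of this kind can be measured with a simple timing harness. The sketch below is illustrative only and is not the measurement code behind Table 11. The `MajorityClassifier` stand-in, the `time_model` helper and the toy data are hypothetical, and averaging over the ten runs is an assumption about how the runs are aggregated.

```python
import time


class MajorityClassifier:
    """Hypothetical stand-in model: predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def time_model(make_model, X_train, y_train, X_test, runs=10):
    """Return (mean build time in seconds, mean per-sample test time in microseconds)."""
    build_total, test_total = 0.0, 0.0
    for _ in range(runs):
        model = make_model()
        t0 = time.perf_counter()
        model.fit(X_train, y_train)   # model building
        t1 = time.perf_counter()
        model.predict(X_test)         # testing on all held-out samples
        t2 = time.perf_counter()
        build_total += t1 - t0
        test_total += t2 - t1
    per_sample_us = test_total / runs / len(X_test) * 1e6
    return build_total / runs, per_sample_us


# Hypothetical toy data; any object with fit/predict can be timed the same way.
X_train = [[0.1], [0.2], [0.9]]
y_train = ["benign", "benign", "attack"]
X_test = [[0.15], [0.8]]
build_s, test_us = time_model(MajorityClassifier, X_train, y_train, X_test, runs=10)
print(f"build: {build_s:.6f} s, test: {test_us:.3f} µs per sample")
```

Dividing the prediction time by the number of test samples gives a per-sample figure in µs that is comparable across datasets of different sizes.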

Conclusions
In this paper, we explored the feasibility of ensemble-based learning alongside single-model classifiers for identifying cyberattacks in IoT-based smart city applications. Our experiments with recent attack datasets show that our ensemble approach, especially stacking, performs better than single models in distinguishing attacks from benign samples. Our approach employs an information gain-based feature selection technique to identify the most influential features before building the model. Furthermore, in classifying attack types, our ensemble approach with stacking also leads to better performance than the single or other ensemble models used in recent works in terms of accuracy, precision, recall and F1-score. Our future work will explore deep learning techniques to further enhance IoT attack detection performance.
Lastly, with automation and smart cities becoming increasingly popular, they are also increasingly being exposed to cyber threats. A denial of access or privacy intrusion within an automated system can greatly harm individual citizens and carry a substantial cost at both individual and jurisdiction levels. There can also be health risks if systems handling emergency events (e.g., accident and fire) are compromised. Our results indicating that stacking of classifiers can better detect cyberattacks in the smart city systems go beyond technical contributions and carry economic and social implications. Future research will provide further insights in this respect.