An Ensemble Tree-Based Model for Intrusion Detection in Industrial Internet of Things Networks

Abstract: With less human involvement, the Industrial Internet of Things (IIoT) connects billions of heterogeneous and self-organized smart sensors and devices. IIoT-based technologies are now widely employed to enhance the user experience across numerous application domains. However, heterogeneity in the node source poses security concerns affecting the IIoT system, and due to device vulnerabilities, IIoT has encountered several attacks. Therefore, security features, such as encryption, authorization control, and verification, have been applied in IIoT networks to secure network nodes and devices. However, the requisite machine learning models require some time to detect attacks because of the diverse IIoT network traffic properties. Therefore, this study proposes ensemble models enabled with a feature selection classifier for intrusion detection in the IIoT network. The Chi-Square statistical method was used for feature selection, and various ensemble classifiers, such as eXtreme gradient boosting (XGBoost), Bagging, extra trees (ET), random forest (RF), and AdaBoost, were used for intrusion detection on the Telemetry data of the TON_IoT datasets. The performance of these models is appraised based on accuracy, recall, precision, F1-score, and confusion matrix. The results indicate that the XGBoost ensemble showed superior performance, with the highest accuracy among the models across the datasets in detecting and classifying IIoT attacks.


Introduction
Automated network systems have globally adopted modern technologies in various fields to ease their operations and to collect large amounts of data. The Internet of Things (IoT) is the next level of information technology (IT) development that can be used to connect the world, ranging from a straightforward to a unique application to an IoT-based system. IoT is a collection of integrated, cloud-connected devices that customers use to receive IT services by fusing internet protocol with electronics-related properties [1]. The protocols used in IoT systems may include cybersecurity issues [2] that could affect the entire system. The devices connected to the Industrial Internet of Things (IIoT) are open to attack by cybercriminals because they lack even the most basic security measures. They are therefore vulnerable to hacking and botnet attacks, which are used to launch DDoS attacks against industries [3].
It is therefore crucial to identify and effectively categorize cyberattacks that exploit these security gaps. Utilizing an ensemble of ML models, this study attempts to develop an accurate and effective intrusion detection system (IDS) to recognize and categorize cyberattacks on an IoT/IIoT network. The learning-based methodology adopted uses tree-based ensemble classifiers, such as eXtreme gradient boosting (XGBoost), Bagging, extra trees (ET), random forest (RF), and AdaBoost, trained on the seven Telemetry datasets of TON_IoT: Fridge, Thermostat, GPS Tracker, Modbus, Motion Light, Garage Door, and Weather. Tree-based ensemble models are frequently used for supervised learning problems [2]. The power of ensemble classifiers lies in their capacity to combine many models' predictions into a model that improves on any single model. Tree-based ensemble approaches operate at their best when the base learners are distinct from one another, which can be accomplished through randomization [4] or by employing significantly distinct training procedures for each decision tree.
Randomization in tree growth results in greater tree diversity and lowers correlation, i.e., it increases the independence of the decision trees. However, because each classifier in an ensemble technique must be trained, it can be computationally expensive. If a huge dataset is involved, this cost may increase significantly. As a result, we concentrate on the ensembles of ML models widely used in the literature, particularly XGBoost, due to its efficiency and scalability. The noisy network traffic collected from IoT devices has many different traffic aspects. Building ML-based models takes more time, and because IoT network traffic contains a multitude of features, these features have an impact on IDS functionality and performance [5]. Feature selection is required to develop cost-efficient and time-efficient models for intrusion detection in IoT [6,7]. The study used criteria including accuracy, recall, precision, F1-score, and confusion matrix to evaluate how well the models performed.
Researchers have created and used various machine learning (ML)-based models, frequently combining them with feature selection methods to enhance their functionality and performance. Promising outcomes for the identification capabilities of ML have been produced using a set of performance metrics, but these models are not yet trustworthy for actual industrial IoT networks. The strategy of such studies is to outperform cutting-edge outcomes for a particular dataset instead of learning more about an ML-based IDS application [8]. As a result, far more academic study has been done than actual deployment, in contrast to other fields where deployments took place. This may result from the high errors generated when compared to other fields [8]. Hence, these models are unreliable for use in a real-world setting. Furthermore, a single dataset with various features could be difficult to collect or store in a real-time IoT network connection. Besides, when ML-based methods are used, their hyper-parameters in most cases require optimization for a better result. The optimization of hyper-parameters and feature selection will generally make the ML-based techniques run more efficiently.
The necessity to minimize risk and potential threats to IIoT systems has recently attracted academic interest. Effective IDSs specifically designed for IoT applications must be created. For training and evaluating such IDSs, a current and comprehensive IIoT dataset is needed. For assessing IDS-enabled IIoT systems, however, there are insufficient benchmark IIoT datasets that can be freely accessed or obtained from the internet [9,10]. This study uses new, data-driven, real-world IoT/IIoT datasets to address these issues. The data contain a label feature that separates the attack and normal classes and a feature that categorizes the attack subclasses targeting IoT/IIoT network nodes for multiclass classification problems [11]. In addition, the TON_IoT dataset contains telemetry information for IoT and IIoT services [12]. With various IIoT-based IDS datasets, this study intends to evaluate the generalizability of combinations of feature selection techniques and ensemble classifiers.
The following summarizes the main contributions of the study:

(1) A thorough approach is proposed to create the best cyberattack multiclass classification model in IIoT systems.
(2) This study suggests a feature selection strategy for IDS in the IIoT, utilizing ranked features from the Chi-Square statistical model and analyzing the link between feature variance and detection accuracy.
(3) Seven (7) ToN-IoT-based Telemetry datasets were employed to evaluate how well the model performed. In addition, extensive investigations have assessed the performances of an ensemble of ML-based models using these seven datasets.
(4) The performance of the ensemble models was verified by comparing them with the baseline research that used the same datasets and with other existing approaches that used the same datasets.
The remaining sections of this study are structured as follows: Section 2 presents some related work in IoT/IIoT-based IDS studies. The materials and methods used for the study are covered in Section 3. Section 4 outlines the outcomes of the experiments conducted. Finally, Section 5 concludes the study and provides future perspectives.

Related Work
This section describes some state-of-the-art research on ML models and IDSs to classify attacks on IoT networks. The idea of intelligent devices, such as refrigerators, doors, and GPS trackers, is not new. This section recaps some existing works that are related to smart devices. Many researchers have used the IoT to suggest gadgets that can be remotely monitored for various activities. Such IoT-based systems are exposed to various threats at the device, network, or application layers, which can be exploited by an intruder [11]. Common cyberattack categories that can be launched against IoT-based networks include malware, SQL injection, scanning, DoS, backdoor, ransomware, eavesdropping, and DDoS, among others [12]. These attacks can be grouped according to origin and layer.
The processing and analysis of the various methods used for intrusion detection in networks and IoT-based applications play a key role in society. The evaluation of the accuracy and effectiveness of IIoT security solutions relies heavily on the datasets used, which represent IoT-based operations in the physical realm [12]. However, the major challenge in evaluating IDSs specifically designed for IoT/IIoT purposes is the lack of real-world datasets that represent IoT/IIoT applications. The creation of IIoT-based IDSs is hampered by the lack of such datasets, considering that such strategies should perform well when empirically validated and evaluated [13,14]. The authors of [15] reviewed publications on ML-based and data mining models for IDS classification in cybersecurity. They claimed that a large gap in the literature prevents the development of effective anomaly-based intrusion detection methods, since labeled datasets are not readily available. This is mostly because of privacy concerns, as most IoT data from big businesses are not shared with the academic community [14].
A novel IoT traffic dataset called "Sensor480," presented by the authors in [16], contains 480 cases with three (3) properties and a binary class of normal and "Man-In-The-Middle" attacks. Based on this dataset, an IDS was created and examined using various ML-based models. The dataset was split in an 80-20% ratio, various performance metrics were used to appraise the proposed models, and DT outperformed the other models with 100% accuracy. Additionally, the authors in [17] presented an IDS based on ML-based ensemble models to recognize various forms of IoT cyberattacks. Using the datasets from IoT-23 [18], IoTDevNet [19], DS2OS [20], IoTID20 [21], and IoT Botnet [22], these models were evaluated based on a variety of performance indicators. With the highest accuracy values on the NSLKDD (99.27%), IoTDevNet (99.97%), DS2OS (99.39%), IoTID20 (99.99%), and IoT Botnet (99.991%) datasets, the outcome demonstrates that Bi-LSTM outperformed other models. However, most of the presented datasets are outdated and do not contain recent IIoT-based intrusion attacks. The Windows 10 dataset from ToN IoT [12] was used by the authors in [23] to pick the best features. They used the correlation function and the ReliefF feature selection schemes. With accuracy scores of 94.12% for the correlation function dataset and 98.39% for the ReliefF dataset, the Medium NN model outperformed other models. The results of their proposed model show that there is a need for improvement in IDS accuracy. The model is also very slow and consumes a large share of the processor.
To reduce the number of features of the Linux, Network, and Windows 7 and 10 multiclass datasets of the ToN-IoT dataset, the authors in [24] applied the Chi2 approach and balanced the dataset for the best categorization using the synthetic minority oversampling technique (SMOTE). They employed various ML-based models, with XGBoost outperforming all others on all datasets according to the numerous performance criteria they used to assess the suggested models. In [25], the authors applied supervised and unsupervised ML over the NF-ToN-IoT-v2 dataset to provide a thorough model of a network IDS (NIDS). It was demonstrated that the XGBoost classifier, which obtained an F-score of 98.8%, produced the best results when supervised learning was implemented with Azure automated ML (AML). The random forest classifier, with an F-score of 98.6%, produced the best results when a specially designed automated ML (AE2EML) was used. The suggested ML-based NIDS obtained a Silhouette score of 0.553, a Calinski-Harabasz index of 1533106, and a Davies-Bouldin index of 0.631 using clustering with Principal Component Analysis (PCA), performed by PyCaret-automated ML. The proposed model performed excellently, but it used old datasets that did not contain recent IIoT-based network attacks.
By examining the applicability of ML-based algorithms to the detection of abnormalities within the data of such networks, the authors of [26] concentrated on the security element of IoT networks. Their study investigates ML algorithms that have been effectively applied in comparable circumstances and contrasts them using a variety of factors and techniques. The RF algorithm produced the best results, with a 99.5% accuracy rate. The authors of [27] presented an IDS with an ensemble classifier enabled by a feature selection classifier. The study utilized the Correlation Coefficient (CC) method for feature selection before classifying the dataset for the detection of various attacks using various classifiers, such as NB, DT, and ANN. On the UNSW-NB15 dataset, the system detected DoS attacks with an accuracy of 98.54% with a classifier ensemble that uses a subset of the features. However, the dataset used to test the model is not an IIoT-based network attack dataset, the study does not employ feature selection methods to remove the irrelevant features from the dataset used, and the issue of imbalanced data is not considered, thus reducing the performance of the model.
The authors in [28] used the top 13 IG characteristics with the C5 classifier to obtain an improved accuracy of 89.76% and a better FAR of 1.68. The study recommended that IG be used for choosing features for IDS. The top ten ranked IG attributes were used in the system to achieve a higher accuracy of 93.23% with 6.77% FAR. The authors in [29] obtained six reduced features using the multi-objective feature selection method on the CICIDS 2017 dataset. The system delivered an accuracy of 99.90% using an ELM classifier. To detect cyberattacks, the authors in [30] suggested using LSTM networks with parameter optimization by Stochastic Gradient Descent (SGD) for the creation of an IDS. The study obtained an accuracy of 99.91% for the ISCX dataset and 98.22% for the AWID dataset.
The top 10 attributes of the GR technique were used by the authors in [31], and their layer design was validated on a generated dataset. In contrast to previous rule-based and tree-based learners, the design performed better with the J48 classifier for recognizing DoS attacks. A decision tree-based multi-layer framework to identify DDoS attacks was provided by the authors in [32]. The system recognized ICMP, TCP, and UDP flood attacks on a generated dataset, with an accuracy of 99.98%, using eight explicitly selected features. The authors in [33] utilized nature-inspired techniques for feature selection with forecasting and chaos methods. The performance of the model was evaluated using a model created in NS-3. For the identification of DoS attacks at the transport and application layers, the approach obtained a detection rate (DR) of 94.3%. The authors in [34] employed the wrapper approach for feature selection in IDS. The study's performance was tested using the Cowrie honeypot dataset, which contains various cyberattacks, with an accuracy of 97.4% using the SVM classifier.
The authors in [35] compared the results obtained with and without PCA. Prior to being used in various ML-based techniques, the dataset was first submitted to Principal Component Analysis (PCA) for feature selection. Their experimental investigation demonstrates that utilizing PCA greatly reduces algorithm execution time, with a smaller number of features, while producing the same results as not using PCA. In addition, when compared to SVM, the DT and RF algorithms classified DDoS packets more accurately. Matplotlib was used to create a graph to display the results. The IoT-23 dataset was used for their experimental analysis. The authors in [36] developed an IDS model based on a hybrid AI model for the classification of attacks on an IoT-based system. The CIC-IDS2017 and UNSW-NB15 datasets were used to evaluate the performance of the suggested model. The model fared better, with a detection rate of 99.75% and an accuracy of 99.45%.
The authors of [6] presented a hybrid rule-based feature selection DL-based IDS paradigm for IIoT to train and validate data extracted from TCP/IP packets. A hybrid rule-based feature selection and deep feedforward neural network model were used to implement the training procedure. NSL-KDD and UNSW-NB15, two well-known network datasets, were used to test the suggested approach. According to the findings of the performance comparison, the suggested strategy outperforms other pertinent methods, with an accuracy, detection rate, and FPR of 99.0%, 99.0%, and 1.0%, respectively, for the NSL-KDD dataset, and of 98.9%, 99.9%, and 1.1% for the UNSW-NB15 dataset. The recommended method is suitable for IIoT intrusion network attack classification, according to simulated trials utilizing a variety of assessment metrics.
The authors of [37] proposed the RDTIDS intrusion detection system (IDS) for IoT networks. The RDTIDS integrates multiple classifier methodologies, such as REP Tree, JRip algorithm, and Forest PA, which are based on decision tree and rules-based principles. The first and second methods specifically classify the network traffic as attack/benign by using features from the data set as inputs. The outputs of the first and second classifiers are used as inputs for the third classifier, together with characteristics from the initial data set. The extensive experiments demonstrate the proposed IDS' effectiveness over existing state-of-the-art schemes in terms of accuracy, detection rate, false alarm rate, and time overhead. These findings were made using the CICIDS2017 dataset and the BoT-IoT dataset.
The authors in [38] suggested a novel ensemble Hybrid IDS (HIDS) for IoT device security by fusing a C5 classifier and a One-Class Support Vector Machine classifier. The benefits of Signature-based IDS (SIDS) and Anomaly-based IDS (AIDS) are combined in HIDS. With high detection accuracy and low false-alarm rates, this system seeks to identify both known intrusions and zero-day threats. The Bot-IoT dataset, which includes legitimate IoT network traffic and various attacks, is used to assess the proposed HIDS. Studies reveal that, compared to SIDS and AIDS approaches, the proposed hybrid IDS offers a higher detection rate and a reduced percentage of false positives.
To identify out-of-norm actions for cyber threat hunting in the IIoT, the authors of [39] presented an ensemble DL-based model that combines LSTM with the Auto-Encoder (AE) architecture. Most of the prior literature did not consider the imbalanced nature of IIoT datasets, which led to low accuracy and performance. To resolve this issue, the suggested approach derives new, balanced data from the imbalanced datasets and feeds these balanced data into the deep LSTM AE anomaly detection model. In addition, the proposed ensemble model is compared with the advanced related models Stacked Auto-Encoders (SAE), Naive Bayes (NB), Projective Adaptive Resonance Theory (PART), Convolutional Auto-Encoder (C-AE), and Package Signatures (PS)-based LSTM (PS-LSTM).
In the reviewed literature, it was observed that almost all the studies used ISCX, CICIDS, UNSW-NB15, and KDD Cup 1999, which are non-IoT/IIoT-based datasets. They are datasets for network intrusions that contain HTTP DoS attacks. The IEEE 802.11-related Medium Access Control (MAC) layer attacks are part of the AWID dataset. This study acquired datasets containing network traffic, operating system traces, and IoT telemetry data from diverse IoT/IIoT sources. Additionally, the suggested dataset includes various valid and malicious IoT-related events, incorporating the reality of attacks and legitimate occurrences.
This study proposes a feature selection-based IDS enabled with various ensemble classifiers for detecting several attacks in an IIoT-based network. Very little research using the TON-IoT dataset is shown in this review. When the ML-based ensemble model was compared with the baseline findings, it was discovered that the frequently misclassified attacks are not discussed. The proposed ensemble classifiers enabled with feature selection will be applied to the IoT telemetry datasets, and the results of the proposed models will be compared with the baseline analysis.

Materials and Methods
This section describes a robust framework to detect and classify cyberattacks on IoT network trails. The successive steps of the ensemble classifier process are displayed in Figure 1, and preprocessing is the first step. At this stage, the dataset is explored for the number of instances, the number of features, the relationship between the features, the correlation between the features, etc. The details of the dataset used for the performance evaluation are discussed, followed by the details of the performance metrics used for evaluation purposes. Finally, a training and testing dataset is created from the cleaned dataset. The ensemble models use the training set to learn, while the test set is used to assess the performance of the model.

Data Preprocessing
Data preprocessing is a crucial first step in streamlining the training of ML models. For the purpose of research, all datasets are openly accessible for download. To minimize storage space requirements and to prevent redundancy, duplicate samples (flows) are eliminated. The flow identifiers, IP addresses, ports, and timestamps are removed to avoid prediction bias toward the attackers' end network nodes. Then, using a categorical encoding approach, numerical values are assigned to the strings and nonnumeric characteristics. The features in these datasets include protocols and services, which are stored as their native string values, whereas ensemble classifiers are built to function effectively with numerical data.
One-hot encoding and label encoding are the two primary methods for encoding the features. The former converts a feature with X categories into X binary features, using 0 to indicate that a category is absent and 1 to indicate that it is present. However, this increases the dataset's dimensionality, which could impact the ML models' effectiveness and performance. Hence, each category is converted to an integer using the label encoding technique.
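The difference between the two encodings can be illustrated with a minimal sketch. The helper names and the sample column below are hypothetical and are not taken from the TON_IoT files; the integer assigned to each category simply follows the order of first appearance, which is an assumption made for illustration.

```python
# Illustrative sketch only: helper names and sample values are assumptions.

def label_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {cat: idx for idx, cat in enumerate(dict.fromkeys(values))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Expand one categorical column into one 0/1 column per category."""
    categories = list(dict.fromkeys(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


door_state = ["open", "closed", "closed", "open"]  # hypothetical sample column

labels, mapping = label_encode(door_state)          # one integer column
one_hot, categories = one_hot_encode(door_state)    # one column per category
```

Label encoding keeps a single column regardless of the number of categories, whereas the one-hot result grows by one column per category, which is the dimensionality cost described above.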
Categorical features were converted to numerical values for straightforward ML technique application. For instance, the categorical values "open" and "closed" for the door state feature from the GarageDoor dataset were converted into "0" and "1". Furthermore, duplicate, incompatible, and missing values were handled. Additionally, normalization permits the equal weighting of all features, because network traffic properties are complex and some features take much larger values than others. This could cause the ensemble model to weigh them more heavily and pay more attention to them. The min-max scaler uses Equation (1) to rescale all values of each feature:

$$X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (1)$$

where $X^{*}$ is the new feature value between 0 and 1, $X$ is the original feature value, and $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of the feature, respectively. The dataset is then split into training and testing segments, stratified according to the label feature, which is crucial given the class imbalances of the datasets.
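A minimal sketch of these two steps, min-max scaling and a label-stratified split, is given below. The function names, the sample values, the split ratio, and the random seed are all assumptions chosen for illustration; the study's actual split settings are not stated in this section.

```python
import random
from collections import defaultdict


def min_max_scale(column):
    """Rescale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def stratified_split(labels, test_ratio=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one test sample per class (assumed policy for tiny classes).
        n_test = max(1, round(len(indices) * test_ratio))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


temperature = [2.0, 4.0, 6.0, 10.0]   # hypothetical raw feature values
scaled = min_max_scale(temperature)

toy_labels = [0] * 8 + [1] * 2        # imbalanced toy labels (0 normal, 1 attack)
train_idx, test_idx = stratified_split(toy_labels, test_ratio=0.5)
```

Because the split is drawn per class, the minority class appears in both segments in roughly the same proportion as in the full dataset, which a purely random split does not guarantee on imbalanced data.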

The Chi-Square Statistical Feature Selection Model
This method was used in this study to select the most relevant features. Two variables are usually involved in using this model for feature selection: typically, the likelihood of occurrence of category C is related to the likelihood of occurrence of feature t. In IDS classification, the proposed approach considers whether feature t and category C are independent. If feature t and category C are independent, feature t cannot be used to determine whether a sample falls under category C. It might be difficult to determine the degree of dependence between t and C in training, especially if they are only weakly linked. Therefore, their relevance can be evaluated using the Chi-square test, a statistical method that quantifies the connection between feature t and category C. A two-way contingency table was used to express the feature t and the category C.
Assuming feature t and category C are independent, the statistic follows a chi-square distribution with one degree of freedom. The higher the chi-square score of feature t for category $C_j$, the more category information the feature holds, and the more t and $C_j$ have in common. The chi-square score of feature t for category $C_j$ is defined in Equation (2), which expresses the relationship between feature t and category $C_j$. The more dependent feature t is on the class category $C_j$, the more the feature matters. When $\chi^2(t, C_j) = 0$, the label class $C_j$ and feature t are independent. The value for one class, denoted $\chi^2(t, c)$, can be calculated using Equation (2). To combine the values over all classes, $\chi^2(t, c)$ is first determined for feature t in each class, giving one score for each of the m classes. Equation (3) is used to calculate the mean $\chi^2(t, c)$, the score of feature t across all classes. Equation (4) is used to determine the maximum $\chi^2(t, c)$ of a feature across all classes. After the features have been sorted by their $\chi^2$ scores, a threshold value is used to determine the appropriate number of features.
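The scoring and thresholding procedure can be sketched as follows. Since Equations (2)-(4) are not reproduced in this text, the sketch assumes the standard 2x2 contingency form of the chi-square statistic for a binary feature, $\chi^2(t, c) = N(AD - CB)^2 / [(A+C)(B+D)(A+B)(C+D)]$, an unweighted mean over classes (a class-prior-weighted mean is a common alternative), and an illustrative threshold of 3.84, the 5% critical value for one degree of freedom. The function names and toy data are hypothetical.

```python
def chi_square_score(feature, labels, target_class):
    """Chi-square of a binary feature t against one class c (2x2 table)."""
    n = len(feature)
    a = sum(1 for f, y in zip(feature, labels) if f and y == target_class)      # t present, in c
    b = sum(1 for f, y in zip(feature, labels) if f and y != target_class)      # t present, not in c
    c = sum(1 for f, y in zip(feature, labels) if not f and y == target_class)  # t absent, in c
    d = n - a - b - c                                                            # t absent, not in c
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:  # degenerate table: treat as carrying no information
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def rank_features(columns, labels, threshold):
    """Score every feature per class, then keep those whose max score passes."""
    classes = sorted(set(labels))
    scores = {}
    for name, column in columns.items():
        per_class = [chi_square_score(column, labels, c) for c in classes]
        scores[name] = {
            "mean": sum(per_class) / len(per_class),  # assumed unweighted mean
            "max": max(per_class),
        }
    ordered = sorted(scores, key=lambda name: scores[name]["max"], reverse=True)
    selected = [name for name in ordered if scores[name]["max"] >= threshold]
    return scores, selected


toy_labels = ["attack"] * 4 + ["normal"] * 4
toy_columns = {
    "informative": [1, 1, 1, 1, 0, 0, 0, 0],    # perfectly aligned with the class
    "uninformative": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the class
}
scores, selected = rank_features(toy_columns, toy_labels, threshold=3.84)
```

In the toy data, the feature that is independent of the class receives a score of zero and is dropped, which matches the independence condition described above. Continuous telemetry features would need to be discretized or handled with a count-based variant before this form of the statistic applies.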

Machine Learning Model
This sub-section discusses the ensemble ML-based models used for detecting attacks in IIoT-based networks.
(1) Extreme Gradient Boosting (XGBoost)
XGBoost is a modified gradient tree-boosting algorithm that is efficient and scalable. The optimization problem in ensemble algorithms can be solved using boosting classifiers, where weak learners are added in succession, each creating a new model that lowers the classifier loss function and progressively reduces the mistakes of earlier models [40]. The exemplary features of the algorithm proposed by the authors in [41] are the regularized model, split-seeking algorithm, column block structure, and cache-aware prefetching algorithm. Some current applications of XGBoost include genre classification of Nigerian songs [42], stock price prediction [43], and gene expression value forecasting [44].
(2) Bagging Classifier
The Bagging classification algorithm is an ensemble meta-learner. The approach creates a large number of learners by training each base learner on a random subset of the original dataset. The classifier then estimates the final prediction by aggregating the results of all the models [45]. This algorithm averages the predicted values of the base learners for regression tasks and applies the majority voting scheme to classify labels for classification tasks. This algorithm starts by resampling the training data with replacement. This means that some instances may be selected again and again, while others may not be selected at all. The strength of this meta-estimator is the reduction in the variance of the base learner by introducing randomness into the ensemble construction and generation method. The base learners are trained concurrently on subsets of the training set that are randomly drawn with replacement from the initial dataset. Each base classifier's training dataset is distinct from the datasets of the others.
(3) Random Forest (RF)
RF is a group of weak base learners that functions by building various collections of decision trees to enhance the DTs' effectiveness and resilience [46]. This technique combines the bagging approach of instance sampling with the random selection of features to create a collection of DTs with a controlled variation. To complete the classification task of an unlabeled instance, each DT in the set acts as a base learner. The algorithm uses majority voting for classification tasks and averaging of predicted values for regression tasks. The RF algorithm is robust to noise and over-fitting and has been applied to several domains, including heart disease classification [47] and label ranking [48].
(4) Extremely Randomized Trees (Extra Trees)
Extra Trees is a collection of ML-based models that combines the classifications from several unpruned DTs to enhance generalization accuracy while being computationally efficient and preventing over-fitting [49]. The entire training set is used to grow each tree, and the nodes of each tree are split by selecting the cut points fully at random. Predictions are made by using a majority voting scheme for classification tasks or by averaging prediction values for regression tasks.
(5) Adaptive Boosting (AdaBoost)
AdaBoost is an ensemble of ML models adopting the boosting method, joining many weak learners iteratively to create a new model using a weighted linear combination. It applies a learning algorithm repeatedly to reweighted examples of the original training data [50]. Firstly, all instances are assigned the same weight. Weights are then increased for instances that were incorrectly classified, while they are reduced for instances that were correctly classified. This procedure is iterated, with the base model trained on the newly weighted training data each time. Finally, a linear combination of all the models generated through the various iterations is used to create the final classification model [51]. This algorithm's weakness is that it is sensitive to anomalies and noisy data.
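Two of the mechanisms described in the list above, the bootstrap resampling and majority voting of Bagging and the instance reweighting of AdaBoost, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the function names are hypothetical, and the reweighting follows the classical binary (discrete) AdaBoost update, whereas multiclass problems typically use a variant such as SAMME.

```python
import math
import random
from collections import Counter


def bootstrap_sample(n_samples, rng):
    """Draw n indices with replacement, as in Bagging's resampling step."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def majority_vote(predictions):
    """Combine the base learners' class predictions for one instance."""
    return Counter(predictions).most_common(1)[0][0]


def adaboost_reweight(weights, y_true, y_pred):
    """One binary AdaBoost round: return (alpha, new normalized weights).

    Assumes the incoming weights sum to 1.
    """
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    error = min(max(error, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - error) / error)
    updated = [
        # Misclassified instances are scaled up, correct ones scaled down.
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


rng = random.Random(0)
sample = bootstrap_sample(5, rng)                        # indices may repeat
vote = majority_vote(["attack", "normal", "attack"])     # ensemble decision

weights = [0.25, 0.25, 0.25, 0.25]                       # equal initial weights
y_true = [1, 1, 0, 0]
y_pred = [1, 1, 0, 1]                                    # last instance is wrong
alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
```

After one round, the single misclassified instance carries half of the total weight, so the next weak learner is pushed to classify it correctly. This also shows why AdaBoost is sensitive to noisy labels: a mislabeled instance keeps gaining weight.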

Dataset
The study datasets include seven (7) Telemetry datasets from ToN-IoT (https://cloudstor.aarnet.edu.au/plus/s/ds5zW91vdgjEj9i?path=%2FProcessed_datasets%2FProcessed_Network_dataset (accessed on 5 June 2022)). An IoT/IIoT-based network testbed was used to generate the various operating system and network data. The 7 datasets were generated from various IIoT-based devices: GPS_Tracker, Weather, Garage_Door, Modbus, Fridge, Thermostat, and Motion_Light. Table 1 presents all features of the seven (7) datasets. The smart fridge device measures the temperature and adjusts it on demand. The garage door features describe a remotely activated door that is opened or closed based on a probabilistic input. The GPS tracker features come from a Global Positioning System (GPS) device, which tracks the geographical coordinates of a remote object. The motion light features come from a smart motion-sensing device, which uses a pseudo-randomly generated signal to switch the light "on" or "off". The Modbus features are generated from the registers of a Modbus service device, which is mainly used in industrial applications; such devices communicate via a master-slave arrangement. The smart thermostat features describe a device that regulates a system's temperature by controlling the heating/cooling system, such as an air conditioner. The weather monitoring system dataset contains features such as temperature, air pressure, and humidity.
The ToN-IoT datasets are labeled with the binary categories 'normal' and 'attack'. The 'attack' class is further divided into seven (7) subclasses: scanning, password, DDoS, injection, ransomware, Cross-Site Scripting (XSS), and backdoor. The scanning attack occurs at the initial stage, where the attackers obtain information about the target system [8,52] using a scanning tool, such as Nmap [53] or Nessus [54]. The DoS attack [8,52] adopts the flooding strategy, where the attacker launches successive malicious requests against a genuine user to disrupt their access to a service, while DDoS launches an enormous number of successive connections to deplete device resources such as memory and CPU. These two similar attacks are usually launched by a vast network of hacked computers known as bots or botnets [8,55]. The Ransomware attack [56] is a classical kind of malware that holds an authentic user's access to a system or service to ransom by encrypting it and then offering the decryption key needed to restore the original user's access to the service or system.
The Backdoor attack [57] is a passive attack that uses backdoor software to give an opponent unauthorized remote access.The competitor utilizes this backdoor to manage the infected IIoT devices and to incorporate them into botnets to launch a DDoS attack [57].The Injection attack [57,58] often attempts to execute vindictive codes or implant vindictive data into the IIoT network to disrupt normal operation.Cross-Site Scripting (XSS) [58] often tries to run vindictive commands on a seb server in the IIoT applications.The XSS lets the attacker insert random web scripts remotely into the IIoT system.The information and the authentication procedure between IIoT devices and the remote web server may be compromised by this attack.A typical Password Cracking Attack [59] occurs when a rival applies password-cracking techniques to figure out an IIoT device's passcode.The attacker will bypass the authentication system and compromise the IIoT devices [57].A common network attack that might disrupt the communication link between two devices is the MiTM attack [13], which could alter their data.Examples of MiTM attacks include ICMP redirect, ARP Cache poisoning, and port theft [12].The datasets and their detailed descriptions are presented in Tables 1-7.The seven (7) datasets have login dates for the IoT Telemetry data, login times for the IoT Telemetry data, and the record of the binary label of normal and attacks, where '0 represents normal and '1 represents attacks.Where true positive (TP) is the proportion of instances of "attack" that are actually and correctly identified.True negative (TN) is the proportion of legitimately designated "normal" instances that occur.False positive (FP) refers to the percentage of actual "normal" samples that are mistakenly identified as "attack," while a false negative (FN) refers to the percentage of actual "attack" samples that are mistakenly classed as "normal".Table 2 gives details of each attack and the normal of the multi-class label of 
the entire dataset.The datasets are referred to as "ToN IoT", since they comprise a variety of data sources, including Windows 7 and 10 operating system datasets, Ubuntu 14 and 18 TLS, and network traffic datasets, as well as telemetry datasets of IoT and IIoT sensors.The datasets were gathered from a large-scale, realistic network created at the UNSW Canberra @ Australian Defence Force Academy (ADFA) of Cyber Range and IoT Labs, School of Engineering and Information Technology (SEIT).The industrial 4.0 network, which consists of the IoT and IIoT networks, has a new testbed network.To manage the connection between the three levels of IoT, Cloud, and Edge/Fog systems, the testbed was deployed, utilizing several virtual machines and hosts of Windows, Linux, and Kali operating systems.On the IoT/IIoT network, several hacking methods, including DoS, DDoS, and ransomware, are used against web apps, IoT gateways, and computer systems.Network traffic, Windows audit traces, Linux audit traces, and telemetry data from IoT services were among the datasets collected in parallel processing to capture various regular and cyberattack events.
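The relationship between the multi-class type and the binary label described above can be sketched as follows. This is a minimal illustration, not the dataset's own tooling: the lowercase class spellings, the row layout (date, time, type), and the sample values are assumptions.

```python
# Attack subclasses as listed in the text; the lowercase spellings are assumed.
ATTACK_TYPES = ("scanning", "password", "ddos", "injection",
                "ransomware", "xss", "backdoor")


def binary_label(record_type: str) -> int:
    """Map a multi-class type string to the binary label (0 = normal, 1 = attack)."""
    name = record_type.strip().lower()
    if name == "normal":
        return 0
    if name in ATTACK_TYPES:
        return 1
    raise ValueError(f"unknown record type: {record_type!r}")


# Hypothetical telemetry rows: (date, time, type). Values are illustrative only.
rows = [
    ("25-Apr-19", "19:42:01", "normal"),
    ("25-Apr-19", "19:42:07", "backdoor"),
    ("25-Apr-19", "19:42:09", "DDoS"),
]
labels = [binary_label(record_type) for _, _, record_type in rows]
```

Raising an error on an unknown type, rather than silently mapping it to either class, keeps labeling mistakes visible during preprocessing.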

Performance Indicators
Several performance indicators are used to assess the performance and effectiveness of the ML models on the different datasets. The commonly used indicators adopted for this study are the confusion matrix, ROC_AUC, F1_score, recall, precision, and accuracy [60]. The confusion matrix, shown in Table 3, tabulates the correct and incorrect detections for each class of the dataset and thereby measures the performance of an ML model on the test data.
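As a small illustration of how the four cells of a binary confusion matrix are obtained, the sketch below counts TP, TN, FP, and FN from paired true and predicted labels. The function name and the sample labels are hypothetical and only serve the example.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels, treating `positive` as 'attack'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            counts["TP" if predicted == positive else "FN"] += 1
        else:
            counts["FP" if predicted == positive else "TN"] += 1
    return counts["TP"], counts["TN"], counts["FP"], counts["FN"]


# Illustrative labels only: 1 = attack, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
```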
The ROC_AUC indicates the tradeoff between the True Positive Rate (TPR), or recall, and the False Positive Rate (FPR), as shown by Equation (5). The FPR is the percentage of 'normal' class instances wrongly classified as the 'attack' class, as shown by Equation (6). Accuracy measures a model's overall effectiveness as the percentage of all 'normal' and 'attack' instances that were correctly classified, as shown by Equation (7). Recall indicates the percentage of 'attack' instances in the test dataset that were properly detected, as indicated by Equation (8). In contrast, precision indicates the percentage of detected 'attacks' that are actual 'attack' instances, as indicated by Equation (9). Finally, the F1_score is the harmonic mean of precision and recall, as indicated by Equation (10).
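The equations cited in this paragraph are not reproduced in this text. Assuming the standard definitions in terms of TP, TN, FP, and FN, and following the order in which the indicators are cited, they would read as follows; the exact form and numbering in the original are not confirmed here.

```latex
\begin{align}
\mathrm{TPR} &= \frac{TP}{TP + FN} \\
\mathrm{FPR} &= \frac{FP}{FP + TN} \\
\mathrm{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\mathrm{Recall} &= \frac{TP}{TP + FN} \\
\mathrm{Precision} &= \frac{TP}{TP + FP} \\
F1 &= \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\end{align}
```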

Experimental Results and Discussions
This section presents the experimental results for five (5) machine learning models, Bagging, XGBoost, Random Forest, ExtraTrees, and AdaBoost, on seven (7) IoT/IIoT-based datasets. The models are assessed using accuracy, F1-score, precision, and recall.
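For reference, the four assessment measures can be computed directly from confusion-matrix counts. The following is a minimal sketch using the standard definitions; the function name and the counts are illustrative and are not taken from any table in the study.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts.

    A ratio with a zero denominator is reported as 0.0 (a convention chosen
    for this sketch, not something stated in the study).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Illustrative counts only.
scores = binary_metrics(tp=90, tn=95, fp=5, fn=10)
```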

Experimental Results Based on the Proposed Model
The experimental findings for the per-device datasets are presented in this section. A 70-30 train-test split was applied for all five ensemble models used in the study. The final result for each evaluation metric was calculated and reported as a mean value. Table 4 presents the mean values of the accuracy, F1-score, precision, and recall metrics for the proposed ensemble models applied to the IIoT_Fridge sensor device dataset. For this dataset, the XGBoost classifier outperforms all other models, with 0.9873 for accuracy, 0.9801 for precision, 0.9942 for recall, and 0.9869 for F1_Score. Conversely, the worst classifier is AdaBoost ('Ada'), with 0.4909 for accuracy, 0.1299 for precision, 0.2589 for recall, and 0.1728 for F1_Score.
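A 70-30 split of this kind can be sketched without any library as follows. The seed, the function name, and the toy records are assumptions; the text does not state whether the split was stratified or which random seed was used, so this sketch uses a plain seeded shuffle.

```python
import random


def train_test_split_70_30(samples, seed=0):
    """Shuffle a copy of `samples` and split it into 70% train and 30% test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 7) // 10  # integer arithmetic avoids rounding surprises
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: (feature_vector, label) pairs with illustrative values.
records = [([float(i)], i % 2) for i in range(10)]
train, test = train_test_split_70_30(records, seed=42)
```

Fixing the seed makes the split reproducible, which matters when several models are compared on the same held-out data.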
Table 5 presents the mean values of the accuracy, F1-score, precision, and recall metrics for the proposed ML-based ensemble models applied to the IIoT_Thermostat sensor device dataset. For this dataset, the XGBoost classifier outperforms all other models, with 0.9883 for accuracy, 0.9882 for precision, 0.9950 for recall, and 0.9915 for F1_Score. Conversely, the worst classifier is Ada, with 0.5305 for accuracy, 0.3557 for precision, 0.3130 for recall, and 0.2837 for F1_Score.
Table 6 presents the mean values of the accuracy, F1-score, precision, and recall metrics for the candidate ML models applied to the IIoT_GPS_Tracker sensor device dataset. For this dataset, the XGBoost classifier outperforms all other models, with 0.9869 for accuracy, 0.9780 for precision, 0.9895 for recall, and 0.9836 for F1_Score. Conversely, the Ada classifier is the worst of all the models, with 0.4738 for accuracy, 0.0592 for precision, 0.1250 for recall, and 0.0804 for F1_Score.
Table 7 presents the mean values of the performance metrics for the ML-based ensemble models applied to the IIoT_Modbus sensor device dataset. For this dataset, the XGBoost classifier again outperforms the other classifiers, with an accuracy of 0.9913, precision of 0.9864, recall of 0.9895, and F1_Score of 0.9879. The worst classifier is Ada, with an accuracy of 0.6292, precision of 0.1049, recall of 0.1667, and F1_Score of 0.1287.
Table 8 presents the mean values of the performance metrics for the ML-based ensemble models applied to the Motion_Light device dataset. For this dataset, the XGBoost classifier again outperforms the other classifiers, with an accuracy of 0.9719, precision of 0.9531, recall of 0.8957, and F1_Score of 0.9030. The worst classifier is Ada, with an accuracy of 0.4695, precision of 0.1144, recall of 0.2241, and F1_Score of 0.1515.
Table 9 shows the mean values of the performance metrics for the ML-based ensemble models applied to the IIoT Garage_Door sensor device dataset. For this dataset, the XGBoost classifier outperforms all other models, with an accuracy of 0.9846, precision of 0.9796, recall of 0.9902, and F1_Score of 0.9847. The worst classifier is Ada, with an accuracy of 0.4715, precision of 0.0589, recall of 0.1250, and F1_Score of 0.0801.
Table 10 presents the mean values of the accuracy, F1-score, precision, and recall metrics for the candidate ML models applied to the Weather device dataset. The XGBoost classifier outperforms all other models for this dataset, with an accuracy of 0.9878, precision of 0.9788, recall of 0.9896, and F1_Score of 0.9840. The Ada classifier is the worst among all the classifiers, with an accuracy of 0.5311, precision of 0.0664, recall of 0.1250, and F1_Score of 0.0867.
In summary, the outcomes presented in Sections 4.2 and 4.3 show that the XGBoost classifier gave the best score for every assessment indicator across all the datasets. In contrast, the AdaBoost ensemble classifier performed worst on the assessment indicators across all datasets. In terms of recall, XGBoost obtained its lowest value, 0.8957, on the 'Motion_Light' dataset.
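The tables report each metric as a mean value. One plausible reading, which the text does not confirm, is macro averaging, that is, the unweighted mean of the per-class scores. Under that assumption, the sketch below shows that a model collapsing to a single predicted class out of eight obtains a macro recall of exactly 1/8 = 0.125, which happens to match the recall reported for AdaBoost on several datasets. The data are synthetic and the function name is hypothetical.

```python
def macro_precision_recall(y_true, y_pred, classes):
    """Macro-averaged precision and recall: the unweighted mean over classes."""
    precisions, recalls = [], []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)
        actual = sum(1 for t in y_true if t == cls)
        # Classes that are never predicted (or never occur) contribute 0.0.
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


# Synthetic data: eight classes, and a degenerate model that always
# predicts "normal". The class counts are illustrative only.
classes = ["normal", "backdoor", "ddos", "injection",
           "password", "ransomware", "scanning", "xss"]
y_true = ["normal"] * 9 + classes[1:]
y_pred = ["normal"] * len(y_true)
precision, recall = macro_precision_recall(y_true, y_pred, classes)
```

Macro averaging weights rare attack classes as heavily as the dominant class, so it penalizes a model that ignores minority classes even when its plain accuracy looks moderate.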
Following the baseline study, this study also evaluates a collective IoT dataset formed by combining the individual per-device datasets. Since most real-time applications save their data in a single location, this may represent some real situations. The combined IoT dataset was used to assess the proposed ensemble classifiers on both binary and multi-class classification problems. It is worth mentioning that most studies that use these datasets combine them into one before applying classifiers. Table 11 shows the results of the binary classification for the ensemble classifiers enabled with the feature selection method. Both the overall IoT dataset and the per-device IoT datasets were used to test the ensemble classifiers, which were evaluated using the same performance measures. The findings are summarized in Table 11: XGBoost achieves the maximum accuracy of 1.0 and scores about 98% on the other performance metrics. All the classifiers scored about 97% across the performance metrics except AdaBoost, which scored less than 70% on every metric. This performance can be attributed to the use of feature selection to remove irrelevant features before the ensemble classifiers are applied to the dataset.
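To make the feature selection step more concrete, the sketch below scores categorical features against the class label with the Pearson chi-square statistic and ranks them. It is a dependency-free illustration of the general technique under simple assumptions (categorical features, toy data, hypothetical feature names) and is not presented as the exact procedure used in the study.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and the label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for value, value_count in feature_totals.items():
        for label, label_count in label_totals.items():
            # Expected cell count if the feature and the label were independent.
            expected = value_count * label_count / n
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score


# Hypothetical toy columns; names and values are illustrative only.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "feature_a": ["closed"] * 4 + ["open"] * 4,               # tracks the label
    "feature_b": ["x", "y", "x", "y", "x", "y", "x", "y"],    # independent of it
}
scores = {name: chi_square_score(col, labels) for name, col in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

A higher score indicates a stronger dependence between the feature and the label, so keeping the top-ranked features discards those that carry little information about the class.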

Results Based on Confusion Matrix
The confusion matrix of the XGBoost algorithm was examined for all datasets in this study. XGBoost is the only model considered because of its superior performance. This further analysis is intended to show the commonly misclassified attacks. Only 20% of each dataset was used as the test set. It can be observed from Figure 2a-g that the 'normal' class is often confused with the other classes, and the major misclassification occurs between the 'normal' class and the 'backdoor' attack class for all the datasets. In Figure 2a, the 'normal' class is also confused with the 'DDoS', 'ransomware', 'password', and 'injection' attack classes. Incorrect classification also occurs between the 'password' and 'ransomware' attacks. This pattern can be observed throughout the confusion matrices for all datasets.
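The kind of reading described here, finding which class pairs are confused most often, can be automated from the true and predicted labels. The following is a minimal sketch; the function name is hypothetical and the labels are synthetic, so they do not reproduce Figure 2.

```python
from collections import Counter


def top_confusions(y_true, y_pred, k=3):
    """Return the k most frequent (actual, predicted) pairs that are errors."""
    errors = Counter(
        (actual, predicted)
        for actual, predicted in zip(y_true, y_pred)
        if actual != predicted
    )
    return errors.most_common(k)


# Synthetic labels for illustration only.
y_true = ["normal", "normal", "normal", "backdoor", "backdoor",
          "password", "ransomware", "ddos"]
y_pred = ["normal", "backdoor", "backdoor", "normal", "backdoor",
          "ransomware", "ransomware", "ddos"]
worst = top_confusions(y_true, y_pred, k=2)
```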

Results Based on ROC Curve
Figure 3a-g shows the ROC curves of the XGBoost ensemble on the ToN-IoT datasets. Each curve lies near the upper left corner, which indicates strong separability. The value of this assessor, the area under the ROC curve, ranges between 0 and 1, and a good value is close to 1. XGBoost is the only model considered because of its superior performance. This further analysis is intended to show the commonly misclassified attacks. For each of the datasets, only 20% was used as the test set.
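The area under the ROC curve has a direct probabilistic reading: it equals the probability that a randomly chosen 'attack' sample receives a higher score than a randomly chosen 'normal' sample. The sketch below computes it that way for the binary case; the function name and the scores are illustrative, and the pairwise loop is meant for clarity rather than efficiency.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative attack probabilities; 1 = attack, 0 = normal.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a curve hugging the upper left corner is read as strong separability.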

Comparative Study with the Baseline Model
To assess how the proposed model performs against the baseline models, the two are placed side by side. Table 12 shows the comparison of the proposed model with the baseline model. In [3], CART performs best among the ML models used for the classification of the dataset, with an accuracy of 88.0%, while the LR and SVM models have the lowest accuracy, at 61.0% each. CART has the overall best performance across all the performance metrics used to evaluate the datasets in the baseline study. Therefore, it can be said that the proposed model, which uses feature selection with ensemble classifiers, performs better than the baseline models. In addition, the computational time of the proposed models is low, since the number of parameters used is considerably reduced compared with the baseline model.

Comparison with other Existing Models Using the Same Dataset
To emphasize how crucial it is to apply feature selection to the dataset before applying classification models, the proposed model was compared with the baseline and other existing techniques. Table 13 compares the outcomes obtained from the proposed IDS models with other existing state-of-the-art models based on the ToN-IoT datasets. Each row of Table 13 shows a group of ML-based models from other notable studies. The use of the ToN-IoT dataset is quite recent in the study of IDSs. Hence, very few network security studies have used this dataset. Worth mentioning is the work of [3], the baseline study, where eight (8) ML models were considered. LSTM performed best on four (4) of the datasets, CART performed best on two (2) datasets, and k-NN performed best on one (1) dataset based on accuracy. In [36], the authors used six different ML-based models to classify the dataset, and the results revealed that the RF classifier performed best on five (5) datasets, while DT performed best on two (2) datasets. The authors in [37] used two ML-based models for the classification of the datasets, and the results show that the VC classifier performed best on four (4) datasets, while RF did very well on two (2) datasets. In a similar work, the authors in [38] used six (6) ensemble classifiers for intrusion detection. The CB classifier had the best accuracy across all the datasets in the TON_IoT Telemetry Dataset used to test the classifiers.
Therefore, the comparison in Table 13 reveals that the proposed model performs reasonably better in terms of accuracy than the other existing classifiers and the baseline model. As a result, the model is expected to perform well when applied to a real-world IIoT ecosystem that contains vast amounts of unstructured and unlabeled data. Furthermore, feature selection considerably reduces the computational time needed to process the dataset compared to the baseline. Consequently, data dimensionality is reduced, and high-level functioning is examined efficiently and precisely. Although Table 13 shows that our results appear to be comparable to other research in the field, our suggested approach has been examined on a more pertinent dataset using feature selection to see how the dataset responds to the model. So, compared to other relevant research, our results would be more trustworthy.

Conclusions
To protect the IIoT environment from outside attackers and intruders, several IDS techniques linked with IIoT-based network traffic have been proposed and have emerged as essential parts of the technology for proper protection from outsiders. When used in conjunction with ML-based classifiers, big data is a potent tool for studying massive amounts of data to safeguard IIoT equipment. These technologies have been shown to be beneficial for the security of IIoT-based systems. Industrial Automation and Control Systems and conventional IT systems differ fundamentally in how they counter cyberattacks. Security for the IIoT must, therefore, be given specific consideration. Therefore, this study builds an efficient multiclass IDS based on ML-based ensemble models, XGBoost, Bagging, Random Forest, Extra Trees, and AdaBoost, which were applied, based on seven (7) parameters, to the Telemetry datasets of ToN_IoT. An empirical experiment is performed on the dataset. The outcome, based on a comparative study, indicates that the proposed model performs excellently and that XGBoost is superior to the other models. The outcomes from the analysis showed that the proposed system could effectively and accurately classify different attacks. One of the major limitations of the proposed model is its inability to deal with the class imbalance in the datasets used to test its performance. Therefore, future work will use class-imbalance handling algorithms to balance the dataset. This will show whether the imbalance affects the performance of the proposed model. Future work will also focus on applying deep learning models and optimizing their hyper-parameters to improve the dataset classification performance for the IDS. The proposed model will also be applied to other IIoT-based datasets.

Figure 1. The proposed workflow for the classification of cyberattacks in an IoT network.

Figure 2. (a-g) XGBoost confusion matrix for all datasets considered in this study.

Figure 3. (a-g) XGBoost ROC curve for all datasets considered for this study.

Table 1. The IoT Telemetry dataset feature descriptions.

Table 2. Record of the multi-class label of various actual attacks and normal.

Table 3. The Confusion Matrix.

Table 4. The classification report for the IIoT Fridge sensor device dataset.

Table 5. The classification report for the IIoT Thermostat dataset.

Table 6. The classification report for the IIoT GPS_Tracker dataset.

Table 7. The classification report for the Modbus dataset.

Table 8. The classification report for the Motion_Light dataset.

Table 9. The classification report for the IIoT Garage_Door dataset.

Table 10. The classification report for the Weather dataset.

Table 11. Combined_IoT_Dataset Evaluation of Binary Classification Models.

Table 12. The proposed model is compared with the baseline model.

Table 13. Accuracy comparison of existing results based on ToN-IoT datasets.