SDN-Enabled FiWi-IoT Smart Environment Network Traffic Classification Using Supervised ML Models

Abstract: The rapid growth of the Internet of Things (IoT) and of applications such as augmented reality (AR)/virtual reality (VR), high-resolution media streaming, autonomous driving, smart environments, and intelligent e-health is increasing the demand for high data rates, high bandwidth, low latency, and quality of service (QoS) every day. The management of network resources for IoT service provisioning is a major issue in modern communication. A possible solution to this issue is the use of the integrated fiber-wireless (FiWi) access network. In addition, dynamic and efficient network configurations can be achieved through software-defined networking (SDN), an innovative and programmable networking architecture that enables machine learning (ML) to automate networks. In this paper, we propose a supervised machine learning network traffic classification and scheduling model for the SDN-enhanced FiWi-IoT that can intelligently learn and guarantee traffic based on its QoS requirements (QoS mapping). We capture network traffic trace files of different IoT and non-IoT devices based on the traffic flow and analyze the traces to extract statistical attributes (source and destination port, IP address, etc.). We develop a robust IoT device classification process module framework that uses these network-level attributes to classify IoT and non-IoT devices. We tested the proposed classification process module on 21 IoT/non-IoT devices with different ML algorithms, and the results showed that the Random Forest classifier achieved 99% accuracy, higher than the other techniques.


Introduction
The world has seen incredible growth of the Internet as a global communication infrastructure in recent decades. The wired and wireless Internet revolutionized the telecommunications paradigm to enable communication with anyone, "anytime." The emerging Internet of Things (IoT) is creating another paradigm, in which "anything" can be accessed and/or controlled remotely, allowing for more direct coordination between the physical world and machine-based systems [1]. IoT refers to billions of Internet-connected physical devices worldwide that collect and share data. The number of different kinds of IoT equipment will increase rapidly, as sensors and actuators are widely used in many applications, such as cyber security, automation, metering, health care, utilities, and consumer electronics. Gartner predicts that, by 2022, the typical household could contain more than 500 smart devices [2]. Further, the number of IoT devices worldwide is expected to reach 18 billion by 2022 [3]. The rapid growth of IoT devices presents a new problem of network allocation on the current Internet, especially in the "last mile" access network, which has long been recognized as a major bottleneck in delivering high-speed Internet service. application data (e.g., QoS requirements); (2) programmability without the need to manage individual infrastructure elements, i.e., it is possible to proactively program OpenFlow (OF) switches on the data plane; (3) openness, in which data plane components (i.e., OpenFlow switches) communicate with the controller via a single interface for data plane programming and network information gathering, regardless of vendor; (4) multiple flow table pipelines in OF switches, which increase the versatility and effectiveness of flow management. Furthermore, network traffic classification can help improve the performance and QoS of IoT devices. The major progress in SDN and ML methods has created a new era of network management.
By utilizing a global view, SDN's centralized control is used to manage the network. The performance of the FiWi-IoT network is thereby improved by optimizing bandwidth usage, balancing load, and minimizing latency. SDN's capabilities also make machine learning techniques easier to apply [10]. Recent advances in computing hardware, such as the Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU), provide a good opportunity to apply machine learning techniques. The SDN controller has a global view of the network and can collect large volumes of network traffic data, to which machine learning algorithms can be applied [11]. This new concept combines network intelligence and network programmability to create an autonomous high-performance network that will expand 5G capabilities. In recent decades, artificial intelligence (AI) and ML concepts have been developed for a variety of applications using a variety of approaches [12,13]. Further, AI/ML approaches build on statistical methods. Integrating such tools into the networking industry could enable service providers to set up self-configuring, self-healing, and self-optimizing networks. This type of network can be called a knowledge-defined network. Therefore, multichannel IoT devices must be automatically classified to provide reliability, security, and improved QoS for upstream applications. Moreover, as an important part of massive data analysis, IoT device traffic classification plays an important role in ensuring network security and protecting against traffic attacks [14].
In this paper, we propose a new SDN-enhanced FiWi-IoT access network framework in which regular services and IoT services coexist. We aim to observe the behavior of IoT devices on the network by combining SD-FiWi-IoT telemetry with supervised machine learning methods. We also address the device identification issue by building a robust framework that classifies every IoT device separately with good accuracy, utilizing statistical attributes obtained from network data characteristics. Our focus is on accurately detecting devices and tracking their dynamic behaviors based on traffic flow patterns. This paper's major contributions are as follows: (1) The novelty of this paper lies in the proposed SDN-based FiWi architecture, IoT traffic QoS management, and IoT traffic classification mechanism using supervised ML models. This article addresses the possibilities and challenges of developing and implementing IoT traffic classification mechanisms in a fiber-wireless smart environment to support the network performance of internet service providers (ISPs). The intelligent SDN controller, in conjunction with the fiber-wireless network and machine learning, combines the benefits of programmable flow-based telemetry and modular data-driven models for managing IoT devices based on their network operation and for defending against cyberattacks; (2) An integrated, SDN-based fiber-wireless network access scheme is proposed and its primary operational components are described. Further, the EPON and WLAN QoS mapping is proposed; (3) Using the global view of SDN and the requirements of traffic flows, an optimized scheme is built for the multipath transmission of IoT applications; (4) We implemented the proposed systems and demonstrated the performance of our classifier using 21 IoT and non-IoT devices, representing different types of device; (5) We propose an enhanced framework to identify IoT device specifications, in which we devise a method for extracting invariant dependencies across all devices and deriving features from them; (6) Finally, we evaluate our methods on a real-time IoT dataset. Our proposed model might achieve satisfactory accuracy with a small training set in classifying new IoT and non-IoT devices. We then discuss the achieved results and compare the performance with other classifiers.
The paper is organized as follows: Section 2 outlines relevant prior work. Section 3 discusses our proposed system and its operation. Section 4 outlines network traffic classification techniques. Section 5 outlines the proposed machine learning process module. Section 6 describes performance evaluation. Finally, Section 7 presents the conclusion and future work.

Related Work
In this section, we give a brief overview of related work on end-to-end FiWi access network communication for resource provisioning of IoT and the tactile internet (TI), intelligent management of network traffic flows in SDN, device classification in IoT, and network traffic analysis. To maximize resource utilization and minimize energy consumption in IoT devices, numerous resource allocation schemes, such as joint bandwidth allocation, have been proposed [5]. In 2019, Y. Liu et al. addressed the bandwidth provisioning problem in FiWi networks supporting IoT services using a machine learning prediction method [4]. An integrated heterogeneous networking system for cloud computing and the virtualization of FiWi access networks is proposed in [15]. In addition, machine learning techniques are applied to the fiber-wireless network to improve system performance [16]. The SDN controller provides a centralized view of all network traffic. An SDN architecture deployed in an enterprise network collects traffic data via the OpenFlow protocol and classifies it using machine learning techniques [17,18]. The approach proposed in [19] uses fingerprinting for authentication and identification purposes by training an ML method on network traffic features to identify similar device types. An automated classification scheme was also created for System Identifier (SysID) characteristics [20]. The work in [21] examines IoT device identification using statistical attributes, including activity cycles, port numbers, signaling patterns, and cipher suites, to classify IoT traffic and devices in smart cities; its final test results showed a classification accuracy of 95%. The robustness of traffic classification in the presence of unknown traffic is treated as a major challenge in [22].
The authors of [23] combine CNN and RNN deep learning models to identify flow types, such as HTTP, SMTP, Telnet, QUIC, Office365, and YouTube, using six features, specifically source and destination port numbers, payload length, TCP window size, inter-arrival time, and direction, taken from the first 20 packets of a flow. Another method classifies IoT devices autonomously using a combination of textual and flow features [24,25]. From the perspective of network security, ML methods are used to classify IoT devices in order to determine whether a device is on a whitelist of devices authorized to connect to the network. In [26], a deep learning time-series classification method based on an LSTM-CNN cascade is proposed to identify seen and unseen devices.

Proposed Software-Defined-FiWi-IoT System Architecture and Operation
This section explains the proposed SD-FiWi-IoT architecture, including the enhanced optical network unit access point (ONU-AP) and optical line termination (OLT) architectures, and the system operation based on the SDN controller. Our proposed SD-FiWi-IoT system architecture is shown in Figure 2 and consists of an SD-OLT, in which the standard OLT is improved by adding an integrated OpenFlow-based SDN controller to centralize the FiWi system. As illustrated in the figure, the SDN controller can communicate with clients (service applications) via a northbound API (NB-API). The SDN management framework aggregates all input from client applications and forwards it to the SDN controller via the NB-API. A southbound API (SB-API) allows the controller to communicate with the OLT. The underlying physical network is the back-end PON and the front-end ONU with integrated Wi-Fi (wireless fidelity). Routers serve as connection points for end users of both traditional and IoT services, such as M2M and H2M, in wireless mesh networks (WMN). Figure 2 shows the communication architecture of the IoT over a converged FiWi network. As explained earlier, the bidirectional transmission of H2M/M2M packets crosses internet segments, including optical access networks and embedded software-defined optical access points (SD-ONU-APs). This architecture consists of three layers: a service (application) layer, a transport (control plane) layer, and an infrastructure (data plane) layer; it also has two interfaces, the SB-API and the NB-API. The service layer is responsible for providing different services to the clients. The transport layer is responsible for carrying data packet information between the client-facing applications and the SDN controller. The infrastructure layer (data plane) is where the actual physical transfer of control and data packets takes place.
The operation, administration, and management (OAM) system consists of an optical back-end with one or more optical line terminations (SD-OLTs) located in the central office, which serve one or more optical network units (SD-ONU/SD-ONU-AP) on the customer premises. A MUX/DEMUX is used to combine/split the signal in order to drive one or more wavelengths. The feeder fiber is divided by the power splitter in order to reach the distributed ONU-APs [27]. The advanced SD-ONU-AP retains the generic SD-ONU-AP capabilities and includes an integrated OpenFlow agent and a tunable transmitter. An Ethernet passive optical network (EPON)-based back-end is employed, with its typical tree-and-branch topology. A branch of ONUs may also be located at residential and business subscribers' premises, delivering FTTx services (e.g., fiber to the home/building) to one or more wired subscribers. Another segment of the SD-ONU-AP is equipped with Wi-Fi mesh networks that consist of mesh points and access points to cover the users within the coverage area by radio and fiber (R&F) technologies. The end-user IoT devices exchange their sensing and surveillance data over the FiWi infrastructure in the same way as regular front-end user devices. Typically, front-end IoT devices connect directly to the wireless SD-ONU-AP.

System Architecture
However, depending on a device's capabilities and its compatibility with radio access technologies (RATs), it can connect through the most appropriate access mode. Here, central office (CO) resource allocation for carrying end-user (IoT device) traffic over the access networks is critical to improving delay performance. To collect the traffic of IoT/non-IoT devices (e.g., cameras, Amazon Echo, smoke sensors, mobile phones, and laptops), trace files are captured using the Wireshark protocol analyzer. Future FiWi-IoT access networks are expected to meet the ever-increasing dynamic bandwidth allocation (DBA) requirements of the next generation of PON and WLAN technologies; this would ensure low latency, a high data rate, and broad network coverage for the next generation of wireless networks [28]. The DBA plays a major role in governing the performance of these integrated networks and in mapping QoS between the optical and wireless networks, as shown in Table 1 and Figure 3. The proposed quality of service class identifier (QoS-CI) mapping is based on traffic types, such as voice, video, IoT, and data. QoS parameter mapping translates the QoS parameters defined for each service requirement between access networks [29,30]. Although EPON and 5G/4G WiFi/WLAN offer various types of service (ToS) with different latency and bandwidth requirements, they have some common properties, so mapping is possible. In addition, traffic classification and class of service (CoS) mapping are performed at the SD-ONU based on traffic type, characteristics, and QoS specifications in order to support these services via the SD-FiWi-IoT architecture. Due to their specifications, IoT services are assigned the highest priority in this CoS queue; constant bit rate (CBR) is a type of service that is mainly used for voice traffic. Table 1 shows the CoS priority.
The distributed coordination function (DCF) mode and the enhanced distributed channel access (EDCA) mode of the front-end WLAN are depicted in Figure 4a. DCF is the basic IEEE 802.11 Medium Access Control (MAC) protocol, based on the Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) and Binary Exponential Backoff (BEB) mechanisms. In DCF mode, a pre-specified QoS mapping plan is applied in which all data frames (DFs) of a station (STA) are placed in a single priority queue (all-in-one mapping). The WiFi-AP and the STAs have the same QoS configuration and compete for bandwidth on equal terms under DCF. Moreover, because a single CoS is mapped to every frame in this scheme, no packet classification is needed for STA frames, avoiding the associated overhead in the SD-FiWi-IoT. EDCA is the basic and mandatory IEEE 802.11e mechanism; it classifies traffic flows into four access categories (ACs), each associated with a separate transmission queue and acting independently. In EDCA mode, each packet is marked with a user-priority (UP) value, which takes one of eight values, i.e., 0 to 7. After marking, the packet is sent to the SD-ONU via the AP. According to its access category, each packet is then placed in the appropriate EPON queue (e.g., AC_EF, AC_AF) in every SD-ONU-AP, as shown in Figure 4b. The packet classifier classifies the uplink traffic in both the AP and the SD-ONU according to its QoS specifications and buffers it into the appropriate priority queues. DiffServ Code Point (DSCP) values are assigned at the OLT and ONUs. Additionally, the packet schedulers at the SD-ONU-AP and SD-OLT must use the same packet forwarding approach for EPON and 4G/5G WLAN/WiFi upstream/downstream traffic, with each configured QCI/DSCP value associated with an IP flow. We consider four types of traffic source (VOICE, VIDEO, IoT (Alarm), HTTP). The H2H and MTD (IoT) services and applications are listed in Table 2.
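The EDCA marking and queue selection described above can be sketched as follows. The user-priority-to-access-category table follows the default IEEE 802.11e mapping; the EPON queue names (`EF`, `AF`, `BE`) and the AC-to-queue assignment are assumptions made for illustration and are not taken from Table 1 or Figure 4b.

```python
# Default IEEE 802.11e mapping from user priority (UP, 0-7) to access category.
UP_TO_AC = {
    1: "AC_BK", 2: "AC_BK",  # background
    0: "AC_BE", 3: "AC_BE",  # best effort
    4: "AC_VI", 5: "AC_VI",  # video
    6: "AC_VO", 7: "AC_VO",  # voice
}

# Hypothetical WLAN access category -> EPON queue mapping (assumed).
AC_TO_EPON_QUEUE = {
    "AC_VO": "EF",  # expedited forwarding
    "AC_VI": "AF",  # assured forwarding
    "AC_BE": "BE",  # best effort
    "AC_BK": "BE",
}


def classify_frame(user_priority):
    """Return (WLAN access category, EPON queue) for a marked frame."""
    if user_priority not in UP_TO_AC:
        raise ValueError("user priority must be in 0..7")
    ac = UP_TO_AC[user_priority]
    return ac, AC_TO_EPON_QUEUE[ac]
```

For example, `classify_frame(6)` places a voice frame in the `EF` queue under these assumed tables.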
Furthermore, according to [31], there are three key building blocks of the DWBA framework for optical wireless networks: the QoS mapping block (QoSMB), the QoS provisioning block (QoSPB), and the scheduling block (SB). First, the QoSMB is responsible for resolving the QoS diversity among the various hybrid-network technologies. Second, the QoSPB decides, based on one or more criteria, whether a data packet (connection request) is accepted or dropped (rejected). Third, the SB controls how data packets are sent or how data flow from the optical to the wireless domain, and vice versa.

Operation of Software Defined Network
The SDN controller is a program that manages one or more SDN-OLTs and SDN-ONU-APs to perform the complex, non-real-time operations that make up the FiWi control plane. Thus, the SDN controller in the control plane tracks and evaluates the traffic conditions in the access network and reconfigures the forwarding-plane devices to modify the operating mode accordingly [15]. The controller keeps a log of all SD-OLTs and registered SD-ONU-APs. This database provides statistical details on the OLT wavelengths and the average buffer status of all SD-ONU-APs. It also eliminates the need for re-registration during wavelength switching. Wavelength/link-rate configuration changes are performed by the SDN controller by sending OFPT_MOD_PROP_OPTICAL to the SD devices. The Media Access Control (MAC) client on the L-OLT sends GATE messages to assign the transmission time slots of the SD-ONU-APs. Additionally, the OpenFlow agent activates the SD-OLT and SD-ONU control mechanisms, which link the SD-OLT/ONU-APs to the controller and communicate through OpenFlow signaling messages. Each OpenFlow switch, reached over the protocol's communication channel, holds one or more flow tables containing flow entries. Each flow entry contains match fields and actions, and the controller populates the tables. The match fields cover packet header fields, including the Internet Protocol (IP) source and destination addresses, the port number, and other relevant details. Each action determines how a matching packet is handled in accordance with the rule of the entry. The overall operation of network traffic classification in a FiWi environment is focused on actively managing all facets of the network, including SD-OLT initialization, new wavelength activation, wavelength shutdown, SD-ONU-AP wavelength tuning, link-rate tuning, and transmission timeslot monitoring [32].
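To make the flow-table description concrete, the following is a minimal sketch of how a flow entry pairs header match fields with the handling of a matching packet. The class names, the action strings, and the table-miss behavior (forwarding to the controller) are illustrative assumptions, not the OpenFlow wire format or the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    match: dict         # header fields that must equal the packet's fields
    actions: list       # illustrative action strings, e.g. ["output:2"]
    priority: int = 0


@dataclass
class FlowTable:
    entries: list = field(default_factory=list)

    def add(self, entry):
        """Install an entry (done by the controller); keep highest priority first."""
        self.entries.append(entry)
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def lookup(self, packet):
        """Return the actions of the first entry whose match fields all agree."""
        for entry in self.entries:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                return entry.actions
        return ["send_to_controller"]  # assumed table-miss behavior


table = FlowTable()
table.add(FlowEntry({"ip_dst": "10.0.0.5", "dst_port": 53}, ["queue:EF", "output:1"], priority=10))
table.add(FlowEntry({"ip_dst": "10.0.0.5"}, ["output:2"], priority=1))
```

A packet to `10.0.0.5` on port 53 hits the higher-priority entry, while any other packet to that address falls through to the second one.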

Network Traffic Characteristics of IoT Smart Environment
Machine-to-machine (M2M) communication is a fundamental part of the IoT paradigm. IoT smart environments have unique properties that make them distinctive. The number of devices connecting to the Internet, as well as network traffic, is increasing at an exponential rate. IoT traffic behavior comprises a combination of machine-type communication (MTC) and human-type communication (HTC). MTC was the primary form of communication when IoT was introduced. At the time, device and application traffic was mostly restricted to limited sessions with a few data bytes [18]. However, owing to new, revolutionary applications such as surveillance video and automobiles, the characteristics of the MTC traffic generated by IoT devices have changed completely. IoT devices in these new applications have the following traffic characteristics: short bursts of data sent periodically, short active time, long sleep time, low data rates, and small packet size. Device battery power, tolerable network delay, large network size, and network type are also important traffic characteristics [33]. Hence, meeting the network traffic requirements of smart environment applications can be challenging. Another critical feature is that the network is usually medium to large in scale, linking hundreds or thousands of devices over a wide area. Traffic rates are also irregular and relatively low; many applications are based on the detection of rare events, yet there is high demand for QoS. Furthermore, most applications require only medium to high security.

Overview of Network Traffic Classification Techniques
Network traffic classification is a critical issue in network resource management that emerges from network pattern analysis, as well as network planning and design. Numerous methods have been proposed and implemented over the last two decades. This section discusses some methods for classifying network traffic.

Port-Based Classification
Port numbers may identify network traffic. These port numbers are assigned by the Internet Assigned Numbers Authority (IANA). Many applications make use of the port number allocated by IANA when communicating with a host on the network. The World Wide Web and email, for example, use well-known port numbers. As a result, it is simple to classify the traffic associated with these applications. However, some applications, such as B2B, gaming, and multimedia, do not use fixed port numbers. They make use of the port numbers associated with other commonly used applications (e.g., HTTP/FTP connections), which sometimes results in suboptimal classification performance. When applications make use of dynamically assigned port numbers, these strategies are ineffective [34].
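As an illustration, port-based classification reduces to a table lookup. The sketch below uses a handful of IANA well-known ports; the function name and the choice to check the destination port first are assumptions made for the example.

```python
# A few IANA well-known port assignments.
WELL_KNOWN_PORTS = {
    21: "FTP",
    22: "SSH",
    25: "SMTP",
    53: "DNS",
    80: "HTTP",
    123: "NTP",
    443: "HTTPS",
}


def classify_by_port(src_port, dst_port):
    """Label a packet by its destination port, falling back to the source port."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    # Dynamically assigned ports cannot be resolved by this method.
    return "unknown"
```

A flow between two ephemeral ports, such as `classify_by_port(51000, 52000)`, is returned as `"unknown"`, which is exactly the weakness noted above.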

Payload-Based Classification
This classification technique is also called packet-based classification or deep packet inspection (DPI). Packet content is examined for the characteristic signatures of network applications. The majority of payload-based classification algorithms evaluate the packet's contents and compare them to the signatures contained in a database. These approaches are an alternative to port-based approaches and give more reliable results than port-based techniques. They are particularly well-suited for peer-to-peer (P2P) traffic. However, they have some disadvantages and weaknesses [35,36]. They require very expensive hardware for searching payloads. Additionally, they do not work on encrypted application traffic. Finally, payload-based approaches require the signature database to be updated continuously for new applications.
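A minimal sketch of signature matching is given below. It checks only payload prefixes against a tiny, hand-written signature list; real DPI engines use much larger databases and more general pattern matching, so the list and function name here are illustrative assumptions.

```python
# (application, byte prefix) pairs; a tiny illustrative signature database.
SIGNATURES = [
    ("BitTorrent", b"\x13BitTorrent protocol"),  # handshake header
    ("HTTP", b"GET "),
    ("HTTP", b"POST "),
    ("SSH", b"SSH-"),                            # identification string
]


def classify_by_payload(payload):
    """Return the first application whose signature prefixes the payload."""
    for app, signature in SIGNATURES:
        if payload.startswith(signature):
            return app
    # Encrypted or unseen applications match no signature.
    return "unknown"
```

An encrypted payload has no recognizable prefix and is returned as `"unknown"`, which illustrates why DPI fails on encrypted traffic.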

Statistical Classification
Statistical classification uses the statistical properties of network traffic flows to classify the application. Packet duration, packet inter-arrival time, packet length, and flow idle time are some examples of such traffic characteristics. As the measured characteristics are distinctive for each type of application, different applications may be distinguished from one another [37,38]. Classifiers must therefore use data processing techniques, particularly ML methods, to perform the actual classification based on statistical properties, because they must handle different traffic patterns from large datasets. Because they do not depend on packet inspection, ML models are considered lightweight and low-cost. These techniques can outperform payload-based techniques because they do not inspect packet content; hence, encrypted traffic can also be analyzed.
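The flow characteristics named above can be computed from packet timestamps and lengths alone. The following sketch assumes a flow is given as a time-ordered list of `(timestamp_seconds, length_bytes)` pairs with at least one packet; the feature names are hypothetical.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Compute simple statistical features for one flow.

    packets: non-empty list of (timestamp_seconds, length_bytes), sorted by time.
    """
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Gaps between consecutive packets (inter-arrival times).
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "duration": times[-1] - times[0],
        "mean_length": mean(sizes),
        "std_length": pstdev(sizes),
        "mean_iat": mean(gaps) if gaps else 0.0,
        "max_idle": max(gaps) if gaps else 0.0,
    }
```

None of these features requires reading the payload, so the same computation applies to encrypted flows.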

Proposed Machine Learning Methodology
This section describes the implementation of the ML classification module. The architecture is depicted in Figure 5, which contains the following functional blocks: packet capture and collected raw data, pre-processing, transformed data, ML-training, ML-testing, ML-classification model, and classification results.

Packet Capture and Collected Block
This module utilizes an SDN-FiWi-IoT interface to capture IoT network packets. The gateway is connected to the Internet through the smart environment, whereas the IoT devices (i.e., smart devices) are connected through the SD-FiWi. Our smart environment has a total of 21 unique IoT and non-IoT devices representing different categories of devices. The function block collects data pertaining to IoT network traffic from the network interface and saves it to a file for further processing. The tcpdump tool, together with the Wireshark protocol analyzer, performs this task [39]. The Wireshark packet analyzer collects information about incoming and outgoing traffic flows and generates associated records. Each record contains the entirety of the data contained in the packet, from the MAC layer to the application layer. Wireshark provides a graphical user interface for monitoring network traffic, selecting the desired network interface, and capturing packets in real time. The software presents raw data in the form of a hexadecimal dump, as well as decoded information about the various protocols used in communications, including source and destination IP addresses and ports. The tcpdump tool collects data from a network interface and saves it to an external hard drive as a PCAP file; it also provides several other features. All packets were captured from the LAN side of the SD-ONU-AP.

Pre-Processing and Transformed Data Block
After the IoT network traffic data are captured, they are subjected to pre-processing. The pre-processing block receives the captured file in packet capture (PCAP) format and collects the necessary information. The block is composed of two functions: traffic identification and variable extraction. The traffic identification function labels each packet according to the device from which it originated, as IoT or non-IoT device traffic. This is important because the classifier is trained by supervised machine learning. The variable extraction function generates a collection of statistical variables from the information contained in packet headers and payloads. Subsequently, feature extraction is performed by determining strategies to handle missing fields and altering data as required. Then, useful features that can represent the data are extracted, depending on the goal or task (i.e., the data are transformed using the labeling of IoT/non-IoT traffic). The effective number of variables under consideration can be reduced, or invariant representations of the data can be found, using dimensionality reduction or transformation methods. The transformed features include source and destination port numbers, source and destination IP addresses, domain name service (DNS) and NTP usage, packet size, etc. Further, the extracted features are used to train and test the ML classifier [40].
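The two pre-processing functions can be sketched as below, assuming that each parsed packet is available as a dictionary and that labeling is done by source MAC address (the addresses, dictionary keys, and the default of 0 for missing ports are all hypothetical choices for this example, not the authors' code).

```python
def label_packet(eth_src, iot_macs):
    """Traffic identification: label a packet by its source MAC address."""
    return "IoT" if eth_src.lower() in iot_macs else "non-IoT"


def to_feature_row(pkt, iot_macs):
    """Variable extraction: turn one parsed packet into a labeled feature row."""
    # Missing port fields (e.g. non-TCP/UDP packets) default to 0 (assumed strategy).
    src_port = pkt.get("src_port", 0)
    dst_port = pkt.get("dst_port", 0)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "is_dns": int(53 in (src_port, dst_port)),
        "is_ntp": int(123 in (src_port, dst_port)),
        "size": pkt["size"],
        "label": label_packet(pkt["eth_src"], iot_macs),
    }


# Hypothetical MAC address of a known IoT device.
IOT_MACS = {"aa:bb:cc:00:00:01"}
row = to_feature_row({"eth_src": "AA:BB:CC:00:00:01", "dst_port": 53, "size": 74}, IOT_MACS)
```

Here `row` is labeled `"IoT"` and flagged as DNS traffic, with the missing source port filled in as 0.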

Training Block
The aim of this module is to perform feature selection on all the features extracted by the pre-processing module. Feature selection allows a scalable model to be created and yields a more accurate classification of the dataset. The training method selected for this research is whitelist-based, where a binary classifier is generated for each device type. The predict module can then apply the trained models directly to predict the device type from features such as packet size, packet ID, port number, and DNS. The classifier is trained using the training dataset, and the effectiveness of the proposed supervised classifier is then evaluated by using it to classify an independent test dataset. The information learned during the training phase is used to identify new examples that were not present during training (classification process); 70% of the observations in the original dataset were placed in the training set.
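A minimal sketch of the 70/30 split and of the per-device binary labels is shown below. The shuffling, the fixed seed, and the `device` key are assumptions made for the example; the paper does not state how the split was drawn.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and place train_fraction of them in the training set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def one_vs_rest_labels(rows, device):
    """Binary targets for one device type: 1 for that device, 0 otherwise."""
    return [1 if row["device"] == device else 0 for row in rows]
```

With one call to `one_vs_rest_labels` per device type, the same training rows can feed a separate binary classifier for each device.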

Testing Block
The trained algorithm was then used to classify the test data. The test block uses the model produced by the training block to identify new instances. The datasets used in the training and testing blocks must be independent and labeled in advance. When building the training dataset, "flow statistics processing" is implemented by calculating the statistical properties of these flows (packet ID, time, size, Ethernet source and destination, IP source and destination, port numbers, etc.) as a prelude to the generation of features.
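Before flow statistics can be calculated, packets have to be grouped into flows. One common way to do this, assumed here for illustration and not confirmed by the source, is to key each packet by its 5-tuple (source IP, destination IP, source port, destination port, protocol).

```python
from collections import defaultdict


def group_into_flows(packets):
    """Group parsed packets (dicts) into flows keyed by their 5-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["proto"])
        flows[key].append(pkt)
    return dict(flows)
```

Each resulting list of packets can then be summarized into per-flow statistics before feature generation.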

Implementation of ML Classification Model (Pattern) Block
This block contains the implementation step; this process involves determining which models and parameters could be suitable and matching a specific data-mining method to apply ML classifiers to different instances. We used six well-known machine learning classifiers: Random Forest (RF), K-Nearest Neighbor (KNN), Neural Network Multi-Layer Perceptron (MLP), Naive Bayes (NB), Logistic Regression (LR), and Support Vector Machine (SVM). These algorithms and their technical details are described as follows. First, a Random Forest is a meta-estimator that fits a set of decision tree classifiers on different sub-samples of the dataset and uses averaging to boost predictive accuracy and control over-fitting. RF is thus an ensemble ML algorithm. Second, k-Nearest-Neighbors (KNN) is a supervised, non-parametric classification method. In this technique, the k training samples whose attributes are most similar (closest) to those of the test sample are found. These samples are called the nearest neighbors. Third, MLP is a neural network trained with a supervised learning technique called backpropagation. Its multiple layers and non-linear activation distinguish the MLP from a linear perceptron. Fourth, NB is a method used to classify data based on Bayes' theorem. This paper uses Gaussian naive Bayes. Fifth, LR is a statistical analysis technique that employs regression analysis to ascertain a quantitative relationship between two or more variables. Finally, the SVM is an ML technique that separates the attribute space with a hyperplane, maximizing the margin between instances of different classes or class values. The studied classifiers were tuned, and their parameters are given in Table 3. Moreover, a comparative analysis of the algorithms was performed [41]. Various models were employed for comparison.
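As a small worked example of one of these classifiers, the sketch below implements the k-nearest-neighbors rule described above with Euclidean distance and majority voting. The two-dimensional feature vectors (packet size, destination port) and the labels are made up for illustration; the features are not scaled here, which a practical implementation would normally do.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (packet size, destination port) samples.
TRAIN = [
    ((60, 53), "IoT"), ((64, 53), "IoT"), ((70, 123), "IoT"),
    ((1400, 443), "non-IoT"), ((1500, 443), "non-IoT"), ((1350, 80), "non-IoT"),
]
```

With this toy data, a small packet to port 53 is voted `"IoT"` and a large packet to port 443 is voted `"non-IoT"`.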

Classification Result Block
After the implementation of machine learning classifiers, the simulation tool presents the results in terms of classification accuracy.

Performance Evaluation
This section presents the results of applying multiple machine learning models to classify the devices. We used a publicly available dataset to evaluate our proposed ML classification module. First, we provide a dataset overview. Next, we present the performance metrics, experimental configuration, device classification, and results. This section ends by discussing future work.

Dataset
We used the dataset collected by Sivanathan et al. [21], referred to here as the UNSW dataset. It was collected in an IoT smart environment and contains traffic traces from different types of IoT and non-IoT devices, captured over 20 days and released online. The UNSW dataset covers a total of 28 IoT/non-IoT devices across different device groups [21]: non-IoT devices, such as MacBook, laptop, Samsung Galaxy and TP-Link, and IoT devices, such as Amazon Echo, Triby speaker, HP printer, Smart Things, Netatmo Weather Station, Netatmo Welcome, and Withings. The devices belong to various categories, such as smart health, smart homes and cities. Although the categories differ, in this study we used 21 of the UNSW IoT and non-IoT devices in total. The MAC address and connectivity information of all these devices is listed in Table 4.

Performance Metrics
In this study, we used a confusion matrix to evaluate the classification performance. Our goal is to classify IoT and non-IoT device network traffic to identify specific devices and improve classification accuracy. To measure the performance of the classifiers, several metrics are introduced, all of which are calculated from the classification results of the ML classifier [42]. They are built on the four outcome counts commonly used for binary classification: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). We used these four counts, along with the confusion matrix, to evaluate the performance of each model.
Thus, the classification accuracy (CA) is calculated as follows:

$$\mathrm{CA} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$

The F1-score is the harmonic mean of precision and recall and indicates the classification quality of a supervised machine learning model. We calculated the accuracy, precision, recall, and F1-score for every device label and then averaged them to obtain overall measurements for the performance comparison of the six ML models. Note that F1 represents the balance between precision and recall, and is calculated as the harmonic mean of these two values in Equation (4). All measurements take values between 0 and 1. In our multi-class models, we derive the FP and FN of each label by summation across all "incorrect" labels.
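The per-label calculation and averaging described above can be sketched as follows. This is an illustrative sketch using the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2 x precision x recall / (precision + recall)); the function names, the unweighted (macro) average, and the toy labels are assumptions, and the simulation tool may average differently.

```python
def per_label_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    pairs = list(zip(y_true, y_pred))
    results = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # FP and FN for a label are summed over all "incorrect" labels.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[label] = (precision, recall, f1)
    return results

def macro_average(results):
    """Unweighted mean of (precision, recall, f1) over all labels."""
    n = len(results)
    return tuple(sum(m[i] for m in results.values()) / n for i in range(3))

def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy labels (three device classes, six instances).
y_true = ["echo", "echo", "echo", "macbook", "macbook", "printer"]
y_pred = ["echo", "echo", "macbook", "macbook", "macbook", "printer"]

metrics = per_label_metrics(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(metrics)
overall_accuracy = accuracy(y_true, y_pred)
print(metrics, macro_f1, overall_accuracy)
```

In this toy case one "echo" instance is predicted as "macbook", so it counts as a false negative for "echo" and a false positive for "macbook".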

Experimental Setup
The proposed method was implemented using the Orange ML simulation tool [43] with the system configuration shown in Table 5. In the simulation environment, we used 21 IoT/non-IoT devices and 41911 instances from the traffic traces; the instance count of each class is shown in Table 6. The 21 devices were chosen based on the low traffic they generate (e.g., Smoke Alarm, Netatmo Weather Station, Netatmo Welcome). The following features were used: packet ID, packet size, Ethernet source and destination, source and destination ports, source and destination IP addresses, and DNS. The instances were randomly divided into two groups: 70% for training and 30% for testing.
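A random 70/30 split of this kind can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration rather than the tool's own sampling procedure: the function name `split_instances`, the fixed seed, and the plain (non-stratified) shuffle are hypothetical choices, and integers stand in for the real traffic instances.

```python
import random

def split_instances(instances, train_fraction=0.7, seed=0):
    """Shuffle a copy of `instances` and split it into train/test lists."""
    shuffled = list(instances)
    # A fixed seed (an assumption here) makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Integers stand in for the 41911 traffic instances.
instances = list(range(41911))
train, test = split_instances(instances)
print(len(train), len(test))
```

With rare device classes, a stratified split that preserves the per-class proportions in both groups may be preferable; whether the original experiment stratified is not stated.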

Device Classification and Analysis of Receiver Operating Characteristics (ROC) Curve
The devices that can appear on the network are classified into specific IoT and non-IoT device types. An initial activity is the creation of classification models to differentiate traffic between devices. Our method is structured as follows: the IoT/non-IoT device traffic captures are expressed as $x = (x_1, x_2, \ldots, x_n)$, where each traffic flow is represented by a feature vector. Hence, each feature vector $x_i$ should be assigned a label. As the number of devices increases over time, we construct a single classifier for each device class, giving one binary classifier per class. Each classifier decides whether the input feature vector of an unknown device fits its device class or not. This method is known as one-vs.-all: each classification model is fit for one class against all other classes. It is a widely used multi-class classification technique. Additionally, we used the receiver operating characteristic (ROC) curve and the area under the curve (AUC) as statistical measures. The ROC curve is commonly used in machine learning and data mining. It illustrates the relationship between the TP rate (sensitivity) and the FP rate (1-specificity). Plotting correct and incorrect classifications at various thresholds shows the model's overall efficiency, and the ROC curve provides visual and numerical descriptions of a classifier's behavior. The AUC has gained significant attention in the ML community and is a widely used performance metric in supervised machine learning. The AUC score is used as a criterion because, in each one-vs.-all binary classification problem, the data are generally imbalanced: most feature vectors do not represent the given device, and only some do. Therefore, the accuracy indicator alone may not be sufficient, since it does not reflect the distribution of the base classes.
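The one-vs.-all decomposition can be illustrated with a short sketch. It only shows how the labels are turned into one binary target per device class and how a final device is chosen from the per-class scores; the function names, the device labels, and the score values are hypothetical, and the binary classifiers themselves are left out.

```python
def one_vs_all_targets(labels):
    """Map each device class to a binary target vector (1 = this class)."""
    classes = sorted(set(labels))
    return {c: [1 if label == c else 0 for label in labels] for c in classes}

def predict_device(scores):
    """Pick the class whose binary classifier reports the highest score."""
    return max(scores, key=scores.get)

# Hypothetical device labels for six feature vectors.
labels = ["echo", "macbook", "printer", "echo", "camera", "macbook"]
targets = one_vs_all_targets(labels)

# Share of positive examples seen by each binary classifier: with many
# device classes, each binary problem is dominated by negatives.
positive_share = {c: sum(t) / len(t) for c, t in targets.items()}

# Hypothetical scores returned by the four binary classifiers for one flow.
scores = {"echo": 0.91, "macbook": 0.12, "printer": 0.05, "camera": 0.30}
predicted = predict_device(scores)
print(targets["echo"], positive_share, predicted)
```

The `positive_share` values show why accuracy alone can mislead here: a binary classifier that always answers "not this device" is already right for most feature vectors.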
The ROC curve plots the true positive rate (TPR, i.e., recall) against the false positive rate (FPR) [44]. The closer the area under the curve is to 1, the better the classification. Here, we use the average AUC score to measure the model performance; a classifier with a high AUC score is considered favorable. The ROC curves and AUC in Figure 6a,b illustrate the predictive efficiency for the Amazon Echo and the Belkin motion sensor, respectively. Figure 6 shows that the Random Forest and KNN classifiers achieved higher AUC scores, of 1.00 and 0.995, respectively, than the other ML algorithms. For all attributes, the Random Forest classifier yields a greater area under the ROC curve.
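As an illustration of how an ROC curve and its AUC are obtained for one binary (one-vs.-all) classifier, the sketch below sweeps the decision threshold over the classifier scores and integrates the resulting curve with the trapezoidal rule. The function names and the toy scores are assumptions; the simulation tool computes these values internally and may differ in detail.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Rank samples from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample in a group of tied scores.
        last_of_group = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if last_of_group:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical labels (1 = target device) and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(y_true, scores)
area = auc(curve)
print(curve, area)
```

Here one negative sample is scored above one positive sample, so 8 of the 9 positive-negative pairs are ranked correctly and the area is 8/9; a classifier that ranks every positive above every negative reaches an AUC of 1.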

Overall Performance Result
In this section, we describe our experimental process and data-based results. After the machine learning algorithms are applied, the simulation tool provides detailed results for each of them: (1) area under the curve (AUC), (2) classification accuracy (CA), (3) precision, (4) recall, and (5) F1 score. Table 7 shows the predictive accuracy of the proposed classification model, which is also plotted in Figure 7. The feature set selected in this work is robust and effective, and achieves excellent performance results for the IoT dataset. The experimental study used six well-known supervised ML classifiers. Most of the applied ML classifiers were found to have good classification performance, and the Random Forest classifier showed the best overall performance in our experiments. Figure 7 compares the classification accuracy, F1 score, precision and recall of the six ML algorithms. The Random Forest algorithm outperforms all other algorithms, owing to its higher tolerance of overfitting compared with other decision tree classifiers. In contrast, with accuracies of 0.48 and 0.54, respectively, SVM and logistic regression perform the worst. The accuracy of the other algorithms ranges between 0.87 and 0.996, and Random Forest is the best-performing algorithm. Figure 8 presents the confusion matrix of the IoT and non-IoT device classifier, based on the above outcome counts (true/false positives and negatives), where the rows represent the actual device and the columns represent the predicted device. We note that the proposed model works well at predicting most classes. The diagonal entries are close to 100 percent classified, with just four exceptions: the

Discussion on Our Work with Related Work
In this subsection, we present our insights into the realistic implementation of model analysis and IoT data acquisition compared to previous studies. Our proposed methodology is a generic IoT system classification tool. Our experimental findings in Section 6 show that our method can automatically classify new IoT devices by analyzing their network traffic, which is generally easy to acquire. We use multi-class classification with supervised ML models to identify the individual IoT and non-IoT devices, and classify the test data with each of the proposed ML models. The experimental results show that the proposed RF model achieves higher predictive accuracy than existing reference models. We then analyze the ROC curve and AUC to find the best-performing model for device identification. The confusion matrix of the 21-device classification test (shown in Figure 8) shows that our approach fails to accurately distinguish Netatmo Welcome, MacBook and Amazon Echo. This can be attributed to the very limited information available in our small-scale dataset. Moreover, the devices and smart environment in this experiment represent only one use case, and more development is needed to extend the proposed device identifier to different devices in actual smart environments. Finally, we compare our method to similar studies in terms of the method, features, number of devices, identification speed and accuracy. Table 8 presents a summary of the state of the art from these perspectives. Our method is subject to the following requirements and limitations:
1. The tested IoT and non-IoT devices are sufficiently varied, with 21 devices;
2. The coverage is complete and 99% accuracy is good enough;
3. The study only examined devices that communicated through TCP/IP;
4. We collect harmless IoT and non-IoT traffic flows, i.e., we do not abuse or unusually use the IoT system.
As a result, our assumptions only apply to the capture of the usual activity patterns of a variety of IoT system types.

Conclusions
This paper implements an end-to-end network traffic classification system based on a fiber-wireless access network by mapping Ethernet passive optical network (EPON) and wireless local area network (WLAN) traffic based on the quality of service class identification (QoS-CI) for traffic types such as voice, video, IoT and data. The identification of the devices that comprise a network, referred to as network mapping, serves as the foundation for a variety of network management applications, ranging from resource allocation and network slicing to security management. The proposed ML process modules were tested with the UNSW dataset. We used smart environment traffic from 21 unique IoT/non-IoT devices, analyzed the trace files and extracted the traffic behavior features. We then employed a multi-class, machine-learning-based classification system to ensure that the individual devices are uniquely identified. Six different supervised machine learning models were used to automatically classify specific IoT/non-IoT devices. We found that the proposed random forest classifier achieved 99% accuracy, outperforming the other classifiers (KNN, logistic regression, SVM, neural network, Naive Bayes), and that it is also fast at identifying specific device types using behavior features on the UNSW dataset. Despite there being room for progress, our work successfully demonstrates an ability to automatically identify IoT/non-IoT devices based on their network traffic flows. In future work, we intend to study the classification of anomaly detection datasets using different machine learning approaches. Anomaly detection is important for the extraction of essential business insights and the maintenance of key functions, and it is a critical tool for detecting fraud, network interference, and other unusual but significant incidents.
Acknowledgments: The authors would like to thank the anonymous referees, whose valuable suggestions helped to improve this work.

Conflicts of Interest:
The authors declare no conflict of interest.