IoT Device Identification Using Directional Packet Length Sequences and 1D-CNN

With the large-scale application of the Internet of Things (IoT), security issues have become increasingly prominent. Device identification is an effective way to secure IoT environments by quickly identifying the category or model of devices in the network. Currently, passive fingerprinting methods for IoT device identification based on network traffic flows mostly focus on protocol features in packet headers but do not consider the direction and length of packet sequences. This paper proposes a device identification method for the IoT based on directional packet length sequences in network flows and a deep convolutional neural network. Each value in a packet length sequence represents the size and transmission direction of the corresponding packet. This method constructs device fingerprints from packet length sequences and uses convolutional layers to extract deep features from the device fingerprints. Experimental results show that this method can effectively recognize device identity with accuracy, recall, precision, and F1-score over 99%. Compared with methods using traditional machine learning and feature extraction techniques, our feature representation is more intuitive, and the classification model is effective.


Introduction
In recent years, the number of Internet of Things (IoT) devices in use has continued to proliferate. It is estimated that the number of IoT devices will reach 75 billion by 2025 [1]. For both the traditional Internet and IoT, security remains an important issue. The challenge of IoT security comes from the heterogeneity of IoT devices [2,3] and from their limited resources, such as processing ability, battery and bandwidth [4], which make it hard to implement traditional security solutions. Many security issues can be mitigated by identifying unknown devices in the local network, which enables appropriate security enforcement on a particular device. Beyond cybersecurity, IoT device identification is an important area of research that can benefit many other applications, especially in the smart building domain. Researchers have developed many applications in this field recently, such as plug load automation and control [5], wireless communication [6], energy management [7], occupant-appliance interaction patterns, and abnormal traffic detection [8].
As IoT devices are constantly being installed in and removed from a network, it is essential to identify the device type or model for security purposes. IoT device identification (fingerprinting) is sometimes difficult due to the large variety of protocols used by devices. Commonly, an IoT device should respond to queries about its identity; however, an unknown or compromised device might disguise itself as another device by sending false identity information. This behavior can be detected by fingerprinting techniques through passive network traffic analysis. Therefore, passive fingerprinting of IoT devices is of vital importance for securing IoT networks. In general, IoT device identification or fingerprinting is a multiclass classification problem. By training a classifier based on feature
1. We propose a new feature extraction technique based on the directions and lengths of the packets in network traces, which is fundamentally different from other IoT device identification methods. This may provide different insights compared to other feature extraction techniques used in this field.
2. Based on an evaluation of experimental results, our method performs better than previous work in terms of classification precision, recall, and F1-score.
The rest of this paper is organized as follows: Section 2 reviews related works, and Section 3 defines our proposed method of data representation and classification. Section 4 reports the experimental results. These results are discussed in Section 5. Section 6 discusses limitations and concludes the paper.

Background and Related Work
This section reviews the previous work on IoT device identification and then presents some background on deep learning to provide a better understanding of our method.

IoT Device Identification
Research on IoT device identification, or fingerprinting, is still in the early stage due to the rapid development of the IoT industry. Many previous works have focused on device identification using fingerprinting methods to classify IoT devices. Device identification can be characterized as a classification problem in machine learning. The objective of the classification task may be either device type or model classification. First, features are extracted from raw network traffic data. Then the data in feature vector form are divided into training and testing sets, and different algorithms are used for training and testing. Many previous works differ in the feature extraction methods and machine learning algorithms used. Most of them have used features such as packet header and payload statistics, flow statistics, or timestamp features. Algorithms such as naïve Bayes, decision trees, random forest, k-nearest neighbors, and support vector machines are commonly used in these works. Table 1 summarizes various recent works on IoT device identification.  In [9], Miettinen et al. described a framework called IoT Sentinel for securing IoT devices using the fingerprinting approach. They extracted a set of 23 features from protocols in different layers of the network, IP options, packet content, and port class from 12 consecutive packets and finally obtained a feature set with a size of 23 × 12 = 276. Then, binary classifiers were trained for one device type versus the rest using the random forest algorithm. An experiment was performed over a set of 27 devices, and accuracies of 95% for 17 devices and 50% for 10 devices were achieved, resulting in an average of 85%.
In IoTSense [10], Bezawada et al. used part of the feature set from IoT Sentinel and another feature set from the payload. Specifically, IoTSense considers 17 protocol-based features and an additional three payload-related features. The fingerprints are produced by extracting 20 features from five packets, for a total of 100 features. This work achieved 99% average accuracy and 93-100% per-device recall using the k-NN, decision trees, and gradient boosting algorithms. This work on IoTSense considered a smaller number of devices than the work on IoT Sentinel (i.e., 10 vs. 31).
Sivanathan et al. [16] used a large dataset collected from 28 IoT devices over six months for IoT device classification. This work used eight features, namely flow volume, flow duration, average flow rate, device sleep time, server port numbers, DNS queries, NTP queries and cipher suites. Naïve Bayes and random forest were used together to construct the classifier. As a result, an accuracy of 99.88% was obtained. However, some of the features were too device specific, which could influence the classification results.
Meidan et al. [23] proposed a method to identify unauthorized IoT devices. Their dataset contained 17 devices. In total 334 features were extracted from each network traffic flow. A random forest was used for classification and as a result, unauthorized devices could be identified with an accuracy of 96%. The drawback of this study was that the features included application layer information, which is often encrypted in reality.
Shahid et al. [12] used bidirectional flow characteristics to identify different types of IoT devices. They used four different types of equipment: sensors, cameras, bulbs and sockets. From the algorithmic perspective, t-SNE was used for dimensionality reduction, and a random forest was used for classification. This research finally achieved an accuracy of 99.9%.
Yin et al. [24] proposed IoT ETEI, an automated end-to-end IoT device identification method based on a CNN + BiLSTM deep learning model. Their method outperforms traditional methods with higher identification accuracy and less overhead. Even for IoT network traffic using encrypted protocols, the method can reach an identification accuracy of over 99%.
Similarly, to address the problem of device type and model identification, the authors of [15] used the ping operation to generate fingerprints of different IoT devices in order to distinguish real embedded devices from virtual or simulated embedded systems. Ping requests were sent at an interval of 0.2 s, and time-based statistical characteristics were calculated from the responses. Detection rates of 99.5% (using 25 pings) and 99.9% (using 200 pings) were obtained.
Oser et al. [14] used TCP timestamps to measure the clock offsets of different IoT device models for model identification. Their dataset comprised 562 devices of 51 different models. When only the clock offset was used, the system could not identify most models. Therefore, the authors used 12 other features obtained from timestamps in addition to the original clock offset, and finally obtained 97.03% precision, 94.64% recall, and 99.76% accuracy.
Thangavelu et al. [13] proposed DEFT, a distributed device fingerprint identification system. In their method, an SDN network gateway is used to monitor and classify equipment locally. Statistical features based on packet headers and application layer protocols are extracted, and then a 15-min session is formed. To identify new equipment types, a clustering algorithm (k-means) and random forest are used. Perdisci et al. [25] analyzed the DNS application layer protocol to build fingerprints of IoT devices and used a method based on file retrieval to classify them.
Another related work in the context of IoT device model classification comes from Marchal et al. [19]. They presented AuDI (Autonomous IoT Device-Type Identification), a system for identifying the types of IoT devices by passively analyzing periodic network communication. To identify periodic flows, a discrete Fourier transform was used to transform the time domain into the frequency domain. Then, 33 different features were calculated for each cycle. Their results were over 90% in F1-score and 98.2% in overall accuracy. Msadek et al. [17] focused on encrypted traffic analysis, and they used a head sliding window to obtain statistical features. AdaBoost was used for classification, and an accuracy and F1-score of 95.5% were obtained.
Pinheiro et al. [20] proposed an IoT device and event identification technique based on packet length from encrypted traffic. Their solution utilized packet length statistics to identify IoT devices and events, including the mean packet length, the standard deviation, and the number of bytes transmitted over a one-second window. The method used only three statistics, which reduces the computational complexity of IoT traffic classification. The traffic classification algorithms used included k-NN, decision tree, random forest, support vector machine and majority voting. The results showed that the random forest algorithm could achieve up to 96% accuracy in the identification of devices on the UNSW dataset [18].
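To make the three window statistics concrete, the following is a minimal Python sketch, not the implementation of [20]. It assumes packets are already available as (timestamp, length) pairs, uses fixed non-overlapping one-second windows, and uses the population standard deviation; the function name `window_stats` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def window_stats(packets, window=1.0):
    """Return {window_index: (mean_length, std_length, total_bytes)}.

    packets: iterable of (timestamp_seconds, length_bytes) pairs.
    Assumption: fixed, non-overlapping windows and population std.
    """
    buckets = defaultdict(list)
    for ts, length in packets:
        buckets[int(ts // window)].append(length)
    return {
        idx: (mean(lengths), pstdev(lengths), sum(lengths))
        for idx, lengths in sorted(buckets.items())
    }


# Hypothetical trace: three packets in the first second, two in the next.
packets = [(0.10, 60), (0.45, 1500), (0.90, 60), (1.20, 100), (1.80, 300)]
stats = window_stats(packets)
for idx, (avg, std, total) in stats.items():
    print(idx, round(avg, 1), round(std, 1), total)
```

Each window yields one three-value feature vector, which could then be passed to any of the classifiers listed above.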
Duan et al. [22] proposed a practical IoT device identification system called ByteIoT based on the frequency distribution of bidirectional packet lengths. For ByteIoT, a k-NN classifier was applied. The authors evaluated ByteIoT on several datasets and the results showed that it achieved over 99% accuracy on the UNSW IEEE TMC 2018 dataset [16].
Following the same technical route, OConnor et al. [26] presented HomeSnitch, a communication classification framework designed for home IoT devices based on semantic behaviors (such as firmware upgrades, audio and video recording, and data uploading). HomeSnitch uses the adudump [27] traffic analysis tool to build an application layer model of the packet headers. Finally, 13 different features are extracted to describe the application layer data exchange. The authors used the YourThings [28] dataset for testing, and obtained 99.69% accuracy, 93.93% F1-score and 96.82% TPR. Trimananda et al. [29] performed similar research, and they proposed Ping-Pong, a tool for extracting signatures of packet events. This work addressed research on encrypted traffic and unknown protocols, and used a method based on clustering and statistical packet analysis. Hafeez et al. [30] proposed IoT-KEEPER, a system that uses an unsupervised learning method to identify device types and detect malicious behaviors. The methods used were fuzzy c-means and interpolation.
Using a more complex method based on deep learning, Ortiz et al. [31] proposed DeviceMien, a probability-based device identification framework that uses a stacked LSTM-autoencoder structure to automatically learn the characteristics of the original TCP packets, and then uses DBSCAN to cluster them. To test their model, the authors used two datasets, one public [16] and one private.
In commercial buildings, plug loads often account for up to one-third of the energy use. Some researchers have developed automatic smart plug load identification systems for enhancing the capabilities of existing load monitoring systems. Tekler et al. [21] proposed a near-real-time plug load identification approach that used low-frequency power data in office spaces. They applied a novel dynamic time window strategy during feature extraction. Then the proposed method was evaluated in online and offline settings for device identification using k-NN, random forest (RF), gradient boosting (GB), and bagging algorithms. As a result, the best online model achieved accuracies up to 93% using the bagging algorithm with a minimum dynamic time window of 5 min.
The previous works above have made important contributions to IoT device identification; however, most of the methods depend on certain packet header field values or related statistical values in the network traffic generated by IoT devices, which require manual feature engineering or predefined domain knowledge. Some related works also need feature selection algorithms to select useful features from the feature pool. In this work, our method for IoT device identification takes the lengths and directions of consecutive packets generated by a specific unknown device and transforms them into a sequence of signed packet lengths, which serves as the input of the deep learning model. The learning ability of the deep learning model is leveraged to extract features automatically. Extracting packet length and direction is simple, and no other feature engineering techniques are needed. This method dramatically reduces the difficulty of manually extracting features and has better generalization ability. In addition, the accuracy of our method is as good as or better than that of the best previous work.

Deep Learning
Deep learning is a branch of machine learning; it refers to algorithms based on artificial neural network architectures that are used to perform representation learning on data [32,33]. The advantage of deep learning is the use of automated feature learning and hierarchical feature extraction algorithms in place of manual feature engineering. Convolutional neural networks (CNNs) consist of several convolutional layers and fully connected layers and include shared weights and pooling layers. This structure allows a CNN to use input data with a high-dimensional structure. CNNs produce better results in image and speech recognition compared with other neural network models. A CNN can also be trained using the backpropagation algorithm and with a smaller number of parameters than other deep feedforward neural networks. Figure 1 shows a block diagram of a traditional CNN. The diagram shows that there are two steps in a CNN: feature extraction and classification. Feature extraction is performed by convolutional layers and pooling layers, while batch normalization and dropout are used to prevent overfitting. In classification, fully connected layers are used to classify the input as in a traditional multilayer perceptron (MLP).

System Overview
In our study, we propose a method for IoT device identification. Figure 2 presents the proposed system model and the complete system workflow. In the first stage, an unknown device joins the local IoT network, and the system passively captures a sequence of network packets from the device. Then, a feature vector (fingerprint) is extracted from the network traces, and the unknown device is identified using a classifier trained on a training set of known devices. We first describe the method used for preprocessing and representing the network traffic traces; then, the deep learning classifier used for final identification is introduced.

Data Preprocessing
IoT device identification can be seen as a supervised machine learning (or classification) problem. For a classification problem, first we should have feature vectors that can model the data of interest (network packet data); then, these feature vectors are fed into a classification model to obtain their predicted classes. In general, four categories of input features are used in network traffic classifiers: time-series, header, payload, and statistical features [34]. Each network packet includes a header and payload. For a layer-2 packet, if the payload is encrypted, the only information available to us is the metadata stored in the Ethernet header. Recent IoT device identification research has focused on extracting feature sets from packet headers and then using feature vectors obtained from individual packets for training and testing. Therefore, in the previous work, each packet was used as one data sample. In contrast, the proposed method utilizes several continuous packets and one such packet sequence is used as one data sample for device identification. This feature extraction and representation method is introduced below.
In this work, we use two time-series features, namely, packet length and direction, of N consecutive packets. The feature vector is of length N with one channel in which packet length and direction are combined. As in [35], we define an outgoing packet from a device as having a positive value, whereas an incoming packet has a negative value. The original dataset is in PCAP format, and the network traffic is captured by software located in the gateway of a local network. The first step is aggregating the packets generated by each device in chronological order according to the MAC address of each device. As a result, the PCAP file is split into smaller files, each of which contains all the packets generated by one device in a specific time period. For each small PCAP file, we extract the packet length and then combine the packet length and direction to obtain a numeric value for each packet. For example, -1400 denotes an incoming packet of length 1400. Then a long sequence consisting of these numeric values is constructed in time order. To make feature vectors representing the device, the long sequence is sliced every N packets to obtain multiple feature vectors of length N. The trailing packets are dropped for convenience. If the feature vector is too long, the classification model has more parameters and needs more computational resources. Conversely, if it is too short, the lack of sufficient information will cause the method to fail. The optimal length N must be determined through experiments.
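The preprocessing steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the authors' code: PCAP parsing is assumed to have been done already, so each packet is a (source MAC, destination MAC, length) tuple in chronological order, and the function names and MAC addresses are hypothetical.

```python
def directional_lengths(packets, device_mac):
    """Build the signed packet length sequence for one device.

    packets: iterable of (src_mac, dst_mac, length) in chronological order
    (assumed to be pre-parsed from the PCAP file).
    """
    seq = []
    for src, dst, length in packets:
        if src == device_mac:
            seq.append(length)    # outgoing packet: positive value
        elif dst == device_mac:
            seq.append(-length)   # incoming packet: negative value
        # packets of other devices are ignored
    return seq


def slice_fingerprints(seq, n):
    """Cut the long sequence into vectors of length n, dropping the tail."""
    usable = len(seq) - len(seq) % n
    return [seq[i:i + n] for i in range(0, usable, n)]


# Hypothetical trace for a device with MAC "aa:aa" behind gateway "gw".
trace = [
    ("aa:aa", "gw", 60), ("gw", "aa:aa", 1400), ("aa:aa", "gw", 52),
    ("gw", "aa:aa", 1400), ("bb:bb", "gw", 99), ("aa:aa", "gw", 60),
    ("gw", "aa:aa", 590),
]
sequence = directional_lengths(trace, "aa:aa")
fingerprints = slice_fingerprints(sequence, n=4)
print(sequence)
print(fingerprints)
```

With N = 4 in this toy trace, six signed values produce one fingerprint and the two trailing values are dropped, mirroring the slicing rule described above.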

Convolutional Neural Networks
Convolutional neural networks have been widely used in image recognition, where the input data are high-dimensional tensors. Since the network traffic data are represented here as a one-dimensional sequence, and inspired by sequence classification tasks such as DNA sequence classification [36] and heart sound signal processing [37], we adopt 1-D convolutional layers in the network design, in contrast to traditional image recognition applications. Another difference is that in traditional image classification, activation functions such as the sigmoid and rectified linear unit (ReLU) are widely used, but they suppress negative inputs: ReLU maps every negative value to zero, and the sigmoid compresses negative values toward zero. Thus, the packet direction information would be lost if we were to use these functions; in contrast, activation functions such as hyperbolic tangent (tanh), leaky ReLU, and exponential linear unit (ELU) can deal with negative values. We compared these three activation functions during hyperparameter tuning and found that ELU performed best.
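The effect of the activation function on signed inputs can be illustrated with a small numeric sketch. The scaling of packet lengths by 1500 is an assumption made only for this example (the text does not state a normalization), and the ELU and leaky ReLU parameters are common defaults rather than values taken from the paper.

```python
import math


def relu(x):
    return max(0.0, x)


def elu(x, alpha=1.0):
    # ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x


# Signed packet lengths, scaled by 1500 (assumed scaling for illustration).
signed = [60, -1400, 590, -52]
scaled = [v / 1500 for v in signed]

for name, fn in [("relu", relu), ("elu", elu),
                 ("leaky_relu", leaky_relu), ("tanh", math.tanh)]:
    print(name, [round(fn(v), 4) for v in scaled])
```

In this sketch, ReLU returns zero for both incoming packets, so their lengths become indistinguishable, while ELU, leaky ReLU, and tanh keep distinct negative outputs for them.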
Our CNN model includes three convolutional blocks and two fully connected blocks. The three convolutional blocks are similar except for the number of filters and the kernel size. Each convolutional block comprises one convolutional layer, followed by batch normalization and an activation function (ReLU or ELU), and then another convolutional layer, batch normalization, and an activation function. Finally, max pooling and dropout are applied. This block is repeated three times with a different kernel size each time. In each fully connected (FC) block, a fully connected layer is followed by batch normalization, ReLU activation, and dropout. The FC block structure is repeated twice with different numbers of neurons in the fully connected layers. The block diagram of our CNN model is shown in Figure 3.
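To see how a length-N fingerprint shrinks as it passes through the three convolutional blocks, the sketch below computes the sequence length after each block. The kernel sizes (7, 5, 3), the absence of padding, stride 1, and a pooling size of 2 are assumptions chosen for illustration; the actual values are those reported in Tables 2 and 3. Batch normalization, activation, and dropout do not change the length and are therefore omitted.

```python
def conv1d_out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution (standard formula)."""
    return (length + 2 * padding - kernel) // stride + 1


def block_out_len(length, kernel, pool=2):
    """One block: conv -> conv -> max pooling (stride equal to pool size)."""
    length = conv1d_out_len(length, kernel)   # first convolutional layer
    length = conv1d_out_len(length, kernel)   # second convolutional layer
    return length // pool                     # max pooling


def stack_out_lens(input_len, kernels):
    lens = []
    for k in kernels:
        input_len = block_out_len(input_len, k)
        lens.append(input_len)
    return lens


# Hypothetical kernel sizes for the three blocks; input length N = 500.
lengths = stack_out_lens(500, kernels=(7, 5, 3))
print(lengths)
```

The final length, multiplied by the number of filters in the last block, gives the size of the flattened vector that would feed the first fully connected block.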

CNN Hyperparameter Tuning
In a supervised classification task, many hyperparameters need to be tuned, such as the value of k in k-NN or the number of hidden layers in an MLP. Properly tuning the hyperparameters allows the model not only to fit the training data but also to generalize to test data that it has not been trained on. We performed an extensive search in the hyperparameter space to find the best hyperparameters for our model. The model was built block by block. For each layer of the CNN model, we performed an experiment by varying the hyperparameters and then chose the hyperparameters that gave the best performance. The search spaces and final values after hyperparameter tuning are shown in Table 2. Table 3 lists the input/output shape, kernel shape, and number of parameters for each layer of this CNN model.
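One way to read the layer-by-layer procedure is as a greedy search that tunes one hyperparameter at a time while keeping the best values found so far. The sketch below illustrates that reading only; the search space, the default values, and the scoring function `toy_score` (a stand-in for training the model and measuring validation accuracy) are all hypothetical.

```python
def greedy_search(search_space, evaluate, defaults):
    """Tune hyperparameters one at a time, keeping earlier best choices."""
    best = dict(defaults)
    for name, candidates in search_space.items():
        scores = {}
        for value in candidates:
            trial = dict(best)
            trial[name] = value
            scores[value] = evaluate(trial)
        best[name] = max(scores, key=scores.get)
    return best


def toy_score(cfg):
    # Stand-in for "train the CNN and return validation accuracy".
    score = -abs(cfg["kernel_size"] - 5) - abs(cfg["dropout"] - 0.3)
    return score + (1.0 if cfg["activation"] == "elu" else 0.0)


search_space = {
    "kernel_size": [3, 5, 7],
    "dropout": [0.1, 0.3, 0.5],
    "activation": ["relu", "elu", "tanh"],
}
defaults = {"kernel_size": 3, "dropout": 0.1, "activation": "relu"}
best = greedy_search(search_space, toy_score, defaults)
print(best)
```

A greedy search evaluates far fewer configurations than a full grid, at the cost of possibly missing interactions between hyperparameters.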

Experimental Evaluation
In this section, we describe the experimental setup and the dataset on which the identification tests were carried out. The results were obtained in two different scenarios: (1) classifying IoT devices into 7 categories and (2) classifying 18 IoT device models. The results for these two scenarios are shown below.

Experimental Setup and Datasets
Three real-device datasets that can be used for IoT device identification are publicly available. Their names, creation years, and numbers of devices are as follows: Aalto University [9], 2016, 27 devices; UNSW dataset [18], 2016, 28 devices; IoTFinder [25], 2019, 51 devices. During the selection of the dataset to be used in our experiments, we found that the Aalto University dataset contains only network traffic from the device installation process. Although this installation process was repeated 20 times to increase the quantity of data, this dataset is still too small compared with the others. The UNSW dataset was built by collecting network traffic data of IoT devices in normal working environments rather than during the device installation process. The raw PCAP file size is 11.3 GB, which is large enough for deep learning evaluation. The IoTFinder dataset relies only on DNS traffic to identify IoT devices, which is not suitable for our objective. Thus, considering the advantages of the UNSW dataset, we chose to use this dataset for all the evaluations and analyses presented below. This dataset contains various types of devices, including lights, cameras, hubs, and healthcare devices. Table 4 provides detailed information about the devices. The TP-Link router is a gateway to the Internet. The WAN interface of the router was connected to the Internet, and the IoT devices were connected to the LAN or WLAN interfaces. Software such as the tcpdump tool was installed on the gateway to collect the network traffic of all devices. The collected network traffic was then stored in PCAP files. We parsed the PCAP files and extracted informative features in accordance with the MAC address of each device. The dataset is organized by date, with one PCAP file for one day. In total, we downloaded 20 PCAP files corresponding to 31 device types.
However, the original dataset contains several devices such as iPhones, laptops, and routers which cannot be categorized as IoT devices. Because the objective of this research is to study the relationship between network traffic behavior and IoT device type or model, we ignored these non-IoT devices for the purity of the dataset. We used the MAC addresses to aggregate the data from the raw PCAP files. In addition, the MAC addresses of several devices could not be found in the PCAP files and thus were also ignored. After cleaning the data, we finally obtained 18 devices. The PCAP file for each day was parsed using the MAC addresses of the devices. Finally, a long sequence of packet direction and size was generated for each device. We cut each long sequence into slices every 500 entries to obtain our feature vectors, as described in Section 3.1. The statistics on the distribution of the data samples for each device after the completion of the above data preprocessing steps are shown in Table 5.

Table 5. Statistics about the number of data instances for each category.

Device Data Instances
Dropcam 8266

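TP-Link Smart plug 45

Three metrics were used for the performance evaluation of our model: precision, recall, and F1-score. With TP, FP, and FN denoting the numbers of true positives, false positives, and false negatives, respectively, they are defined as Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1-score = 2 × Precision × Recall / (Precision + Recall).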
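For a multiclass problem such as this one, the three metrics are computed per class from the confusion matrix. The following is a minimal sketch using the standard definitions; the function name and the two-class example matrix are hypothetical and are not results from this paper.

```python
def per_class_metrics(confusion):
    """Return [(precision, recall, f1), ...], one tuple per class.

    confusion[i][j] is the number of samples of true class i
    that were predicted as class j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, wrong
        fn = sum(confusion[k]) - tp                       # true k, missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


# Hypothetical two-class confusion matrix (rows: true, columns: predicted).
metrics = per_class_metrics([[8, 2], [1, 9]])
for cls, (p, r, f1) in enumerate(metrics):
    print(cls, round(p, 3), round(r, 3), round(f1, 3))
```

Averaging the per-class values gives macro-averaged scores; whether the averages reported in this paper are macro or weighted is not stated in the text.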

Device Category Identification
In this experiment, we explored the capability of our model to classify devices into different categories. For this experiment, we generated a new dataset from the original dataset by grouping the devices into different device categories, e.g., hubs or cameras. The statistics of this dataset are shown in Table 6. This dataset consists of the categories from Table 4.

Table 6. Statistics on the number of data instances for each category.

Device Data Instances
Hubs 2940

Light bulbs 251
Electronics 341

We used the CNN model described in Table 3. The generated dataset was split at a ratio of 80%:10%:10% for training, validation, and testing. The experimental results, including precision, recall, and F1-score, are listed in Table 7, and Figure 4 illustrates the confusion matrix of the classification results.
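The 80%:10%:10% split can be sketched as follows. Shuffling before splitting and the fixed random seed are assumptions of this example; the text does not say whether the split was random or stratified by class.

```python
import random


def split_80_10_10(samples, seed=0):
    """Shuffle and split samples into train, validation, and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # assumed: random, unstratified
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical dataset of 100 sample identifiers.
train, val, test = split_80_10_10(range(100))
print(len(train), len(val), len(test))
```

For an unbalanced dataset, a stratified split that preserves class proportions in each part may be preferable, so that rare devices appear in all three sets.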

Device Model Identification
In this experiment, we evaluated the classification accuracy of the proposed model for device model fingerprinting. The goal of the classifier was to distinguish distinct devices. The experiment was conducted with two models for a comparative study. First, we used a traditional neural network model, an MLP, for classification. The architecture of the MLP used is described in Table 8. The dataset was split at a ratio of 80%:10%:10% for training, validation, and testing. Figures 5 and 6 show the training progress of the MLP from the perspectives of accuracy and loss, and it can be seen that the accuracy curve was not stable in the last several epochs of training.

Second, the proposed CNN model was tested on the same data. The dataset was again split at a ratio of 80%:10%:10% for training, validation, and testing. Figure 7 shows the progress of training in terms of accuracy as the number of epochs increased. We selected the maximum number of epochs to be 30 to achieve the desired accuracy. Figure 8 illustrates how the loss was reduced as the number of epochs increased. The loss we used during training was the cross-entropy loss. As the value of the loss decreased, the predictions improved.

Tables 9 and 10 show the experimental results in terms of the precision, recall, and F1-score of each class for the MLP and the CNN, respectively. On average, the precision achieved was 99%, and the recall achieved was 99%. The F1-score is a measure that combines precision and recall; on average, the F1-score achieved was 99%. Figures 9 and 10 illustrate the confusion matrices of the classification results.

Table 9. The precision, recall, and F1-score results and the numbers of test data for the MLP.

Figure 10. The confusion matrix of the classification results for the CNN.
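The cross-entropy loss used during training can be illustrated for a single sample as follows. This is a generic sketch of softmax cross-entropy, not the training code of this study; the logit values are hypothetical, and 18 is used as the number of classes to match the 18 device models.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[true_class])


num_classes = 18
untrained = [0.0] * num_classes          # uniform prediction
confident = [0.0] * num_classes
confident[3] = 6.0                       # strong score for the true class

print(round(cross_entropy(untrained, 3), 4))
print(round(cross_entropy(confident, 3), 4))
```

A uniform prediction over 18 classes gives a loss of ln(18), and the loss falls toward zero as the probability assigned to the correct device approaches one, which is why a decreasing loss curve corresponds to improving predictions.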

Discussion and Limitation
In our study, we performed device identification by fingerprinting the packet length of network traffic flows via a deep learning algorithm. A certain number of successive packets from a specific device were used to construct a sequence that we took as a fingerprint. The experimental results show that this method is effective and efficient. For comparison, IoT Sentinel uses the first 12 packets during the device installation process to extract the feature vector, which is an approach that cannot be directly applied to the UNSW dataset. Thus, we cannot present a comparison with the results of IoT Sentinel. The authors of IoTSense did not provide their private dataset, so we cannot reproduce their methods with the UNSW dataset. IoTDevID leverages feature extraction techniques similar to those of IoT Sentinel and IoTSense, with some modifications and performance enhancement. Therefore, the results of IoTDevID on the UNSW dataset are a good benchmark for comparison. Compared with previous work on the same dataset, namely, UNSW [18], the work of Msadek et al. [17], and IoTDevID [11], the proposed CNN model achieves superior performance in classification accuracy, meaning that this algorithm can identify IoT devices with very high accuracy. Compared with a shallow neural network (MLP), the accuracy of classification is also boosted significantly by leveraging deep learning (CNN). A performance comparison of those different methods is shown in Table 11.
The dataset we used in this study is unbalanced in terms of the number of data instances for each device. As a result, the classification precision for devices such as the iHome and TP-Link Smart plug is poor because of insufficient training data. One solution may be to use data augmentation techniques, but this is beyond the scope of this paper. A further limitation, inherent to deep learning, is that the quantity of data needed to train a network increases as the network becomes deeper. Put simply, the more complicated a model is, the more training data are needed. Therefore, deep learning models do not perform as well on small datasets, whereas traditional machine learning algorithms can usually perform well with less data.
Another limitation of this work is that constructing the feature vector requires 500 consecutive packets in a flow, which makes the latency of the identification system higher than that of previous works such as IoT Sentinel and IoTSense, which use only tens of packets. In the future, we will explore how to use a smaller feature vector while maintaining high identification accuracy.
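The effect of the window size on latency can be illustrated with a short calculation: a fingerprint is only available once the last packet of the window has arrived. The sketch below assumes a hypothetical device that sends one packet every 0.5 s; the function name `collection_latency` and the traffic rate are illustrative and do not come from the dataset.

```python
def collection_latency(timestamps, window=500):
    """Seconds between the first packet and the packet that completes the window.

    Returns None if fewer than `window` packets were observed.
    """
    if len(timestamps) < window:
        return None
    return timestamps[window - 1] - timestamps[0]


# Hypothetical device emitting one packet every 0.5 s.
timestamps = [0.5 * i for i in range(600)]
latency_500 = collection_latency(timestamps, window=500)  # 499 gaps of 0.5 s
latency_12 = collection_latency(timestamps, window=12)    # 11 gaps of 0.5 s
```

For a device with sparse traffic, the waiting time grows in proportion to the window size, which is why a smaller feature vector would directly reduce identification latency.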

Conclusions
This work proposes an IoT device fingerprinting method that uses only the directions and lengths of packets in a sequence as input features to summarize the network traffic of IoT devices. Compared with many previous works, this method reduces the manual effort of engineering features from packet metadata. Moreover, it leverages deep learning techniques (specifically, a CNN) for more accurate IoT device identification. The proposed method can effectively recognize device identity with an accuracy of over 99%. The CNN demands considerably more computational resources than the models used in previous works, but this issue can be addressed by deploying the proposed fingerprinting system on a local network server or gateway rather than on IoT devices. Our study shows that packet length and direction are important features of the network traffic generated by IoT devices and that IoT device identification tasks can be successfully performed with only these features. In addition, a CNN with one-dimensional convolutional layers is a powerful tool for processing sequence data of this kind. Our study proposes a new direction for fingerprinting IoT devices based on automatic feature extraction from raw data using deep learning rather than manual feature engineering.
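As a rough illustration of why a one-dimensional convolution suits signed packet-length sequences, the sketch below applies a single kernel, a ReLU, and global max pooling to a toy sequence in plain Python. It is not the architecture evaluated in this paper: the kernel weights are hand-picked rather than learned, the sequence is made up, and a real model would stack many learned kernels and layers.

```python
def conv1d(sequence, kernel, bias=0):
    """Valid 1D cross-correlation, the operation deep learning frameworks call convolution."""
    k = len(kernel)
    return [sum(sequence[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(sequence) - k + 1)]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0, v) for v in values]


def global_max_pool(values):
    """Keep the strongest response along the sequence."""
    return max(values)


# Toy signed-length sequence: positive = sent by the device, negative = received
# (the sign convention is an assumption).
sequence = [60, -1500, 52, -1500, 52, -40]

# Hypothetical hand-picked kernel that responds to an outbound packet
# immediately followed by a large inbound packet.
kernel = [1, -1]

feature_map = relu(conv1d(sequence, kernel))
feature = global_max_pool(feature_map)
```

The kernel slides along the sequence and fires wherever its local pattern occurs, regardless of position, which is the property that lets convolutional layers pick up recurring request/response patterns without manually engineered features.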
Several directions for future research on device identification remain. First, different representations of network traffic can be explored. For example, network traffic is represented as a sequence in this work, but it can also be transformed into an image, which may contain more information. Second, beyond device type and model identification, device behavior identification needs to be studied at fine granularity. Last but not least, fast online device identification systems need further exploration.
Data Availability Statement: Publicly available datasets were analyzed in this study. These data can be found here: https://iotanalytics.unsw.edu.au/iottraces (accessed on 10 October 2022).