Network Anomaly Detection inside Consumer Networks—A Hybrid Approach

Abstract: With an increasing number of Internet of Things (IoT) devices in the digital world, the attack surface for consumer networks has been increasing exponentially. Most of the compromised devices are used as zombies for attacks such as Distributed Denial of Service (DDoS). Consumer networks, unlike most commercial networks, lack infrastructure such as managed switches and firewalls to easily monitor and block undesired network traffic. To counter such a problem with limited resources, this article proposes a hybrid anomaly detection approach that detects irregularities in network traffic implicating compromised devices, using only elementary network information such as packet size, source and destination ports, time between subsequent packets, and Transmission Control Protocol (TCP) flags. Essential features can be extracted from the available data, which can further be used to detect zero-day attacks. The paper also provides a taxonomy of various approaches to classifying anomalies and a description of capturing network packets inside consumer networks.


Introduction
Multiple sweeping attacks on key Internet services around the world have been launched with botnets powered by Zombie Internet of Things (IoT) devices such as security cameras and wireless routers, with attack bandwidth topping 1.1 Terabits per second [1]. This shows that compromised IoT devices can pose a huge threat if hacked successfully. Not only that, but they also act as a pivot point inside a network from where perpetrators can exploit other devices, sniff sensitive information from the local network, and use device resources to mine cryptocurrencies, etc.
Today's commercial intrusion detection systems are primarily signature-based, which means they depend on predefined signatures of known attacks or carefully set up rules to filter out any possibility of attacks [2]. These require frequent signature updates and operators to proactively update rules in order to work effectively. Thus, without frequent updates, these devices fail to detect the latest threats, and they cannot protect against zero-day attacks at all, since by definition no previous instances of such attacks exist and hence there are no signatures to compare against [3].
Network anomaly detection is a wide topic which has been studied in numerous research articles, surveys, and books [4][5][6]. A major part of existing work aims at thwarting attacks on commercial networks [7]. One of the most common approaches to detecting anomalies is outlier detection, which has long been studied by the statistical community [4,[8][9][10][11][12][13][14][15][16]; with recent advancements in machine learning, it has come to play a notable role in anomaly detection. While it seems attractive conceptually, this approach has its own set of drawbacks, such as high false-positive rates arising from the intrinsic complexity of the system and the difficulty of determining which exact event triggered an alarm. These problems need to be addressed before wide adoption of anomaly-based detection systems.
To help understand the advantages of an intelligent system in anomaly detection, this paper discusses a hybrid approach based on the One-Class Support Vector Machine (OCSVM) using the entropy of various network features to classify anomalies. It also provides a concrete approach for collecting and analyzing network traffic and highlights essential features that can be extracted from this data, which can specifically be used for flagging zero-day attacks. The performance and accuracy of the One-Class Support Vector Machine and Isolation Forest models are compared in this paper. The models were trained on features derived from Table 1, which are listed in Table 2. The results are reported in Tables 3 and 4 and discussed in Section 4.

Motivation and Problem Statement
The number of IoT devices connected to the Internet is projected to grow to 28 billion by 2021 [17]. Most of these IoT devices are inherently insecure due to common default passwords, open and unauthenticated telnet ports, outdated and unpatched Linux firmware, unencrypted transmission of sensitive data, etc. [18]. With the increase in the number of Internet-connected devices, the number of attacks targeting such devices has increased in step.
Identification of common threats caused by intelligent embedded code is tedious, and certain steps must be taken to improve the identification of threats by leveraging the properties of machine learning and intelligent solutions [19,20]. These attacks become more dominant when targeted inside consumer households. These households do not have dedicated firewalls or managed switches with monitoring capabilities; thus, novel solutions that can run on low-powered devices need to be developed to allow detection and prevention of compromised devices or attacks.

Distinction from Existing Research
Network anomaly detection research has largely been targeted towards commercial scenarios. Commercial networks are a target for a multitude of attacks that are not a concern for a consumer network.
Distributed Denial of Service (DDoS) attacks are generally aimed at essential internet infrastructure and websites, and mitigating such attacks has been a challenge taken up by many researchers. A key reason is that a large portion of the attack bandwidth comes from "zombie" devices found in consumer networks. There is research on detecting DoS attacks on the LAN inside consumer networks [21], but very little light has been shed on detecting whether a device on the network is acting as part of a botnet to execute large-scale DDoS attacks.
Generally, comparisons are made using DARPA's KDD'99 dataset [11,14,22]; however, this dataset is not only outdated but also cannot be used to evaluate machine learning algorithms for network anomaly detection inside consumer networks. Instead, a dataset containing traffic from eight households (Section 3.4.3), augmented with synthetic attacks, was used to simulate real-world scenarios as closely as possible. Details about capturing and processing data are discussed in Section 3.4.2.
Detecting anomalies on a consumer network has its own set of challenges:
• Lack of data points: Consumer networks do not have subnets containing hundreds of devices generating a multitude of traffic over various protocols which can be analyzed for patterns. For example, in a typical work environment, one can monitor traffic on the domain controller, analyzing user login data or average overall bandwidth use, whose patterns correlate with work hours; this cannot be done on a typical consumer network.
• Unavailability of public datasets: For commercial networks, publicly available labeled datasets with millions of records and attack types have been made available for research purposes [23], and work has also been done on the quality of such datasets. However, consumer network data have not been collected or published, for privacy reasons.
• Lack of infrastructure: There are no managed switches or routers with monitoring capabilities in a consumer network. In such networks, packet-capturing tools need to be installed on users' computers, or a low-power device such as a Raspberry Pi has to be placed on the network to monitor it.

Anomaly Detection Methods
Details of different types of anomaly detection methods are provided below:

Supervised Classification Based Anomaly Detection
This is a supervised learning approach where the model is trained using a labeled dataset and then tries to classify new data based on the training data. Linear classification tries to find a line separating the classes [24], but the classification boundary may be nonlinear too [24], as seen in Figure 1. These techniques have a low false-positive ratio subject to suitable thresholds [24]. Their prime drawback is that they are highly dependent on, and biased by, the training data, and thus generally cannot identify anomalies they have not observed before, which defeats the purpose of this approach.
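The linear-versus-nonlinear distinction above can be illustrated with a minimal sketch. The data here are synthetic two-dimensional placeholders, not the paper's traffic features, and the SVM classifiers stand in for the generic supervised techniques the paper surveys.

```python
# Illustrative sketch: supervised classification with a linear and a
# nonlinear (RBF) decision boundary on synthetic labeled traffic features.
# The feature values and class separation are assumptions for illustration.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # label 0: "normal"
attack = rng.normal(loc=4.0, scale=1.0, size=(200, 2))   # label 1: "attack"
X = np.vstack([normal, attack])
y = np.array([0] * 200 + [1] * 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

linear_clf = SVC(kernel="linear").fit(X_tr, y_tr)  # linear boundary
rbf_clf = SVC(kernel="rbf").fit(X_tr, y_tr)        # nonlinear boundary

linear_acc = linear_clf.score(X_te, y_te)
rbf_acc = rbf_clf.score(X_te, y_te)
```

Both classifiers perform well here precisely because every attack pattern was present in training; as the text notes, neither would recognize an attack class it has never seen.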

Statistical Anomaly Detection
Any event which is not generated by the assumed stochastic model is considered an anomaly in this approach, as stated in [24]. Thus, occurrences with a low probability of being generated are treated as anomalies. The prime advantage of this technique over the others is that it does not require "prior knowledge" of the network's normal activity [6]; thus, it can provide accurate results about anomalous activity [24]. One promising approach to detecting network anomalies is an entropy-based approach. Entropy is a measure of the uncertainty or randomness associated with a random variable: the more random the variable, the more entropy it contains [25]. The primary drawback of this approach is that attackers can "train" the detection model until their traffic is considered normal according to the statistical model. It also requires a lot of training data to create a baseline entropy for each targeted feature, which is difficult to obtain in a consumer setting. Since the margins for triggering alerts based on entropy values have to be set manually, there is room for error, and optimizing these values is difficult.

Clustering and Outlier Based Novelty Detection
Clustering and outlier approaches can be studied as unsupervised and semi-supervised approaches, as explained below:

Unsupervised Approach
Fundamentally, grouping data into various sets of similar objects is called clustering, as represented in Figure 2a. In anomaly detection, the primary assumption made is that the larger clusters are normal and the rest of the clusters can be considered as anomalies [24]. In Figure 2b, the points which do not fit into any of the clusters are considered outliers (anomalous data-points). It is also worth noting that the unsupervised learning approach works with unlabeled data. Due to most unsupervised learning models using both outlier detection and clustering, the computation complexity can be quite high [24]. The prime advantage over other techniques is that the outliers can be detected with small datasets. In addition, due to its nature, small isolated bursts can be detected.
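A minimal sketch of the clustering intuition above, using DBSCAN (one possible density-based clustering algorithm; the paper does not prescribe a specific one). Points that fit no cluster receive the label −1 and are treated as outliers. All data are synthetic placeholders.

```python
# Illustrative sketch: unsupervised clustering where points that fit into
# no cluster are flagged as outliers (DBSCAN labels them -1).
# Cluster positions and outlier coordinates are illustrative assumptions.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
cluster_a = rng.normal(0.0, 0.3, size=(100, 2))   # dense "normal" group 1
cluster_b = rng.normal(5.0, 0.3, size=(100, 2))   # dense "normal" group 2
outliers = np.array([[2.5, 2.5], [-3.0, 6.0]])    # isolated anomalous points
X = np.vstack([cluster_a, cluster_b, outliers])

labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
n_clusters = len({l for l in labels if l != -1})  # clusters found
n_outliers = int(np.sum(labels == -1))            # points in no cluster
```

As the text notes, the two large clusters are assumed normal, and the isolated points, which cannot gather the `min_samples` neighbors needed to form a cluster, surface as anomalies even in this small dataset.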
Being closely related to statistical models, this approach also suffers from attackers being able to "train" the detection model until the traffic is classified as normal.

Semi-Supervised Approach
Semi-supervised learning is prevalent in scenarios where there is very little anomalous data available to train the model, but non-anomalous data are readily available, thus it is trained on a single class of data and then detects novelties. Thus, this is also commonly referred to as novelty detection [26]. One-Class Support Vector Machine classifiers (OCSVM) are favorable in case of anomaly detection as they do not require pre-labeled data sets which are expensive or difficult to obtain [27].
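The single-class training described above can be sketched with scikit-learn's `OneClassSVM`: the model is fit on normal data only and then flags deviations as novelties. The feature values here are synthetic stand-ins, not the paper's actual Table 2 features.

```python
# Minimal sketch of semi-supervised novelty detection with a One-Class SVM:
# trained on the normal class only, it flags unseen deviations.
# Feature values are synthetic placeholders, not the paper's dataset.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
normal_train = rng.normal(0.0, 1.0, size=(500, 3))  # only normal traffic
normal_test = rng.normal(0.0, 1.0, size=(50, 3))
novel_test = rng.normal(6.0, 1.0, size=(50, 3))     # anomalies never seen

# nu bounds the fraction of training points treated as outliers
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal_train)

# predict() returns +1 for inliers and -1 for novelties
normal_pred = ocsvm.predict(normal_test)
novel_pred = ocsvm.predict(novel_test)
```

No anomalous examples were needed for training, which is exactly why this family of models suits consumer networks, where labeled attack data are scarce.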

Proposed Hybrid Approach for Anomaly Detection
Since network usage patterns might keep changing inside a household, e.g., kids browsing a lot of different sites on holidays or while doing homework, relying only on the entropy of network traffic features can be unreliable. This paper proposes to use the normalized entropy of features mentioned in Table 2 to be processed by One-Class SVM for anomaly detection inside consumer networks.

Application Model
A modern household can be considered as an application scenario equipped with smart televisions, smart refrigerators, home automation systems, and other connected embedded devices. These devices, when compromised, can be utilized as zombies for DDoS Attacks [28]. In the setup used in this paper, the network data were captured from eight households with varying numbers and types of IoT devices. Details about test setup are mentioned in Section 3.4.2.

Entropy
Entropy is the measurement of uncertainty. One can say that the normalized network entropy tends towards 1 when there is a lot of randomness in the traffic. For example, during a Distributed Denial of Service (DDoS) attack, the entropy of the source Internet Protocol (IP) address will increase because the attackers' source IPs are numerous and uncertain [25]. Similarly, destination IP entropy decreases, as most packets will carry the destination IP of the victim. Thus, the entropy values deviate from their normal baseline, and that deviation can be classified using the OCSVM model. Using the entropy of features for classification with OCSVM can help eliminate most of the demerits of entropy-based anomaly detection discussed earlier, while also utilizing the merits of the Support Vector Machine (SVM), since entropy represents network traffic changes well and a One-Class SVM can classify such values effectively.
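The DDoS intuition above can be demonstrated with a small sketch: in a simulated attack window, source-IP entropy rises while destination-IP entropy collapses. All IP addresses below are synthetic documentation-range placeholders, not captured data.

```python
# Illustrative sketch: Shannon entropy of source/destination IPs shifts
# during a simulated DDoS window. All addresses are made-up placeholders.
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of a list of observed values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Normal window: a few household devices talking to a few services.
normal_src = ["10.0.0.2", "10.0.0.3", "10.0.0.2", "10.0.0.4"] * 25
normal_dst = ["93.184.216.34", "151.101.1.69", "8.8.8.8", "1.1.1.1"] * 25

# Attack window: many distinct sources, a single victim destination.
attack_src = [f"198.51.100.{i}" for i in range(100)]
attack_dst = ["203.0.113.7"] * 100

src_shift = shannon_entropy(attack_src) - shannon_entropy(normal_src)  # > 0
dst_shift = shannon_entropy(attack_dst) - shannon_entropy(normal_dst)  # < 0
```

It is exactly this kind of deviation from the baseline entropy vector that the OCSVM model is asked to classify.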

Normalized Entropy
By the definition of entropy [25], it is expressed using the probable values for a given variable (a feature in the proposed approach) and their probability distribution. Considering that the values of a random variable X are observed over a fixed time window, the probability of each value can be expressed as p_i = n_i/n, where n_i is the number of times that value is observed and n is the total number of observations, giving H(X) = −∑ p_i log p_i. The normalized entropy is then calculated as H/log(n₀), where n₀ is the number of distinct values in the time interval. The steps for the entropy calculation can be followed in Algorithm 1 [25].

Algorithm 1 Normalized entropy
Input: Network traffic for a feature from Table 1
Output: Entropy value for the network feature
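A possible implementation of Algorithm 1, following the definitions above (H divided by the logarithm of the number of distinct values, yielding a value in [0, 1]); the handling of a constant feature is an assumption, since the paper does not specify that edge case.

```python
# Normalized entropy of one network feature over a fixed time window,
# per Algorithm 1: H / log(n0), with n0 = number of distinct values.
import math
from collections import Counter

def normalized_entropy(observations):
    """Return the normalized entropy of a window of feature observations.

    Result lies in [0, 1]: 0 for a constant feature, 1 for a uniformly
    distributed one. The constant-feature case is an assumed convention.
    """
    n = len(observations)
    counts = Counter(observations)
    n0 = len(counts)
    if n0 <= 1:
        return 0.0  # a constant feature carries no uncertainty
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n0)
```

For example, a window where every observed value is distinct yields 1.0, while a window of identical values yields 0.0, matching the normalization described above.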

Methodology
The complete process from packet capture to classification is represented in Figure 3, along with the steps followed. The window-based features extracted (listed in Table 1) include: the number of Domain Name System (DNS) queries in T seconds; the number of ACK, RST, FIN, and SYN packets in T seconds; the number of IPv4 frames in T seconds; and the time between two frames over T seconds.

Tools Used for Evaluations
The network sniffer Wireshark [29] was used to capture network data, which were then stored in a PCAP file. The PCAP file, which contains a lot of disposable data, was processed to remove unnecessary data, compressed, and converted to CSV files using netcap [30]. The Python library matplotlib [31] was utilized to plot the graphs needed to identify useful features, and the Python library pandas [32] was used to extract these features from the data. The machine learning library scikit-learn [33] was utilized for training the ML models.
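A hedged sketch of the pandas feature-extraction step: packet records are bucketed into T-second windows and per-window counts (Table 1 style features) are aggregated. The column names (`Timestamp`, `SYN`, `ACK`) and the tiny inline dataset are illustrative assumptions, not netcap's actual CSV schema.

```python
# Sketch: aggregating per-window Table 1 style counts with pandas.
# Column names and values are illustrative assumptions, not netcap output.
import pandas as pd

T = 10  # window length in seconds (the paper's T parameter)

packets = pd.DataFrame({
    "Timestamp": [0.5, 1.2, 3.9, 10.1, 11.0, 12.7, 19.9],  # seconds
    "SYN": [1, 0, 1, 1, 0, 0, 1],  # 1 if the packet has the SYN flag set
    "ACK": [0, 1, 0, 0, 1, 1, 0],  # 1 if the packet has the ACK flag set
})

# Assign each packet to a T-second window, then count per window.
packets["Window"] = (packets["Timestamp"] // T).astype(int)
features = packets.groupby("Window").agg(
    syn_count=("SYN", "sum"),
    ack_count=("ACK", "sum"),
    frame_count=("Timestamp", "count"),
)
```

Each row of `features` then corresponds to one observation fed into the entropy calculation and, ultimately, the classifiers.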

Capturing and Processing Data
As mentioned in Section 1.2, capturing and processing data is one of the biggest challenges in this field of research, more so for networks without the necessary hardware. Consumer routers do not have a dedicated monitor port as most commercial switches do. Thus, a low-power dedicated device such as a Raspberry Pi can be used for capturing and storing network traffic. A similar network monitor device has been mentioned in [34]. A capture device can be set up in two ways inside a local network:
• As a server which listens for packet-capture data sent by dedicated programs installed on individual devices in the network. This configuration is not suitable because this paper focuses on monitoring traffic from IoT devices, on which such capture programs cannot be installed, as shown in Figure 4.
• As a network bridge that sits between the router and devices. This allows the capture device to trace all the traffic across the network, as shown in Figure 5.

Network Dataset
The network data were captured from eight different household networks, each containing a multitude of IoT devices. The captured data were then manually cleaned for training the models. Synthetic network traffic creating anomalies, such as internet sweeps and DDoS attacks performed using devices on the network, was also generated. Network traffic from well-known malware such as Mirai, whose source code is readily available, was also simulated. The features mentioned in Table 1 were extracted from the data for use in the machine learning models.

Results and Discussion
All of the models were optimized to use the best possible hyperparameters. The dataset is randomly split into training and test sets with an 80:20 ratio. Anomalous data-points are tested directly on the trained model. The results can be visualized in Figure 6. In Tables 3 and 4 containing the results, the Feature column corresponds to the features mentioned in Table 2, and in Table 5 the Features column corresponds to the features in Table 1. Results of other anomaly detection techniques from published research have not been compared since most of that work relies on the KDD99 dataset, which is outdated (discussed in [23]); moreover, this paper targets consumer rather than commercial network traffic.

The anomalous detection rate, DR(a), is defined as the ratio of anomalous data-points classified as anomalies to the total number of anomalous data-points the classifier was tested upon:

DR(a) = No. of anomalous data-points marked as positives / Total No. of anomalous data-points (1)

The non-anomalous detection rate, DR(na), is defined as the ratio of non-anomalous data-points not classified as anomalies to the test dataset split:

DR(na) = No. of non-anomalous data-points marked as negatives / Total No. of non-anomalous data-points (2)

The anomalous false-positive rate, FPR(a), is defined as the ratio of anomalous data-points marked as negatives to the total number of anomalous data-points the classifier was tested upon:

FPR(a) = No. of anomalous data-points marked as negatives / Total No. of anomalous data-points (3)

The non-anomalous false-positive rate, FPR(na), is defined as the ratio of non-anomalous data-points classified as anomalies to the test dataset split:

FPR(na) = No. of non-anomalous data-points marked as positives / Total No. of non-anomalous data-points (4)

Isolation Forest
Even with high-dimensional data such as the network traffic used in this experiment, Isolation Forest shows sub-par performance among the three algorithms compared, as seen in Table 3. Its average false-positive rate is the highest (16.618%) among the three, and its average detection rate is the lowest (83.757%). It is worth noting that this approach identifies normal traffic more accurately than OCSVM, but the other inaccuracies outweigh this algorithm's usefulness. The sub-par performance might be due to imbalanced classes or to its being an unsupervised model that was not provided labeled data for training.

One-Class Support Vector Machine

Table 3 clearly shows that OCSVM outperforms an unsupervised model such as Isolation Forest, with a better detection rate (87.909% compared to 83.757%) and a lower false-positive rate (11.333% compared to 16.618%), showing that it is superior to Isolation Forest for anomaly detection on consumer network datasets. As with Isolation Forest, optimizing the hyperparameters helped overcome the problems caused by imbalanced classes. Although OCSVM requires only a small sample size to train the model and proves accurate in most cases [35], its one downside is a higher-than-acceptable false-positive rate, which the proposed hybrid approach surpasses with a sub-10% false-positive rate of 8.442%. This is most likely because raw network traffic data cannot simply be classified as anomalous or normal, due to the many factors in play.

Proposed Hybrid Approach
The hybrid approach not only has the best average detection rate (91.55% compared to 83.757% and 87.909%) but also has a lower false-positive rate than both One-Class Support Vector Machine and Isolation Forest (8.442% compared to 11.333% and 16.618%), which shows that utilizing entropy values for classification with OCSVM provides better results for detecting network anomalies inside consumer networks than using common ML models directly.

Detecting Attacks
Consumer networks are not usually on the receiving side of DDoS attacks since they are not lucrative targets for cybercriminals, but devices inside a consumer network acting as zombies for a DDoS attack can be identified, since they would be sending far more packets to the same IP or port than usual. Alongside this, deviations from usual traffic caused by botnet control (a device communicating periodically with a Command and Control server), as well as a higher-than-usual amount of traffic of a specific kind on a certain port, can be used to detect a zero-day attack exploiting a vulnerable service running on that port. In this way, the proposed method helps detect attacks inside consumer networks.

Conclusions
As this research demonstrates, network anomalies can be detected with high accuracy without requiring large labeled datasets. The models proved to accurately detect common network anomalies such as DDoS attacks and port scans being performed by network devices. Other "Novelties" in network traffic were also flagged, which demonstrates the ability to detect Zero-Day attacks on or using consumer devices. The proposed hybrid approach seems to be the forerunner for practical purposes due to the high detection rate, low false-positive rates, and the fact that there is no need for extensive labeled datasets that are unavailable for consumer networks.
As future work, a deep learning approach using Deep Auto-Encoder networks might also prove effective, since the semi-supervised ML model OCSVM has shown promising results in this research. Combining other models with different trade-offs offers a promising direction for future research toward securing the next generation of IoT devices.