# An Entropy-Based Network Anomaly Detection Method

## Abstract

## 1. Introduction

- Section 2 reviews related work on network anomaly detection.
- Section 3 introduces the definition of Shannon entropy and describes the Renyi and Tsallis generalizations. A brief overview and a comparison of the entropy measures are provided.
- Section 4 presents the architecture of the proposed method. A detailed specification and the results of the implementation are given.
- Section 5 describes the dataset developed to evaluate the performance of the proposed method.
- Section 6 presents the results of the verification of the method.
- Section 7 concludes the article with a short summary and an outline of further work.

## 2. Related Work

#### 2.1. General Overview of Network Anomaly Techniques

#### 2.2. Closely Related Work

#### 2.2.1. Detection via Counters

#### 2.2.2. Detection via Feature Distributions

#### Shannon Entropy

#### Generalized Entropy

#### Other Techniques

## 3. Entropy

#### 3.1. Shannon Entropy

- Nonnegativity ${\forall}_{p({x}_{i})\in [0,1]}{H}_{s}(X)\ge 0$;
- Symmetry ${H}_{s}(p({x}_{1}),p({x}_{2}),\dots )={H}_{s}(p({x}_{2}),p({x}_{1}),\dots )$;
- Maximality ${H}_{s}(p({x}_{1}),\dots ,p({x}_{n}))\le {H}_{s}\left(\frac{1}{n},\dots ,\frac{1}{n}\right)={\mathrm{log}}_{a}(n)$;
- Additivity ${H}_{s}(X,Y)={H}_{s}(X)+{H}_{s}(Y)$ if $X$ and $Y$ are independent variables.
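
The four properties listed above can be checked numerically. The following is a minimal sketch, assuming the usual definition $H_{s}(X) = -\sum_{i} p(x_{i}) \log_{a} p(x_{i})$ with base $a = 2$; the function name `shannon_entropy` and the sample distributions are illustrative and not taken from the article.

```python
import math


def shannon_entropy(probs, base=2.0):
    """Shannon entropy -sum(p * log_a(p)); zero-probability terms contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


# Illustrative distributions (assumed for this sketch only).
p = [0.5, 0.25, 0.125, 0.125]
q = [0.7, 0.3]
n = len(p)
uniform = [1.0 / n] * n

# Nonnegativity: entropy is never negative.
nonnegative = shannon_entropy(p) >= 0

# Symmetry: reordering the probabilities does not change the entropy.
symmetric = math.isclose(shannon_entropy(p), shannon_entropy(list(reversed(p))))

# Maximality: the uniform distribution attains the maximum log_a(n).
maximal = (shannon_entropy(p) <= shannon_entropy(uniform)
           and math.isclose(shannon_entropy(uniform), math.log(n, 2)))

# Additivity: for independent X and Y the joint distribution is the product.
joint = [pi * qj for pi in p for qj in q]
additive = math.isclose(shannon_entropy(joint),
                        shannon_entropy(p) + shannon_entropy(q))

print(nonnegative, symmetric, maximal, additive)
```

All four flags should evaluate to true for any valid pair of distributions, up to floating-point tolerance.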

#### 3.2. Parameterized Entropy

- expose concentration for α > 1 and dispersion for α < 1,
- converge to Shannon entropy for α → 1,
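
The convergence towards Shannon entropy for α → 1 can be illustrated numerically. The sketch below assumes the standard textbook forms with natural logarithms, $H_{\alpha}^{R} = \frac{1}{1-\alpha}\ln \sum_{i} p(x_{i})^{\alpha}$ for Renyi and $H_{\alpha}^{T} = \frac{1}{1-\alpha}\left(\sum_{i} p(x_{i})^{\alpha} - 1\right)$ for Tsallis; the article's own equations may differ in the logarithm base or in normalization. The function names and the sample distribution are hypothetical.

```python
import math


def shannon(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def renyi(probs, alpha):
    """Renyi entropy (standard form, assumed); falls back to Shannon at alpha = 1."""
    if math.isclose(alpha, 1.0):
        return shannon(probs)
    # Zero-probability events are skipped so that negative alpha stays finite.
    s = sum(p ** alpha for p in probs if p > 0)
    return math.log(s) / (1.0 - alpha)


def tsallis(probs, alpha):
    """Tsallis entropy (standard form, assumed); falls back to Shannon at alpha = 1."""
    if math.isclose(alpha, 1.0):
        return shannon(probs)
    s = sum(p ** alpha for p in probs if p > 0)
    return (s - 1.0) / (1.0 - alpha)


dist = [0.96, 0.01, 0.01, 0.01, 0.01]  # illustrative distribution
for alpha in (0.9, 0.99, 0.999, 1.001, 1.01, 1.1):
    print(alpha, renyi(dist, alpha), tsallis(dist, alpha), shannon(dist))
```

As α approaches 1 from either side, both parameterized values move towards the Shannon value printed in the last column.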

#### 3.3. Comparison

#### 3.3.1. Binomial Distribution

#### 3.3.2. Uniform Distribution

#### 3.3.3. Impact of Frequent and Rare Events

**Example 1.** Let us assume a discrete random variable X representing the IP addresses observed in a network within the last 1 min: X = {“10.1.0.1”, “10.1.0.2”, “10.1.0.3”, “10.1.0.4”, “10.1.0.5”}. Suppose the numbers of occurrences of the successive IP addresses are Freq = {96, 1, 1, 1, 1}. From these frequencies we estimate the probability distribution of X (see Table 1).

Now consider the term $p(x_{i})^{\alpha}$, which appears in both the Renyi and Tsallis formulas (Equations (8) and (12)). The results are presented in Table 2.

The impact of frequent events (expressed by $p(x_{i}) = 0.96$) on the entropy is greater than the impact of rare events (expressed by $p(x_{i}) = 0.01$) when positive α-values are used. Conversely, the impact of rare events is greater than that of frequent events when negative α-values are used.
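
The arithmetic of Example 1 can be reproduced in a few lines. This is a minimal sketch that computes only the term $p(x_{i})^{\alpha}$ for α = −2 and α = 2, not the full Renyi or Tsallis formulas; the variable names are illustrative.

```python
# Frequencies from Example 1: one frequent address and four rare ones.
freq = {
    "10.1.0.1": 96,
    "10.1.0.2": 1,
    "10.1.0.3": 1,
    "10.1.0.4": 1,
    "10.1.0.5": 1,
}

total = sum(freq.values())
probs = {ip: count / total for ip, count in freq.items()}

# weights[alpha][ip] holds p(x_i) ** alpha for each address.
weights = {
    alpha: {ip: p ** alpha for ip, p in probs.items()}
    for alpha in (-2, 2)
}

for alpha, per_ip in weights.items():
    for ip, w in per_ip.items():
        print(f"alpha={alpha:+d}  {ip}  p={probs[ip]:.2f}  p^alpha={w:.6g}")
```

For α = −2 the rare addresses receive by far the largest weights, while for α = 2 the frequent address dominates, which matches the observation above.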

## 4. Anode—Entropy-Based Network Anomaly Detector

#### 4.1. Architecture

_{i}) > 1 indicate abnormal concentration or dispersion. Such abnormal dispersion or concentration in different feature distributions is characteristic of anomalies. For example, during a port scan, a high dispersion in port numbers and a high concentration in addresses should be observed. Detection is based on the relative value of entropy with respect to the distance between min and max. The coefficient k in the formula determines a margin for the min and max boundaries and may be used for tuning. A high value of k, e.g., k = 2, limits the number of false alarms (alarms raised where no anomaly has taken place), while a low value (k = 1) increases the detection rate (the percentage of anomalies correctly detected). Other approaches to thresholding, based on the standard deviation (mean ± 2sdev) and on the median absolute deviation (median ± 2mad) [97], were also considered, but empirical results showed that the proposed rule performed best.

The detection is based on the results from all feature distributions presented in Table 3. Classification is based on popular classifiers (decision trees, Bayes nets [98], rules and functions) available in Weka [99]. Extraction of anomaly details is also envisaged: related ports and addresses are obtained by looking into the top contributors to the entropy value.
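
The two alternative thresholding rules mentioned above (mean ± 2sdev and median ± 2mad) can be sketched as follows. This is an illustration of those alternatives only, not of the min/max rule with coefficient k. The function names, the use of the population standard deviation, and the baseline entropy values are assumptions made for the example.

```python
import statistics


def mean_sdev_bounds(values, width=2.0):
    """Bounds mean - width*sdev and mean + width*sdev (population sdev, assumed)."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return m - width * s, m + width * s


def median_mad_bounds(values, width=2.0):
    """Bounds median - width*MAD and median + width*MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med - width * mad, med + width * mad


def is_abnormal(value, bounds):
    """True when the value falls outside the (low, high) interval."""
    low, high = bounds
    return value < low or value > high


# Hypothetical entropy values observed during anomaly-free training windows.
baseline = [2.0, 2.1, 1.9, 2.05, 1.95]

for observed in (2.02, 0.5):
    print(observed,
          is_abnormal(observed, mean_sdev_bounds(baseline)),
          is_abnormal(observed, median_mad_bounds(baseline)))
```

An entropy value far below the lower bound corresponds to abnormal concentration, and a value far above the upper bound to abnormal dispersion.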

#### 4.2. Implementation

## 5. Dataset

#### 5.1. Origin of the Idea

- limited availability of such datasets;
- the lack of proper labeling in shared datasets;
- the fact that most available datasets are obsolete in terms of legitimate traffic and anomalies;
- the absence of realistic data in synthetic datasets;
- the small number of datasets with flows (conversion from packets is needed, and labels are lost);
- incompleteness of data (narrow range of anomalies, lack of anomalies related to botnet-like malware).

#### 5.2. Legitimate Traffic

#### 5.3. Scenario 1

#### 5.4. Scenario 2

- One of the hosts in the local network gets infected with botnet-like malware. To propagate via the network, it starts scanning its neighbors. The malware looks for hosts running Remote Desktop Protocol (RDP) services. RDP is a proprietary protocol developed by Microsoft that provides a user with a graphical interface to connect to another computer over a network. RDP servers are built into Windows operating systems. By default, the server listens on TCP/UDP port 3389.
- Hosts serving Remote Desktop services are attacked with a dictionary attack (similar to the technique found in the MORTO worm [127]).
- After a successful dictionary attack, vulnerable machines are infected and become members of the botnet.
- Peer-to-peer communication based on the UDP transport protocol is established among the infected hosts.
- On command from the C&C server, botnet members start a low-rate Distributed Denial of Service attack called Slowloris [128] on an external HTTP server. After a few minutes the server is blocked.

#### 5.5. Scenario 3

- One of the local hosts, infected with modern botnet malware, starts scanning its neighbors in order to propagate via the network. It uses a network propagation mechanism similar to the one employed in the Stuxnet worm [129,130]. The malware looks for hosts with open TCP and UDP ports reserved for Microsoft Remote Procedure Call (RPC). In Windows, RPC is an interprocess communication mechanism that enables data exchange and invocation of functionality residing in a different process, locally or via the network. The ports used to initiate a connection with RPC are: UDP – 135, 137, 138, 445; TCP – 135, 139, 445, 593.
- Hosts with open RPC ports are attacked with specially crafted RPC requests.
- After a successful attack, vulnerable machines are infected and become members of the botnet.
- Direct communication with a single C&C server is established on each infected host.
- On command from the C&C server, botnet members start a DDoS amplification attack based on the Network Time Protocol (NTP). The attack targets an external server. Botnet members send packets with a forged source IP address (set to the address used by the victim). Because the source IP address is forged, the remote server replies and sends data to the victim. Moreover, the attack is amplified via NTP: attackers send a small (234 bytes) packet “from” the forged source IP address with a command to get a list of interacting machines, and the NTP server sends a large (up to 200 times bigger) reply to the victim. As a result, attackers turn a small amount of bandwidth coming from a few machines into a significant traffic load hitting the victim. More details regarding NTP amplification in DDoS attacks can be found in [131].

## 6. Verification of the Approach

#### 6.1. Correlation

#### 6.2. Performance Evaluation

- Take n binary classifiers (one for each class);
- For the ith classifier, let the positive examples be all the points in class i, and let the negative examples be all the points not in class i;
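
The relabeling described in the steps above can be sketched as follows. This is a minimal illustration of how the per-class binary targets are built, assuming class labels are plain strings; the function name `one_vs_rest_targets` and the sample labels are hypothetical, and the training of the binary classifiers themselves is not shown.

```python
def one_vs_rest_targets(labels):
    """For each class, mark its own points as 1 (positive) and all others as 0."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


# Hypothetical labels for five time windows.
labels = ["normal", "ns", "bf", "normal", "dd"]
targets = one_vs_rest_targets(labels)

for cls, binary in targets.items():
    print(cls, binary)
```

Each entry in `targets` is the training target for one binary classifier, so n classes yield n binary problems.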

## 7. Summary

#### 7.1. Conclusions

- Tsallis and Renyi entropy performed best;
- Shannon entropy performed worse in terms of Accuracy, False Positive Rate, and weighted ROC curves;
- the volume-based approach performed poorly;
- using a broad spectrum of network traffic features is essential to successfully detect and classify different types of anomalies; this is supported both by the results of feature correlation and by the good classification results for different anomalies in the tested scenarios;
- using α-values from the set {−2, −1, 0, 1, 2} is a proper choice; this is supported by the results of α-value correlation and by the good classification results for different anomalies in the tested scenarios; using a larger set of α-values is redundant, while a single α-value is not enough to recognize different types of anomalies;
- the most suitable classifier for our approach (among the popular classifiers available in Weka) is SimpleLogistic, which relies on linear logistic regression.

#### 7.2. Further Work

**Figure 14.** Abnormally high dispersion in destination addresses for network scan anomalies (Renyi/Shannon).

**Figure 15.** Abnormally high concentration in flows duration for network scan anomalies (Tsallis/Shannon).

X | “10.1.0.1” | “10.1.0.2” | “10.1.0.3” | “10.1.0.4” | “10.1.0.5” |
---|---|---|---|---|---|
p(X = x) | 0.96 | 0.01 | 0.01 | 0.01 | 0.01 |

$p(x_{i})$ | α = −2 | α = 2 |
---|---|---|
0.96 | 1.08 | 0.92 |
0.01 | 10000 | 0.0001 |

Feature | Probability mass function |
---|---|
src(dst) address(port) | $\frac{\text{number of } x_{i} \text{ as src(dst) address(port)}}{\text{total number of src(dst) addresses(ports)}}$ |
flows duration | $\frac{\text{number of flows with } x_{i} \text{ as duration}}{\text{total number of flows}}$ |
packets, bytes | $\frac{\text{number of pkts (bytes) with } x_{i} \text{ as src (dst) addr (port)}}{\text{total number of pkts (bytes)}}$ |
in(out)-degree | $\frac{\text{number of hosts with } x_{i} \text{ as in (out)-degree}}{\text{total number of hosts}}$ |
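
The probability mass functions in the table above can be estimated from flow records by simple counting. The sketch below is an illustration under several assumptions: the flow record layout (`src`, `dst`, `dport`, `bytes`) is hypothetical, only the destination address, bytes, and out-degree features are shown, and the out-degree of a host is taken to be the number of distinct destinations it contacts.

```python
from collections import Counter

# Hypothetical flow records; field names are assumptions for this sketch.
flows = [
    {"src": "10.1.0.1", "dst": "10.1.0.2", "dport": 22, "bytes": 100},
    {"src": "10.1.0.1", "dst": "10.1.0.3", "dport": 22, "bytes": 100},
    {"src": "10.1.0.1", "dst": "10.1.0.4", "dport": 22, "bytes": 100},
    {"src": "10.1.0.2", "dst": "10.1.0.3", "dport": 80, "bytes": 700},
]


def pmf(counter):
    """Normalize counts so that the values sum to 1."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


# Address feature: share of flows having x_i as destination address.
dst_addr_pmf = pmf(Counter(f["dst"] for f in flows))

# Bytes feature: share of all bytes sent to destination address x_i.
bytes_by_dst = Counter()
for f in flows:
    bytes_by_dst[f["dst"]] += f["bytes"]
bytes_pmf = pmf(bytes_by_dst)

# Out-degree feature: share of hosts having x_i distinct destinations.
peers = {}
for f in flows:
    peers.setdefault(f["src"], set()).add(f["dst"])
hosts = {f["src"] for f in flows} | {f["dst"] for f in flows}
out_degree = {h: len(peers.get(h, set())) for h in hosts}
out_degree_pmf = pmf(Counter(out_degree.values()))

print(dst_addr_pmf)
print(bytes_pmf)
print(out_degree_pmf)
```

Each resulting dictionary is a discrete distribution to which an entropy measure can then be applied.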

Type/kind | No. of flows | Duration [s] | No. of victims | No. of attackers |
---|---|---|---|---|
SSH brute force (bf) | | | | |
1 | 1K | 300 | 1 | 1 |
2 | 1K | 100 | 1 | 1 |
TCP SYN flood DDoS (dd) | | | | |
1 | 2K | 200 | 1 | 50 |
2 | 2K | 200 | 1 | 250 |
3 | 3K | 300 | 1 | 50 |
4 | 3K | 300 | 1 | 250 |
5 | 4K | 400 | 1 | 50 |
6 | 4K | 400 | 1 | 250 |
SSH network scan (ns) | | | | |
1 | 6K | 60 | 6K | 1 |
2 | 6K | 300 | 6K | 1 |
3 | 8K | 80 | 8K | 1 |
4 | 8K | 400 | 8K | 1 |
Port scan (ps) | | | | |
1 | 1K | 50 | 1 | 1 |
2 | 1K | 100 | 1 | 1 |
3 | 2K | 100 | 1 | 1 |
4 | 2K | 200 | 1 | 1 |

Type | No. of flows | Duration [s] | No. of victims | No. of attackers |
---|---|---|---|---|
Network scan (ns) | 252 | 200 | 252 | 1 |
RDP brute force (bf) | 720 | 550 | 53 | 1 |
Botnet p2p (p2p) | 150 | 185 | 15 | 15 |
Slowloris DDoS (dd) | 1124 | 117 | 15 | 1 |

Type | No. of flows | Duration [s] | No. of victims | No. of attackers |
---|---|---|---|---|
Block scan (bs) | 1.5K | 80 | 168 | 1 |
RPC attack (rpc) | 650 | 200 | 90 | 1 |
Botnet C&C communication (c&c) | 125 | 190 | 63 | 1 |
NTP DDoS (dd) | 2.9K | 580 | 1 | 63 (spoofed to 1) |

Correlation | α-value | α = −3 | α = −2 | α = −1 | α = 0 | α = 1 | α = 2 | α = 3 |
---|---|---|---|---|---|---|---|---|
Pearson | α = −3 | 1 | 0.99 | 0.96 | 0.66 | 0.12 | −0.06 | −0.09 |
Pearson | α = −2 | – | 1 | 0.98 | 0.69 | 0.13 | −0.06 | −0.09 |
Pearson | α = −1 | – | – | 1 | 0.75 | 0.16 | −0.05 | −0.08 |
Pearson | α = 0 | – | – | – | 1 | 0.44 | 0.18 | 0.12 |
Pearson | α = 2 | – | – | – | – | – | 1 | 0.97 |
Pearson | α = 3 | – | – | – | – | – | – | 1 |
Spearman | α = −3 | 1 | 0.97 | 0.837 | 0.46 | 0.06 | −0.09 | −0.11 |
Spearman | α = −2 | – | 1 | 0.94 | 0.57 | 0.1 | −0.07 | −0.1 |
Spearman | α = −1 | – | – | 1 | 0.72 | 0.15 | −0.06 | −0.09 |
Spearman | α = 0 | – | – | – | 1 | 0.49 | 0.2 | 0.15 |
Spearman | α = 2 | – | – | – | – | – | 1 | 0.9 |
Spearman | α = 3 | – | – | – | – | – | – | 1 |
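
Correlation coefficients of this kind can be computed from two equally long series of entropy values. The following is a minimal sketch in plain Python: Pearson uses the usual covariance formula, and Spearman is obtained as the Pearson correlation of average ranks. The function names and the sample series are illustrative and not data from the article.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical series: monotonically related but not linearly related.
series_a = [1.0, 2.0, 3.0, 4.0, 5.0]
series_b = [1.0, 4.0, 9.0, 16.0, 25.0]
print(pearson(series_a, series_b), spearman(series_a, series_b))
```

For the sample series the Spearman coefficient reaches its maximum because the relation is monotonic, while the Pearson coefficient stays below it because the relation is not linear.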

Pearson | src ip | dst ip | src port | dst port | in-degree | out-degree |
---|---|---|---|---|---|---|
src ip | 1 | 0.89 | 0.89 | 0.91 | 0.37 | 0.35 |
dst ip | – | 1 | 0.98 | 0.89 | 0.27 | 0.55 |
src port | – | – | 1 | 0.86 | 0.15 | 0.5 |
dst port | – | – | – | 1 | 0.41 | 0.53 |
in-degree | – | – | – | – | 1 | 0.27 |
out-degree | – | – | – | – | – | 1 |

Spearman | src ip | dst ip | src port | dst port | in-degree | out-degree |
---|---|---|---|---|---|---|
src ip | 1 | 0.9 | 0.85 | 0.87 | 0.47 | 0.69 |
dst ip | – | 1 | 0.96 | 0.89 | 0.43 | 0.83 |
src port | – | – | 1 | 0.83 | 0.3 | 0.69 |
dst port | – | – | – | 1 | 0.53 | 0.12 |
in-degree | – | – | – | – | 1 | 0.48 |
out-degree | – | – | – | – | – | 1 |

Pearson | src ip | dst ip | src port | dst port | in-degree | out-degree |
---|---|---|---|---|---|---|
src ip | 1 | −0.07 | −0.34 | −0.02 | −0.07 | 0.44 |
dst ip | – | 1 | −0.29 | 0.05 | 0.08 | −0.28 |
src port | – | – | 1 | −0.42 | 0.59 | −0.04 |
dst port | – | – | – | 1 | −0.39 | 0.01 |
in-degree | – | – | – | – | 1 | 0.03 |
out-degree | – | – | – | – | – | 1 |

Spearman | src ip | dst ip | src port | dst port | in-degree | out-degree |
---|---|---|---|---|---|---|
src ip | 1 | 0.03 | −0.21 | 0.07 | 0.21 | 0.37 |
dst ip | – | 1 | −0.31 | 0.07 | 0.08 | −0.35 |
src port | – | – | 1 | −0.55 | 0.64 | 0.23 |
dst port | – | – | – | 1 | 0.52 | 0.76 |
in-degree | – | – | – | – | 1 | 0.18 |
out-degree | – | – | – | – | – | 1 |

Name | Formula |
---|---|
True Positive Rate (TPR), equivalent to Recall, Sensitivity | $TPR=\frac{TP}{TP+FN}$ |
True Negative Rate (TNR), equivalent to Specificity | $TNR=\frac{TN}{FP+TN}$ |
Positive Predictive Value (PPV), equivalent to Precision | $PPV=\frac{TP}{TP+FP}$ |
Negative Predictive Value (NPV) | $NPV=\frac{TN}{TN+FN}$ |
False Positive Rate (FPR), equivalent to Fall-out | $FPR=\frac{FP}{FP+TN}=1-TNR$ |
False Discovery Rate (FDR) | $FDR=\frac{FP}{FP+TP}=1-PPV$ |
False Negative Rate (FNR) | $FNR=\frac{FN}{FN+TP}$ |
Accuracy (ACC) | $ACC=\frac{TP+TN}{TP+FN+FP+TN}$ |
F1 score (harmonic mean of Precision and Recall) | $F1=\frac{2TP}{2TP+FP+FN}$ |
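
The measures in the table above follow directly from the four confusion counts. The sketch below computes them for one class treated as positive; the function names and the sample predictions are hypothetical, and no guard against empty denominators is included.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN with `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, predicted in zip(y_true, y_pred):
        if predicted == positive and truth == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Evaluation measures from the table; assumes non-zero denominators."""
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (fp + tn),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "FPR": fp / (fp + tn),
        "FDR": fp / (fp + tp),
        "FNR": fn / (fn + tp),
        "ACC": (tp + tn) / (tp + fn + fp + tn),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical ground truth and predictions for eight time windows.
y_true = ["ns", "ns", "normal", "normal", "normal", "ns", "normal", "normal"]
y_pred = ["ns", "normal", "normal", "ns", "normal", "ns", "normal", "normal"]

counts = confusion_counts(y_true, y_pred, positive="ns")
scores = metrics(*counts)
print(counts)
print(scores)
```

Repeating the computation with each class in turn as the positive class gives per-class TPR and FPR values.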

Measure | Approach | ZeroR | Bayes Network | Decision Tree J48 | Random Forest | Simple Logistic |
---|---|---|---|---|---|---|
Accuracy | Tsallis | 0.66 | 0.89 | 0.90 | 0.93 | 0.93 |
Accuracy | Renyi | 0.66 | 0.88 | 0.89 | 0.90 | 0.93 |
Accuracy | Shannon | 0.66 | 0.84 | 0.86 | 0.90 | 0.92 |
Accuracy | volume-based | 0.66 | 0.72 | 0.77 | 0.76 | 0.80 |
FPR | Tsallis | 0.66 | 0.07 | 0.08 | 0.07 | 0.06 |
FPR | Renyi | 0.66 | 0.08 | 0.09 | 0.11 | 0.09 |
FPR | Shannon | 0.66 | 0.08 | 0.11 | 0.12 | 0.08 |
FPR | volume-based | 0.66 | 0.21 | 0.15 | 0.22 | 0.20 |

Measure | Approach | ZeroR | Bayes Network | Decision Tree J48 | Random Forest | Simple Logistic |
---|---|---|---|---|---|---|
Accuracy | Tsallis | 0.68 | 0.82 | 0.84 | 0.85 | 0.91 |
Accuracy | Renyi | 0.68 | 0.83 | 0.88 | 0.89 | 0.92 |
Accuracy | Shannon | 0.68 | 0.77 | 0.8 | 0.84 | 0.89 |
Accuracy | volume-based | 0.68 | 0.68 | 0.73 | 0.78 | 0.80 |
FPR | Tsallis | 0.68 | 0.22 | 0.14 | 0.27 | 0.11 |
FPR | Renyi | 0.68 | 0.15 | 0.12 | 0.2 | 0.11 |
FPR | Shannon | 0.68 | 0.29 | 0.21 | 0.28 | 0.15 |
FPR | volume-based | 0.68 | 0.68 | 0.2 | 0.15 | 0.28 |

Measure | Approach | ZeroR | Bayes Network | Decision Tree J48 | Random Forest | Simple Logistic |
---|---|---|---|---|---|---|
Accuracy | Tsallis | 0.68 | 0.83 | 0.83 | 0.87 | 0.93 |
Accuracy | Renyi | 0.68 | 0.83 | 0.83 | 0.85 | 0.94 |
Accuracy | Shannon | 0.68 | 0.76 | 0.8 | 0.85 | 0.90 |
Accuracy | volume-based | 0.68 | 0.68 | 0.62 | 0.65 | 0.66 |
FPR | Tsallis | 0.68 | 0.13 | 0.17 | 0.22 | 0.1 |
FPR | Renyi | 0.68 | 0.13 | 0.16 | 0.22 | 0.06 |
FPR | Shannon | 0.68 | 0.23 | 0.16 | 0.22 | 0.13 |
FPR | volume-based | 0.68 | 0.68 | 0.57 | 0.45 | 0.67 |

Anomaly (TPR/FPR) | Scenario 1 | Scenario 2 | Scenario 3 |
---|---|---|---|
brute force | 0.78/0 | 1/0.01 | – |
network scan | 0.92/0.02 | 0.9/0 | – |
port scan | 0.92/0.01 | – | – |
block scan | – | – | 0.9/0.01 |
DDoS | 0.67/0.01 | 0.9/0 | 0.9/0.01 |
p2p | – | 0.3/0.02 | – |
c&c | – | – | 0.9/0.01 |
RPC exploitation | – | – | 0.7/0.01 |
not anomalous | 0.98/0.13 | 0.97/0.16 | 0.97/0.08 |

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

