Using Generalized Entropies and OC-SVM with Mahalanobis Kernel for Detection and Classification of Anomalies in Network Traffic †

Network anomaly detection and classification is an important open issue in network security. Several approaches and systems based on different mathematical tools have been studied and developed; among them is the Anomaly-based Network Intrusion Detection System (A-NIDS), which monitors network traffic and compares it against an established baseline of a "normal" traffic profile. It is therefore necessary to characterize "normal" Internet traffic. This paper presents an approach for anomaly detection and classification based on the Shannon, Rényi, and Tsallis entropies of selected features, and on the construction of regions from entropy data, employing the Mahalanobis distance (MD) and the One-Class Support Vector Machine (OC-SVM) with different kernels (Radial Basis Function (RBF) and Mahalanobis Kernel (MK)) for "normal" and abnormal traffic. Regular and non-regular regions built from "normal" traffic profiles allow anomaly detection, while classification is performed under the assumption that regions corresponding to the attack classes have been previously characterized. Although this approach allows the use of as many features as required, only four well-known significant features were selected in our case. In order to evaluate our approach, two different data sets were used: one set of real traffic obtained from an Academic Local Area Network (LAN), and the other a subset of the 1998 MIT-DARPA set. For these data sets, a true positive rate up to 99.35%, a true negative rate up to 99.83%, and a false negative rate of about 0.16% were yielded. Experimental results show that certain q-values of the generalized entropies and the use of OC-SVM with the RBF kernel improve the detection rate in the detection stage, while the novel inclusion of the MK kernel in OC-SVM and the k-temporal nearest neighbors algorithm improve accuracy in classification.
In addition, the results show that, using the Box-Cox transformation, the Mahalanobis distance yielded high detection rates with an efficient computation time, while OC-SVM achieved slightly higher detection rates but is more computationally expensive.


Introduction
The detection and prevention of attacks and malicious activities have led to the development of technologies and devices designed to provide a certain degree of security. One of the first technologies for countering attacks launched against computer networks was the Network Intrusion Detection System (NIDS). NIDS are classified into two groups: Signature-based NIDS, which use a database of attack signatures, and Anomaly-based NIDS, which classify traffic as normal or abnormal in order to decide whether an attack has occurred.
A-NIDS, also known in the literature as behavioral-based systems, make use of a model of normal inputs in order to detect security events. They try to establish what a "normal" or anomaly-free profile of system or network behavior is, using network features or variables, e.g., source and destination IP addresses and ports, packet size, number of flows, and number of packets.
For anomaly detection [1], some traffic variables can be employed directly, or functions of these variables can be used, e.g., the entropy. Entropy-based approaches for anomaly detection are appealing, since they provide more information about the structure of anomalies than traditional traffic volume analysis [2]. Entropy is used to capture the degree of dispersal or concentration of the distributions of different traffic features [3,4]. The attractiveness of entropy metrics stems from their ability to condense an entire feature distribution into a single number while retaining important information about the overall state of the distribution. A sequence of packets from network traffic is captured, network features are selected, and the entropy of these features is calculated. With the estimated entropy values, anomaly detection is performed. For this, a profile of "normal" traffic is generated, and data that deviate from this profile are considered anomalous. In [5], starting from an entropy matrix H of normal traffic without outlier filtering, an ellipsoidal region based on the Mahalanobis distance was defined.
An improvement to [5] was proposed in [6], where the algorithm uses the Mahalanobis distance for the exclusion of outliers, and an ellipsoidal region is generated by calculating the parameters {x̄, γ, λ, LT}, where x̄ is the mean vector of the H matrix, γ and λ are the eigenvectors and eigenvalues of the covariance matrix of H, and LT is the limit of the Mahalanobis distance for H [7]. In both works, network traffic behavior was characterized by regular ellipsoidal regions. This paper proposes defining non-regular regions from training traces, i.e., "normal" traffic, through OC-SVM, whose parameters adjust the region to the training traces. Figure 1 shows different defined regions for the case of two variables. In other works (see [8,9]), the RBF kernel was used. However, this work proposes using the Mahalanobis kernel, which in general showed higher classification accuracy than other methods. This paper is organized as follows: Section 2 gives an overview of related work in the area of network anomaly detection. Section 3 introduces the mathematical background, including different entropy estimators, distance metrics, and OC-SVM. Section 4 states the problem and the proposed methods associated with the definition of a region in the space R^p that characterizes the entropy behavior of the p intrinsic variables associated with the traces. Section 5 presents the experiments carried out to define regions and to detect and classify anomalies, employing two different types of data sets. Section 6 presents a discussion of the experimental results. Finally, Section 7 outlines the conclusions.

Related Work
Works dedicated to anomaly detection systems employ different features and entropy as a measure of dispersion, uncertainty, or randomness in order to detect changes in network traffic, which allows anomaly detection. Wagner et al. [3] justify the use of entropy, saying, "The connection between entropy and worm propagation is that worm traffic is more uniform or structured than normal traffic in some respects and more random in others." Xu et al. [4] propose a method based on the construction of a 3-dimensional feature space by reporting the Shannon entropy contents of four intrinsic characteristics of the traffic (source and destination IP address, source and destination ports) as a mechanism for detecting intrusions. Nychis et al. [10] consider two types of distribution based on flow-header and behavioral features. They concluded that the port and address distributions are strongly correlated, both in their entropy time series and in their detection capabilities. Some authors [11-14] have utilized generalized entropies (Tsallis and Rényi), showing advantages over Shannon entropy by adapting the q parameter in order to improve the detection of anomalies. Ziviani et al. [11] investigated Tsallis entropy in the context of DoS attack detection and found empirically that a value of q around 0.9 provides high detection of this attack. On the other hand, Tellenbach et al. [12] utilized the set q ∈ {−3} ∪ {−2, −1.75, ..., 1.75, 2} in order to detect DDoS and scanning attacks. Ma et al. [13] used Tsallis entropy and the Lyapunov exponent with chaotic analysis of the entropy of source and destination IPs to detect DDoS attacks, employing q = 1.1. Bhuyan et al. [14] used generalized entropy to describe characteristics of network traffic data and as an appropriate metric to facilitate building an effective model for detecting both low-rate and high-rate DDoS attacks, for q ∈ {1, 2, 3, ..., 15}.
At the classification stage, different techniques are used. In [5], the authors detected anomalies using regular regions obtained from "normal" network traffic through the Mahalanobis distance (i.e., hyper-ellipsoids). In [8], Li et al. proposed the OC-SVM method for the construction of non-regular regions, using the RBF kernel and considering that "the normal data set is much larger than the abnormal." Zhang et al. [9] detected anomalies using the OC-SVM detector with the RBF kernel.
Defining regular and non-regular regions in the feature space in order to detect and classify anomalies in network traffic using entropy, the Mahalanobis distance, and OC-SVM, this paper proposes:
• the use of the Mahalanobis distance for the construction of decision regions,
• the novel inclusion of the MK in OC-SVM for improved classification with respect to the RBF kernel,
• the refinement of classification via the k-nn algorithm in the temporal sense.
In addition, the Box-Cox transformation was used to transform non-Gaussian distributed data into a set of data with an approximately Gaussian distribution, fulfilling the Gaussianity requirement of the Mahalanobis distance.

Entropy Estimators
Let X be a random variable (r.v.) which takes values in the set {x_1, x_2, ..., x_M}, p_i := P(X = x_i) the probability of occurrence of x_i, and M the cardinality of the finite set; hence, the Shannon entropy is:

H_S(P) = − Σ_{i=1..M} p_i log p_i. (1)

Based on the Shannon entropy [15], Rényi [16] and Tsallis [17] defined generalized entropies, which are related to the q-deformed algebra:

H_R(P, q) = (1 / (1 − q)) log( Σ_{i=1..M} p_i^q ), (2)

H_T(P, q) = (1 / (q − 1)) ( 1 − Σ_{i=1..M} p_i^q ), (3)

where P is a probability distribution. When q → 1, the generalized entropies reduce to the Shannon entropy.
In order to compare the changes of entropy at different times, the entropy is normalized, i.e.,

H_norm = H / H_max, (4)

where the maximum value of the Rényi entropy for an observation vector of size L is given by

H_R,max = log L, (5)

while the maximum of the Tsallis entropy is given by

H_T,max = (L^(1−q) − 1) / (1 − q). (6)

The parameter q, as shown in Equations (2) and (3), is used to make the entropy more or less sensitive to certain events within the distribution, thus modifying the entropy values and consequently the entropy behavior. In addition, for a specific event with probability p, selecting an appropriate value of q can increase (or decrease) the entropy value with respect to the Shannon entropy; see Figure 2.

Figure 2. Entropy estimators (Shannon, Rényi, and Tsallis) for a random variable (r.v.) X with probabilities P_x = {p, (1 − p)}: H_S(P), H_R(P, q = 0.01), H_R(P, q = 0.5), H_R(P, q = 5), H_T(P, q = 0.01), H_T(P, q = 0.5), H_T(P, q = 5).
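As a concrete illustration of Equations (1)-(3) and of the role of q, the three estimators can be sketched in a few lines of Python. This is a minimal sketch; NumPy and the function names are ours, not part of the paper.

```python
import numpy as np

def shannon(p):
    """Shannon entropy, Equation (1), with 0 log 0 := 0."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def renyi(p, q):
    """Renyi entropy, Equation (2); tends to Shannon as q -> 1."""
    p = p[p > 0]
    return np.log(np.sum(p ** q)) / (1.0 - q)

def tsallis(p, q):
    """Tsallis entropy, Equation (3); tends to Shannon as q -> 1."""
    p = p[p > 0]
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

# Two-outcome distribution P = {p, 1 - p}, as in Figure 2.
P = np.array([0.2, 0.8])
print(shannon(P))            # ~0.5004 nats
print(renyi(P, q=0.5))       # q < 1 raises the weight of rare events
print(tsallis(P, q=5))       # q > 1 emphasizes the dominant events
```

Sweeping p over (0, 1) with these functions reproduces the qualitative shape of the curves in Figure 2.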

Feature Space
Let X^i_t, i = 1, 2, ..., p, be features or random variables of some phenomenon under study, and R^p a p-dimensional feature space, i.e., the space where our variables live. When the phenomenon is observed during a time period T, N observations are collected. These observations can be studied one by one or in groups. In our case, the N observations are partitioned into m sequences or windows of length L. To each sequence or time window, a functional f(·) is applied. As our purpose is the study of network traffic and the randomness of the features, we employ the entropy as f(·), which maps a set of values of a sequence of R^p into a point in R^p. Let X_j ∈ R^p, j = 1, 2, ..., N, be the vectors associated with the p features, and H_i, i = 1, ..., m, the entropies associated with the X_j in each sequence. Therefore, we have X_{N×p}, a matrix representing the observations, and H_{m×p}, the matrix of the entropies of the m sequences.
A row of the H_{m×p} matrix represents a point in the p-dimensional feature space, and the m points generate a cloud, which characterizes the behavior of the p variables of the phenomenon under study. The entropy values are normalized, H(X^i) ∈ [0, 1], in order to perform comparisons between the variables.

Mahalanobis Distance
The Mahalanobis distance is defined as [18]:

d²(x, µ, C) = (x − µ)ᵀ C⁻¹ (x − µ), (7)

where x ∈ R^p is the sample vector, µ ∈ R^p denotes the theoretical mean vector, and C ∈ R^{p×p} denotes the theoretical covariance matrix. An unbiased sample covariance matrix is

S = (1 / (N − 1)) Σ_{i=1..N} (x_i − x̄)(x_i − x̄)ᵀ, (8)

where the sample mean is

x̄ = (1 / N) Σ_{i=1..N} x_i. (9)

Thus, the Mahalanobis distance using Equations (8) and (9) is given by:

d²_i(x_i, S) = (x_i − x̄)ᵀ S⁻¹ (x_i − x̄). (10)

One basic assumption preceding any discussion of the distribution properties of the Mahalanobis distance is that the p-multivariate observations involved are the result of random sampling from a p-variate Gaussian population having mean vector µ and covariance matrix C. As µ and C are theoretical values, for a data set containing samples x_1, x_2, ..., x_N, S and x̄ are their respective estimates, and the distribution of d²_i(x_i, S) is given by:

d²_i(x_i, S) ∼ ((N − 1)² / N) β[α, p/2, (N−p−1)/2], (11)

where β[α, p/2, (N−p−1)/2] represents a beta distribution with a level of confidence α and parameters p/2 and (N − p − 1)/2, N is the number of samples, and p the number of variables; see [7,19].
If the data set does not follow a Gaussian distribution, a method to transform the non-Gaussian distributed data into a data set with an approximately Gaussian distribution should be employed. In this paper, the Box-Cox transformation [20] was used. This transformation is the family of power expressions y(z) = (x^z − 1)/z for z ≠ 0 and y(z) = log(x) for z = 0, where z is the transformation parameter that maximizes the log-likelihood function.
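The transformation step can be sketched as follows, assuming SciPy's `stats.boxcox`, which estimates z by maximizing the log-likelihood as described above. The synthetic lognormal sample is illustrative, not one of the paper's traces.

```python
import numpy as np
from scipy import stats

# Box-Cox requires strictly positive data. scipy estimates z (the lambda
# parameter) by maximizing the log-likelihood, as described in the text.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.5, size=2000)   # skewed, non-Gaussian

y, z_hat = stats.boxcox(x)   # y = (x**z - 1)/z for z != 0, log(x) for z == 0
print("estimated z:", z_hat)

# The transformed sample should be much closer to Gaussian:
print("skewness before:", stats.skew(x), "after:", stats.skew(y))
```

For a lognormal sample, the estimated z lands near 0, i.e., close to the pure logarithmic branch of the transformation.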

One Class Support Vector Machine and Mahalanobis Kernel
OC-SVM maps input data x_1, ..., x_N ∈ A (a certain set) into a high-dimensional space F (via a kernel k(x, y)) and finds the maximal-margin hyperplane which best separates the training data from the origin (see Figure 3). The theoretical fundamentals of SVM and OC-SVM were established in [21-24]. In order to separate the data from the origin, the following quadratic program must be solved [21]:

min_{w∈F, b∈R, ξ∈R^N} (1/2)‖w‖² + (1/(νN)) Σ_{i=1..N} ξ_i − b
subject to (w · ϕ(x_i)) ≥ b − ξ_i, ξ_i ≥ 0, (12)

where w is the normal vector, ϕ is a map function A → F, b is the bias, ξ_i are nonzero slack variables, ν is the outlier control parameter, and k(x, y) = (ϕ(x), ϕ(y)). Moreover, the decision function is given by f(x) = sgn((w · ϕ(x)) − b). By applying the kernel function and the Lagrange multipliers α_i to the original quadratic program, the solution of Equation (12) yields the decision function:

f(x) = sgn( Σ_i α_i k(x_i, x) − b ), (13)

where the sum runs over the support vectors. In this work, we used the Mahalanobis kernel (MK), which is defined as:

k(x, y) = exp( −(x − y)ᵀ C (x − y) ), (14)

where C is a positive definite matrix. The Mahalanobis kernel is an extension of the Radial Basis Function (RBF) kernel. Namely, by setting C = ηI, where η > 0 is a parameter for decision boundary control and I is the unit matrix, we obtain the RBF kernel:

k(x, y) = exp( −η ‖x − y‖² ). (15)

The Mahalanobis kernel approximation [25] used in this work is:

k(x, y) = exp( −(1/p) (x − y)ᵀ S⁻¹ (x − y) ), (16)

where p is the number of variables and S is defined by Equation (8).
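A minimal sketch of OC-SVM training with the Mahalanobis kernel, assuming scikit-learn's `OneClassSVM` with a precomputed Gram matrix rather than the LIBSVM/SMO setup used in the paper. The kernel follows the form k(x, y) = exp(−(1/p)(x−y)ᵀS⁻¹(x−y)), and the data are synthetic stand-ins for an entropy matrix.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def mahalanobis_kernel(X, Y, S_inv, p):
    """Gram matrix of k(x, y) = exp(-(1/p) (x - y)^T S^-1 (x - y))."""
    diff = X[:, None, :] - Y[None, :, :]                  # shape (n, m, p)
    d2 = np.einsum('ijk,kl,ijl->ij', diff, S_inv, diff)   # squared MD
    return np.exp(-d2 / p)

# Synthetic "entropy points" standing in for an H matrix of normal traffic.
rng = np.random.default_rng(1)
H_train = rng.multivariate_normal([0.7, 0.8],
                                  [[0.010, 0.004],
                                   [0.004, 0.020]], size=300)
S_inv = np.linalg.inv(np.cov(H_train, rowvar=False))
p = H_train.shape[1]

oc = OneClassSVM(kernel="precomputed", nu=0.05)   # nu controls outliers
oc.fit(mahalanobis_kernel(H_train, H_train, S_inv, p))

# Five training points plus one far-away outlier.
H_test = np.vstack([H_train[:5], [[0.1, 0.1]]])
labels = oc.predict(mahalanobis_kernel(H_test, H_train, S_inv, p))
print(labels)   # +1 = inside the learned "normal" region, -1 = anomaly
```

The precomputed-kernel route sidesteps the fact that most SVM libraries only ship the RBF kernel; any positive definite matrix can be plugged into `mahalanobis_kernel`.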

Problem Statement
Let Ω be an Internet traffic data trace, called here an Ω-trace, and p the number of random variables X_i representing the traffic features. It is known that the temporal behavior of these variables in the case of "normal" traffic differs from their behavior when there are attacks. In order to characterize these behaviors, entropy can be used; then, instead of studying the traffic features directly, their temporal entropy behaviors H_i(t) are studied. We have the following cases:
• if the Ω-trace was obtained during "normal" network traffic and outlier exclusion was performed, it is called a β-trace;
• if the Ω-trace was obtained during a period containing "normal" traffic plus one or more attacks, it is called a ψ-trace.
The main problem is to find a region, R_N or R_A, in the feature space R^p characterizing the temporal behavior of the entropy of the p intrinsic variables associated with a class determined by the traces, i.e.,
• if the Ω-trace is a β-trace, then a region R_N ("normal" traffic) can be constructed, and it will serve to detect anomalies;
• if the Ω-trace is a ψ-trace, then a region R_A (abnormal traffic) can be constructed, and it will serve to classify the anomalies of this class.
Our approach to defining the "normal" R N or abnormal regions R A in the feature space uses Mahalanobis distance to construct regular regions (i.e., hyper-ellipsoids) and OC-SVM for non-regular regions.
Figure 4 shows the general architecture of the proposed method, which is composed of three parts: training, detection, and classification. Feature extraction, windowing, entropy calculation, and the Box-Cox transformation (for non-Gaussian data) are performed in the training and detection stages. In the training stage, the different regions in the feature space are defined and the decision functions are obtained. In the detection stage, the "normal" regions R_N and the decision functions are used to detect anomalies in the current traffic. Finally, the anomaly is classified through the defined regions R_A of known classes.

Training Stage
An Ω-trace is divided into m non-overlapping slots of L packets each. Next, normalized entropy estimates by means of Equations (1)-(3) of each of the p variables for every j-slot of size L are obtained, using the relative frequencies p̂_i = n_i / L, where n_i is the number of times that the i-th element appears in the j-slot. Then, the matrix H ∈ R^{m×p} is built as follows:

H = [ H(X¹_1) H(X²_1) ... H(X^p_1)
      H(X¹_2) H(X²_2) ... H(X^p_2)
      ...
      H(X¹_m) H(X²_m) ... H(X^p_m) ],

where H(X^p_j) represents the normalized entropy estimate of the p-th variable in the j-th slot obtained from the Ω-trace. The H matrices are the inputs of the algorithms for constructing the regions.
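The construction of the H matrix above can be sketched as follows. The `entropy_matrix` helper and the synthetic categorical features are ours; the Tsallis normalization divides by the uniform-distribution maximum for a slot of L packets, as in Equation (6).

```python
import numpy as np

def tsallis(p, q):
    """Tsallis entropy of a probability vector, Equation (3)."""
    p = p[p > 0]
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def entropy_matrix(X, L, q=0.01):
    """Build H (m x p): normalized Tsallis entropy of each feature column
    over non-overlapping slots of L packets. X is (N, p), one row per
    packet, with categorical feature values (IP addresses, ports, ...)."""
    N, p = X.shape
    m = N // L
    H_max = (L ** (1.0 - q) - 1.0) / (1.0 - q)   # uniform-slot maximum
    H = np.empty((m, p))
    for j in range(m):
        slot = X[j * L:(j + 1) * L]
        for i in range(p):
            _, n_i = np.unique(slot[:, i], return_counts=True)
            H[j, i] = tsallis(n_i / L, q) / H_max   # frequencies n_i / L
    return H

# Synthetic "traffic": 3200 packets, 4 categorical features.
rng = np.random.default_rng(2)
X = rng.integers(0, 50, size=(3200, 4))
H = entropy_matrix(X, L=32)
print(H.shape)   # (100, 4); every entry lies in [0, 1]
```

Each row of the returned H is one point of the entropy cloud in R^p described in Section 3.2.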
Algorithm for constructing regions based on the Mahalanobis distance (MD) method
1. Verify that the columns of the H matrix follow a Gaussian distribution. If the data are non-Gaussian, a transformation is performed so that the new data approximately follow a distribution of this type. In this paper, the Box-Cox transformation was employed.
2. Perform the exclusion of outliers from the H matrix. The LT limit for the Mahalanobis distance is calculated through Equation (11).
3. Calculate the mean vector x̄ = (x̄_1, x̄_2, ..., x̄_p), where the i-th element is the mean of the i-th column of the H matrix; see Equation (9).
4. Calculate the covariance matrix S of the H matrix. As the S matrix is positive definite and Hermitian, all its eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_p are real and positive, and its eigenvectors γ_1, γ_2, ..., γ_p form a set of orthogonal basis vectors that span the p-dimensional vector space.
5. Solve the matrix equation Sγ = λγ with a suitable algorithm in order to obtain the eigenvalues λ_i and eigenvectors γ_i of S.
6. Finally, define a hyper-ellipsoidal region obtained from the H matrix by means of {LT, x̄, γ, λ}.
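Steps 2-6 of the MD algorithm can be sketched as follows, with the LT limit computed from the scaled beta distribution of Equation (11). `md_region` is an illustrative name; SciPy supplies the beta quantile, and the Gaussian sample stands in for a transformed H matrix.

```python
import numpy as np
from scipy import stats

def md_region(H, alpha=0.01):
    """Return {LT, xbar, gamma, lam} describing the hyper-ellipsoidal
    region of the entropy matrix H (m x p), steps 2-6 of the algorithm."""
    m, p = H.shape
    xbar = H.mean(axis=0)                        # Equation (9)
    S = np.cov(H, rowvar=False)                  # Equation (8)
    # LT: limit of the squared Mahalanobis distance via the scaled beta
    # distribution of Equation (11), at confidence level alpha.
    b = stats.beta.ppf(1 - alpha, p / 2, (m - p - 1) / 2)
    LT = (m - 1) ** 2 / m * b
    lam, gamma = np.linalg.eigh(S)               # solve S gamma = lam gamma
    order = np.argsort(lam)[::-1]                # lam_1 >= ... >= lam_p
    return LT, xbar, gamma[:, order], lam[order]

rng = np.random.default_rng(3)
H = rng.multivariate_normal([0.6, 0.7, 0.8, 0.75], 0.01 * np.eye(4), size=500)
LT, xbar, gamma, lam = md_region(H)

# Detection rule of the MD region: d^2(h) <= LT -> "normal" (Equation (10)).
S_inv = np.linalg.inv(np.cov(H, rowvar=False))
d2 = ((H - xbar) @ S_inv * (H - xbar)).sum(axis=1)
print("fraction inside region:", (d2 <= LT).mean())
```

With α = 0.01, roughly 99% of the training points fall inside the resulting ellipsoid, matching the interpretation of α as a confidence level.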
Algorithm for constructing regions based on the OC-SVM method
1. Verify that the columns of the H matrix follow a Gaussian distribution. If the data are non-Gaussian, a transformation is performed so that the new data approximately follow a distribution of this type.
2. Perform the exclusion of outliers from the H matrix. The LT limit for the Mahalanobis distance is calculated through Equation (11).
3. Solve Equation (12) via the Sequential Minimal Optimization (SMO) algorithm [26], using two different kernel functions: RBF and MK. Considering the H matrix as input data, the entropy support vectors x_i and the constants α_i and b are obtained.
The algorithm for constructing regions based on the MD or OC-SVM allows regions R N to be defined if the trace contains "normal" traffic, or regions R A if the trace contains abnormal traffic.

Detection Stage
1. From the current traffic, a j-slot of size L packets is captured, the p features or variables associated with each packet are extracted, and their entropies are estimated. With these values, the input vector h_j is built as follows:

h_j = ( H(X¹_j), H(X²_j), ..., H(X^p_j) ). (17)

2. The decision function for the MD region is given by Equation (10). If d²_j(h_j) ≤ LT, then the j-slot is considered "normal"; otherwise, it is an anomaly.

3. The decision function for OC-SVM is expressed by Equation (13). If the decision function maps h_j to +1, then h_j is considered "normal"; otherwise, it is an anomaly.

Anomaly Classification Stage
If h_j (Equation (17)) is outside the "normal" region, i.e., h_j ∉ R_N, but h_j ∈ R_A, then the behavior is abnormal and the vector will be classified. Here, h_j is evaluated with all the decision functions defined in the training stage.
If h_j is outside all the defined regions, or h_j is located in two or more regions, then the classification is refined through a criterion based on the k-temporal nearest neighbors algorithm in order to decide whether or not the point belongs to a specific class.
The principle of the k-temporal nearest neighbors algorithm is that, given h_j and its k temporal successors h_r, r = j+1, j+2, ..., j+k, h_j is classified by majority vote among these k temporal nearest neighbors. If h_j is outside all the defined regions and its k temporal successors are as well, then a new, previously uncharacterized attack class may have been found. Figure 5 shows an example of the algorithm considering two regions and k = 2 temporal nearest neighbors. In Figure 5a, point h_j is classified in the R_A region, while in Figure 5b, point h_j is classified in the R_B region.
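The majority vote over the k temporal successors can be sketched in a few lines; the label encoding and the helper name are illustrative, not from the paper.

```python
from collections import Counter

def k_temporal_vote(labels, j, k):
    """Classify slot j by majority vote over its k temporal successors
    j+1, ..., j+k, used when slot j is outside all regions or inside
    several of them."""
    return Counter(labels[j + 1: j + 1 + k]).most_common(1)[0][0]

# Per-slot labels from the decision functions: 'N' = normal region,
# 'A' = a known attack class, '?' = outside every defined region.
labels = ['N', 'N', '?', 'A', 'A', 'A', 'A', 'N']
print(k_temporal_vote(labels, j=2, k=3))   # 'A': the ambiguous slot joins class A
```

The vote resolves the transient slots at the boundary between "normal" traffic and an attack, at the cost of delaying the decision by k slots.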

Our Data Sets
We evaluated our approach by analyzing its performance on two different experimental databases. The first is from an Academic LAN [27] and is composed of traffic data traces collected over seven days. One trace contains "normal" traffic (β_1), and four traces are formed of "normal" traffic plus traffic generated by four real attacks: a port scan (ψ_1) and three worms: Blaster (ψ_2), Sasser (ψ_3), and Welchia (ψ_4). The second is a subset of the 1998 MIT-DARPA set [28] (a public benchmark for testing NIDS) and is composed of one training trace (β_2), collected over five days of "normal" behavior of the network, and four traces containing the traffic generated by Smurf (ψ_5), Neptune (ψ_6), Pod (ψ_7), and portsweep (ψ_8) attacks.
The β_1-trace is composed of "normal" traffic captured over six days. In the training stage, only one day's traffic is used, and the rest is used for testing. A similar procedure is employed for the MIT-DARPA β_2-trace. In the case of the anomalous traces, a portion of each ψ-trace was used for training, and the complete traces were employed for testing.

Traffic Features
According to Section 3.2, the selected features are extracted from the header of each network traffic packet and represented as random variables X_r, r = 1, ..., p. For our experiments, where attacks generate deviations from the typical behavior of IP addresses and ports, four random variables were selected: X_1 source IP address, X_2 destination IP address, X_3 source port, and X_4 destination port; the temporal behavior of these features via their entropies h_{X^p} was studied for normal and abnormal traffic.
An Ω-trace is divided into m non-overlapping slots of L packets each. For each i-slot, the normalized entropy of each of the p variables, H(X^p_i), was obtained, and the entropy vectors h_{X^p} = (H(X^p_1), H(X^p_2), ..., H(X^p_m)) were constructed. Then, the matrices H_Ip, H_Pt, H_IpSPt, H_IpDPt, and H_IpPt were formed as inputs of the algorithms. For the estimation of the generalized entropies, the selected q-values are {0.01, 0.5, 1.5, 2, 10}.

The Classifier Metrics
The classifier is a mapping from instances to predicted classes; e.g., in two-class classification problems, each instance (an entropy point in our case) is mapped to one element of the set {+1, −1} of positive and negative class labels [29]. Given a classifier and an instance, there are four possible outcomes: TN is the number of correct predictions that an instance is negative, FP is the number of incorrect predictions that an instance is positive, FN is the number of incorrect predictions that an instance is negative, and TP is the number of correct predictions that an instance is positive. With these entries, the following statistics are computed [30]:
• The accuracy (AC) is the proportion of the total number of predictions that were correct: AC = (TN + TP) / (TN + FP + FN + TP).
• The sensitivity, detection rate, or true positive rate (TPR) is the proportion of positive cases that were correctly identified: TPR = TP / (FN + TP).
• The specificity or true negative rate (TNR) is the proportion of negative cases that were classified correctly: TNR = TN / (TN + FP).
• The false negative rate (FNR) is the proportion of positive cases that were incorrectly classified as negative: FNR = FN / (FN + TP).
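The four rates follow directly from the outcome counts; the counts below are illustrative, not the paper's results.

```python
def classifier_rates(tp, tn, fp, fn):
    """AC, TPR, TNR, and FNR from the four outcome counts defined above."""
    return {
        "AC":  (tn + tp) / (tn + fp + fn + tp),
        "TPR": tp / (fn + tp),
        "TNR": tn / (tn + fp),
        "FNR": fn / (fn + tp),
    }

# Illustrative counts:
r = classifier_rates(tp=995, tn=990, fp=10, fn=5)
print(r)   # AC = 0.9925, TPR = 0.995, TNR = 0.99, FNR = 0.005
```

Note that TPR + FNR = 1 by construction, since both share the denominator FN + TP.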

Detection of Anomalies in Network Traffic
As noted above, the anomaly-free traces were divided into m non-overlapping slots of size L (in our case, L = 32 packets). This size was chosen according to the shortest attacks contained in the test traces (around 30 packets), ensuring at least one slot with malicious traffic.
For the input matrices H_Ip, H_Pt, H_IpSPt, H_IpDPt, and H_IpPt, ellipsoids were found through the Mahalanobis distance, and non-regular regions were found through OC-SVM with the Radial Basis Function (RBF) and Mahalanobis (MK) kernels. The performance of OC-SVM was evaluated for different combinations of the parameters η and ν (see Equations (12), (14), and (15)) in a k-fold cross-validation process with k = 5. For the implementation of OC-SVM, the LIBSVM library [31] was used.
Table 1. True positive and negative rates using Tsallis entropy with q = 0.01 for different input matrices.

The regions found are used to detect anomalies in network traffic. Therefore, traces containing traffic generated by different anomalies were used. Each test trace was divided into slots of size L, and the entropy estimates for each selected variable were obtained. For each i-slot, the Mahalanobis distance was computed by Equation (10). Likewise, each i-slot was analyzed with the OC-SVM decision function, Equation (13), and thus it was determined whether or not it belongs to the non-regular region.
Results for anomaly detection on the LAN and MIT-DARPA traces using the Tsallis entropy of the features with q = 0.01, by means of the ellipsoidal (MD) and non-regular (OC-SVM) regions, are displayed in Table 1. Additionally, the values of α, η, and ν (see Equations (11), (12), (14), and (15)) are shown. The true negative rate for the ψ_6 attack is either 0 or 100, as it is contained in only one slot.

Classification of Worm Attacks
Each ψ-trace was divided into m non-overlapping slots of size L. For each i-slot, i = 1, ..., m, the entropy estimate H(X^r_i) of each of the four selected variables was obtained. Next, the H_Ip, H_Pt, H_IpSPt, H_IpDPt, and H_IpPt matrices were formed. With these matrices, the regions using the Mahalanobis distance and OC-SVM with the RBF and MK kernels were defined. Figure 6 shows the ellipses and non-regular regions defined in the feature space of IP addresses, R², for each anomalous trace from the LAN and MIT-DARPA sets. Table 2 shows the selected values of the OC-SVM parameters for the construction of the non-regular regions.
We assume that every entropy point outside the normal region is an anomaly; however, not every anomaly belongs to a specific attack class. If a point is an anomaly but the majority of its temporal neighbors are normal, then it is considered normal as well. If a point is an anomaly and the majority of its temporal neighbors belong to a specific anomaly class, then it belongs to that class. Therefore, results were obtained using the k-temporal nearest neighbors algorithm, as in [6].
Table 2. Parameters of OC-SVM for classification of the LAN and MIT-DARPA traces with Tsallis entropy, q = 0.01.

Figure 7 shows the impact of the k-value of the k-temporal nearest neighbors algorithm on classification for the LAN traces, using the Tsallis entropy of the IP and port variables with q = 0.01. The TPR values are the results of the classifiers trained with β-traces, and the TNR values are the results of the classifiers trained with ψ-traces.
Figure 7. Impact of the k-value of the k-temporal nearest neighbors algorithm on the classification.

Discussion of the Experimental Results
Our approach (see Figure 4), based on mathematical tools such as the Mahalanobis distance, the covariance matrix, OC-SVM, and the k-temporal nearest neighbors algorithm, allows the construction of different regions (regular and non-regular), which encompass the behaviors of the four selected features. These regions allow:
• the classification of an entropy vector as normal or abnormal, and
• the classification of an abnormal entropy vector in terms of known attacks.
The effect of the number of features (input matrices) on the true positive and negative rates is shown in Table 1. Although in general more variables mean better results, a particular case occurred in trace ψ_3, where the use of three variables was better than four.
For the anomalous ψ-traces, experimental results show that the true negative rate for q < 1 is higher than for q > 1. Figure 8 shows the behavior of the true negative rate using four variables for different q-values of the Tsallis entropy, using OC-SVM with the RBF kernel.
The runtime of the OC-SVM decision function (see Equation (13)) is determined by the number of support vectors x_i. In this regard, the Mahalanobis kernel yields a smaller number of support vectors than the RBF kernel on the MIT-DARPA traces. For the LAN traces, the kernel that uses fewer support vectors is the RBF.
When a sequence of anomalies occurs in network traffic, the entropy values begin to move away from the "normal" region toward a new region. This transient state affects classification when few neighbors (k ≤ 2) of the k-temporal nearest neighbors algorithm are selected. Choosing a larger k-value mitigates the effect of this transient, and therefore the classification rate stabilizes. Table 3 shows that when the number of neighbors is increased, the classification accuracy on the LAN network increases as well. Using the k-temporal nearest neighbors method, classification is improved; however, classification is performed k slots later. Experimental results showed that for values of k between 3 and 5, the classification accuracy reaches a steady state and the delay time is not significant.

Table 3. Accuracy of the classification of LAN and MIT-DARPA traces vs. different k-values of the k-temporal nearest neighbors algorithm, using q = 0.01 for the LAN traces and q = 0.5 for the MIT-DARPA traces for the generalized entropies.

Considering packet sizes of 60 bytes in a 100 Mb/s network, the time required to capture a slot of 32 packets is (32 × 60 × 8) / 100 Mb/s = 153.6 µs. Using a PC with an Intel Core i7 at 3.4 GHz and 16 GB of RAM, a C implementation of the proposed method using MD, including the decision function, required computation times of no more than 5 µs. Therefore, the proposed method can be implemented in real time.
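The slot-capture time quoted above follows directly from the link rate and packet size:

```python
# Slot-capture time for L = 32 minimum-size (60-byte) packets on a fully
# loaded 100 Mb/s link, reproducing the arithmetic in the text.
L, packet_bytes, link_bps = 32, 60, 100e6
slot_time = L * packet_bytes * 8 / link_bps    # seconds
print(slot_time * 1e6, "microseconds")         # 153.6
```

Since the per-slot decision takes about 5 µs, the detector spends only a small fraction of the 153.6 µs inter-slot budget, which supports the real-time claim.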

Conclusions
In this paper, an approach was proposed for detecting and classifying Internet traffic anomalies using the entropy of selected features, the Mahalanobis distance, and OC-SVM with two kernels: RBF and Mahalanobis. Regular and non-regular regions were built with "normal" traffic from training data. For the detection of an anomaly, computation times on the order of a few µs were obtained; consequently, these results are very significant for real-time implementations.
In the detection stage, the highest true positive and negative rates over all traces (99.35% for "normal" traffic and up to 99.83% for anomalous traffic) were obtained using the generalized entropies (particularly the Tsallis entropy) with q = 0.01 and OC-SVM with the RBF kernel. However, the choice of an optimal q is not addressed in this work.
In the classification stage:
1. For the Academic LAN traces, the highest accuracy (99.30%) was obtained using the Tsallis entropy with q = 0.01, OC-SVM with the Mahalanobis kernel, and k = 5 for the k-temporal nearest neighbors algorithm.
2. For the MIT-DARPA traces, the highest accuracy (99.99%) was obtained using the MD method, the Rényi entropy with q = 0.5, and k ≥ 1 for the k-temporal nearest neighbors algorithm.

Open Issues
For different networks, the larger the slot size, the more the entropy behaviors differ. In the near future, this behavior should be studied, including additional and more recent traces, in order to determine whether a model learned from one network can be used in a different network.
In order to enhance our proposed approach, other classification techniques, such as multi-class SVM, should be studied.

Figure 1. Different regions based on different methods and metrics.

Figure 3. Illustration of One Class Support Vector Machine idea.

Figure 4. General architecture of the proposed method.

Figure 5. Use of the k-temporal nearest neighbors algorithm in the classification stage. (a) h_j is outside all the defined regions; (b) h_j belongs to two or more regions.

Figure 8. True negative rate for different values of q parameter of Tsallis entropy using OC-SVM with RBF kernel.