Error-Robust Distributed Denial of Service Attack Detection Based on an Average Common Feature Extraction Technique

In recent years, advanced threats against Cyber-Physical Systems (CPSs), such as Distributed Denial of Service (DDoS) attacks, have been increasing. Furthermore, traditional machine learning-based intrusion detection systems (IDSs) often fail to detect such attacks efficiently when corrupted datasets are used for IDS training. To address these challenges, this paper proposes a novel error-robust multidimensional technique for DDoS attack detection. By applying the well-known Higher-Order Singular Value Decomposition (HOSVD), the average value of the common features among instances is first filtered out from the dataset. Next, the filtered data are forwarded to machine learning classification algorithms, in which traffic is classified as legitimate or as a DDoS attack. In terms of results, the proposed scheme outperforms traditional low-rank approximation techniques, presenting an accuracy of 98.94%, a detection rate of 97.70% and a false alarm rate of 4.35% for a dataset corruption level of 30%, with a random forest algorithm applied for classification. In addition, under error-free conditions, the proposed approach outperforms other related works, showing an accuracy, detection rate and false alarm rate of 99.87%, 99.86% and 0.16%, respectively, for the gradient boosting classifier.


Introduction
Cyber-Physical Systems (CPSs) consist of a set of networked components including sensors, control processing units and communication devices applied to the monitoring and management of physical infrastructures [1]. CPSs are typically used for safety-critical applications, such as in avionics, instrumentation, defense systems and critical infrastructure control, for instance, electric power, water resources and communications systems [2]. Consequently, potential cyber and physical attacks can lead to information leakage, extensive economic damage and critical infrastructure destruction [3].
A CPS architecture is typically composed of five layers, namely, physical layer, sensor/actuator layer, network layer, control layer, and information layer. The physical layer consists of the physical objects or processes monitored by CPSs. In addition, the sensor/actuator layer is composed of sensors, which measure data obtained from the physical layer, and by actuators, which execute specific

• The proposal of a novel technique in which the average value of the common features among instances is filtered out from the dataset by applying the HOSVD low-rank approximation scheme, improving the performance of the intrusion detection system.
• The comparison with different state-of-the-art low-rank approximation techniques in order to show the higher performance and error-robustness of the proposed approach.
The remainder of this paper is organized as follows. Section 2 presents the related works. Section 3 introduces the data model. In Section 4, the theoretical background is introduced. Section 5 shows the proposed tensor-based scheme for DDoS attack detection in CPSs. In Section 6, simulation results are presented and discussed. Section 7 draws the conclusions.

Related Works
In this section, the related works are presented and discussed. Since the proposed scheme is based on multidimensional signal processing techniques applied to DDoS attack detection, we discuss papers related to multilinear algebra and distributed denial of service detection systems. In [10,11], the authors presented multidimensional solutions for image classification. However, whereas the former proposed common and individual feature extraction techniques based on the LL1 decomposition, the latter applied the HOSVD algorithm for classifying corrupted images. In addition, Lathauwer et al. [12] proposed the classical HOOI low-rank approximation technique, widely applied for tensor denoising. In [5], the authors proposed a signal processing-based approach in which model order selection and eigen similarity analysis are applied for detecting and identifying the time instants and ports exploited by attackers. Finally, specifically regarding DDoS attack detection, three studies can be cited. Hosseini and Azizi [13] proposed a hybrid framework based on a data stream approach for DDoS attack detection, where the computational load is divided between the client and proxy sides. Next, Lima Filho et al. [14] proposed a random forest-based DDoS detection system in which several volumetric attacks, such as Transmission Control Protocol (TCP) flood, User Datagram Protocol (UDP) flood and Hyper Text Transfer Protocol (HTTP) flood, are identified early. Finally, Wang et al. [6] proposed a method for detecting DDoS attacks in which the optimal features are obtained by combining feature selection and a multilayer perceptron (MLP) classification algorithm. Further, when considerable detection errors are dynamically perceived, a feedback mechanism reconstructs the IDS.
In Table 1, we summarize the general aspects of the above-mentioned related works, highlighting their aims, proposed solutions, pros and cons.

Works related to multilinear algebra:
• Kisil et al. [10]. Aim: image classification. Proposed solution: common and individual feature extraction technique based on the LL1 tensor decomposition. Pros: flexible; not restricted to images of the same dimensions. Cons: corrupted datasets are not considered.
• [11]. Aim: classification of corrupted images. Proposed solution: patch-based ML technique for image denoising by applying HOSVD. Pros: outstanding performance on grayscale and color images.
• Lathauwer et al. [12]. Aim: tensor denoising. Proposed solution: HOOI low-rank approximation technique. Pros: outperforms HOSVD in the estimation of singular matrices and core tensor.

Works related to DDoS attack detection:
• Vieira et al. [5]. Aim: detection and identification of network attacks, including DDoS. Proposed solution: framework for detecting and identifying network attacks using model order selection, eigenvalues and similarity analysis. Pros: outstanding accuracy for timely detection and identification of TCP and UDP ports under attack. Cons: corrupted datasets are not considered; not based on ML techniques.
• Hosseini and Azizi [13]. Aim: DDoS attack detection. Proposed solution: hybrid framework based on a data stream approach for detecting DDoS attacks. Pros: computational process divided between client and proxy. Cons: corrupted datasets are not considered.
• Lima Filho et al. [14]. Aim: early detection of volumetric DDoS attacks. Proposed solution: RF-based DDoS detection system for early identification of TCP flood, UDP flood and HTTP flood. Pros: early identification of volumetric attacks; packet inspection is not required. Cons: corrupted datasets are not considered.
• Wang et al. [6]. Aim: DDoS attack detection. Proposed solution: feature selection combined with MLP; feedback mechanism to reconstruct the IDS according to detection errors. Pros: feedback mechanism perceives errors based on recent detection results. Cons: global optimal features are not necessarily found; corrupted datasets are not considered.

Data Model
This section presents the data model adopted in this paper and is divided into two subsections. First, Section 3.1 shows the mathematical notation used throughout this paper. Next, a brief description of the data modeling is presented in Section 3.2.

Mathematical Notation
In this subsection, we present the mathematical notation used throughout this paper. Italic letters (a, b, c, A, B, C) represent scalars, lowercase bold letters (a, b, c) represent column vectors and uppercase bold letters (A, B, C) represent matrices. Higher-order tensors are denoted by uppercase calligraphic letters (𝒜, ℬ, 𝒞). The concatenation of the tensors 𝒜 and ℬ along the r-th dimension is defined as [𝒜 | ℬ]_r. Transposition and Hermitian transposition of a matrix are represented by the superscripts (·)^T and (·)^H, respectively. The operator diag(·) transforms its argument vector into the main diagonal of a diagonal matrix. The Hadamard (element-wise) product is represented by the operator ⊙.

Furthermore, the r-th mode unfolding of the tensor 𝒳 is denoted as [𝒳]_(r), which is obtained by varying the r-th index along the rows and stacking all other indices along the columns. Additionally, 𝒴 = 𝒳 ×_r B denotes the r-mode product between the tensor 𝒳 and the matrix B. In matricized form, such a product can be expressed as [𝒴]_(r) = B[𝒳]_(r).
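As an illustration of this notation, the matricized identity [𝒴]_(r) = B[𝒳]_(r) can be checked numerically with a short NumPy sketch. The unfolding column ordering below is one consistent convention (orderings differ across the literature); the identity holds for any consistent choice.

```python
import numpy as np

# Mode-r unfolding: the r-th index varies along the rows, all remaining
# indices are stacked along the columns (one consistent ordering).
def unfold(X, r):
    return np.moveaxis(X, r, 0).reshape(X.shape[r], -1)

# Inverse of unfold for a known target shape.
def fold(M, r, shape):
    rest = [s for i, s in enumerate(shape) if i != r]
    return np.moveaxis(M.reshape([shape[r]] + rest), 0, r)

# r-mode product Y = X x_r B via the identity [Y]_(r) = B [X]_(r).
def mode_product(X, B, r):
    shape = list(X.shape)
    shape[r] = B.shape[0]
    return fold(B @ unfold(X, r), r, tuple(shape))
```

For a random 3 × 4 × 5 tensor and a 6 × 4 matrix applied along the second mode, the result matches the equivalent `einsum` contraction.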

Data Modeling
In this paper, the dataset matrix X ∈ R^(M×N) is modeled in the following fashion:

X = X_0 + N, (1)

where X_0 ∈ R^(M×N) is the error-free dataset matrix, N ∈ R^(M×N) is the error matrix, M is the number of instances and N is the number of features. The matrix N represents generalized perturbations added to X_0, for instance, false data injection attacks, which are commonly used to fool machine learning classifiers. The m-th instance and the n-th feature are, respectively, given by X_m,: for m = 1, . . . , M and X_:,n for n = 1, . . . , N. The class label vector is denoted by y = [y_1, . . . , y_M]^T ∈ R^M, where y_m indicates whether the m-th instance X_m,: for m = 1, . . . , M is legitimate traffic or a DDoS attack. Furthermore, we can rewrite the dataset matrix X in (1) in tensor form. Initially, each instance X_m,: ∈ R^N for m = 1, . . . , M is reshaped as a tensor with dimensions N_1 × ··· × N_R, such that N = ∏_{r=1}^R N_r. Then, the M tensors are stacked along the (R+1)-th dimension, generating the dataset tensor 𝒳 ∈ R^(N_1×···×N_R×M), denoted as:

𝒳 = 𝒳_0 + 𝒩, (2)

where 𝒳_0 ∈ R^(N_1×···×N_R×M) is the error-free dataset tensor and 𝒩 ∈ R^(N_1×···×N_R×M) is the error tensor. The r-th mode unfolding matrix of 𝒳 is given by [𝒳]_(r) ∈ R^(N_r × M∏_{j≠r} N_j). Note that the dataset matrix X ∈ R^(M×N) in (1) corresponds to the (R+1)-th mode unfolding matrix [𝒳]_(R+1) ∈ R^(M × ∏_{r=1}^R N_r).
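The data model above can be sketched in a few lines of NumPy. The sizes (M = 100, N_1 = N_2 = 8) are hypothetical, and a row-major reshape is assumed for the instance tensorization.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N1, N2 = 100, 8, 8          # instances; per-instance tensor dimensions
N = N1 * N2                    # number of features (N = N1 * N2)

X0 = rng.random((M, N))                  # error-free dataset matrix X_0
E = 0.1 * rng.standard_normal((M, N))    # error (perturbation) matrix
X = X0 + E                               # dataset model of Equation (1)

# Reshape every instance to N1 x N2 and stack the M instances along the
# third dimension, producing the dataset tensor of shape (N1, N2, M).
X_tensor = np.stack([X[m].reshape(N1, N2) for m in range(M)], axis=2)

# The (R+1)-th mode unfolding recovers the original dataset matrix.
X_unfolded = np.moveaxis(X_tensor, 2, 0).reshape(M, N)
```

Unfolding the tensor along its last mode gives back the M × N matrix, matching the correspondence stated above.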

Theoretical Background
This section presents the theoretical background and is divided into two subsections. First, Section 4.1 introduces the taxonomy of DDoS attacks. Next, Section 4.2 details the DDoS attack datasets adopted in this paper.

Taxonomy of DDoS Attacks
Distributed Denial of Service attacks are one of the most important security threats nowadays. In a DDoS attack, a large volume of traffic is sent through the network, exhausting the network resources, as well as the overall bandwidth and individual node resources [15]. Consequently, the victim is forced to slow down, crash or shut down due to multiple connection requests during a period of time [16].
Since networks and servers became more robust in identifying network layer DDoS attacks, hackers responded by moving up the OSI model stack to higher layers [17]. For instance, several DDoS attacks exploit vulnerabilities present in the application layer, reproducing the behavior of legitimate customers and, consequently, are not detected by most conventional IDSs [18]. In this sense, several studies in the literature currently classify DDoS attacks broadly into three types: application-layer attacks, resource exhaustion attacks, and volumetric attacks [19], which are described as follows:
• Application-Layer Attack: in this type of attack, vulnerabilities present in the application are exploited by an attacker, making it inaccessible to legitimate users [19]. Instead of depleting the network bandwidth, the server resources, such as CPU, database, socket connections or memory, are exhausted by application-layer DDoS attacks. In addition, such attacks present some subtleties which make them harder to detect and mitigate: they are performed through legitimate HTTP packets, with a low traffic volume, presenting a high resemblance to flash crowds [17]. HTTP- and Domain Name System (DNS)-based DDoS attacks are examples of application-layer attacks.

• Resource Exhaustion Attack: in this category, hardware resources of servers, such as memory, CPU and storage, are depleted. Consequently, they become unavailable for legitimate accesses. Resource exhaustion attacks are also known as protocol-based attacks, since vulnerabilities in protocols are exploited. For example, in a SYN flood attack, a hacker exploits the TCP three-way handshake process. After receiving a high volume of SYN packets, the targeted server responds with SYN/ACK packets and leaves ports open to receive the final ACK packets, which never arrive. This process continues until all ports of the server are unavailable.

• Volumetric Attack: in this type of attack, the bandwidth of the target system is exhausted by a massive amount of traffic. Since such attacks are launched by using amplification and reflection techniques, they are considered the simplest DDoS attacks to employ [18]. UDP flood and Internet Control Message Protocol (ICMP) flood can be cited as volumetric attacks.

CICDDoS2019 and CICIDS2017 Datasets
In this paper, we consider two datasets provided by the Canadian Institute for Cybersecurity (CIC) for network intrusion detection models, namely, CICDDoS2019 [20] and CICIDS2017 [21]. CICDDoS2019 is a novel benchmark dataset composed of several network traffic features, with millions of labeled legitimate and DDoS attack instances [22]. The dataset was generated on two distinct days. On 12 January 2019, the training set was captured, containing 12 different types of DDoS attacks, namely, DNS, WebDDoS, LDAP, MSSQL, NetBIOS, NTP, SNMP, SSDP, UDP, SYN, TFTP and UDP-Lag based attacks. Next, on 11 March 2019, the testing set was generated, with seven DDoS attack types, including LDAP, MSSQL, NetBIOS, UDP, SYN and UDP-Lag based attacks, plus Port Scan. All DDoS attacks were separated into different PCAP files, according to their types.
Similarly to CICDDoS2019, CICIDS2017 is a completely labeled dataset that contains legitimate traffic and the most up-to-date common network attacks. The dataset was generated over five days, from Monday, 3 July 2017, to 7 July 2017, and is publicly available in PCAP and CSV files. On Monday, only legitimate traffic was captured, whereas different types of network attacks were captured on the following days. The malicious activities include common updated attacks, for example, DDoS, Denial of Service (DoS), Brute Force, Cross-Site Scripting (XSS), SQL Injection, Infiltration, Port Scan and Botnet [23]. In particular, DDoS attacks were generated on 7 July 2017. Since we focus on DDoS attack detection, only the legitimate and DDoS attack instances present in the traces of 3 July 2017 and 7 July 2017, respectively, are used in this research.

Proposed Average Common Feature Extraction Technique for DDoS Attack Detection in Cyber-Physical Systems
This section presents the proposed average common feature extraction scheme for DDoS attack detection in CPSs. First, we introduce the concept of common and individual features of a given dataset. This concept is well known in image classification problems, in which data share some common variables while simultaneously exhibiting their own features [24]. Let us assume a tensor 𝒴 ∈ R^(I_1×I_2×S) composed of the frontal slices 𝒴_:,:,s for s = 1, . . . , S. Each frontal slice 𝒴_:,:,s is equivalent to a combination of the three base colors, namely, green, red and blue, represented by the matrices B_G ∈ R^(I_1×I_2), B_R ∈ R^(I_1×I_2) and B_B ∈ R^(I_1×I_2). Usually, the base colors are obtained through tensor decompositions, such as the LL1 decomposition:

𝒴 = B_G ∘ c_G + B_R ∘ c_R + B_B ∘ c_B,

where ∘ denotes the outer product and c_G ∈ R^S, c_R ∈ R^S and c_B ∈ R^S contain the intensity values of the green, red and blue colors, respectively [10]. Note that 𝒴 presents rank three, which corresponds to the number of base colors. Alternatively, the base colors can be stacked along the 3rd dimension, generating ℬ ∈ R^(I_1×I_2×3), whereas the vectors c_G, c_R and c_B can be grouped into the matrix C ∈ R^(S×3). The tensor ℬ, known as the common feature tensor, can also be represented as 𝒴̃, as a reference to the original dataset 𝒴.
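The color model above can be reproduced numerically. The sketch below builds a small hypothetical image tensor from three random bases (the common features) and per-slice intensities (the individual features), and checks that its slice-wise rank is indeed three.

```python
import numpy as np

rng = np.random.default_rng(1)
I1, I2, S = 4, 4, 6

# Base colors: common features shared by every slice.
B_G, B_R, B_B = (rng.random((I1, I2)) for _ in range(3))
# Per-slice intensity values: individual features.
c_G, c_R, c_B = (rng.random(S) for _ in range(3))

# Each frontal slice is a weighted combination of the three bases, so the
# stacked tensor has rank three along its slice dimension.
Y = np.stack([c_G[s] * B_G + c_R[s] * B_R + c_B[s] * B_B
              for s in range(S)], axis=2)
```

Vectorizing the S slices into the columns of an (I_1·I_2) × S matrix gives a matrix of rank three, matching the number of base colors.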
After extracting the common features, only the more discriminative individual information of each instance is used during the training phase, which improves the performance of the machine learning classifier [10]. In this sense, given its considerable results in image classification, the concept of common and individual feature extraction shows an outstanding potential for network intrusion detection when large datasets are used for ML classifier training. Hence, a similar procedure is adopted in this paper, such that the average value of the common features among dataset instances is filtered out from the data. As a consequence, the ML classifier takes advantage of the resulting filtered dataset. In order to improve readability, the mathematical symbols used throughout this section are summarized in Table 2.

Table 2. Mathematical symbols used throughout this paper.

Symbol Definition
Before applying the feature extraction technique on the dataset tensor, three steps are necessary, namely, dataset splitting, dataset pre-processing and multilinear rank estimation, which are described as follows.

• Dataset Splitting: First, the DDoS attack dataset 𝒳 ∈ R^(N_1×···×N_R×M) is split into the training and testing tensors 𝒳_tr ∈ R^(N_1×···×N_R×M_tr) and 𝒳_te ∈ R^(N_1×···×N_R×M_te), where M_tr and M_te are the number of training and testing instances, respectively, with M = M_tr + M_te.

• Dataset Pre-Processing: The training and testing datasets, 𝒳_tr and 𝒳_te, are submitted to a pre-processing step, which includes data cleansing, feature scaling and label encoding.
• Multilinear Rank Estimation: The multilinear ranks (d_1^tr, . . . , d_{R+1}^tr) and (d_1^te, . . . , d_{R+1}^te) corresponding to the tensors 𝒳_tr and 𝒳_te, respectively, are estimated. The parameters d_r^tr and d_r^te for r = 1, . . . , R+1 are estimated by using multidimensional model order selection (MOS) schemes, such as the R-D Minimum Description Length [25].
After the above-mentioned steps, 𝒳_tr is forwarded to the proposed average common feature extraction technique for DDoS attack detection, such that the training phase is initialized. Next, when the training process is finished, 𝒳_te is sent to the trained IDS for classification. For simplicity, from this point on, 𝒳 ∈ R^(N_1×···×N_R×M) can refer to either the training or the testing dataset tensor. The steps of the proposed scheme, shown in Figure 1, are discussed as follows.

Figure 1. Steps of the proposed scheme. For simplicity, we depict the filtering process of a three-dimensional dataset tensor 𝒳 ∈ R^(N_1×N_2×M).

• Step 1: Computing the HOSVD of 𝒳.
In Step 1 of Figure 1, we compute the Higher-Order Singular Value Decomposition (HOSVD) of the dataset tensor 𝒳 ∈ R^(N_1×···×N_R×M). Here, we intend to obtain the core tensor 𝒢 ∈ R^(d_1×···×d_{R+1}), as well as the first R factor matrices A_r ∈ R^(N_r×d_r) for r = 1, . . . , R, where (d_1, . . . , d_{R+1}) is the multilinear rank of 𝒳. Such tensors are used in Step 2 to compute the common feature tensor. The HOSVD of 𝒳 is given by:

𝒳 = 𝒢 ×_1 A_1 ×_2 A_2 ··· ×_{R+1} A_{R+1}.

Usually, the number of common features among the dataset instances is obtained empirically. However, a considerable performance is achieved by considering d_{R+1} as an estimate of the number of common features, as shown in the simulations of Section 6. We refer here to [25] to estimate the number of common features.
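Step 1 can be sketched with a minimal truncated-HOSVD routine built from SVDs of the mode unfoldings. This is an illustrative implementation, not the paper's code; libraries such as Tensorly offer equivalent decompositions.

```python
import numpy as np

def unfold(X, r):
    # Mode-r unfolding: r-th index along the rows, rest along the columns.
    return np.moveaxis(X, r, 0).reshape(X.shape[r], -1)

def mode_product(X, B, r):
    # Y = X x_r B, i.e. [Y]_(r) = B [X]_(r).
    return np.moveaxis(np.tensordot(B, X, axes=(1, r)), 0, r)

def hosvd(X, ranks):
    """Truncated HOSVD: each factor A_r holds the d_r dominant left singular
    vectors of the r-th mode unfolding; the core tensor is X projected onto
    the factors."""
    factors = [np.linalg.svd(unfold(X, r), full_matrices=False)[0][:, :d]
               for r, d in enumerate(ranks)]
    G = X
    for r, A in enumerate(factors):
        G = mode_product(G, A.T, r)
    return G, factors
```

When the full multilinear rank is kept, reconstructing 𝒢 ×_1 A_1 ··· ×_{R+1} A_{R+1} recovers the original tensor exactly.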

• Step 2: Computing the common feature tensor, 𝒳̃.
In Step 2 of Figure 1, we compute 𝒳̃ ∈ R^(N_1×···×N_R×d_{R+1}), which contains the common features among the dataset instances 𝒳_:,...,m ∈ R^(N_1×···×N_R) for m = 1, . . . , M. The tensor 𝒳̃ is defined as the r-mode product between the core tensor 𝒢 and the first R factor matrices [26]:

𝒳̃ = 𝒢 ×_1 A_1 ×_2 A_2 ··· ×_R A_R.

• Step 3: Computing the average common feature tensor, 𝒳̄.
In Step 3 of Figure 1, the average common feature tensor 𝒳̄ ∈ R^(N_1×···×N_R) is obtained by averaging the d_{R+1} common features of 𝒳̃ along its last dimension.
• Step 4: Obtaining the (R+1)-th mode unfolding matrix, [𝒳]_(R+1).
Following, in Step 4 of Figure 1, we obtain the (R+1)-th mode unfolding matrix of 𝒳, given by [𝒳]_(R+1) ∈ R^(M×N). In general, the r-th mode unfolding matrix [𝒳]_(r) is obtained after each element (x_1, . . . , x_{R+1}) of 𝒳 is mapped to the element (x_r, j) of [𝒳]_(r) as follows:

j = 1 + Σ_{k=1, k≠r}^{R+1} (x_k − 1) J_k, with J_k = ∏_{l=1, l≠r}^{k−1} N_l.

Such a matrix is used in Step 5 in order to compute the weights to be applied on 𝒳̄ for dataset filtering.

• Step 5: Computing the weight tensor, 𝒞.
In Step 5 of Figure 1, we compute the weight tensor 𝒞 ∈ R^(N_1×···×N_R), which is used for dataset filtering in Step 7. First, the covariance matrix R_xx ∈ R^(N×N) of the (R+1)-th mode unfolding matrix [𝒳]_(R+1) ∈ R^(M×N), as well as its eigenvalue decomposition, are obtained as follows:

R_xx = (1/M) [𝒳]_(R+1)^T [𝒳]_(R+1) = E Λ E^T,

where E ∈ R^(N×N) is the eigenvector matrix of R_xx and Λ ∈ R^(N×N) contains the eigenvalues λ_1, . . . , λ_N of R_xx on its diagonal. Such eigenvalues are sorted in descending order, so that λ_1 is the largest one.
Before subtracting the average common features from 𝒳, we have to multiply each element of 𝒳̄ by a positive number smaller than 1. This can be done by computing the Hadamard product between 𝒳̄ and the weight tensor 𝒞. The tensor 𝒞 can be obtained empirically or by some adaptive technique such that the errors between the expected and predicted classifications during the training phase of a ML classifier are minimized. In this paper, we adopt the following empirical approximation: all elements of 𝒞 are equal to the average eigenvalue λ̄ of R_xx, i.e.,

λ̄ = (1/N) Σ_{n=1}^{N} λ_n,

where λ_n for n = 1, . . . , N are the eigenvalues of R_xx.
• Step 6: Obtaining the concatenated tensors, 𝒞_C and 𝒳̄_C.
In Step 6 of Figure 1, M copies of 𝒞 are concatenated along the (R+1)-th dimension, generating the tensor 𝒞_C ∈ R^(N_1×···×N_R×M). The same procedure is adopted for 𝒳̄ in order to obtain 𝒳̄_C ∈ R^(N_1×···×N_R×M). Both computations can be expressed as:

𝒞_C = [𝒞 | 𝒞 | ··· | 𝒞]_{R+1},  𝒳̄_C = [𝒳̄ | 𝒳̄ | ··· | 𝒳̄]_{R+1}.

By doing this, we can compute the Hadamard product between 𝒞_C and 𝒳̄_C in Step 7, and then subtract the result from 𝒳 in Step 8, in a direct way.
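The covariance, eigendecomposition and average eigenvalue used in Steps 5 and 6 can be illustrated with a short NumPy sketch. The uncentered 1/M normalization of the covariance is an assumption for illustration, and the average eigenvalue can be computed cheaply as trace(R_xx)/N.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 200, 16
Xm = rng.random((M, N))          # plays the role of [X]_(R+1)

# Sample covariance of the features and its eigendecomposition
# (assumed form: R_xx = (1/M) X^T X, without centering).
R_xx = Xm.T @ Xm / M
lam, E = np.linalg.eigh(R_xx)
lam, E = lam[::-1], E[:, ::-1]   # descending order: lambda_1 is the largest

lam_bar = lam.mean()             # average eigenvalue used as the weight
```

Note that the mean of the eigenvalues equals trace(R_xx)/N, so λ̄ can be obtained without an explicit eigendecomposition.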

• Step 7: Applying the weights 𝒞_C to the tensor 𝒳̄_C.
Next, in Step 7 of Figure 1, we compute the Hadamard product between 𝒞_C and 𝒳̄_C, such that the weights computed in Step 5 are applied to each element of the average common feature tensor, i.e., the product 𝒞_C ⊙ 𝒳̄_C is obtained.
• Step 8: Computing the filtered dataset tensor, 𝒳^[f].
Then, in Step 8 of Figure 1, the filtered dataset tensor 𝒳^[f] ∈ R^(N_1×···×N_R×M) is computed as follows:

𝒳^[f] = 𝒳 − 𝒞_C ⊙ 𝒳̄_C.

• Step 9: Obtaining the (R+1)-th mode unfolding matrix, [𝒳^[f]]_(R+1).
Finally, in Step 9 of Figure 1, we obtain the (R+1)-th mode unfolding matrix of 𝒳^[f], given by [𝒳^[f]]_(R+1) ∈ R^(M×N), computed by applying the mapping described in Step 4. Such a matrix is forwarded to the ML algorithm for classification, where the predicted class label vector ŷ ∈ R^M is computed. Since decision tree, random forest and gradient boosting algorithms present considerable results in network intrusion detection problems, they are adopted in this paper for classifying the network traffic data [14].
The proposed average common feature extraction technique for DDoS attack detection in CPSs is summarized in Algorithm 1.

Algorithm 1: Proposed average common feature extraction technique for DDoS attack detection.
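The nine steps above can be sketched end-to-end in NumPy. This is a minimal illustration under assumptions noted in the comments (row-major unfolding, uncentered covariance, averaging of the common features along the last mode in Step 3), not the paper's implementation; broadcasting stands in for the explicit concatenation of Steps 6 and 7.

```python
import numpy as np

def unfold(X, r):
    return np.moveaxis(X, r, 0).reshape(X.shape[r], -1)

def mode_product(X, B, r):
    return np.moveaxis(np.tensordot(B, X, axes=(1, r)), 0, r)

def filter_dataset(X, ranks):
    """Sketch of Algorithm 1 for a dataset tensor X of shape
    (N1, ..., NR, M), with `ranks` the multilinear rank (d1, ..., d_{R+1})."""
    n_modes = X.ndim                           # R + 1
    # Step 1: truncated HOSVD (factors from SVDs of the unfoldings).
    factors = [np.linalg.svd(unfold(X, r), full_matrices=False)[0][:, :d]
               for r, d in enumerate(ranks)]
    G = X
    for r, A in enumerate(factors):
        G = mode_product(G, A.T, r)
    # Step 2: common feature tensor (core times the first R factors).
    Xc = G
    for r in range(n_modes - 1):
        Xc = mode_product(Xc, factors[r], r)
    # Step 3: average the d_{R+1} common features along the last mode
    # (assumed reading of the "average common feature" step).
    Xbar = Xc.mean(axis=-1)
    # Steps 4-5: weight = average eigenvalue of the feature covariance,
    # via the identity mean(eigenvalues) = trace(R_xx) / N.
    Xm = unfold(X, n_modes - 1)                # M x N unfolding
    lam_bar = np.trace(Xm.T @ Xm / Xm.shape[0]) / Xm.shape[1]
    # Steps 6-8: broadcast M copies of the weighted average common
    # features and subtract them from the dataset.
    Xf = X - lam_bar * Xbar[..., None]
    # Step 9: forward the (R+1)-th mode unfolding to the classifier.
    return unfold(Xf, n_modes - 1)
```

The returned M × N matrix is what would be fed to the decision tree, random forest or gradient boosting classifier.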

Simulation Results
This section presents the simulation results and is divided into four subsections. Sections 6.1 and 6.2 introduce and discuss the results obtained from numerical simulations, respectively. Next, the comparison between the proposed technique and related works is shown in Section 6.3. Finally, Section 6.4 presents the computational complexity of the compared schemes.

Results
In this paper, we adopt Accuracy, Detection Rate, False Alarm Rate, Area Under the Precision-Recall Curve and Matthews Correlation Coefficient as performance evaluation metrics. Furthermore, the Relative Loss of Accuracy is adopted as the error-robustness evaluation metric. Such metrics are based on the values of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). TP and TN represent the correctly predicted values, whereas FP and FN correspond to the misclassified events. These metrics are defined as follows:
• Accuracy (Acc): the ratio between the correctly predicted instances and the total number of instances,

Acc = (TP + TN) / (TP + TN + FP + FN).

• Detection Rate (DR): the ratio between the correctly predicted positive instances and the total number of actual positive instances,

DR = TP / (TP + FN).

• False Alarm Rate (FAR): the ratio between the number of negative instances wrongly classified as positives and the total number of actual negative instances,

FAR = FP / (FP + TN).

• Area Under the Precision-Recall Curve (AUPRC): reflects a trade-off between precision and recall. Precision is the ability of a classifier not to label as positive a sample that is negative, defined as Prec = TP/(TP + FP). On the other hand, recall corresponds to the ability of a classifier to find all positive samples, given by Rec = TP/(TP + FN). The AUPRC corresponds to the area under the curve obtained by plotting the precision and recall on the y and x axes, respectively, for different probability thresholds. By applying the trapezoidal rule, the AUPRC can be defined as:

AUPRC = Σ_{k=2}^{K} (Rec_k − Rec_{k−1}) · (Prec_k + Prec_{k−1}) / 2,

where Prec_k and Rec_k are the precision and recall values for the k-th threshold, and K is the total number of probability thresholds.
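These definitions translate directly into code; the sketch below follows the formulas above, with the trapezoidal rule applied to threshold-indexed precision/recall lists.

```python
def accuracy(tp, tn, fp, fn):
    # Acc = (TP + TN) / (TP + TN + FP + FN)
    return (tp + tn) / (tp + tn + fp + fn)

def detection_rate(tp, fn):
    # DR = TP / (TP + FN), a.k.a. recall / true positive rate
    return tp / (tp + fn)

def false_alarm_rate(fp, tn):
    # FAR = FP / (FP + TN)
    return fp / (fp + tn)

def auprc(prec, rec):
    """Trapezoidal area under the precision-recall curve; prec[k] and rec[k]
    correspond to the k-th probability threshold, with rec ascending."""
    return sum((rec[k] - rec[k - 1]) * (prec[k] + prec[k - 1]) / 2
               for k in range(1, len(rec)))
```

For instance, with TP = 90, TN = 5, FP = 3 and FN = 2, the accuracy is 0.95 and the detection rate is 90/100 = 0.9.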

• Matthews Correlation Coefficient (MCC): measures the quality of binary classifications. It ranges from −1 to +1, such that higher values represent better performance. The MCC is defined as:

MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)).

• Relative Loss of Accuracy (RLA): measures the percentage of variation of the accuracy of the classifiers at the error level EL%, Acc_EL%, with respect to the original case with no additional error, Acc_0%,

RLA = (Acc_0% − Acc_EL%) / Acc_0%.

All experiments were executed on a desktop computer with an Intel Core i7-2600 3.40 GHz processor and 16 GB of RAM. Data pre-processing and machine learning classifier algorithms were implemented with the Python library Scikit-Learn, whereas the Python libraries Tensorly [27] and HOTTBOX [26] were used to implement the tensor computations. Furthermore, the proposed approach is validated by considering subsets of the CICDDoS2019 and CICIDS2017 datasets, described in Section 4.2. A total of M = 40,000 instances were extracted from each dataset, of which 20% correspond to DDoS attacks, as detailed in Table 3. CICDDoS2019 is a novel dataset that contains an extensive variety of DDoS attacks and fills the gaps of the current datasets [28]. In this sense, it is used for performance evaluation throughout this subsection. The proposed scheme is compared with state-of-the-art low-rank approximation techniques, namely, the Higher-Order Orthogonal Iteration (HOOI) [12] and the Higher-Order Singular Value Decomposition (HOSVD) [11]. Here, we intend to assess the performance of the proposed approach in the presence of corrupted datasets, as well as its error-robustness.
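The MCC and RLA metrics defined above can likewise be sketched in a few lines; the zero-denominator guard in the MCC is a common convention.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient, in [-1, +1]; returns 0 when the
    denominator vanishes (a common convention)."""
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0

def rla(acc_clean, acc_el):
    """Relative Loss of Accuracy at error level EL%:
    (Acc_0% - Acc_EL%) / Acc_0%."""
    return (acc_clean - acc_el) / acc_clean
```

A perfect classifier yields MCC = +1, a fully inverted one yields −1, and an RLA close to zero indicates that corruption barely degrades the accuracy.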
The dataset is folded as a three-dimensional tensor with size N 1 × N 2 × M, i.e., R + 1 = 3. For simplicity, we set N 1 = N 2 = 8 such that the number of features is given by N 1 · N 2 = N = 64. The dataset is split into training, validation and testing sets, with proportion 60:20:20. The validation set is used for hyperparameter tuning, whereas the testing set is used only once for performance evaluation. In addition, we also evaluate the proposed technique for different training dataset sizes.
In accordance with the literature about corrupted datasets [8], we adopt the following error generation process: for each feature X_:,n for n = 1, . . . , N, EL% of the instances are corrupted with Gaussian noise with zero mean and standard deviation (max(X_:,n) − min(X_:,n))/5. We simulate a total of 100 different experiments, using decision tree (DT), random forest (RF) and gradient boosting (GB) as machine learning classifiers. The R-D MDL scheme [25] is applied to estimate the multilinear rank of the training and testing datasets. Table 4 shows the accuracy, detection rate, false alarm rate, area under the precision-recall curve and Matthews correlation coefficient as a function of the error level (EL), which ranges from 10% to 30%. For each error level and ML classifier, the best metric values are highlighted in bold. From the results shown in Table 4, it is clear that the proposed scheme outperforms its competitor methods over the whole EL range. In addition, even in high error level conditions, e.g., EL = 30%, the proposed technique presents outstanding results, with Acc = 98.94%, DR = 97.70%, FAR = 4.35%, AUPRC = 0.9937 and MCC = 0.9663 when the random forest algorithm is applied for classification. Furthermore, we observe that the AUPRC is higher than 0.98 over the whole EL range when using the RF and GB classifiers, which reflects a considerable trade-off between the true positive rate and positive predictive values. Therefore, from Table 4, we note that the proposed technique presents a considerable performance along the whole error level range. Next, the proposed approach is compared with the HOOI and HOSVD schemes when the training size proportion (TSP) ranges from 20% to 70% of all available instances, with the error level fixed at 20%.
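The error generation process described above can be sketched as follows. Drawing the corrupted rows without replacement is an assumption, consistent with corrupting a fixed EL% fraction of the instances per feature.

```python
import numpy as np

def corrupt(X, el, rng=None):
    """For every feature (column) of X, corrupt EL% of the instances with
    zero-mean Gaussian noise of standard deviation (max - min) / 5."""
    if rng is None:
        rng = np.random.default_rng()
    Xc = X.copy()
    m = X.shape[0]
    k = int(round(el / 100 * m))        # corrupted instances per feature
    for j in range(X.shape[1]):
        sigma = (X[:, j].max() - X[:, j].min()) / 5
        rows = rng.choice(m, size=k, replace=False)
        Xc[rows, j] += rng.normal(0.0, sigma, size=k)
    return Xc
```

For EL = 30%, exactly 30% of the entries in each column are perturbed while the original matrix is left untouched.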
Table 5 shows the Acc, DR, FAR, AUPRC and MCC for different values of TSP. For each training size proportion and ML classifier, the best metric values obtained are highlighted in bold. It can be observed that the proposed scheme delivers significantly better results than its competitor methods over the whole TSP range, showing outstanding metric values. Note that, even with small training datasets, e.g., TSP = 20%, the proposed approach presents Acc, DR, FAR, AUPRC and MCC equal to 99.18%, 98.85%, 1.71%, 0.9976 and 0.9746, respectively, when RF is applied for classification. Therefore, our proposed approach shows considerable performance, even when trained with small datasets.
Finally, the error-robustness evaluation results of the proposed scheme, as well as of the HOOI and HOSVD approaches, are presented. The same simulation parameters adopted in the experiments of Table 4 are considered. Figure 2 illustrates the relative loss of accuracy, as a function of the error level, for each compared technique and different ML classifiers. As expected, all techniques presented an improved performance for lower error levels, in which the datasets present lower corruption. Furthermore, note that the proposed approach shows outstanding metric values over the whole EL range. As shown in Figure 2, the RLA is approximately zero when the error level is 10%, and is lower than 12% for EL = 30%, regardless of the classifier. In this sense, it can be seen that the proposed approach shows a considerable error-robustness when compared to the HOSVD and HOOI low-rank approximation techniques.

Discussion
In this paper, we compare the proposed technique with architectures in which the HOSVD and HOOI schemes are previously applied to the dataset tensor 𝒳 ∈ R^(N_1×···×N_R×M) for denoising. The HOSVD is a generalization of the matrix Singular Value Decomposition to higher-order tensors and is widely applied for noise reduction. In this case, an (R+1)-dimensional tensor 𝒳 is decomposed into a core tensor and R+1 factor matrices truncated to the signal subspace, which is determined by the multilinear rank (d_1, . . . , d_{R+1}). On the other hand, HOOI is a low-rank approximation method in which more accurate truncated singular matrices and core tensor are computed through higher-order orthogonal iterations.
Decision tree, random forest and gradient boosting are adopted as ML classifiers. Despite their low computational cost and ease of understanding and interpretation, decision trees present high variance, i.e., completely different trees can be generated from tiny changes in the training dataset. When trained with corrupted data or small datasets, DTs can lead to overfitting. Such a fact can be seen in Tables 4 and 5, in which DTs are outperformed by both GB and RF, especially for high error levels and small TSP. For instance, in Table 4, for EL = 25%, the proposed technique presents MCC values for the DT, GB and RF classifiers equal to 62.68%, 97.68% and 97.81%, respectively. As expected, all compared techniques deliver better performance when RF and GB are used for classification, since both algorithms reduce the variance existing in DTs and prevent overfitting. Random forest and gradient boosting combine multiple DTs, but with different tree-building processes: while the former builds each tree independently and combines the results at the end, the latter builds one tree at a time, combining results along the process.
Furthermore, from Table 4, it can be observed that, for all compared schemes, RF outperforms GB over almost the entire EL range. Since gradient boosting combines results along the training process, it is more sensitive to data corruption, which leads to overfitting. For example, for EL = 30%, the DR values for random forest and gradient boosting under our proposed approach are, respectively, 97.70% and 96.02%. This effect is even more evident for HOSVD, which presents detection rates of 92.67% and 85.27% for RF and GB, respectively. In addition, from the results shown in Table 4, note that the proposed scheme outperforms both the HOSVD and HOOI techniques over the entire EL range. These results confirm that ML classifiers benefit from the more discriminative individual information produced by the average common feature extraction technique applied to the training dataset. Note that HOSVD is also outperformed by HOOI for high error levels, confirming that the latter scheme leads to better results due to the more accurate core tensor and singular matrices generated through alternating least squares iterations. In short, the compared techniques perform better as the error level decreases, since the machine learning classifiers then deal with less corrupted data and, consequently, deliver more reliable and accurate results.
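The filtering idea can be sketched as follows, assuming the common features correspond to the projection of the dataset tensor onto its dominant mode-1 and mode-2 subspaces; this is a loose conceptual sketch with illustrative names, not a verbatim reproduction of Algorithm 1:

```python
import numpy as np

def filter_avg_common_features(X, d1, d2):
    """Project the dataset tensor X (feature mode x feature mode x
    instances) onto its d1/d2 dominant mode-1/mode-2 subspaces, average
    the resulting 'common feature' tensor over the instance mode, and
    subtract that average from every instance."""
    N1, N2, M = X.shape
    U1, _, _ = np.linalg.svd(X.reshape(N1, -1), full_matrices=False)
    U2, _, _ = np.linalg.svd(np.moveaxis(X, 1, 0).reshape(N2, -1),
                             full_matrices=False)
    P1 = U1[:, :d1] @ U1[:, :d1].T          # mode-1 subspace projector
    P2 = U2[:, :d2] @ U2[:, :d2].T          # mode-2 subspace projector
    # Common feature tensor: two tensor-times-matrix products.
    C = np.einsum('ij,jkm->ikm', P1, X)
    C = np.einsum('kl,ilm->ikm', P2, C)
    C_avg = C.mean(axis=2, keepdims=True)   # average over the instances
    return X - C_avg                        # filtered dataset tensor

# Each filtered instance (a mode-3 slice) would then be flattened and fed
# to a classifier such as a random forest.
```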
Additionally, from Table 5, we observe that, in general, the compared techniques perform better as the training size proportion increases. Small training datasets can lead to a lack of representative instances and, consequently, to overfitting, in which the ML algorithm is excessively adjusted to the training data and performs poorly when predicting new instances. As mentioned above, this effect is most evident for decision trees, which are more prone to overfitting. For instance, for the smallest training dataset size, i.e., TSP = 20%, the AUPRC values for the proposed scheme, HOSVD and HOOI when DT is applied for classification are, respectively, 0.7804, 0.7437 and 0.6099. On the other hand, the proposed approach is very robust to small training dataset sizes when gradient boosting and random forest are used for classification. For example, still considering the worst case of TSP = 20%, the AUPRC for the proposed scheme with GB and RF is, respectively, 0.9953 and 0.9976. However, both HOSVD and HOOI suffer a performance reduction in this case, showing AUPRC values of 0.8168 and 0.9336 for gradient boosting, and 0.9845 and 0.9869 for random forest, respectively.
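The AUPRC and MCC metrics referenced here can be computed directly with scikit-learn; the labels and scores below are toy values for illustration only (1 = DDoS attack, 0 = benign):

```python
from sklearn.metrics import average_precision_score, matthews_corrcoef

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.7, 0.6]
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]

# average_precision_score summarizes the precision-recall curve (AUPRC).
auprc = average_precision_score(y_true, y_score)
# MCC accounts for all four confusion-matrix entries; range [-1, 1].
mcc = matthews_corrcoef(y_true, y_pred)
print(auprc, mcc)  # ≈ 0.9167 and 0.5 for these toy values
```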
Finally, the error-robustness of all compared approaches is assessed in Figure 2, which illustrates the relative loss of accuracy. Figure 2c shows that all schemes are more robust to errors with the random forest classifier than with the DT and GB algorithms. For instance, considering the worst case of EL = 30%, the proposed technique presents an RLA of 0.98% when RF is applied for classification, whereas for GB and DT our approach shows relative losses of accuracy of 2.29% and 12.52%, respectively. Therefore, once again, random forest outperforms the DT and GB algorithms in DDoS attack detection.
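The relative loss of accuracy is commonly defined as the fractional accuracy drop with respect to the error-free baseline; assuming the paper follows this standard definition, it can be written as:

```python
def relative_loss_of_accuracy(acc_clean, acc_corrupted):
    """RLA = (Acc_clean - Acc_corrupted) / Acc_clean: the fraction of the
    error-free accuracy lost when training on corrupted data."""
    return (acc_clean - acc_corrupted) / acc_clean

# Illustrative values only: a classifier dropping from 99% to 97% accuracy.
print(relative_loss_of_accuracy(0.99, 0.97))
```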

Performance Comparison with Related Works
This subsection presents a performance comparison between the proposed scheme and related works under error-free conditions. Furthermore, since CICIDS2017 has been extensively applied for IDS validation in the literature, we also include a performance evaluation on that dataset, which enriches the comparison by increasing the number of competing schemes.
Since the related papers assume error-free datasets, the proposed approach is considered with an error level of 0% for comparison. Table 6 shows the adopted dataset, the ML classification algorithm and the values of accuracy, detection rate and false alarm rate obtained by the proposed approach and the related papers. The metrics marked "Not Available" (N/A) were not reported by the corresponding paper. Furthermore, since CICDDoS2019 is a newly released dataset, to the best of our knowledge only Elsayed et al. [28] have applied it for performance evaluation. The authors proposed a deep learning-based intrusion detection system in which a recurrent neural network is combined with an autoencoder. Note that, on the CICDDoS2019 dataset, the proposed technique outperforms the competing scheme when the GB and RF algorithms are applied for classification: our approach presents Acc = 99.87% and DR = 99.86% for gradient boosting, whereas an accuracy of 99.55% and a detection rate of 98.96% are obtained with the random forest classifier.
On the other hand, as mentioned above, CICIDS2017 has been applied by several authors for IDS performance evaluation, as can be seen in Table 6. Although it is not the best IDS among those compared, the proposed scheme still presents considerable performance, with Acc = 99.95%, DR = 99.95% and FAR = 0.05% for the gradient boosting algorithm, outperforming almost all competing schemes. It is worth mentioning the performance of LUCID, proposed by Doriguzzi-Corin et al. in [29], with Acc, DR and FAR of 99.67%, 99.94% and 0.59%, respectively. The authors presented a practical, lightweight CNN-based DDoS detection architecture with low processing overhead and attack detection time. In addition, the 1D-CNN-LSTM model proposed by Roopak et al. [30] showed a considerable detection rate of 99.10%. Note that both papers propose deep learning-based schemes, which usually deliver better performance than traditional machine learning-based solutions, such as the DT, RF and GB algorithms.

Computational Complexity
This section discusses the computational complexity of the proposed approach, described in Algorithm 1. For simplicity, the complexity is analyzed for a three-dimensional dataset tensor $\mathcal{X} \in \mathbb{R}^{N_1 \times N_2 \times M}$. We only consider the most costly calculations, represented by Steps 1 to 3 of Algorithm 1, as a function of the most important variables, namely, $N_1$, $N_2$, $M$ and $(d_1, d_2, d_3)$. Consequently, the computational cost of the folding and unfolding of matrices and tensors, performed in Steps 4 and 9 of Algorithm 1, is not considered, since these operations only rearrange data. Similarly, the time complexity of Steps 5–8 of Algorithm 1 is not analyzed, since only low-cost computations are performed in those steps.
Step 1 of Algorithm 1 corresponds to the HOSVD of the dataset tensor $\mathcal{X}$ and presents computational complexity given by [33]
$$\mathcal{O}\big((N_1 + N_2 + N_3)\, N_1 N_2 N_3\big),$$
where, for simplicity of notation, $N_3$ corresponds to the number of dataset instances $M$. Next, in Steps 2 and 3 of Algorithm 1, we compute the common feature tensor as well as its average along the 3rd dimension. These steps require two tensor-times-matrix products plus the average calculation, with complexity
$$\mathcal{O}\big(d_1 d_2 N_1 N_3 + d_2 N_1 N_2 N_3 + N_1 N_2 N_3\big).$$
Finally, the overall computational complexity of Algorithm 1 corresponds to the sum of the two complexities above. In Table 7, we summarize the computational complexities of the proposed approach as well as of the HOOI and HOSVD techniques, where, for HOOI, $I$ corresponds to the number of iterations and $d = \max(d_1, d_2, d_3)$. Note that the proposed scheme incurs higher computational complexity, which reinforces the trade-off between more accurate DDoS attack detection and time cost.

Conclusions
In this paper, we propose a novel average common feature extraction technique applied to DDoS attack detection. First, the proposed scheme filters out, from the dataset, the average value of the common features among instances by applying the classic Higher-Order Singular Value Decomposition. Then, the filtered dataset is forwarded to machine learning algorithms, where data are classified as benign traffic or DDoS attack.
Extensive numerical simulations are performed on the CICDDoS2019 and CICIDS2017 benchmark datasets, with decision tree, random forest and gradient boosting used as ML classifiers. Further, accuracy, detection rate, false alarm rate, area under the precision-recall curve, Matthews correlation coefficient and relative loss of accuracy are adopted as evaluation metrics. According to the obtained results, the proposed scheme outperforms the traditional HOSVD and HOOI techniques, presenting higher error-robustness. For instance, considering a dataset corruption level of 30%, the proposed scheme shows values of Acc, DR, FAR, AUPRC and MCC of 98.94%, 97.70%, 4.35%, 0.9937 and 0.9663, respectively, when the random forest algorithm is used for classification. Under the same conditions, the traditional HOOI technique shows Acc, DR, FAR, AUPRC and MCC equal to 97.65%, 94.19%, 11.52%, 0.9906 and 0.9250, respectively. In addition, we observe that our proposed scheme is highly robust to small training datasets, with only a slight performance loss across the whole evaluated TSP range. For example, when the training dataset size is only 20% of all available samples, the proposed approach shows Acc, DR, FAR, AUPRC and MCC equal to 99.18%, 98.85%, 1.71%, 0.9976 and 0.9746, respectively, for the random forest classifier. In contrast, for the same TSP, the well-known HOSVD scheme presents values of Acc, DR, FAR, AUPRC and MCC of 97.55%, 95.20%, 8.71%, 0.9845 and 0.9227, respectively. An important drawback of our proposed scheme, however, is its higher computational complexity, which reflects the trade-off between more accurate DDoS attack detection and time cost.
Another notable finding concerns the performance of the evaluated ML classification algorithms for DDoS attack detection. According to the simulations, decision trees are more prone to overfitting when data are highly corrupted or small datasets are used for training. For example, for a data corruption level of 25%, the proposed technique presents a detection rate of 80.23% when DTs are used for classification, whereas 98.57% and 98.66% are obtained with GB and RF, respectively. Similarly, for a training dataset size proportion of 30%, our approach obtains accuracies of 98.55% and 98.95% with the GB and RF algorithms, while Acc = 85.95% when a decision tree is applied. Additionally, the random forest classifier presents higher error-robustness than gradient boosting. For instance, considering a data corruption level of 20%, our proposed scheme shows a relative loss of accuracy of 0.61% when RF is applied for classification, whereas 2.09% is obtained for GB. Gradient boosting is thus more sensitive to data corruption than random forest, since the former builds one tree at a time and combines results along the process, whereas the latter builds each tree independently, combining results at the end.
In the future, we intend to apply the proposed technique by using alternative machine learning algorithms, especially deep learning-based approaches, such as convolutional neural networks. Furthermore, we shall verify the performance of the proposed scheme for online DDoS attack detection.