Nowadays, security software is an essential part of every organization. Facing security incidents has never been harder than it is today: many people now have some level of technical knowledge, and some of them can abuse it to cause serious damage. DDoS attacks are among the biggest threats to the IT security of any organization, and many are trying to solve that problem efficiently.
3.2.1. Redundancy
The development of ICT has reached a level that allows its application and implementation in complex and sensitive systems and domains where high reliability, availability, security, stability, manageability, and usability are required. Such systems are commonly referred to as high-reliability or fault-tolerant systems. Fault-tolerant systems continue to perform their functions even under very unfavorable, even extreme, conditions, thanks primarily to their ability to tolerate individual failures [11,12,39,40].
Several factors have influenced the development and spread of the concept of fault-tolerant systems. First of all, modern systems are becoming increasingly complex; they consist of a large number of interconnected components, and as system complexity grows, so does the possibility of failures. Another important factor is the development of Very Large Scale Integration (VLSI) technology, which has enabled the practical application of many fault-tolerance techniques. A further factor is the spread of electronic systems, which are less reliable than mechanical systems and require additional measures to increase their reliability.
When it comes to fault-tolerant systems, one of the common methods for improving system reliability is redundancy. Redundancy is the addition of resources, information, or time above the level required for the normal operation of the system. It can take the form of hardware, software, information, or time redundancy.
Hardware redundancy is the physical duplication of hardware, most commonly for fault detection or masking, in order to achieve fault tolerance. There are three basic forms of hardware redundancy: passive, active, and hybrid.
- Passive Hardware Redundancy
In passive hardware redundancy, a voting mechanism is used to mask errors, namely the principle of majority voting. Techniques used in implementing passive hardware redundancy provide fault tolerance without the need to detect and repair the fault. The most commonly used form of passive hardware redundancy is Triple Modular Redundancy (TMR). With this type of redundancy, the hardware is tripled and the majority voting principle is applied. If one of the modules stops working, the other two modules mask the error and mitigate the problem.
In TMR, the main weakness is the voter. If it stops working, the whole system is called into question, i.e., the reliability of the whole TMR system cannot exceed the reliability of the voter. Reliability of TMR can be increased by tripling the voter as well; a minimal voter sketch is given below, after the three hardware redundancy forms.
- Active Hardware Redundancy
Active hardware redundancy tries to achieve fault tolerance through fault detection, localization, and repair. It does not use fault masking, and it is used in systems that can tolerate temporary incorrect results, provided that the system is reconfigured and its operation stabilized within a satisfactory period.
- Hybrid Hardware Redundancy
As already mentioned, hybrid redundancy is a combination of active and passive redundancy. Fault masking is used to prevent errors from propagating, while fault detection, localization, and repair are used to reconfigure the system in the case of a failure.
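To illustrate the majority voting that underlies the passive (TMR) form described above, the following is a minimal sketch, not taken from the referenced works: a computation is replicated across three modules and a simple voter masks a single faulty output. The function names and the replicated `module` callables are hypothetical.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by at least two of the three modules.

    Raises an error if all three outputs disagree, i.e., the fault
    cannot be masked by majority voting.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one module has failed")
    return value

def tmr_execute(module_a, module_b, module_c, x):
    """Run the same computation on three replicated modules and vote."""
    return majority_vote([module_a(x), module_b(x), module_c(x)])

# Example: module_b is faulty, but its error is masked by the other two.
correct = lambda x: x * x
faulty = lambda x: x * x + 1
print(tmr_execute(correct, faulty, correct, 5))  # prints 25
```

Note that the voter itself remains a single point of failure here, which is exactly the limitation that voter tripling addresses.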
In computer-based applications, many techniques for fault detection and tolerance can be implemented in software. Software redundancy is the addition of software used to detect and tolerate faults. It can be realized in different ways, and there is no need to replicate all the programs. For example, software redundancy can be realized by adding a few additional instructions that check the system, or as a small programming procedure that periodically tests the memory by writing to and reading from certain locations.
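A minimal sketch of such a periodic self-test follows, simplified to writing and reading back a known pattern in an application-level buffer rather than physical memory; the buffer size, test pattern, and check interval are illustrative assumptions, not part of the original text.

```python
import time

TEST_PATTERN = bytes([0x55, 0xAA, 0x55, 0xAA])  # alternating bit pattern

def memory_self_test(buffer: bytearray, offset: int) -> bool:
    """Write a known pattern to a buffer region and read it back.

    Returns True if the read-back value matches, False if a fault
    (e.g., a stuck bit in that region) is detected.
    """
    region = slice(offset, offset + len(TEST_PATTERN))
    saved = bytes(buffer[region])          # preserve the original data
    buffer[region] = TEST_PATTERN          # write the test pattern
    ok = bytes(buffer[region]) == TEST_PATTERN
    buffer[region] = saved                 # restore the original data
    return ok

# Periodic check, e.g., once per second over a 1 KiB working buffer.
work_buffer = bytearray(1024)
for _ in range(3):
    if not memory_self_test(work_buffer, offset=0):
        print("memory fault detected")
    time.sleep(1)
```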
Information redundancy is the addition of redundant information to data with the aim of enabling fault detection, masking, and tolerance. Examples of information redundancy are error-detection and error-correction codes, which are realized by adding redundant information to words or by converting words into another form that contains redundant information.
A code is a way of representing information or data using a set of strictly defined rules. A code word is a set of symbols used to represent a specific piece of data according to the defined code. A binary code is one whose words are made of only two symbols, 0 and 1. Coding is the process of determining the proper code word for given data; in other words, coding is the transformation of the original data into codes using the coding rules. Decoding is the opposite process: it returns the data to its original form by translating the codes back into the raw data. A code for error detection is a particular type of code that enables the detection of errors in code words. An error-correction code is used for error correction; such codes are characterized by the number of bit errors that they can correct.
The main parameter used in the characterization of codes for error detection and correction is the Hamming distance. The Hamming distance between two binary words is the number of positions in which the two words differ. For each code, the code distance $H_d$ can be defined as the minimal Hamming distance between any two valid code words. In general, a code can correct $c$ bit errors and additionally detect $d$ bit errors if and only if the following condition is met:
$$H_d \geq 2c + d + 1.$$
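As a brief illustration of the Hamming distance and the code distance defined above, the following sketch computes both for a small example code; the code words are arbitrary toy values, not taken from the text.

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length binary words differ."""
    return sum(x != y for x, y in zip(a, b))

def code_distance(code_words) -> int:
    """Minimal Hamming distance between any two valid code words."""
    return min(hamming_distance(a, b) for a, b in combinations(code_words, 2))

# A toy code with four valid 6-bit code words.
code = ["000000", "001011", "010101", "100110"]
print(code_distance(code))   # prints 3
# With H_d = 3, the condition H_d >= 2c + d + 1 allows, for example,
# c = 1, d = 0 (single-error correction) or c = 0, d = 2 (double-error detection).
```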
A separable code is a code in which the original information is combined with additional information to form the code word, allowing the decoding process to consist of simply removing the additional information and keeping the original data. In other words, the original data are obtained from a code word by removing the additional bits, called the code or check bits, and keeping only the bits that represent the original information. Non-separable codes do not have this property of separation and thus demand more complex decoding procedures.
Time redundancy provides additional time for the execution of system functions so that fault detection and correction can be achieved. Time redundancy methods tend to reduce the amount of hardware at the cost of additional time. Namely, in many applications time is a less valuable resource than hardware, because hardware is a physical resource that affects the system's overall weight, volume, energy consumption, and cost.
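A minimal sketch of the time redundancy idea, assuming its simplest variant in which the same computation is repeated and the results compared to detect a transient fault; the function names and retry count are illustrative assumptions.

```python
def with_time_redundancy(compute, x, retries: int = 1):
    """Run the same computation twice and compare the results.

    A transient fault is detected when the two executions disagree; the
    computation is then retried up to `retries` additional times.
    """
    for _ in range(retries + 1):
        first = compute(x)
        second = compute(x)      # re-execution costs time, not extra hardware
        if first == second:
            return first         # results agree: accept the value
    raise RuntimeError("persistent disagreement: possible permanent fault")

# Example with a fault-free computation.
print(with_time_redundancy(lambda v: v + 1, 41))   # prints 42
```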
3.2.2. Anomaly Detection Algorithms
Anomaly detection can be performed in numerous ways [1]. The primary goal is to detect attacks as soon as possible and to inform the user. The secondary goal is to reduce the number of false positive results to a minimum. The three selected algorithms [1] that are best suited to these goals are the Cumulative Sum (CUSUM) algorithm, the Exponentially Weighted Moving Average (EWMA), and the K-nearest neighbors algorithm, as described below.
CUSUM is an algorithm that detects changes. It is updated in real time and periodically, which is convenient for this solution because network traffic usually consists of many packets constantly changing over time. CUSUM is used for quality control; it is optimized for measuring any deviation from a specified value and is used for the detection of small changes in the mean. With CUSUM, the sum of differences between actual and expected values is calculated; that sum is the CUSUM value. CUSUM can be easily adapted and a number of variations of this algorithm already exist; it can even be adapted to be self-learning so that it can detect changes under different network usages.
CUSUM is the cumulative sum of deviations from a reference mean $\mu_n$, which is a periodically updated value calculated in real time [33,41]. If $S_i$ is the i-th cumulative sum, $x_n$ is the n-th observation, and $\mu_n$ is the mean of the process estimated in real time, then the cumulative sum $S_n$ can be calculated as follows:
$$S_n = S_{n-1} + (x_n - \mu_n), \quad S_0 = 0,$$
where:
- $S_n$ is the cumulative sum;
- $x_n$ is the n-th observation;
- $\mu_n$ is the mean of the process estimated in real time.
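A minimal sketch of a CUSUM-based change detector following the recurrence above, assuming a simple running-mean estimate of $\mu_n$ and an illustrative alarm threshold; the threshold value, the reset-after-alarm behavior, and the toy traffic series are assumptions, not taken from the text.

```python
def cusum_detect(observations, threshold=50.0):
    """Flag the indices at which the cumulative sum of deviations
    from the running mean exceeds a threshold (possible anomaly)."""
    alarms = []
    s = 0.0      # S_0 = 0
    mean = 0.0   # running estimate of the process mean (mu_n)
    for n, x in enumerate(observations, start=1):
        mean += (x - mean) / n   # update mu_n in real time
        s = s + (x - mean)       # S_n = S_{n-1} + (x_n - mu_n)
        if s > threshold:
            alarms.append(n)
            s = 0.0              # design choice: reset after raising an alarm
    return alarms

# Example: stable packet rate followed by a sudden sustained increase.
traffic = [100] * 50 + [160] * 20
print(cusum_detect(traffic, threshold=50.0))   # prints the flagged indices
```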
EWMA applies weighting factors that decrease exponentially: older data are less important than new data, but are still taken into account [42]. This feature is very important in this solution, because an attack can occur and stop immediately, so new data are more important. The degree of weighting decrease is expressed as a constant smoothing factor $\lambda$, a number between 0 and 1; $\lambda$ can also be expressed as a percentage [6]. The EWMA is calculated as follows:
$$\mu_n = \lambda x_n + (1 - \lambda)\,\mu_{n-1},$$
where:
- $\mu_n$ is the mean of the process estimated in real time at the n-th observation;
- $\lambda$ is the exponentially weighted moving average (smoothing) factor;
- $x_n$ is the n-th observation, as above.
The factor $\lambda$ in EWMA determines how much old data enter the EWMA formula. If $\lambda = 1$, only the most recent data are taken into consideration; conversely, when $\lambda$ is closer to 0, older data carry more weight. The EWMA algorithm in its original form will produce a high number of false positives, but with slight modifications and performance tuning it can achieve very good results.
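A minimal sketch of an EWMA-based detector using the recurrence above; the smoothing factor, deviation threshold, toy traffic series, and the choice not to update the smoothed mean on flagged samples are illustrative assumptions.

```python
def ewma_detect(observations, lam=0.3, threshold=30.0):
    """Flag observations that deviate from the EWMA estimate by more
    than `threshold` (possible anomaly)."""
    alarms = []
    mu = observations[0]   # initialize the smoothed mean with the first sample
    for n, x in enumerate(observations[1:], start=2):
        if abs(x - mu) > threshold:
            alarms.append(n)                # large deviation from the smoothed mean
        else:
            mu = lam * x + (1 - lam) * mu   # mu_n = lambda*x_n + (1-lambda)*mu_{n-1}
    return alarms

# Example: steady traffic with a short burst.
traffic = [100, 102, 99, 101, 100, 180, 175, 100, 101]
print(ewma_detect(traffic, lam=0.3, threshold=30.0))   # flags the burst samples
```

Skipping the mean update on flagged samples is one way to keep an anomaly from contaminating the baseline; tuning $\lambda$ and the threshold is what the text refers to as performance tuning.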
The K-nearest neighbors (KNN) algorithm is a simple algorithm that can be used for both regression and classification tasks [43]. It is non-parametric, which is one of its main advantages, because in real-world situations there are usually no fixed rules when it comes to attack data. The algorithm classifies or predicts based on the K closest training instances: for a chosen value of K, an input instance is classified or predicted to belong to the same class as the majority of the K instances nearest to it. The distance to the nearest instances can be measured in several ways, but the most popular is the Euclidean distance, which is calculated as follows:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},$$
where:
- $x_i$ is the i-th component of observation $x$;
- $y_i$ is the i-th component of observation $y$.
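A minimal sketch of K-nearest neighbors classification with Euclidean distance, using scikit-learn as one possible implementation; the toy feature vectors and labels are invented for illustration and do not come from the datasets cited below.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: [packets per second, average packet size]
X_train = [[100, 500], [120, 480], [110, 520],     # normal traffic
           [5000, 60], [5200, 64], [4800, 58]]     # DDoS-like traffic
y_train = ["normal", "normal", "normal", "attack", "attack", "attack"]

# K = 3 neighbors, Euclidean (Minkowski p=2) distance.
knn = KNeighborsClassifier(n_neighbors=3, metric="minkowski", p=2)
knn.fit(X_train, y_train)

# Classify two unseen samples.
print(knn.predict([[115, 495], [5100, 62]]))   # expected: ['normal' 'attack']
```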
For this algorithm, training data are needed, and for that purpose a data set [35] has been selected. As already stated, the CIC-DDoS2019 dataset [36] is used for evaluation, because it is a well-known dataset and makes the results more comparable with other methods.
3.2.3. Quality Parameters of Classification in Two Classes
We consider a classification that assigns the results obtained using the methods for IDS considered in this paper to two classes, positive and negative, which in our case means a correctly or incorrectly predicted attack. The possible prediction results are shown in Table 1.
In Table 1, TP + FN + FP + TN = N, where N is the total number of members of the considered set to be classified. The matrix given in Table 1 is called a 2 × 2 confusion matrix. As presented in Table 1, there are four possible results: true positive (TP), false positive (FP), true negative (TN), and false negative (FN); these numbers are integers. Based on the possible results presented in Table 1, for a two-class classifier we consider in this paper four quality parameters of classification, namely accuracy, precision, recall, and the F1 measure, which can be calculated as:
$$\mathrm{Accuracy} = \frac{TP + TN}{N}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
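A minimal sketch computing these four parameters directly from the confusion-matrix counts; the example counts are invented for illustration.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Example confusion matrix: 90 detected attacks, 5 false alarms,
# 100 correctly ignored normal flows, 10 missed attacks.
print(classification_metrics(tp=90, fp=5, tn=100, fn=10))
```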
The true negative rate is another parameter that is significant for plotting the ROC (Receiver Operating Characteristic) curve:
$$TNR = \frac{TN}{TN + FP}.$$
The ROC curve is often used in analyzing the results of a classification process. In the case of binary classification, the ROC curve plots, for different classification thresholds, the false positive rate on the Ox axis and the true positive rate (which is synonymous with the recall parameter) on the Oy axis of a two-dimensional coordinate system. The AUC (Area Under the ROC Curve) provides an aggregate measure of performance across all possible classification thresholds and can be computed with an efficient, sorting-based algorithm, giving information about the quality of the considered classification. AUC ranges in value from 0 to 1: a model whose predictions are 100 percent wrong has an AUC of 0, and a model whose predictions are 100 percent correct has an AUC of 1.
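A minimal sketch of computing the ROC curve points and the AUC from predicted scores, using scikit-learn as one possible implementation; the labels and scores are invented for illustration.

```python
from sklearn.metrics import roc_curve, roc_auc_score

# 1 = attack, 0 = normal; scores are the classifier's attack probabilities.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points of the ROC curve
auc = roc_auc_score(y_true, y_score)                # area under the curve

print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", auc)   # 1.0 means perfect ranking, 0.5 means random guessing
```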