K-Nearest Neighbors with Third-Order Distance for Flooding Attack Classification in Optical Burst Switching Networks

Nuha, Hilal H.; Mugitama, Satria Akbar; Absa, Ahmed Abo; Sutiyo,

doi:10.3390/iot6010001


by Hilal H. Nuha ¹,*, Satria Akbar Mugitama ¹, Ahmed Abo Absa ² and Sutiyo ¹

¹ CAATIS, Telkom University, Bandung 40257, Indonesia

² Software Engineering Department, University of Palestine, Gaza Strip 890, Palestine

* Author to whom correspondence should be addressed.

IoT 2025, 6(1), 1; https://doi.org/10.3390/iot6010001

Submission received: 1 November 2024 / Revised: 15 December 2024 / Accepted: 23 December 2024 / Published: 25 December 2024

(This article belongs to the Special Issue 6G Optical Internet of Things (OIoT) for Sustainable Smart Cities)


Abstract

Optical burst switching (OBS) is a network architecture that combines the advantages of packet and circuit switching techniques. However, OBS networks are susceptible to cyber-attacks, such as flooding attacks, which can degrade their performance and security. This paper introduces a novel machine learning method for flooding attack detection in OBS networks, based on a third-order distance function for k-nearest neighbors (KNN3O). The proposed distance is expected to improve detection accuracy due to higher sensitivity with respect to the distance difference between two points. The developed method is compared with seven other machine learning methods, namely standard KNN, KNN with cosine distance (KNNC), multi-layer perceptron (MLP), naive Bayes classifier (NBC), support vector machine (SVM), decision tree (DT), and discriminant analysis classifier (DAC). The methods are further assessed using five metrics: accuracy, precision, recall, F1-score, and specificity. The proposed method achieved an accuracy of 99.3%, outperforming the original KNN, MLP, and SVM, which achieved accuracies of 99%, 76.4%, and 94.7%, respectively. The results show that KNN3O is the best method for flooding attack detection in OBS networks, as it achieves the highest scores in all five metrics.

Keywords: flooding attack; optical burst switching networks; k-nearest neighbors

1. Introduction

Optical burst switching (OBS) is an emerging network architecture that combines the advantages of circuit switching and packet switching. OBS is attractive because it offers lower latency than optical packet switched (OPS) networks [1], and it has therefore become one of the fundamental technologies in optical IP networks [2]. It plays a crucial role in various environments, including software-defined networks (SDN), wireless sensor networks (WSN), and the Internet of Things (IoT) [3]. In IP networks, different types of cyber-attacks have emerged as significant threats and have drawn researchers’ attention, including attacks on long-range wide area networks (LoRaWAN) [4] and SDN [5]. The optical networks industry grew by 15.2% from 2013 to 2018 [6], and cyber-attacks on optical networks are a growing concern because these networks are critical infrastructure that supports many aspects of modern society, including finance, healthcare, transportation, and communication. According to the Thales Data Threat Report 2020, 26% of surveyed global organizations were breached in 2019 [6]. It is therefore important for organizations to implement robust cybersecurity measures to protect their optical networks from cyber threats.

Several cyber-attack countermeasures have been proposed for OBS network intrusion detection systems (IDSs), because these networks are prone to flooding attacks, which lead to low bandwidth utilization, degraded network performance, denial of service (DoS), and high data loss rates. Because detection involves many rules and exceptions, machine learning approaches are well suited to detecting these attacks. One effective method uses a deep convolutional neural network (DCNN) model [7] to detect these attacks early on; given the limited number of samples in the dataset, it performs better than traditional models such as naive Bayes, SVM, and KNN. Chawathe [8] discusses how OBS networks can be vulnerable to denial of service attacks because control information is separated from the primary data, and proposes a monitoring method evaluated on a public dataset. Liu et al. [9] combine particle swarm optimization with a support vector machine (PSO-SVM) to detect intrusion in OBS networks and use the UCI and NCENTs datasets to show that this model performs better than traditional machine learning models; they report it to be an effective and efficient method for detecting burst header packet (BHP) flooding attacks in OBS networks. Furthermore, several other methods have been developed for BHP flooding attacks in OBS networks, including a decision tree (DT) with selected features by Almaslukh [10], ant colony optimization (ACO) and a support vector machine (SVM) by Seddik et al. [11], and a decision forest classifier (DFC) with flower search optimization (FSO) by Panda et al. [12]. In addition, our research group has conducted multiple studies on machine learning methods for cyber-attacks in different types of networks, including wireless sensor networks [13], websites [14], and the IoT [15].

Motivated by the works of Rajab et al. [16,17] about flooding attack prevention on OBS, the main contribution of this paper is the development of a third-order distance for k-nearest neighbors (KNN3O) for flooding attack detection. In addition to that, this paper compares different types of machine learning methods for flooding attack detection. The remainder of this paper is organized as follows: Section 2 presents the proposed method and the comparative methods. Experimental results are discussed in Section 3. Lastly, Section 4 concludes this paper.

2. Methodologies

This section describes the methods used in this paper. First, the vulnerabilities of OBS networks, including node hijacking and flooding attacks, are presented. Several machine learning methods to detect intrusion are described. The last sub-section discusses the experimental setup.

2.1. OBS Networks Vulnerability and Intrusion Detection

The OBS transmission approach keeps data in the optical domain while allowing complex control headers to be processed in the electronic domain. Incoming client data traffic is received at the OBS network ingress (edge) node and assembled into a data burst (DB). A BHP, which contains the DB information, such as the offset time, arrival time, and burst length, is then sent over a dedicated (out-of-band) wavelength division multiplexing (WDM) channel ahead of the DB. The time by which the BHP precedes the DB is referred to as the offset time; it is used to configure the path so that the core switches can process DBs and allocate the essential resources. At each intermediate node, the BHP undergoes an optical-electronic-optical (O-E-O) conversion and is processed electronically to reserve the resources required by the incoming data burst in the optical domain. OBS data bursts come in various lengths and carry different types of traffic, such as optical packets, IP packets, and ATM cells. The edge node sends the data as bursts, which are disassembled at the receiving edge router. This is illustrated in Figure 1.

This study is centered around the BHP flooding attack, which belongs to the denial of service (DoS) attack class. This attack is designed to hinder (legitimate) regular BHP allocation of crucial resources in the intermediate core switch. Similar to the traditional DoS attack aimed at the TCP protocol, such as SYN flooding, which inundates a victim host with a vast number of SYN requests without finalizing the connection setup and prevents it from accepting genuine connection requests, the BHP flooding attack also involves flooding the network with fake BHPs to seize necessary resources and prevent legitimate BHPs from reserving them. This article aims to thwart this type of attack through the application of machine learning techniques.

A BHP flooding attack takes place when a hijacked or attacker node overwhelms the network by sending numerous BHPs without corresponding DBs. The process of fake BHPs attacking a core switch is illustrated in Figure 2. The core switch allocates new WDM channels to each incoming BHP, and as soon as the channels are allocated, their states change from vacant to busy. As a result, legitimate BHPs are unable to reserve the required resources at the intermediate core switch. When a regular DB is received and no vacant WDM channel is available, the core switch discards the DB, while the allocated channels stay busy, awaiting unidentifiable bursts that may never arrive.

This paper uses the security model developed by Rajab et al. [17], as shown in Figure 3, to defend the OBS network from BHP flooding attacks by analyzing the behavior of each node to counter harmful BHPs that exploit network resources. The developed model has multiple benefits, including easy implementation through software modification and integration into existing core switch infrastructure. It can also be deployed gradually to improve security. The model uses a sliding range window to classify all ingress nodes as Blocked, Trusted, or Suspicious based on their observed performance. The node category changes over time, with a node classified as Suspicious when a predetermined number of corresponding DBs are not sent on time and as Blocked when the packet dropping rate increases. If there is a BHP flooding attack, the classifier adds compromised nodes to the blocked list. However, nodes can improve their state by increasing their throughput and decreasing their packet dropping rate. The Trusted window, corresponding to one second, is divided into 10 slots, whereas the Blocked and Suspicious windows have 20 slots to scrutinize the node behavior in detail. The core switches that put a node in the Blocked category will not pass on its BHPs, but the node can be removed from the Blocked class if it stops transmitting fake BHPs and starts transmitting legitimate DBs.

2.2. K-Nearest Neighbors and Its Enhancement

K-nearest neighbors (KNN) is a machine learning algorithm used for classification tasks [18]. It finds the $K$ closest data points in the feature space to a given query point and then classifies the query point based on the labels or values of its $K$ nearest neighbors.

Mathematically, the KNN algorithm can be described as follows: Let $X$ be the feature space and $Y$ be the label space. Given a query point $x^{(q)} \in X$ representing the input data for the model, this algorithm finds the $K$ nearest data points $X_{K} = \{x^{(1)}, \dots, x^{(K)}\}$ in $X$ to $x^{(q)}$ based on the Euclidean distance [19] metric, which is given by:

$$d(x^{(i)}, x^{(j)}) = \| x^{(i)} - x^{(j)} \| \tag{1}$$

where $d(x^{(i)}, x^{(j)})$ is the Euclidean distance between two points $x^{(i)}$ and $x^{(j)}$. The Euclidean norm $\| \cdot \|$ of an $N$-dimensional point $x$ is given by:

$$\| x \| = \sqrt{x_{1}^{2} + x_{2}^{2} + \dots + x_{N}^{2}} \tag{2}$$

where $x_{n}$ is the element of vector $x$ at dimension $n$. Once the $K$ nearest neighbors of $x^{(q)}$ are identified, the algorithm determines the class of $x^{(q)}$ based on the labels of its $K$ nearest neighbors. For classification tasks, KNN selects the class label with the highest frequency among the $K$ nearest neighbors.
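The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the function names `euclidean` and `knn_predict` and the toy data are hypothetical, and ties in the vote are broken arbitrarily.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length points, Equations (1) and (2)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_x, train_y, query, k=3, dist=euclidean):
    """Return the most frequent label among the k training points closest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data (illustrative only): two well-separated clusters in a 2-D feature space.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_y = ["No Block", "No Block", "No Block", "Block", "Block", "Block"]

print(knn_predict(train_x, train_y, (0.05, 0.05), k=3))
print(knn_predict(train_x, train_y, (4.9, 5.0), k=3))
```

Passing the distance as the `dist` argument keeps the voting logic unchanged when a different distance function is substituted.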

Alternatively, a combination of KNN with cosine distance (KNNC) is commonly used. Cosine distance [20] is one minus the cosine of the angle between the two points:

$$d_{C}(x^{(i)}, x^{(j)}) = 1 - \frac{x^{(i)} \cdot x^{(j)}}{\| x^{(i)} \| \, \| x^{(j)} \|} \tag{3}$$

where the dot operator $(\cdot)$ denotes the inner product.
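As a small illustration, the ratio of the inner product to the product of the norms is the cosine similarity of the two points, and a distance is conventionally obtained as one minus that similarity. The sketch below assumes this convention; the function names are hypothetical and both inputs are assumed to be non-zero vectors.

```python
import math

def cosine_similarity(a, b):
    """Inner product of a and b divided by the product of their Euclidean norms."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """One minus the cosine similarity: 0 for parallel vectors, 1 for orthogonal ones."""
    return 1.0 - cosine_similarity(a, b)

print(cosine_distance((1.0, 0.0), (0.0, 1.0)))  # orthogonal vectors
print(cosine_distance((1.0, 2.0), (2.0, 4.0)))  # parallel vectors
```

Because only the angle matters, two points that differ only in scale are treated as identical under this distance.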

This paper proposes a new class of distance based on the third power of the coordinate differences, as given below:

$$d_{(3)}(x^{(i)}, x^{(j)}) = \left| \sum_{n = 1}^{N} \left( x_{n}^{(i)} - x_{n}^{(j)} \right)^{3} \right| \tag{4}$$

The proposed combination of KNN with this third-order distance (KNN3O) is expected to increase the classification model’s accuracy. Written out term by term, the $d_{(3)}$ function is:

$$d_{(3)}(x, y) = \left| (x_{1} - y_{1})^{3} + (x_{2} - y_{2})^{3} + \dots + (x_{N} - y_{N})^{3} \right| \tag{5}$$

where $x$ and $y$ are two $N$-dimensional vectors representing two data points, and $|\cdot|$ denotes the absolute value.

Compared to the Euclidean distance, the $d_{(3)}$ function is more sensitive to differences between values because the third power amplifies the difference between them. This means that a difference in a single dimension can have a large impact on the overall distance between the two points.
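The third-order distance of Equation (4) and its use inside a nearest-neighbor vote can be sketched as follows. This is an illustrative reading of the formula, not the authors' code: the names `third_order_distance` and `knn3o_predict` and the toy data are hypothetical.

```python
from collections import Counter

def third_order_distance(a, b):
    """Absolute value of the summed cubed coordinate differences, Equation (4)."""
    return abs(sum((ai - bi) ** 3 for ai, bi in zip(a, b)))

def knn3o_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points with the smallest third-order distance."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: third_order_distance(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data (illustrative only).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_y = ["No Block", "No Block", "No Block", "Block", "Block", "Block"]

print(third_order_distance((0, 0), (2, 1)))
print(knn3o_predict(train_x, train_y, (4.9, 5.0), k=3))
```

One property worth checking on real data: the cubes are signed and are summed before the absolute value is taken, so coordinate differences of opposite sign can cancel. For example, `third_order_distance((0, 0), (1, -1))` evaluates to 0 even though the two points differ.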

Theorem 1. The $d_{(3)}$ function is more sensitive to differences than the Euclidean distance.

Proof. Let us consider the two-dimensional case, where $x$ and $y$ are two vectors with elements $x_{1}$, $x_{2}$ and $y_{1}$, $y_{2}$, respectively. Then, the Euclidean distance between $x$ and $y$ is given by:

$$d(x, y) = \sqrt{(x_{1} - y_{1})^{2} + (x_{2} - y_{2})^{2}} \tag{6}$$

The $d_{(3)}$ function between $x$ and $y$ is given by:

$$d_{(3)}(x, y) = \left| (x_{1} - y_{1})^{3} + (x_{2} - y_{2})^{3} \right| \tag{7}$$

The partial derivative of $d(x, y)$ with respect to the difference in the first coordinate is given by:

$$\frac{\partial d(x, y)}{\partial (x_{1} - y_{1})} = \frac{x_{1} - y_{1}}{d(x, y)} \tag{8}$$

whereas the counterpart for $d_{(3)}$ is given by:

$$\frac{\partial d_{(3)}(x, y)}{\partial (x_{1} - y_{1})} = 3 (x_{1} - y_{1})^{2} \, \mathrm{sign}\left( (x_{1} - y_{1})^{3} + (x_{2} - y_{2})^{3} \right) \tag{9}$$

where “sign” denotes the sign function, applied to the sum inside the absolute value of Equation (7).

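Because $|x_{1} - y_{1}| \le d(x, y)$, the magnitude of the partial derivative of $d(x, y)$ in Equation (8) never exceeds 1. The magnitude of the partial derivative of $d_{(3)}(x, y)$ with respect to $(x_{1} - y_{1})$ is $3 (x_{1} - y_{1})^{2}$, which grows quadratically with the difference and exceeds 1 once $|x_{1} - y_{1}| > 1/\sqrt{3}$. Hence, apart from small differences, and in particular the region $x_{1} \approx y_{1}$ where both derivatives are close to zero, the partial derivative of $d_{(3)}(x, y)$ is larger than that of $d(x, y)$. This means that the $d_{(3)}(x, y)$ function is more sensitive to differences between the first coordinates than the Euclidean distance. □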

Figure 4 shows the comparison between the two partial derivatives, where the partial derivative of $d_{(3)}(x, y)$ shows more significant values than that of the standard distance. This means that the $d_{(3)}$ function can better distinguish between points that are close together in Euclidean space, which can be beneficial for KNN classification.
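The comparison of the two partial derivatives can be reproduced numerically. The sketch below works directly with the coordinate differences $a = x_{1} - y_{1}$ and $b = x_{2} - y_{2}$ and differentiates Equations (6) and (7) with respect to $a$. The function names and the chosen sample values are hypothetical, and the second difference is fixed at $b = 1$ purely for illustration.

```python
import math

def d_euclid(a, b):
    """Euclidean distance expressed in the coordinate differences a and b, Equation (6)."""
    return math.sqrt(a * a + b * b)

def d3(a, b):
    """Third-order distance expressed in the coordinate differences a and b, Equation (7)."""
    return abs(a ** 3 + b ** 3)

def partial_euclid(a, b):
    """Derivative of the Euclidean distance with respect to a, Equation (8)."""
    return a / d_euclid(a, b)

def partial_d3(a, b):
    """Derivative of |a^3 + b^3| with respect to a (valid where a^3 + b^3 != 0)."""
    return 3 * a * a * math.copysign(1.0, a ** 3 + b ** 3)

for a in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"a={a:4.2f}  Euclidean: {partial_euclid(a, 1.0):.4f}  third-order: {partial_d3(a, 1.0):.4f}")
```

The Euclidean derivative stays within the interval from -1 to 1, while the third-order derivative grows with the square of the difference. For small differences (such as a = 0.25 above) the Euclidean derivative is the larger of the two, so the gain in sensitivity applies to larger coordinate differences.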

2.3. Multi-Layer Perceptron for Classification

Multi-layer perceptron (MLP) [21] is a feedforward neural network that consists of an input layer, single or multiple hidden layers, and an output layer. The input layer takes the input data, and each neuron in the input layer represents one feature of the input. The hidden layers perform non-linear transformations on the input, and the output layer produces the final output.

Activation functions introduce non-linearity in the neural network. In this study, the hidden layers used the rectified linear unit (ReLU) activation function, which is defined as:

$$f(x) = \max(0, x) \tag{10}$$

The output layer of MLPs used for classification problems usually employs the softmax activation function, which converts the outputs of the neurons in the output layer into probabilities. The softmax function is defined as:

$$\mathrm{softmax}(z_{j}) = \frac{e^{z_{j}}}{\sum_{k = 1}^{K} e^{z_{k}}} \tag{11}$$

where $z_{j}$ is the input to the $j$th neuron in the output layer and $K$ is the number of classes.

The cost function measures the difference between the predicted output and the true value. In this study, cross-entropy is used as the cost function:

$$C = - \frac{1}{N} \sum_{n = 1}^{N} \sum_{k = 1}^{K} y_{(n, k)} \log \hat{y}_{(n, k)} \tag{12}$$

where $N$ is the total number of samples, $y_{(n, k)}$ is the one-hot encoding of the true label of the $n$th sample for class $k$, and $\hat{y}_{(n, k)}$ is the predicted probability of class $k$ for the $n$th sample.

This paper uses the limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS)-based backpropagation algorithm to train the network. The error is propagated back through the network, and the weights are updated to minimize the cost function. For a single sample, the derivative of the cost function with respect to the input $z_{j}$ of the output layer is:

$$\frac{\partial C}{\partial z_{j}} = \hat{y}_{j} - y_{j} \tag{13}$$

where $y_{j}$ is the true label for the $j$th neuron in the output layer, and $\hat{y}_{j}$ is the predicted probability for the $j$th neuron in the output layer.
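The relationship between the softmax output, the cross-entropy cost, and the gradient in Equation (13) can be checked with a short sketch. The cost is written here with the conventional leading minus sign, which is the sign under which Equation (13) holds. The function names are hypothetical, and the max-subtraction in `softmax` is a standard numerical-stability step that does not change the result.

```python
import math

def softmax(z):
    """Softmax of a list of output-layer inputs, Equation (11)."""
    m = max(z)  # subtracting the maximum avoids overflow and leaves the ratios unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_onehot, y_hat):
    """Cross-entropy of one sample: negative log-probability of the true class."""
    return -sum(y * math.log(p) for y, p in zip(y_onehot, y_hat) if y > 0)

def grad_wrt_inputs(y_onehot, z):
    """Gradient of the single-sample cost with respect to z, Equation (13)."""
    return [p - y for p, y in zip(softmax(z), y_onehot)]

z = [1.0, 2.0, 0.5, -1.0]   # illustrative output-layer inputs for four classes
y = [0, 1, 0, 0]            # one-hot true label
print(softmax(z))
print(cross_entropy(y, softmax(z)))
print(grad_wrt_inputs(y, z))
```

The gradient is negative only for the true class, so a descent step raises the input of that class and lowers the others.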

2.4. Support Vector Machine

This paper uses a support vector machine (SVM) [22] with error-correcting output codes (ECOC) for multiclass classification. Given a training set of $n$ samples, each with $m$ features and $k$ possible classes, the goal of multiclass classification is to learn a function $f(x)$ that maps each input sample $x$ to one of the $k$ possible classes.

To convert the $k$-class problem into a set of binary classification problems, a coding matrix $C$ is first created, which is a $k \times t$ binary matrix. Each column of $C$ corresponds to a binary classifier, and each row corresponds to one of the $k$ classes. The entries in each column indicate which classes are included in the positive class for that binary classifier. The number of columns, $t$, is determined by the desired trade-off between the number of classifiers and accuracy.

The $t$ binary classifiers are trained independently, one for each column of $C$. The binary classifier for column $i$ is trained to distinguish between the positive class defined by column $i$ of $C$ and all the other classes. The training data for each binary classifier consist of the original data set, but with the labels modified according to column $i$ of $C$.

To train each binary classifier, the following SVM optimization problem must be solved:

$$\min_{w, b} \; \frac{1}{2} \| w \|^{2} + D \sum_{i = 1}^{N} \max\left(0, 1 - y_{i} (w^{T} x_{i} + b)\right) \quad \text{subject to} \quad y_{i} (w^{T} x_{i} + b) \geq 1 \;\; \text{for all } i = 1, 2, \dots, N, \;\; y_{i} \in \{-1, 1\} \tag{14}$$

where $w$ is the weight vector for the binary classifier, $b$ is the bias term, $x_{i}$ is the $i$th sample, $y_{i}$ is the corresponding binary classifier label for the $i$th sample, and $D$ is a regularization parameter. The objective is to obtain the optimal hyperplane that maximizes the separation between the two classes. The hinge loss term imposes a penalty whenever a sample is misclassified or falls inside the margin, which discourages classification errors.

The interior-point method can be used to solve this optimization problem by first converting it into a form that can be solved using an unconstrained optimization algorithm. This is achieved by introducing a logarithmic barrier function that penalizes solutions that violate the constraints of the problem. The barrier function is defined as:

$$\varphi(x) = - \sum_{i} \log\left( y_{i} (w^{T} x_{i} + b) - 1 \right) - \sum \log(-x) \tag{15}$$

where the second term ensures that x is positive, to prevent the logarithm from being undefined.

The interior-point method then minimizes the objective function subject to the barrier function by solving the following problem:

$$\min_{w, b} \; \frac{1}{2} \| w \|^{2} + D \sum_{i = 1}^{N} \max\left(0, 1 - y_{i} (w^{T} x_{i} + b)\right) + \mu \, \varphi(x) \tag{16}$$

where $D$ is the regularization parameter of Equation (14) and $\mu$ is a positive parameter that controls the trade-off between the objective function and the barrier function.

The interior-point method solves this problem iteratively by first choosing an initial feasible point, and then solving a sequence of barrier problems with decreasing values of μ, so that the influence of the barrier term in Equation (16) gradually vanishes. At each iteration, the interior-point method computes the gradient and Hessian of the objective function subject to the barrier function, and then updates the decision variables (w, b) using a Newton-like method. The interior-point method also uses a step-size parameter to control the rate at which the decision variables move towards the boundary of the feasible region.

The interior-point method continues iterating until the solution converges to a point that satisfies the constraints within a predefined tolerance. Once a solution is found, the decision variables (w, b) can be used to define the hyperplane that divides the data points into two classes with a maximum margin. The support vectors are the data points that lie on the margin or violate the margin constraint, and their corresponding Lagrange multipliers can be used to calculate their importance in defining the hyperplane.

To classify a new input sample $x$, each binary classifier produces a score indicating the confidence that $x$ belongs to the positive class for that classifier. The scores for all $t$ classifiers are concatenated to form a vector $s$ of length $t$. A decoding matrix $\Gamma$ is used to map the vector $s$ back to a $k$-dimensional output vector $y$. Each row of $\Gamma$ corresponds to a class label, and the entries in each row indicate which binary classifiers vote for that class. The output class for $x$ is the class corresponding to the row of $\Gamma$ with the highest score. By combining binary classifiers with a coding matrix and a decoding matrix, SVMs can be extended to handle problems with multiple classes while still maintaining high accuracy. The error-correcting technique improves the robustness of the classifier by correcting errors that may have occurred due to misclassification by the binary classifiers.
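The encoding and decoding steps can be illustrated with a short sketch. The paper does not state which coding design was used, so the one-vs-all code below (one column per class, +1 for the positive class and -1 otherwise) and the decoding rule (pick the class whose codeword agrees best with the score vector) are assumptions. All names are hypothetical, and the binary classifier scores are hard-coded instead of coming from trained SVMs.

```python
CLASSES = ["No Block", "NB-Wait", "NB-No Block", "Block"]

def one_vs_all_coding(k):
    """k x k coding matrix: row = class, column = binary classifier (assumed design)."""
    return [[1 if row == col else -1 for col in range(k)] for row in range(k)]

def relabel(class_indices, coding, column):
    """Binary (+1/-1) training labels for the classifier of the given column."""
    return [coding[c][column] for c in class_indices]

def decode(scores, coding):
    """Index of the class whose codeword agrees best with the classifier scores."""
    agreement = [sum(c * s for c, s in zip(row, scores)) for row in coding]
    return max(range(len(coding)), key=lambda r: agreement[r])

coding = one_vs_all_coding(len(CLASSES))

# Labels seen by the binary classifier of column 3 ("Block" versus the rest).
print(relabel([0, 3, 1], coding, column=3))

# Hypothetical scores in which classifier 0 is mildly wrong (positive score).
scores = [0.3, -0.2, -1.1, 0.9]
print(CLASSES[decode(scores, coding)])
```

In the second example the decoded class is still determined by the overall agreement with the codewords, which is the sense in which a single erroneous binary score can be tolerated.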

2.5. Naive Bayes Classifier

The naive Bayes classifier (NBC) [23] for multiclass classification is an extension of the binary classifier. Let $X$ be a $d$-dimensional feature vector and $y$ be a class label taking one of $K$ possible values. The goal is to predict the class label $y$ given the feature vector $X$. The NBC assumes that the features $X = \{X_{1}, X_{2}, \dots, X_{d}\}$ are conditionally independent given the class label $y$.

Mathematically, the NBC computes the posterior probability of the class label $y$ given the feature vector $X$ as:

$$P(y \mid X) = \frac{P(X \mid y) \, P(y)}{P(X)} \tag{17}$$

where $P(X \mid y)$ is the likelihood of observing the feature vector $X$ given the class label $y$, $P(y)$ denotes the prior probability of the class label $y$, and $P(X)$ is the marginal likelihood of the feature vector $X$.

The NBC estimates the likelihood and prior probabilities from the training data. For a given class label $y$, the likelihood is modeled as a multivariate Gaussian distribution:

$$P(X \mid y) = (2\pi)^{-\frac{d}{2}} \det(\Sigma_{y})^{-\frac{1}{2}} \exp\left( -\frac{1}{2} (X - \mu_{y})' \, \Sigma_{y}^{-1} (X - \mu_{y}) \right) \tag{18}$$

where $\mu_{y}$ is the mean vector and $\Sigma_{y}$ is the covariance matrix of the training samples with class label $y$.

The prior probability of the class label $y$ is estimated as the relative frequency of the class label in the training data:

$$P(y) = \frac{N_{y}}{N} \tag{19}$$

where $N_{y}$ is the number of training samples with class label $y$ and $N$ is the total number of training samples.

To classify a new feature vector $X$, the NBC computes the posterior probability for each class label $y$ and assigns the label with the highest probability:

$$\hat{y} = \arg\max_{y} P(y \mid X) = \arg\max_{y} P(X \mid y) \, P(y) \tag{20}$$

where $\hat{y}$ is the predicted class label for $X$. The second equality holds because $P(X)$ does not depend on $y$.
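A compact sketch of the estimation and prediction steps is given below. It assumes that, under the conditional independence assumption, the class covariance matrix is diagonal, so the Gaussian likelihood factorizes into one univariate Gaussian per feature. The function names, the variance floor, and the toy data are hypothetical. Log-probabilities are used to avoid numerical underflow.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate the prior, per-feature means, and per-feature variances of each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The variance floor (an assumption) avoids division by zero for constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(columns, means)]
        model[y] = (n / len(samples), means, variances)  # prior as in Equation (19)
    return model

def predict_gaussian_nb(model, x):
    """Label maximizing log P(X|y) + log P(y), the logarithm of Equation (20)."""
    def log_posterior(params):
        prior, means, variances = params
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                      for xi, m, var in zip(x, means, variances))
        return math.log(prior) + log_lik
    return max(model, key=lambda y: log_posterior(model[y]))

# Toy data (illustrative only).
samples = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (4.0, 5.0), (4.2, 4.8), (3.8, 5.2)]
labels = ["No Block"] * 3 + ["Block"] * 3
model = fit_gaussian_nb(samples, labels)
print(predict_gaussian_nb(model, (1.1, 2.1)))
print(predict_gaussian_nb(model, (4.1, 5.1)))
```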

2.6. Decision Tree

This paper also uses a decision tree (DT) [24] model to perform the classification. The DT is trained on training data $X$ and class labels $Y$. The training data is a matrix of size $N$-by-$P$, where $N$ is the number of observations and $P$ is the number of predictor variables, and $Y$ is a vector of true class labels of size $N$-by-1.

The tree is constructed by recursively partitioning the data into subsets based on the predictor variables, using a splitting criterion based on the Gini index. The DT is grown until the stopping criteria, such as a minimum number of observations per leaf or a maximum tree depth, are met.

The Gini index is a measure of impurity used in decision tree learning. It is based on the concept of Gini impurity, which is a measure of the probability of misclassification.

The Gini index for a binary split is defined as:

$$G = p_{0} (1 - p_{0}) + p_{1} (1 - p_{1}) \tag{21}$$

where $p_{0}$ is the proportion of samples in the left child node that belong to the first class, and $p_{1}$ is the proportion of samples in the left child node that belong to the second class.

The Gini index for a multiclass split is defined as:

$$G = 1 - \sum_{i = 1}^{K} p_{i}^{2} \tag{22}$$

where $p_{i}$ is the proportion of samples in the $i$th class in the left child node.

The best split is the one that minimizes the impurity in the resulting child nodes. The decision tree algorithm evaluates all possible splits and selects the one that results in the greatest reduction in the Gini index. This process is repeated recursively until a stopping criterion is met, such as reaching a maximum tree depth or a minimum number of samples in a leaf node.

The output of DT is a trained model which is a binary tree structure consisting of decision nodes and leaf nodes. Each decision node specifies a test on one of the predictor variables, and each leaf node assigns a class label to the observations that reach it based on the majority class of the training samples in that node. Once the DT model is trained, it can be used to predict the class labels of new observations.
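The split selection step can be sketched for a single numeric predictor. The code below computes the Gini index of Equation (22) for a set of labels and scans candidate thresholds for the one with the lowest size-weighted impurity of the two child nodes. Weighting each child by its share of the samples is a common convention and is assumed here; the function names and the toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini index 1 - sum_i p_i^2 of a list of class labels, Equation (22)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    n = len(values)
    best = (None, gini(labels))  # no split: impurity of the parent node
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (t, weighted)
    return best

values = [1, 2, 3, 10, 11, 12]                 # one predictor (illustrative)
labels = ["NB", "NB", "NB", "Block", "Block", "Block"]
print(gini(labels))
print(best_threshold(values, labels))
```

A full tree repeats this search over all predictors in each node and recurses into the two children until a stopping criterion is met.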

2.7. Discriminant Analysis Classifier

The discriminant analysis classifier (DAC) model [25] in its linear form is a classification method that assumes that the predictors have a multivariate normal distribution and that the class covariances are equal. The quadratic form is a similar method but does not assume equal class covariances.

Mathematically, given a set of input data $X$ of size $N \times p$, where $N$ is the number of observations and $p$ is the number of predictors, and a corresponding response variable $Y$ of size $N \times 1$, where $Y$ contains the categorical labels for each observation, DAC finds the linear or quadratic discriminant function that best classifies the observations into $K$ classes.

For DAC, the discriminant function for each class $k$ is defined as:

$$g_{k}(x) = -\frac{1}{2} (x - \mu_{k})' \, \Sigma_{k}^{-1} (x - \mu_{k}) - \frac{1}{2} \log(\det(\Sigma_{k})) + \log(P_{k}) \tag{23}$$

where $\mu_{k}$ is the mean vector for class $k$, $\Sigma_{k}$ is the covariance matrix for class $k$, $P_{k}$ is the prior probability of class $k$, and $\det(\Sigma_{k})$ is the determinant of the covariance matrix for class $k$.
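As a worked illustration, the discriminant function of Equation (23) can be evaluated in closed form for two predictors. The sketch below is restricted to the 2-D case so that the matrix inverse and determinant can be written out explicitly. The function names and the class parameters are hypothetical, and the covariance matrices are assumed to be symmetric and invertible.

```python
import math

def quadratic_discriminant(x, mean, cov, prior):
    """Equation (23) for a 2-D point, with cov given as [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # (x - mu)' Sigma^{-1} (x - mu), using the closed-form inverse of a 2 x 2 matrix.
    maha = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return -0.5 * maha - 0.5 * math.log(det) + math.log(prior)

def classify(x, class_params):
    """class_params maps label -> (mean, cov, prior); return the label with the largest g_k(x)."""
    return max(class_params, key=lambda k: quadratic_discriminant(x, *class_params[k]))

# Hypothetical class parameters (illustrative only).
class_params = {
    "No Block": ((0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]], 0.5),
    "Block": ((4.0, 4.0), [[2.0, 0.5], [0.5, 1.0]], 0.5),
}
print(classify((3.5, 4.2), class_params))
print(classify((0.3, -0.1), class_params))
```

Because each class has its own covariance matrix, the decision boundary between two classes is in general a quadratic curve, not a straight line.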

This paper uses the pseudo-quadratic variant, which estimates the coefficients of the quadratic discriminant function by maximizing the likelihood function of the data, given the parameters of the model. The likelihood function is a measure of how well the model fits the data, and the maximum likelihood estimates of the model parameters are those that maximize the likelihood function.

The use of pseudo quadratic can improve the performance of the discriminant analysis model when the assumptions of separate covariance matrices are violated. This method adjusts the covariance matrix of the predictor variables to have a more quadratic form, which can better capture the non-linear relationships among the predictor variables and improve the accuracy of the classification model.

2.8. Experiment Setup

This section provides details on the dataset and the experimental configuration. To evaluate the performance of the classifiers, a dataset provided by [9] was used, which was generated with a modified NCTUns network simulator. The simulation topology in Figure 5 included a single legitimate sender (1), a single receiver (14), a single attacker (13), eight core switches (3–10), two ingress edge routers (2 and 11), and a single egress edge router (12). Only one attacker and one legitimate ingress node were used in the experiments, as the focus was on testing the classifier against BHP flooding attacks. In the original simulation, the eight core switches serve to simulate the complexities of an OBS network, and the attacker node can be attached to any core switch. Here, however, the attacker node was placed near the receiver to highlight the impact of its flooding, as opposed to core network congestion effects, and to increase the probability of detection. Ten trace files with increasing User Datagram Protocol (UDP) traffic load rates were created for the legitimate sender’s traffic, starting at 0.1 Gbps and increasing in steps of 0.1 Gbps up to 1 Gbps. For each legitimate traffic load rate, the network was tested with three different attack schemes, namely lightweight, medium, and powerful, corresponding to attack traffic load rates of 0.2 Gbps, 0.5 Gbps, and 1 Gbps, respectively. The simulation parameters are summarized in Table 1. All machine learning methods are implemented using Matlab 2023a.

The simulation produces 1075 samples, with each sample having the attributes given in Table 2.

The original dataset contains 21 input attributes. However, in our experiments, the 20th attribute, which contains nodes’ initial classification labels, is removed from the model inputs to allow the model to learn only from numerical input. The 21st attribute, representing the percentage of flood per node, is also removed to increase the detection difficulty. In addition to that, the target class label is one of four classes of nodes, namely No Block (NB), NB-Wait, NB-No Block, and Block.

All the methods in this study were evaluated with 5-fold cross-validation (k = 5), so that in each fold 80% of the data was used for training and the remaining 20% was used for testing. This approach allowed the performance of each method to be evaluated on multiple held-out subsets, which helped to ensure the reliability of the results. By dividing the data into training and testing sets, the models were able to learn from the training data and have their generalization measured on the testing data. This also allowed for the identification of any overfitting or underfitting issues that could affect the accuracy of the models.
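The 80%/20% partitioning implied by five folds can be sketched as follows. The shuffling seed, the function name, and the way folds are formed (every fifth index of a shuffled order) are assumptions made for illustration; the paper does not describe how its folds were drawn.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# With the 1075 samples of the dataset, each fold holds 215 test and 860 training samples.
for fold, (train_idx, test_idx) in enumerate(k_fold_indices(1075, k=5), start=1):
    print(fold, len(train_idx), len(test_idx))
```

Each metric is then computed once per fold and averaged over the five folds.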

This paper uses accuracy, precision, recall, F1, and specificity, which are commonly used metrics to evaluate the performance of classification models. Accuracy represents the ratio of correctly classified instances among all samples. Mathematically, accuracy is defined as:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{24}$$

where true positive (TP) and true negative (TN) represent the number of samples correctly classified as positive and negative, respectively. On the contrary, false positive (FP) and false negative (FN) are the number of instances incorrectly classified as positive and negative, respectively.

Precision is the ratio of correctly classified positive instances among all samples classified as positive. Mathematically, precision is defined as:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{25}$$

Recall or TP rate is the proportion of correctly classified positive instances among all actual positive samples. Mathematically, recall is defined as:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{26}$$

F1 score is the harmonic mean of recall and precision, which provides a numerical value to balance between recall and precision. Mathematically, F1 score is defined as:

$$\mathrm{F1\ score} = 2 \times \frac{\mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \tag{27}$$

Specificity is a statistical measure that describes how well a binary classifier can identify true negative cases, or the ratio of actual negatives that are correctly identified by the classifier. It is calculated as the proportion of TN predictions over the sum of TN and FP predictions, expressed as:

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{28}$$

In other words, specificity tells us how good the classifier is at avoiding false positives or how often it correctly identifies cases that are negative.
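For a multiclass problem, the five metrics can be computed per class by treating that class as positive and all others as negative. The sketch below does this from a confusion matrix and then takes the unweighted mean over classes (macro averaging). How the per-class values are aggregated in this paper is not stated, so the macro average, the function names, and the example matrix are assumptions.

```python
def per_class_metrics(confusion, cls):
    """Metrics for one class as positive; confusion[i][j] counts true class i predicted as j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[r][cls] for r in range(n)) - tp
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

def macro_average(confusion):
    """Unweighted mean of each per-class metric (assumed aggregation)."""
    per_class = [per_class_metrics(confusion, c) for c in range(len(confusion))]
    return {name: sum(m[name] for m in per_class) / len(per_class) for name in per_class[0]}

confusion = [[8, 2], [1, 9]]  # hypothetical two-class example
print(per_class_metrics(confusion, 0))
print(macro_average(confusion))
```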

3. Results and Discussion

This section presents the simulation results for all methods. Table 3 summarizes the experiments, benchmarking every method on accuracy, precision, recall, F1 score, and specificity for each trial (k = 1…5), together with the mean over the five trials. A sample confusion matrix for each method is also provided to illustrate its performance.

The analysis of Table 3 compares the methods on five metrics: accuracy, precision, recall, F1 score, and specificity. It focuses on the highest and lowest scores for each metric, as well as the average performance and the variation among the methods.

KNN3O achieved the highest mean accuracy, 0.993, meaning that it correctly classified 99.3% of the test data on average. DAC’s mean accuracy of 0.54024 is considerably lower: it correctly classified only about 54% of the test data. This reveals a significant disparity in performance between the two methods. The precision scores further favor KNN3O, whose mean precision of 0.993 indicates that almost all samples it labels as positive are actually positive. In contrast, DAC, MLP, and NBC have mean precision scores below 0.8, which shows that they are more prone to false positives.

Similarly, KNN3O has the highest mean recall, 0.993, indicating its sensitivity in detecting positive examples. DAC has the lowest recall, 0.66826. The F1 score, which balances precision and recall, is also highest for KNN3O at 0.993, whereas DAC’s F1 score of 0.588076 reflects both low precision and low recall. Regarding specificity, KNN3O has the highest score, 0.99666, indicating its reliability in correctly classifying negative examples. DAC’s specificity of 0.843448, the lowest among the methods, indicates that it is prone to false positives.

Based on the analysis of all the metrics, KNN3O emerges as the best method for this problem, exhibiting the highest scores in all five metrics. It is closely followed by KNN with the standard Euclidean distance and KNNC. The proposed KNN3O achieved 100% accuracy in four trials and 96.5% in a single trial, resulting in an average accuracy of 99.3%. In comparison, KNN and KNNC achieved 100% accuracy in only three trials, with average accuracies of 99% and 98.7%, respectively. This indicates that the proposed distance function effectively enhances the performance of the standard KNN. DAC is identified as the worst-performing method. Other methods like NBC, MLP, SVM, and DT fall somewhere in between.
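The mean accuracies quoted above follow directly from the per-trial accuracies listed in Table 3. The short check below reproduces them; only the per-trial values are taken from Table 3, while the variable names are illustrative.

```python
from statistics import mean

# Per-trial accuracies (k = 1...5) as listed in Table 3.
fold_accuracy = {
    "KNN3O": [1, 1, 0.965, 1, 1],
    "KNN": [0.98545, 1, 0.965, 1, 1],
    "KNNC": [0.98545, 1, 0.95, 1, 1],
}

# Average over the five trials for each method.
mean_accuracy = {name: mean(values) for name, values in fold_accuracy.items()}
for name, value in mean_accuracy.items():
    print(f"{name}: {value:.5f}")
```

The three methods differ in only one or two trials, which is why their mean accuracies lie within about 0.6 percentage points of each other.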

Detailed confusion matrices for all methods in the first trial (k = 1) are shown in Figure 6. In this trial, KNN with the proposed distance (KNN3O) detects the attacks and classifies the other types of traffic without error.
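For a confusion matrix with more than two classes, the counts in Equations (24)–(28) can be obtained by treating each class in turn as "positive" (one-vs-rest) and pooling the counts. The sketch below shows one such scheme, micro-averaging. The four-class matrix is hypothetical and is not taken from Figure 6, and the choice of micro-averaging is an assumption rather than something the text states, although a four-class matrix with 7 errors in 200 samples does reproduce the 0.965 accuracy and 0.98833 specificity pair listed for KNN3O in Table 3.

```python
def micro_averaged_metrics(cm):
    """Pool one-vs-rest counts over all classes.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    tp = fp = fn = tn = 0
    for c in range(n_classes):
        tp_c = cm[c][c]                                       # class c predicted as c
        fn_c = sum(cm[c]) - tp_c                              # class c predicted as another class
        fp_c = sum(cm[r][c] for r in range(n_classes)) - tp_c  # other classes predicted as c
        tn_c = total - tp_c - fn_c - fp_c                     # everything not involving class c
        tp, fp, fn, tn = tp + tp_c, fp + fp_c, fn + fn_c, tn + tn_c
    return {
        "accuracy": tp / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical 4-class confusion matrix: 200 samples, 7 misclassified.
cm = [
    [50, 0, 0, 0],
    [0, 47, 3, 0],
    [0, 4, 46, 0],
    [0, 0, 0, 50],
]
print(micro_averaged_metrics(cm))
```

Under micro-averaging, precision and recall both equal accuracy, while specificity is higher because every misclassified sample counts as a false positive for only one of the classes.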

4. Conclusions

In conclusion, this paper has presented a comparative study of different machine learning methods for flooding attack detection in OBS networks. The results show that KNN3O is the best method for this problem, as it outperforms the other methods in all five metrics. The proposed KNN3O method uses a third-order distance function to measure the similarity between bursts and to classify them as normal or abnormal. This method can detect flooding attacks with high accuracy, precision, recall, F1 score, and specificity.

Author Contributions

Conceptualization, H.H.N. and A.A.A.; methodology, H.H.N.; software, S.A.M.; validation, A.A.A.; formal analysis, S.; investigation, H.H.N.; resources, S.A.M.; data curation, S.A.M.; writing—original draft preparation, H.H.N.; writing—review and editing, H.H.N. and A.A.A.; visualization, H.H.N.; supervision, S.; project administration, S.; funding acquisition, S. All authors have read and agreed to the published version of the manuscript.

Funding

This research and the APC were funded by PPM of Telkom University grant number 199/LIT06/PPM-LIT/2024.

Data Availability Statement

The experiment was developed in MATLAB, and the code is available at https://github.com/hilalnuha/KNN3O (accessed on 17 July 2023).

Acknowledgments

This paper was supported by PPM of Telkom University under the research grant entitled “Perlindungan Terhadap Virtual Jamming Menggunakan Machine Learning” No. 199/LIT06/PPM-LIT/2024. The authors would like to acknowledge the assistance of ChatGPT, an AI language model developed by OpenAI, for its support in paraphrasing and proofreading the content of this paper. The model helped enhance the quality and clarity of the writing.

Conflicts of Interest

The authors declare no conflict of interest.

References

Rajaduray, R. Unbuffered and Limited-Buffer All-Optical Networks. Ph.D. Thesis, University of California, Santa Barbara, CA, USA, 2005. Available online: https://www.proquest.com/openview/5ae4994dfe72f2d3e57cb27dbb9d5866/1?pq-origsite=gscholar&cbl=18750&diss=y (accessed on 28 February 2023).
Qiao, C.; Yoo, M. Optical burst switching (OBS)—A new paradigm for an Optical Internet. J. High Speed Netw. 1999, 8, 69–84.
Ujalambkar, D.; Chowdhary, G. Allocation of channels over optical burst switching (OBS) networks in smart cities using integrated statistical techniques. Int. J. Syst. Assur. Eng. Manag. 2022, 13, 385–396.
Esteves, G.; Fidalgo, F.; Cruz, N.; Simão, J. Long-Range Wide Area Network Intrusion Detection at the Edge. IoT 2024, 5, 871–900.
AlSharman, S.A.; Al-Khaleel, O.; Al-Ayyoub, M. A Detailed Inspection of Machine Learning Based Intrusion Detection Systems for Software Defined Networks. IoT 2024, 5, 756–784.
Bertaina, A. Ensuring Data Remains Cybersecure with Optical Fibers. Available online: https://www.cablinginstall.com/cable/article/14210318/ensuring-data-remains-cybersecure-with-optical-fibers (accessed on 31 October 2024).
Zahid Hasan, M.; Zubair Hasan, K.M.; Sattar, A. Burst header packet flood detection in optical burst switching network using deep learning model. Procedia Comput. Sci. 2018, 143, 970–977.
Chawathe, S.S. Analysis of burst header packets in optical burst switching networks. In Proceedings of the 2018 IEEE 17th International Symposium on Network Computing and Applications (NCA), Cambridge, MA, USA, 1–3 November 2018; IEEE: Piscataway, NJ, USA; pp. 1–5.
Liu, S.; Liao, X.; Shi, H. A pso-svm for burst header packet flooding attacks detection in optical burst switching networks. Photonics 2021, 8, 555.
Almaslukh, B. An Efficient and Effective Approach for Flooding Attack Detection in Optical Burst Switching Networks. Secur. Commun. Netw. 2020, 2020, 8840058.
Seddik, M.T.; Kadri, O.; Bouarouguene, C.; Brahimi, H. Detection of flooding attack on obs network using ant colony optimization and machine learning. Comput. Sist. 2021, 25, 423–433.
Panda, M.; Gandhi, N.; Abraham, A. Decision forest classifier with flower search optimization algorithm for efficient detection of bhp flooding attacks in optical burst switching network. In Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2021.
Sepvira, A.F.; Suryani, V.; Wardana, A.A. Benchmarking Machine Learning Algorithm for Routing Attack Detection in Wireless Sensor Network. In Proceedings of the 2023 International Conference on Data Science and Its Applications (ICoDSA 2023), Bandung, Indonesia, 9–10 August 2023.
Ridho, M.R.; Nuha, H.H. Application of Extreme Learning Machine (ELM) Classification in Detecting Phishing Sites. In Proceedings of the 2022 5th International Conference of Computer and Informatics Engineering (IC2IE), Jakarta, Indonesia, 13–14 September 2022; IEEE: Piscataway, NJ, USA; pp. 60–64.
Putrada, A.G.; Alamsyah, N.; Pane, S.F.; Fauzan, M.N.; Perdana, D. AUC Maximization for Flood Attack Detection on MQTT with Imbalanced Dataset. In Proceedings of the 2023 International Conference on Information Technology Research and Innovation (ICITRI 2023), Jakarta, Indonesia, 16 August 2023.
Rajab, A.; Huang, C.T.; Al-Shargabi, M.; Cobb, J. Countering burst header packet flooding attack in optical burst switching network. In Information Security Practice and Experience, Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Zhangjiajie, China, 16–18 November 2016; Springer: Cham, Switzerland, 2016.
Rajab, A.; Huang, C.T.; Al-Shargabi, M. Decision tree rule learning approach to counter burst header packet flooding attack in Optical Burst Switching network. Opt. Switch. Netw. 2018, 29, 15–26.
Zhang, S. Challenges in KNN Classification. IEEE Trans. Knowl. Data Eng. 2021, 34, 4663–4675.
Setiawan, A.; Hadiyanto, H.; Widodo, C.E. Distance Estimation Between Camera and Shrimp Underwater Using Euclidian Distance and Triangles Similarity Algorithm. Ing. Syst. D’inf. 2022, 27, 717–724.
Pan, C.; Huang, J.; Hao, J.; Gong, J. Towards zero-shot learning generalization via a cosine distance loss. Neurocomputing 2020, 381, 167–176.
Al Bataineh, A.; Manacek, S. MLP-PSO Hybrid Algorithm for Heart Disease Prediction. J. Pers. Med. 2022, 12, 1208.
Han, X.; Peng, J. Bird sound classification based on ECOC-SVM. Appl. Acoust. 2023, 204, 109245.
Ressan, M.B.; Hassan, R.F. Naïve-Bayes family for sentiment analysis during COVID-19 pandemic and classification tweets. Indones. J. Electr. Eng. Comput. Sci. 2022, 28, 375–383.
Lee, C.S.; Cheang, P.Y.S.; Moslehpour, M. Predictive Analytics in Business Analytics: Decision Tree. Adv. Decis. Sci. 2022, 26, 1–29.
Li, X.; Wang, Q.; Nie, F.; Chen, M. Locality Adaptive Discriminant Analysis Framework. IEEE Trans. Cybern. 2022, 52, 7291–7302.

Figure 1. Typical OBS network architecture: (a) an ingress node assembles the packets; (b) a core switch BHP with a BHP O-E-O converter allocates resources for the incoming OBS network data burst.

Figure 2. BHP flooding attack scenario on OBS networks core switches.

Figure 3. The model used for the classification process.

Figure 4. Comparison between the partial derivative of $d_{(3)}(x, y)$ and the partial derivative of $d(x, y)$ with respect to the same difference.

Figure 5. OBS network topology used in the simulation.

Figure 6. Confusion matrix of each method at k = 1.

Table 1. Parameters used in the simulation.

Parameter	Value
Link bandwidth	1 Mbps
Sender UDP traffic load rate	0.1 Gbps, 0.2 Gbps, …, 1 Gbps
Attacker traffic load rate	0.2 Gbps, 0.5 Gbps, and 1 Gbps
Propagation delay	1 μs
Maximum burst length	16,000 bytes
MLP activation type	ReLU
MLP output layer type	SoftMax
MLP hidden layer size	10 units
Standardize	1

Table 2. Attributes of the data [9].

Attributes	Name	Description
A1	Node	The identifier of the node that sends data.
A2	Utilized Bandwidth Rate	The normalized rate of reserved usable bandwidth.
A3	Packet Drop Rate	The proportion of lost packets and the sent data at each node.
A4	Full Bandwidth	The initial amount of allocated bandwidth reserved by the user to each node.
A5	Average Delay Time Per Sec	The mean latency experienced by each node per second.
A6	Percentage Of Lost Packet Rate	The ratio of lost packets at each node in percentage.
A7	Percentage Of Lost Byte Rate	The ratio of lost bytes at each node.
A8	Packet Received Rate	The amount of data packets arriving at each node per second on the allocated bandwidth.
A9	Used Bandwidth	The bandwidth size that each node utilizes within the reserved bandwidth (A4).
A10	Lost Bandwidth	The bandwidth size that each node loses within the reserved bandwidth (A4).
A11	Packet Size Byte	The allocated size of each data packet for each node.
A12	Packet Transmitted	The amount of data packets sent by each node per second within the reserved bandwidth (A4).
A13	Packet Received	The quantity of packets arriving at each node per second within the reserved bandwidth (A4).
A14	Packet Lost	The quantity of packets lost per node per second within the lost bandwidth (A10).
A15	Transmitted Byte	The quantity of bytes transmitted per node per second.
A16	Received Byte	The quantity of bytes arriving per second at each node within the allocated bandwidth.
A17	10-Run-AVG-Drop-Rate	The mean packet loss rate (A3) measured from 10 simulation trials.
A18	10-Run-AVG-Bandwidth-Use	The mean of consumed bandwidth amount (A9) measured from 10 simulation trials.
A19	10-Run-Delay	The mean of latency measured from 10 simulation trials.

Table 3. Comparative results summary.

Method	Accuracy (k = 1…5)	Accuracy (mean)	Precision (k = 1…5)	Precision (mean)	Recall (k = 1…5)	Recall (mean)	F1 Score (k = 1…5)	F1 Score (mean)	Specificity (k = 1…5)	Specificity (mean)
KNN3O	1	0.993	1	0.993	1	0.993	1	0.993	1	0.99666
	1		1		1		1		1
	0.965		0.965		0.965		0.965		0.98833
	1		1		1		1		1
	1		1		1		1		1
KNNC	0.98545	0.98709	0.98545	0.98709	0.98545	0.98709	0.98545	0.98709	0.99515	0.993756
	1		1		1		1		1
	0.95		0.95		0.95		0.95		0.98333
	1		1		1		1		1
	1		1		1		1		1
KNN	0.98545	0.99009	0.97959	0.990932	0.9916	0.988042	0.98509	0.98905	0.99565	0.996346
	1		1		1		1		1
	0.965		0.97507		0.94861		0.96016		0.98608
	1		1		1		1		1
	1		1		1		1		1
MLP	0.76727	0.764454	0.76431	0.770656	0.74488	0.758698	0.74835	0.761338	0.91101	0.90865
	0.775		0.74432		0.7214		0.72867		0.91139
	0.725		0.77605		0.75862		0.76657		0.8922
	0.74		0.728		0.7588		0.7406		0.90402
	0.815		0.84054		0.80939		0.8225		0.92463
SVM	0.94182	0.947364	0.95113	0.9661784	0.95786	0.943344	0.95371	0.956512	0.94182	0.965546
	0.94		0.95994		0.93021		0.97226		0.94386
	0.965		0.97448		0.96703		0.96989		0.98661
	0.92		0.94745		0.89805		0.91612		0.96759
	0.97		0.97892		0.96357		0.97058		0.98785
NBC	0.77455	0.73991	0.75171	0.728816	0.82922	0.7881308	0.77368	0.745676	0.92378	0.908984
	0.765		0.73738		0.79565		0.75441		0.91987
	0.7		0.71575		0.75332		0.73146		0.88985
	0.695		0.69216		0.75579		0.70892		0.89326
	0.765		0.74708		0.80674		0.75991		0.91816
DT	0.97091	0.952182	0.97272	0.951586	0.96731	0.941954	0.96993	0.944724	0.98899	0.981284
	0.965		0.96559		0.94414		0.95405		0.98442
	0.895		0.89844		0.85051		0.86659		0.95814
	0.95		0.93785		0.95868		0.94723		0.98181
	0.98		0.98333		0.98913		0.98582		0.99306
DAC	0.58182	0.54024	0.77066	0.720094	0.71182	0.66826	0.61736	0.588076	0.85555	0.843448
	0.515		0.7331		0.66629		0.582		0.84564
	0.565		0.75315		0.68075		0.61719		0.85012
	0.5		0.63353		0.6313		0.54		0.82391
	0.54		0.71003		0.65114		0.58383		0.84202

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
