Article

A Novel Metaheuristic-Based Methodology for Attack Detection in Wireless Communication Networks

by Walaa N. Ismail 1,2
1 Department of Management Information Systems, College of Business Administration, Al Yamamah University, Riyadh 11512, Saudi Arabia
2 Faculty of Computers and Information, Minia University, Minia 61519, Egypt
Mathematics 2025, 13(11), 1736; https://doi.org/10.3390/math13111736
Submission received: 10 April 2025 / Revised: 15 May 2025 / Accepted: 20 May 2025 / Published: 24 May 2025

Abstract:
The landscape of 5G communication introduces heightened risks from malicious attacks, posing significant threats to network security and availability. The unique characteristics of 5G networks, while enabling advanced communication, present challenges in distinguishing between legitimate and malicious traffic, making it more difficult to detect anomalous traffic. Current methodologies for intrusion detection within 5G communication exhibit limitations in accuracy, efficiency, and adaptability to evolving network conditions. In this study, we explore the application of an adaptive optimized machine learning-based framework to improve intrusion detection system (IDS) performance in wireless network access scenarios. The framework involves developing a lightweight model based on a convolutional neural network with 11 layers, referred to as CSO-2D-CNN, which demonstrates fast learning rates and excellent generalization capabilities. Additionally, an optimized attention-based XGBoost classifier is utilized to improve model performance by combining the benefits of parallel gradient boosting and attention mechanisms. By focusing on the most relevant features, this attention mechanism makes the model suitable for complex and high-dimensional traffic patterns typical of 5G communication. Unlike previous approaches, it eliminates the need to manually select features such as entropy, payload size, and opcode sequences. Furthermore, the metaheuristic Cat Swarm Optimization (CSO) algorithm is employed to fine-tune the hyperparameters of both the CSO-2D-CNN and the attention-based XGBoost classifier. Extensive experiments conducted on a recent dataset of network traffic demonstrate that the system can adapt to both binary and multiclass classification tasks for high-dimensional and imbalanced data. The results show a low false-positive rate and a high level of accuracy, with a maximum of 99.97% for multilabel attack detection and 99.99% for binary task classification, validating the effectiveness of the proposed framework in the 5G wireless context.

1. Introduction

In the digital era, wireless communication has the potential to revolutionize our ability to connect, communicate, and engage by encouraging cooperation, creativity, and strategic thinking [1,2]. Wireless communication technology has advanced significantly from 5G to 6G. With the advancement of 6G research and 5G standardization efforts, stakeholders face a dynamic and rapidly changing world, and they must navigate the hurdles while taking advantage of the potential. With the advent of 5G and the industrial Internet of Things (IIoT), a large volume of data is generated by smart devices. Some of these data are encrypted hostile traffic, which makes it challenging to identify harmful traffic [2,3,4]. The classification of malicious traffic using intrusion detection systems (IDSs) is one of the most effective methods for addressing the ever-evolving landscape of cybersecurity [5,6]. The intrusion detection system (IDS) is a security tool for monitoring network traffic and identifying threats or unusual activity [5]. Maintaining security involves specific methods for detecting vulnerabilities, preventing DoS attacks, and identifying credential phishing attempts. The continuous growth of cyber-threats and the sophistication of malicious actors have necessitated the development of intelligent and adaptive defense mechanisms [3,6].
A machine learning-based prediction algorithm is used in the domain of intrusion detection. The emergence of ML has initiated a new era, enabling IDSs to learn from data patterns and adapt dynamically to evolving attack vectors. Several techniques have been developed for building machine learning-based intrusion detection systems. Healthcare systems, as critical infrastructures, demand a tailored approach to intrusion detection. Akshay Kumaar et al. [7] propose a hybrid Framework for Intrusion Detection in Healthcare Systems using Deep Learning. Recognizing the unique challenges and sensitivities of healthcare data, this framework employs deep learning techniques to enhance intrusion detection, ensuring the confidentiality and integrity of patient information. Integrating ML into healthcare-oriented IDSs exemplifies the adaptability and scalability needed to safeguard diverse sectors. The study by Vu et al. [8] presents a model of deep generative learning tailored explicitly for cloud-based IDSs. The research emphasizes the importance of accommodating the unique features of cloud environments where the nature of data and network behaviors is inherently unpredictable. Through the analysis of these studies, a foundation is established for understanding the paradigm shift in intrusion detection, moving from static rule-based approaches to dynamic, learning-based ones. WSNs present distinct challenges, including high-dimensional traffic data, various devices, energy limitations, and limited processing power. Despite the advantages of traditional static intrusion detection techniques, they are not suitable for handling these unique features, resulting in suboptimal performance and increased device vulnerability.
A large number of swarm intelligence-inspired (SI) optimization algorithms are used for advanced predictive monitoring security solution applications. Several SI metaheuristics can be used for either feature selection or automated model customization. Exploiting such algorithms makes it possible to enrich the data landscape for intrusion prevention and develop real-time monitoring strategies and proactive intervention methods, addressing potential threats before they become active. For feature selection in WSNs, Karthikeyan et al. [9] use the Firefly Algorithm to overcome high-dimensional data and resource constraints. In addition, the deployed methodology incorporates a Support Vector Machine (SVM) classifier, which leverages the machine learning model’s discriminative capabilities. Using the Grey Wolf Optimizer (GWO) for parameter adjustment, the SVM is efficiently optimized for precise intrusion detection. The Self-Improved Sea Lion Optimization (SLO) algorithm was presented in [10] to optimize cluster head selection in Internet of Things (IoT) systems based on Wireless Sensor Networks (WSNs). A dual-layer security framework was proposed in the study to improve the resilience of IoT-based WSNs. This framework includes a method for detecting anomalous intrusions and an algorithm to prevent them. The Cat Swarm Optimization (CSO) technique has proven highly effective in improving the detection accuracy and robustness of intrusion detection systems (IDSs). The CSO draws inspiration from the collaborative hunting behavior of cats. For example, Idris et al. [11] harnessed CSO to optimize Support Vector Machine (SVM) parameters in an intrusion detection system, demonstrating enhanced performance and showcasing CSO’s adaptability to classical machine learning techniques [12]. Jovanovic et al. employed an improved Sand Cat Swarm Optimizer for feature selection in intrusion detection, highlighting CSO’s role in optimizing feature selection and improving the overall efficiency of IDSs [12]. In the realm of healthcare IoT networks, Chandol and Rao introduced Border Collie Cat Optimization, utilizing CSO to optimize parameters of deep recurrent neural networks for robust intrusion detection [13]. Additionally, Khan et al. [14] applied CSO for hyperparameter optimization of classifiers, utilizing an Artificial Immune Network, which highlights CSO’s versatility in fine-tuning parameters for optimal intrusion detection model performance [14].
Several studies have demonstrated the effectiveness of the CSO algorithm in improving various aspects of intrusion detection, including feature selection and the performance of classical machine learning algorithms. In alignment with Wolpert’s No Free Lunch (NFL) theorem [15], which states that no single optimization strategy performs best across all problem domains, this study strategically employs the CSO algorithm for hyperparameter tuning in Wireless Sensor Network (WSN) intrusion detection models. Its adoption not only leverages CSO’s optimization potential but also addresses a notable gap in the literature, given its limited prior application to the dynamic and evolving wireless networks that characterize the WSN security landscape.
Modern 5G and developing 6G networks deliver unprecedented levels of performance and flexibility, enabled by technologies such as software-defined networking (SDN), network slicing, and multiaccess edge computing (MEC). However, these advancements also introduce new security challenges and significantly broaden the attack surface [16,17]. For instance, while network slicing supports multitenant virtualization, it increases the risk of cross-slice attacks and breaches in slice isolation. MEC shifts data processing closer to the network edge, making it more vulnerable to localized attacks and necessitating real-time, low-latency intrusion detection. Furthermore, the dynamic and distributed architecture of 5G networks complicates the monitoring of diverse traffic patterns, encrypted communications, and zero-day threats [17,18]. The security of such WSN systems must be ensured through rigid measures [5,7]. However, passive protection techniques alone are insufficient to guarantee the security of WSNs. Additionally, the significant computational overhead of cryptographic techniques can complicate real-time intrusion detection, particularly in WSNs with limited energy and processing power [19,20]. Data-driven intrusion detection systems (IDSs) offer proactive threat detection, even without conventional defenses [8,21]. As the volume of data transmitted across the network increases, performing real-time analysis for IDSs becomes increasingly challenging. Therefore, it is essential to consider how quickly and efficiently a WSN can process data. This work proposes an IDS system that addresses these challenges by employing a deep learning-based approach to monitor and track network traffic in 5G/6G communication with high accuracy. The model aims for simplicity without compromising robustness, anticipating future challenges related to generalization and bias. The distinctive contributions and novel aspects of this study are detailed below:
  • Adaptive machine learning-based IDS for 5G/6G communication: We develop and implement a hybrid framework integrating two optimized, practical, efficient, and lightweight machine learning-based models for detecting malicious traffic in the complex, high-dimensional, and dynamic landscape of 5G/6G communication networks. The first approach uses a novel, lightweight 2D CNN architecture with 11 layers specifically designed to handle high-dimensional input data efficiently. The architecture includes convolutional, pooling, and fully connected layers optimized for real-time traffic analysis. Additionally, an optimized attention-based XGBoost classifier is developed, utilizing regularized boosting and incremental training to handle evolving threat patterns. The XGBoost classifier’s parallel processing capability, ensembled with the attention process, significantly improves classification accuracy and efficiency by focusing on the most relevant features, making it suitable for high-dimensional and complex traffic patterns. To identify both single-label and multilabel malicious traffic patterns, these architectures were developed from scratch and trained on an extensive dataset of 5G network traffic.
  • Metaheuristic-based parameter tuning algorithm: We introduce and apply the Cat Swarm Optimization (CSO) Algorithm for the optimization of the baseline 2D-CNN and XGBoost models. The strategy involves fine-tuning the convolutional neural network (CNN) and the XGBoost algorithm, which have been developed for binary and multiclass classifications, thereby improving the model’s ability to adapt to network conditions dynamically. The adaptive selection and fine-tuning of traffic patterns and hyperparameters enable the model to maintain high detection accuracy and adaptability in the dynamic 5G/6G environment. The CSO-based optimization process significantly improves model performance, as demonstrated in numerous tests.
  • Comprehensive development methodology: This encompasses data preprocessing, baseline model development, and adaptive optimization, ensuring a holistic approach to attack detection. Additionally, the inclusion of a diverse dataset covering nine types of attacks enhances the model’s applicability and real-world relevance.
  • Rigorous performance evaluation: The developed framework is evaluated using established performance metrics, including accuracy, precision, recall, Kappa statistics, and F1-score. Performance evaluations demonstrate high accuracy, low false-positive rates, and robustness against various types of attacks. This ensures a thorough understanding of the model’s efficacy in detecting network traffic attacks.

2. Related Work

Wireless sensor networks and the larger Internet environment, which are key elements of the Internet of Things, are vulnerable to security flaws. In the dynamic landscape of cybersecurity, the role of intrusion detection systems (IDSs) has undergone a profound transformation to accommodate the sophisticated strategies employed by cyber-threats. Traditional rule-based IDSs, with inherent limitations in adapting to contemporary threats, have given way to innovative machine learning (ML) methodologies, which have resulted in a new era of dynamic defense mechanisms. In the context of 5G technology development, the complex structure and dynamic nature of network data pose significant obstacles to effective intrusion detection [3,4]. Traditional security measures such as firewalls and signature-based IDSs face challenges due to the increased volume and velocity of data, as well as the wide variety of devices and communication patterns [5,6]. These systems often rely on pre-established signatures or anomaly detection methods, which may not be sufficient to address the constantly evolving threats associated with 5G, such as sophisticated cyberattacks, zero-day exploits, and advanced persistent threats (APTs). Additionally, high-dimensional and imbalanced data can further challenge standard IDSs, resulting in reduced accuracy and an increase in false positives [4,6].
Several recent studies on intrusion detection in 5G networks are summarized in the following subsections.

2.1. Machine Learning-Based IDSs

Traditional signature-based systems face limitations in adapting to dynamic attack vectors, which requires integrating advanced machine learning (ML) techniques to strengthen the resilience of intrusion detection mechanisms. The study by Kavitha et al. [22] presents an intelligent intrusion detection system that utilizes an enhanced arithmetic-based algorithm coupled with a deep learning (DL) model. This combination illustrates the interdisciplinary nature of contemporary intrusion detection approaches, as it improves the efficiency of feature selection through arithmetic optimization and leverages the expressive power of deep learning models. To enhance intrusion detection systems (IDSs), Altamimi et al. [23] investigated the use of an Extreme Learning Machine (ELM) in conjunction with multiple machine learning algorithms, including K-Nearest Neighbors (KNN), decision trees, and Random Forests. The model achieves 94% accuracy on the NSL-KDD dataset and 99.5% on the Distilled-Kitsune dataset.
An innovative architecture was proposed for the real-time detection of anomalies or unauthorized access in the context of the Internet of Things in [24]. Based on anomalies or strange patterns, this architecture uses machine learning techniques to detect intrusions. The article also identifies two categories of routing attacks: collector and selective attacks. The proposed combined real-time technique yielded a false-positive rate of 5.92% and a true-positive rate of 76.19%. A model based on convolutional neural networks (CNNs) and Extreme Gradient Boosting (XGBoost) for feature extraction was proposed in [25], combined with long short-term memory (LSTM) networks for classification, achieving a maximum accuracy of 94.8%.
Nazir et al. [26] introduced a novel hybrid feature fusion-based method for network intrusion detection. By refining the feature set, the research contributes to the efficiency of ML models in identifying relevant patterns indicative of cyber-threats. The emphasis on feature selection reflects a nuanced understanding of the importance of streamlined and meaningful data for enhancing intrusion detection capabilities. Azam et al. [21] contributed to this narrative through a comparative analysis of IDSs using decision tree frameworks for an in-depth examination of ML-based models. The study not only sheds light on the efficacy of these systems but also provides valuable insights into their decision-making processes. Awajan [27] introduced a novel deep learning-based IDS specifically designed for Internet of Things (IoT) networks. The complex nature of IoT environments requires specialized intrusion detection mechanisms. This work showcases the adaptability of ML in addressing these unique challenges. Azar et al. [28] extended this exploration to satellite networks, demonstrating the versatility of deep learning techniques in safeguarding diverse network architectures. An energy-efficient clustering strategy based on node characteristics was developed in [29] to optimize network performance. Furthermore, an intrusion detection system that utilizes deep learning was designed to reduce the risks associated with the Internet of Things. This innovative approach significantly improved detection accuracy and response effectiveness in IoT security scenarios by using sophisticated neural networks to identify and block fraudulent traffic targeting IoT devices.

2.2. Optimization-Based IDSs

The significance of optimization algorithms extends beyond traditional machine learning (ML) approaches. Benmessahel et al. [30] introduced a new evolutionary neural network for intrusion detection systems based on Locust Swarm Optimization. This novel approach illustrates the adaptability of bioinspired algorithms in enhancing the learning capabilities of neural networks for intrusion detection. Integrating metaheuristic algorithms enhances the adaptability of the intrusion detection system, allowing it to identify complex patterns indicative of cyber-threats. The advancements in intrusion detection extend beyond algorithmic hybridization to the optimization of ensemble systems.
An optimal CNN-LSTM hybrid model for big data intrusion detection was presented by Madhuridev et al. [31]. Optimized CNNs can effectively extract spatial features, such as patterns and structures, from data. In contrast, LSTM networks specialize in modeling temporal dependencies between sequential patterns. Additionally, an optimization-assisted training method was proposed to find the ideal weights of the CNN, thereby enhancing its detection capabilities. The proposed model achieved an accuracy of 95.05%.
Gupta et al. [32] presented a hybrid optimization and DL-based intrusion detection system (IDS). The research emphasizes the significance of a unified approach, where optimization algorithms refine the parameters of deep learning models, enhancing the overall efficacy of intrusion detection. Alzubi et al. [33] contributed to this domain by proposing an intrusion detection system based on hybridizing a modified binary Grey Wolf Optimization and Particle Swarm Optimization. This innovative approach leverages the strengths of both algorithms, enhancing the overall detection accuracy by intelligently combining their complementary features. Optimization algorithms highlight their crucial role in refining intrusion detection models to address specific challenges in cyber-threat landscapes. The study by Stiawan et al. [34] proposed an approach for optimizing ensemble intrusion detection systems (IDSs), highlighting the collective strength gained by combining multiple machine learning (ML) models. The ensemble approach enhances the system’s resilience against diverse attack strategies, demonstrating the importance of collaborative and complementary intrusion detection mechanisms. In the realm of network intrusion detection, Injadat et al. [35] proposed a multistage optimized machine learning framework. This framework leverages optimization techniques at multiple stages, emphasizing a systematic and comprehensive approach to intrusion detection. Multistage optimization contributes to adaptability and accuracy, which are crucial aspects of addressing the dynamic nature of cyber threats. Furthermore, Kasongo [36] introduced an advanced IDS for the Industrial Internet of Things (IIoT) based on genetic and tree-based algorithms. Tailored for IIoT environments, the research highlights the importance of context-specific intrusion detection mechanisms. By incorporating advanced optimization algorithms, the proposed system addresses the unique challenges IIoT architectures pose. In [32], a hybrid optimization and deep learning-based intrusion detection system was introduced. Their work combines optimization algorithms with deep learning models, presenting a holistic approach to enhancing the capabilities of IDSs. By combining the merits of optimization techniques and deep learning, the research outlines a framework that adapts to evolving cyber-threats with heightened precision. In the context of Grey Wolf Optimization, Alzaqebah et al. [37] proposed a modified algorithm to enhance the efficiency of intrusion detection systems. Their novel approach addresses the need for adaptive and intelligent solutions, demonstrating the potential of advanced algorithms in improving network security. The modified Grey Wolf Optimization algorithm contributes to the growing collection of intelligent intrusion detection methodologies. Nilesh Kunhare et al. [38] explored the interaction between hybrid classifiers and metaheuristic algorithms for intrusion detection. Their work focuses on feature selection using genetic algorithms, showcasing the importance of tailoring ML-based intrusion detection to achieve optimal performance. Moreover, Kunhare et al. [38] provided insight into the sophisticated domain of hybrid classifiers and metaheuristic algorithms for optimizing feature selection. Their research emphasizes the significant role of genetic algorithms in enhancing the efficiency of intrusion detection systems. 
This approach, based on the effective integration of hybrid classifiers and metaheuristic algorithms, demonstrates a comprehensive strategy for optimizing feature selection processes. Furthermore, Eesa et al. [39] supported a feature selection approach based on the jellyfish optimization algorithm. Their research showcases the importance of feature selection in optimizing the performance of intrusion detection systems. Inspired by natural selection processes, the jellyfish optimization algorithm is harnessed to refine the set of features, contributing to a more efficient and accurate IDS. Wang [40] investigated the landscape of adversarial scenarios in intrusion detection. The study focuses on deep learning-based approaches that are resilient against adversarial attacks, demonstrating the importance of robust systems in real-world cyber-environments. This research examines the dynamic interaction between intrusion detection and adversarial machine learning, enhancing our understanding of system vulnerabilities. Expanding on these innovative approaches, Gupta et al. [32] presented a comprehensive study on utilizing hybrid optimization and deep learning in intrusion detection systems. Their work delves into the in-depth details of how combining optimization algorithms with deep learning models significantly enhances the accuracy and efficiency of IDSs. Optimization techniques and deep learning improve the adaptability and responsiveness needed to mitigate evolving cyber-threats. Additionally, Afzaliseresht et al. [41] brought a human-centric perspective to intrusion detection by focusing on the transition from logs to stories. Their approach employs data mining techniques for cyber-threat intelligence, emphasizing the importance of understanding and contextualizing intrusion events. This human-centered approach adds a layer of interpretability to IDSs, enabling a more nuanced and effective response to potential threats.

3. Background Challenges and Problem Definition

Table 1 and Table 2 present a review of IDS research, covering both optimized and non-optimized techniques. These tables summarize the details, including the recommended feature selection and model training algorithms and their limitations. The proposed model also aims to address the deficiencies outlined in Table 3, making this study unique compared to previous research on IDS for WSN. New architectural elements introduced by modern 5G and developing 6G networks, such as network slicing, multiaccess edge computing (MEC), and ultra-reliable low-latency communication (URLLC), dramatically alter security issues [16,17,18]. The wide range of attack surfaces that these technologies produce necessitates flexible and resource-efficient IDS frameworks. This study presents a novel hybrid pipeline for IDSs, combining machine learning models, attention mechanisms, and optimization algorithms to enhance performance, scalability, and interpretability. A key innovation of this work is incorporating attention mechanisms into the XGBoost model. We hypothesize that integrating attention into machine learning models can significantly improve IDS performance by capturing temporal dependencies in attack behaviors, enhancing generalization across diverse attack types, and dynamically prioritizing critical features within high-dimensional network traffic. This approach is efficient when dealing with large, imbalanced datasets, which often pose challenges for traditional IDS solutions. XGBoost’s parallel and incremental learning capabilities enable it to handle streaming data efficiently and scale to real-time wireless sensor network (WSN) environments. As a baseline, we employ a non-attention-based 2D-CNN architecture, which retains the CNN’s strength in extracting general features and spatial patterns from traffic data while offering a reliable and computationally efficient solution. Additionally, the optimization algorithm enhances the adaptability, accuracy, and resilience of intrusion detection systems. This study systematically evaluates the impact of attention mechanisms on key metrics such as accuracy, reliability, and explainability by comparing bioinspired-enhanced models—such as the optimized 2D-CNN and attention-integrated XGBoost. By leveraging the strengths of both model types, this dual-model approach delivers a scalable and effective IDS solution tailored for dynamic and resource-constrained WSN scenarios.
Consider a 5G wireless network comprising a set G of N nodes, where each node maintains dynamic and evolving time-series data for network traffic flow sequences, denoted by f, ranging from 1 to F. The value of the c-th feature of node i at time t is represented as $x_{c,i,t} \in \mathbb{R}$, and $x_{i,t} \in \mathbb{R}^{F}$ signifies all features of node i at time t.
$X_t = (x_{1,t}, x_{2,t}, \ldots, x_{N,t})^{T} \in \mathbb{R}^{N \times F}$
represents the values of all features for all nodes at time t. $X = (X_1, X_2, \ldots, X_\tau)^{T} \in \mathbb{R}^{N \times F \times \tau}$ denotes all feature values for all nodes over $\tau$ time slices. Additionally, $y_{i,T} = x_{f,i,T} \in \mathbb{R}$ represents the future traffic flow of node i at time T.
Given X, a collection of historical measurements across all nodes in the 5G wireless network over $\tau$ time slices, the objective is to detect flow attacks for the IDS system $Y = (y_1, y_2, \ldots, y_N)^{T} \in \mathbb{R}^{N \times T_p}$ for all nodes across the network over the next $T_p$ time slices. Here, $y_i = (y_{i,\tau+1}, y_{i,\tau+2}, \ldots, y_{i,\tau+T_p})^{T} \in \mathbb{R}^{T_p}$ denotes the future traffic flow of node i from $\tau+1$ to $\tau+T_p$.
The devised models rely upon attention mechanisms and optimized architectures to focus on the most informative aspects of the information. As a result, fewer annotated examples are required. This suggested system achieves success by combining customized convolutional neural networks (CNNs) that can learn discriminatory features across multiple categories with Extreme Gradient Boosting (XGBoost), an efficient model for managing numerous classes.
The model aims to classify network traffic data into one of C classes, where C represents the number of potential types of network intrusions. The model is trained to minimize the categorical cross-entropy loss function, defined as
$L = -\sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})$
where N is the number of samples, $y_{i,c}$ is a binary indicator (0 or 1) of whether class label c is the correct classification for sample i, and $\hat{y}_{i,c}$ is the predicted probability that sample i belongs to class c.
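To make the loss concrete, the following small NumPy example (purely illustrative; the values are not from the experiments) evaluates the categorical cross-entropy for a batch of three samples and four classes:
import numpy as np

# One-hot ground-truth labels for N = 3 samples and C = 4 classes
y_true = np.array([[1, 0, 0, 0],
                   [0, 0, 1, 0],
                   [0, 1, 0, 0]])

# Predicted class probabilities (each row sums to 1)
y_pred = np.array([[0.90, 0.05, 0.03, 0.02],
                   [0.10, 0.10, 0.70, 0.10],
                   [0.25, 0.60, 0.10, 0.05]])

# L = -sum_i sum_c y_{i,c} * log(y_hat_{i,c})
loss = -np.sum(y_true * np.log(y_pred))
print("Total loss:", loss)                # summed over samples
print("Mean loss:", loss / len(y_true))   # averaged per sample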
To enhance the detection accuracy and optimize the hyperparameters of the IDS, we employ the Cat Swarm Optimization (CSO) algorithm. CSO is a powerful metaheuristic optimization technique inspired by the natural behavior of cats [51]. It effectively explores the hyperparameter space, leading to improved model performance. The CSO algorithm tunes hyperparameters, such as the learning rate, batch size, number of layers, and filter sizes, to find the optimal configuration for model training and results analysis. In the evolving 5G learning task, the parameter space Θ is defined as
$\Theta = \{\Theta_1, \Theta_2, \ldots, \Theta_{E_p}\}$
within
$\mathbb{R}^{|\Theta_1| \times |\Theta_2| \times \cdots \times |\Theta_{E_p}|}$,
where $E_p$ denotes the number of model parameters in the CNN parameter space. Each $\Theta_i$ (for $i = 1, 2, \ldots, E_p$) represents a potential solution within the parameter set. Various offspring with different traits are produced as a result of parental mutations, and the optimal individual is assessed by testing its adaptability to the surrounding environment, denoted as $\Theta_{best}$.
The following is the definition of the fitness function used by the CSO algorithm to tune its hyperparameters:
$F(\Theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c}) + \lambda \lVert \Theta \rVert_2$
where $\Theta$ represents the set of hyperparameters, $\lVert \Theta \rVert_2$ is the L2-norm regularization term, and $\lambda$ is the regularization parameter used to prevent overfitting.
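A minimal Python sketch of this fitness function is given below; it assumes the hyperparameters are encoded in a flat NumPy vector theta and that a hypothetical helper train_and_validate() trains the model with that configuration and returns validation probabilities:
import numpy as np

def cso_fitness(theta, y_true, train_and_validate, lam=1e-3):
    # theta: flat vector encoding the hyperparameters (assumed representation)
    # y_true: one-hot validation labels of shape (N, C)
    # train_and_validate: hypothetical helper returning predicted probabilities (N, C)
    # lam: regularization weight (lambda in the text)
    y_pred = np.clip(train_and_validate(theta), 1e-12, 1.0)    # avoid log(0)
    cross_entropy = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
    return cross_entropy + lam * np.linalg.norm(theta, ord=2)  # L2 penalty on theta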
Using this optimization-based multiclass classification approach, our goal is to improve detection accuracy and robustness against various types of network intrusions in evolving 5G and 6G wireless networks.

4. Proposed Intrusion Detection System Architecture

In this section, we outline the steps we took in our research. In this paper, our objective is to develop and optimize an IDS framework for 5G/6G communication networks in an accurate and timely manner. Algorithm 1 represents the steps to accomplish this; we gather data, prepare them for analysis, build models, and carefully adjust the model settings using advanced optimization techniques as follows:
Algorithm 1: IoT intrusion detection algorithm.
  • Data preprocessing: The dataset, describing nine types of attacks, undergoes thorough preprocessing. This involves handling missing and duplicate values, scaling the data, and encoding categorical variables.
  • Model training: Two baseline machine learning-based approaches are devised to extract spatial and temporal features indicative of attacks from traffic data. These baseline models serve as a foundation for subsequent optimization.
  • Optimization with the CSO algorithm: The baseline models are further enhanced through the Cat Swarm Optimization (CSO) algorithm. This approach facilitates adaptability to changing network conditions and ensures improved performance.
  • Traffic analysis and performance evaluation: The optimized models then analyze traffic for either binary or multiclass labels. Additionally, a rigorous evaluation using metrics such as accuracy, precision, recall, and F1 Score is applied to the obtained results. This step ensures the model’s effectiveness in detecting traffic attacks while maintaining simplicity and robustness.
The comprehensive methodology illustrated in Figure 1 ensures the model’s adaptability, scalability, and accuracy across diverse scenarios, establishing a robust foundation for addressing the evolving landscape of cyber-threats in evolving networks.

4.1. Data Collection and Preprocessing

Various new connections, features, and services have been introduced in mobile communication networks with the advent of 5G technology. This study draws upon the valuable “5G-NIDD”, a comprehensive network intrusion detection dataset generated for the “5G wireless network”. This dataset, accessible through IEEE (access: https://doi.org/10.21227/xtep-hv36), has more than 1,215,890 samples, as given in Table 4. To our knowledge, this is the first publicly available 5G IDS data collection [52]. “5G-NIDD” emerges as a fully labeled dataset, a product of a functional 5G test network at the University of Oulu, Finland. Two base stations, each housing an attacker node and benign 5G users, orchestrate scenarios mimicking real-world threats. The attacker nodes initiate DoS attacks and port scans on a server deployed in the 5GTN MEC (accessed 6 August 2014, https://5gtnf.fi/). DoS attacks are shown in Table 5. Figure 2 provides an overview of recorded events, showing that the majority, with a count of 477,737, refers to “Benign Traffic”, which is the regular, expected, and harmless traffic within a network. Subsequently, the UDPFlood attack type emerges with a count of 457,340, followed by HTTPFlood with 140,812 occurrences.
Moving on to identify the main tools used in the recorded activities, Figure 3 highlights the high proportion of typical cases, which account for 477,737 instances. However, the “Hping3” tool stands out as the most frequently used in attacks, registering 468,216 occurrences, followed by “Goldeneye” with 140,812 instances. An examination of missing values, depicted in Figure 4, highlights specific columns, such as “dTos”, “dTtl”, and “dHops”, whose missing entries amount to 77% of the total column values. To address this, we decided to discard these columns, ensuring a cleaner dataset.
Eliminating outliers was a crucial step in refining the dataset. Leveraging the interquartile range (IQR) method, we computed statistical quartiles to identify potential outliers using Algorithm 2. By calculating the First Quartile (Q1) and Third Quartile (Q3), we determine the interquartile range (IQR) as the range between Q3 and Q1. Defining the lower and upper bounds as Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, respectively, we identified and filtered out potential outliers, ensuring a more robust dataset.
Algorithm 2: Interquartile range (IQR) procedure:
# Define a function for the IQR method, applied to a numeric feature column of the 5G-NIDD dataset
import numpy as np

def iqr_method(data):
    # Compute the first quartile (Q1)
    Q1 = np.percentile(data, 25)

    # Compute the third quartile (Q3)
    Q3 = np.percentile(data, 75)

    # Compute the interquartile range (IQR)
    IQR = Q3 - Q1

    # Determine the outlier boundaries
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Identify outliers
    outliers = [x for x in data if x < lower_bound or x > upper_bound]

    return outliers, lower_bound, upper_bound

# `data` holds one numeric feature column of the 5G-NIDD dataset
outliers, lower_bound, upper_bound = iqr_method(data)
print("Outliers:", outliers)
print("Lower_Bound:", lower_bound)
print("Upper_Bound:", upper_bound)
To normalize the data and enhance their uniformity, we employed the MinMaxScaler, adjusting values based on the following equation:
$X_{\text{scaled}} = \dfrac{X - X_{\min}}{X_{\max} - X_{\min}}$
Furthermore, we categorized the dataset into three distinct sections: training, validation, and testing, with proportions of 70%, 10%, and 20%, respectively. This strategic partitioning facilitates the practical training, validation, and evaluation of models across diverse subsets of the dataset.
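For reference, the scaling and the 70%/10%/20% split can be reproduced with scikit-learn roughly as follows (the file path and the "Label" column name are placeholders for the preprocessed 5G-NIDD table):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("5g_nidd_preprocessed.csv")   # illustrative path
X = df.drop(columns=["Label"])                 # "Label" is an assumed column name
y = df["Label"]

# First split off 70% for training, then divide the remaining 30% into 10%/20%
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_tmp, y_tmp, test_size=2/3, stratify=y_tmp, random_state=42)

# Fit the MinMaxScaler on the training split only to avoid information leakage
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)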

4.2. Model #1: Convolutional-Based Neural Network for IDS

Detecting network attacks in wireless communication scenarios poses a complex challenge. We introduce a foundational deep learning model named “2D-CNN”, which adopts a 2D convolutional neural network architecture to establish initial performance metrics. The model architecture (shown in Figure 5) consists of the following layers:
  • Input layer: The model is initiated with an “inputs_cnn” layer designed to handle data with an input shape of (30, 1, 1), which matches the dimensions of the input data.
    $X_{input} \in \mathbb{R}^{30 \times 1 \times 1}$
  • Convolutional layers: CNN layers compute output feature maps by convolution, where the kernel K is moved over the input $X_{input}$. The output feature map $S(i,j)$ is computed as follows:
    $S(i,j) = \sum_{m=0}^{h-1} \sum_{n=0}^{w-1} I(i+m, j+n) \cdot K(m,n)$
    Three well-designed convolutional layers (Conv2D) were used, enhanced by batch normalization for stable training. The first layer applies 64 filters with a (6, 1) kernel, while the subsequent layers use 64 filters, each with a (3, 1) kernel. Max-pooling layers are thoughtfully inserted after each convolutional layer to downsample feature maps effectively.
    $X_{conv} = \text{ReLU}(X_{input} * W_{conv} + b_{conv})$
    The region of the input that influences neurons in deeper layers is referred to as the receptive field, denoted by its size $r_l$ at layer l. The size is determined recursively by
    $r_l = r_{l-1} + (k_l - 1) \cdot \prod_{i=1}^{l-1} s_i$,
    where $k_l$ is the kernel size at layer l and $s_i$ is the stride at layer i. A 3D tensor is produced when a convolutional layer employs many kernels to identify various features, resulting in a stack of feature maps. Each filter $K^{(k)}$ produces a corresponding feature map $S^{(k)}$, resulting in a 3D output tensor of shape $H' \times W' \times N$, where N is the number of filters. The output spatial dimensions are computed as
    $H' = \frac{H - h + 2p}{s} + 1, \quad W' = \frac{W - w + 2p}{s} + 1$
    The input dimensions are H and W, the filter dimensions are h and w, the padding size is p, and the stride width is s. Convolutional layers are equivariant to translation, meaning that if the input is shifted, the output will shift accordingly (i.e., $f(T(x)) = T(f(x))$). To achieve translation invariance, where the output remains relatively unchanged under small shifts in the input (i.e., $f(T(x)) \approx f(x)$), pooling layers such as max-pooling are often employed.
  • Flattening and fully connected layers:
    The ReLU activation function is applied in two fully connected layers, with 64 and 32 units, respectively.
    $X_{fc} = \text{ReLU}(W_{fc} \cdot X_{flatten} + b_{fc})$
    After the convolutional and pooling operations, the output tensor has the following dimensions:
    $X_{conv} \in \mathbb{R}^{D \times H_{conv} \times W_{conv}}$
    where:
    - D is the number of feature maps.
    - $H_{conv}$ is the height of each feature map.
    - $W_{conv}$ is the width of each feature map.
    The 2D feature maps are then converted into 1D vectors by a flattening operation following the convolutional layers:
    $X_{flatten} \in \mathbb{R}^{D \cdot H_{conv} \cdot W_{conv}}$
    The purpose of this transformation is to convert all the elements in the feature map into a single vector using a fully connected layer:
    $X_{fc} = \text{ReLU}(W_{fc} \cdot X_{flatten} + b_{fc})$
    The IDS’s performance is significantly influenced by the features chosen and designed. A thoughtful approach to feature development and feature selection is crucial to achieving the best results. To obtain high-quality information from X-rays and CT scans, a pretrained CNN model was used with a proposed 16-layer CNN architecture. Consequently, the eight convolution layers that comprise the basic CNN model are utilized to extract features. Following data preprocessing, the next crucial step is to train the model using state-of-the-art techniques. The objective of this study was to present a “single yet simple” model for multilabel attack types. As shown in Figure 5, the proposed CNN model is broken down into layers. The model was trained with a batch size of 16 for 100 epochs in total. The current technique was tested with several inputs, yielding encouraging results. The 2D-CNN architecture comprises a total of 36,553 parameters (142.79 KB), of which 36,169 are trainable (141.29 KB) and 384 are non-trainable (1.50 KB).
    The proposed architecture produced the expected outcomes for the provided collection of input features.
    $C = \frac{\sum_i (v_i - \bar{v})(\tau_i - \bar{\tau})}{\sqrt{\sum_i (v_i - \bar{v})^2 \sum_i (\tau_i - \bar{\tau})^2}}$
    A correlation analysis, denoted by C, is used to measure the linear relationship between two variables, as shown in Equation (14). By applying an identity matrix-based calculation, Equation (15) improves the evaluation of correlations and extracts their characteristics. The importance-weighted correlation coefficient $C_{Imp}$ is defined as
    $C_{Imp} = \frac{\sum_i (v_i - \bar{v})(\tau_i - \bar{\tau})}{\sqrt{\sum_i (v_i - \bar{v})^2 \sum_i (\tau_i - \bar{\tau})^2}} \cdot I_{d_m} \cdot \frac{d_{max} - WCD}{d_{max}}$
    $WCD = \frac{\sum_i w_i^2 \cdot v_i \cdot \tau_i}{\sqrt{(\sum_i w_i \cdot v_i)^2 \cdot (\sum_i w_i \cdot \tau_i)^2}}$
    A random number between 0 and 1 is designated by $w_i$ in this case.
  • Output layer: To facilitate the multiclass classification task, a softmax activation function is used for the “main_output” layer as follows:
    $X_{main\_output} = \text{Softmax}(W_{out} \cdot X_{fc} + b_{out})$
    $\text{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{C_{classes}} e^{z_j}}$
    This layer completes the model architecture by aligning nine units to the dataset’s classes.
The “2D-CNN” model is compiled using the Adam optimizer, a widely used optimization algorithm in deep learning. The loss function for multiclass classification tasks is categorical cross-entropy as follows:
$\text{Loss} = -\sum_{i=1}^{C_{classes}} y_i \log(p_i)$
where:
- $y_i$ is the ground-truth label for class i.
- $p_i$ is the predicted probability for class i.
For binary class classification:
$\text{Loss} = -\left[ y \log(p) + (1 - y) \log(1 - p) \right]$
where y is the binary ground-truth label (0 or 1) and p is the predicted probability of the positive class, with an activation function defined as
$\sigma(z) = \frac{1}{1 + e^{-z}}$
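A minimal Keras sketch of the 2D-CNN described above is shown below; it follows the stated input shape, kernel sizes, dense widths, and nine-class softmax output, but it is an illustrative reconstruction rather than the authors’ released implementation, so exact layer counts and parameter totals may differ:
from tensorflow.keras import layers, models

def build_2d_cnn(num_classes=9):
    inputs_cnn = layers.Input(shape=(30, 1, 1), name="inputs_cnn")
    # First convolutional block: 64 filters with a (6, 1) kernel
    x = layers.Conv2D(64, (6, 1), padding="same", activation="relu")(inputs_cnn)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=(2, 1))(x)
    # Two further blocks: 64 filters with (3, 1) kernels
    for _ in range(2):
        x = layers.Conv2D(64, (3, 1), padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D(pool_size=(2, 1))(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dense(32, activation="relu")(x)
    main_output = layers.Dense(num_classes, activation="softmax", name="main_output")(x)
    model = models.Model(inputs_cnn, main_output)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_2d_cnn()
# model.fit(X_train, y_train, validation_data=(X_valid, y_valid), batch_size=16, epochs=100)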

4.3. Model #2: Attention-Based Neural Network for IDS

The second model utilized in this study involves building and training a machine learning system to detect various types of attacks using an ensemble approach that combines XGBoost with an attention mechanism integrated into a neural network. This workflow combines the strengths of XGBoost and attention mechanisms in neural networks to create a robust model for detecting network attacks. By using incremental training and feature transformation, the approach efficiently handles large datasets and leverages advanced techniques to improve classification accuracy.
In the context of intrusion detection systems (IDSs) for wireless networks, an ensemble learning approach can be employed by sequentially generating detection models. Initially, all features related to network traffic are assigned weights and passed through the first detection model, which attempts to identify potential threats (as shown in Figure 6). The features that this model misclassifies are given higher weights, allowing the subsequent model to focus on these complex cases. Combining the results of multiple detection models enables the IDS to achieve more accurate and reliable threat detection.
Let $F = \{f_1, f_2, f_3, \ldots, f_k\}$ represent a set of base detection models [53].
The predicted outcome $\hat{a}_i$ for an attack scenario is
$\hat{a}_i = \sum_{j=1}^{k} f_j(t_i)$
Here, each $f_j$ is a detection model applied to the network traffic feature $t_i$. In an IDS, an incremental learning-based method like XGBoost can be employed, which utilizes a sequence of weaker detection models. Training involves a series of Q models:
$\hat{a}_i = \sum_{q=1}^{Q} f_q(t_i)$
where $f_q$ is the q-th detection model in the ensemble and $t_i$ represents the i-th network traffic sample. The detection at the s-th step is
$\hat{a}_i^{(s)} = \sum_{q=1}^{s} f_q(t_i)$
In XGBoost, the objective function takes the following general form:
$\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t)$
The loss term, denoted by $g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i)$ and derived from losses such as the logistic loss and the mean square error, represents the difference between the actual and predicted values. For the k-th tree, $\Omega(f_k)$ is the regularization term that penalizes the tree’s complexity. The regularization term is defined as
$\Omega(f_k) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2}$
T indicates how many leaves are in the tree, $w_j$ indicates the weight of the j-th leaf, and $\gamma$ and $\lambda$ indicate the strength of regularization, preventing overfitting.
This loss function $\tilde{L}^{(t)}$ is optimized by calculating the optimal weights associated with the network traffic features $x_i$.
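In practice, the regularization strengths above correspond to the gamma and reg_lambda parameters of the XGBoost library; a hedged configuration sketch (the values are illustrative rather than the tuned ones reported later) is:
from xgboost import XGBClassifier

# gamma maps to the per-leaf complexity penalty, reg_lambda to the L2 penalty on leaf weights
xgb_model = XGBClassifier(
    objective="multi:softprob",   # multiclass objective over the nine traffic classes
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    gamma=0.5,
    reg_lambda=1.0,
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric="mlogloss",
)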
In the context of IDSs, determining the optimal weights for leaf nodes within a given decision tree structure is crucial for accurate threat detection. XGBoost introduces the concept of a similarity score to assist in node selection and splitting. The steps for constructing a decision tree using similarity scores in a 5G IDS environment are outlined as follows [53,54]:
  • Initialize the IDS model with a single decision tree.
  • Calculate the regularization parameters and residuals using an appropriate loss function tailored for the first tree.
  • Subsequent trees use the predictions from the previous tree as residual inputs.
  • The similarity score is computed based on the Hessian (which corresponds to the number of residuals m), the squared gradient (which represents the squared sum of residuals), and the regularization hyperparameter λ .
  • Nodes with higher similarity scores are identified as having greater homogeneity in their classification outcomes.
  • Calculate the information gain by subtracting the combined similarities of the left and right child nodes from the similarity score of the root node.
  • Estimate the residual values for the nodes in the decision tree.
  • Sum the predicted values, adjust them using the learning rate, and incorporate these adjustments into the residuals to obtain new residuals.
  • Repeat the process starting from step 3 for each tree in the ensemble.
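To make the similarity-score and gain computations in the steps above concrete, the short sketch below scores a candidate split using the standard XGBoost formulas for squared-error residuals (an illustrative example, not the paper’s implementation):
import numpy as np

def similarity_score(residuals, lam=1.0):
    # (squared sum of residuals) / (number of residuals + lambda)
    return np.sum(residuals) ** 2 / (len(residuals) + lam)

def split_gain(residuals, left_mask, lam=1.0):
    left, right = residuals[left_mask], residuals[~left_mask]
    root = similarity_score(residuals, lam)
    return similarity_score(left, lam) + similarity_score(right, lam) - root

residuals = np.array([-2.0, -1.5, 0.5, 1.0, 2.5])        # residuals reaching a node
left_mask = np.array([True, True, False, False, False])  # candidate split of the node
print("Gain:", split_gain(residuals, left_mask))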
The attention neural network (shown in Figure 7) consists of an input layer compatible with reshaped features. Secondly, the attention layer focuses on the essential parts of the input. A third flatten layer converts 2D attention output to 1D. A final dense output layer with softmax activation for nine-class classification follows. By using the attention mechanism, models can focus on the most relevant characteristics of input data by calculating dynamic weights. The process begins by using learned weight matrices to derive the query Q, key K, and value V matrices from the input X:
$Q = X W_Q, \quad K = X W_K, \quad V = X W_V$
The attention scores are calculated by scaling the dot product, which measures the similarity between queries and keys, by the square root of the key dimension $d_k$:
$\text{score}(Q_i, K_j) = \frac{Q_i \cdot K_j^{T}}{\sqrt{d_k}}$
After applying the softmax function to the scaled scores, the attention weights $\alpha_{ij}$ are calculated:
$\alpha_{ij} = \frac{\exp\left( \frac{Q_i \cdot K_j^{T}}{\sqrt{d_k}} \right)}{\sum_{j=1}^{n} \exp\left( \frac{Q_i \cdot K_j^{T}}{\sqrt{d_k}} \right)}$
The weighted sum of the values represents the output of the attention mechanism:
$\text{Attention}(Q, K, V) = \sum_{j=1}^{n} \alpha_{ij} V_j$
Through this procedure, the model manages complex, multifaceted data by selectively focusing on the most pertinent elements.
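The scaled dot-product attention defined above can be written in a few lines of NumPy; this is a generic sketch of the mechanism rather than the exact layer used in the model:
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: arrays of shape (n, d_k) obtained from X via learned projections
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity between queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys (alpha_ij)
    return weights @ V                               # weighted sum of the values

# Toy example: 4 positions with key dimension 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_Q, W_K, W_V = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_Q, X @ W_K, X @ W_V)
print(out.shape)   # (4, 8)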
The proposed intrusion detection framework, as given in Algorithm 3, employs a hybrid approach that combines XGBoost and an attention-based neural network to analyze malicious traffic. The process begins by loading and preprocessing the dataset (Algorithm 3, lines 2–7), where the feature sets $(X_{train}, X_{valid}, X_{test})$ and corresponding labels $(Y_{train}, Y_{valid}, Y_{test})$ are split into training, validation, and test sets. The training data $(X_{train}, Y_{train})$ are broken down into smaller subsets for incremental learning (Algorithm 3, lines 8–10).
An XGBoost model (Algorithm 3, lines 11–14), initialized with a multiclass objective function, is trained on these chunks, with the model’s performance validated against the validation set $(X_{valid}, Y_{valid})$. The leaf indices from the trained XGBoost model, which represent the decision paths taken (Algorithm 3, lines 15–17), are extracted as new features $X_{leaves}$.
These features are then reshaped into a three-dimensional format suitable for neural network input (Algorithm 3, lines 18–23). A neural network incorporating an attention mechanism is trained incrementally on these reshaped features, using a dense output layer with softmax activation to classify the samples into nine classes (Algorithm 3, lines 24–26). The network is trained for multiple epochs with a batch size of 32. The final model is optimized using the Adam optimizer and evaluated using the categorical cross-entropy loss function. The overall process continues incrementally by training the combined tree-based and deep learning techniques for robust and accurate DDoS attack detection (Algorithm 3, Line 27–30).
The combined model leverages the strengths of the XGBoost and attention-based neural network models to enhance the detection of traffic attacks. By integrating the predictions from these two models, the final pipeline achieves a more robust and accurate classification.
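A condensed sketch of this combined pipeline is shown below. It assumes preprocessed arrays X_train, X_valid and label arrays y_train_int, y_valid_int (integer-encoded) and y_train_onehot (one-hot) from the earlier steps, and it uses a generic Keras Attention layer as a stand-in for the attention block described in the text:
import numpy as np
from xgboost import XGBClassifier
from tensorflow.keras import layers, models

# Stage 1: XGBoost learns decision paths; leaf indices become new features.
xgb = XGBClassifier(objective="multi:softprob", n_estimators=100, max_depth=6)
xgb.fit(X_train, y_train_int, eval_set=[(X_valid, y_valid_int)], verbose=False)
X_leaves = xgb.apply(X_train)                         # one leaf index per boosted tree

# Stage 2: reshape the leaf features into a 3D tensor for the attention network.
X_att = X_leaves.astype("float32")[..., np.newaxis]   # (n_samples, n_trees, 1)

inputs = layers.Input(shape=X_att.shape[1:])
att = layers.Attention()([inputs, inputs])            # self-attention over the tree outputs
x = layers.Flatten()(att)
outputs = layers.Dense(9, activation="softmax")(x)    # nine traffic classes

att_model = models.Model(inputs, outputs)
att_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
att_model.fit(X_att, y_train_onehot, batch_size=32, epochs=10, validation_split=0.1)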
Algorithm 3: IDS with XGBoost and attention-based neural network.
Algorithm 4: Cat Swarm Optimization-based model for IDS.

4.4. Hybridization with Cat Swarm Optimization (CSO)

Hyperparameter tuning is pivotal in optimizing machine learning models, significantly influencing their performance and generalization capabilities. Hyperparameters such as learning rates, batch sizes, and the depth of the architecture directly impact the neural network’s learning dynamics and, consequently, its overall efficacy [55,56]. Cat Swarm Optimization (CSO) is a nature-inspired optimization algorithm that simulates the collective intelligence observed in animals that form groups. CSO has found widespread applications across various domains due to its ability to address complex optimization problems efficiently [51,55]. This study uses the Cat Swarm Optimization (CSO) algorithm to fine-tune the hyperparameters of the 2D-CNN and attention-based XGBoost models. These models are sensitive to parameters such as the learning rate, number of attention heads, and dropout rate. CSO offers a robust and adaptive approach for exploring the hyperparameter space, enabling the selection of configurations that are well suited to the challenges posed by diverse cyberattacks in wireless network environments (as illustrated in Table 6) [51,57]. Firstly, flexibility and dynamic exploitation are central to CSO’s effectiveness in IDSs. The algorithm operates in two modes ($\text{Mode}_i(t)$), tracing (exploitation) and seeking (exploration), allowing it to dynamically switch between intensifying the search around promising solutions and exploring new areas of the solution space.
$\text{Mode}_i(t) = \begin{cases} \text{Seeking}, & \text{if } r \le 0.5 \\ \text{Tracing}, & \text{if } r > 0.5 \end{cases}$
This dual-mode design allows the CSO to explore emerging attack patterns and focus on refining promising solutions. Secondly, the CSO incorporates a key feature that enables it to adapt to new network activity and data dissemination changes in real time. Through mechanisms like the mixture ratio (MR) and Learning Decay ($\alpha_t$), CSO maintains high detection accuracy while reducing false positives.
- MR evaluates the system’s past prediction accuracy, linking the seeking mode with the tracing mode, as follows:
$MR = \frac{1}{T} \sum_{t=1}^{T} \frac{|y_t - \hat{y}_t|}{\max(|y_t|, \epsilon)}$
where $y_t$ represents the ground-truth (actual) value at time step t and $\hat{y}_t$ is the predicted value at time step t.
- Learning Decay ($\alpha_t$), guided by the average normalized prediction error, gradually reduces the learning rate (r) at time t by a decay factor $\gamma$:
$\alpha_t = \alpha_0 \cdot \gamma^{t/s}$
In this way, CSO refines the search space around promising cat locations, thereby promoting convergence and preventing irregular updates or instability.
Lastly, CSO and Particle Swarm Optimization (PSO) have similar computational complexity, as both algorithms update positions and evaluate fitness for all agents in the population [57,58]. Consequently, the computational complexity of CSO per iteration is generally considered to be $O(N \cdot d)$. Due to its linear scalability, CSO is well suited to respond effectively to imbalanced and evolving data in real-time applications with resource-constrained environments, such as intrusion detection systems in wireless sensor networks.
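The mode switch and learning-rate decay described above reduce to a few lines of code; the following is an illustrative fragment of the control logic, not the complete optimizer:
import random

def select_mode():
    # Seeking (exploration) if r <= 0.5, tracing (exploitation) otherwise
    r = random.random()
    return "seeking" if r <= 0.5 else "tracing"

def decayed_learning_rate(alpha0, gamma, t, s=1):
    # Learning Decay: alpha_t = alpha_0 * gamma ** (t / s)
    return alpha0 * gamma ** (t / s)

for t in range(5):
    print(t, select_mode(), round(decayed_learning_rate(0.1, 0.95, t), 4))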
The details of CSO given in Figure 8 are as follows: Initially, for a number of cats (hyperparameters), an SR is arranged based on their position and velocity at the beginning of the process (x-th cat, x = 1, 2, …, SR) as defined in Equation (22).
$Z_{x,y} = (1 \pm SR \times \Psi) \times Z_{ca}$
In this case, $\Psi$ corresponds to a value between [0, 1], while $Z_{ca}$ indicates the location.
The value of $c_a$ can be updated adaptively as indicated by Equation (23). This equation involves the maximum and current iterations, where $w_s = 0.6$, $c_s = 2.05$, $it_{max}$ refers to the maximum number of iterations, and $it$ refers to the current number of iterations.
$c_a = c_s \times \frac{it_{max} - it}{2 \times it_{max}} \times w_s$
A cat whose velocity exceeds the maximum velocity is assigned the maximum velocity. According to the CSO, the new position ($Z_{new}$) is represented as shown in Equation (24). Accordingly, the current location of the cat is indicated by
$Z_{new} = Z_{old} + V_{j,y}$
For each cat $Ca_x$ in the search space $\epsilon$, $Z_{new}$ is updated as follows:
$Z_{new} = Z_i + \alpha \times (Z_{best} - Z_i) \times r_a$
Therefore,
$Z_{i+1} = Z_i + V_{i,fc}$
$Z_{i+1} = Z_i + \alpha \times (Z_{best} - Z_i) \times r_a$
The next best solution is then defined as
$Z_{i+1} = \frac{1}{2} \left[ 2 Z_i + V_{i,fc} + r_{a1} c_{a1} \times (Z_{best} - Z_i) + \alpha \times r_a \times (Z_{best} - Z_i) \right]$
where $r_a$ is a scale factor that can be updated by a circular map as follows:
$r_{a1} = \left( r_{a1} + 0.5 - \frac{1.1}{\pi} \sin(2\pi r_{a1}) \right) \bmod 1$
The cat will start to search for its prey’s position as defined in Equation (30).
$V_{j,y}^{*} = V_{j,y} + r_{a1} c_{a1} \times (Z_{best,y} - Z_{j,y})$
$r_{a1}$ denotes a random number between 0 and 1, $Z_{best}$ indicates the global best, $Z_j$ indicates the location of the cat, $V_j$ indicates its velocity, and $c_{a1}$ indicates the acceleration coefficient. A group interaction mechanism is incorporated into the CSO algorithm’s tracing mode to enhance its exploration and exploitation abilities. This improved model incorporates cooperative learning by allowing cats to change their velocity in accordance with both the position of an adjacent cat and the global optimal position. The dynamic scaling factor $r(t)$ allows the algorithm to adaptively exploit promising regions ($x_{best}$). During the dynamic exploration and exploitation phase, CSO updates the velocity equation to reflect these adaptive changes:
$v_i(t+1) = v_i(t) + r(t) \cdot (x_{best} - x_i(t))$
Incorporating the group interaction term, the velocity update becomes
$v_i(t+1) = w \cdot v_i(t) + c_1 \cdot r_1 \cdot (Z_{best} - Z_i(t)) + c_2 \cdot r_2 \cdot (Z_j(t) - Z_i(t))$
w indicates the inertia weight, $c_1$ and $c_2$ indicate acceleration coefficients, and $r_1$ and $r_2$ indicate random values uniformly distributed within the interval [0, 1]. $Z_{best}$ denotes the swarm’s best global position. The group interaction term lowers the likelihood of early convergence by encouraging cats to cooperate in exploring the search space together. Next, each cat’s seeking mode explores the neighborhood of the current solution by modifying the dimensions of $c_{a1}$ as follows:
$X(i) = X_j^{current} + \delta \cdot \mathrm{randn}()$
where $\delta$ is a perturbation coefficient and $\mathrm{randn}()$ is a normally distributed random number, so the update adapts to both individual and collective experiences. A quantitative measure, defined as a fitness function $Fit(Z_{best})$ (i.e., accuracy or error rate), serves as a guiding principle for assessing a candidate’s quality and performance and is used to evaluate potential solutions. Let the hyperparameter vector be denoted as
$\theta = (\theta_1, \theta_2, \ldots, \theta_d)$
Finding the optimal configuration $Fit(Z_{best})$ that minimizes the validation loss is the optimization goal:
$Fit(Z_{best}) = \arg\min_{\theta} L_{val}(\theta)$
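To make the search procedure concrete, the following Python sketch illustrates a CSO-style seeking/tracing loop over a hyperparameter vector. It is a simplified illustration rather than Algorithm 4 itself: the parameter bounds are assumed values, and the fitness function is a synthetic stand-in for the validation-accuracy evaluation described above.

```python
import numpy as np

# Illustrative bounds for four of the hyperparameters discussed in this section;
# the real search space and ranges are assumptions made for this sketch.
BOUNDS = np.array([
    [0.01, 0.3],   # learning_rate
    [3.0, 12.0],   # max_depth (rounded when used)
    [0.0, 5.0],    # gamma
    [1.0, 10.0],   # min_child_weight
])

def fitness(theta):
    # In the framework this would train the CSO-2D-CNN or XGBoost-Attention model
    # and return validation accuracy; a synthetic stand-in keeps the sketch runnable.
    target = BOUNDS.mean(axis=1)
    return -float(np.sum((theta - target) ** 2))

def cso_search(n_cats=10, iters=20, mr=0.3, srd=0.2, c1=2.05, seed=0):
    rng = np.random.default_rng(seed)
    low, high = BOUNDS[:, 0], BOUNDS[:, 1]
    pos = rng.uniform(low, high, size=(n_cats, len(BOUNDS)))
    vel = np.zeros_like(pos)
    fit = np.array([fitness(p) for p in pos])
    best = pos[fit.argmax()].copy()

    for _ in range(iters):
        tracing = rng.random(n_cats) < mr          # mixture ratio: tracing vs. seeking
        for i in range(n_cats):
            if tracing[i]:
                # Tracing mode: pull the cat toward the global best (cf. Equation (30)).
                vel[i] += rng.random(len(BOUNDS)) * c1 * (best - pos[i])
                pos[i] = np.clip(pos[i] + vel[i], low, high)
            else:
                # Seeking mode: perturb the current position and keep the copy if it is better.
                cand = pos[i] + srd * (high - low) * rng.standard_normal(len(BOUNDS))
                cand = np.clip(cand, low, high)
                if fitness(cand) > fit[i]:
                    pos[i] = cand
            fit[i] = fitness(pos[i])
        best = pos[fit.argmax()].copy()
    return best, float(fit.max())

best_theta, best_fit = cso_search()
```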
Based on the techniques described in Algorithm 4, the optimization algorithm initially takes the lead, generating a set of hyperparameters for evaluation by the fitness function (Algorithm 4, lines 1–4). These hyperparameters encapsulate crucial configurations for the 2D-CNN model, such as learning rates, batch sizes, and other architectural aspects that influence the model’s behavior. The key hyperparameters used in the XGBoost-attention-based model are listed below; an illustrative configuration sketch follows the list.
  • learning_rate controls the rate at which the model updates its weights to minimize detection errors.
  • max_depth sets the maximum depth of the decision trees, helping to capture complex attack patterns.
  • gamma determines the minimum gain needed to perform additional splitting of nodes in the tree, helping to avoid overfitting to noisy attack data.
  • min_child_weight specifies the minimum number of data instances required in a node before further splitting, ensuring robust detection of low-frequency attacks.
  • colsample_bytree indicates the fraction of features to be used when constructing each tree, promoting diversity in the model’s detection strategies.
  • subsample defines the proportion of the dataset used for training each tree, aiding in generalizing the model to unseen attacks.
  • alpha represents the L1 regularization term to reduce model complexity and prevent overfitting to specific attack scenarios.
  • eval_metric specifies the metric used to evaluate the model’s performance on validation data, ensuring accurate detection of attacks during training.
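For illustration, the snippet below shows how the listed hyperparameters map onto a standard XGBoost classifier. The numeric values are placeholders (in the proposed framework they are chosen by CSO), the attention stage is omitted, and only the public xgboost scikit-learn API is assumed.

```python
from xgboost import XGBClassifier

# Representative (placeholder) values; in the proposed framework these are tuned by CSO.
clf = XGBClassifier(
    learning_rate=0.1,        # step size for weight updates
    max_depth=6,              # maximum tree depth, capturing complex attack patterns
    gamma=0.5,                # minimum gain required to split a node
    min_child_weight=3,       # minimum child weight before further splitting
    colsample_bytree=0.8,     # fraction of features sampled per tree
    subsample=0.8,            # fraction of training rows sampled per tree
    reg_alpha=0.1,            # L1 regularization (the "alpha" term above)
    eval_metric="mlogloss",   # metric used on the validation data during training
)
# clf.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
```

Each candidate parameter vector proposed by CSO would be written into such a configuration and then scored by the fitness function described next.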
The fitness function initiates the training of each model based on the hyperparameters generated (Algorithm 4, lines 1–2). This training unfolds on the designated training dataset $(X_{train}, Y_{train})$. Note: the ModelCheckpoint callback retains the CNN model’s best-performing weights during training.
A series of iterative and continuous steps are implemented, guided by the optimization algorithm (Algorithm 4, lines 3–7). The algorithm leverages the feedback from the fitness function to adapt and refine its search for hyperparameters. This iterative loop is maintained until convergence is achieved, signifying the discovery of a set of hyperparameters that maximize the model validation accuracy.
The fitness function systematically communicates the evaluation outcomes back to the optimization algorithm. This communication is encapsulated in a tuple comprising the model’s accuracy and loss. As a result of this information, the optimization algorithm gains insights into the performance of the model under the given set of hyperparameters.
A critical phase ensues as the fitness function evaluates the trained model on the validation dataset $(X_{valid}, Y_{valid})$. The model’s performance is assessed in terms of validation accuracy and loss, with a focus on identifying the optimal configuration that yields the highest accuracy (Algorithm 4, lines 8–10).
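A minimal sketch of such a fitness function is shown below, assuming a Keras-style workflow; build_cnn is a hypothetical builder for the 2D-CNN, and the returned (accuracy, loss) tuple is what the optimizer consumes. This mirrors the described procedure in spirit and is not the exact Algorithm 4 implementation.

```python
from tensorflow.keras.callbacks import ModelCheckpoint

def fitness_fn(hyperparams, X_train, Y_train, X_valid, Y_valid):
    """Train a candidate CNN and report (val_accuracy, val_loss) to the optimizer."""
    model = build_cnn(hyperparams)             # hypothetical builder for the 2D-CNN
    checkpoint = ModelCheckpoint(
        "best_candidate.keras",
        monitor="val_accuracy",
        save_best_only=True,                   # keep only the best weights seen so far
    )
    history = model.fit(
        X_train, Y_train,
        validation_data=(X_valid, Y_valid),
        epochs=hyperparams.get("epochs", 2),
        batch_size=hyperparams.get("batch_size", 32),
        callbacks=[checkpoint],
        verbose=0,
    )
    val_acc = max(history.history["val_accuracy"])
    val_loss = min(history.history["val_loss"])
    return val_acc, val_loss
```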

4.5. Performance Metrics

To evaluate the performance of an intrusion detection system (IDS) tailored for a 5G wireless network, several key metrics must be considered. IDS detection performance can be significantly improved by selecting features that are strongly correlated with intrusion patterns. Therefore, the metrics in Table 7 are used to assess and improve the system.
In the above table, the following are represented (a short computation sketch follows these definitions):
  • T P = true positives (correctly identified attacks).
  • T N = true negatives (correctly identified normal traffic).
  • F P = false positives (normal traffic incorrectly identified as attacks).
  • F N = false negatives (attacks incorrectly identified as normal traffic).
  • Observed agreement ( P o ): Proportion of times the IDS agrees with the actual classifications.
  • Expected agreement ( P e ): Proportion of times the IDS would be expected to agree with the actual classifications by chance.
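As a brief, generic computation sketch (not taken from the paper’s code), the metrics in Table 7 can be derived from these confusion counts as follows, including Cohen’s Kappa via the observed and expected agreement defined above.

```python
def ids_metrics(tp, tn, fp, fn):
    """Compute common IDS metrics from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0            # detection rate / sensitivity
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0               # false-positive rate

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_o = accuracy
    p_attack = ((tp + fp) / total) * ((tp + fn) / total)     # chance agreement on "attack"
    p_normal = ((tn + fn) / total) * ((tn + fp) / total)     # chance agreement on "normal"
    p_e = p_attack + p_normal
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "fpr": fpr, "kappa": kappa}

# Example with hypothetical counts: 990 detected attacks, 8 missed, 5 false alarms.
print(ids_metrics(tp=990, tn=997, fp=5, fn=8))
```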

5. Results Presentation and Analysis

In this section, we provide a detailed evaluation of the IDS’s effectiveness in identifying and categorizing different types of attacks. Two separate experiments evaluated the model’s versatility and performance by performing various classification tasks. The primary objective was to assess the model’s ability to classify data into nine distinct categories. After that, we conducted a second experiment to evaluate the model’s discriminative capabilities in a binary classification scenario, distinguishing between classes labeled as “benign” and “malicious”.

5.1. Multilabel Classification Performance

To classify diverse attacks in 5G/6G communication networks, we divided the dataset into three sections for convolution-based models, with 70% allocated to training, 10% to validation, and 20% to testing. In total, there are 1,215,890 samples across nine classes. The training, validation, and testing datasets contain 875,276, 97,253, and 243,361 samples, respectively.
The attention-based model’s dataset was divided into three sets, each with 30 features: a training set with 875,276 samples, a validation set with 97,253 samples, and a test set with 243,133 samples. The neural network model was trained incrementally on five chunks of the training data. For each chunk, the model was trained for five epochs with a batch size of 32, and then it was validated against the validation set. The training history for each chunk was recorded to analyze the model’s performance over time.
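The chunk-wise incremental training described above can be sketched as follows; the model object and data arrays are assumed to be a compiled Keras classifier and NumPy arrays, and the chunk count, epochs, and batch size mirror the values reported in this section.

```python
import numpy as np

def train_in_chunks(model, X_train, Y_train, X_valid, Y_valid,
                    n_chunks=5, epochs=5, batch_size=32):
    """Incrementally fit one model on successive chunks of the training data."""
    histories = []
    for chunk_x, chunk_y in zip(np.array_split(X_train, n_chunks),
                                np.array_split(Y_train, n_chunks)):
        h = model.fit(chunk_x, chunk_y,
                      epochs=epochs,
                      batch_size=batch_size,
                      validation_data=(X_valid, Y_valid),
                      verbose=0)
        histories.append(h.history)   # keep per-chunk curves for later analysis
    return histories
```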
After the training process was completed, we obtained the optimal values achieved by the baseline models under the guidance of the optimization algorithm. Table 8 presents a comprehensive overview of the fundamental results obtained for multilabel attack detection and of the attempts made to address class distribution imbalances within the dataset.
Table 9 presents a comprehensive evaluation report. This report goes beyond conventional accuracy measures and incorporates both “macro accuracy” and “weighted accuracy”, offering a more thorough understanding of the model’s effectiveness across diverse class distributions.
Visual illustrations of the proposed model accuracy are provided in Figure 9, Figure 10 and Figure 11. Despite certain classes in the dataset having fewer samples than others, such as ICMPFlood with only 216 samples, the model accurately identified elements within these classes.
Figure 12 illustrates the Receiver Operating Characteristic (ROC) curves for the proposed multiclass classification model across nine distinct classes. Each curve represents the model’s ability to distinguish one class from the others using the One-vs-Rest approach. The ROC curves for the nine classes show varying degrees of class-specific discrimination. The Area Under the Curve (AUC) was calculated for each class, with values close to 1.0 indicating strong classification performance.
The low overlap between ROC curves and the consistent performance across all classes suggest that the model has clearly defined and effective decision boundaries.
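For reference, per-class ROC curves and AUC values of the kind plotted in Figure 12 can be obtained with scikit-learn’s One-vs-Rest utilities, as in the generic sketch below; y_true and y_score stand for the test labels and the model’s class-probability outputs and are not tied to the exact plotting code used for the figure.

```python
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc

def one_vs_rest_roc(y_true, y_score, n_classes=9):
    """Return {class_index: (fpr, tpr, auc)} using the One-vs-Rest scheme."""
    y_bin = label_binarize(y_true, classes=np.arange(n_classes))
    curves = {}
    for c in range(n_classes):
        fpr, tpr, _ = roc_curve(y_bin[:, c], y_score[:, c])
        curves[c] = (fpr, tpr, auc(fpr, tpr))
    return curves
```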
The convergence of an optimization algorithm is commonly evaluated by examining the variation between successive iterations of the objective function. Convergence is established when this difference drops below a predetermined threshold. The optimization algorithm has successfully converged if subsequent iterations fail to improve the objective function significantly. Optimal validation accuracy and minimum validation loss are considered when formulating the fitness solution. Consequently, convergence criteria are established, which specify that the algorithm is terminated when the gradient norm remains below 1.0. Figure 13a,b illustrate the dynamic evolution of the training process for the CSO-2D-CNN model. The Cat Swarm Optimization (CSO) was executed over two epochs, as depicted in Figure 14a,b within the context of the devised framework. Following multiple iterations, a solution of x = 99.90% was attained, indicating that the algorithm effectively converged to a solution very close to its optimal value. The total training, testing, and validation accuracy values for each model are given in Figure 15a,b.
Figure 16a,b illustrate the testing progress. Both figures collectively underscore the noteworthy efficacy of the proposed model. Notably, the model achieves the highest accuracy while minimizing the loss function value. Specifically, the model achieves a remarkable accuracy level of 99.97%, concurrently minimizing the loss function value to 0.00075 by the end of the training process.

5.2. Binary Attacks Classification Performance

In the second experiment, the proposed approach is validated in the context of binary data classification. In the output layer, we made specific adjustments to the baseline model structure, tailored for binary classification following the methodology used in the initial experiment.
The classification report in Table 10 demonstrates high performance with perfect precision, recall, and F1-score values for both benign and malicious traffic. Based on the data in Table 11, the model achieved an overall accuracy of 100% on an imbalanced dataset of 243,133 samples, demonstrating its robustness and reliability. The consistently high metrics illustrate the model’s exceptional capability in binary class prediction. Figure 17 and Figure 18 visually illustrate the effectiveness of the proposed model for binary classification.
Establishing robust termination criteria is imperative for efficient optimization algorithms. The CSO algorithm exhibited effective convergence characteristics with a termination threshold set at a gradient norm not exceeding 1.0. This ensures that the algorithm converges to a solution while avoiding unnecessary iterations. The CSO is executed over two epochs with 40 subiterations for the binary class classification task. Following multiple iterations, a solution of x = 99.99% is achieved, indicating that the algorithm has effectively converged to a solution close to its optimal value.

5.3. The Effect of Optimized Models on Trainable Parameters

Given the list of model parameters provided in Table 12, the proposed CSO-XGBoost-Attention model for binary attack detection reduces the number of parameters required for traffic flow analysis. This reduction is illustrated in Figure 19. Compared to the CSO-2D-CNN model, the multi-attack CSO-XGBoost-Attention detection method uses fewer learnable parameters.
The training time (the length of time needed to train each model) and inference time (the time required to forecast each sample) for each of our models are presented in Table 13 and Table 14. Table 13 illustrates the training times for each model in binary and multilabel classification scenarios. XGBoost-Attention is the most efficient model in terms of training, requiring only 60 s for binary training and 90 s for multilabel training, whereas CSO-2D-CNN requires 180 s and 300 s, respectively. The hybrid CSO-XGBoost-Attention model takes 120 s and 240 s to train. As shown in Table 14, the CSO-XGBoost-Attention model exhibits remarkable inference efficiency for binary classification, with an inference time of 0.001 s per sample. For multilabel classification, the inference time is 0.008 s per sample. CSO-2D-CNN takes 0.002 and 0.06 s, while XGBoost-Attention takes 0.01 and 0.222 s.
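The training and per-sample inference times reported in Tables 13 and 14 can be measured with a simple wall-clock harness such as the sketch below; this is an illustrative measurement scheme, with the model object and data arrays as placeholders, rather than the benchmarking code used in the study.

```python
import time

def measure_times(model, X_train, Y_train, X_test):
    """Return (training_time_s, inference_time_per_sample_s) using wall-clock timing."""
    t0 = time.perf_counter()
    model.fit(X_train, Y_train)
    train_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    model.predict(X_test)
    infer_time = (time.perf_counter() - t0) / len(X_test)
    return train_time, infer_time
```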

5.4. Ablation Study

An ablation study was conducted to assess the impact of key components on the framework’s performance. The following model variations were considered for multilabel 5G/6G flow analysis:
  • W/o CSO: The XGBoost-Attention model is trained to incorporate CSO optimization.
  • W/o CSE: Optimization is excluded from the XGBoost-Attention model.
  • W/o WXG: The XGBoost model is trained without CSO optimization or the attention mechanism.
  • W/o CSNN: The 2D-CNN model is trained using CSO optimization.
  • W/o CSNNE: The 2D-CNN model is trained without using CSO optimization.
  • W/o Hybrid: A dynamic weight adjustment mechanism is used for combining the predictions of the CSO-2D-CNN and CSO-XGBoost-Attention models.
To test the model’s ability to adapt to various network situations and attack types, the w/o Hybrid methodology calculates a dynamic score for each classifier’s output based on the input feature distribution instead of assigning fixed weights to the CNN and XGBoost forecasts. The final prediction is calculated as a weighted sum of the two models’ probability outputs, with weights derived from each model’s F1-score on the validation set; the weights are produced by a soft attention mechanism trained in conjunction with the models. For each test sample, the CSO-2D-CNN’s initial prediction was used to determine the predicted class, and the corresponding dynamic weights were applied to combine the probability outputs of both models. This ensures that the model with better performance for a specific attack type contributes more to the final prediction. The ablation results are presented in Table 15. XGBoost produced 97% accuracy, 97% precision, and 96.2% Cohen’s Kappa on its own, whereas the suggested CSO-XGBoost-Attention model achieved 99.97% accuracy. According to this comparison, the attention mechanism significantly improves the system’s performance. Compared to XGBoost alone, the attention mechanism improves accuracy, precision, recall, and F1-score by 3.07%, 3.09%, 3.08%, and 3.89%, respectively, with a corresponding gain in Cohen’s Kappa. These improvements demonstrate that attention is essential for reducing classification errors (such as false positives and false negatives) and improving the overall resilience of our intrusion detection system.
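A minimal sketch of the dynamic weighted fusion described above is given below. It assumes both classifiers expose probability outputs, derives per-class weights from validation F1-scores through a softmax (a simplification of the soft attention mechanism mentioned in the text), and applies the weights of the class initially predicted by the CNN.

```python
import numpy as np
from sklearn.metrics import f1_score

def fit_dynamic_weights(proba_cnn_val, proba_xgb_val, y_val):
    """Per-class fusion weights derived from validation F1-scores (softmax-normalized)."""
    labels = np.arange(proba_cnn_val.shape[1])
    f1_cnn = f1_score(y_val, proba_cnn_val.argmax(axis=1), labels=labels, average=None)
    f1_xgb = f1_score(y_val, proba_xgb_val.argmax(axis=1), labels=labels, average=None)
    scores = np.stack([f1_cnn, f1_xgb])              # shape: (2 models, n_classes)
    return np.exp(scores) / np.exp(scores).sum(axis=0)  # softmax over the two models

def fuse_predictions(proba_cnn, proba_xgb, weights):
    """Combine probabilities using the weights of the CNN's initially predicted class."""
    initial = proba_cnn.argmax(axis=1)               # CNN's first-pass prediction
    w_cnn = weights[0, initial][:, None]
    w_xgb = weights[1, initial][:, None]
    fused = w_cnn * proba_cnn + w_xgb * proba_xgb
    return fused.argmax(axis=1)
```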

5.5. Comparison

Compared to previous work, our study presents a deep learning model designed to address the specific challenges and requirements of intrusion detection in 5G/6G wireless scenarios (Table 16). A significant difference between our study and previous publications was that our study addressed the often overlooked problem of unbalanced multilabel data classification [59]. Additionally, a model’s ability to generalize may be diminished by overfitting when it is overly complex or includes extraneous features. Using lightweight, fine-tuned models, we ensured the dependability of our model, which had not been thoroughly explored in previous studies. This method improves the reliability of our findings, and the model is more viable in real-life multi-threat cybersecurity scenarios over wireless networks.

6. Discussion

Modern networks, such as 6G, are increasingly complex, making intrusion detection systems (IDSs) more critical than ever. IDSs are crucial cybersecurity tools that protect networks from malicious attacks and threats. Deep learning models have proven successful in other domains, so the research community is increasingly adopting them for intrusion detection applications. However, training these models effectively requires a significant amount of labeled data and manual configuration, which can be time-consuming and costly.
Due to the variety and rapid development of modern attack methods targeting contemporary networks, conventional intrusion detection systems (IDSs) may struggle to keep up with the latest threats. Developing precise detection models is particularly challenging due to the lack of labeled data specific to wireless environments. Selecting an appropriate model requires careful consideration of several factors, including the dataset’s characteristics, the application’s particular needs, and the available computational resources. Modern optimization techniques, notably the CSO algorithm, enhance the model’s adaptability and efficiency. The algorithm’s versatility is further demonstrated by its successful application in discrete multilabel optimization problems, making it well suited for scenarios involving diverse decision-making.
This study explores novel optimization strategies for detecting multilabel attacks in 5G and 6G communication, which differ from traditional approaches. The emphasis on simplicity, robustness, and integrating adaptive optimization techniques forms a foundation for further innovations. The first developed model uses a lightweight 2D-CNN architecture, leveraging the convolutional nature of CNNs to learn IDS features from the input dataset. An attention-based XGBoost ensemble model was also developed for binary and multilabeled IDSs. After being trained on the extracted features, each model generates probability predictions for each class. Known for its efficiency and high performance in handling structured data, XGBoost excels in capturing complex patterns through its ensemble of decision trees. Moreover, the attention mechanism within the neural network focuses on the most relevant parts of the input features, enhancing the model’s ability to capture intricate relationships and dependencies within the data.
A deep learning model’s efficacy depends on accurately classifying data. Class-wise predictions provide insights into potential challenges and successful outcomes in classification. The CSO-2D-CNN model demonstrated exceptional accuracy, achieving 99.97% in the first experiment for multiclass classification (as summarized in Table 8), which highlights its ability to categorize instances across diverse classes within the dataset. Similarly, the CSO-XGBoost-Attention model exhibited a high accuracy of 99.95%. The accuracy of both models, at or above 99.95%, demonstrates their reliability and effectiveness in accurately predicting cyberattacks. The end-to-end training of the 2D-CNN model allows it to map high-dimensional inputs directly to outputs, eliminating the need for additional feature extraction or selection. In contrast, traditional machine learning (ML) models, such as XGBoost, may struggle to represent complex patterns and structures.
According to Table 9, the devised models performed well across various data distributions, achieving 100% accuracy in most cases. A detailed examination of confusion matrices for each class, shown in Figure 9, Figure 10 and Figure 11, enhances our understanding of the model’s performance. The model demonstrated resilience even with classes characterized by a limited number of samples, such as ICMPFlood, providing evidence of its adaptability to varying data distributions. Using the Cat Swarm Optimization (CSO) algorithm, the optimization process applied to the attention model significantly enhanced its predictive performance. As shown in Figure 12, the non-optimized ensemble of traditional XGBoost with the attention neural network has a noticeably lower AUC of 89% for HTTPFlood compared to the other scenarios. HTTPFlood DoS attacks differ from others regarding packet volume and speed, making detection more challenging due to the slow transmission rate.
The convergence plot, depicted in Figure 13 and Figure 14, visually encapsulates the progression of the optimization process. The process illustrates the algorithm’s iterative refinement, resulting in a solution that aligns closely with the optimal value. This convergence indicates that the algorithm has successfully navigated the search space to find an optimal solution. The ensemble model, combining XGBoost and attention models, initially demonstrated a commendable accuracy of 96.47%, showing its effectiveness in classifying the data. The performance of the attention-based XGBoost model was significantly enhanced by using Cat Swarm Optimization (CSO) to adjust its hyperparameters, resulting in an accuracy of 99.96% (Figure 15). CSO’s ability to efficiently explore the hyperparameter space and identify optimal configurations contributed to this improvement. The optimal fitness score of the multilabel attention-based XGBoost model is 0.389, where a lower fitness score typically indicates a better configuration. As a result, the improved Attention-XGBoost model can generate precise predictions across a wide range of labels while achieving a high fitness level in hyperparameters. This increase in accuracy illustrates how the CSO approach has enhanced the model’s generalization ability and performance on challenging tasks. Figure 16 analyzes the model’s loss. The model’s ability to achieve high accuracy and minimal loss values during the training and testing phases substantiates its robust generalization capabilities. An accuracy level of 99.97% and a minimal loss function value of 0.00075 demonstrate the importance of hyperparameter optimization in optimizing machine learning models.
If an intrusion detection system simply classifies traffic as malicious or benign, it may be unable to distinguish between different types of attack, each of which may exhibit unique patterns, behaviors, and effects on the network. Efficiency metrics, as described in Table 10 and Table 11, provide insight into the model’s robustness across the binary classes within the dataset. This emphasizes its adaptability to varying data distributions, a critical attribute for real-world applicability. Additionally, the optimization improved other performance indicators, such as precision, recall, F1-score, and Kappa score, indicating a more robust and effective detection system. Figure 17 and Figure 18 offer a visual representation of the accuracy values obtained from the CSO-2D-CNN and ensemble models. The non-optimized XGBoost-Attention model achieves a maximum accuracy of 80% in distinguishing between positive and negative cases in an imbalanced dataset. In contrast, the optimized models perfectly differentiate between positive and negative cases, demonstrating their ability to learn intricate patterns and generalize to previously unseen data. This indicates that the CSO-2D-CNN and CSO-XGBoost-Attention models do not merely memorize the training data but apply their learning to new, unknown instances, which is crucial for real-world applications.
The number of parameters that can be trained is a critical concern when using a lightweight model. Trainable parameters are the variables that are adjusted during training; in a neural network, they consist of the biases and weights of each layer. Using CSO-based selection procedures, the number of trainable parameters for each model is lower than when using random methods (Figure 19). This substantial improvement highlights the impact of optimization on the model’s performance, as the refined attention model contributed to more accurate and reliable predictions. The 2D-CNN model, after hybridization with CSO, required significant memory allocation for training on benign vs. malicious traffic or multilabel attacks over two epochs, consisting of two main iterations and 40 subiterations. In contrast, with the incremental training methodology and data chunking, the ensemble model, which used decision trees, completed the process using approximately 31 KB of memory for multiclass classification across five epochs. Decision trees require fewer trainable parameters than CNNs, as they are less computationally complex. A decision tree-based model offers a straightforward and effective training method, particularly for handling unbalanced data, which can be challenging for CNNs. In contrast, the forward propagation of CNNs generates multiple feature maps, also known as activations, at each convolutional layer. These feature maps, which are stored in memory for backpropagation, represent the results of every filter applied to the input data. This process may lead to higher memory consumption during training. An essential technique for training CNNs is mini-batch gradient descent, which involves processing multiple samples concurrently in batches. For future work, we plan to apply the mini-batch procedure to test the effectiveness of the developed CSO-2D-CNN approach over different batch sizes, evaluating how it stores data, gradients, and intermediate activations for each sample in a batch. Additionally, while the proposed framework was assessed using one of the largest single-center datasets available, a significant limitation is its lack of validation across multiple datasets, which is crucial for generalizing the findings. To achieve better generalization, incorporating a multicenter dataset is essential.
Table 16 summarizes recent studies in various domains compared to our findings. XGBoost, Light-GBM, and CatBoost techniques were used by Idrissi et al. [60] to develop an IDS for IoMT devices. Our model outperformed theirs in terms of accuracy, precision, and recall. Although their study reported a detection rate of more than 99% for many evaluation criteria, it requires extensive training when applied to large datasets to adjust the best model parameters and analyze complex traffic flow patterns. In the study by Jadav et al. [61], a long short-term memory (LSTM) model was used to categorize wearable devices as either harmful or legitimate. Despite achieving an accuracy of 93.8% and 92.92% in the training and testing sets, the model performed significantly worse in precision, recall, and F1-score. Bouke et al. [59] presented a lightweight model whose feature selection process retained 25 out of 45 features. The authors aimed to balance improving the model’s predictive power and minimizing its complexity, utilizing conventional ML- and DNN-based algorithms. Despite achieving a maximum accuracy of 99.6%, the model only considered binary cases and did not account for the evolving nature of cybersecurity attacks in real-world applications. Our study focused on developing lightweight models for multilabel attack detection that can achieve high accuracy while accounting for the complex nature of wireless environments. The current proposed work addresses the security needs of application-enabled 6G networks, which are vulnerable to severe cyberattacks with significant consequences. Although it employs binary and multilabel analysis techniques to improve detection accuracy, it has not been tested across different network layers. This limitation affects the models’ validation in varied network contexts. Future research should focus on validating the model with multiple datasets and across different network layers, ensuring that each layer targets specific detection areas such as network abnormalities, behavioral patterns, and signature-based identification.
The inference and training times for the developed models are presented in Table 13 and Table 14. We used the XGBoost-Attention model as the baseline to measure the improvement in inference time because it has the slowest inference time. The improved ratio was calculated as a percentage to reflect the improvement in inference time. CSO-2D-CNN outperforms XGBoost-Attention in binary classification, with an enhancement ratio of 80%, and in multilabel classification, with a ratio of 72.97%. In binary classification, the CSO-XGBoost-Attention model has a better ratio of 99%. In multilabel classification, the ratio is 96.40%, significantly higher than the baseline. The results of this study demonstrate the high efficiency with which the CSO-XGBoost-Attention model makes inferences, making it an ideal candidate for real-time intrusion detection. From Table 15, the CSO-2D-CNN model achieved a relatively high accuracy of 99.97% for multilabel attack detection, with a substantial F1-score (99.9%), indicating that almost all predicted positives were correct. The recall (100%) indicates that it identified 100% of the actual positives. Additionally, the CSO-XGBoost-Attention model had a slightly lower performance with an accuracy of 99.95%, sensitivity of 100%, and specificity of 100%. The F1-score (99.96%) and Cohen’s Kappa (99.9%) reflect its ability to classify both positives and negatives correctly. Compared to the static weight fusion, the results show a slight performance decrease with the dynamic weight adjustment (fusion of both models with a maximum of 98.5% accuracy). We hypothesize that the original static weights were highly optimized for the dataset, making the dynamic approach less impactful in this context. Additionally, the dynamic weights rely on the validation set’s F1-scores, which may not perfectly generalize to the test set, potentially introducing minor inconsistencies. The reliance on the CNN’s initial prediction to assign weights could also propagate errors in cases of incorrect initial predictions. Therefore, while the optimized models were sufficient to achieve near-perfect performance, the dynamic weight adjustment mechanism to fuse both models’ results is a promising approach that could be more effective in more complex or varied network scenarios. In future work, we plan to enhance the dynamic fusion mechanism by incorporating additional performance metrics (e.g., precision or recall) or contextual factors (e.g., network conditions) to improve its adaptability further.

7. Conclusions

5G and 6G wireless technologies are rapidly expanding in various industries worldwide. Several government agencies and industry organizations have recognized 5G’s unprecedented speed and connectivity. The current study emphasizes maintaining model simplicity while still ensuring robustness, addressing concerns related to generalization and potential bias toward specific data. The first model architecture consists of three key components: an input layer configured to match the dataset’s dimensions, convolutional layers for feature extraction, and fully connected layers for classification. The second model is an ensemble-based attention model, which plays a crucial role in the workflow by enhancing feature extraction and improving data interpretability. After training the XGBoost model and using it to transform the dataset into new feature representations, the attention mechanism within a neural network becomes central to the process. A Cat Swarm Optimization (CSO) approach was used to optimize the hyperparameters of each model, thereby increasing detection accuracy. The CSO-XGBoost-Attention and CSO-2D-CNN models achieve accuracies above 99% for both binary and multilabel network attack classification. These results highlight the superior inference efficiency of the enhanced optimized models, making them highly suitable for real-time intrusion detection applications in the wireless network landscape. The deployed dataset, which includes features such as flow duration (“Dur”), packet sequence (“Seq”), protocol (“Proto”), and attack type (“Attack Type”), is designed to capture instantaneous network traffic characteristics rather than temporal sequences. For example, the “Dur” feature represents the duration of individual flows but lacks context for modeling attack sequences across time. Similarly, the “Seq” feature denotes packet order within a flow, not a global temporal sequence. Therefore, the data exhibited no significant recurring attack sequences, likely because the model focuses on isolated attack instances rather than longitudinal attack campaigns. In future work, we aim to address this limitation by exploring datasets with temporal attributes or collecting longitudinal network traffic data to enable attack forecasting. Such data would allow the integration of time series models or pattern-based prediction to anticipate future attacks. Additionally, we plan to investigate hybrid approaches that combine real-time detection with probabilistic forecasting based on contextual network conditions. These efforts will build on the current system’s strengths while addressing the predictive requirements.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are openly available at https://doi.org/10.21227/xtep-hv36.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Chavhan, S. Shift to 6G: Exploration on trends, vision, requirements, technologies, research, and standardization efforts. Sustain. Energy Technol. Assess. 2022, 54, 102666. [Google Scholar]
  2. Salahdine, F.; Han, T.; Zhang, N. 5G, 6G, and Beyond: Recent advances and future challenges. Ann. Telecommun. 2023, 78, 525–549. [Google Scholar] [CrossRef]
  3. Zawish, M.; Dharejo, F.A.; Khowaja, S.A.; Raza, S.; Davy, S.; Dev, K.; Bellavista, P. AI and 6G into the metaverse: Fundamentals, challenges and future research trends. IEEE Open J. Commun. Soc. 2024, 5, 730–778. [Google Scholar] [CrossRef]
  4. Reshmi, T.; Abhishek, K. 5G and 6G Security Issues and Countermeasures. In Secure Communication in Internet of Things; CRC Press: Boca Raton, FL, USA, 2024; pp. 300–310. [Google Scholar]
  5. Siriwardhana, Y.; Porambage, P.; Liyanage, M.; Ylianttila, M. AI and 6G security: Opportunities and challenges. In Proceedings of the IEEE 2021 Joint European Conference on Networks and Communications & 6G Summit (EuCNC/6G Summit), Porto, Portugal, 8–11 June 2021; pp. 616–621. [Google Scholar]
  6. Jahankhani, H.; Kendzierskyj, S.; Hussien, O. Approaches and Methods for Regulation of Security Risks in 5G and 6G. In Wireless Networks: Cyber Security Threats and Countermeasures; Springer: Berlin/Heidelberg, Germany, 2023; pp. 43–70. [Google Scholar]
  7. Akshay Kumaar, M.; Samiayya, D.; Vincent, P.D.R.; Srinivasan, K.; Chang, C.Y.; Ganesh, H. A hybrid framework for intrusion detection in healthcare systems using deep learning. Front. Public Health 2022, 9, 824898. [Google Scholar] [CrossRef]
  8. Vu, L.; Nguyen, Q.U.; Nguyen, D.N.; Hoang, D.T.; Dutkiewicz, E. Deep generative learning models for cloud intrusion detection systems. IEEE Trans. Cybern. 2022, 53, 565–577. [Google Scholar] [CrossRef]
  9. Karthikeyan, M.; Manimegalai, D.; RajaGopal, K. Firefly algorithm based WSN-IoT security enhancement with machine learning for intrusion detection. Sci. Rep. 2024, 14, 231. [Google Scholar] [CrossRef]
  10. Krishnan, R.; Krishnan, R.S.; Robinson, Y.H.; Julie, E.G.; Long, H.V.; Sangeetha, A.; Subramanian, M.; Kumar, R. An intrusion detection and prevention protocol for internet of things based wireless sensor networks. Wirel. Pers. Commun. 2022, 124, 3461–3483. [Google Scholar] [CrossRef]
  11. Idris, S.; Ishaq, O.O.; Juliana, N.N. Intrusion Detection System Based on Support Vector Machine Optimised with Cat Swarm Optimization Algorithm. In Proceedings of the IEEE 2019 2nd International Conference of the IEEE Nigeria Computer Chapter (NigeriaComputConf), Zaria, Nigeria, 14–17 October 2019; pp. 1–8. [Google Scholar]
  12. Jovanovic, D.; Marjanovic, M.; Antonijevic, M.; Zivkovic, M.; Budimirovic, N.; Bacanin, N. Feature selection by improved sand cat swarm optimizer for intrusion detection. In Proceedings of the IEEE 2022 International Conference on Artificial Intelligence in Everything (AIE), Lefkosa, Cyprus, 2–4 August 2022; pp. 685–690. [Google Scholar]
  13. Chandol, M.K.; Rao, M.K. Border collie cat optimization for intrusion detection system in healthcare IoT network using deep recurrent neural network. Comput. J. 2022, 65, 3181–3198. [Google Scholar] [CrossRef]
  14. Khan, F.; Kanwal, S.; Alamri, S.; Mumtaz, B. Hyper-parameter optimization of classifiers, using an artificial immune network and its application to software bug prediction. IEEE Access 2020, 8, 20954–20964. [Google Scholar] [CrossRef]
  15. Wolpert, D.H.; Macready, W.G. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82. [Google Scholar] [CrossRef]
  16. Mustafa, Z.; Amin, R.; Aldabbas, H.; Ahmed, N. Intrusion detection systems for software-defined networks: A comprehensive study on machine learning-based techniques. Clust. Comput. 2024, 27, 9635–9661. [Google Scholar] [CrossRef]
  17. Rajesh, K.; Vetrivelan, P. Comprehensive analysis on 5G and 6G wireless network security and privacy. Telecommun. Syst. 2025, 88, 52. [Google Scholar] [CrossRef]
  18. Pradhan, A.; Singh, N.; Kumar, N.; Agarwal, T.; Rampal, S. Machine Learning Techniques for Intrusion Detection in Software-Defined Networks. In Proceedings of the IEEE 2025 International Conference on Automation and Computation (AUTOCOM), Dehradun, India, 4–6 March 2025; pp. 613–618. [Google Scholar]
  19. Negi, L.; Kumar, D. ECC based certificateless aggregate signature scheme for healthcare wireless sensor networks. J. Reliab. Intell. Environ. 2024, 10, 489–500. [Google Scholar] [CrossRef]
  20. Gurusamy, V.; Praveenkumar; Jebaraj, J.R.; Ranjithi, M.; Raphael, B.L. A lightweight multi-layer authentication protocol for wireless sensor networks in IoT applications. In AIP Conference Proceedings; AIP Publishing LLC: Melville, NY, USA, 2024; Volume 2966, p. 020003. [Google Scholar]
  21. Azam, Z.; Islam, M.M.; Huda, M.N. Comparative analysis of intrusion detection systems and machine learning based model analysis through decision tree. IEEE Access 2023, 11, 80348–80391. [Google Scholar] [CrossRef]
  22. Kavitha, S.; Uma Maheswari, N.; Venkatesh, R. Intelligent intrusion detection system using enhanced arithmetic optimization algorithm with deep learning model. Teh. Vjesn. 2023, 30, 1217–1224. [Google Scholar]
  23. Altamimi, S.; Abu Al-Haija, Q. Maximizing intrusion detection efficiency for IoT networks using extreme learning machine. Discov. Internet Things 2024, 4, 1–37. [Google Scholar] [CrossRef]
  24. Bostani, H.; Sheikhan, M. Hybrid of anomaly-based and specification-based IDS for Internet of Things using unsupervised OPF based on MapReduce approach. Comput. Commun. 2017, 98, 52–71. [Google Scholar] [CrossRef]
  25. Sajid, M.; Malik, K.R.; Almogren, A.; Malik, T.S.; Khan, A.H.; Tanveer, J.; Rehman, A.U. Enhancing intrusion detection: A hybrid machine and deep learning approach. J. Cloud Comput. 2024, 13, 123. [Google Scholar] [CrossRef]
  26. Nazir, A.; Khan, R.A. A novel combinatorial optimization based feature selection method for network intrusion detection. Comput. Secur. 2021, 102, 102164. [Google Scholar] [CrossRef]
  27. Awajan, A. A novel deep learning-based intrusion detection system for IOT networks. Computers 2023, 12, 34. [Google Scholar] [CrossRef]
  28. Azar, A.T.; Shehab, E.; Mattar, A.M.; Hameed, I.A.; Elsaid, S.A. Deep learning based hybrid intrusion detection systems to protect satellite networks. J. Netw. Syst. Manag. 2023, 31, 82. [Google Scholar] [CrossRef]
  29. Yadav, N.; Pande, S.; Khamparia, A.; Gupta, D. Intrusion detection system on IoT with 5G network using deep learning. Wirel. Commun. Mob. Comput. 2022, 2022, 9304689. [Google Scholar] [CrossRef]
  30. Benmessahel, I.; Xie, K.; Chellal, M. A new evolutionary neural networks based on intrusion detection systems using multiverse optimization. Appl. Intell. 2018, 48, 2315–2327. [Google Scholar] [CrossRef]
  31. Madhuridevi, L.; Sree Rathna Lakshmi, N. Metaheuristic assisted hybrid deep classifiers for intrusion detection: A bigdata perspective. Wirel. Netw. 2024, 31, 1205–1225. [Google Scholar] [CrossRef]
  32. Gupta, S.K.; Tripathi, M.; Grover, J. Hybrid optimization and deep learning based intrusion detection system. Comput. Electr. Eng. 2022, 100, 107876. [Google Scholar] [CrossRef]
  33. Alzubi, Q.M.; Anbar, M.; Sanjalawe, Y.; Al-Betar, M.A.; Abdullah, R. Intrusion detection system based on hybridizing a modified binary grey wolf optimization and particle swarm optimization. Expert Syst. Appl. 2022, 204, 117597. [Google Scholar] [CrossRef]
  34. Stiawan, D.; Heryanto, A.; Bardadi, A.; Rini, D.P.; Subroto, I.M.I.; Idris, M.Y.B.; Abdullah, A.H.; Kerim, B.; Budiarto, R. An approach for optimizing ensemble intrusion detection systems. IEEE Access 2020, 9, 6930–6947. [Google Scholar] [CrossRef]
  35. Injadat, M.; Moubayed, A.; Nassif, A.B.; Shami, A. Multi-stage optimized machine learning framework for network intrusion detection. IEEE Trans. Netw. Serv. Manag. 2020, 18, 1803–1816. [Google Scholar] [CrossRef]
  36. Kasongo, S.M. An advanced intrusion detection system for IIoT based on GA and tree based algorithms. IEEE Access 2021, 9, 113199–113212. [Google Scholar] [CrossRef]
  37. Alzaqebah, A.; Aljarah, I.; Al-Kadi, O.; Damaševičius, R. A modified grey wolf optimization algorithm for an intrusion detection system. Mathematics 2022, 10, 999. [Google Scholar] [CrossRef]
  38. Kunhare, N.; Tiwari, R.; Dhar, J. Intrusion detection system using hybrid classifiers with meta-heuristic algorithms for the optimization and feature selection by genetic algorithm. Comput. Electr. Eng. 2022, 103, 108383. [Google Scholar] [CrossRef]
  39. Eesa, A.S.; Orman, Z.; Brifcani, A.M.A. A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems. Expert Syst. Appl. 2015, 42, 2670–2679. [Google Scholar] [CrossRef]
  40. Wang, Z. Deep learning-based intrusion detection with adversaries. IEEE Access 2018, 6, 38367–38384. [Google Scholar] [CrossRef]
  41. Afzaliseresht, N.; Miao, Y.; Michalska, S.; Liu, Q.; Wang, H. From logs to stories: Human-centred data mining for cyber threat intelligence. IEEE Access 2020, 8, 19089–19099. [Google Scholar] [CrossRef]
  42. Jayalaxmi, P.; Saha, R.; Kumar, G.; Alazab, M.; Conti, M.; Cheng, X. Pignus: A deep learning model for ids in industrial Internet-of-things. Comput. Secur. 2023, 132, 103315. [Google Scholar] [CrossRef]
  43. Mezina, A.; Burget, R.; Travieso-González, C.M. Network anomaly detection with temporal convolutional network and U-Net model. IEEE Access 2021, 9, 143608–143622. [Google Scholar] [CrossRef]
  44. Wang, Z.; Chen, H.; Yang, S.; Luo, X.; Li, D.; Wang, J. A lightweight intrusion detection method for IoT based on deep learning and dynamic quantization. PeerJ Comput. Sci. 2023, 9, e1569. [Google Scholar] [CrossRef]
  45. Yang, K.; Wang, J.; Zhao, G.; Wang, X.; Cong, W.; Yuan, M.; Luo, J.; Dong, X.; Wang, J.; Tao, J. NIDS-CNNRF Integrating CNN and random forest for efficient network intrusion detection model. Internet Things 2025, 32, 101607. [Google Scholar] [CrossRef]
  46. Alqahtany, S.S.; Shaikh, A.; Alqazzaz, A. Enhanced Grey Wolf Optimization (EGWO) and random forest based mechanism for intrusion detection in IoT networks. Sci. Rep. 2025, 15, 1916. [Google Scholar] [CrossRef]
  47. Punitha, A.; Ramani, P.; Ezhilarasi, P.; Sridhar, S. Dynamically stabilized recurrent neural network optimized with intensified sand cat swarm optimization for intrusion detection in wireless sensor network. Comput. Secur. 2025, 148, 104094. [Google Scholar] [CrossRef]
  48. Aljabri, J. Attack resilient IoT security framework using multi head attention based representation learning with improved white shark optimization algorithm. Sci. Rep. 2025, 15, 14255. [Google Scholar] [CrossRef] [PubMed]
  49. Zivkovic, M.; Bacanin, N.; Arandjelovic, J.; Rakic, A.; Strumberger, I.; Venkatachalam, K.; Joseph, P.M. Novel harris hawks optimization and deep neural network approach for intrusion detection. In Proceedings of the International Joint Conference on Advances in Computational Intelligence: IJCACI 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 239–250. [Google Scholar]
  50. Dakic, P.; Zivkovic, M.; Jovanovic, L.; Bacanin, N.; Antonijevic, M.; Kaljevic, J.; Simic, V. Intrusion detection using metaheuristic optimization within IoT/IIoT systems and software of autonomous vehicles. Sci. Rep. 2024, 14, 22884. [Google Scholar] [CrossRef] [PubMed]
  51. Chu, S.C.; Tsai, P.W.; Pan, J.S. Cat swarm optimization. In Proceedings of the PRICAI 2006: Trends in Artificial Intelligence: 9th Pacific Rim International Conference on Artificial Intelligence, Guilin, China, 7–11 August 2006; Proceedings 9. Springer: Berlin/Heidelberg, Germany, 2006; pp. 854–858. [Google Scholar]
  52. Samarakoon, S.; Siriwardhana, Y.; Porambage, P.; Liyanage, M.; Chang, S.Y.; Kim, J.; Kim, J.; Ylianttila, M. 5g-nidd: A comprehensive network intrusion detection dataset generated over 5g wireless network. arXiv 2022, arXiv:2212.01298. [Google Scholar]
  53. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  54. Zhang, P.; Jia, Y.; Shang, Y. Research and application of XGBoost in imbalanced data. Int. J. Distrib. Sens. Netw. 2022, 18, 15501329221106935. [Google Scholar] [CrossRef]
  55. Samidi, F.S.; Mohamed Radzi, N.A.; Mohd Azmi, K.H.; Mohd Aripin, N.; Azhar, N.A. 5G technology: ML hyperparameter tuning analysis for subcarrier spacing prediction model. Appl. Sci. 2022, 12, 8271. [Google Scholar] [CrossRef]
  56. Aljebreen, M.; Alrayes, F.S.; Maray, M.; Aljameel, S.S.; Salama, A.S.; Motwakel, A. Modified Equilibrium Optimization Algorithm with Deep Learning-Based DDoS Attack Classification in 5G Networks. IEEE Access 2023, 11, 108561–108570. [Google Scholar] [CrossRef]
  57. Chu, S.C.; Tsai, P.W. Computational intelligence based on the behavior of cats. Int. J. Innov. Comput. Inf. Control 2007, 3, 163–173. [Google Scholar]
  58. Yang, X.S. Engineering Optimization: An Introduction with Metaheuristic Applications; John Wiley & Sons: Hoboken, NJ, USA, 2010. [Google Scholar]
  59. Bouke, M.A.; El Atigh, H.; Abdullah, A. Towards robust and efficient intrusion detection in IoMT: A deep learning approach addressing data leakage and enhancing model generalizability. Multimed. Tools Appl. 2024, 1–20. [Google Scholar] [CrossRef]
  60. Idrissi, I.; Boukabous, M.; Grari, M.; Azizi, M.; Moussaoui, O. An intrusion detection system using machine learning for internet of medical things. In Proceedings of the International Conference on Electronic Engineering and Renewable Energy Systems; Springer: Berlin/Heidelberg, Germany, 2022; pp. 641–649. [Google Scholar]
  61. Jadav, D.; Jadav, N.K.; Gupta, R.; Tanwar, S.; Alfarraj, O.; Tolba, A.; Raboaca, M.S.; Marina, V. A trustworthy healthcare management framework using amalgamation of AI and blockchain network. Mathematics 2023, 11, 637. [Google Scholar] [CrossRef]
  62. Hadi, H.J.; Cao, Y.; Li, S.; Xu, L.; Hu, Y.; Li, M. Real-time fusion multi-tier DNN-based collaborative IDPS with complementary features for secure UAV-enabled 6G networks. Expert Syst. Appl. 2024, 252, 124215. [Google Scholar] [CrossRef]
  63. Bouke, M.A.; Abdullah, A. An empirical assessment of ML models for 5G network intrusion detection: A data leakage-free approach. E-Prime Electr. Eng. Electron. Energy 2024, 8, 100590. [Google Scholar] [CrossRef]
  64. Dhanya, L.; Chitra, R. A novel autoencoder based feature independent GA optimised XGBoost classifier for IoMT malware detection. Expert Syst. Appl. 2024, 237, 121618. [Google Scholar] [CrossRef]
  65. Wang, Z.; Fok, K.W.; Thing, V.L. Exploring Emerging Trends in 5G Malicious Traffic Analysis and Incremental Learning Intrusion Detection Strategies. arXiv 2024, arXiv:2402.14353. [Google Scholar]
  66. Korba, A.A.; Boualouache, A.; Ghamri-Doudane, Y. Zero-X: A Blockchain-Enabled Open-Set Federated Learning Framework for Zero-Day Attack Detection in IoV. IEEE Trans. Veh. Technol. 2024, 73, 12399–12414. [Google Scholar]
  67. Farzaneh, B.; Shahriar, N.; Al Muktadir, A.H.; Towhid, M.S. DTL-IDS: Deep transfer learning-based intrusion detection system in 5G networks. In Proceedings of the IEEE 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 30 October–2 November 2023; pp. 1–5. [Google Scholar]
Figure 1. The proposed IDS workflow.
Figure 2. The frequency of attack types in the 5G-NIDD dataset.
Figure 3. The frequency of attack tools used in the 5G-NIDD dataset.
Figure 4. The missing values in each column in the dataset.
Figure 5. Proposed CNN architecture for multilabel attack detection.
Figure 6. Feature extraction using the XGBoost model. Input: all learned trees $f_1(x), f_2(x), \ldots, f_K(x)$; output: final prediction $\hat{y} = \sum_{k=1}^{K} f_k(x)$.
Figure 7. Multilabel attack classification using attention mechanism.
Figure 8. Workflow of Cat Swarm Optimization algorithm for IDS.
Figure 9. Confusion matrix of the XGBoost attention-based model for multilabel attack detection.
Figure 10. Confusion matrix of the optimized XGBoost attention-based model for multilabel attack detection.
Figure 11. Confusion matrix of the optimized 2D-CNN model for multilabel attack detection.
Figure 12. Receiver Operating Characteristic (ROC) for multilabel attack classification. (a) XGBoost attention-based model, (b) optimized XGBoost attention-based model, (c) optimized 2D-CNN model.
Figure 13. The accuracy curve for the devised models. (a) Optimized XGBoost attention-based model, (b) optimized 2D-CNN model.
Figure 14. The convergence curve for the CS-2D-CNN model. (a) Optimized XGBoost attention-based model, (b) optimized 2D-CNN model.
Figure 15. Accuracy for multilabel attack detection framework. (a) Optimized XGBoost-Attention, (b) optimized 2D-CNN.
Figure 16. The loss curve for the devised models. (a) Optimized XGBoost attention-based model, (b) optimized 2D-CNN model.
Figure 17. Confusion matrices for binary-label attack detection framework.
Figure 18. Receiver Operating Characteristic (ROC) for binary-label attack classification. (a) XGBoost attention-based model, (b) optimized XGBoost attention-based model, (c) optimized 2D-CNN model.
Figure 19. Performance metrics for optimized models.
Table 1. Summary of traditional and non-swarm-based intrusion detection approaches.
Ref. | Methodology | Advantages | Limitations
[42] | Cascade Forward Backpropagation NNs with Autoencoders for feature selection. | Provides an ideal feature set and uses predefined signatures to trace the attack pattern; has 100% detection accuracy for multiclass attacks. | The model may become complex and less interpretable due to backpropagation and sparsity-based training; strong fitting to training data may hinder generalization to unseen data.
[19] | An aggregate signature scheme for healthcare wireless sensor networks. | A formal security study based upon a random Oracle model demonstrates the scheme’s resistance to forgery attacks. | These cryptography-based solutions are computationally intensive and challenging to implement on resource-constrained devices.
[20] | Lightweight multilayer token-based authentication for WBAN sensing. | The devised protocol ensures the security, confidentiality, and integrity of the WBAN data by utilizing secure authentication and creating group keys.
[43] | A method based on the U-Net model and Temporal Convolution Networks. | In real-time detection applications, the approach maximizes model training efficiency while lowering computation overhead. | The convolutional layers and the rigid design of U-Net might not be as flexible in a dynamic environment, where new attack methods could emerge over time.
[44] | A hybrid approach combining PCA and DNNs with bidirectional LSTM networks. | An incremental PCA considerably enhances detection performance by reducing data dimensionality. | The network deep architecture may require substantial computing power when operating on devices that have limited capabilities.
[45] | A novel IDS that integrates a CNN for feature extraction, PCA to address feature redundancies, and RF for classifying traffic attacks. | Precise identification of various attacks. | PCA is based on linear subspaces, and it is difficult to represent the non-linear attack patterns that often occur in WSNs within a lower-dimensional linear subspace.
Table 2. Overview of swarm-inspired optimization methods applied to IDSs. GWO: Enhanced Grey Wolf Optimization, ELM: Extreme Learning Machine. RNN: recurrent neural network. WBOA: Wolf–Bird Optimization Algorithm. SCSO: Sand Cat Swarm Optimization. PSO: Particle Swarm Optimization. HHO: Harris Hawks Optimization algorithm.
Ref. | Methodology | Advantages | Limitations
[46] | A hybrid approach that uses enhanced GWO for feature selection to eliminate redundant qualities. RF is used to assess the importance of features for IoT intrusion detection. | Combining multiple decision trees enables the model to achieve an accuracy of 98.95%. | In highly scalable environments, this method may lead to additional location reevaluations within the search space, increasing computational complexity.
[37] | An IDS using a modified GWO and ELM classification. | ELMs are known for their extremely fast training time, as they do not require iterative weight updates. | It cannot handle multiclass attacks with detection accuracy of 81%.
[47] | A dynamically stabilized RNN with enhanced SCSO and the WBOA to detect intrusions in wireless sensor networks. | A multiscale improved differential filter is used in the preprocessing step to eliminate biased and redundant records from incoming data. | Overhead problems and processing complexity are associated with deploying hybrid optimization for feature selection and hyperparameter optimization.
[48] | Bidirectional gated recurrent unit with multihead attention intrusion detection mechanism in IoT networks. The SCSO model is used for the feature selection. | The approach effectively ensures robust performance with an average accuracy of 98.28%. | The model was evaluated on a small dataset of only 30,000 records. A small sample size may not fully reflect the variability in more extensive, real-world situations, which could affect the model’s general performance.
[49] | An advanced HHO is combined with a DNN. | The approach reduces dimensionality and improves feature selection, yielding an accuracy rate of 99.5%. | Overhead problems and processing complexity are associated with deploying the HHO algorithm for DNN hyperparameter optimization.
[50] | A hybrid approach integrating XGBoost and KNN classifiers and a modified version of the PSO algorithm. | Improves the precision and effectiveness of attack detection systems and enhancement of the possibility of effective threat avoidance. | The top models achieved over 89% accuracy, demonstrating encouraging performance; however, the presence of misclassifications suggests that further refinement and validation are needed to enhance dependability in practical applications.
Table 3. Gaps in IDS for WSNs, identified limitations, and contributions of lightweight models for real-time deployment.

Gap: Data with high dimensionality
  Limitations:
  • Struggles with feature selection.
  • Low accuracy.
  • High computation times.
  Proposed method contributions:
  • The proposed hybrid framework consists of two separate lightweight end-to-end models: a convolutional neural network (CNN) and an attention-based XGBoost. The two models operate independently to learn and classify features without requiring any special feature-extraction procedures, enabling fine-grained classification of all attack categories and improving detection accuracy.

Gap: Exploration and exploitation balance
  Limitations:
  • Inefficiency.
  • Limited model performance.
  • Premature convergence.
  Proposed method contributions:
  • The lightweight model variants support effective decision-making in real-time IDSs by balancing exploration and exploitation without relying on excessive computational power.
  • Multihead attention allows the IDS to monitor multiple key features within network-traffic sequences simultaneously and to capture relationships between attack types. By recognizing patterns such as DoS attacks preceding Sinkhole attacks, along with intricate temporal correlations, the model can identify complex attack patterns, improving the accuracy of binary and multilabel threat detection in WSNs (a minimal sketch follows this table).
  • CSO effectively tunes architecture parameters such as filter size, learning rate, and dropout rate.

Gap: Hyperparameter optimization
  Limitations:
  • Manual tuning is inefficient.
  • Prone to suboptimal settings.
  • High computation cost.
  Proposed method contributions:
  • CSO improves overall optimization efficacy and convergence speed in a computationally efficient manner by reducing the number of search parameters.
  • CSO's dual-mode search (seeking and tracing) simulates cat behavior and maintains a dynamic balance that avoids local optima and premature convergence.
  • CSO adapts to the high-dimensional, noisy, and changing data patterns found in network traffic by dynamically adjusting its search behavior based on feedback from the population.

Gap: Deployment scalability challenges
  Limitations:
  • Processing delays.
  • Resource-intensive.
  • Large-scale data-processing delays.
  Proposed method contributions:
  • An optimized lightweight 2D-CNN designed for low-latency processing and reduced computational overhead with minimal delay.
  • An optimized attention-based XGBoost classifier that uses regularized boosting and incremental training to handle evolving threat patterns.
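As a concrete illustration of the multihead-attention step referenced above, the following minimal sketch applies self-attention to a window of traffic feature vectors. It is not the paper's exact architecture: the embedding size, head count, and tensor shapes are illustrative assumptions.

```python
# Minimal sketch of multihead self-attention over a window of traffic feature
# vectors. Embedding size, head count, and shapes are illustrative assumptions,
# not the configuration used in the proposed framework.
import torch
import torch.nn as nn

class TrafficAttention(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)   # project raw flow features
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window_length, n_features) -- consecutive flow records
        h = self.embed(x)
        attended, _ = self.attn(h, h, h)              # each record attends to the others
        return self.norm(h + attended)                # residual connection + layer norm

# Example: 8 windows of 10 flow records, each described by 40 features.
out = TrafficAttention(n_features=40)(torch.randn(8, 10, 40))   # -> (8, 10, 64)
```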
Table 4. Summary of dataset characteristics.

| Measurement | Value |
|---|---|
| Dataset size | 14.4 MB |
| Number of normal samples | 477,737 (39.29%) |
| Number of attack samples | 738,153 (60.7%) |
| Total number of samples | 1,215,890 |
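Given the class imbalance in Table 4, a stratified split keeps the benign/attack ratio identical in the training and evaluation partitions. The sketch below is a minimal illustration using scikit-learn on synthetic stand-in data; the 20% test fraction is an assumption inferred from the test support of 243,133 samples reported later, not a value stated in the table.

```python
# Minimal sketch of an 80/20 stratified split that preserves the 39.3%/60.7%
# benign/attack ratio from Table 4. Synthetic data; the 20% test fraction is an
# assumption, not a figure taken from the paper.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(393), np.ones(607)])   # 0 = benign, 1 = attack
X = rng.normal(size=(y.size, 40))                   # 40 illustrative flow features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
print(y_test.mean())   # ~0.607: class balance preserved in the held-out split
```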
Table 5. Definitions of various network attacks and scans in the 5G-NIDD dataset.

| Attack Name | Definition |
|---|---|
| ICMP Flood | An overwhelming number of ICMP packets are sent to exhaust the target's network resources. |
| UDP Flood | A large number of UDP packets are sent to random ports on a victim's network. |
| SYN Flood | Numerous SYN requests are sent to a target's server without ever completing the handshake. |
| HTTP Flood | An excessive number of HTTP requests are sent to a web server to exhaust its resources and prevent legitimate users from accessing it. |
| Slowrate DoS | Incomplete HTTP requests are sent slowly to a server, gradually draining its resources and resulting in a denial of service. |
| SYN Scan | SYN packets are sent to identify open ports on a target machine without completing the TCP handshake. |
| TCP Connect Scan | A full port scan that completes TCP handshakes to determine which ports are open, providing detailed information about the target's available services. |
| UDP Scan | UDP packets are sent to a variety of ports to identify open UDP services on a target machine. |
Table 6. Key CSO features in the IDS context.

| Feature | Mathematical Component | Role in IDS Context |
|---|---|---|
| Dual-Mode Search | Seeking: softmax over sampled positions. Tracing: directional vector toward the global best. | Balances discovering new attack patterns and refining existing ones. |
| Adaptability | Mixture ratio (MR), learning decay $\lambda$. | Adapts flexibly to changing cyber-threats and traffic patterns. |
| Efficiency | $O(N \cdot d)$ complexity for position updates and fitness evaluation. | Enables lightweight processing, making it suitable for real-time intrusion detection. |
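To make the dual-mode search concrete, the following is a minimal sketch of CSO's seeking and tracing modes over a d-dimensional hyperparameter vector, written against the components listed in Table 6. The fitness function, mixture ratio, number of seeking copies, and velocity scaling are illustrative assumptions rather than the paper's exact settings.

```python
# Minimal CSO sketch: seeking (local softmax-weighted sampling) vs. tracing
# (move toward the global best). Objective and constants are placeholders.
import numpy as np

rng = np.random.default_rng(1)

def fitness(x):
    # Placeholder objective: in the IDS setting this would be, e.g., the
    # validation loss of the model trained with hyperparameters x.
    return np.sum((x - 0.3) ** 2)

N, d, MR, iters = 20, 4, 0.3, 50           # population, dims, mixture ratio, iterations
cats = rng.uniform(0, 1, size=(N, d))      # candidate hyperparameter vectors in [0, 1]
vel = np.zeros((N, d))
best = min(cats, key=fitness).copy()

for _ in range(iters):
    tracing = rng.uniform(size=N) < MR     # a fraction MR of cats trace, the rest seek
    for i in range(N):
        if tracing[i]:
            # Tracing mode: directional move toward the global best position.
            vel[i] += rng.uniform(0, 2, d) * (best - cats[i])
            cats[i] = np.clip(cats[i] + vel[i], 0, 1)
        else:
            # Seeking mode: sample local copies, pick one via a softmax over fitness.
            copies = np.clip(cats[i] + rng.normal(0, 0.1, size=(5, d)), 0, 1)
            scores = np.array([fitness(c) for c in copies])
            probs = np.exp(-scores) / np.exp(-scores).sum()
            cats[i] = copies[rng.choice(5, p=probs)]
        if fitness(cats[i]) < fitness(best):
            best = cats[i].copy()

print(best)   # best hyperparameter vector found under the placeholder objective
```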
Table 7. Performance metrics for the optimized 5G IDS.

| Metric | Definition | Formula |
|---|---|---|
| Accuracy | Proportion of correctly identified instances (attacks and normal traffic) among all evaluated cases. | $\frac{TP+TN}{TP+TN+FP+FN}$ |
| F1-score | Harmonic mean of precision and recall. | $\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ |
| Kappa (Cohen's Kappa) | Measures agreement between predicted and actual classifications beyond chance. | $\kappa = \frac{P_o - P_e}{1 - P_e}$ |
| Specificity | Proportion of normal traffic correctly identified by the IDS. | $\frac{TN}{TN+FP}$ |
| Sensitivity | Proportion of actual attacks correctly identified by the IDS. | $\frac{TP}{TP+FN}$ |
| Precision | Proportion of correctly identified attacks among all instances flagged as attacks by the IDS. | $\frac{TP}{TP+FP}$ |
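For reference, the sketch below computes the metrics in Table 7 directly from confusion-matrix counts. The example counts are arbitrary; the function only illustrates the formulas.

```python
# Compute the Table 7 metrics from binary confusion-matrix counts.
def ids_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # recall / true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's Kappa: observed agreement vs. agreement expected by chance.
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "specificity": specificity,
            "f1": f1, "kappa": kappa}

# Arbitrary example counts, only to exercise the formulas.
print(ids_metrics(tp=950, tn=980, fp=20, fn=50))
```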
Table 8. Performance of the devised models for multilabel attack detection. All values except loss are percentages.

| Model | Accuracy | Kappa | Loss | F1-Score | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| CSO-2D-CNN | 99.97 | 99.9 | 0.00075 | 99.9 | 100 | 100 |
| XGBoost-Attention | 96.47 | 94.79 | 0.003 | 96.35 | 94 | 99 |
| CSO-XGBoost-Attention | 99.95 | 99.94 | 0.0008 | 99.96 | 100 | 100 |
Table 9. Classification report comparison for imbalanced multilabel attacks. The left block reports the CSO-XGBoost-Attention model and the right block the CSO-2D-CNN model.

| Class | Precision | Recall | F1-Score | Support | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|---|---|---|
| Benign | 1.00 | 1.00 | 1.00 | 95,406 | 1.00 | 1.00 | 1.00 | 95,406 |
| HTTPFlood | 1.00 | 1.00 | 1.00 | 28,108 | 1.00 | 1.00 | 1.00 | 28,108 |
| ICMPFlood | 0.99 | 1.00 | 1.00 | 216 | 1.00 | 1.00 | 1.00 | 216 |
| SYNFlood | 1.00 | 1.00 | 1.00 | 2032 | 1.00 | 1.00 | 1.00 | 2032 |
| SYNScan | 1.00 | 1.00 | 1.00 | 4067 | 1.00 | 1.00 | 1.00 | 4067 |
| SlowrateDoS | 1.00 | 1.00 | 1.00 | 14,735 | 1.00 | 1.00 | 1.00 | 14,735 |
| TCPConnectScan | 1.00 | 1.00 | 1.00 | 4034 | 1.00 | 1.00 | 1.00 | 4034 |
| UDPFlood | 1.00 | 1.00 | 1.00 | 91,324 | 1.00 | 1.00 | 1.00 | 91,324 |
| UDPScan | 1.00 | 1.00 | 1.00 | 3211 | 1.00 | 1.00 | 1.00 | 3211 |
| Accuracy | | | 1.00 | 243,133 | | | 1.00 | 243,133 |
| Macro Avg | 1.00 | 1.00 | 1.00 | 243,133 | 1.00 | 1.00 | 1.00 | 243,133 |
| Weighted Avg | 1.00 | 1.00 | 1.00 | 243,133 | 1.00 | 1.00 | 1.00 | 243,133 |
Table 10. Performance of the devised models for distinguishing malicious from benign traffic. All values except loss are percentages.

| Model | Accuracy | Kappa | Loss | F1-Score | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| CSO-2D-CNN | 99.99 | 99.94 | 0.000633 | 99.97 | 99.97 | 99.96 |
| XGBoost-Attention | 84.09 | 64.11 | 0.475 | 83.02 | 80 | 80 |
| CSO-XGBoost-Attention | 99.97 | 99.94 | 0.00013 | 99.97 | 99.97 | 99.97 |
Table 11. Combined classification report for binary attack classification. The left block reports the CSO-XGBoost-Attention model and the right block the CSO-2D-CNN model.

| Class | Precision | Recall | F1-Score | Support | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|---|---|---|
| Benign | 1.00 | 1.00 | 1.00 | 95,406 | 1.00 | 1.00 | 1.00 | 95,406 |
| Malicious | 1.00 | 1.00 | 1.00 | 147,727 | 1.00 | 1.00 | 1.00 | 147,727 |
| Accuracy | | | 1.00 | 243,133 | | | 1.00 | 243,133 |
| Macro avg | 1.00 | 1.00 | 1.00 | 243,133 | 1.00 | 1.00 | 1.00 | 243,133 |
| Weighted avg | 1.00 | 1.00 | 1.00 | 243,133 | 1.00 | 1.00 | 1.00 | 243,133 |
Table 12. Training hyperparameters of the proposed architectures.

CSO-2D-CNN architecture:

| Parameter | Name | Value |
|---|---|---|
| h1 | Epochs | 10 |
| h2 | Activation function | ReLU |
| h3 | Batch size | 16 |
| h4 | Test batch size | 16 |
| h5 | Channels | 3 |
| h6 | # of classes | [2, 9] |
| h7 | Dropout rate | 0.5 |
| h8 | Learning rate | 0.01 |
| h9 | Optimization | CSO |
| h10 | Weight decay | 0.0001 |
| h11 | Layers | 11 |

XGBoost-Attention architecture:

| Parameter | Name | Range | Value |
|---|---|---|---|
| h12 | colsample_bytree | [0.5, 1] | 0.5 |
| h13 | Min child weight | [1, 30] | 1 |
| h14 | No. of estimators | [1, 1000] | 100 |
| h15 | # of classes | [2, 9] | [2, 9] |
| h16 | Max tree depth | [1, 15] | 6 |
| h17 | Learning rate | [0.001, 1] | 0.1 |
| h18 | Optimization | – | CSO |
| h19 | subsample | [0.5, 1] | 0.976 |
| h20 | gamma | [0, 1] | 0.0623 |
| h21 | Max number of leaf nodes | – | 40 |
| h22 | eval_metric | – | "logloss" |
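As a rough illustration of how the XGBoost-Attention values in Table 12 map onto an actual classifier configuration, the sketch below instantiates an XGBClassifier with those settings (assuming xgboost >= 1.6). Mapping the table's names onto xgboost parameter names is an assumption, and the attention front-end and CSO tuning loop are omitted.

```python
# Illustrative XGBoost configuration using the Table 12 values (h12-h22).
# Parameter-name mapping is an assumption; the CSO search would explore the
# listed ranges rather than these fixed values.
from xgboost import XGBClassifier

clf = XGBClassifier(
    colsample_bytree=0.5,        # h12
    min_child_weight=1,          # h13
    n_estimators=100,            # h14
    max_depth=6,                 # h16 (max tree depth)
    learning_rate=0.1,           # h17
    subsample=0.976,             # h19
    gamma=0.0623,                # h20
    max_leaves=40,               # h21 (enforced by histogram-based tree methods)
    eval_metric="logloss",       # h22
    objective="multi:softprob",  # multiclass setting; h15 allows 2-9 classes
)
# clf.fit(X_train, y_train) would then be wrapped by the CSO hyperparameter
# search rather than trained once with these fixed values.
```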
Table 13. Training time comparison (in seconds) for binary and multiclass/multilabel classification.

| Model | Binary Classification (s) | Multiclass Classification (s) |
|---|---|---|
| CSO-2D-CNN | 180 | 300 |
| XGBoost-Attention | 60 | 90 |
| CSO-XGBoost-Attention | 120 | 240 |
Table 14. Inference time per sample (in seconds) for binary and multiclass classification.

| Model | Binary Classification (s) | Multiclass Classification (s) |
|---|---|---|
| CSO-2D-CNN | 0.002 | 0.06 |
| XGBoost-Attention | 0.01 | 0.222 |
| CSO-XGBoost-Attention | 0.0001 | 0.008 |
Table 15. Performance metrics for the ablated model variants (%).

| Model | Accuracy | Sensitivity | Specificity | F1-Score | Kappa |
|---|---|---|---|---|---|
| W/o CSO | 99.95 | 100.0 | 100.0 | 99.96 | 99.9 |
| W/o CSE | 96.47 | 94.0 | 99.0 | 96.35 | 94.79 |
| W/o WXG | 97.0 | 97.0 | 97.0 | 97.0 | 96.2 |
| W/o CSNN | 99.97 | 100 | 100 | 99.9 | 99.9 |
| W/o CSNNE | 97.36 | 97.62 | 98.57 | 97.50 | 97.71 |
| W/o Fusion | 98.5 | 98.4 | 98.35 | 98.38 | 98.3 |
Table 16. Comparison of performances across different studies for binary and multilabel traffic analysis.

| Study | Methodology | Target Traffic | Dataset | Acc | P | S | F1 | K |
|---|---|---|---|---|---|---|---|---|
| Bouke [59] | Sequential-based DL | Legitimate vs. malicious | WUSTL EHMS2020 | 99% | 99% | 99% | 99% | – |
| Idris [60] | GBM (XGBoost, LightGBM, CatBoost) | Legitimate vs. malicious | WUSTL EHMS2020 | 99.28% | 99.46% | 94.79% | 97.07% | – |
| Jadav [61] | LSTM | Legitimate vs. malicious | WUSTL EHMS2020 | 93% | 88% | 100% | 93% | – |
| Hadi [62] | Fusion of multitier (CNN, GAN, MLP) | Legitimate vs. malicious | 5G-NIDD | 99.15% | – | – | – | – |
| Bouke [63] | K-Nearest Neighbors | Legitimate vs. malicious | 5G-NIDD | 99.6% | 99.7% | 99.3% | 99.5% | – |
| Hadi [62] | CNN | Legitimate vs. malicious | 5G-NIDD | 87.71% | – | – | – | – |
| Bouke [63] | Neural networks | Legitimate vs. malicious | 5G-NIDD | 98% | 99.8% | 95% | 97.4% | – |
| Dhanya [64] | Stacked autoencoder with GA-optimized XGBoost classifier | Legitimate vs. malicious | WUSTL EHMS2020 | 98.98% | 93.02% | 98.97% | 95.91% | 95.33% |
| Wang [65] | Incremental learning (SVM, logistic regression, perceptron) | Legitimate vs. malicious | 5G-NIDD | 99.10% | 99.54% | 99.42% | 99.48% | – |
| Dhanya [64] | Stacked autoencoder with GA-optimized XGBoost classifier | Legitimate vs. malicious | Malware PE headers (ClaMP) | 98.69% | 99.79% | 98.48% | 99.13% | 96.37% |
| Korba [66] | DNN with open-set recognition | Multilabel | 5G-NIDD | 92.27% | 95.51% | – | 94.26% | – |
| Farzaneh [67] | BiLSTM and CNN | Multilabel | 5G-NIDD | 93.36% | 94.56% | 92.18% | 93.34% | – |
| Our study | Optimized 2D-CNN | Legitimate vs. malicious | 5G-NIDD | 99.99% | 99.97% | 99.97% | 99.97% | 99.9% |
| Our study | Optimized XGBoost | Legitimate vs. malicious | 5G-NIDD | 99.97% | 99.97% | 99.97% | 99.97% | 99.94% |
| Our study | Optimized 2D-CNN | Multilabel | 5G-NIDD | 99.97% | 100% | 100% | 99.9% | 99.9% |
| Our study | Optimized XGBoost | Multilabel | 5G-NIDD | 99.95% | 100% | 100% | 99.96% | 99.94% |
Acc: accuracy, P: precision, S: sensitivity, F1: F1-score, K: Cohen’s Kappa.