1. Introduction
The transition of wireless networks from 4G to 5G, which lays the groundwork for subsequent generations of wireless systems, brings great enhancements in speed, lower latency, and the potential to connect billions of devices. These capabilities serve as the cornerstone for cutting-edge applications spanning the Internet of Things (IoT), smart cities, autonomous systems, and remote healthcare. However, the defining characteristics of next-generation networks, such as extensive connectivity, high heterogeneity of devices, and flexible architectures, also heighten their vulnerability. The broadened attack surface and constantly shifting threat landscape mean that traditional security approaches often struggle to counteract sophisticated cyberattacks meticulously tailored for 5G and beyond [
1,
2]. From critical healthcare networks to vehicular communications, the repercussions of a network breach can be grave, underscoring the necessity for more dynamic and proactive security mechanisms.
Machine learning (ML) has rapidly emerged as a compelling solution to enhance the detection, prevention, and mitigation of security threats in current and next-generation wireless networks. By scrutinizing voluminous and continuous streams of network data, ML-based systems can rapidly unearth anomalies and evolve to counter newly emerging threats [
3,
4]. Reinforcement learning models have demonstrated the ability to refine defense strategies on the fly, particularly in environments characterized by heterogeneous IoT devices and ever-shifting traffic patterns [
5]. Through adaptive learning, these systems can effectively allocate network resources to protect mission-critical areas, identify zero-day vulnerabilities, and respond quickly to malicious activity. Accompanying these technological advances, the regulatory environment has become a vital part in determining the deployment of ML-based security solutions.
The European Union Artificial Intelligence Act provides a harmonized regulatory framework for artificial intelligence across the EU, classifying AI technologies according to their associated risk level. Because telecommunications is classified as critical infrastructure, the Act could impose stringent demands on testing, explainability, and lifecycle management for ML-based security tools [
6]. Such obligations underscore the importance of transparency, accountability, and resilience in AI-driven solutions, particularly when they are responsible for safeguarding pivotal infrastructures.
In response to all these challenges, we introduce WIDA—a conceptual, ML-driven algorithm designed to provide adaptive anomaly detection—tailored to the unique demands of 5G and future 6G networks. Our approach aims to automate key stages of threat detection while ensuring compliance with regulatory standards such as the EU AI Act. Although currently presented as a prototype rather than a fully deployed system, WIDA demonstrates how surveyed ML techniques can be effectively integrated to bolster wireless network security.
The main novelty of this work is threefold: first, we propose an adaptive ML-driven security framework that aligns key 6G security drivers with a layered architecture, leveraging advanced ML techniques to address the unique challenges of current and next-generation wireless networks; second, we propose an optimized Wireless Intrusion Detection algorithm, seamlessly integrated within this framework, aimed at enhancing security; third, we bridge the gap between cutting-edge technology and evolving legal frameworks, demonstrating how regulatory scrutiny influences the design and lifecycle management of ML-based security systems.
The structure of this paper is organized as follows:
Section 2 lays out the Architectures of 5G and 6G, providing a structural basis for subsequent discussions on intrusion detection. Also, it introduces a security framework developed for software-based infrastructures.
Section 3 delves into the machine learning techniques for intrusion detection, illustrating how models can be tailored to meet the high-speed, low-latency requirements of 5G and future 6G networks. Additionally, it proposes an optimized Wireless Intrusion Detection Algorithm specifically designed to meet the above stringent demands and evaluates its performance against existing ML-based approaches.
Section 4 explores real-time threat detection and response, emphasizing how ML can facilitate continuous monitoring and agile mitigation strategies.
Section 5 focuses on handling adversarial attacks, examining both the susceptibility of ML models and techniques to fortify them against manipulation.
Section 6 addresses privacy considerations, discussing how security solutions must align with data protection regulations and ethical concerns, particularly in light of the EU AI Act. Lastly,
Section 7 provides our conclusions, summarizing key findings, outlining limitations, and suggesting avenues for future research.
2. Architectures of 5G and 6G Networks
The evolution from fourth-generation (4G) Long-Term Evolution (LTE) systems to fifth-generation (5G) and eventually to the forthcoming 6G represents a paradigm shift in network design, operational flexibility, and service delivery. Whereas 4G focused primarily on increasing data rates and improving spectral efficiency, 5G introduces a broader transformation that leverages softwarization, virtualization, and distributed computing capabilities to support diverse use cases. Finally, 6G further extends the advanced capabilities of 5G while integrating emerging technologies such as AI-driven communication, advanced integrated sensing, and ubiquitous connectivity.
The use cases driving the evolution from 5G to 6G [
7], along with newly imposed characteristics, are depicted in
Figure 1.
Three use cases of 5G include enhanced mobile broadband (eMBB), ultra-reliable and low-latency communication (uRLLC), and massive machine-type communication (mMTC) [
8]. Looking ahead, the envisioned 6G is expected to build upon and extend the previous three categories—now referred to as Immersive Communication, Massive Communication, and Hyper-Reliable Low-Latency Communication (indicated by * in
Figure 1)—while integrating new technologies such as AI-driven communication, integrated sensing and communication, and ubiquitous connectivity. Notably, 5G and future 6G systems integrate three key principles—Software-Defined Networking (SDN), Network Function Virtualization (NFV), and edge computing—to deliver high-bandwidth, low-latency, and flexible network slicing for various vertical industries (e.g., Industry 4.0, autonomous vehicles, and telemedicine). While these technologies enable unprecedented levels of performance and tailoring of services, they also bring new security issues that need strong and flexible defense mechanisms. The subsections below detail each component of the architecture and give real case studies, leading to a discussion on securing future wireless networks.
▪ Software-Defined Networking (SDN): Software-Defined Networking decouples the control plane from the data plane, allowing centralized management of network elements and policies. This contrasts with conventional, tightly coupled hardware-based solutions in which configuration and operation reside within proprietary devices. By abstracting network intelligence into a logically centralized controller, SDN increases agility; therefore, network operators can dynamically configure traffic flow, enforce Quality of Service (QoS) policies, and deploy security rules in real time via programmable interfaces [
9].
However, SDN also introduces potential single points of failure; if an attacker compromises the SDN controller, they can gain sweeping control over the network, redirect traffic, or inject malicious flow rules. Consequently, rigorous security-by-design principles—such as controller redundancy, secure communication channels between the controller and switches, and continuous monitoring of controller activities—become essential [
10]. A 2020 proof-of-concept study demonstrated how a compromised SDN controller in a simulated 5G testbed could manipulate routing tables, leading to partial or complete Denial of Service (DoS) [
11].
The researchers implemented multi-controller architectures and controller-level intrusion detection modules to detect anomalous requests in real time, mitigating the issue. These findings highlight the need to integrate security features within the SDN orchestration layers, especially as 5G networks converge with IoT devices that require near-zero latency. The authors in [
12] highlight key security challenges and examine how integrating SDN and NFV (discussed further in the next paragraph) can strengthen the security and adaptability of 6G networks. These findings collectively highlight that, while SDN enables centralized policy enforcement and rapid reconfiguration, its inherent vulnerabilities—such as susceptibility to controller attacks, scalability constraints, and reliance on secure communication channels—must be systematically addressed to ensure resilient and secure network deployments.
▪ Network Function Virtualization (NFV): Network Function Virtualization departs from the legacy model of deploying network services on vendor-specific, specialized hardware (e.g., physical firewalls, load balancers). Instead, NFV allows service providers to instantiate Virtual Network Functions (VNFs) on commodity hardware, thereby lowering capital and operational expenses while boosting service innovation [
13]. Common VNFs can include virtual firewalls (vFWs), virtual intrusion detection systems (vIDSs), virtual evolved packet cores (vEPCs), and other middleboxes that are crucial to mobile network operations.
Despite these advantages, NFV presents new security challenges. Malicious actors can exploit hypervisor-level vulnerabilities to escape virtual machine boundaries—commonly referred to as “VM escape” attacks—thus compromising co-resident VNFs on the same physical host. Additionally, misconfigurations in network slicing and resource allocation can cause inter-tenant interference, enabling attackers to eavesdrop on or disrupt neighboring slices [
14]. These challenges necessitate rigorous isolation mechanisms, secure VNF lifecycle management, and automated detection of anomalous VNF behaviors using ML techniques. A recent large-scale NFV deployment in a European telecom operator’s 5G infrastructure showed that flexible resource allocation helped meet diverse Quality of Service demands for enterprise and consumer applications [
15]. Nevertheless, continuous scanning detected hypervisor-level vulnerabilities that could have been exploited to intercept VNF traffic. As a response, the operator integrated a dynamic VNF placement algorithm combined with real-time anomaly detection, effectively segregating critical virtual functions from non-critical ones and reducing the potential attack surface. Therefore, NFV reduces hardware dependencies, simplifies network updates, and enables rapid deployment of virtual security tools. However, it introduces security risks such as hypervisor vulnerabilities, VM escape attacks, and threats from shared infrastructure. Its complexity demands strong isolation and continuous monitoring to prevent misconfigurations and inter-tenant breaches.
▪ Edge Computing: Edge computing (often referred to as multi-access edge computing, MEC) pushes computational and storage resources toward the network edge to minimize latency and optimize bandwidth usage [
16]. Rather than routing all data to centralized data centers, edge nodes—installed in base stations, local data centers, or on-premise facilities—process large volumes of data closer to end users. This architectural shift is particularly beneficial for latency-sensitive use cases such as virtual/augmented reality, autonomous driving, and mission-critical IoT operations. However, with resources distributed across numerous, geographically dispersed nodes, attackers gain multiple entry points to the broader network. These edge sites often have fewer on-site security measures compared to large central data centers [
17]. A compromised edge node could, in the worst case, enable lateral movement into the core network or facilitate the injection of false data into aggregated analytics streams. Securing edge infrastructures thus requires advanced intrusion detection and prevention strategies tailored especially for resource-constrained environments, along with strong encryption, robust access controls, and trusted hardware attestation to verify the integrity of edge devices.
A 2021 pilot project in a smart city environment illustrating the advantages and vulnerabilities of edge computing in 5G was presented in [
18]. The city deployed a network of edge servers to manage traffic lights and pedestrian safety systems in real time. Although the architecture delivered near-instantaneous responses, a security assessment revealed potential breaches through unpatched edge servers. In response to such threats, the city’s Information Technology department deployed an orchestration platform that ensures consistent monitoring of server software versions, together with machine learning-driven anomaly detection systems at each edge node. This case underscores the critical need for continuous patch management, device attestation, and thorough threat detection across all edge locations. Therefore, despite the benefits of reduced latency and improved localized threat response, edge computing introduces vulnerabilities due to its distributed, resource-constrained nature. Challenges include inconsistent security enforcement, patching difficulties, and managing security across dispersed nodes.
A Comprehensive Security Framework for 6G Networks and Softwarized Infrastructures
Looking ahead to 6G and more advanced network paradigms, the trend toward highly distributed, dynamic, and softwarized infrastructures will intensify [
19]. Consequently, novel security frameworks are essential to address emerging threats in these evolving environments.
Figure 2 illustrates our proposed security framework drivers for 6G networks encompassing key characteristics for enhanced security.
More specifically, our framework includes the following key drivers:
▪ Network Slicing and Micro-Slicing: Building upon foundational 5G concepts, 6G introduces more granular network slicing—dynamically allocating network resources in smaller, context-specific segments—enabling the creation of highly specialized virtual networks tailored to specific applications and services, each with unique performance requirements. This includes micro-slicing, which allows for dynamic and fine-grained resource allocation to meet diverse performance, security, and isolation requirements.
▪ AI/ML Integration for Self-Organizing Networks (SONs): Artificial intelligence (AI) and machine learning (ML) are central to 6G’s SONs, facilitating real-time analysis of network conditions. These technologies enable proactive fault management, dynamic resource orchestration, and rapid threat detection, ensuring optimal network performance and resilience. More specifically, machine learning algorithms greatly enhance network security through adaptive, real-time anomaly detection capable of identifying previously unseen threats. However, these techniques are not without limitations, notably their need for extensive computational resources, their potential susceptibility to adversarial attacks, and issues around model transparency and explainability, which are particularly relevant under the EU AI Act. Reinforcement learning enables dynamic threat mitigation strategies that adapt to evolving security landscapes; however, it faces challenges such as slow convergence times and the inherent exploration–exploitation trade-off, which could delay effective threat response.
▪ Quantum-Safe Protocols and Encryption: As computing power grows, securing data against emerging quantum attacks becomes a priority, driving research into post-quantum cryptography [
20]. To counter this, future 6G networks will have to adopt post-quantum cryptographic algorithms, such as those standardized by NIST in 2024, including CRYSTALS-Kyber and CRYSTALS-Dilithium [
21] to ensure long-term data security. Despite offering strong protection against quantum attacks, these protocols face practical challenges such as computational overhead, larger key sizes, and implementation complexity, especially in resource-constrained environments like IoT and edge computing.
▪ Ubiquitous Sensors and Massive IoT: The proliferation of IoT devices both in current 5G and emerging 6G networks [
22] amplifies data traffic and widens the attack surface. To mitigate associated risks, advanced security measures are implemented at both the edge and core network layers, including anomaly detection systems and robust authentication protocols.
These four pillars are explicitly mapped to a layered defense strategy, adapted from the security model and comparable characteristics proposed in [
22,
23].
Table 1 presents a comparative analysis of this work—structured around the aforementioned pillars—against the existing literature on ML-based security in 5G/6G networks. The comparison emphasizes scope, techniques, focus, regulatory insights, and unique contributions.
Afterwards, to operationalize these concepts, we propose a three-layered security architecture depicted in
Figure 3, incorporating state-of-the-art techniques such as intelligent reflecting surfaces, blockchain-anchored policy enforcement, quantum-safe APIs, and runtime AI-driven attestation.
Our proposed three-layered architecture reinforces channel-level confidentiality at the physical layer through beam-forming, intelligent reflecting surfaces, and physical key generation to neutralize eavesdropping and jamming. At the connection layer, we integrate SDN/NFV telemetry with blockchain-anchored policy enforcement to contain slice-to-slice spill-over and thwart signalling-plane DoS. Finally, the service layer is protected by quantum-safe, zero-trust APIs—interfaces that enforce verification of every interaction regardless of origin, based on strict identity and policy authentication—and AI-driven runtime attestation that mitigate deepfake, malware, and insider threats across XR, autonomous driving, and tactile internet workloads.
The key innovations of our approach compared to conventional machine learning intrusion detection and prevention systems (ML-IDPS) are as follows:
▪ Layer-Oriented Telemetry Mapping: Directly binds physical-layer metrics (e.g., Received Signal Strength Indicator (RSSI), Channel Quality Indicator (CQI)) with SDN flow records to enhance context-aware anomaly detection.
▪ Shapley Additive Explanation (SHAP)-Driven Drift Monitoring at the Edge: Incorporates SHAP to monitor model drift, allowing proactive alerts before performance degrades (a minimal sketch is given after this list).
▪ Quantum-Safe APIs and Zero-Trust Enforcement: Implements CRYSTALS-Kyber for key encapsulation in zero-trust API interactions, ensuring confidentiality and authentication by default.
▪ Blockchain-Anchored Model Provenance: Leverages blockchain to secure ML model checkpoints, providing tamper-evident audit trails for every intrusion detection model update.
▪ Self-Adaptive Model Fine-Tuning: Enables on-device model retraining triggered by drift thresholds, ensuring resilience against evolving attack vectors.
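To make the SHAP-driven drift monitoring concrete, the following is a minimal sketch, assuming the installed shap version’s TreeExplainer supports scikit-learn’s Isolation Forest; the reference window, drift threshold, and function names (mean_abs_shap, drift_alert) are illustrative assumptions rather than the deployed implementation.

```python
import numpy as np
import shap
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_ref = rng.normal(size=(500, 6))                       # reference telemetry window at deployment time
model = IsolationForest(n_estimators=100, random_state=0).fit(X_ref)

# TreeExplainer support for Isolation Forest is assumed here (recent shap releases)
explainer = shap.TreeExplainer(model)

def mean_abs_shap(X):
    """Mean absolute SHAP attribution per feature over a window of samples."""
    return np.abs(explainer.shap_values(X)).mean(axis=0)

baseline = mean_abs_shap(X_ref)

def drift_alert(X_window, threshold=0.25):
    """Return indices of features whose attribution shifted beyond the threshold."""
    shift = np.abs(mean_abs_shap(X_window) - baseline) / (baseline + 1e-9)
    return np.where(shift > threshold)[0]

# Example: a new edge-side window with a distribution change in feature 2
X_new = X_ref + rng.normal(scale=0.05, size=X_ref.shape)
X_new[:, 2] += 2.0
print("Drifted features:", drift_alert(X_new))
```

In a deployment, the baseline attributions would be recorded at model release time, and an alert from drift_alert would trigger the self-adaptive fine-tuning described above.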
Therefore, safeguarding these dynamic environments requires adaptive security frameworks—such as the proposed one—leveraging ML-based intrusion detection and prevention systems. Such systems can learn from traffic patterns, identify anomalies, and autonomously reconfigure network defenses to counter evolving threats. Rigorous security testing, certification of SDN controllers and hypervisors, robust edge security protocols, and continuous monitoring of virtualized network slices will be critical [
22]. In summary, the transformation from 4G to 5G—and eventually to 6G—is more than just a technological upgrade; it constitutes a reimagining of the fundamental networking paradigm. While softwarization, virtualization, and edge-centric computing have greatly improved performance and flexibility, they also expose new security vulnerabilities that lend greater importance to real-time monitoring, automated threat response, and strong governance frameworks.
3. Machine Learning Techniques and Comparative Overview Against Established Intrusion Detection Methods
ML provides a feasible approach to detecting complex intrusions and anomalies in wireless networks. By combining advanced algorithms with careful feature engineering, ML can improve the capability of intrusion detection systems (IDSs) to detect and handle complex security problems effectively.
3.1. ML Algorithms for Detection
The effectiveness of ML in intrusion detection is largely attributed to its diverse algorithms, each suited to specific detection scenarios:
▪ Supervised Learning: Algorithms such as Random Forest and Gradient Boosting are employed to classify network traffic as benign or malicious based on labeled datasets. These models are trained on historical data to recognize patterns indicative of known attacks. For instance, a study on wireless sensor networks demonstrated the application of Logistic Regression, Decision Trees, and Gradient Boosting for detecting Denial of Service (DoS) attacks, achieving notable accuracy rates [
24].
▪ Unsupervised Learning: Techniques such as Isolation Forest and Autoencoders are used to detect zero-day threats, where the lack of labeled data requires anomaly detection without established attack patterns. These methods excel at detecting new attacks by learning the normal behavior of the network and alerting on anything that deviates from that norm.
▪ Deep Learning Architectures: Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs) are used to find complex temporal and spatial patterns in network traffic. These architectures prove to be remarkably effective in analyzing sequences and spatial hierarchies in data, which enables the detection of sophisticated attack strategies. A recent study proposed a Bagging-Deep Reinforcement Learning-based intrusion detection model that combines Multi-Layer Perceptron (MLP), CNN, and Optimized Recurrent Neural Networks (O-RNN) to enhance detection accuracy in Internet of Things (IoT) networks [
25].
Layer-oriented threat telemetry. Taking the layer perspective further, we map ML sensors to concrete data sources: (i) CSI snapshots and spectrum waterfalls feed physical-layer anomaly detectors; (ii) SDN-controller flow logs and slice-orchestration messages populate connection-layer models; (iii) API traces, HTTP/3-QUIC headers, and container events drive service-layer detectors. This mapping ensures that supervised or unsupervised models learn features that are both attack-relevant and layer-specific, an approach highlighted as a research gap in [
22].
Table 2 depicts a comparative overview of machine learning techniques used for intrusion detection and prevention in 5G/6G networks, outlining their purposes, advantages, and associated challenges.
3.1.1. Proposed Methodology and Corresponding Tool
ML provides a potent way of detecting both known and new security threats in wireless networks [
26,
27]. The initial step in any robust methodology is the collection of relevant data from wireless network traffic, log files, and management systems. Examples of such data include signal strength indicators (RSSIs), metadata related to packets, usage of protocols, and time intervals that reflect traffic patterns [
28]. Ensuring its quality and relevance is crucial, so any incomplete, duplicated, or corrupted records should be removed at this stage [
29]. Preprocessing often involves normalizing numeric features (e.g., with standard scaling) and, if necessary, using dimensionality reduction techniques like PCA or t-SNE to simplify high-dimensional inputs [
30,
31].
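As an illustration of this preprocessing stage, the following is a minimal scikit-learn sketch; the DataFrame layout and the label column name are hypothetical assumptions, not a prescribed schema.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def preprocess(df: pd.DataFrame, n_components: int = 10):
    """Clean, normalize, and optionally reduce wireless telemetry features."""
    # Remove incomplete, duplicated, or corrupted records
    df = df.dropna().drop_duplicates()

    # Separate an optional label column (hypothetical name) from the numeric features
    labels = df.pop("label") if "label" in df.columns else None
    features = df.select_dtypes(include="number")

    # Standard scaling so RSSI, packet metadata, and timing features share comparable scales
    X = StandardScaler().fit_transform(features)

    # Optional dimensionality reduction for high-dimensional inputs
    if X.shape[1] > n_components:
        X = PCA(n_components=n_components).fit_transform(X)
    return X, labels
```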
Feature engineering can either be expert-driven, focusing on domain-specific indicators such as packet types, signal strength fluctuations, and connection attempt frequencies, or automated through deep learning models that learn feature representations directly from the data [
32]. Once the features are defined, choosing the right ML algorithm depends on the data scenario and detection goals. Supervised methods like Random Forest or Gradient Boosting excel when representative labeled examples of normal and malicious behavior are available [
33]. Unsupervised and anomaly-detection approaches such as Isolation Forest or Autoencoders are valuable when labeled data is scarce, particularly for detecting zero-day or novel threats that deviate from expected behavior [
34,
35]. Deep learning architectures, including LSTMs for time-series data and CNNs for spatial or grid-like data, can be especially effective for complex patterns.
After selecting an algorithm, the ML model is trained and validated, often splitting the dataset into training, validation, and test sets to confirm effectiveness and avoid overfitting [
26,
27]. Hyperparameter tuning via methods like grid search or Bayesian optimization can further refine performance [
36]. Metrics such as precision, recall, the F1-score, or AUC–ROC curves are typically used to assess how well the model distinguishes between benign and malicious events [
37].
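A hedged sketch of this train/validate/tune step is given below using scikit-learn; the synthetic dataset, parameter grid, and scoring choice are illustrative stand-ins for the preprocessed features and labels described above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import classification_report, roc_auc_score

# Synthetic stand-in for the preprocessed feature matrix and binary labels (1 = malicious)
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9], random_state=42)

# Hold out a test set; the grid search below performs internal cross-validation on the rest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Hyperparameter tuning via grid search (Bayesian optimization could be substituted)
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 30]}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)

# Evaluate the tuned model with precision, recall, F1-score, and AUC-ROC
best = search.best_estimator_
print(classification_report(y_test, best.predict(X_test)))
print("AUC-ROC:", roc_auc_score(y_test, best.predict_proba(X_test)[:, 1]))
```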
Once the performance is satisfactory, the model is ready for deployment in a real-time environment where it merges with network monitoring frameworks to classify or assess incoming traffic. Specific thresholds or anomaly scores may trigger automated notifications or actions, such as the isolation of potentially malicious endpoints [
38]. A continuous feedback loop, leveraging new data from detected anomalies or false alarms, helps refine the model and maintain its accuracy over time [
18,
27].
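The deployment step can be sketched as a small scoring loop; the threshold value and the quarantine_endpoint helper below are hypothetical placeholders for the SIEM and orchestration integration.

```python
ANOMALY_THRESHOLD = -0.05   # illustrative cut-off on the decision_function score, tuned on validation data

def quarantine_endpoint(endpoint_id: str) -> None:
    """Hypothetical hook into the orchestration/SIEM layer (e.g., push a blocking flow rule)."""
    print(f"[ACTION] isolating endpoint {endpoint_id}")

def score_incoming(model, scaler, batch, endpoint_ids):
    """Score a batch of live flows and trigger automated mitigation below the threshold."""
    scores = model.decision_function(scaler.transform(batch))
    for endpoint, score in zip(endpoint_ids, scores):
        if score < ANOMALY_THRESHOLD:
            quarantine_endpoint(endpoint)   # automated action on suspicious endpoints
        # every score would also be forwarded to the SIEM for correlation and feedback
    return scores
```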
Building on this methodology, a conceptual open-source tool called WIDAT (Wireless Intrusion Detection & Analysis Toolkit) could automate many of these steps. WIDAT would ingest and preprocess raw wireless data, applying cleaning and normalization procedures before extracting or learning relevant features [
19]. It would support both conventional ML (with scikit-learn) and deep learning (with TensorFlow 2.19.0 or PyTorch 2.7.1) for training and validating various models [
39].
Once tuned, these models would be integrated into a real-time detection and alerting system that can interface with Security Information and Event Management (SIEM) platforms [
17]. A user-friendly dashboard would provide an overview of alerts, performance metrics, and historical patterns, enabling security teams to quickly respond to emerging threats while maintaining a historical record for analytical or forensic purposes [
28].
3.1.2. Optimized WIDAT-Based Algorithm
Below is a simplified proposed algorithm (WIDA) that demonstrates a workflow aligning with the WIDAT concept. This example uses dummy data for illustration. In a real scenario, we would replace the data generation step with actual wireless traffic input (for instance, reading from PCAP files or a real-time capture interface).
The synthetic dataset underpinning Algorithm 1 mimics the class imbalance (≈ 9% malicious flows) and feature dispersion of CICIDS2023 [
23] and 5G-CIC-IDS2024. Side-by-side benchmarking of WIDA with the original CICIDS2023 and 5G-CIC-IDS2024 traces forms part of our near-term work plan.
Algorithm 1: Wireless Intrusion Detection Algorithm
1: Setup Environment
2: Import libraries:
    NumPy for numerical computations
    Pandas for data handling
    StandardScaler for feature normalization
    IsolationForest for anomaly detection
    classification_report for evaluating the model
3: Data Ingestion & Generation
4: Set random seed for reproducibility
5: Generate normal data with mean = 0, std = 1
6: Generate anomalous data with mean = 3, std = 1
7: Create labels:
    1 for normal samples
    −1 for anomalous samples
8: Combine normal and anomalous data into a single dataset
9: Shuffle combined dataset for randomness
10: Create a DataFrame to store features and labels
11: Data Preprocessing
12: Remove missing or corrupted rows from the dataset
13: Split data into:
    X: features
    y: labels
14: Normalize features using StandardScaler
15: Train–Test Split
16: Split data into training (80%) and test (20%) sets
17: Model Training
18: Initialize IsolationForest with:
    n_estimators = 100
    contamination = 0.1
19: Train model using training data
20: Model Evaluation
21: Predict labels for the test dataset
22: Convert predictions to match label format (1 for normal, −1 for anomalies)
23: Display classification report to evaluate:
    Precision
    Recall
    F1-score
24: Real-Time Detection Simulation (Optional)
25: Generate new incoming data to simulate real-time detection
26: Normalize new data using the fitted StandardScaler
27: Predict anomaly scores and labels for new data
28: for each new sample
29:     if prediction = 1 then
30:         Print “Normal” with anomaly score
31:     else if prediction = −1 then
32:         Print “Anomalous” with anomaly score
33:     end if
34: end for
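For reference, the listing above can be realized as a short Python script; the following is a minimal sketch using scikit-learn and the synthetic data described in Algorithm 1, with an illustrative feature dimension, and it is not the production WIDAT implementation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# 1. Data ingestion & generation (synthetic stand-in for real wireless traffic)
rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))       # benign flows
anomalous = rng.normal(loc=3.0, scale=1.0, size=(50, 4))     # malicious flows (~9%)
X = np.vstack([normal, anomalous])
y = np.concatenate([np.ones(500), -np.ones(50)])              # 1 = normal, -1 = anomaly

df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
df["label"] = y
df = df.sample(frac=1.0, random_state=42).dropna()             # shuffle and drop corrupted rows

# 2. Preprocessing: split features/labels, hold out a test set, and normalize
X, y = df.drop(columns="label").values, df["label"].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 3. Model training: Isolation Forest with the parameters from Algorithm 1
model = IsolationForest(n_estimators=100, contamination=0.1, random_state=42)
model.fit(X_train)

# 4. Evaluation: predictions already use the {1, -1} label format
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=["anomalous", "normal"]))

# 5. Real-time detection simulation on newly arriving samples
new_data = scaler.transform(rng.normal(loc=0.0, scale=1.5, size=(5, 4)))
for pred, score in zip(model.predict(new_data), model.decision_function(new_data)):
    status = "Normal" if pred == 1 else "Anomalous"
    print(f"{status} (anomaly score = {score:.3f})")
```

The script mirrors the listing step by step; exact figures vary between runs because the data are regenerated each time.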
Below, we present an example of the kind of performance statistics one might see after running the demo Isolation Forest script (with synthetic data). The script uses 500 normal and 50 anomalous samples, then splits them into 80% training and 20% testing data. Because the dataset is randomly generated at runtime, results can vary slightly between runs.
In
Table 3 below, we can see the sample results of a typical outcome regarding detection accuracy and classification performance.
Precision for anomalies is 57%, meaning that a little over half of the samples flagged as anomalous were truly anomalous. Recall for anomalies is 80%, indicating that the model detected 8 of the 10 anomalous samples in the test set. With an overall accuracy of roughly 93%, the Isolation Forest still performs respectably, yet the balance between false positives and false negatives is less than ideal, highlighting scope for further tuning of the contamination factor, feature set, or ensemble size. The key observations are as follows:
▪ Moderate Accuracy (~93%):
Performance drops compared with earlier runs, underlining how sensitive results are to data splits and parameter choices.
▪ Contamination Mismatch:
Setting contamination = 0.15 (while the true anomaly rate is ≈ 9%) improves recall but lowers precision, illustrating the trade-off.
▪ Precision vs. Recall:
The current setting favours catching more anomalies (80% recall) at the expense of extra false alarms (57% precision).
▪ Result Variability:
Larger swings (±5–10%) are observed between runs because anomalies are now closer to the normal cluster.
▪ Scalability and Real-World Data:
Real wireless data—containing noise, diverse protocols, and varying signal strengths—needs more sophisticated preprocessing, richer features, or deep learning methods for automated feature extraction.
To further validate WIDA’s detection capabilities, we conducted synthetic scenario tests with multiple contamination levels and model configurations.
Table 4 below illustrates accuracy comparisons across three configurations:
Table 4.
Performance comparison of ML models in simulated WIDA scenarios.
Scenario | Model | Contamination | Precision (Anomalies) | Recall (Anomalies) | Overall Accuracy |
---|---|---|---|---|---|
A | Isolation Forest | 0.10 | 0.57 | 0.80 | 0.93 |
B | Autoencoder | N/A | 0.62 | 0.77 | 0.91 |
C | Random Forest | 0.05 | 0.85 | 0.72 | 0.94 |
From the above results, we observe the following:
▪ Isolation Forest (Scenario A): Achieves a balanced recall of 0.80, but with a moderate precision of 0.57, indicating good detection but with many false positives. This suits situations where missing anomalies are costlier than false alarms.
▪ Autoencoder (Scenario B): Delivers slightly higher precision (0.62) and good recall (0.77), though the overall accuracy is lower. This approach is appropriate for detecting unknown threats with minimal training data.
▪ Random Forest (Scenario C): Achieves the highest precision (0.85) and accuracy (0.94), though with slightly lower recall (0.72). It is ideal for precision-critical environments but may miss some attacks.
While the data is synthetic, these findings underscore WIDA’s adaptability in diverse threat landscapes, with model selection and tuning influencing trade-offs between false positives and detection rates.
3.2. Feature Engineering and Preprocessing
The performance of ML models in intrusion detection is heavily influenced by the quality of input features. Effective feature engineering and preprocessing are crucial for reducing noise and improving model accuracy:
▪ Feature Selection: Identifying and selecting relevant features is essential to enhance model performance. Techniques such as Mutual Information and Recursive Feature Elimination are employed to select optimal features that influence classification results. A study on network intrusion detection utilized feature selection methods like SpiderMonkey, Principal Component Analysis, Information Gain, and Correlation Attribute Evaluation to select optimal features, resulting in improved accuracy.
▪ Data Normalization: Standardizing data ensures that features contribute equally to the model, preventing bias toward variables with larger scales. Normalization techniques, such as Min–Max scaling, are commonly applied to prepare data for ML models.
▪ Dimensionality Reduction: Reducing the number of features through methods like Principal Component Analysis (PCA) helps in mitigating the curse of dimensionality, thereby enhancing model efficiency and reducing overfitting. A study on network intrusion detection demonstrated the use of PCA for dimension reduction, which improved detection accuracy [
40].
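The selection techniques named above can be illustrated with a few scikit-learn snippets; the synthetic dataset and the choice of ten retained features are assumptions for demonstration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA

# Synthetic stand-in for a labeled flow matrix (50 features, binary benign/malicious labels)
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10, random_state=42)

# Mutual information: keep the k features carrying the most information about the label
X_mi = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)

# Recursive Feature Elimination: iteratively drop the weakest features of a base estimator
X_rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10).fit_transform(X, y)

# PCA: project onto the components explaining 95% of the variance
X_pca = PCA(n_components=0.95).fit_transform(X)

print(X_mi.shape, X_rfe.shape, X_pca.shape)
```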
Emerging research is focusing on automating feature extraction using deep learning techniques, thereby reducing the reliance on expert-driven feature engineering. Deep learning models, Autoencoders in particular, can learn hierarchical feature representations directly from raw input data. This capability helps detect complicated and sophisticated attack patterns without a manual feature selection process, which is typically laborious and time-consuming. Several studies have applied this approach with notable success, underscoring its potential to improve intrusion detection systems, as sketched below.
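The following is a minimal Keras sketch of such an Autoencoder-based detector, assuming a 20-dimensional preprocessed feature vector and a stand-in benign dataset; the architecture and the percentile-based threshold are illustrative, not tuned choices.

```python
import numpy as np
from tensorflow import keras

n_features = 20                        # assumed width of the preprocessed feature vector

# A small symmetric autoencoder: the 8-unit bottleneck learns a compact representation
autoencoder = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),           # learned feature representation
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(n_features, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")

# Train on (predominantly) benign traffic so the network learns "normal" structure
X_benign = np.random.normal(size=(2000, n_features)).astype("float32")   # stand-in data
autoencoder.fit(X_benign, X_benign, epochs=10, batch_size=64, verbose=0)

def reconstruction_error(X):
    """Per-sample mean squared reconstruction error, used as an anomaly score."""
    return np.mean(np.square(X - autoencoder.predict(X, verbose=0)), axis=1)

# Samples whose error exceeds a high percentile of the benign errors are flagged as anomalous
threshold = np.percentile(reconstruction_error(X_benign), 99)
```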
6. Privacy Considerations and Regulatory Alignment
Balancing effective security monitoring with user privacy has emerged as a central challenge for machine learning (ML)-based intrusion detection systems, especially in an era where regulatory frameworks and ethical considerations are evolving rapidly. As security solutions increasingly rely on large datasets to detect anomalies or malicious behaviors, they must do so in full compliance with data protection regulations, including the EU General Data Protection Regulation (GDPR) and the EU AI Act. These regulations place stringent requirements on data confidentiality and user consent, mandating that personal information be minimized, de-identified, or otherwise protected before any algorithmic processing occurs (Regulation (EU) 2016/679 and Regulation (EU) 2024/1689) [
53,
54]. Privacy-preserving methodologies, such as differential privacy and secure multiparty computation, have gained prominence for their ability to ensure that individual data points remain confidential while still allowing for robust pattern analysis and threat detection [
55]. In differential privacy, statistical noise is injected into the data analysis process to obscure individual contributions, thereby reducing the risk of re-identification or unauthorized disclosures [
56]. Meanwhile, secure multiparty computation enables multiple parties to jointly compute a function over their inputs without revealing those inputs to each other, preserving confidentiality even in collaborative threat intelligence settings [
57]. In addition to these privacy-enhancing techniques, federated learning has emerged as a promising paradigm for addressing ethical concerns around data centralization. Rather than collecting raw data in a single repository, federated learning trains models across distributed edge devices, allowing each participant to retain ownership of their data while contributing to a global model [
58]. This decentralized approach mitigates risks associated with large-scale data breaches and fosters greater trust among stakeholders who might otherwise be reluctant to share sensitive information. As the EU AI Act pushes toward responsible AI development, such privacy considerations become increasingly imperative for aligning technical innovations with legal and societal expectations. A commitment to transparent data governance, proactive privacy-risk assessments, and ethical impact reviews not only protects against potential reputational harm but also ensures the effectiveness and compliance of machine-learning-based security solutions in increasingly dynamic regulatory environments [
59]. Privacy-preserving techniques like federated learning and differential privacy enhance compliance with the EU AI Act and GDPR, significantly reducing the risk of personal data exposure. However, implementing these techniques adds complexity, computational overhead, and can reduce model performance due to the introduction of statistical noise or decentralized training procedures.
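To illustrate the differential-privacy idea described above, the following is a minimal sketch of the Laplace mechanism; the query, sensitivity, and epsilon value are illustrative assumptions rather than calibrated parameters of the proposed framework.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a noisy statistic satisfying epsilon-differential privacy."""
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Example: privately report how many devices triggered an alert in the last window.
# A counting query changes by at most 1 when one device is added or removed, so sensitivity = 1.
alert_count = 42
private_count = laplace_mechanism(alert_count, sensitivity=1.0, epsilon=0.5)
print(f"Privately released alert count: {private_count:.1f}")
```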
In anticipation of the EU AI Act, WIDA has been designed to satisfy five core obligations that the Act places on high-risk AI systems. First, a continuous risk-management plan is realised through the edge-resident SHAP drift monitor, which triggers retraining whenever concept shift exceeds a preset threshold. Second, robust data-governance measures ensure that only anonymised traffic statistics leave tenant domains, with synthetic-to-real transfer protocols preventing leakage of personally identifiable information. Third, all releases are accompanied by full technical documentation—model cards, version history, and evaluation reports—hosted in a public repository. Fourth, the platform enforces human oversight via a “four-eyes” policy: every automated mitigation proposed by the IDS must be confirmed by a Security Operations Centre analyst. Finally, WIDA strengthens robustness and accuracy through blockchain-anchored model checkpoints and optional adversarial training on synthetically generated attack samples. Together, these design choices align the framework with both the letter and the spirit of emerging European AI regulation while preserving its operational effectiveness in 5G/6G security scenarios.
7. Conclusions and Future Work
This paper has demonstrated that next-generation wireless networks, particularly 5G and the anticipated 6G, require proactive, adaptive, and intelligent security solutions. By proposing a comprehensive security framework that integrates machine learning (ML) techniques—from anomaly detection and feature engineering to real-time threat response at the network edge—we have shown how advanced architectures such as the proposed one can significantly enhance wireless security. Core to our approach is the optimized Wireless Intrusion Detection Algorithm, realized in the workflow of WIDAT, which reached an overall detection accuracy of approximately 93% against synthetic anomalies. This result illustrates the capability of ML to discover subtle intrusion patterns in large, heterogeneous data streams; however, WIDA has not yet been benchmarked on live 5G base-station traces, which remains a key objective for future research. We also underlined how regulatory requirements, chief among them the EU AI Act, are changing the way such ML-driven solutions are designed and deployed, introducing new constraints and opportunities through transparency, explainability, and lifecycle management requirements. As wireless networks evolve into critical infrastructures, compliance becomes as important as technical performance, ensuring responsible AI adoption and public trust.
In summary, while integrating technologies such as SDN, NFV, and edge computing alongside ML techniques vastly enhances the adaptive security capabilities of next-generation networks, each technology also introduces unique vulnerabilities and complexities. Effective security frameworks must balance these strengths and limitations, leveraging the agility and proactive capabilities of ML and virtualization while rigorously mitigating potential single points of failure, hypervisor vulnerabilities, resource constraints, and the complexity introduced by decentralized architectures. Future research will not only harden ML models against adversarial perturbations but also operationalize the roadmap set out in the latest 6G security survey: deploying quantum-safe cryptography in the physical and connection layers, enforcing zero-trust architectures in the service layer, and embedding policy-driven AI governance throughout the lifecycle. By following this layered, standards-aligned trajectory, we anticipate a measurable reduction in the mean time to detect (MTTD) and mean time to respond (MTTR) across heterogeneous 6G infrastructures.