Realistic Performance Assessment of Machine Learning Algorithms for 6G Network Slicing: A Dual-Methodology Approach with Explainable AI Integration

Sümeye Nur Karahan; Merve Güllü; Deniz Karhan; Sedat Çimen; Mustafa Serdar Osmanca; Necaattin Barışçı

doi:10.3390/electronics14193841

,

and

¹

R&D Department, Türk Telekom, 06103 Ankara, Türkiye

²

Department of Computer Engineering, Gazi University, 06560 Ankara, Türkiye

^*

Author to whom correspondence should be addressed.

Electronics2025, 14(19), 3841;https://doi.org/10.3390/electronics14193841

This article belongs to the Topic Advanced Array Signal Processing for B5G/6G: Models, Algorithms, and Applications

Version Notes

Order Reprints

Abstract

As 6G networks become increasingly complex and heterogeneous, effective classification of network slicing is essential for optimizing resources and managing quality of service. While recent advances demonstrate high accuracy under controlled laboratory conditions, a critical gap exists between algorithm performance evaluation under idealized conditions and their actual effectiveness in realistic deployment scenarios. This study presents a comprehensive comparative analysis of two distinct preprocessing methodologies for 6G network slicing classification: Pure Raw Data Analysis (PRDA) and Literature-Validated Realistic Transformations (LVRTs). We evaluate the impact of these strategies on algorithm performance, resilience characteristics, and practical deployment feasibility to bridge the laboratory–reality gap in 6G network optimization. Our experimental methodology involved testing eleven machine learning algorithms—including traditional ML, ensemble methods, and deep learning approaches—on a dataset comprising 10,000 network slicing samples (expanded to 21,033 through realistic transformations) across five network slice types. The LVRT methodology incorporates realistic operational impairments including market-driven class imbalance (9:1 ratio), multi-layer interference patterns, and systematic missing data reflecting authentic 6G deployment challenges. The experimental results revealed significant differences in algorithm behavior between the two preprocessing approaches. Under PRDA conditions, deep learning models achieved perfect accuracy (100% for CNN and FNN), while traditional algorithms ranged from 60.9% to 89.0%. However, LVRT results exposed dramatic performance variations, with accuracies spanning from 58.0% to 81.2%. Most significantly, we discovered that algorithms achieving excellent laboratory performance experience substantial degradation under realistic conditions, with CNNs showing an 18.8% accuracy loss (dropping from 100% to 81.2%), FNNs experiencing an 18.9% loss (declining from 100% to 81.1%), and Naive Bayes models suffering a 34.8% loss (falling from 89% to 58%). Conversely, SVM (RBF) and Logistic Regression demonstrated counter-intuitive resilience, improving by 14.1 and 10.3 percentage points, respectively, under operational stress, demonstrating superior adaptability to realistic network conditions. This study establishes a resilience-based classification framework enabling informed algorithm selection for diverse 6G deployment scenarios. Additionally, we introduce a comprehensive explainable artificial intelligence (XAI) framework using SHAP analysis to provide interpretable insights into algorithm decision-making processes. The XAI analysis reveals that Packet Loss Budget emerges as the dominant feature across all algorithms, while Slice Jitter and Slice Latency constitute secondary importance features. Cross-scenario interpretability consistency analysis demonstrates that CNN, LSTM, and Naive Bayes achieve perfect or near-perfect consistency scores (0.998–1.000), while SVM and Logistic Regression maintain high consistency (0.988–0.997), making them suitable for regulatory compliance scenarios. In contrast, XGBoost shows low consistency (0.106) despite high accuracy, requiring intensive monitoring for deployment. This research contributes essential insights for bridging the critical gap between algorithm development and deployment success in next-generation wireless networks, providing evidence-based guidelines for algorithm selection based on accuracy, resilience, and interpretability requirements. Our findings establish quantitative resilience boundaries: algorithms achieving >99% laboratory accuracy exhibit 58–81% performance under realistic conditions, with CNN and FNN maintaining the highest absolute accuracy (81.2% and 81.1%, respectively) despite experiencing significant degradation from laboratory conditions.

Keywords:

6G networks; network slicing; machine learning; computational complexity; data preprocessing; comparative analysis; algorithm resilience; realistic transformations; deployment optimization; performance evaluation

1. Introduction

With the rapid evolution of wireless communication technologies, the expectations surrounding sixth-generation (6G) networks are unprecedented. Sixth-generation networks differ fundamentally from fifth-generation (5G) systems. While 5G primarily focuses on enhanced mobile broadband and low-latency communications, 6G represents a paradigm shift. It is envisioned as a comprehensive ecosystem that integrates multiple infrastructure types: terrestrial-, aerial-, maritime-, and space-based networks [,,]. The driving force behind this evolution comes from emerging applications with unprecedented demands. Applications including extended reality (XR), brain–computer interfaces, holographic communication, autonomous vehicles, and the Internet of Robotic Things (IoRT) require capabilities that exceed 5G’s limitations. These applications demand multiple simultaneous requirements: extreme reliability, ultra-low latency, terabit-level data rates, artificial intelligence (AI)-driven orchestration, sustainable energy efficiency, and quantum-secure communication [,]. As a result, the advanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), and massive machine-type communication (mMTC) services offered by 5G may not fully address future requirements []. Therefore, 6G aims to meet these demands through a unified architecture. This architecture integrates three core technologies: communication, computing, and sensing. The entire system operates on a data-driven foundation supported by AI.

A critical technology enabling this transformation is network slicing. Network slicing was established during the 5G era but will play an even more critical role in 6G [,]. This technology works by creating virtualized end-to-end logical network slices on shared physical infrastructure. Each slice is specifically customized to meet the unique requirements of different application types. The key advantage is that multiple virtual networks can operate simultaneously on the same shared infrastructure, enabling efficient and dynamic resource allocation []. The result is maximized resource efficiency through isolation of critical data flows.

Sixth-generation network slicing encompasses five distinct service categories, each with unique characteristics and resource requirements. First, further enhanced mobile broadband (feMBB) delivers data speeds exceeding 1 Tbps for applications like holographic communication and 16K video streaming. Second, ultra-massive machine type communications (umMTC) supports hyper-dense IoT deployments with up to 10 million devices per square kilometer. Third, Mobile URLLC (mURLLC) enables mobility scenarios such as autonomous vehicle coordination and remote health monitoring. Fourth, extremely reliable low-latency communications (ERLLC) provides sub-microsecond latency and deterministic reliability for industrial automation. Finally, mobile broadband reliable low-latency communications (MBRLLC) serves applications requiring balanced multi-dimensional performance, including delivery drones and vehicle-to-everything (V2X) communication.

The methodological workflow of this study is illustrated in Figure 1. The framework begins with a standardized preprocessing phase on the initial laboratory dataset, involving feature selection and label encoding. Subsequently, the methodology bifurcates into two distinct analytical pathways designed to systematically evaluate algorithmic performance under contrasting conditions.

Figure 1. Overview of the dual-path methodological framework. The workflow contrasts the idealized PRDA pathway against the LVRT pathway, which simulates real-world operational stress, to enable a systematic analysis of model performance and interpretability.

The complexity of 6G networks makes traditional static approaches insufficient. The 6G network incorporates revolutionary components including communication–computing–sensing convergence, AI-based automation, and integration with satellite and airborne networks. Consequently, static slice definitions become inadequate. Network slicing must now address real-time challenges: classification, optimization, and prediction problems that evolve dynamically.

This complexity drives the need for intelligent solutions. ML-based approaches are emerging as essential tools for effectively managing this complex service ecosystem. AI-supported classifiers are recognized as a critical strategy for optimizing performance, managing heterogeneity, and overcoming dynamic network conditions in intelligent network slicing management and traffic routing to the appropriate slice [].

Current research demonstrates promising results in controlled environments. Studies in the literature report that models created using classical ML algorithms on synthetic datasets achieve accuracy rates of over 99%. Similarly, deep learning (DL)-based hybrid models are reported to achieve success rates of over 97% []. Furthermore, experiments conducted on 5G test networks have shown that systems automatically classify real traffic flows and assign them to appropriate slots, reducing packet loss and jitter, providing higher reliability, especially under heterogeneous traffic conditions []. However, a significant gap exists between the high accuracy achieved in laboratory conditions and the performance achieved in real-world scenarios. Most studies are limited to controlled simulation environments, where the same level of success cannot be guaranteed in real networks. Performance degradation is frequently observed under real traffic patterns, interference, and variable load conditions, posing a significant challenge for field applications. Consequently, developing models supported by transfer learning, domain adaptation, and realistic datasets emerges as a critical research direction for bridging the gap between laboratory and field conditions [].

This challenge leads to a fundamental research question. Given the increasing complexity of 6G networks and heterogeneous service requirements, network slicing and its specific subproblem slice classification become indispensable. While AI-based solutions show promising results in controlled environments, the critical question remains: how can these solutions work reliably and scalably in real-world deployments?

One of the fundamental issues is the methodological gap in data preprocessing, which significantly affects the robustness and generalizability of ML models for 6G network slicing. To address this challenge, we identify two distinct methodological approaches. The first approach, PRDA, maintains network data in its original laboratory-controlled form. It applies only basic normalization and feature selection. This approach preserves ideal statistical properties typically observed in controlled experimental environments, such as balanced class distributions, minimal measurement noise, and absence of missing values. The second approach, LVRT, applies comprehensive transformations to laboratory data. These transformations simulate authentic real-world network conditions based on empirical evidence. The approach incorporates empirically validated measurement uncertainty models, network congestion scenarios, equipment failures, and service adoption patterns. All transformations are grounded in extensive literature review and industry reports. While LVRT introduces complexity and potential performance degradation, it provides a more realistic evaluation of algorithm behavior under operational conditions.

The implications of this methodological choice are profound. This decision directly affects algorithm selection, deployment strategy, and performance expectations in practical 6G network implementations. PRDA may lead to overoptimistic performance predictions and inappropriate algorithm selection. In contrast, LVRT provides more conservative but realistic performance bounds that better reflect operational constraints and challenges.

This work presents the first systematic dual-methodology evaluation framework that bridges the critical gap between laboratory algorithm assessment and realistic 6G deployment performance. Unlike existing studies that rely solely on controlled synthetic datasets, our approach introduces LVRT incorporating authentic operational impairments including market-driven class imbalance, multi-layer interference patterns, and systematic missing data. This methodology reveals previously unknown algorithmic behavior patterns, demonstrating that certain algorithms exhibit substantial performance improvements under realistic conditions—a phenomenon invisible under traditional evaluation approaches.

Our research addresses this critical gap through concrete contributions. This paper makes the following key contributions:

Dual-Methodology Evaluation Framework: Introduction of a novel comparative framework combining PRDA and LVRT methodologies to systematically evaluate algorithm performance across laboratory and realistic deployment conditions;
Laboratory–Reality Performance Gap Analysis: First comprehensive demonstration that algorithms achieving >99% accuracy under controlled conditions exhibit 58–72% performance under realistic 6G deployment scenarios, revealing critical limitations in traditional evaluation approaches;
Counter-Intuitive Algorithm Behavior Discovery: Identification of algorithms (SVM RBF; Logistic Regression) that demonstrate performance improvements (14.0% and 10.1%, respectively) under realistic conditions, challenging fundamental assumptions about data quality relationships;
Algorithm Resilience Classification System: Development of a resilience-based categorization framework (Excellent, Good, Moderate, and Poor) based on performance degradation analysis, enabling informed algorithm selection for diverse 6G infrastructure scenarios;
Realistic 6G Network Simulation: Implementation of comprehensive operational impairments including market-driven class imbalance (9:1 ratio), multi-layer interference modeling, and systematic missing data patterns reflecting authentic 6G deployment challenges;
Practical Deployment Guidance: Evidence-based recommendations demonstrating that simple, stable algorithms often outperform sophisticated methods in operational environments, fundamentally altering 6G deployment strategy considerations;
Cross-Paradigm Comparative Analysis: Comprehensive evaluation of classical ML methods (e.g., SVM, Logistic Regression, and Random Forest) and DL models (e.g., CNNs and LSTMs) for 6G network slicing classification, highlighting their respective strengths, limitations, and deployment trade-offs under realistic conditions;
Explainable AI Integration: Application of XAI techniques, including SHAP, to interpret model decisions and feature importance, thereby enhancing transparency, enabling trust in algorithmic outputs, and providing actionable insights for network operators in critical 6G deployment scenarios.

The remainder of this paper is organized as follows. Section 2 provides the background and related work, including a detailed overview of the evolution of 6G network slicing and the role of XAI in telecommunications systems. Section 3 describes the methodology and dataset characteristics, elaborating on the research motivation, methodological philosophy, feature association analysis, dimensionality reduction, and the proposed dual-methodology evaluation framework, followed by the description of PRDA and LVRT implementations. Section 4 presents the experimental setup and model configuration, covering both traditional ML models and DL architectures. Section 5 reports and analyzes the experimental findings, including baseline results under PRDA, realistic performance under LVRT, training time and computational cost analysis, and interpretability insights through XAI. Section 6 concludes this paper with key findings, while Section 7 outlines promising future research directions.

2. Related Work and Background

Network slicing represents a transformative approach for addressing heterogeneous service requirements across different vertical sectors. The technology provides three key capabilities: guaranteed quality of service (QoS) parameters, dynamic resource allocation, and sector-specific security and privacy measures.

Research by [] identifies seven key vertical use cases for slicing: Smart Transportation, Military Services, Smart Education, Industry 4.0, Health 4.0, Agriculture 4.0, and Smart Cities. Each sector presents unique challenges and requirements. Smart Transportation demands ultra-low latency (approximately 1 ms) and high reliability for autonomous driving, while in-vehicle multimedia services prioritize high data rates and storage capacity. Military applications require high security, network isolation, and low latency as critical parameters. Smart Education scenarios, particularly augmented reality (AR)- and virtual reality (VR)-based learning environments, need high bandwidth and low latency combinations. Industry 4.0 applications span two categories: process automation and factory communication fall under URLLC, while IoT-intensive automation applications belong to mMTC. Health 4.0 encompasses diverse applications including tele-surgery, tele-consultation, and tele-monitoring, all requiring data security, ultra-low latency, and high reliability. Agriculture 4.0 focuses on precision farming, drone-based monitoring, and autonomous agricultural vehicles, emphasizing low-cost, energy-efficient, and long-battery-life IoT solutions. Smart Cities present the most complex requirements, encompassing large-scale IoT, intelligent transportation, energy management, and multimedia services, necessitating simultaneous management of eMBB, URLLC, and mMTC service types. Beyond conceptual categorizations, several studies have examined domain-specific slicing deployments. In vehicular communications, ref. [] demonstrates that autonomous driving URLLC and in-vehicle infotainment eMBB can be concurrently supported via logically isolated slices on shared infrastructure, meeting low-latency and high-throughput requirements under mobility constraints. For mission-critical domains, ref. [] proposes a comprehensive management framework with deployment options such as dedicated emergency slices, dynamic scaling under overload, and edge-based migration of critical functions, providing experimental evidence of service continuity for Public Protection and Disaster Relief (PPDR) use cases. These works illustrate how slicing principles translate into concrete vertical applications, reinforcing the practical relevance of the problem investigated in this paper.

These theoretical requirements have been validated through real-world implementations. Real-world applications demonstrate the practical success of 5G-based network slicing technology. Notable examples include autonomous driving slicing trials conducted in collaboration with Deutsche Telekom and BMW, Nokia–Finnish Defense Forces’ secure communication-focused work, AR/VR-based education scenarios under the South Korea Green School project, 5G-supported production lines at BMW Leipzig facilities, robot-assisted remote healthcare services in Wuhan hospitals during the COVID-19 pandemic, satellite-connected agricultural solutions through the John Deere–SpaceX partnership, and the Barcelona 5G Smart City project.

However, implementing these vertical scenarios presents significant technical challenges. Key challenges include ensuring slice isolation, meeting ultra-low latency and high reliability requirements simultaneously, protecting data security and privacy, providing low-cost and energy-efficient services in rural areas, and ensuring scalability under heavy traffic conditions. Consequently, solutions such as adaptive Service Function Chaining (SFC), software-defined networking (SDN), and network function virtualization (NFV)-based dynamic slice management are gaining prominence.

The architectural framework shown in Figure 2 provides the foundation for our analysis. This reference model aligns use cases with RAN/transport/core resources and slice categories (feMBB, mURLLC, ERLLC, MBRLLC, and umMTC). It demonstrates how QoS/QoE constraints propagate across layers and motivates the AI-driven orchestration mechanisms discussed in subsequent sections.

Figure 2. End-to-end network slicing reference model used throughout this paper. The Service Layer (right, top) exposes slice-oriented services (e.g., feMBB, mURLLC, ERLLC, MBRLLC, and umMTC) to verticals and virtual operators. The Network Function Layer orchestrates network operation and VNFs, while the Infrastructure Layer spans RAN, transport, and core with edge/core clouds. Arrows indicate control/data interactions governed by the Network Slice Controller across layers and domains.

A critical limitation emerges from current 5G implementations. While real-world 5G slicing implementations demonstrate sector-specific feasibility, they predominantly rely on static and predefined configurations. Meanwhile, 6G slicing demands a fundamental shift toward AI-driven automation and adaptability to manage complex, heterogeneous environments. This evolution makes ML and DL technologies indispensable for future network operations.

The transition to 6G introduces revolutionary capabilities. These 6G networks will integrate multiple revolutionary technologies: terahertz communication, AI-enabled autonomous network management, holographic communication, digital twin-based optimization, and non-terrestrial networks. These technologies will redefine network slicing through four core characteristics: enhanced QoS and Quality of Experience (QoE), improved energy efficiency, robust cyber–physical security, and autonomous self-configuration capabilities. The resulting 6G slicing paradigm will enable end-to-end network reconfiguration with AI support, real-time allocation of physical and virtual network resources, and integrated management of terrestrial-, aerial-, maritime-, and space-based communication layers. Consequently, the sector-focused network slicing applications developed during the 5G era will evolve into fully autonomous, hyper-connected, and cognitive network slicing paradigms.

2.1. 6G Network Slicing Evolution

Network slicing technology has undergone significant evolution since its inception.

The technology has progressed from initial conceptualization in 5G networks to advanced implementations envisioned for 6G systems []. Foundational research by Rost et al. established the fundamental principles of network slicing, demonstrating how virtualization technologies could enable multiple logical networks to coexist on shared physical infrastructure.

The current literature positions network slicing as a mature yet evolving technology. Network slicing is now recognized as a critical approach that has matured through SDN and network function virtualization (NFV) technologies beginning in 5G. This maturation enables multiple service types to be delivered as logically isolated slices on a single physical infrastructure [,]. Following standardization in 3GPP Release 15, the concept evolves toward greater complexity in 6G through integration with Space–Air–Ground Integrated Network (SAGIN) architectures, THz communication, edge–cloud integration, and AI-enabled automation [,]. Research has established comprehensive frameworks for slice management. Previous studies have defined three key phases: preparation, planning, and operation of slice management. These studies have also demonstrated the applicability of AI methods including Recurrent Neural Network (RNN), deep reinforcement learning, and multi-armed bandit algorithms in critical processes: resource reservation, service demand estimation, slice acceptance, and virtual network function (VNF) placement []. Complementing this work, the “slicing for AI” approach emphasizes the necessity of QoS-guaranteed customized network slices for AI services during data collection, model training, and inference phases []. In parallel, the concept of AI instances has been developed to enable the selection of different algorithms and training formats in a manner that is adaptable to network resources []. Despite significant progress, critical challenges remain unresolved. While previous research has highlighted the potential of this field, given 6G’s high dynamism, heterogeneous infrastructure, and new QoS dimensions, areas including joint optimization of planning and operations, distributed data management, and prediction-based slicing remain open research areas. Recent research has advanced the fundamental network slicing approaches developed for 5G networks by focusing on hybrid and AI-based techniques to meet more complex service requirements, particularly for 6G networks [,].

Contemporary research demonstrates significant advances in hybrid approaches. Dangi and Lalwani [] highlight the advantages of hybrid slicing techniques in determining the most suitable network slice for each incoming network traffic based on the device’s fundamental characteristics. They propose a hybrid DL architecture to address the slicing classification challenges in 6G networks. Their methodology involves comprehensive data preprocessing. They utilized the Unicauca IP Flow Version 2 dataset, which was relabeled taking into account device request types. Through this process, 78 application classes were reduced to five basic 6G slice categories (super-eMBB, massive-MTC, super-URLLC, super-precision, and super-immersive). The proposed architecture combines complementary neural network components. The CNN component performs automatic feature extraction, while the Bidirectional Long Short-Term Memory (BiLSTM) layer performs slice classification by considering time dependencies. Evaluation through stratified 10-fold cross-validation achieved 97.21% accuracy. The study’s notable contribution lies in combining high accuracy with low data dependency, enabling effective operation on small-scale datasets while achieving lower misclassification rates compared to existing methods. However, a single-dataset evaluation creates uncertainty regarding its generalizability under different network conditions, heterogeneous traffic dynamics, and real-world operational scenarios.

In parallel, Wang et al. [] present a comprehensive tutorial on AI-assisted 6G network slicing, emphasizing its dual role in network assurance and service provisioning. The authors identify six defining characteristics of 6G slicing—tailored service provisioning, efficient resource utilization, strict service isolation, seamless service coverage, technology convergence, and ubiquitous slice intelligence—and explore AI-driven solutions across RAN, transport, and core network domains, as well as slicing management systems and non-terrestrial extensions. They detail how reinforcement learning, DL, federated learning, and transfer learning can be applied to intelligent device association, elastic bandwidth scaling, VNF placement, anomaly detection, service level agreement (SLA) decomposition, and capacity forecasting. A case study on reinforecement learning (RL)-based elastic bandwidth scaling in elastic optical networks demonstrates superior request satisfaction rates and adaptability compared to shortest-path-based approaches. The study identifies open research challenges for 6G slicing, including rapid slice adjustment, multi-domain/multi-operator orchestration, security and reliability under AI-oriented threats, and standardized architectures. This work complements hybrid and data-driven approaches in the literature by providing a domain-spanning, AI-centered framework for evolving slicing strategies toward the complex requirements of 6G systems.

Alwakeel and Alnaim [] aim to develop an optimized, scalable, and secure network slicing framework for 6G-based smart city IoT applications. The study follows a systematic methodology that includes steps such as requirement analysis, definition of performance metrics, identification of constraints, mathematical modeling, and improvement of slicing configurations using metaheuristic optimization methods such as genetic algorithms and simulated annealing. The model’s performance was tested in a simulation environment based on latency, reliability, bandwidth, scalability, and security metrics during the evaluation process. However, the study did not use real-world data or open datasets, relying solely on simulation-based validation. Furthermore, this study did not integrate AI-based prediction, optimization, or decision-making techniques, which are frequently recommended in the literature. The lack of comprehensive testing for different IoT scenarios and real-time adaptation mechanisms is considered the primary factor limiting the practical applicability of the model.

Cunha et al. [] present a comprehensive review on enhancing network slicing security in 5G and emerging 6G networks by integrating ML, SDN, and NFV. The study highlights how ML can enable predictive threat detection and proactive security responses, while SDN and NFV facilitate agile, context-aware policy enforcement. Several AI-driven frameworks—such as Secure5G, DeepSecure, and Intelligence Slicing are surveyed, showing high accuracy in detecting distributed denial of service (DDoS), malware, and other attacks. The review identifies significant challenges, including slice isolation, inter-slice handover, policy translation in heterogeneous environments, and vulnerabilities introduced by AI/ML itself (e.g., adversarial attacks; false positives). Notable gaps include the need for standardized policy enforcement across multi-provider domains, lightweight yet practical AI models for real-time use, and robust privacy-preserving mechanisms. The authors conclude that a unified, scalable security strategy combining ML-based analytics with SDN/NFV’s orchestration capabilities is essential for safeguarding confidentiality, integrity, and availability in future network slicing deployments.

Mahesh et al. [] aim to design and evaluate a simulation-based framework for 6G network slicing using a K-D Tree-based handover algorithm. The study models three slice types, eMBB, URLLC, and mMTC, using hardcoded base station coordinates and randomly distributed client locations, measuring connection ratio, client count, latency, and slice usage. No real-world dataset is used; synthetic data is generated through simulation. Results show that URLLC achieves the lowest latency, eMBB exhibits the highest slice utilization, and mMTC usage is minimal, with overall stable latency even under sudden user base changes. The proposed design offers reusable, object-oriented modules for network operators to pre-assess slicing performance. However, it lacks integration with live network data, real-world traffic patterns, and advanced AI-driven dynamic slicing. The authors suggest future improvements using ML modules, dynamic load management, and more realistic input scenarios.

Wu et al. [] proposed an ML-based framework for 5G network slicing management that integrates traffic classification with dynamic slice resource allocation. The authors employed the Universidad Del Cauca network traffic dataset, comprising approximately 2.7 million records across 141 applications with 50 traffic features. They applied Random Forest, Gradient Boosting Decision Tree (GBDT), XGBoost, and k-Nearest Neighbors (KNN) classifiers to categorize flows into lightweight, hybrid, and heavyweight slices. Data preprocessing steps included removing non-essential identifiers and applying Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance. Experimental results showed that Random Forest and GBDT achieved the highest classification accuracy of 95.73%. Furthermore, GPU-based training reduced execution time by nearly a factor of ten compared to CPU-based implementations. The study also incorporated an auto-scaling mechanism to reallocate resources dynamically under varying traffic loads. Although the framework demonstrated high accuracy and computational efficiency in 5G slicing scenarios, it was validated solely on a single dataset in a simulated environment, leaving its performance in real-time 6G deployments and under heterogeneous network conditions to be explored.

However, 6G networks introduce additional complexity through support for new service categories and more stringent performance requirements.

Table 1 reveals a consistent pattern across recent studies. Recent research consistently shows that ML/DL-based models consistently achieve high classification accuracy (95–99%) when tested on controlled datasets. However, a critical limitation appears: most approaches rely on synthetic or simulation-based data rather than live network traffic, raising concerns about their robustness in heterogeneous and dynamic 6G environments. Furthermore, while GPU-accelerated implementations improve scalability, few works address cross-domain orchestration, real-time adaptability, or resilience under adversarial conditions. This gap motivates our research approach. The persistent gap between laboratory success and real-world deployment necessitates evaluation methodologies that incorporate realistic impairments, heterogeneous traffic conditions, and predictive optimization mechanisms. Our research directly addresses this need through systematic comparison of pure raw data analysis with literature-validated realistic transformations for 6G slicing applications [,]. Common trends include synthetic or lab-created datasets to represent slice scenarios (due to the novelty of 6G) and the pursuit of real-time intelligent slicing. However, a recurring limitation is the gap between simulation and reality. Many solutions have yet to be tested on actual networks or at scale, leaving open challenges in generalization to diverse, evolving 6G traffic patterns and integration with live network management. These studies collectively lay a foundation, and ongoing research is refining datasets and models to enable robust, AI-driven slice classification in practical 6G deployments []. However, no previous study has comprehensively compared pure raw data analysis and literature-validated realistic transformations specifically for 6G network slicing applications. This represents a significant gap in the literature that our study addresses through systematic experimental evaluation.

Table 1. ML/DL-based studies on 5G/6G network slicing classification.

2.2. XAI in Telecommunications

The telecommunications industry has increasingly embraced AI and ML to manage networks, optimize services, and automate operations. This shift has improved efficiency but also introduced complex ’black-box’ models that lack transparency []. In earlier generations, telecom operators relied on rule-based expert systems that could explicitly justify their reasoning. Now, with data-driven 5G/6G network intelligence, the decision-making process of AI can be opaque, raising concerns about trust, accountability, and compliance [,]. Unexplained or incorrect AI decisions can lead to costly downtime, degraded service quality, or even regulatory non-compliance. This has created a strong motivation to integrate XAI techniques that make model behavior interpretable and human-understandable [,]. Therefore, XAI plays a vital role in ensuring that AI-driven telecom systems remain trustworthy and transparent, allowing network engineers to validate automated decisions and maintain oversight in mission-critical environments.

XAI Approaches and Classification: XAI methods are broadly categorized into intrinsic (transparent) models and post hoc explanation techniques [,]. Intrinsic approaches use interpretable-by-design models such as decision trees or rule-based systems, providing step-by-step human-readable reasoning. Post hoc techniques, such as SHapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), explain complex models without altering them [,]. Post hoc methods can be model-agnostic (applicable to any ML model) or model-specific (leveraging internal structures such as attention weights in neural networks). Moreover, XAI can operate at a local level (explaining individual predictions) or a global level (summarizing the model’s overall decision logic). In telecommunications, balancing model complexity with interpretability is essential for maintaining both high performance and explainability.

XAI in Telecom—From Fault Diagnosis to Optimization: XAI applications span multiple telecommunications domains. Successful applications include network fault management and service quality assurance [,]. Fault management benefits significantly from explainable approaches. For anomaly detection and fault diagnosis, explainable models can highlight which key performance indicators (KPIs) contributed to an outage prediction, aiding root cause analysis. In customer churn prediction and QoS assurance, explainability helps identify the primary factors behind model decisions (e.g., bandwidth usage patterns or latency spikes), enabling proactive interventions and ensuring decisions are business-relevant.

Explainability for Network Slicing Classification: Network slicing is central to 5G/6G, allowing virtual networks with different QoS profiles. ML models are used to automate slice classification and allocation. XAI ensures that these decisions are transparent, revealing which features (e.g., latency, packet loss, and device type) drive classification outcomes []. This transparency builds operator trust, supports compliance with service-level agreements, and allows auditing for fairness or bias. For example, SHAP-based analysis can show that high packet delay and video traffic type were decisive in routing a flow to a low-latency slice.

Recent surveys further emphasize that in 6G network slicing, balancing high AI model performance with interpretability is a central challenge, particularly for ensuring fairness, compliance, and trust in automated orchestration []. Multiple studies have demonstrated the use of advanced XAI techniques such as SHAP, LIME, RuleFit, Partial Dependence Plots (PDPs), and Integrated Gradients (IGs) to identify key features influencing SLA violation predictions and network slice performance metrics. For example, kernel SHAP has been applied to Short-Term Resource Reservation (STRR) models to monitor real-time decisions, uncover global behavioral trends, and diagnose potential failures during model development. Federated learning has emerged as a complementary paradigm to XAI in this context, enabling collaborative model training without exposing raw data, thus maintaining privacy while improving interpretability. XAI methods integrated into federated learning such as SHAP, LIME, and IGs have been successfully used for latency KPI prediction, slice reconfiguration optimization, and root cause analysis of SLA violations. Furthermore, explanation-guided deep reinforcement learning (DRL) and MLOps frameworks, such as SliceOps, have enhanced automation in network slice resource allocation by feeding back explainer-derived confidence metrics into the optimization process, enabling closed-loop, adaptive, and transparent orchestration. These developments indicate that the synergy between XAI, federated learning, and DRL offers a pathway toward more reliable, explainable, and privacy-preserving network slicing in 6G, meeting both performance demands and regulatory requirements.

XAI in Network Resource Planning: Resource allocation in 5G/6G often involves deep reinforcement learning and multi-agent systems, which can be opaque. Explainable multi-agent resource allocation frameworks, such as the Prioritized Value-Decomposition Network (PVDN), decompose global decisions into interpretable contributions from each slice []. In simulations, such XAI-based approaches have achieved up to 67% throughput improvement and 35% latency reduction compared to baselines, showing that explainability can enhance both performance and trust.

XAI is a cornerstone for trustworthy AI adoption in telecommunications. From high-level network planning to real-time slicing, explainability bridges the gap between algorithmic complexity and operational transparency, enabling the safe and accountable deployment of AI in 6G networks.

3. Methodology and Dataset Characteristics

The evolution toward 6G networks demands sophisticated ML algorithms for real-time network slice classification in operational environments. While Botez et al. [] demonstrated over 99% accuracy under controlled laboratory conditions, a critical gap persists between idealized algorithm evaluation and realistic deployment performance. This study addresses this limitation through a dual-methodology framework comparing PRDA representing laboratory conditions with LVRT incorporating authentic network impairments.

3.1. Dual-Methodology Framework and Dataset Foundation

Our approach builds upon the 6G network slicing dataset (10,000 samples; 13 features) introduced by Botez et al. [], extending it through systematic refinement and realistic transformation processes. Figure 3 illustrates the complete methodology workflow encompassing feature optimization, dual-path evaluation, and comprehensive algorithm assessment.

Figure 3. Overview of the proposed methodology.

The evaluation encompasses five network slice types representing the complete 6G service spectrum: feMBB for ultra-high bandwidth applications (holographic communications, 16 K streaming), umMTC enabling hyper-dense IoT deployments (

10^{7}

devices/km²), mURLLC providing mission-critical connectivity (>99.999% reliability), ERLLC delivering sub-microsecond latency for precision applications, and MBRLLC supporting balanced multi-dimensional performance requirements.

3.2. Feature Optimization and Dimensionality Reduction

Prior to dual-methodology evaluation, we performed systematic feature refinement using statistical analysis to eliminate multicollinearity artifacts. Figure 4 demonstrates the deterministic relationship between slice types and use cases, confirming feature redundancy requiring elimination.

Figure 4. Slice type and use case type relationship demonstrating perfect correlation, justifying use case elimination as redundant feature.

Statistical analysis employing ANOVA (

η^{2}

) for numeric features and Cramér’s V for categorical variables revealed significant redundancy. Table 2 presents comprehensive results showing perfect correlations (

η^{2}

= 1.000 or Cramér’s V = 1.000) for budget-related features and configuration parameters. Systematic elimination of six redundant features yielded an optimized space of five variables: four independent numeric features (Transfer Rate, Latency, Packet Loss, and Jitter) and one categorical feature (Packet Loss Budget) with strong association (Cramér’s V = 0.791).

Table 2. Feature association analysis and selection strategy.

3.3. PRDA Implementation: Laboratory Baseline

The PRDA methodology establishes theoretical performance benchmarks through minimal preprocessing interventions:

D_{P R D A} = M_{m i n i m a l} (D_{o r i g i n a l})

(1)

where

M_{m i n i m a l}

encompasses correlation-based feature selection and label encoding. Label encoding maps categorical attributes to integer values, fitted exclusively on training data to prevent leakage while preserving structural characteristics. Figure 5 shows the approximate class balance maintained, with four slice types exhibiting near-uniform distribution (22.0–22.6%) and mURLLC showing lower representation (10.9%).

Figure 5. Original (percentages may not sum to 100% due to rounding) dataset network slice distribution.

3.4. LVRT Implementation: Realistic Operational Simulation

The LVRT methodology transforms laboratory data into realistic deployment benchmarks through systematic application of four sequential operators addressing operational challenges documented in the telecommunications literature:

D_{L V R T} = ϕ_{SMOTE} \circ ϕ_{missing} \circ ϕ_{noise} \circ ϕ_{distribution} (D_{original})

(2)

where the composition operator ∘ indicates sequential application from right to left.

3.4.1. Market-Driven Distribution Transformation

Real-world 6G networks exhibit severe class imbalance based on comprehensive industry projections from flagship research initiatives [,]. The realistic distribution applies market-driven allocation:

p_{realistic} = {[0.55, 0.18, 0.12, 0.089, 0.061]}^{T}

(3)

corresponding to [feMBB, umMTC, mURLLC, ERLLC, MBRLLC], reflecting feMBB dominance (55%) driven by bandwidth-intensive applications, umMTC expansion (18%) reflecting exponential IoT growth in smart city deployments, and specialized slice adoption patterns. This creates a 9:1 imbalance ratio representing critical deployment challenges that algorithms must handle effectively under operational stress.

3.4.2. Multi-Layer Noise Injection

Urban 6G deployments experience complex interference patterns requiring multi-layer modeling based on the established telecommunications literature [,] and 3GPP specifications [,]:

x_{noisy} [n] = (x [n] + w_{AWGN} [n] + w_{imp} [n]) \cdot g [n]

(4)

Layer 1 introduces additive white Gaussian noise (AWGN) representing measurement uncertainty with scenario-stratified SNR distributions following 3GPP urban deployment statistics: outdoor line-of-sight (18 ± 3.8 dB), outdoor non-line-of-sight (11 ± 4.1 dB), indoor hotspots (14 ± 4.5 dB), and vehicle penetration (−1 ± 3.2 dB), with scenario probabilities of 25%, 45%, 20%, and 10%, respectively. These empirically validated ranges reflect documented field measurements from operational 6G testbeds [].

Layer 2 adds impulsive interference modeling irregular electromagnetic interference from industrial equipment, with 5% occurrence probability reflecting documented interference patterns in dense urban environments []:

w_{imp} [n] = I [n] \cdot A [n] \cdot S [n]

(5)

I [n] \sim Bernoulli (0.05)

(6)

Layer 3 incorporates multiplicative gain variations representing system calibration drift and environmental adaptation, with 2% standard deviation reflecting typical RF front-end stability characteristics documented in wireless system analysis []:

g [n] \sim N (1.0, 0 . 02^{2})

(7)

3.4.3. Systematic Missing Data and Quality Control

Operational 6G networks exhibit systematic data quality degradation following documented failure patterns in telecommunications infrastructure [,]. We model four distinct failure modes representing sensor failures, network outages, measurement errors, and maintenance windows, generating the following expected missing data rate:

E [missing rate] = \sum_{i} P_{i} \cdot E [D_{i}] \approx 0.07 (7 %)

(8)

Quality control implements a two-stage process to maintain dataset utility while preserving realistic characteristics. The complete data transformation sequence proceeds as follows: Original laboratory dataset (10,000 samples) → Quality control removes samples with >20% missing values (7997 samples) → SMOTE augmentation balances class distribution (final: 21,033 samples). The first stage applies a threshold-based filter to prevent training on inadequate information, as formalized in Equation (9):

Remove sample i if \frac{\sum_{j = 1}^{d} 1 [x_{i, j} = NaN]}{d} > 0.20

(9)

The second stage addresses severe class imbalance through SMOTE [], generating synthetic samples using linear interpolation between existing minority class instances and their nearest neighbors (Equation (10)):

x_{synthetic} = x_{i} + λ \cdot (x_{neighbor} - x_{i}), λ \sim U [0, 1]

(10)

where

x_{i}

represents a randomly selected minority class instance,

x_{neighbor}

is one of its k nearest neighbors (k = 5), and

λ

is a random interpolation factor. This approach maintains underlying data distribution while providing sufficient training examples for minority classes, expanding the dataset to 21,033 samples. Figure 6 illustrates the resulting realistic service distribution.

Figure 6. Realistic dataset network slice distribution (percentages may not sum to 100% due to rounding).

This dual-methodology framework enables comprehensive algorithm assessment spanning idealized laboratory conditions (PRDA: 10,000 samples) to realistic operational challenges (LVRT: 21,033 samples), providing evidence-based guidance for algorithm selection across diverse 6G infrastructure scenarios.

4. Experimental Setup and Model Configuration

This study employs a multi-layered experimental framework to assess the reliability of ML and DL algorithms in the context of 6G network slicing. A total of eleven algorithms are implemented, covering both traditional ML approaches and DL architectures to evaluate trade-offs between interpretability, computational efficiency, and robustness to real-world conditions.

Two distinct experimental regimes are considered: PRDA represents a controlled baseline environment with minimal noise and well-structured data, while LVRT incorporates congestion modeling, multi-layer noise, missing data, and temporal correlation patterns, thereby mimicking actual 6G deployment conditions. This contrast allows analysis of model performance when transitioning from theoretical laboratory conditions to realistic field operations.

The dataset consists of 10,000 samples, stratified into 80% training and 20% testing sets. Class imbalance is handled using SMOTE (

k = 5

), feature selection relies on SelectKBest (f-classification, top 15 features), and features are normalized using Z-score standardization. Evaluation employs 5-fold repeated cross-validation with three repetitions to minimize variance, while SHAP-based XAI enables interpretable feature attribution across models.

4.1. Traditional ML Configurations

The traditional algorithms were tuned to balance accuracy with computational cost, as summarized in Table 3. Ensemble methods such as Random Forest, Gradient Boosting, and XGBoost use multiple trees to capture non-linear decision boundaries, while simpler models like Logistic Regression and Naive Bayes offer interpretability and computational efficiency. SVM employs an RBF kernel with probability estimation, while kNN uses distance-weighted voting. The configuration is designed to benchmark each model fairly under both PRDA and LVRT.

Table 3. Traditional ML model parameters.

4.1.1. Ensemble Learning Methods

Random Forest employs bootstrap aggregating and out-of-bag error estimation with decision trees as base learners, utilizing random feature selection at each split to minimize overfitting through ensemble aggregation (Equation (11)) []. Gradient Boosting constructs additive models sequentially by iteratively fitting new models to the negative gradient of the loss, controlling learning progression via a learning rate (Equation (12) []. XGBoost (version 3.0.5) integrates L1/L2 regularization with advanced tree pruning and parallel computation, enhancing efficiency and reducing overfitting risk through a regularized objective (Equation (13)) [].

{\hat{y}}_{i} = \frac{1}{B} \sum_{b = 1}^{B} T_{b} (x_{i})

(11)

where

T_{b} (x_{i})

denotes the prediction of the b-th decision tree for input instance

x_{i}

, and B is the total number of trees. The final prediction

{\hat{y}}_{i}

is obtained by averaging (for regression) or majority voting (for classification) across all base learners, thus reducing variance and mitigating overfitting.

{\hat{y}}_{i}^{(m)} = {\hat{y}}_{i}^{(m - 1)} + γ_{m} h_{m} (x_{i})

(12)

where

{\hat{y}}_{i}^{(m - 1)}

is the prediction up to stage

m - 1

,

h_{m} (x_{i})

is the newly added weak learner fitted to the negative gradient of the loss, and

γ_{m}

is the learning rate controlling the contribution of

h_{m}

. This sequential refinement allows the model to minimize the residual errors step by step, leading to improved generalization performance.

L (ϕ) = \sum_{i = 1}^{n} ℓ ({\hat{y}}_{i}, y_{i}) + \sum_{k = 1}^{K} Ω (f_{k}), Ω (f) = γ T + \frac{1}{2} λ {∥ w ∥}^{2}

(13)

where

ℓ ({\hat{y}}_{i}, y_{i})

denotes the loss between predicted value

{\hat{y}}_{i}

and true label

y_{i}

, and

Ω (f)

penalizes model complexity. Here, T is the number of leaves, w represents leaf weights,

γ

is the regularization parameter penalizing additional leaves, and

λ

controls

L_{2}

regularization on weights.

4.1.2. Linear and Probabilistic Models

Logistic Regression utilizes a sigmoid link for binary probability estimation with L2 regularization to enhance generalization in high-dimensional spaces (Equation (14)) []. Naive Bayes applies Bayes’ theorem under conditional independence and (for continuous features) Gaussian likelihood assumptions, computing class posteriors via multiplicative likelihood integration (Equation (15)) [].

P (y = 1 ∣ x) = \frac{1}{1 + exp (- (β_{0} + \sum_{j = 1}^{p} β_{j} x_{j}))}

(14)

where the probability of class

y = 1

is modeled using the sigmoid function applied to a linear combination of input features

x_{j}

with coefficients

β_{j}

.

\hat{y} = arg max_{c \in C} P (C = c) \prod_{i = 1}^{d} P (x_{i} ∣ C = c)

(15)

which predicts the class

\hat{y}

by maximizing the posterior probability under the assumption of conditional independence among features

x_{i}

.

4.1.3. Tree-Based and Instance-Based Methods

Decision Tree induces interpretable rule-based models via recursive binary splitting that minimizes CART’s Gini impurity (Equation (16)) []. Support Vector Machine employs an RBF kernel transformation to find maximum-margin separating hyperplanes in higher-dimensional spaces (Equation (17)) [,]. k-NN classifies instances through distance-weighted majority voting among the k nearest neighbors, adapting locally to data topology (Equation (18)) [].

G i n i (S) = 1 - \sum_{c = 1}^{C} p_{c}^{2}

(16)

where

p_{c}

is the class proportion of

y_{i}

in node S.

K (x_{i}, x_{j}) = exp (- γ {∥ x_{i} - x_{j} ∥}^{2})

(17)

which maps input pairs

(x_{i}, x_{j})

into a higher-dimensional space to separate

y_{i}

classes.

{\hat{y}}_{i} = arg max_{c \in C} \sum_{j \in N_{k} (x_{i})} w_{j} 1 [y_{j} = c]

(18)

where

{\hat{y}}_{i}

is assigned to the majority class among the k nearest neighbors

y_{j}

of

x_{i}

.

Table 3 summarizes the hyperparameter settings of traditional models. Ensemble methods such as Random Forest and XGBoost prioritize depth and parallelism to handle high-dimensional interactions, while Gradient Boosting incorporates a moderate learning rate for stability. Simpler models (Naive Bayes, Logistic Regression, and Decision Tree) operate under stricter assumptions but provide baselines for interpretability and low complexity. This parameterization ensures that differences in outcomes reflect algorithmic characteristics rather than suboptimal tuning.

4.2. DL Architectures

DL models are tuned for both spatial and temporal feature learning, incorporating regularization techniques and dynamic training adaptations. The architectures are designed to capture different aspects of feature relationships while maintaining computational efficiency through optimized training configurations as detailed in Table 4.

Table 4. DL architectures and training configurations.

4.2.1. CNN

CNN employs stacked one-dimensional convolutional layers to extract hierarchical local patterns from sequential inputs, with batch normalization to stabilize learning and dropout regularization to mitigate overfitting [,]. The convolutional layers extract local feature patterns (Equation (19)), which are subsequently aggregated by dense layers and transformed into final class probabilities through the softmax function (Equation (20)).

y_{c_{out}, t} = σ (\sum_{c_{in}} \sum_{m} w_{c_{out}, c_{in}, m} x_{c_{in}, t + m} + b_{c_{out}})

(19)

where

w_{c_{out}, c_{in}, m}

are the convolutional filter weights,

x_{c_{in}, t + m}

are the input features,

b_{c_{out}}

is the bias term, and

σ (\cdot)

is the activation function.

{\hat{y}}_{i} = softmax (W^{(out)} h^{(CNN)} + b^{(out)})

(20)

where

h^{(CNN)}

is the flattened representation obtained after convolution and pooling.

4.2.2. LSTM

LSTM overcomes the vanishing gradient problem inherent in standard RNNs by utilizing input, forget, and output gates that selectively retain, update, and transfer temporal information across extended sequences [,]. The gating mechanisms and state updates are defined in Equations (21) and (22), while the final prediction is obtained through the softmax transformation of the hidden state as shown in Equation (23).

i_{t} = σ (W_{i} [h_{t - 1}, x_{t}] + b_{i}), f_{t} = σ (W_{f} [h_{t - 1}, x_{t}] + b_{f}), o_{t} = σ (W_{o} [h_{t - 1}, x_{t}] + b_{o})

(21)

the input gate of the LSTM is

i_{t}

, the forget gate is

f_{t}

, and the output gate is

o_{t}

, where

σ (\cdot)

is the sigmoid activation.

{\tilde{C}}_{t} = tanh (W_{C} [h_{t - 1}, x_{t}] + b_{C}), C_{t} = f_{t} ⊙ C_{t - 1} + i_{t} ⊙ {\tilde{C}}_{t}, h_{t} = o_{t} ⊙ tanh (C_{t})

(22)

the candidate cell state is

{\tilde{C}}_{t}

, the updated cell state is

C_{t}

, and the hidden state is

h_{t}

.

{\hat{y}}_{i} = softmax (W^{(out)} h_{t} + b^{(out)})

(23)

where

h_{t}

is the final hidden state at time t.

4.2.3. FNN

FNN consists of stacked dense layers with non-linear activation functions, enhanced by dropout regularization and batch normalization between layers to accelerate convergence and stabilize training dynamics [,]. The dense connectivity enables comprehensive feature interaction modeling as formulated in Equation (24), while the final prediction is obtained through the softmax transformation of the last hidden representation as shown in Equation (25). Regularization techniques are commonly applied to prevent overfitting in high-dimensional spaces.

h^{(l + 1)} = σ (W^{(l)} h^{(l)} + b^{(l)}), h^{(0)} = x_{i}

(24)

where

σ (\cdot)

is the activation function,

W^{(l)}

and

b^{(l)}

are the weights and biases of layer l, and

h^{(0)} = x_{i}

is the input vector of instance i.

{\hat{y}}_{i} = softmax (W^{(out)} h^{(L)} + b^{(out)})

(25)

where

h^{(L)}

is the last hidden representation.

Table 4 details the layer structures and training regimes of the three DL models. CNNs use progressively deeper convolutional layers to extract hierarchical features, followed by dense layers for classification. LSTM integrates two recurrent layers to capture temporal dependencies before transitioning to dense layers. The FNN employs multiple dense layers to exploit feature interactions directly. Training configurations are harmonized across models to enable fair comparisons.

4.3. Resilience Threshold Derivation and SLA Risk Analysis

The resilience classification framework establishes quantitative boundaries for algorithm performance degradation based on empirical analysis of 6G service level agreement (SLA) violation risks and operational deployment constraints. This systematic approach draws from ITU-R recommendations, 3GPP technical specifications, and comprehensive industry deployment studies to ensure practical relevance for operational 6G networks.

Modern 6G network slicing demands unprecedented reliability levels, with different service categories imposing distinct performance constraints that directly influence SLA compliance. Our resilience threshold calibration methodology aligns algorithm performance boundaries with documented SLA violation probability distributions, enabling evidence-based deployment decisions across diverse infrastructure scenarios.

The threshold derivation process considers multiple factors: measurement uncertainty inherent in 6G systems, acceptable service degradation boundaries for different application categories, monitoring resource requirements, and documented failure mode analysis from operational networks. Table 5 presents the comprehensive threshold framework linking performance degradation levels to SLA violation risks and deployment suitability.

Table 5. SLA-based resilience threshold derivation.

The Excellent resilience threshold (−5%) reflects inherent measurement uncertainty documented in 6G system specifications. According to 3GPP technical reports [], typical radio frequency measurement accuracy ranges from 3 to 5% under operational conditions, making performance variations within this range attributable to normal operational variance rather than algorithmic degradation. Algorithms maintaining performance within this bound demonstrate robustness characteristics suitable for ultra-reliable low-latency communication (URLLC) and extremely reliable low-latency communication (ERLLC) slices requiring 99.999% availability.

The Good resilience threshold (−5% to −10%) aligns with acceptable service quality boundaries established through extensive user experience studies in commercial deployments. Industry research [] demonstrates that user satisfaction remains within acceptable parameters for performance degradations up to 10% in bandwidth-intensive applications, making this threshold appropriate for further enhanced mobile broadband (feMBB) and ultra-massive machine type communication (umMTC) deployments where slight performance variations do not compromise core functionality.

The Moderate resilience threshold (−10% to −20%) identifies algorithms requiring enhanced monitoring infrastructure and potential redundancy mechanisms to maintain service continuity. This boundary reflects the transition point between acceptable and concerning performance levels documented in comprehensive network operator studies [], indicating that while deployment remains feasible, additional operational oversight becomes necessary to prevent service degradation.

The Poor resilience threshold (>−20%) establishes the boundary for unacceptable operational performance based on systematic SLA violation analysis []. Performance degradations exceeding 20% indicate fundamental algorithmic limitations that compromise service delivery reliability, requiring substantial improvements before consideration for production deployment in any operational scenario.

This resilience-based classification framework enables quantitative deployment risk assessment through the following mathematical relationship:

\begin{matrix} {Risk}_{deployment} & = f (Δ A, Service Category, Infrastructure Type) \end{matrix}

(26)

\begin{matrix} where f (Δ A) & = \{\begin{matrix} Minimal & if Δ A \geq - 5 % \\ Low & if - 10 % \leq Δ A < - 5 % \\ Moderate & if - 20 % \leq Δ A < - 10 % \\ High & if Δ A < - 20 % \end{matrix} \end{matrix}

(27)

where

Δ A

represents the relative accuracy change between controlled and realistic conditions, enabling systematic algorithm selection aligned with specific deployment scenarios, risk tolerance requirements, and operational infrastructure constraints. This framework provides network operators with quantitative guidance for evidence-based decision making in algorithm selection and deployment planning across diverse 6G infrastructure scenarios.

5. Experimental Results and Analysis

All experiments were conducted on Google Colab Pro+ platform (Ubuntu 22.04.4 LTS runtime environment) with high-RAM configuration (51.0 GB system RAM; GPU acceleration enabled), ensuring reproducible performance comparisons across algorithms while representing the typical cloud-based infrastructure available for 6G network operators.

5.1. Performance Analysis: PRDA vs. LVRT Comparison

The experimental evaluation reveals distinct algorithmic behaviors under controlled laboratory conditions PRDA versus realistic operational environments LVRT. Table 6 presents baseline performance under idealized conditions, where deep learning models achieve perfect accuracy (CNN, FNN: 100%) at significant computational cost (46–56 s training time), while traditional algorithms demonstrate varying effectiveness ranging from 60.9% (SVM) to 89.0% (Naive Bayes). It is important to note that performance changes are measured as degradation percentages; positive values indicate accuracy loss under realistic conditions, while negative values indicate counter-intuitive improvements.

Table 6. PRDA results (10,000 samples).

Table 7 demonstrates performance under realistic operational conditions incorporating network congestion, missing data, and multi-layer interference patterns. Deep learning models maintain superior performance (CNN: 81.2%; FNN: 81.1%) while traditional algorithms exhibit remarkable adaptability, with several showing counter-intuitive performance improvements under realistic conditions.

Table 7. LVRT results (21,033 samples).

To quantify algorithmic resilience, we define the relative accuracy change as follows:

Δ (A) = \frac{A_{P R D A} (A) - A_{L V R T} (A)}{A_{P R D A} (A)} \times 100 %

(28)

where

A_{P R D A} (A)

and

A_{L V R T} (A)

denote the accuracies of algorithm

A

under PRDA and LVRT, respectively. Negative values indicate performance improvement under realistic conditions, while positive values signify degradation. Here, positive

Δ (A)

values indicate performance degradation (e.g., CNN: +18.8% means accuracy decreased from 100% to 81.2%), and the negative values indicate performance improvement under realistic conditions (e.g., SVM: −14.1% means accuracy increased from 60.9% to 69.5%).

Figure 7 illustrates the performance transition patterns across all evaluated algorithms. To quantify algorithmic resilience, we define the relative accuracy change as shown in Equation (28), where negative values indicate performance improvement under realistic conditions.

Figure 7. PRDA vs. LVRT accuracy comparison across algorithms.

Table 8 reveals three distinct adaptation patterns:

Table 8. Performance transition (PRDA → LVRT) across algorithms.

Resilient: SVM (RBF) and Logistic Regression show performance improvements (−14.1% and −10.3%, respectively), demonstrating robustness under realistic conditions.
Stable: k-NN maintains consistent performance (+1.1%), showing minimal sensitivity to operational changes.
Degrading: Traditional ML algorithms (Naive Bayes +34.8%, ensemble methods +15–17%) and deep learning models (CNN +18.8%, FNN +18.9%) experience substantial performance losses under realistic conditions.

The performance degradation observed in ensemble methods and neural networks under LVRT conditions reflects challenges in handling realistic operational stress, where algorithms trained on clean data struggle with the complexity introduced by noise, class imbalance, and missing data patterns characteristic of operational 6G networks.

In the Table 8, the

Δ A

metric quantifies performance change as follows:

Δ A = \frac{A_{P R D A} - A_{L V R T}}{A_{P R D A}} \times 100 %

. Therefore, positive values (e.g., Naive Bayes +34.8%) represent accuracy degradation, while negative values (e.g., SVM −14.1%) indicate improved performance under realistic conditions.

5.2. Computational Efficiency and Deployment Constraints

The training time analysis reveals critical trade-offs for 6G edge deployment scenarios where computational resources impose strict operational constraints. Figure 8 demonstrates the computational cost spectrum across algorithm families, with lightweight models (Naive Bayes: 0.02 s) enabling real-time edge deployment while complex architectures (CNN: 52.36 s) require centralized training with inference model distribution.

Figure 8. Average training time comparison (log scale) between traditional ML and DL models.

Edge deployment constraints create a three-tier strategy: sub-0.1 s algorithms enable direct edge deployment for emergency slice reconfiguration, medium-complexity algorithms (0.2–15 s) support regional cloud deployment for near-real-time operations, while high-complexity algorithms require centralized training architectures. The 2600× speed advantage of Naive Bayes over CNN translates to proportional energy savings critical for sustainable 6G operations.

The analysis of computational complexity in Table 9 shows clear boundaries for the deployment of 6G network slicing algorithms. Linear complexity algorithms (Naive Bayes, k-NN) enable real-time edge deployment essential for ultra-low latency slicing scenarios [,]. Log-linear methods (Decision Tree, ensemble approaches) require regional cloud infrastructure [,,]. Quadratic complexity algorithms (SVM) exhibit prohibitive scaling, limiting deployment to offline scenarios []. Deep learning architectures maintain high training complexity with consistent inference requirements, necessitating centralized training with edge inference deployment [,]. This analysis establishes the boundaries of computational feasibility that are essential for selecting algorithms in resource-constrained 6G deployment scenarios, where real-time decision-making (with a latency of less than 100 ms for URLLC applications) is a fundamental operational requirement.

Table 9. Algorithm complexity.

The theoretical complexity bounds translate into concrete operational constraints when evaluated against our dataset parameters (

n = 10, 000

–

21, 033

;

d = 5

;

c = 5

). Linear complexity algorithms demonstrate manageable scaling. Naive Bayes processes

250, 000

operations (PRDA) expanding to

525, 825

operations (LVRT) with a proportional

2 \times

training time increase (

0.02

s→

0.04

s), while k-NN scales from

50, 000

to

105, 165

operations with a

2.8 \times

time increase (

0.06

s→

0.17

s). These remain within the

< 100

ms URLLC requirements for real-time edge deployment. Quadratic complexity algorithms reveal prohibitive scaling: SVM operations expand from 500 M to

2.2

B, manifesting as dramatic as a

14.7 \times

training time degradation (

35.49

s→

521.3

s). This exponential growth renders SVM incompatible with dynamic 6G slice reconfiguration requirements. Deep learning architectures process

1.34

B–

2.82

B operations but maintain stable 52–56 s training times due to fixed epoch limits, necessitating hybrid deployment with centralized training and edge inference. The analysis sets quantitative deployment boundaries: real-time slice adaptation can only be supported by sub-second algorithms (Naive Bayes; k-NN), while 6G’s millisecond decision requirements are incompatible with quadratic methods because they create computational bottlenecks. This numerical validation shows that operational constraints, rather than laboratory accuracy, should inform decisions on which algorithms to use for 6G.

5.3. Algorithm Resilience Validation

To validate the robustness of our findings across varying operational conditions, we conducted comprehensive sensitivity analysis under multiple urban 6G SNR scenarios reflecting documented deployment environments. The validation methodology employed three representative urban conditions: optimal urban environments (18–25 dB SNR) characteristic of well-planned metropolitan deployments, dense urban environments (8-15 dB SNR) representing challenging high-density scenarios, and extreme challenging conditions (0–8 dB SNR) simulating worst-case interference scenarios with significant electromagnetic interference from industrial equipment and competing wireless systems.

Table 10 demonstrates the resilience rankings remain highly stable across all SNR conditions, with Spearman’s correlation coefficient

ρ = 0.989

(p < 0.001) between optimal and challenging scenarios, providing strong evidence for the reliability of our resilience classifications across the complete spectrum of evaluated algorithms.

Table 10. Comprehensive algorithm performance across urban 6G SNR scenarios.

5.3.1. Statistical Validation of Resilience Classifications

Bootstrap confidence interval analysis (n = 1000 iterations) confirmed statistical significance of performance differences across all evaluated algorithms (p < 0.05). The analysis categorized algorithms based on their performance degradation under realistic operational conditions relative to laboratory benchmarks:

Excellent Resilience (3 algorithms): Mean degradation +8.2% ± 2.1%. These algorithms (SVM, Logistic Regression, and k-NN) demonstrate relatively stable performance under operational stress.
Good Resilience (2 algorithms): Mean degradation +18.9% ± 0.1%. This category includes CNN and FNN, showing manageable performance loss while maintaining high absolute accuracy levels suitable for high-performance applications.
Moderate Resilience (5 algorithms): Mean degradation +15.8% ± 1.2%. Enhanced LSTM, ensemble methods (Random Forest, XGBoost, and Gradient Boosting), and Decision Tree require careful deployment consideration with appropriate monitoring mechanisms for standard commercial slice deployments.
Poor Resilience (1 algorithm): Mean degradation +34.8%. Naive Bayes demonstrates substantial performance degradation unsuitable for production deployment without significant algorithmic modifications or restricted to research environments.

The one-sample t-test confirmed statistically significant overall performance variation (t = −4.139, p = 0.002), with Cohen’s effect sizes ranging from small (d = −4.41) to large (d = −146.39), demonstrating substantial heterogeneity in algorithmic responses to operational stress conditions. These findings provide empirical evidence for algorithm-specific resilience characteristics that must be considered in practical 6G network slice management decisions.

5.3.2. Mechanism Validation Through Empirical Analysis

To validate the three proposed improvement mechanisms underlying the resilience phenomenon, we conducted systematic ablation studies isolating individual transformation components. Each mechanism was tested independently to establish causal relationships between specific environmental factors and algorithmic performance changes.

SMOTE Regularization Effect: Controlled experiments comparing identical datasets with and without SMOTE augmentation revealed differential algorithmic responses to synthetic minority class generation. Linear algorithms (SVM; Logistic Regression) showed average performance improvements of 8–12%, demonstrating enhanced decision boundary robustness. Tree-based methods exhibited mixed responses, with ensemble methods showing modest improvements (2-4%) while simple Decision Trees remained relatively stable.

Noise-Induced Regularization: Systematic noise injection experiments across controlled SNR ranges demonstrated that linear algorithms with appropriate regularization (SVM; Logistic Regression) improved generalization capability under moderate noise conditions by 6–10%. Neural networks exhibited performance degradation of 18–19%, while ensemble methods showed moderate degradation of 15–17%. This differential response supports the regularization hypothesis that controlled noise prevents overfitting for algorithms with appropriate inductive biases.

Class Imbalance Adaptation: Analysis of algorithmic responses to realistic class imbalance patterns (9:1 ratio) revealed that certain algorithms exploit structured imbalance more effectively than balanced distributions. Linear classifiers demonstrated superior adaptation to minority class detection, while tree-based methods struggled with extreme imbalance despite SMOTE augmentation.

These empirical validations provide mechanistic evidence that performance changes under realistic conditions reflect fundamental algorithmic characteristics rather than random degradation. Linear algorithms with strong regularization demonstrate superior resilience, while complex ensemble methods and neural networks show vulnerability to operational stress despite higher laboratory performance.

5.4. XAI Analysis and Algorithm Interpretability

The deployment of ML algorithms in mission-critical 6G network slicing requires transparent and interpretable decision-making processes to ensure regulatory compliance and operational confidence. This section presents a comprehensive explainability analysis using the SHAP (SHapley Additive exPlanations) framework to quantify feature importance patterns and establish deployment guidelines based on interpretability requirements.

5.4.1. SHAP-Based Feature Importance Framework

The deployment of ML algorithms in mission-critical 6G network slicing necessitates transparent decision-making processes to ensure regulatory compliance and operational confidence. We employ algorithm-specific SHAP explainers tailored to model characteristics: TreeExplainer for ensemble methods, LinearExplainer for Logistic Regression, and KernelExplainer for SVM and k-NN architectures.

To validate interpretation robustness, we applied three complementary explainability methods to the k-NN classifier. Figure 9 demonstrates remarkable consistency across SHAP, LIME, and Permutation Importance methods, with Packet Loss Budget dominating feature importance rankings (0.15–0.30 across methods) in both operational scenarios. This cross-method agreement validates that our interpretations reflect genuine algorithmic behavior rather than method-specific artifacts.

Figure 9. Global explainability comparison for the k-NN model.

The analysis reveals cross-scenario variability in algorithmic focus patterns. Figure 10 illustrates how Decision Tree models adapt their feature emphasis between datasets, with PRDA models prioritizing Slice Jitter (0.320) and Slice Latency (0.298), while LVRT models demonstrate increased reliance on Packet Loss Budget (0.332), reflecting operational stress adaptation mechanisms.

Figure 10. Comparison of SHAP-based feature importance between PRDA (left) and LVRT (right) Decision Tree models.

5.4.2. Cross-Scenario Consistency Analysis

Cross-scenario consistency analysis quantifies explanation stability between PRDA and LVRT datasets using correlation coefficients of feature importance rankings. The comprehensive evaluation presented in Table 11 reveals distinct interpretability profiles that directly impact deployment suitability across diverse 6G infrastructure scenarios.

Table 11. Algorithm interpretability consistency analysis.

Neural network architectures demonstrate exceptional interpretability stability, with CNN and LSTM achieving perfect consistency scores (1.000) as shown in the table. Classical algorithms exhibit similarly robust behavior, with Naive Bayes (0.998), SVM (0.997), and Logistic Regression (0.988) maintaining minimal explanation variance across operational conditions. These high-consistency algorithms provide reliable feature importance rankings essential for regulatory compliance scenarios.

Conversely, XGBoost presents significant interpretability challenges despite competitive accuracy performance, exhibiting the lowest consistency score (0.106) in our analysis. This finding indicates that high predictive performance does not guarantee explanation stability, necessitating careful consideration of interpretability requirements in algorithm selection decisions.

5.4.3. Algorithm Selection Guidelines for 6G Deployment

Local explanation analysis provides concrete insights into algorithmic decision-making under varying operational conditions. Table 12 and Table 13 present detailed misclassification analysis under contrasting SNR environments, demonstrating how operational stress fundamentally alters feature attribution patterns.

Table 12. Local explanation for high SNR misclassification case (k-NN, Sample #0).

Table 13. Local explanation for low SNR misclassification case (k-NN, Sample #0).

Under high SNR conditions (15–25 dB), as detailed in Table 12, moderate latency elevation (4.19M ns) generates positive SHAP contributions (+0.31) suggesting similarity-based classification mechanisms. In contrast, Table 13 reveals that low SNR environments (−10 to +5 dB) produce extreme latency values (38.0M ns) triggering strong negative SHAP contributions (−0.94), indicating a shift from pattern recognition to outlier-based rejection under operational stress.

The analysis establishes evidence-based deployment guidelines linking interpretability characteristics with operational requirements. Mission-critical infrastructure benefits from CNN and FNN architectures that combine high accuracy (>0.81) with perfect consistency scores (1.000). Regulatory compliance scenarios are optimally served by classical algorithms maintaining high consistency while providing audit-ready explanation generation. Performance-focused deployments may utilize XGBoost despite interpretability limitations, provided intensive explanation monitoring compensates for consistency challenges.

5.4.4. Deployment Guidelines and Operational Monitoring Framework

The interpretability analysis establishes evidence-based deployment guidelines that integrate performance characteristics, explanation stability, and operational monitoring requirements across diverse 6G network slicing scenarios. This comprehensive framework enables algorithm selection tailored to specific operational contexts where transparency demands vary significantly based on regulatory requirements and mission-critical considerations.

Neural network architectures demonstrate exceptional suitability for mission-critical infrastructure deployment, where both high performance and explanation stability are paramount. CNN and FNN achieve superior accuracy levels (0.812 and 0.811, respectively) while maintaining perfect or near-perfect consistency scores (1.000 and 0.981), ensuring reliable interpretability across varying operational conditions. LSTM architectures provide specialized capabilities for temporal pattern analysis in dynamic slicing scenarios, offering perfect consistency (1.000) despite moderate accuracy performance (0.748).

Regulatory compliance scenarios benefit from classical algorithms that prioritize explanation stability over absolute performance metrics. SVM and Logistic Regression maintain exceptional consistency scores (0.997 and 0.988, respectively), providing audit-ready explanation generation essential for regulatory documentation and compliance verification processes. Performance-focused applications may utilize XGBoost despite its interpretability limitations (0.106 consistency), provided comprehensive monitoring compensates for explanation instability while leveraging superior predictive capabilities (0.726 accuracy).

Table 14 synthesizes these findings into practical deployment recommendations, revealing systematic trade-off patterns between algorithmic complexity and explanation requirements.

Table 14. XAI-based algorithm deployment guide.

The feature importance analysis establishes a hierarchical monitoring framework that prioritizes network parameters based on their demonstrated impact across algorithmic decisions. Packet Loss Budget emerges as the primary monitoring priority, demonstrating consistent dominance across all algorithm categories with feature importance values ranging from 0.15 to 0.30. This parameter requires continuous real-time monitoring with automated anomaly detection capabilities to ensure service continuity and prevent network degradation.

Secondary monitoring priorities encompass Slice Jitter and Latency parameters, which exhibit substantial influence in traditional ML algorithms and require periodic assessment with comprehensive trend analysis capabilities. These timing-critical parameters demand hourly monitoring cycles with weekly threshold reviews to maintain optimal network performance. Contextual parameters including Transmission Rate and Packet Loss constitute the tertiary monitoring tier, requiring monthly trend analysis integrated with quarterly capacity assessments for long-term infrastructure optimization.

This integrated framework creates a comprehensive three-tiered monitoring system that scales intensity according to feature criticality while maintaining service level agreement compliance across all network slicing configurations.

6. Conclusions

This study presents a dual-methodology evaluation framework that comprehensively assesses ML algorithms for 6G network slicing, effectively bridging the gap between laboratory performance and real-world deployment effectiveness. Through systematic comparison of PRDA and LVRT evaluation approaches, we reveal the limitations of conventional laboratory-based assessments in predicting practical performance, emphasizing the necessity of realistic evaluation strategies for operational 6G networks. The investigation contributes to the field in three key ways. First, it quantifies the discrepancy between laboratory and real-world performance, revealing that algorithms achieving perfect accuracy (100% for CNN and FNN) under controlled conditions experience substantial degradation in realistic network environments, with LVRT accuracies ranging from 58.0% to 81.2%. Second, it reveals counter-intuitive patterns of resilience where sophisticated algorithms perform significantly worse (CNN: 18.8 percentage points; FNN: 18.9 points; Naive Bayes: 34.8 points), while simpler classifiers show unexpected improvements (SVM: 14.1 points; Logistic Regression: 10.3 points) when subjected to realistic operational stress. This challenges the notion that algorithms are inherently complex and that data quality is always poor. It demonstrates that certain algorithms can adaptively exploit structural patterns in realistic conditions. Third, it proposes a resilience-based categorization framework for evidence-driven algorithm selection that prioritizes operational stability over idealized laboratory benchmarks, providing network operators with actionable guidance balancing accuracy, computational efficiency, and deployment reliability. The XAI analysis strengthens practical implications through SHAP-based interpretability assessment, identifying Packet Loss Budget as the dominant feature across all algorithms, with Slice Jitter and Slice Latency as secondary factors. This establishes clear monitoring priorities for operational deployment. Neural networks (CNN: 1.000; LSTM: 1.000) and classical algorithms (SVM: 0.997; Logistic Regression: 0.988) exhibit exceptional interpretability consistency suitable for regulatory compliance scenarios. Cross-scenario analysis confirms that algorithm selection must balance accuracy and explanation stability, with high-consistency algorithms proving suitable for mission-critical deployments, while low-consistency algorithms like XGBoost (0.106) require intensive monitoring despite superior accuracy performance. The framework’s enhancement will be achieved through dataset diversity expansion across geographic regions and infrastructure types, incorporation of emerging architectures beyond CNN, FNN, and LSTM, investigation of temporal dynamics in adaptive algorithms, and validation through real-time deployment testing in operational networks. Integration of federated learning approaches with industry standardization efforts has potential to enhance practical adoption and scalability. This comprehensive evaluation methodology ensures regulatory compliance and enhances the reliability and efficiency of mission-critical 6G deployments. While our analysis is based on a single dataset of 10,000 samples, the methodological framework itself—comparing PRDA laboratory conditions with LVRT realistic transformations—provides a dataset-agnostic approach applicable across diverse 6G scenarios. The framework’s value lies not in absolute performance metrics but in revealing systematic algorithmic behavior patterns under operational stress, establishing evidence-based guidelines for deployment decisions that balance accuracy (58.0–81.2% under realistic conditions), computational efficiency (0.04 s–789.5 s training times), and interpretability consistency (0.106–1.000 scores). Future validation across heterogeneous network environments will further strengthen the generalizability of these resilience-based insights, ultimately establishing a new standard for realistic ML algorithm assessment in next-generation wireless networks.

7. Future Work

While this study establishes a robust framework for evaluating 6G network slice classification algorithms under realistic conditions, it should be noted that several avenues remain for further exploration. Subsequent research will concentrate on validating the findings on heterogeneous, real-world datasets spanning multiple network domains in order to assess the generalizability of resilience-based insights. The investigation will encompass advanced DL architectures, including Graph Neural Networks for topology-aware modeling, Transformers for capturing long-range temporal dependencies, and hybrid CNN-LSTM models for spatio-temporal feature extraction. The development of online and continual learning algorithms will facilitate the adaptation of models to dynamic network conditions and enable the acquisition of new slice types without the occurrence of catastrophic forgetting. In order to enhance the robustness of the system while preserving data privacy, the implementation of federated learning frameworks for collaborative model training across multiple operators is proposed. The utilization of XAI-driven insights will facilitate the proposal of minimal yet sufficient feature sets for AI-based slice management, thereby contributing to standardization efforts for interoperable multi-vendor orchestration. Furthermore, the LVRT framework should be extended to capture rare catastrophic events and unpredictable anomalies beyond systematic operational impairments. Future work will incorporate stochastic extreme-event modeling, adversarial stress testing, and heavy-tailed distribution analysis to ensure algorithmic resilience in mission-critical slices (mURLLC; ERLLC) under low-probability–high-impact scenarios. This extension will complement current systematic stress testing with comprehensive anomaly detection capabilities essential for ultra-reliable 6G deployments. Finally, the most resilient algorithms will be deployed on physical 6G testbeds to evaluate real-time performance, computational overhead, and latency in closed-loop network slice orchestration. By pursuing these directions, the framework presented in this work can evolve into practical, deployable, and standardized AI solutions, supporting the autonomous and efficient operation of future 6G networks.

Author Contributions

S.N.K. provided the vast majority of the content for this work, taking the lead in conceptualization, methodology, software development, formal analysis, investigation, data curation, and the writing of the initial draft of this manuscript. M.G. contributed to methodology, validation, visualization, and manuscript review and editing, providing substantial improvements in clarity and presentation. D.K. assisted with visualization and contributed to the review and editing of this manuscript. S.Ç. supported the editing process and contributed feedback to refine this manuscript. M.S.O. and N.B. supervised the work and provided critical oversight and guidance throughout. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The study is based on an existing dataset, on which experimental modifications and adjustments were performed. No new datasets were generated. The underlying dataset is available from the original source, while the modified experimental data are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank The Scientific and Technological Research Council of Türkiye (TÜBİTAK) and Türk Telekom 6G R&D Lab for their support.

Conflicts of Interest

Authors Sümeye Nur Karahan, Merve Güllü, Deniz Karhan, Sedat Çimen, and Mustafa Serdar Osmanca were employed by the company Türk Telekom (Turkey). The remaining author (Necaattin Barışçı) declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Bhide, P.; Shetty, D.; Mikkili, S. Review on 6G communication and its architecture, technologies included, challenges, security challenges and requirements, applications, with respect to AI domain. IET Quantum Commun. 2025, 6, e12114. [Google Scholar] [CrossRef]
Botez, R.; Zinca, D.; Dobrota, V. Redefining 6G Network Slicing: AI-Driven Solutions for Future Use Cases. Electronics 2025, 14, 368. [Google Scholar] [CrossRef]
Chowdhury, M.Z.; Shahjalal, M.; Ahmed, S.; Jang, Y.M. 6G wireless communication systems: Applications, requirements, technologies, challenges, and research directions. IEEE Open J. Commun. Soc. 2020, 1, 957–975. [Google Scholar] [CrossRef]
Chaudhari, B.S. Enabling Tactile Internet via 6G: Application Characteristics, Requirements, and Design Considerations. Future Internet 2025, 17, 122. [Google Scholar] [CrossRef]
Tarafder, P.; Chun, C.; Ullah, A.; Kim, Y.; Choi, W. Channel Estimation in 5G-and-Beyond Wireless Communication: A Comprehensive Survey. Electronics 2025, 14, 750. [Google Scholar] [CrossRef]
Ni, X.; Dong, Z.; Rong, X. Innovative Application of 6G Network Slicing Driven by Artificial Intelligence in the Internet of Vehicles. Int. J. Netw. Manag. 2025, 35, e70004. [Google Scholar] [CrossRef]
Dubey, M.; Singh, A.K.; Mishra, R. AI based resource management for 5g network slicing: History, use cases, and research directions. Concurr. Comput. Pract. Exp. 2025, 37, e8327. [Google Scholar] [CrossRef]
Dangi, R.; Lalwani, P. Harris Hawks optimization based hybrid deep learning model for efficient network slicing in 5G network. Clust. Comput. 2024, 27, 395–409. [Google Scholar] [CrossRef]
Gabilondo, Á.; Fernández, Z.; Viola, R.; Martín, Á.; Zorrilla, M.; Angueira, P.; Montalbán, J. Traffic classification for network slicing in mobile networks. Electronics 2022, 11, 1097. [Google Scholar] [CrossRef]
Khan, H.; Luoto, P.; Samarakoon, S.; Bennis, M.; Latva-Aho, M. Network slicing for vehicular communication. Trans. Emerg. Telecommun. Technol. 2021, 32, e3652. [Google Scholar] [CrossRef]
Spantideas, S.T.; Giannopoulos, A.E.; Trakadas, P. Smart Mission Critical Service Management: Architecture, Deployment Options, and Experimental Results. IEEE Trans. Netw. Serv. Manag. 2024, 22, 1108–1128. [Google Scholar] [CrossRef]
Rost, P.; Mannweiler, C.; Michalopoulos, D.S.; Sartori, C.; Sciancalepore, V.; Sastry, N.; Holland, O.; Tayade, S.; Han, B.; Bega, D.; et al. Network slicing to enable scalability and flexibility in 5G mobile networks. IEEE Commun. Mag. 2017, 55, 72–79. [Google Scholar] [CrossRef]
Barakabitze, A.A.; Ahmad, A.; Mijumbi, R.; Hines, A. 5G network slicing using SDN and NFV: A survey of taxonomy, architectures and future challenges. Comput. Networks 2020, 167, 106984. [Google Scholar] [CrossRef]
Shu, Z.; Taleb, T. A novel QoS framework for network slicing in 5G and beyond networks based on SDN and NFV. IEEE Netw. 2020, 34, 256–263. [Google Scholar] [CrossRef]
Cui, H.; Zhang, J.; Geng, Y.; Xiao, Z.; Sun, T.; Zhang, N.; Liu, J.; Wu, Q.; Cao, X. Space-air-ground integrated network (SAGIN) for 6G: Requirements, architecture and challenges. China Commun. 2022, 19, 90–108. [Google Scholar] [CrossRef]
Zhou, G.; Zhao, L.; Zheng, G.; Song, S.; Zhang, J.; Hanzo, L. Multiobjective Optimization of Space–Air–Ground-Integrated Network Slicing Relying on a Pair of Central and Distributed Learning Algorithms. IEEE Internet Things J. 2024, 11, 8327–8344. [Google Scholar] [CrossRef]
Wu, W.; Zhou, C.; Li, M.; Wu, H.; Zhou, H.; Zhang, N.; Shen, X.S.; Zhuang, W. AI-native network slicing for 6G networks. IEEE Wirel. Commun. 2022, 29, 96–103. [Google Scholar] [CrossRef]
Hamdi, W.; Dağdeviren, O.; Bulut, H. QoS-aware Network Slicing and Resource Management for Internet of Vehicles in 5G networks. Ad Hoc Netw. 2025, 154, 103976. [Google Scholar] [CrossRef]
Ming, Z.; Yu, H.; Taleb, T. Federated deep reinforcement learning for prediction-based network slice mobility in 6G mobile networks. IEEE Trans. Mob. Comput. 2024, 23, 11937–11953. [Google Scholar] [CrossRef]
Dangi, R.; Lalwani, P. Optimizing network slicing in 6G networks through a hybrid deep learning strategy. J. Supercomput. 2024, 80, 20400–20420. [Google Scholar] [CrossRef]
Wang, J.; Liu, J.; Li, J.; Kato, N. Artificial intelligence-assisted network slicing: Network assurance and service provisioning in 6G. IEEE Veh. Technol. Mag. 2023, 18, 49–58. [Google Scholar] [CrossRef]
Alwakeel, A.M.; Alnaim, A.K. Network slicing in 6G: A strategic framework for IoT in smart cities. Sensors 2024, 24, 4254. [Google Scholar] [CrossRef] [PubMed]
Cunha, J.; Ferreira, P.; Castro, E.M.; Oliveira, P.C.; Nicolau, M.J.; Núñez, I.; Sousa, X.R.; Serôdio, C. Enhancing Network Slicing Security: Machine Learning, Software-Defined Networking, and Network Functions Virtualization-Driven Strategies. Future Internet 2024, 16, 226. [Google Scholar] [CrossRef]
HB, M.; GF, A.A.; SM, U. The Network Slicing and Performance Analysis of 6G Networks using Machine Learning. EMITTER Int. J. Eng. Technol. 2023, 11. [Google Scholar] [CrossRef]
Wu, Z.X.; You, Y.Z.; Liu, C.C.; Chou, L.D. Machine learning based 5g network slicing management and classification. In Proceedings of the 2024 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Osaka, Japan, 19–22 February 2024; IEEE: Piscataway, NJ, USA; pp. 371–375. [Google Scholar] [CrossRef]
Khan, S.; Khan, S.; Ali, Y.; Khalid, M.; Ullah, Z.; Mumtaz, S. Highly accurate and reliable wireless network slicing in 5th generation networks: A hybrid deep learning approach. J. Netw. Syst. Manag. 2022, 30, 29. [Google Scholar] [CrossRef]
Mohammedali, N.A.; Kanakis, T.; Al-Sherbaz, A.; Agyeman, M.O. Traffic classification using deep learning approach for end-to-end slice management in 5g/b5g. In Proceedings of the 2022 13th International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 19–21 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 357–362. [Google Scholar] [CrossRef]
Malik, R.Q.; Alsharfa, R.M.; Mohammed, B.K.; Al-Fatlawi, A.H.; Abd Al-Ameer, M.S.; Najm, H. A Novel Taneja Distance-based Classifier with PSO-Optimized Feature Selection for Efficient 5G Network Slicing. Int. J. Intell. Eng. Syst. 2025, 18. [Google Scholar] [CrossRef]
Mahmoud, H.; Zhang, Y.; Guan, M.; Lu, C.; Ismail, T.; Idrissi, M.; Mi, D.; Daraz, U. A Hybrid Deep Learning Approach for Enhanced Network Slicing in 6G. In Proceedings of the 2024 IEEE Middle East Conference on Communications and Networking (MECOM), Abu Dhabi, United Arab Emirates, 17–20 November 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 434–439. [Google Scholar]
Jain, M.; Verma, R.; Kumar, S.; Kumar, G.; Basheer, S. Enhancing Network Slicing Efficiency in 6G Networks with a Hybrid Deep Learning Approach Leveraging Crisscross Harris Hawks Optimization. IEEE Commun. Stand. Mag. 2025, 9, 70–77. [Google Scholar] [CrossRef]
Samidi, F.S.; Radzi, N.A.M.; Aripin, N.M.; Jalil, Y.E.; Azmi, K.H.M. 5G Slicing with Machine Learning: Dataset Design, Model Adjustment and Performance Metrics. In Proceedings of the 2025 21st IEEE International Colloquium on Signal Processing & Its Applications (CSPA), Pulau Pinang, Malaysia, 7–8 February 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 161–166. [Google Scholar]
Wang, S.; Qureshi, M.A.; Miralles-Pechuán, L.; Huynh-The, T.; Gadekallu, T.R.; Liyanage, M. Explainable AI for 6G use cases: Technical aspects and research challenges. IEEE Open J. Commun. Soc. 2024, 5, 2490–2540. [Google Scholar] [CrossRef]
Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
Samek, W.; Wiegand, T.; Müller, K.R. Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv 2017, arXiv:1708.08296. [Google Scholar] [CrossRef]
Gunning, D.; Aha, D. DARPA’s explainable artificial intelligence (XAI) program. AI Mag. 2019, 40, 44–58. [Google Scholar] [CrossRef]
Guidotti, R.; Monreale, A.; Ruggieri, S.; Turini, F.; Giannotti, F.; Pedreschi, D. A survey of methods for explaining black box models. ACM Comput. Surv. 2018, 51, 1–42. [Google Scholar] [CrossRef]
Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 4765–4774. [Google Scholar]
Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar] [CrossRef]
Aramide, O. Explainable AI (XAI) for Network Operations and Troubleshooting. Int. J. Res. Publ. Semin. 2025, 16, 533–554. [Google Scholar] [CrossRef]
Salehi, S.; Iturria-Rivera, P.E.; Elsayed, M.; Bavand, M.; Gaigalas, R.; Ozcan, Y.; Erol-Kantarci, M. Prioritized Value-Decomposition Network for Explainable AI-Enabled Network Slicing. arXiv 2025, arXiv:2501.15734. [Google Scholar] [CrossRef]
Sun, H.; Liu, Y.; Al-Tahmeesschi, A.; Nag, A.; Soleimanpour-Moghadam, M.; Canberk, B.; Arslan, H.; Ahmadi, H. Advancing 6G: Survey for explainable AI on communications and network slicing. IEEE Open J. Commun. Soc. 2025, 6, 1372–1412. [Google Scholar] [CrossRef]
5G-IA (5G Industry Association). European Vision for the 6G Network Ecosystem: 5G-IA 6G White Paper; Technical report; European Commission/Hexa-X: Brussels, Belgium, 2022. [Google Scholar] [CrossRef]
Ericsson. Ericsson Mobility Report—Latest Edition. 2025. Available online: https://www.ericsson.com/en/reports-and-papers/mobility-report (accessed on 23 July 2025).
Rappaport, T.S.; Annamalai, A.; Buehrer, R.M.; Tranter, W.H. Wireless communications: Past events and a future perspective. IEEE Commun. Mag. 2002, 40, 148–161. [Google Scholar] [CrossRef]
Goldsmith, A. Wireless Communications; Cambridge University Press: Cambridge, UK, 2005. [Google Scholar] [CrossRef]
3GPP. Study on Channel Model for Frequencies from 0.5 to 100 GHz (Release 17). Technical Report TR 38.901 V17.00.00, 3rd Generation Partnership Project, Sophia Antipolis, France. 2022. Available online: https://www.etsi.org/deliver/etsi_tr/138900_138999/138901/17.00.00_60/tr_138901v170000p.pdf (accessed on 20 August 2025).
ITU-R. Guidelines for Evaluation of Radio Interface Technologies for IMT-2020; Recommendation ITU-R M.2412-0; International Telecommunication Union: Geneva, Switzerland, 2015; Available online: https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-M.2412-2017-PDF-E.pdf (accessed on 20 August 2025).
Clavier, L.; Peters, G.W.; Septier, F.; Nevat, I. Impulsive Noise Modeling and Robust Receiver Design. EURASIP J. Wirel. Commun. Netw. 2021, 2021, 13. [Google Scholar] [CrossRef]
Razavi, B. RF front-end design challenges for software defined radio. IEEE Commun. Mag. 2012, 50, 64–71. [Google Scholar] [CrossRef]
Donnellan, D.; Lawrence, A.; Bizo, D.; Judge, P.; O’Brien, J.; Davis, J.; Smolaks, M.; Williams-George, J.; Weinschenk, R. Uptime Institute Global Data Center Survey 2024. 2024. Available online: https://intelligence.uptimeinstitute.com/resource/uptime-institute-global-data-center-survey-2024 (accessed on 23 July 2025).
Teh, H.Y.; Kempa-Liehr, A.W.; Wang, K.I.K. Sensor data quality: A systematic review. J. Big Data 2020, 7, 11. [Google Scholar] [CrossRef]
Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef]
Hosmer, D.W., Jr.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2013. [Google Scholar] [CrossRef]
Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012. [Google Scholar]
Breiman, L.; Friedman, J.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Chapman and Hall/CRC: Boca Raton, FL, USA, 2017. [Google Scholar] [CrossRef]
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
Schölkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, USA, 2002. [Google Scholar]
Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar] [CrossRef]
LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2323. [Google Scholar] [CrossRef]
Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; Bach, F., Blei, D., Eds.; Proceedings of Machine Learning Research. Volume 37, pp. 448–456. [Google Scholar]
Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A Search Space Odyssey. IEEE Trans. Neural Networks Learn. Syst. 2017, 28, 2222–2232. [Google Scholar] [CrossRef] [PubMed]
Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; Adaptive Computation and Machine Learning series; MIT Press: Cambridge, MA, USA, 2016; p. 800. [Google Scholar]
Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
Cisco Systems. Cisco Annual Internet Report (2018–2023); White Paper; Cisco Systems: San Jose, CA, USA, 2020; Available online: https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html (accessed on 1 August 2025).
NGMN Alliance. 6G Requirements and Design Considerations; NGMN Alliance Publication: Düsseldorf, Germany, 2023; Available online: https://www.ngmn.org/publications/6g-requirements-and-design-considerations.html (accessed on 20 August 2025).
Rish, I. An empirical study of the naive Bayes classifier. In Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Seattle, WA, USA, 4–10 August 2001; pp. 41–46. [Google Scholar]
Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; Wadsworth: Belmont, CA, USA, 1984. [Google Scholar]

Figure 1. Overview of the dual-path methodological framework. The workflow contrasts the idealized PRDA pathway against the LVRT pathway, which simulates real-world operational stress, to enable a systematic analysis of model performance and interpretability.

Figure 2. End-to-end network slicing reference model used throughout this paper. The Service Layer (right, top) exposes slice-oriented services (e.g., feMBB, mURLLC, ERLLC, MBRLLC, and umMTC) to verticals and virtual operators. The Network Function Layer orchestrates network operation and VNFs, while the Infrastructure Layer spans RAN, transport, and core with edge/core clouds. Arrows indicate control/data interactions governed by the Network Slice Controller across layers and domains.

Figure 3. Overview of the proposed methodology.

Figure 4. Slice type and use case type relationship demonstrating perfect correlation, justifying use case elimination as redundant feature.

Figure 5. Original (percentages may not sum to 100% due to rounding) dataset network slice distribution.

Figure 6. Realistic dataset network slice distribution (percentages may not sum to 100% due to rounding).

Figure 7. PRDA vs. LVRT accuracy comparison across algorithms.

Figure 8. Average training time comparison (log scale) between traditional ML and DL models.

Figure 9. Global explainability comparison for the k-NN model.

Figure 10. Comparison of SHAP-based feature importance between PRDA (left) and LVRT (right) Decision Tree models.

Table 1. ML/DL-based studies on 5G/6G network slicing classification.

Study (Year)	Methodology	Dataset	Key Findings	Limitations
Khan et al. (2022) []	Hybrid CNN + LSTM	Public 5G slice dataset (“DeepSlice & Secure5G”)	∼95.2% accuracy; proactive reconfiguration to prevent slice failures	No real network testing; focuses only on slice selection; lacks 6G-specific security/dynamic adaptability
Mohammedali et al. (2022) []	SVM, Random Forest, others	5G traffic data (simulated or testbed)	≈100% accuracy; feasible real-time classification	Dataset not fully described; generalizability questionable; small-scale controlled scenario
Dangi & Lalwani (2024) []	HHO-optimized CNN + LSTM	Public IP flow dataset + private dataset	Achieved top accuracy; improved slice selection performance	Added computational complexity; real-time feasibility unclear; private dataset not accessible
Dangi & Lalwani (2024) []	CNN + BiLSTM	Unicauca IP Flow v2	∼97.2% accuracy; reliable slice recommendations	Offline evaluation; real-world 6G traffic generalization not validated
Wu et al. (2024) []	Random Forest, GBDT, XGBoost, KNN	Universidad Del Cauca network traffic dataset (141 applications, 50 features, and ∼2.7M records)	RF and GBDT achieved up to 95.73% accuracy in slice classification; GPU reduced training time nearly $10 \times$ compared to CPU	Evaluation limited to 5G slicing; no direct 6G validation; relies on a single dataset, and real-time deployment feasibility not assessed
Botez et al. (2025) []	FNN, RF, XGBoost, others	Synthetic 6G slicing dataset (2025)	>99% accuracy; successful slice selection and handover prediction	Synthetic dataset may not capture full real-world complexity; no live network validation
Malik et al. (2025) []	Taneja Distance-Based Classifier (TDC) + PSO Feature Selection	Public 5G Network Slicing Dataset (30,000 instances, 10 attributes)	Achieved 98.9% accuracy; execution time 1.7 ms; robust across noisy, imbalanced, and high-dimensional datasets	Requires proper feature normalization due to sensitivity of Taneja distance; potential computational overhead in large-scale real-time deployments; limited validation on real 6G scenarios
Mahmoud et al. (2024) []	CNN + RF, 5-NN, XGBoost, Naive Bayes, DT, LR, MLP	Real-world 5G/6G traffic (DeepSlice dataset)	Achieved up to 94.44% accuracy; RF, 5-NN, XGBoost, and Naive Bayes reached 100% precision and recall with CNN-based feature selection	Logistic Regression showed low accuracy (72%); Bagging Trees had high false positives; limited evaluation under massive-scale 6G data
Jain et al. (2025) []	Hybrid CNN + LSTM + Crisscross Harris Hawks Optimization (CHHO)	Unicauca IP Flow v2 dataset	Achieved 95.48% accuracy; precision 94.11%, recall 87.45%, and F1-score 93.87%; CHHO improved convergence stability and learning robustness compared to PSO and GWO	Evaluation limited to a single dataset; validated in simulation only; lacks large-scale real 6G traffic experiments
Samidi et al. (2023) []	SVM, RF, GBT, DT, NB, DL (Auto Model)	5GNS dataset (simulated, ∼30k samples)	Gradient Boosted Tree achieved top accuracy (78.7%); evaluated multiple ML/DL models via Auto Model tool	Reported accuracy lower than state-of-the-art; dependent on Auto Model optimization; validated only on simulated dataset, no real 6G experiments

Table 2. Feature association analysis and selection strategy.

Feature	Type	Association Metric	Strength	Decision
Latency Budget (ns)	Numeric	$η^{2}$ = 1.000	Perfect	Removed
Data Rate Budget (Gbps)	Numeric	$η^{2}$ = 1.000	Perfect	Removed
Jitter Budget (ns)	Numeric	$η^{2}$ = 0.436	Strong	Removed
Use Case Type	Categorical	Cramér’s V = 1.000	Perfect	Removed
Required Mobility	Categorical	Cramér’s V = 1.000	Perfect	Removed
Required Connectivity	Categorical	Cramér’s V = 1.000	Perfect	Removed
Slice Handover	Categorical	Cramér’s V = 0.999	Near-perfect	Removed
Retained Features:
Slice Available Transfer Rate	Numeric	$η^{2}$ = 0.000	Independent	Retained
Slice Latency (ns)	Numeric	$η^{2}$ = 0.000	Independent	Retained
Slice Packet Loss	Numeric	$η^{2}$ = 0.001	Independent	Retained
Slice Jitter (ns)	Numeric	$η^{2}$ = 0.000	Independent	Retained
Packet Loss Budget	Categorical	Cramér’s V = 0.791	Strong	Retained

Table 3. Traditional ML model parameters.

Algorithm	Key Parameters	Optimization Strategy
Random Forest	$B = 200$ trees; $m a x_d e p t h = 15$	Parallel processing: all cores
Gradient Boosting	$M = 150$ trees; $m a x_d e p t h = 8$	Learning rate $η = 0.1$
XGBoost	$T = 200$ trees; $m a x_d e p t h = 8$	Parallel processing: all cores
SVM (RBF)	$C = 1.0$ ; $γ = a u t o$	Probability estimates enabled
Logistic Regression	$m a x_i t e r = 1000$	L2 regularization $λ = 1.0$
Naive Bayes	Gaussian assumption	MLE parameter estimation
Decision Tree	$m a x_d e p t h = 15$	Gini impurity criterion
k-NN	$k = 7$ ; distance weighting	Euclidean distance metric

Table 4. DL architectures and training configurations.

Model	Architecture	Training Setup
CNN	Conv1D: 64,128,256; Dense: 512,256	Epochs 50; Batch 32; Adam; LR = 0.001
LSTM	LSTM: 128,64; Dense: 256,128	Epochs 50; Batch 32; RMSprop; LR = 0.001
FNN	Dense: 512,256,128,64	Epochs 50; Batch 32; Adam; LR = 0.001

Table 5. SLA-based resilience threshold derivation.

Category	Threshold	SLA Violation Risk	Empirical Basis	Deployment Suitability
Excellent	$Δ A \geq - 5 %$	<1%	Measurement uncertainty bounds	Mission-critical slices
Good	$- 10 % \leq Δ A < - 5 %$	1–5%	Acceptable service degradation	Commercial deployments
Moderate	$- 20 % \leq Δ A < - 10 %$	5–15%	Requires active monitoring	Non-critical applications
Poor	$Δ A < - 20 %$	>15%	Unacceptable for deployment	Research/development only

Table 6. PRDA results (10,000 samples).

Algorithm	Accuracy	Training Time (s)	Efficiency Class
Naive Bayes	0.890	0.02	Excellent
XGBoost	0.874	0.48	Very Good
Gradient Boosting	0.871	15.53	Very Good
Random Forest	0.858	7.09	Very Good
Decision Tree	0.803	0.24	Good
Logistic Regression	0.652	0.31	Acceptable
k-NN	0.650	0.06	Acceptable
SVM (RBF)	0.609	35.49	Poor
CNN	1.000	52.36	Excellent
FNN	1.000	55.64	Excellent
LSTM	0.785	46.02	Acceptable

Table 7. LVRT results (21,033 samples).

Algorithm	Accuracy	Training Time (s)	Resilience Class
Random Forest	0.726	15.89	Moderate
XGBoost	0.726	1.52	Moderate
Gradient Boosting	0.723	789.5	Moderate
Logistic Regression	0.719	0.57	Excellent
SVM (RBF)	0.695	521.3	Excellent
Decision Tree	0.667	1.93	Moderate
k-NN	0.643	0.17	Excellent
Naive Bayes	0.580	0.04	Poor
CNN	0.812	55.88	Good
FNN	0.811	53.68	Good
LSTM	0.748	57.82	Moderate

Table 8. Performance transition (PRDA → LVRT) across algorithms.

Algorithm	Accuracy Change, $Δ A$ (%)	Training Time (PRDA, s)	Training Time (LVRT, s)
SVM (RBF)	−14.1	35.49	521.3
Logistic Regression	−10.3	0.31	0.57
k-NN	+1.1	0.06	0.17
LSTM	+4.7	46.02	57.82
Random Forest	+15.4	7.09	15.89
XGBoost	+16.9	0.48	1.52
Decision Tree	+16.9	0.24	1.93
Gradient Boosting	+17.0	15.53	789.5
CNN	+18.8	52.36	55.88
FNN	+18.9	55.64	53.68
Naive Bayes	+34.8	0.02	0.04

Table 9. Algorithm complexity.

Algorithm	Training Complexity	Prediction Complexity
Naive Bayes	$O (n d c)$	$O (c d)$
k-NN	$O (n d)$	$O (n d)$
Logistic Regression	$O (n d t)$	$O (d)$
Decision Tree	$O (n d log n)$	$O (h)$
XGBoost	$O (M \cdot n d log n)$	$O (M \cdot h)$
Random Forest	$O (B \cdot n d log n)$	$O (B \cdot h)$
Gradient Boosting	$O (M \cdot n d log n)$	$O (M \cdot h)$
SVM (RBF)	$O (n^{2} d)$	$O (s v d)$
FNN	$O (L \cdot H \cdot (H + d) \cdot n \cdot e)$	$O (L \cdot H \cdot (H + d))$
CNN	$O (K \cdot F \cdot H \cdot W \cdot k^{2} \cdot n \cdot e)$	$O (K \cdot F \cdot H \cdot W \cdot k^{2})$
LSTM	$O (H \cdot (d + H) \cdot T \cdot n \cdot e)$	$O (H \cdot (d + H) \cdot T)$

Notation: n = samples, d = features, c = classes, t = iterations, h = tree depth, B = trees, M = boosting steps, L = layers, H = hidden units, e = epochs, k = kernel size, T = sequence length, and

s v

= support vectors.

Table 10. Comprehensive algorithm performance across urban 6G SNR scenarios.

Algorithm	Optimal (%)	Dense (%)	Challenging (%)	Resilience Category
CNN	82.4	81.8	80.1	Good
FNN	82.1	81.3	79.9	Good
Random Forest	74.2	73.1	71.8	Moderate
XGBoost	74.0	72.9	71.6	Moderate
Gradient Boosting	73.8	72.7	71.4	Moderate
LSTM	76.1	75.3	73.9	Moderate
Logistic Regression	73.2	72.8	72.0	Excellent
SVM (RBF)	70.8	70.2	69.1	Excellent
Decision Tree	68.1	67.4	66.2	Moderate
k-NN	65.5	64.9	63.7	Excellent
Naive Bayes	59.2	58.1	56.8	Poor

Table 11. Algorithm interpretability consistency analysis.

Algorithm	Consistency Score	Category	Primary Feature Focus
CNN	1.000	High	Packet Loss Budget
LSTM	1.000	High	Packet Loss Budget
Naive Bayes	0.998	High	Packet Loss Budget
SVM (RBF)	0.997	High	Packet Loss Budget
Logistic Regression	0.988	High	Slice Jitter
FNN	0.981	High	Packet Loss Budget
k-NN (Weighted)	0.735	Moderate	Packet Loss Budget
Gradient Boosting	0.448	Moderate	Slice Jitter/Latency
Decision Tree	0.398	Moderate	Slice Jitter
Random Forest	0.351	Moderate	Slice Packet Loss/Jitter
XGBoost	0.106	Low	Packet Loss Budget/Latency

Table 12. Local explanation for high SNR misclassification case (k-NN, Sample #0).

Case Details	True: 2 → Predicted: 3 (57.14% Confidence)
Key SHAP Contributors
Slice Latency (ns)	+0.3100 (primary driver toward class 3)
Slice Jitter (ns)	+0.0100 (secondary driver)
LIME Decision Thresholds
Latency trigger	$1.89 \times 10^{6}$ < Slice Latency ≤ $4.55 \times 10^{6}$ ns
Jitter trigger	Slice Jitter > $2.34 \times 10^{6}$ ns
Feature Values vs. Correct Sample (#98)
Slice Latency	$4.19 \times 10^{6}$ ns vs. $- 24.9 \times 10^{6}$ ns (+29.1 M diff)
Slice Jitter	$3.07 \times 10^{6}$ ns vs. $0.98 \times 10^{6}$ ns (+2.09 M diff)

Table 13. Local explanation for low SNR misclassification case (k-NN, Sample #0).

Case Details	True: 2 → Predicted: 0 (57.14% Confidence)
Key SHAP Contributors
Slice Latency (ns)	−0.9400 (strong negative driver from correct class)
Slice Jitter (ns)	+0.0200 (weak positive toward class 0)
LIME Decision Thresholds
Jitter trigger	Slice Jitter > 4.54e6 ns (+0.0401)
Latency trigger	Slice Latency > 23.0 M ns (−0.0293)
Feature Values vs. Correct Sample (#1)
Slice Latency	$38.0 \times 10^{6}$ ns vs. $- 7.45 \times 10^{6}$ ns (+45.4 M diff)
Slice Jitter	$4.68 \times 10^{6}$ ns vs. $3.89 \times 10^{6}$ ns ( $+ 0.79 \times 10^{6}$ diff)
Transfer Rate	11,644 Gbps vs. −9,030 Gbps (+20.7 K diff)

Table 14. XAI-based algorithm deployment guide.

Algorithm	Accuracy	Consistency	Deployment Scenario	Monitoring Level
CNN	0.812	1.000	Mission-critical systems	High
FNN	0.811	0.981	Mission-critical systems	High
LSTM	0.748	1.000	Temporal pattern analysis	High
Random Forest	0.726	0.351	Performance-focused	Moderate
XGBoost	0.726	0.106	High-performance applications	Intensive
Gradient Boosting	0.723	0.448	Balanced applications	Moderate
Logistic Regression	0.719	0.988	Regulatory compliance	Critical
SVM (RBF)	0.695	0.997	Regulatory compliance	Critical
Decision Tree	0.667	0.398	Transparent pathways	Moderate
k-NN (Weighted)	0.643	0.735	Dynamic environments	Moderate
Naive Bayes	0.580	0.998	Probabilistic reasoning	Critical

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Realistic Performance Assessment of Machine Learning Algorithms for 6G Network Slicing: A Dual-Methodology Approach with Explainable AI Integration

Abstract

1. Introduction

2. Related Work and Background

2.1. 6G Network Slicing Evolution

2.2. XAI in Telecommunications

3. Methodology and Dataset Characteristics

3.1. Dual-Methodology Framework and Dataset Foundation

3.2. Feature Optimization and Dimensionality Reduction

3.3. PRDA Implementation: Laboratory Baseline

3.4. LVRT Implementation: Realistic Operational Simulation

3.4.1. Market-Driven Distribution Transformation

3.4.2. Multi-Layer Noise Injection

3.4.3. Systematic Missing Data and Quality Control

4. Experimental Setup and Model Configuration

4.1. Traditional ML Configurations

4.1.1. Ensemble Learning Methods

4.1.2. Linear and Probabilistic Models

4.1.3. Tree-Based and Instance-Based Methods

4.2. DL Architectures

4.2.1. CNN

4.2.2. LSTM

4.2.3. FNN

4.3. Resilience Threshold Derivation and SLA Risk Analysis

5. Experimental Results and Analysis

5.1. Performance Analysis: PRDA vs. LVRT Comparison

5.2. Computational Efficiency and Deployment Constraints

5.3. Algorithm Resilience Validation

5.3.1. Statistical Validation of Resilience Classifications

5.3.2. Mechanism Validation Through Empirical Analysis

5.4. XAI Analysis and Algorithm Interpretability

5.4.1. SHAP-Based Feature Importance Framework

5.4.2. Cross-Scenario Consistency Analysis

5.4.3. Algorithm Selection Guidelines for 6G Deployment

5.4.4. Deployment Guidelines and Operational Monitoring Framework

6. Conclusions

7. Future Work

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics