Electronics
  • Article
  • Open Access

27 September 2025

Realistic Performance Assessment of Machine Learning Algorithms for 6G Network Slicing: A Dual-Methodology Approach with Explainable AI Integration

1 R&D Department, Türk Telekom, 06103 Ankara, Türkiye
2 Department of Computer Engineering, Gazi University, 06560 Ankara, Türkiye
* Author to whom correspondence should be addressed.
This article belongs to the Topic Advanced Array Signal Processing for B5G/6G: Models, Algorithms, and Applications

Abstract

As 6G networks become increasingly complex and heterogeneous, effective classification of network slicing is essential for optimizing resources and managing quality of service. While recent advances demonstrate high accuracy under controlled laboratory conditions, a critical gap exists between algorithm performance evaluation under idealized conditions and their actual effectiveness in realistic deployment scenarios. This study presents a comprehensive comparative analysis of two distinct preprocessing methodologies for 6G network slicing classification: Pure Raw Data Analysis (PRDA) and Literature-Validated Realistic Transformations (LVRTs). We evaluate the impact of these strategies on algorithm performance, resilience characteristics, and practical deployment feasibility to bridge the laboratory–reality gap in 6G network optimization. Our experimental methodology involved testing eleven machine learning algorithms—including traditional ML, ensemble methods, and deep learning approaches—on a dataset comprising 10,000 network slicing samples (expanded to 21,033 through realistic transformations) across five network slice types. The LVRT methodology incorporates realistic operational impairments including market-driven class imbalance (9:1 ratio), multi-layer interference patterns, and systematic missing data reflecting authentic 6G deployment challenges. The experimental results revealed significant differences in algorithm behavior between the two preprocessing approaches. Under PRDA conditions, deep learning models achieved perfect accuracy (100% for CNN and FNN), while traditional algorithms ranged from 60.9% to 89.0%. However, LVRT results exposed dramatic performance variations, with accuracies spanning from 58.0% to 81.2%. Most significantly, we discovered that algorithms achieving excellent laboratory performance experience substantial degradation under realistic conditions, with CNNs showing an 18.8% accuracy loss (dropping from 100% to 81.2%), FNNs experiencing an 18.9% loss (declining from 100% to 81.1%), and Naive Bayes models suffering a 34.8% loss (falling from 89% to 58%). Conversely, SVM (RBF) and Logistic Regression demonstrated counter-intuitive resilience, improving by 14.1 and 10.3 percentage points, respectively, under operational stress, demonstrating superior adaptability to realistic network conditions. This study establishes a resilience-based classification framework enabling informed algorithm selection for diverse 6G deployment scenarios. Additionally, we introduce a comprehensive explainable artificial intelligence (XAI) framework using SHAP analysis to provide interpretable insights into algorithm decision-making processes. The XAI analysis reveals that Packet Loss Budget emerges as the dominant feature across all algorithms, while Slice Jitter and Slice Latency constitute secondary importance features. Cross-scenario interpretability consistency analysis demonstrates that CNN, LSTM, and Naive Bayes achieve perfect or near-perfect consistency scores (0.998–1.000), while SVM and Logistic Regression maintain high consistency (0.988–0.997), making them suitable for regulatory compliance scenarios. In contrast, XGBoost shows low consistency (0.106) despite high accuracy, requiring intensive monitoring for deployment. 
This research contributes essential insights for bridging the critical gap between algorithm development and deployment success in next-generation wireless networks, providing evidence-based guidelines for algorithm selection based on accuracy, resilience, and interpretability requirements. Our findings establish quantitative resilience boundaries: algorithms achieving >99% laboratory accuracy exhibit 58–81% performance under realistic conditions, with CNN and FNN maintaining the highest absolute accuracy (81.2% and 81.1%, respectively) despite experiencing significant degradation from laboratory conditions.

1. Introduction

With the rapid evolution of wireless communication technologies, the expectations surrounding sixth-generation (6G) networks are unprecedented. Sixth-generation networks differ fundamentally from fifth-generation (5G) systems. While 5G primarily focuses on enhanced mobile broadband and low-latency communications, 6G represents a paradigm shift. It is envisioned as a comprehensive ecosystem that integrates multiple infrastructure types: terrestrial-, aerial-, maritime-, and space-based networks [,,]. The driving force behind this evolution comes from emerging applications with unprecedented demands. Applications including extended reality (XR), brain–computer interfaces, holographic communication, autonomous vehicles, and the Internet of Robotic Things (IoRT) require capabilities that exceed 5G’s limitations. These applications demand multiple simultaneous requirements: extreme reliability, ultra-low latency, terabit-level data rates, artificial intelligence (AI)-driven orchestration, sustainable energy efficiency, and quantum-secure communication [,]. As a result, the enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), and massive machine-type communication (mMTC) services offered by 5G may not fully address future requirements []. Therefore, 6G aims to meet these demands through a unified architecture. This architecture integrates three core technologies: communication, computing, and sensing. The entire system operates on a data-driven foundation supported by AI.
A critical technology enabling this transformation is network slicing. Network slicing was established during the 5G era but will play an even more critical role in 6G [,]. This technology works by creating virtualized end-to-end logical network slices on shared physical infrastructure. Each slice is specifically customized to meet the unique requirements of different application types. The key advantage is that multiple virtual networks can operate simultaneously on the same shared infrastructure, enabling efficient and dynamic resource allocation []. The result is maximized resource efficiency through isolation of critical data flows.
Sixth-generation network slicing encompasses five distinct service categories, each with unique characteristics and resource requirements. First, further enhanced mobile broadband (feMBB) delivers data speeds exceeding 1 Tbps for applications like holographic communication and 16K video streaming. Second, ultra-massive machine type communications (umMTC) supports hyper-dense IoT deployments with up to 10 million devices per square kilometer. Third, Mobile URLLC (mURLLC) enables mobility scenarios such as autonomous vehicle coordination and remote health monitoring. Fourth, extremely reliable low-latency communications (ERLLC) provides sub-microsecond latency and deterministic reliability for industrial automation. Finally, mobile broadband reliable low-latency communications (MBRLLC) serves applications requiring balanced multi-dimensional performance, including delivery drones and vehicle-to-everything (V2X) communication.
The methodological workflow of this study is illustrated in Figure 1. The framework begins with a standardized preprocessing phase on the initial laboratory dataset, involving feature selection and label encoding. Subsequently, the methodology bifurcates into two distinct analytical pathways designed to systematically evaluate algorithmic performance under contrasting conditions.
Figure 1. Overview of the dual-path methodological framework. The workflow contrasts the idealized PRDA pathway against the LVRT pathway, which simulates real-world operational stress, to enable a systematic analysis of model performance and interpretability.
The complexity of 6G networks makes traditional static approaches insufficient. The 6G network incorporates revolutionary components including communication–computing–sensing convergence, AI-based automation, and integration with satellite and airborne networks. Consequently, static slice definitions become inadequate. Network slicing must now address real-time challenges: classification, optimization, and prediction problems that evolve dynamically.
This complexity drives the need for intelligent solutions. ML-based approaches are emerging as essential tools for effectively managing this complex service ecosystem. AI-supported classifiers are recognized as a critical strategy for optimizing performance, managing heterogeneity, and overcoming dynamic network conditions in intelligent network slicing management and traffic routing to the appropriate slice [].
Current research demonstrates promising results in controlled environments. Studies in the literature report that models created using classical ML algorithms on synthetic datasets achieve accuracy rates of over 99%. Similarly, deep learning (DL)-based hybrid models are reported to achieve success rates of over 97% []. Furthermore, experiments conducted on 5G test networks have shown that systems automatically classify real traffic flows and assign them to appropriate slices, reducing packet loss and jitter and providing higher reliability, especially under heterogeneous traffic conditions []. However, a significant gap exists between the high accuracy achieved in laboratory conditions and the performance achieved in real-world scenarios. Most studies are limited to controlled simulation environments, and the same level of success cannot be guaranteed in real networks. Performance degradation is frequently observed under real traffic patterns, interference, and variable load conditions, posing a significant challenge for field applications. Consequently, developing models supported by transfer learning, domain adaptation, and realistic datasets emerges as a critical research direction for bridging the gap between laboratory and field conditions [].
This challenge leads to a fundamental research question. Given the increasing complexity of 6G networks and heterogeneous service requirements, network slicing and its specific subproblem slice classification become indispensable. While AI-based solutions show promising results in controlled environments, the critical question remains: how can these solutions work reliably and scalably in real-world deployments?
One of the fundamental issues is the methodological gap in data preprocessing, which significantly affects the robustness and generalizability of ML models for 6G network slicing. To address this challenge, we identify two distinct methodological approaches. The first approach, PRDA, maintains network data in its original laboratory-controlled form. It applies only basic normalization and feature selection. This approach preserves ideal statistical properties typically observed in controlled experimental environments, such as balanced class distributions, minimal measurement noise, and absence of missing values. The second approach, LVRT, applies comprehensive transformations to laboratory data. These transformations simulate authentic real-world network conditions based on empirical evidence. The approach incorporates empirically validated measurement uncertainty models, network congestion scenarios, equipment failures, and service adoption patterns. All transformations are grounded in extensive literature review and industry reports. While LVRT introduces complexity and potential performance degradation, it provides a more realistic evaluation of algorithm behavior under operational conditions.
The implications of this methodological choice are profound. This decision directly affects algorithm selection, deployment strategy, and performance expectations in practical 6G network implementations. PRDA may lead to overoptimistic performance predictions and inappropriate algorithm selection. In contrast, LVRT provides more conservative but realistic performance bounds that better reflect operational constraints and challenges.
This work presents the first systematic dual-methodology evaluation framework that bridges the critical gap between laboratory algorithm assessment and realistic 6G deployment performance. Unlike existing studies that rely solely on controlled synthetic datasets, our approach introduces LVRT incorporating authentic operational impairments including market-driven class imbalance, multi-layer interference patterns, and systematic missing data. This methodology reveals previously unknown algorithmic behavior patterns, demonstrating that certain algorithms exhibit substantial performance improvements under realistic conditions—a phenomenon invisible under traditional evaluation approaches.
Our research addresses this critical gap through concrete contributions. This paper makes the following key contributions:
  • Dual-Methodology Evaluation Framework: Introduction of a novel comparative framework combining PRDA and LVRT methodologies to systematically evaluate algorithm performance across laboratory and realistic deployment conditions;
  • Laboratory–Reality Performance Gap Analysis: First comprehensive demonstration that algorithms achieving >99% accuracy under controlled conditions exhibit 58–81% performance under realistic 6G deployment scenarios, revealing critical limitations in traditional evaluation approaches;
  • Counter-Intuitive Algorithm Behavior Discovery: Identification of algorithms (SVM RBF; Logistic Regression) that demonstrate performance improvements (14.1% and 10.3%, respectively) under realistic conditions, challenging fundamental assumptions about data quality relationships;
  • Algorithm Resilience Classification System: Development of a resilience-based categorization framework (Excellent, Good, Moderate, and Poor) based on performance degradation analysis, enabling informed algorithm selection for diverse 6G infrastructure scenarios;
  • Realistic 6G Network Simulation: Implementation of comprehensive operational impairments including market-driven class imbalance (9:1 ratio), multi-layer interference modeling, and systematic missing data patterns reflecting authentic 6G deployment challenges;
  • Practical Deployment Guidance: Evidence-based recommendations demonstrating that simple, stable algorithms often outperform sophisticated methods in operational environments, fundamentally altering 6G deployment strategy considerations;
  • Cross-Paradigm Comparative Analysis: Comprehensive evaluation of classical ML methods (e.g., SVM, Logistic Regression, and Random Forest) and DL models (e.g., CNNs and LSTMs) for 6G network slicing classification, highlighting their respective strengths, limitations, and deployment trade-offs under realistic conditions;
  • Explainable AI Integration: Application of XAI techniques, including SHAP, to interpret model decisions and feature importance, thereby enhancing transparency, enabling trust in algorithmic outputs, and providing actionable insights for network operators in critical 6G deployment scenarios.
The remainder of this paper is organized as follows. Section 2 provides the background and related work, including a detailed overview of the evolution of 6G network slicing and the role of XAI in telecommunications systems. Section 3 describes the methodology and dataset characteristics, elaborating on the research motivation, methodological philosophy, feature association analysis, dimensionality reduction, and the proposed dual-methodology evaluation framework, followed by the description of PRDA and LVRT implementations. Section 4 presents the experimental setup and model configuration, covering both traditional ML models and DL architectures. Section 5 reports and analyzes the experimental findings, including baseline results under PRDA, realistic performance under LVRT, training time and computational cost analysis, and interpretability insights through XAI. Section 6 concludes this paper with key findings, while Section 7 outlines promising future research directions.

3. Methodology and Dataset Characteristics

The evolution toward 6G networks demands sophisticated ML algorithms for real-time network slice classification in operational environments. While Botez et al. [] demonstrated over 99% accuracy under controlled laboratory conditions, a critical gap persists between idealized algorithm evaluation and realistic deployment performance. This study addresses this limitation through a dual-methodology framework comparing PRDA representing laboratory conditions with LVRT incorporating authentic network impairments.

3.1. Dual-Methodology Framework and Dataset Foundation

Our approach builds upon the 6G network slicing dataset (10,000 samples; 13 features) introduced by Botez et al. [], extending it through systematic refinement and realistic transformation processes. Figure 3 illustrates the complete methodology workflow encompassing feature optimization, dual-path evaluation, and comprehensive algorithm assessment.
Figure 3. Overview of the proposed methodology.
The evaluation encompasses five network slice types representing the complete 6G service spectrum: feMBB for ultra-high bandwidth applications (holographic communications, 16K streaming), umMTC enabling hyper-dense IoT deployments ($10^7$ devices/km²), mURLLC providing mission-critical connectivity (>99.999% reliability), ERLLC delivering sub-microsecond latency for precision applications, and MBRLLC supporting balanced multi-dimensional performance requirements.

3.2. Feature Optimization and Dimensionality Reduction

Prior to dual-methodology evaluation, we performed systematic feature refinement using statistical analysis to eliminate multicollinearity artifacts. Figure 4 demonstrates the deterministic relationship between slice types and use cases, confirming feature redundancy requiring elimination.
Figure 4. Slice type and use case type relationship demonstrating perfect correlation, justifying use case elimination as redundant feature.
Statistical analysis employing ANOVA ($\eta^2$) for numeric features and Cramér’s V for categorical variables revealed significant redundancy. Table 2 presents comprehensive results showing perfect correlations ($\eta^2 = 1.000$ or Cramér’s V = 1.000) for budget-related features and configuration parameters. Systematic elimination of six redundant features yielded an optimized space of five variables: four independent numeric features (Transfer Rate, Latency, Packet Loss, and Jitter) and one categorical feature (Packet Loss Budget) with strong association (Cramér’s V = 0.791).
Table 2. Feature association analysis and selection strategy.

3.3. PRDA Implementation: Laboratory Baseline

The PRDA methodology establishes theoretical performance benchmarks through minimal preprocessing interventions:
$$D_{\mathrm{PRDA}} = M_{\mathrm{minimal}}(D_{\mathrm{original}})$$
where $M_{\mathrm{minimal}}$ encompasses correlation-based feature selection and label encoding. Label encoding maps categorical attributes to integer values, fitted exclusively on training data to prevent leakage while preserving structural characteristics. Figure 5 shows that approximate class balance is maintained, with four slice types exhibiting near-uniform distribution (22.0–22.6%) and mURLLC showing lower representation (10.9%).
Figure 5. Original (percentages may not sum to 100% due to rounding) dataset network slice distribution.
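To make the PRDA pathway concrete, the following minimal sketch (Python with pandas and scikit-learn) illustrates how the split and label encoding can be arranged so that encoders never see test data; the file name and column labels are illustrative assumptions rather than the exact dataset schema.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical file and column names; the actual layout follows the source dataset.
df = pd.read_csv("6g_network_slicing.csv")

features = ["Slice Transfer Rate", "Slice Latency", "Slice Packet Loss",
            "Slice Jitter", "Packet Loss Budget"]
X, y = df[features].copy(), df["Slice Type"]

# Stratified 80/20 split first, so encoders are fitted on training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Encode the categorical feature and the target labels (fit on the training split).
budget_enc = LabelEncoder().fit(X_train["Packet Loss Budget"])
X_train["Packet Loss Budget"] = budget_enc.transform(X_train["Packet Loss Budget"])
X_test["Packet Loss Budget"] = budget_enc.transform(X_test["Packet Loss Budget"])

label_enc = LabelEncoder().fit(y_train)
y_train_enc, y_test_enc = label_enc.transform(y_train), label_enc.transform(y_test)
```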

3.4. LVRT Implementation: Realistic Operational Simulation

The LVRT methodology transforms laboratory data into realistic deployment benchmarks through systematic application of four sequential operators addressing operational challenges documented in the telecommunications literature:
$$D_{\mathrm{LVRT}} = \phi_{\mathrm{SMOTE}} \circ \phi_{\mathrm{missing}} \circ \phi_{\mathrm{noise}} \circ \phi_{\mathrm{distribution}}(D_{\mathrm{original}})$$
where the composition operator ∘ indicates sequential application from right to left.

3.4.1. Market-Driven Distribution Transformation

Real-world 6G networks exhibit severe class imbalance based on comprehensive industry projections from flagship research initiatives [,]. The realistic distribution applies market-driven allocation:
$$\mathbf{p}_{\mathrm{realistic}} = [\,0.55,\ 0.18,\ 0.12,\ 0.089,\ 0.061\,]^{T}$$
corresponding to [feMBB, umMTC, mURLLC, ERLLC, MBRLLC], reflecting feMBB dominance (55%) driven by bandwidth-intensive applications, umMTC expansion (18%) reflecting exponential IoT growth in smart city deployments, and specialized slice adoption patterns. This creates a 9:1 imbalance ratio representing critical deployment challenges that algorithms must handle effectively under operational stress.
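A simplified sketch of how such a market-driven re-distribution could be applied to a labeled dataframe is shown below; the helper name and the subsampling strategy are assumptions, and Equation (3) supplies the target shares.

```python
import pandas as pd

# Target market shares from Equation (3).
TARGET_SHARES = {"feMBB": 0.55, "umMTC": 0.18, "mURLLC": 0.12,
                 "ERLLC": 0.089, "MBRLLC": 0.061}

def apply_market_distribution(df: pd.DataFrame, label_col: str,
                              total: int, seed: int = 42) -> pd.DataFrame:
    """Subsample each slice class so the dataset follows the target market shares."""
    parts = []
    for slice_type, share in TARGET_SHARES.items():
        pool = df[df[label_col] == slice_type]
        n = min(len(pool), int(round(share * total)))   # cap at available samples
        parts.append(pool.sample(n=n, random_state=seed))
    # Shuffle the concatenated result so class blocks are not ordered.
    return pd.concat(parts).sample(frac=1.0, random_state=seed).reset_index(drop=True)
```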

3.4.2. Multi-Layer Noise Injection

Urban 6G deployments experience complex interference patterns requiring multi-layer modeling based on the established telecommunications literature [,] and 3GPP specifications [,]:
$$x_{\mathrm{noisy}}[n] = \left(x[n] + w_{\mathrm{AWGN}}[n] + w_{\mathrm{imp}}[n]\right) \cdot g[n]$$
Layer 1 introduces additive white Gaussian noise (AWGN) representing measurement uncertainty with scenario-stratified SNR distributions following 3GPP urban deployment statistics: outdoor line-of-sight (18 ± 3.8 dB), outdoor non-line-of-sight (11 ± 4.1 dB), indoor hotspots (14 ± 4.5 dB), and vehicle penetration (−1 ± 3.2 dB), with scenario probabilities of 25%, 45%, 20%, and 10%, respectively. These empirically validated ranges reflect documented field measurements from operational 6G testbeds [].
Layer 2 adds impulsive interference modeling irregular electromagnetic interference from industrial equipment, with 5% occurrence probability reflecting documented interference patterns in dense urban environments []:
$$w_{\mathrm{imp}}[n] = I[n] \cdot A[n] \cdot S[n]$$
$$I[n] \sim \mathrm{Bernoulli}(0.05)$$
Layer 3 incorporates multiplicative gain variations representing system calibration drift and environmental adaptation, with 2% standard deviation reflecting typical RF front-end stability characteristics documented in wireless system analysis []:
$$g[n] \sim \mathcal{N}(1.0,\ 0.02^{2})$$
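The following NumPy sketch shows one way Equations (4)–(7) could be combined into a single noise-injection step; the per-sample signal-power estimate and the impulsive amplitude scale (three times the AWGN standard deviation, folding $A[n] \cdot S[n]$ into one Gaussian draw) are simplifying assumptions.

```python
import numpy as np

SCENARIOS = [  # (mean SNR in dB, std in dB, scenario probability)
    (18.0, 3.8, 0.25),   # outdoor line-of-sight
    (11.0, 4.1, 0.45),   # outdoor non-line-of-sight
    (14.0, 4.5, 0.20),   # indoor hotspot
    (-1.0, 3.2, 0.10),   # vehicle penetration
]

def inject_noise(x: np.ndarray, rng: np.random.Generator | None = None) -> np.ndarray:
    rng = rng if rng is not None else np.random.default_rng(42)
    n = x.shape[0]
    # Layer 1: AWGN with a per-sample SNR drawn from the urban scenario mixture.
    idx = rng.choice(len(SCENARIOS), size=n, p=[s[2] for s in SCENARIOS])
    snr_db = np.array([rng.normal(SCENARIOS[i][0], SCENARIOS[i][1]) for i in idx])
    noise_power = (x ** 2) / (10.0 ** (snr_db / 10.0))   # simplified signal-power estimate
    sigma = np.sqrt(np.maximum(noise_power, 1e-12))
    w_awgn = rng.normal(0.0, sigma)
    # Layer 2: impulsive interference with 5% occurrence probability (amplitude assumed).
    impulse = rng.random(n) < 0.05
    w_imp = impulse * rng.normal(0.0, 3.0 * sigma)
    # Layer 3: multiplicative gain drift, N(1.0, 0.02^2).
    g = rng.normal(1.0, 0.02, size=n)
    return (x + w_awgn + w_imp) * g
```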

3.4.3. Systematic Missing Data and Quality Control

Operational 6G networks exhibit systematic data quality degradation following documented failure patterns in telecommunications infrastructure [,]. We model four distinct failure modes representing sensor failures, network outages, measurement errors, and maintenance windows, generating the following expected missing data rate:
$$\mathbb{E}[\text{missing rate}] = \sum_{i} P_i \cdot \mathbb{E}[D_i] \approx 0.07 \ (7\%)$$
Quality control implements a two-stage process to maintain dataset utility while preserving realistic characteristics. The complete data transformation sequence proceeds as follows: original laboratory dataset (10,000 samples) → quality control removes samples with >20% missing values (7997 samples retained) → SMOTE augmentation balances the class distribution (final: 21,033 samples). The first stage applies a threshold-based filter to prevent training on inadequate information, as formalized in Equation (9):
$$\text{Remove sample } i \ \text{ if } \ \frac{1}{d}\sum_{j=1}^{d} \mathbb{1}\left[x_{i,j} = \mathrm{NaN}\right] > 0.20$$
The second stage addresses severe class imbalance through SMOTE [], generating synthetic samples using linear interpolation between existing minority class instances and their nearest neighbors (Equation (10)):
$$x_{\mathrm{synthetic}} = x_i + \lambda \cdot \left(x_{\mathrm{neighbor}} - x_i\right), \qquad \lambda \sim U[0, 1]$$
where $x_i$ represents a randomly selected minority class instance, $x_{\mathrm{neighbor}}$ is one of its $k$ nearest neighbors ($k = 5$), and $\lambda$ is a random interpolation factor. This approach maintains the underlying data distribution while providing sufficient training examples for minority classes, expanding the dataset to 21,033 samples. Figure 6 illustrates the resulting realistic service distribution.
Figure 6. Realistic dataset network slice distribution (percentages may not sum to 100% due to rounding).
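A condensed sketch of this two-stage pipeline is given below, assuming NumPy arrays and the imbalanced-learn SMOTE implementation; the median imputation of the remaining gaps is an assumption, since rows passing the 20% threshold may still contain missing values that SMOTE cannot process directly.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from imblearn.over_sampling import SMOTE

def quality_control_and_balance(X: np.ndarray, y: np.ndarray, seed: int = 42):
    # Stage 1: threshold-based row filter from Equation (9).
    missing_frac = np.isnan(X).mean(axis=1)
    keep = missing_frac <= 0.20
    X_kept, y_kept = X[keep], y[keep]

    # Remaining gaps are imputed (median imputation assumed) so SMOTE can operate.
    X_imp = SimpleImputer(strategy="median").fit_transform(X_kept)

    # Stage 2: SMOTE with k = 5 neighbors interpolates synthetic minority samples (Eq. (10)).
    X_bal, y_bal = SMOTE(k_neighbors=5, random_state=seed).fit_resample(X_imp, y_kept)
    return X_bal, y_bal
```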
This dual-methodology framework enables comprehensive algorithm assessment spanning idealized laboratory conditions (PRDA: 10,000 samples) to realistic operational challenges (LVRT: 21,033 samples), providing evidence-based guidance for algorithm selection across diverse 6G infrastructure scenarios.

4. Experimental Setup and Model Configuration

This study employs a multi-layered experimental framework to assess the reliability of ML and DL algorithms in the context of 6G network slicing. A total of eleven algorithms are implemented, covering both traditional ML approaches and DL architectures to evaluate trade-offs between interpretability, computational efficiency, and robustness to real-world conditions.
Two distinct experimental regimes are considered: PRDA represents a controlled baseline environment with minimal noise and well-structured data, while LVRT incorporates congestion modeling, multi-layer noise, missing data, and temporal correlation patterns, thereby mimicking actual 6G deployment conditions. This contrast allows analysis of model performance when transitioning from theoretical laboratory conditions to realistic field operations.
The dataset consists of 10,000 samples, stratified into 80% training and 20% testing sets. Class imbalance is handled using SMOTE (k = 5), feature selection relies on SelectKBest (ANOVA F-test scoring; top 15 features), and features are normalized using Z-score standardization. Evaluation employs 5-fold cross-validation repeated three times to minimize variance, while SHAP-based XAI enables interpretable feature attribution across models.
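This protocol can be expressed compactly with an imbalanced-learn pipeline so that SMOTE, feature selection, and scaling are re-fitted inside every cross-validation fold; the Random Forest placeholder and its settings are illustrative, the selector keeps all available features because the optimized space contains only five, and X_train / y_train_enc reuse the names from the earlier PRDA sketch.

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline(steps=[
    ("smote", SMOTE(k_neighbors=5, random_state=42)),
    ("select", SelectKBest(score_func=f_classif, k="all")),  # top-15 capped by feature count
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)
scores = cross_val_score(pipeline, X_train, y_train_enc, cv=cv, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```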

4.1. Traditional ML Configurations

The traditional algorithms were tuned to balance accuracy with computational cost, as summarized in Table 3. Ensemble methods such as Random Forest, Gradient Boosting, and XGBoost use multiple trees to capture non-linear decision boundaries, while simpler models like Logistic Regression and Naive Bayes offer interpretability and computational efficiency. SVM employs an RBF kernel with probability estimation, while kNN uses distance-weighted voting. The configuration is designed to benchmark each model fairly under both PRDA and LVRT.
Table 3. Traditional ML model parameters.

4.1.1. Ensemble Learning Methods

Random Forest employs bootstrap aggregating and out-of-bag error estimation with decision trees as base learners, utilizing random feature selection at each split to minimize overfitting through ensemble aggregation (Equation (11)) []. Gradient Boosting constructs additive models sequentially by iteratively fitting new models to the negative gradient of the loss, controlling learning progression via a learning rate (Equation (12)) []. XGBoost (version 3.0.5) integrates L1/L2 regularization with advanced tree pruning and parallel computation, enhancing efficiency and reducing overfitting risk through a regularized objective (Equation (13)) [].
$$\hat{y}_i = \frac{1}{B} \sum_{b=1}^{B} T_b(x_i)$$
where $T_b(x_i)$ denotes the prediction of the $b$-th decision tree for input instance $x_i$, and $B$ is the total number of trees. The final prediction $\hat{y}_i$ is obtained by averaging (for regression) or majority voting (for classification) across all base learners, thus reducing variance and mitigating overfitting.
$$\hat{y}_i^{(m)} = \hat{y}_i^{(m-1)} + \gamma_m h_m(x_i)$$
where $\hat{y}_i^{(m-1)}$ is the prediction up to stage $m-1$, $h_m(x_i)$ is the newly added weak learner fitted to the negative gradient of the loss, and $\gamma_m$ is the learning rate controlling the contribution of $h_m$. This sequential refinement allows the model to minimize the residual errors step by step, leading to improved generalization performance.
$$\mathcal{L}(\phi) = \sum_{i=1}^{n} \ell(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^{2}$$
where $\ell(\hat{y}_i, y_i)$ denotes the loss between predicted value $\hat{y}_i$ and true label $y_i$, and $\Omega(f)$ penalizes model complexity. Here, $T$ is the number of leaves, $w$ represents leaf weights, $\gamma$ is the regularization parameter penalizing additional leaves, and $\lambda$ controls $L_2$ regularization on weights.

4.1.2. Linear and Probabilistic Models

Logistic Regression utilizes a sigmoid link for binary probability estimation with L2 regularization to enhance generalization in high-dimensional spaces (Equation (14)) []. Naive Bayes applies Bayes’ theorem under conditional independence and (for continuous features) Gaussian likelihood assumptions, computing class posteriors via multiplicative likelihood integration (Equation (15)) [].
$$P(y = 1 \mid x) = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_j\right)\right)}$$
where the probability of class $y = 1$ is modeled using the sigmoid function applied to a linear combination of input features $x_j$ with coefficients $\beta_j$.
$$\hat{y} = \arg\max_{c \in C} \; P(C = c) \prod_{i=1}^{d} P(x_i \mid C = c)$$
which predicts the class $\hat{y}$ by maximizing the posterior probability under the assumption of conditional independence among features $x_i$.

4.1.3. Tree-Based and Instance-Based Methods

Decision Tree induces interpretable rule-based models via recursive binary splitting that minimizes CART’s Gini impurity (Equation (16)) []. Support Vector Machine employs an RBF kernel transformation to find maximum-margin separating hyperplanes in higher-dimensional spaces (Equation (17)) [,]. k-NN classifies instances through distance-weighted majority voting among the k nearest neighbors, adapting locally to data topology (Equation (18)) [].
$$\mathrm{Gini}(S) = 1 - \sum_{c=1}^{C} p_c^{2}$$
where $p_c$ is the proportion of class $c$ among the samples $y_i$ in node $S$.
$$K(x_i, x_j) = \exp\!\left(-\gamma \lVert x_i - x_j \rVert^{2}\right)$$
which maps input pairs $(x_i, x_j)$ into a higher-dimensional space to separate the classes of $y_i$.
$$\hat{y}_i = \arg\max_{c \in C} \sum_{j \in N_k(x_i)} w_j \, \mathbb{1}[y_j = c]$$
where $\hat{y}_i$ is assigned to the majority class among the $k$ nearest neighbors $y_j$ of $x_i$, with distance-based weights $w_j$.
Table 3 summarizes the hyperparameter settings of traditional models. Ensemble methods such as Random Forest and XGBoost prioritize depth and parallelism to handle high-dimensional interactions, while Gradient Boosting incorporates a moderate learning rate for stability. Simpler models (Naive Bayes, Logistic Regression, and Decision Tree) operate under stricter assumptions but provide baselines for interpretability and low complexity. This parameterization ensures that differences in outcomes reflect algorithmic characteristics rather than suboptimal tuning.
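For reference, an illustrative instantiation of these classifiers with scikit-learn and XGBoost is sketched below; the hyperparameter values shown are plausible defaults, not the exact settings listed in Table 3.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

traditional_models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(learning_rate=0.1, random_state=42),
    "XGBoost": XGBClassifier(reg_alpha=0.1, reg_lambda=1.0, random_state=42),
    "Logistic Regression": LogisticRegression(penalty="l2", max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(criterion="gini", random_state=42),
    "SVM (RBF)": SVC(kernel="rbf", probability=True, random_state=42),
    "k-NN": KNeighborsClassifier(n_neighbors=5, weights="distance"),
}
```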

4.2. DL Architectures

DL models are tuned for both spatial and temporal feature learning, incorporating regularization techniques and dynamic training adaptations. The architectures are designed to capture different aspects of feature relationships while maintaining computational efficiency through optimized training configurations as detailed in Table 4.
Table 4. DL architectures and training configurations.

4.2.1. CNN

CNN employs stacked one-dimensional convolutional layers to extract hierarchical local patterns from sequential inputs, with batch normalization to stabilize learning and dropout regularization to mitigate overfitting [,]. The convolutional layers extract local feature patterns (Equation (19)), which are subsequently aggregated by dense layers and transformed into final class probabilities through the softmax function (Equation (20)).
$$y_{c_{\text{out}}, t} = \sigma\!\left(\sum_{c_{\text{in}}} \sum_{m} w_{c_{\text{out}}, c_{\text{in}}, m} \, x_{c_{\text{in}}, t+m} + b_{c_{\text{out}}}\right)$$
where $w_{c_{\text{out}}, c_{\text{in}}, m}$ are the convolutional filter weights, $x_{c_{\text{in}}, t+m}$ are the input features, $b_{c_{\text{out}}}$ is the bias term, and $\sigma(\cdot)$ is the activation function.
$$\hat{y}_i = \mathrm{softmax}\!\left(W^{(\text{out})} h^{(\text{CNN})} + b^{(\text{out})}\right)$$
where $h^{(\text{CNN})}$ is the flattened representation obtained after convolution and pooling.
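As an illustration, a minimal Keras realization of this CNN branch is sketched below; the framework choice, layer widths, and dropout rate are assumptions, with the exact architecture given in Table 4.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(n_features: int = 5, n_classes: int = 5) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(n_features, 1)),            # features treated as a 1-D sequence
        layers.Conv1D(64, kernel_size=2, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Conv1D(128, kernel_size=2, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),  # softmax output of Eq. (20)
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```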

4.2.2. LSTM

LSTM overcomes the vanishing gradient problem inherent in standard RNNs by utilizing input, forget, and output gates that selectively retain, update, and transfer temporal information across extended sequences [,]. The gating mechanisms and state updates are defined in Equations (21) and (22), while the final prediction is obtained through the softmax transformation of the hidden state as shown in Equation (23).
$$i_t = \sigma\!\left(W_i [h_{t-1}, x_t] + b_i\right), \quad f_t = \sigma\!\left(W_f [h_{t-1}, x_t] + b_f\right), \quad o_t = \sigma\!\left(W_o [h_{t-1}, x_t] + b_o\right)$$
where $i_t$ is the input gate, $f_t$ the forget gate, and $o_t$ the output gate of the LSTM, and $\sigma(\cdot)$ is the sigmoid activation.
$$\tilde{C}_t = \tanh\!\left(W_C [h_{t-1}, x_t] + b_C\right), \quad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \quad h_t = o_t \odot \tanh(C_t)$$
where $\tilde{C}_t$ is the candidate cell state, $C_t$ the updated cell state, and $h_t$ the hidden state.
$$\hat{y}_i = \mathrm{softmax}\!\left(W^{(\text{out})} h_t + b^{(\text{out})}\right)$$
where $h_t$ is the final hidden state at time $t$.

4.2.3. FNN

FNN consists of stacked dense layers with non-linear activation functions, enhanced by dropout regularization and batch normalization between layers to accelerate convergence and stabilize training dynamics [,]. The dense connectivity enables comprehensive feature interaction modeling as formulated in Equation (24), while the final prediction is obtained through the softmax transformation of the last hidden representation as shown in Equation (25). Regularization techniques are commonly applied to prevent overfitting in high-dimensional spaces.
$$h^{(l+1)} = \sigma\!\left(W^{(l)} h^{(l)} + b^{(l)}\right), \qquad h^{(0)} = x_i$$
where $\sigma(\cdot)$ is the activation function, $W^{(l)}$ and $b^{(l)}$ are the weights and biases of layer $l$, and $h^{(0)} = x_i$ is the input vector of instance $i$.
$$\hat{y}_i = \mathrm{softmax}\!\left(W^{(\text{out})} h^{(L)} + b^{(\text{out})}\right)$$
where $h^{(L)}$ is the last hidden representation.
Table 4 details the layer structures and training regimes of the three DL models. CNNs use progressively deeper convolutional layers to extract hierarchical features, followed by dense layers for classification. LSTM integrates two recurrent layers to capture temporal dependencies before transitioning to dense layers. The FNN employs multiple dense layers to exploit feature interactions directly. Training configurations are harmonized across models to enable fair comparisons.
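Companion sketches for the LSTM and FNN branches follow the same pattern as the CNN sketch above; again, layer sizes and regularization settings are illustrative assumptions rather than the Table 4 configurations.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lstm(n_features: int = 5, n_classes: int = 5) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(n_features, 1)),
        layers.LSTM(64, return_sequences=True),   # first recurrent layer (Eqs. (21)-(22))
        layers.LSTM(32),                          # second recurrent layer
        layers.Dense(32, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def build_fnn(n_features: int = 5, n_classes: int = 5) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(128, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),  # softmax output of Eq. (25)
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```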

4.3. Resilience Threshold Derivation and SLA Risk Analysis

The resilience classification framework establishes quantitative boundaries for algorithm performance degradation based on empirical analysis of 6G service level agreement (SLA) violation risks and operational deployment constraints. This systematic approach draws from ITU-R recommendations, 3GPP technical specifications, and comprehensive industry deployment studies to ensure practical relevance for operational 6G networks.
Modern 6G network slicing demands unprecedented reliability levels, with different service categories imposing distinct performance constraints that directly influence SLA compliance. Our resilience threshold calibration methodology aligns algorithm performance boundaries with documented SLA violation probability distributions, enabling evidence-based deployment decisions across diverse infrastructure scenarios.
The threshold derivation process considers multiple factors: measurement uncertainty inherent in 6G systems, acceptable service degradation boundaries for different application categories, monitoring resource requirements, and documented failure mode analysis from operational networks. Table 5 presents the comprehensive threshold framework linking performance degradation levels to SLA violation risks and deployment suitability.
Table 5. SLA-based resilience threshold derivation.
The Excellent resilience threshold (−5%) reflects inherent measurement uncertainty documented in 6G system specifications. According to 3GPP technical reports [], typical radio frequency measurement accuracy ranges from 3 to 5% under operational conditions, making performance variations within this range attributable to normal operational variance rather than algorithmic degradation. Algorithms maintaining performance within this bound demonstrate robustness characteristics suitable for ultra-reliable low-latency communication (URLLC) and extremely reliable low-latency communication (ERLLC) slices requiring 99.999% availability.
The Good resilience threshold (−5% to −10%) aligns with acceptable service quality boundaries established through extensive user experience studies in commercial deployments. Industry research [] demonstrates that user satisfaction remains within acceptable parameters for performance degradations up to 10% in bandwidth-intensive applications, making this threshold appropriate for further enhanced mobile broadband (feMBB) and ultra-massive machine type communication (umMTC) deployments where slight performance variations do not compromise core functionality.
The Moderate resilience threshold (−10% to −20%) identifies algorithms requiring enhanced monitoring infrastructure and potential redundancy mechanisms to maintain service continuity. This boundary reflects the transition point between acceptable and concerning performance levels documented in comprehensive network operator studies [], indicating that while deployment remains feasible, additional operational oversight becomes necessary to prevent service degradation.
The Poor resilience threshold (>−20%) establishes the boundary for unacceptable operational performance based on systematic SLA violation analysis []. Performance degradations exceeding 20% indicate fundamental algorithmic limitations that compromise service delivery reliability, requiring substantial improvements before consideration for production deployment in any operational scenario.
This resilience-based classification framework enables quantitative deployment risk assessment through the following mathematical relationship:
$$\mathrm{Risk}_{\text{deployment}} = f\!\left(\Delta A, \ \text{Service Category}, \ \text{Infrastructure Type}\right)$$
$$f(\Delta A) = \begin{cases} \text{Minimal} & \text{if } \Delta A \geq -5\% \\ \text{Low} & \text{if } -10\% \leq \Delta A < -5\% \\ \text{Moderate} & \text{if } -20\% \leq \Delta A < -10\% \\ \text{High} & \text{if } \Delta A < -20\% \end{cases}$$
where $\Delta A$ represents the relative accuracy change between controlled and realistic conditions (expressed here so that negative values denote degradation), enabling systematic algorithm selection aligned with specific deployment scenarios, risk tolerance requirements, and operational infrastructure constraints. This framework provides network operators with quantitative guidance for evidence-based decision making in algorithm selection and deployment planning across diverse 6G infrastructure scenarios.
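A direct transcription of this mapping into a small helper is shown below; degradation is expressed as a negative relative change, matching the threshold convention of Table 5.

```python
def deployment_risk(acc_prda: float, acc_lvrt: float) -> str:
    """Map the relative accuracy change to the risk tiers of Equation (27)."""
    delta_a = (acc_lvrt - acc_prda) / acc_prda * 100.0   # negative values = degradation
    if delta_a >= -5.0:
        return "Minimal"
    if delta_a >= -10.0:
        return "Low"
    if delta_a >= -20.0:
        return "Moderate"
    return "High"

print(deployment_risk(0.609, 0.695))   # SVM improves under LVRT -> "Minimal"
print(deployment_risk(0.890, 0.580))   # Naive Bayes degrades by ~34.8% -> "High"
```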

5. Experimental Results and Analysis

All experiments were conducted on Google Colab Pro+ platform (Ubuntu 22.04.4 LTS runtime environment) with high-RAM configuration (51.0 GB system RAM; GPU acceleration enabled), ensuring reproducible performance comparisons across algorithms while representing the typical cloud-based infrastructure available for 6G network operators.

5.1. Performance Analysis: PRDA vs. LVRT Comparison

The experimental evaluation reveals distinct algorithmic behaviors under controlled laboratory conditions (PRDA) versus realistic operational environments (LVRT). Table 6 presents baseline performance under idealized conditions, where deep learning models achieve perfect accuracy (CNN, FNN: 100%) at significant computational cost (46–56 s training time), while traditional algorithms demonstrate varying effectiveness ranging from 60.9% (SVM) to 89.0% (Naive Bayes). It is important to note that performance changes are measured as degradation percentages; positive values indicate accuracy loss under realistic conditions, while negative values indicate counter-intuitive improvements.
Table 6. PRDA results (10,000 samples).
Table 7 demonstrates performance under realistic operational conditions incorporating network congestion, missing data, and multi-layer interference patterns. Deep learning models maintain superior performance (CNN: 81.2%; FNN: 81.1%) while traditional algorithms exhibit remarkable adaptability, with several showing counter-intuitive performance improvements under realistic conditions.
Table 7. LVRT results (21,033 samples).
To quantify algorithmic resilience, we define the relative accuracy change as follows:
$$\Delta(A) = \frac{A_{\mathrm{PRDA}}(A) - A_{\mathrm{LVRT}}(A)}{A_{\mathrm{PRDA}}(A)} \times 100\%$$
where $A_{\mathrm{PRDA}}(A)$ and $A_{\mathrm{LVRT}}(A)$ denote the accuracies of algorithm $A$ under PRDA and LVRT, respectively. Positive $\Delta(A)$ values indicate performance degradation (e.g., CNN: +18.8%, corresponding to a drop from 100% to 81.2%), while negative values indicate improvement under realistic conditions (e.g., SVM: −14.1%, corresponding to an increase from 60.9% to 69.5%).
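A one-line helper makes the sign convention of Equation (28) explicit; note that it is the mirror image of the deployment-risk convention used in Section 4.3.

```python
def relative_accuracy_change(acc_prda: float, acc_lvrt: float) -> float:
    """Equation (28): positive = degradation under LVRT, negative = improvement."""
    return (acc_prda - acc_lvrt) / acc_prda * 100.0

print(f"CNN: {relative_accuracy_change(1.000, 0.812):+.1f}%")   # +18.8% (degradation)
print(f"SVM: {relative_accuracy_change(0.609, 0.695):+.1f}%")   # -14.1% (improvement)
```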
Figure 7 illustrates the performance transition patterns across all evaluated algorithms.
Figure 7. PRDA vs. LVRT accuracy comparison across algorithms.
Table 8 reveals three distinct adaptation patterns:
Table 8. Performance transition (PRDA → LVRT) across algorithms.
  • Resilient: SVM (RBF) and Logistic Regression show performance improvements (−14.1% and −10.3%, respectively), demonstrating robustness under realistic conditions.
  • Stable: k-NN maintains consistent performance (+1.1%), showing minimal sensitivity to operational changes.
  • Degrading: Traditional ML algorithms (Naive Bayes +34.8%, ensemble methods +15–17%) and deep learning models (CNN +18.8%, FNN +18.9%) experience substantial performance losses under realistic conditions.
The performance degradation observed in ensemble methods and neural networks under LVRT conditions reflects challenges in handling realistic operational stress, where algorithms trained on clean data struggle with the complexity introduced by noise, class imbalance, and missing data patterns characteristic of operational 6G networks.
In Table 8, the $\Delta A$ metric quantifies performance change as $\Delta A = \frac{A_{\mathrm{PRDA}} - A_{\mathrm{LVRT}}}{A_{\mathrm{PRDA}}} \times 100\%$. Therefore, positive values (e.g., Naive Bayes: +34.8%) represent accuracy degradation, while negative values (e.g., SVM: −14.1%) indicate improved performance under realistic conditions.

5.2. Computational Efficiency and Deployment Constraints

The training time analysis reveals critical trade-offs for 6G edge deployment scenarios where computational resources impose strict operational constraints. Figure 8 demonstrates the computational cost spectrum across algorithm families, with lightweight models (Naive Bayes: 0.02 s) enabling real-time edge deployment while complex architectures (CNN: 52.36 s) require centralized training with inference model distribution.
Figure 8. Average training time comparison (log scale) between traditional ML and DL models.
Edge deployment constraints create a three-tier strategy: sub-0.1 s algorithms enable direct edge deployment for emergency slice reconfiguration, medium-complexity algorithms (0.2–15 s) support regional cloud deployment for near-real-time operations, while high-complexity algorithms require centralized training architectures. The 2600× speed advantage of Naive Bayes over CNN translates to proportional energy savings critical for sustainable 6G operations.
The analysis of computational complexity in Table 9 shows clear boundaries for the deployment of 6G network slicing algorithms. Linear complexity algorithms (Naive Bayes, k-NN) enable real-time edge deployment essential for ultra-low latency slicing scenarios [,]. Log-linear methods (Decision Tree, ensemble approaches) require regional cloud infrastructure [,,]. Quadratic complexity algorithms (SVM) exhibit prohibitive scaling, limiting deployment to offline scenarios []. Deep learning architectures maintain high training complexity with consistent inference requirements, necessitating centralized training with edge inference deployment [,]. This analysis establishes the boundaries of computational feasibility that are essential for selecting algorithms in resource-constrained 6G deployment scenarios, where real-time decision-making (with a latency of less than 100 ms for URLLC applications) is a fundamental operational requirement.
Table 9. Algorithm complexity.
The theoretical complexity bounds translate into concrete operational constraints when evaluated against our dataset parameters (n = 10,000 → 21,033; d = 5; c = 5). Linear complexity algorithms demonstrate manageable scaling: Naive Bayes processes 250,000 operations (PRDA) expanding to 525,825 operations (LVRT) with a proportional 2× training time increase (0.02 s → 0.04 s), while k-NN scales from 50,000 to 105,165 operations with a 2.8× time increase (0.06 s → 0.17 s). These remain within the <100 ms URLLC requirements for real-time edge deployment. Quadratic complexity algorithms reveal prohibitive scaling: SVM operations expand from 500 M to 2.2 B, manifesting as a dramatic 14.7× training time degradation (35.49 s → 521.3 s). This growth renders SVM incompatible with dynamic 6G slice reconfiguration requirements. Deep learning architectures process 1.34 B–2.82 B operations but maintain stable 52–56 s training times due to fixed epoch limits, necessitating hybrid deployment with centralized training and edge inference. The analysis sets quantitative deployment boundaries: real-time slice adaptation can only be supported by sub-second algorithms (Naive Bayes; k-NN), while 6G’s millisecond decision requirements are incompatible with quadratic methods because they create computational bottlenecks. This numerical validation shows that operational constraints, rather than laboratory accuracy, should inform decisions on which algorithms to use for 6G.
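The operation counts quoted above can be reproduced with the assumed per-algorithm cost formulas below (Naive Bayes ≈ n·d·c, k-NN ≈ n·d, SVM ≈ n²·d); these are simplifications for back-of-the-envelope scaling, not exact instruction counts.

```python
def operation_counts(n: int, d: int = 5, c: int = 5) -> dict[str, int]:
    """Rough per-pass operation counts under the assumed complexity formulas."""
    return {
        "Naive Bayes (n*d*c)": n * d * c,
        "k-NN (n*d)": n * d,
        "SVM (n^2*d)": n ** 2 * d,
    }

for n in (10_000, 21_033):          # PRDA and LVRT dataset sizes
    print(n, operation_counts(n))
# 10,000 -> 250,000 / 50,000 / 5.0e8;  21,033 -> 525,825 / 105,165 / ~2.2e9
```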

5.3. Algorithm Resilience Validation

To validate the robustness of our findings across varying operational conditions, we conducted a comprehensive sensitivity analysis under multiple urban 6G SNR scenarios reflecting documented deployment environments. The validation methodology employed three representative urban conditions: optimal urban environments (18–25 dB SNR) characteristic of well-planned metropolitan deployments, dense urban environments (8–15 dB SNR) representing challenging high-density scenarios, and extreme challenging conditions (0–8 dB SNR) simulating worst-case interference scenarios with significant electromagnetic interference from industrial equipment and competing wireless systems.
Table 10 demonstrates the resilience rankings remain highly stable across all SNR conditions, with Spearman’s correlation coefficient ρ = 0.989 (p < 0.001) between optimal and challenging scenarios, providing strong evidence for the reliability of our resilience classifications across the complete spectrum of evaluated algorithms.
Table 10. Comprehensive algorithm performance across urban 6G SNR scenarios.

5.3.1. Statistical Validation of Resilience Classifications

Bootstrap confidence interval analysis (n = 1000 iterations) confirmed statistical significance of performance differences across all evaluated algorithms (p < 0.05). The analysis categorized algorithms based on their performance degradation under realistic operational conditions relative to laboratory benchmarks:
  • Excellent Resilience (3 algorithms): Mean degradation +8.2% ± 2.1%. These algorithms (SVM, Logistic Regression, and k-NN) demonstrate relatively stable performance under operational stress.
  • Good Resilience (2 algorithms): Mean degradation +18.9% ± 0.1%. This category includes CNN and FNN, showing manageable performance loss while maintaining high absolute accuracy levels suitable for high-performance applications.
  • Moderate Resilience (5 algorithms): Mean degradation +15.8% ± 1.2%. Enhanced LSTM, ensemble methods (Random Forest, XGBoost, and Gradient Boosting), and Decision Tree require careful deployment consideration with appropriate monitoring mechanisms for standard commercial slice deployments.
  • Poor Resilience (1 algorithm): Mean degradation +34.8%. Naive Bayes demonstrates substantial performance degradation unsuitable for production deployment without significant algorithmic modifications or restricted to research environments.
The one-sample t-test confirmed statistically significant overall performance variation (t = −4.139, p = 0.002), with Cohen’s effect sizes ranging from small (d = −4.41) to large (d = −146.39), demonstrating substantial heterogeneity in algorithmic responses to operational stress conditions. These findings provide empirical evidence for algorithm-specific resilience characteristics that must be considered in practical 6G network slice management decisions.
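A sketch of the bootstrap procedure, assuming per-sample correctness indicators from the two test sets, is given below; 1000 resamples and a 95% interval mirror the settings reported above.

```python
import numpy as np

def bootstrap_ci(correct_prda: np.ndarray, correct_lvrt: np.ndarray,
                 n_boot: int = 1000, alpha: float = 0.05, seed: int = 42):
    """correct_* are per-sample 0/1 correctness indicators on the two test sets."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        acc_p = rng.choice(correct_prda, size=len(correct_prda), replace=True).mean()
        acc_l = rng.choice(correct_lvrt, size=len(correct_lvrt), replace=True).mean()
        diffs[b] = acc_p - acc_l
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi   # the difference is significant when the interval excludes zero
```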

5.3.2. Mechanism Validation Through Empirical Analysis

To validate the three proposed improvement mechanisms underlying the resilience phenomenon, we conducted systematic ablation studies isolating individual transformation components. Each mechanism was tested independently to establish causal relationships between specific environmental factors and algorithmic performance changes.
SMOTE Regularization Effect: Controlled experiments comparing identical datasets with and without SMOTE augmentation revealed differential algorithmic responses to synthetic minority class generation. Linear algorithms (SVM; Logistic Regression) showed average performance improvements of 8–12%, demonstrating enhanced decision boundary robustness. Tree-based methods exhibited mixed responses, with ensemble methods showing modest improvements (2–4%) while simple Decision Trees remained relatively stable.
Noise-Induced Regularization: Systematic noise injection experiments across controlled SNR ranges demonstrated that linear algorithms with appropriate regularization (SVM; Logistic Regression) improved generalization capability under moderate noise conditions by 6–10%. Neural networks exhibited performance degradation of 18–19%, while ensemble methods showed moderate degradation of 15–17%. This differential response supports the regularization hypothesis that controlled noise prevents overfitting for algorithms with appropriate inductive biases.
Class Imbalance Adaptation: Analysis of algorithmic responses to realistic class imbalance patterns (9:1 ratio) revealed that certain algorithms exploit structured imbalance more effectively than balanced distributions. Linear classifiers demonstrated superior adaptation to minority class detection, while tree-based methods struggled with extreme imbalance despite SMOTE augmentation.
These empirical validations provide mechanistic evidence that performance changes under realistic conditions reflect fundamental algorithmic characteristics rather than random degradation. Linear algorithms with strong regularization demonstrate superior resilience, while complex ensemble methods and neural networks show vulnerability to operational stress despite higher laboratory performance.

5.4. XAI Analysis and Algorithm Interpretability

The deployment of ML algorithms in mission-critical 6G network slicing requires transparent and interpretable decision-making processes to ensure regulatory compliance and operational confidence. This section presents a comprehensive explainability analysis using the SHAP (SHapley Additive exPlanations) framework to quantify feature importance patterns and establish deployment guidelines based on interpretability requirements.

5.4.1. SHAP-Based Feature Importance Framework

The deployment of ML algorithms in mission-critical 6G network slicing necessitates transparent decision-making processes to ensure regulatory compliance and operational confidence. We employ algorithm-specific SHAP explainers tailored to model characteristics: TreeExplainer for ensemble methods, LinearExplainer for Logistic Regression, and KernelExplainer for SVM and k-NN architectures.
To validate interpretation robustness, we applied three complementary explainability methods to the k-NN classifier. Figure 9 demonstrates remarkable consistency across SHAP, LIME, and Permutation Importance methods, with Packet Loss Budget dominating feature importance rankings (0.15–0.30 across methods) in both operational scenarios. This cross-method agreement validates that our interpretations reflect genuine algorithmic behavior rather than method-specific artifacts.
Figure 9. Global explainability comparison for the k-NN model.
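The explainer selection can be organized as sketched below, assuming fitted models and variable names from the earlier sketches; background data is subsampled with shap.sample to keep the model-agnostic KernelExplainer tractable.

```python
import shap

def make_explainer(name: str, model, X_background):
    """Pick an algorithm-appropriate SHAP explainer, as described in the text."""
    if name in {"Random Forest", "Gradient Boosting", "XGBoost", "Decision Tree"}:
        return shap.TreeExplainer(model)
    if name == "Logistic Regression":
        return shap.LinearExplainer(model, X_background)
    # SVM, k-NN, and other black-box models fall back to the model-agnostic kernel method.
    return shap.KernelExplainer(model.predict_proba, shap.sample(X_background, 100))

knn = traditional_models["k-NN"].fit(X_train, y_train_enc)
explainer = make_explainer("k-NN", knn, X_train)
shap_values = explainer.shap_values(X_test[:100])   # explain a manageable subset
```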
The analysis reveals cross-scenario variability in algorithmic focus patterns. Figure 10 illustrates how Decision Tree models adapt their feature emphasis between datasets, with PRDA models prioritizing Slice Jitter (0.320) and Slice Latency (0.298), while LVRT models demonstrate increased reliance on Packet Loss Budget (0.332), reflecting operational stress adaptation mechanisms.
Figure 10. Comparison of SHAP-based feature importance between PRDA (left) and LVRT (right) Decision Tree models.

5.4.2. Cross-Scenario Consistency Analysis

Cross-scenario consistency analysis quantifies explanation stability between PRDA and LVRT datasets using correlation coefficients of feature importance rankings. The comprehensive evaluation presented in Table 11 reveals distinct interpretability profiles that directly impact deployment suitability across diverse 6G infrastructure scenarios.
Table 11. Algorithm interpretability consistency analysis.
Neural network architectures demonstrate exceptional interpretability stability, with CNN and LSTM achieving perfect consistency scores (1.000) as shown in the table. Classical algorithms exhibit similarly robust behavior, with Naive Bayes (0.998), SVM (0.997), and Logistic Regression (0.988) maintaining minimal explanation variance across operational conditions. These high-consistency algorithms provide reliable feature importance rankings essential for regulatory compliance scenarios.
Conversely, XGBoost presents significant interpretability challenges despite competitive accuracy performance, exhibiting the lowest consistency score (0.106) in our analysis. This finding indicates that high predictive performance does not guarantee explanation stability, necessitating careful consideration of interpretability requirements in algorithm selection decisions.
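One simple way to compute such a consistency score is to correlate the per-feature mean absolute SHAP values obtained under PRDA and LVRT, as sketched below; the use of a Pearson correlation over global importances is an assumption about the exact aggregation.

```python
import numpy as np

def consistency_score(shap_prda: np.ndarray, shap_lvrt: np.ndarray) -> float:
    """shap_* have shape (n_samples, n_features); aggregate to global importances."""
    imp_prda = np.abs(shap_prda).mean(axis=0)
    imp_lvrt = np.abs(shap_lvrt).mean(axis=0)
    return float(np.corrcoef(imp_prda, imp_lvrt)[0, 1])
```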

5.4.3. Algorithm Selection Guidelines for 6G Deployment

Local explanation analysis provides concrete insights into algorithmic decision-making under varying operational conditions. Table 12 and Table 13 present detailed misclassification analysis under contrasting SNR environments, demonstrating how operational stress fundamentally alters feature attribution patterns.
Table 12. Local explanation for high SNR misclassification case (k-NN, Sample #0).
Table 13. Local explanation for low SNR misclassification case (k-NN, Sample #0).
Under high SNR conditions (15–25 dB), as detailed in Table 12, moderate latency elevation (4.19M ns) generates positive SHAP contributions (+0.31) suggesting similarity-based classification mechanisms. In contrast, Table 13 reveals that low SNR environments (−10 to +5 dB) produce extreme latency values (38.0M ns) triggering strong negative SHAP contributions (−0.94), indicating a shift from pattern recognition to outlier-based rejection under operational stress.
The analysis establishes evidence-based deployment guidelines linking interpretability characteristics with operational requirements. Mission-critical infrastructure benefits from CNN and FNN architectures that combine high accuracy (>0.81) with perfect consistency scores (1.000). Regulatory compliance scenarios are optimally served by classical algorithms maintaining high consistency while providing audit-ready explanation generation. Performance-focused deployments may utilize XGBoost despite interpretability limitations, provided intensive explanation monitoring compensates for consistency challenges.

5.4.4. Deployment Guidelines and Operational Monitoring Framework

The interpretability analysis establishes evidence-based deployment guidelines that integrate performance characteristics, explanation stability, and operational monitoring requirements across diverse 6G network slicing scenarios. This comprehensive framework enables algorithm selection tailored to specific operational contexts where transparency demands vary significantly based on regulatory requirements and mission-critical considerations.
Neural network architectures demonstrate exceptional suitability for mission-critical infrastructure deployment, where both high performance and explanation stability are paramount. CNN and FNN achieve superior accuracy levels (0.812 and 0.811, respectively) while maintaining perfect or near-perfect consistency scores (1.000 and 0.981), ensuring reliable interpretability across varying operational conditions. LSTM architectures provide specialized capabilities for temporal pattern analysis in dynamic slicing scenarios, offering perfect consistency (1.000) despite moderate accuracy performance (0.748).
Regulatory compliance scenarios benefit from classical algorithms that prioritize explanation stability over absolute performance metrics. SVM and Logistic Regression maintain exceptional consistency scores (0.997 and 0.988, respectively), providing audit-ready explanation generation essential for regulatory documentation and compliance verification processes. Performance-focused applications may utilize XGBoost despite its interpretability limitations (0.106 consistency), provided comprehensive monitoring compensates for explanation instability while leveraging superior predictive capabilities (0.726 accuracy).
Table 14 synthesizes these findings into practical deployment recommendations, revealing systematic trade-off patterns between algorithmic complexity and explanation requirements.
Table 14. XAI-based algorithm deployment guide.
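As a complement to Table 14, the sketch below shows how such a deployment guide could be encoded programmatically, using the accuracy and consistency figures restated in this section. The scenario thresholds and data structure are illustrative assumptions rather than values prescribed by the study.

```python
# Hypothetical encoding of the XAI-based deployment guide.
# accuracy = LVRT accuracy; consistency = cross-scenario SHAP consistency.
# None marks values not restated in this section.
PROFILES = {
    "CNN":                 {"accuracy": 0.812, "consistency": 1.000},
    "FNN":                 {"accuracy": 0.811, "consistency": 0.981},
    "LSTM":                {"accuracy": 0.748, "consistency": 1.000},
    "SVM (RBF)":           {"accuracy": None,  "consistency": 0.997},
    "Logistic Regression": {"accuracy": None,  "consistency": 0.988},
    "XGBoost":             {"accuracy": 0.726, "consistency": 0.106},
}

def recommend(scenario: str) -> list[str]:
    """Return candidate algorithms for a deployment scenario (thresholds assumed)."""
    def ok(p, acc_min=None, cons_min=None):
        if acc_min is not None and (p["accuracy"] is None or p["accuracy"] < acc_min):
            return False
        if cons_min is not None and p["consistency"] < cons_min:
            return False
        return True
    rules = {
        "mission_critical":      dict(acc_min=0.80, cons_min=0.98),  # high accuracy + stable explanations
        "regulatory_compliance": dict(cons_min=0.98),                # stability over raw accuracy
        "performance_focused":   dict(acc_min=0.70),                 # accept low consistency with monitoring
    }
    return [name for name, p in PROFILES.items() if ok(p, **rules[scenario])]

print(recommend("mission_critical"))  # -> ['CNN', 'FNN']
```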
The feature importance analysis establishes a hierarchical monitoring framework that prioritizes network parameters based on their demonstrated impact across algorithmic decisions. Packet Loss Budget emerges as the primary monitoring priority, demonstrating consistent dominance across all algorithm categories with feature importance values ranging from 0.15 to 0.30. This parameter requires continuous real-time monitoring with automated anomaly detection capabilities to ensure service continuity and prevent network degradation.
Secondary monitoring priorities encompass Slice Jitter and Latency parameters, which exhibit substantial influence in traditional ML algorithms and require periodic assessment with comprehensive trend analysis capabilities. These timing-critical parameters demand hourly monitoring cycles with weekly threshold reviews to maintain optimal network performance. Contextual parameters including Transmission Rate and Packet Loss constitute the tertiary monitoring tier, requiring monthly trend analysis integrated with quarterly capacity assessments for long-term infrastructure optimization.
This integrated framework creates a comprehensive three-tiered monitoring system that scales intensity according to feature criticality while maintaining service level agreement compliance across all network slicing configurations.
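For operational teams, the three-tier framework can be captured as a declarative configuration. The sketch below is a hypothetical encoding of the tiers described above; concrete thresholds and remediation actions would be set by operator SLAs and are not specified by this study.

```python
# Hypothetical configuration sketch of the three-tier monitoring framework.
from dataclasses import dataclass, field

@dataclass
class MonitoringTier:
    parameters: list[str]          # network parameters covered by this tier
    cycle: str                     # how often measurements are evaluated
    review: str                    # how often thresholds/trends are revisited
    actions: list[str] = field(default_factory=list)

MONITORING_PLAN = {
    "tier_1_critical": MonitoringTier(
        parameters=["Packet Loss Budget"],
        cycle="continuous (real-time)",
        review="automated anomaly detection",
        actions=["alert on budget violation", "trigger slice re-optimization"],
    ),
    "tier_2_timing": MonitoringTier(
        parameters=["Slice Jitter", "Slice Latency"],
        cycle="hourly",
        review="weekly threshold review",
        actions=["trend analysis", "escalate sustained drift"],
    ),
    "tier_3_contextual": MonitoringTier(
        parameters=["Transmission Rate", "Packet Loss"],
        cycle="monthly trend analysis",
        review="quarterly capacity assessment",
        actions=["feed long-term capacity planning"],
    ),
}
```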

6. Conclusions

This study presents a dual-methodology evaluation framework that comprehensively assesses ML algorithms for 6G network slicing, bridging the gap between laboratory performance and real-world deployment effectiveness. Through a systematic comparison of the PRDA and LVRT evaluation approaches, we reveal the limitations of conventional laboratory-based assessments in predicting practical performance and underscore the necessity of realistic evaluation strategies for operational 6G networks.
The investigation contributes to the field in three key ways. First, it quantifies the discrepancy between laboratory and real-world performance, showing that algorithms achieving perfect accuracy (100% for CNN and FNN) under controlled conditions degrade substantially in realistic network environments, with LVRT accuracies ranging from 58.0% to 81.2%. Second, it reveals counter-intuitive resilience patterns in which sophisticated algorithms lose the most accuracy (CNN: 18.8 percentage points; FNN: 18.9 points; Naive Bayes: 34.8 points), while simpler classifiers show unexpected improvements (SVM: 14.1 points; Logistic Regression: 10.3 points) when subjected to realistic operational stress. This challenges the assumption that greater algorithmic sophistication guarantees robustness and demonstrates that certain algorithms can adaptively exploit structural patterns present in realistic conditions. Third, it proposes a resilience-based categorization framework for evidence-driven algorithm selection that prioritizes operational stability over idealized laboratory benchmarks, providing network operators with actionable guidance that balances accuracy, computational efficiency, and deployment reliability.
The XAI analysis strengthens these practical implications through SHAP-based interpretability assessment, identifying Packet Loss Budget as the dominant feature across all algorithms, with Slice Jitter and Slice Latency as secondary factors; this establishes clear monitoring priorities for operational deployment. Neural networks (CNN: 1.000; LSTM: 1.000) and classical algorithms (SVM: 0.997; Logistic Regression: 0.988) exhibit interpretability consistency suitable for regulatory compliance scenarios. Cross-scenario analysis confirms that algorithm selection must balance accuracy and explanation stability: high-consistency algorithms are suitable for mission-critical deployments, whereas low-consistency algorithms such as XGBoost (0.106) require intensive monitoring despite competitive accuracy.
Future enhancement of the framework will involve expanding dataset diversity across geographic regions and infrastructure types, incorporating emerging architectures beyond CNN, FNN, and LSTM, investigating temporal dynamics in adaptive algorithms, and validating the approach through real-time deployment testing in operational networks. Integrating federated learning approaches with industry standardization efforts has the potential to enhance practical adoption and scalability. This evaluation methodology supports regulatory compliance and enhances the reliability and efficiency of mission-critical 6G deployments. While our analysis is based on a single dataset of 10,000 samples, the methodological framework itself, which contrasts PRDA laboratory conditions with LVRT realistic transformations, provides a dataset-agnostic approach applicable across diverse 6G scenarios.
The framework’s value lies not in absolute performance metrics but in revealing systematic algorithmic behavior patterns under operational stress, establishing evidence-based guidelines for deployment decisions that balance accuracy (58.0–81.2% under realistic conditions), computational efficiency (0.04 s–789.5 s training times), and interpretability consistency (0.106–1.000 scores). Future validation across heterogeneous network environments will further strengthen the generalizability of these resilience-based insights, ultimately establishing a new standard for realistic ML algorithm assessment in next-generation wireless networks.

7. Future Work

While this study establishes a robust framework for evaluating 6G network slice classification algorithms under realistic conditions, several avenues remain for further exploration. Subsequent research will concentrate on validating the findings on heterogeneous, real-world datasets spanning multiple network domains to assess the generalizability of the resilience-based insights. The investigation will encompass advanced DL architectures, including Graph Neural Networks for topology-aware modeling, Transformers for capturing long-range temporal dependencies, and hybrid CNN-LSTM models for spatio-temporal feature extraction. Online and continual learning algorithms will be developed so that models can adapt to dynamic network conditions and learn new slice types without catastrophic forgetting. To enhance robustness while preserving data privacy, federated learning frameworks are proposed for collaborative model training across multiple operators. XAI-driven insights will be used to propose minimal yet sufficient feature sets for AI-based slice management, contributing to standardization efforts for interoperable multi-vendor orchestration. Furthermore, the LVRT framework should be extended to capture rare catastrophic events and unpredictable anomalies beyond systematic operational impairments. Future work will incorporate stochastic extreme-event modeling, adversarial stress testing, and heavy-tailed distribution analysis to ensure algorithmic resilience in mission-critical slices (mURLLC, ERLLC) under low-probability, high-impact scenarios. This extension will complement the current systematic stress testing with comprehensive anomaly detection capabilities essential for ultra-reliable 6G deployments. Finally, the most resilient algorithms will be deployed on physical 6G testbeds to evaluate real-time performance, computational overhead, and latency in closed-loop network slice orchestration. By pursuing these directions, the framework presented in this work can evolve into practical, deployable, and standardized AI solutions supporting the autonomous and efficient operation of future 6G networks.

Author Contributions

S.N.K. provided the vast majority of the content for this work, taking the lead in conceptualization, methodology, software development, formal analysis, investigation, data curation, and the writing of the initial draft of this manuscript. M.G. contributed to methodology, validation, visualization, and manuscript review and editing, providing substantial improvements in clarity and presentation. D.K. assisted with visualization and contributed to the review and editing of this manuscript. S.Ç. supported the editing process and contributed feedback to refine this manuscript. M.S.O. and N.B. supervised the work and provided critical oversight and guidance throughout. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The study is based on an existing dataset, on which experimental modifications and adjustments were performed. No new datasets were generated. The underlying dataset is available from the original source, while the modified experimental data are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank The Scientific and Technological Research Council of Türkiye (TÜBİTAK) and Türk Telekom 6G R&D Lab for their support.

Conflicts of Interest

Authors Sümeye Nur Karahan, Merve Güllü, Deniz Karhan, Sedat Çimen, and Mustafa Serdar Osmanca were employed by the company Türk Telekom (Turkey). The remaining author (Necaattin Barışçı) declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

