Machine Learning-Driven Dynamic Traffic Steering in 6G: A Novel Path Selection Scheme

Hisyam Ng, Hibatul Azizi; Mahmoodi, Toktam

doi:10.3390/bdcc8120172

Open Access Article


by Hibatul Azizi Hisyam Ng * and Toktam Mahmoodi *

Department of Engineering, King’s College London, London WC2R 2LS, UK

* Authors to whom correspondence should be addressed.

Big Data Cogn. Comput. 2024, 8(12), 172; https://doi.org/10.3390/bdcc8120172

Submission received: 21 May 2024 / Revised: 19 November 2024 / Accepted: 25 November 2024 / Published: 27 November 2024


Abstract

Machine learning is taking on a significant role in realizing the new vision of 6G. 6G aspires to provide more use cases, handle highly complex tasks, and improve on the current 5G and beyond-5G infrastructure. Artificial intelligence (AI) and machine learning (ML) are prime candidates to support and deliver these aspirations. Traffic steering functions offer many opportunities to enable new use cases and improve overall performance. The emergence and advancement of non-terrestrial networks is another driving factor for creating an intelligent selection scheme that enables a dynamic traffic steering function. With their service-based architecture, 5G and 6G are data-driven architectures that use massive volumes of transactional data to enable new approaches to handling highly complex processes. Highly complex processes, massive data volumes, and short timeframes together call for a scheme based on machine learning techniques. This paper proposes a scheme that uses massive historical data to provide decisions enabling dynamic traffic steering functions, addressing the future emergence of heterogeneous transport networks and aligning with the Open Radio Access Network (O-RAN). The proposed scheme produces an inference to be programmed into telecommunication nodes, providing a novel way to enable dynamic traffic steering functions for the 6G transport network. The study identifies an appropriate data size for creating a high-performance multi-output classification model that achieves more than 90% accuracy for traffic steering functions.

Keywords:

machine learning; Open Radio Access Network; non-terrestrial network; QoS flow identifier; mixed integer linear programming; optimization solver; service function chaining; Virtual Network Function; multi-output classification; random forest algorithm

1. Introduction

Machine learning is gaining an important role in many industries. It supports regression and classification (supervised learning), clustering and dimensionality reduction (unsupervised learning), and decision-making (reinforcement learning). Each paradigm serves a distinct function: prediction, extraction of insight from unlabeled data, and responding to the current state, respectively. In the telecommunication field specifically, the adoption of machine learning focuses on lowering operating costs, creating highly automated processes, creating new revenue streams, and enabling faster provisioning processes (fulfilment and assurance) [1]. Machine learning applications are positioned to accelerate more efficient use of resources to enable sixth-generation (6G) networks [1].

This progression in the telecommunication industry has spurred the growth of many applications that carry out specific tasks in various flows and activities in 5G, and these are expected to evolve in similar ways for 6G. Traffic steering functions optimize the overall flow of traffic traversing the telecommunication infrastructure. This function extends basic load balancing to dynamic traffic management by re-routing traffic based on network conditions [2]. Ultimately, the traffic steering function computes the end-to-end path of a traffic flow to obtain the optimum route. Beyond optimizing the flow, the study in [3] adopts a traffic steering method to improve the reliability of data transmission over heterogeneous access networks. This area is further investigated in [4,5], which use different machine learning methods to predict traffic demand and avoid congestion, improving overall resource utilization.

6G aspires to support new use cases, more devices, and ubiquitous connectivity. Two essential elements distinguish the next generation of networks as 6G: diverse use cases and heterogeneous transport networks. According to the study in [6], four pillars support the 6G aspirations: (1) Enhanced Human Communication, (2) Enhanced Machine Communication, (3) Enabling Services, and (4) Network Evolution. This paper focuses on the network evolution of 6G, which aims to enable artificial intelligence (AI) and expand service ubiquity. AI and service ubiquity open new opportunities to provide additional transport network resource options for the overall service deliverables. The research also investigates how to make the most of the enormous volume of data generated in the infrastructure, which can yield valuable insight for subsequent actions that fulfil the service ubiquity aspiration by integrating a non-terrestrial network (NTN) into the terrestrial network (TN). Owing to the innate wide-area nature of satellite technology, NTN integration can provide service coverage across an entire area. Hence, extending the TN with NTN creates new resource provisioning benefits. Nonetheless, the main challenge is incorporating NTN into 5G and 6G while upholding the existing critical requirements of 5G and future 6G.

The applications of machine learning in telecommunication processes are vast and diverse. However, machine learning essentially works with large volumes of data, and handling such data requires massive computation and time. Any new machine learning process incorporated into 5G and 6G infrastructure must account for time-critical applications, because 6G demands lower latency [7]. Based on the studies in [4,5], an Open Radio Access Network (O-RAN) relies on non-real-time and near-real-time nodes to learn from the data offline and push the resulting programmed inference to the RAN components. Although machine learning offers new possibilities for decision-making, a comprehensive study is required to ensure that latency is not impaired. In recent years, several approaches have been studied to reduce latency, such as placing edge computing near the access node and applying federated learning. These examples, which are not exhaustive, share the common objective of reducing processing time. This study instead applies a different machine learning technique to expedite decisions while upholding the optimum decisions obtained from the learning process. In addition, the emergence of non-terrestrial networks co-existing with the terrestrial transport network in the end-to-end infrastructure makes it necessary to automate the path selection process for optimum resource assignment.

This study focuses on the usability of offline data generated by the series of machine learning and linear programming steps in [8], where the clustering outcome groups traffic classes that share similar attributes, and the path assignment is the outcome of the optimization solver for resource assignment. The results for each packet are stored as an extensive training dataset for a classification task that labels the transport network selected for the assigned traffic. According to the findings in [8], the overall clustering and optimization activity consumes a significant amount of time and computational resources, which makes it unsuitable for real-time operation. This motivates the use of supervised learning to classify traffic using the labeled data produced by the clustering and optimization solver processes.

This paper classifies path selection based on traffic attributes and assigns traffic to the appropriate transport network for traffic steering functions. The aim is to perform a classification that indicates the transport network selected for each traffic flow. The contributions of this study are as follows:

i. Introduced a traffic steering model that learns from operational data handled by 5G nodes in a scenario with heterogeneous transport network types.
ii. Produced a novel traffic–transport network assignment scheme based on the data generated in an area to achieve optimum resource management.
iii. Analyzed the timeslot size needed for optimum classification model performance. A timeslot of appropriate volume, containing sufficiently diverse UE traffic types, helped create a well-performing model.

The remainder of this paper is organized as follows: Section 2 reviews related work on traffic steering approaches that use machine learning for classification and prediction tasks. Section 3 discusses the framework for the proposed traffic-to-transport network assignment. Section 4 presents the findings and analysis of the simulation scenario incorporated in the framework. Finally, Section 5 presents the conclusions and future work.

2. Related Works

Machine learning is used extensively for various objectives in 5G and 6G. Machine learning improves on past communication systems that relied on mathematical models [9]. In [9], the role of machine learning in 6G networks is classified by layer: physical, medium access control (MAC), network, transport, and application. The authors then elaborate on the challenges in each domain and how various machine learning algorithms could resolve the limitations. Similarly, this study aims to resolve a resource management issue by introducing a supervised classification method.

The requirements of 6G are more complex and call for highly efficient resource management. Ref. [10] addressed the need to improve Virtual Network Function (VNF) resource management in the Service Function Chain (SFC). Unpredictable traffic volumes and static resource allocation configurations are the main contributors to the inefficient resource management faced by telecommunication service providers. In particular, differences in service demand create large variations in resource allocation. The study in [10] therefore adopts machine learning techniques to close this gap, enabling dynamic resource allocation for chaining the required VNFs by predicting resource requirements. For end-to-end VNF instances instantiated in an SFC, the approach improved VNF resource allocation compared with the conventional method. The work in [5] shares a similar goal: machine learning is embedded to aid the traffic steering decision by predicting congestion on the VNFs serving URLLC traffic.

The study in [4] addresses a similar problem of abrupt traffic demand by using machine learning to predict traffic demand and enable dynamic resource management. Specifically, [4] aims to guarantee the latency requirement of URLLC and maximize the throughput of eMBB. It leverages the Open Radio Access Network (O-RAN) alliance platform to enable two optimization strategies, one short-term and one long-term. The short-term strategy resolves congestion and optimizes RAN resources using inferences from historical data collected from the RAN, which are learned and modeled offline by the machine learning process. The long-term strategy addresses the traffic steering process, which comprises the prediction of traffic demand, the bandwidth-split distribution, and the flow-split variables. The findings of [4] demonstrate the workability of two learning processes on separate time scales, an approach adopted in this study: the data undergo a series of machine learning and optimization solver processes offline, and the learning outcome can then be inferred into the programmed RAN node for execution.

Kim et al. [2] conducted a comparative study demonstrating the improvement in traffic steering performance achieved by machine learning techniques over traditional methods. Ref. [2] emphasizes the advantages of the Mobile Edge Cloud (MEC) in 5G architecture, because MEC handles essential resource management functions such as computing, storage, and networking for last-mile connected nodes. The MEC node therefore hosts and caches enormous volumes of transactional data that are highly useful for the machine learning process. In particular, [2] applies deep learning networks to traffic steering decision-making in radio access technology (RAT), specifically for scenarios in which both Third Generation Partnership Project (3GPP) and non-3GPP radio access connect to the MEC. The learning starts by executing the traffic steering algorithms to recognize the network conditions. This study adopts the network-condition learning process of [2], with the aim of capturing the conditions of each transport network before the subsequent classification process. In addition, this study shifts the focus from RAT to the heterogeneity of transport networks, composed of the terrestrial network (TN) and the non-terrestrial network (NTN), with satellite and DOCSIS cable as the transport technology candidates envisioned for the 6G infrastructure.

Research on NTN cooperation with the TN is progressing towards a workable architecture for both. The prominent NTN technology candidate is satellite, and multiple integration types are discussed in [11]. In Release 17 (R17), 3GPP specified enhanced functions for the foundational technologies, including coverage and capacity. Specifically, [11] proposes a novel integrated satellite–terrestrial network (ISTN) architecture for different scenarios. Ref. [11] also highlights the challenge of coordinating unified technical standards for ISTN, given the uncertain timing of 6G commercialization and the reconciliation efforts required within the satellite industry. Ref. [12] surveys the challenges of ISTN, categorized by network architecture, technical performance, and optimization, together with the key technology enablers for a successful ISTN in 6G.

3. Network Architecture, Learning Framework, and Methodology

The evolution of network generations from 4G to 5G and beyond enables new technologies that address the requirements of multiple industry domains. The 6G vision continues to expand towards extensive service coverage via feasible technologies, including non-terrestrial networks such as satellite. Moreover, depending on the characteristics of a specific locality, the footprint of such technologies can be vast and reliable. Ref. [13] notes that Data Over Cable Service Interface Specification (DOCSIS) cable TV services account for 67% of total fixed broadband subscriptions. Satellite, on the other hand, is the prime candidate for “Ubiquitous Services” because of its innate capability to reach places that standard terrestrial network technologies cannot. The study therefore analyzes a scenario in which three different types of transport network co-exist in the overall network architecture, as shown in Figure 1. The mechanism enabling the traffic steering function is formulated as a sequential process of collecting raw data and performing data cleaning and transformation. Next, the packets flowing to the access node are clustered, and each computed cluster is assigned to the transport network whose characteristics are most similar to those of the packets in that cluster.

The role of the AI plane in Figure 1 is to store the data collected from the access and edge nodes, execute a series of machine learning algorithms, and create and store ML models. Following the research work in [8], parameters such as downlink (DL) and uplink (UL) data, delay, and error rate are collected from the nodes. The aggregated packets then undergo unsupervised learning, which extracts shared attributes and groups the packets into three clusters corresponding to the three transport network selection options for traffic steering. The subsequent optimization step uses mixed integer linear programming (MILP) to identify the optimum traffic–transport assignment for each cluster. Together, these processes classify each packet using the DL/UL, delay, and error rate parameters and label the traffic with its cluster and assigned transport network type. The overall process is implemented offline, and each output is stored for classification learning to produce a model that represents the characteristics of the traffic generated in a specific area. Ultimately, the classification model is envisioned to run in a node where traffic is labeled and steered to the appropriate transport network. The proposed concept adopts an approach similar to assigning a bearer using the QoS Flow Identifier (QFI). The end-to-end algorithm draws on the study in [4]. The overall process is illustrated in Figure 2, with high-level information on the inputs and outputs of each activity.
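To make the node-side role concrete, the following is a minimal sketch, under stated assumptions, of how a programmed node might apply an offline-trained multi-output classifier to a flow's DL, UL, delay, and error-rate attributes and attach a (cluster, transport) tag, loosely analogous to QFI marking. `FlowSample`, `SteeringTag`, `tag_flow`, the `TRANSPORT_TYPES` encoding, and the rule inside `toy_predict` are all hypothetical and not taken from the study; any model with a scikit-learn-style `predict` that returns one two-label row per input could be passed in instead.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical index -> transport-type mapping; the study's own encoding is not given.
TRANSPORT_TYPES = {0: "TN", 1: "NTN-satellite", 2: "DOCSIS-cable"}


@dataclass(frozen=True)
class FlowSample:
    """Attributes a node observes for one flow (field names and units are assumptions)."""
    dl_mbps: float
    ul_mbps: float
    delay_ms: float
    error_rate: float

    def features(self) -> List[float]:
        return [self.dl_mbps, self.ul_mbps, self.delay_ms, self.error_rate]


@dataclass(frozen=True)
class SteeringTag:
    """Label attached to a flow, loosely analogous to marking it with a QFI."""
    cluster: int
    transport: str


# Shape of a scikit-learn-style predict(): rows in, one [cluster, transport] row out per input.
Predictor = Callable[[Sequence[Sequence[float]]], Sequence[Sequence[int]]]


def tag_flow(flow: FlowSample, predict: Predictor) -> SteeringTag:
    """Label one flow with the (cluster, transport) pair predicted by an offline-trained model."""
    cluster, transport_idx = predict([flow.features()])[0]
    return SteeringTag(cluster=int(cluster), transport=TRANSPORT_TYPES[int(transport_idx)])


def toy_predict(rows):
    """Hypothetical stand-in for a trained model: low-delay flows -> (0, TN), others -> (1, satellite)."""
    return [[0, 0] if row[2] < 10.0 else [1, 1] for row in rows]


print(tag_flow(FlowSample(120.0, 20.0, 5.0, 1e-5), toy_predict))
# SteeringTag(cluster=0, transport='TN')
```

Keeping the node logic to a single `predict` call mirrors the split described above: the expensive clustering and MILP work stays offline, and the node only performs inference.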

This study explores classification using the data from the earlier work in [8], represented by Steps 1 to 5 below. The scope of supervised learning therefore begins at Step 6.

Step 1: Data collection: raw data generated by the UEs' traffic and the transport networks.
Step 2: Data transformation: cleaning the data and transforming it from multivariate to two-dimensional form using an unsupervised learning technique known as dimensionality reduction.
Step 3: Extraction of information:
- The UEs' generated traffic undergoes clustering with an unsupervised learning technique to form a defined number of clusters, with attribute information for each cluster.
- Attribute information is extracted for every type of transport network.
Step 4: Preparation of the clusters in a transport matrix format for the subsequent matching process, which runs matching algorithms to capture the matching value of every cluster–transport pair based on the cluster and transport attributes.
Step 5: Execution of the optimization solver, whose objective function is to maximize the matching values between clusters and transport attributes. The outcome is a decision on the best traffic–transport assignment.
Step 6: Storage of the pertinent data produced by the prior activities.
Step 7: Execution of a supervised learning process (training and testing) on the historical data to create a classification model. The vision is for this model to provide an inference that can be programmed into the nodes for traffic steering decisions.
Step 8: The final output comprises the hyperparameters and the label results from the clustering and optimization solver activities.
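Steps 2 and 3 can be sketched as follows, assuming NumPy and four numeric features per record (DL, UL, delay, error rate). The two-dimensional projection is ordinary PCA computed with an SVD, and clustering uses plain k-means with k = 3 as a simple stand-in; the study describes a density-based clustering criterion, so this is not its exact algorithm. The function names, the synthetic centres, and the noise level are illustrative assumptions.

```python
import numpy as np


def reduce_to_2d(X: np.ndarray) -> np.ndarray:
    """Step 2 (sketch): standardise each column, then project onto the first two principal components."""
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # avoid dividing by zero for constant columns
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:2].T


def kmeans(P: np.ndarray, k: int = 3, iters: int = 100, seed: int = 0):
    """Step 3 (sketch): Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(P[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new = np.array([P[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Synthetic records: three hypothetical traffic profiles (DL Mbps, UL Mbps, delay ms, error rate).
rng = np.random.default_rng(1)
centers = np.array([[100.0, 20.0, 10.0, 1e-3], [5.0, 1.0, 1.0, 1e-5], [0.1, 0.1, 100.0, 1e-2]])
X = np.vstack([rng.normal(c, 0.05 * c, size=(50, 4)) for c in centers])

P = reduce_to_2d(X)
labels, centroids = kmeans(P, k=3)
print(P.shape, np.bincount(labels, minlength=3))
```

k-means is used here only because it is short and reproducible with a fixed seed; swapping in a density-based method with three components would leave the rest of the pipeline unchanged.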

Machine learning algorithms rely heavily on data volume. 5G and beyond are data-driven architectures that use multiple data sources from different network functions and domains for automation, optimization, and improvement, supporting critical requirements, specifically in 6G [9]. In the storage activity (Step 6) of the workflow, the UEs' generated data are used, and attributes such as downlink, uplink, delay, and error rate values are captured and stored for every instance. Table 1 lists the parameters involved in the study. Three different datasets are captured for each defined duration in three different timeslots, so that the classification models can be evaluated independently of any single dataset.

Two new columns are added based on the outputs of Step 3 and Step 5 in Figure 2, representing the UE cluster group and the assigned transport type. Step 3 takes raw data as input, transforms the data into two-dimensional form, and executes a clustering technique that identifies hidden patterns based on the density of the points plotted on the graph. The data are grouped into three clusters, matching the total number of transport types. Subsequently, a matching score is computed for every possible pair of cluster, c_n, and transport network type, t_n. A high traffic–transport matching score represents high similarity, and the optimization solver uses these scores to select the best matching cluster–transport pairs for the traffic–transport assignment. Thus, the target variables are v ∈ {c_n, t_n}, and V is the predicted value of v for the input of each user u ∈ I, where I is the set of user inputs. Table 2 lists the activities of the prior processes (Steps 1 to 5); this study reduces them to a minimum number of processes, shortening the handling time.
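The cluster–transport matching and assignment of Steps 4 and 5 can be illustrated with a small sketch. Here the matching score is a hypothetical similarity, 1/(1 + Euclidean distance), between normalised cluster and transport attribute profiles, and the profile values are invented for illustration. Instead of the MILP solver used in the study, the sketch enumerates every one-to-one assignment; for three clusters and three transport types this yields the same optimum that a solver maximising the total matching score over one-to-one assignments would find.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def matching_score(cluster: Sequence[float], transport: Sequence[float]) -> float:
    """Hypothetical similarity between two normalised profiles: 1 / (1 + Euclidean distance)."""
    dist = sum((a - b) ** 2 for a, b in zip(cluster, transport)) ** 0.5
    return 1.0 / (1.0 + dist)


def best_assignment(score: List[List[float]]) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively search one-to-one assignments; perm[c] is the transport index for cluster c."""
    n = len(score)
    best, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(score[c][perm[c]] for c in range(n))
        if total > best_total:
            best, best_total = perm, total
    return best, best_total


# Illustrative normalised profiles (DL, UL, delay, error rate); not values from the study.
clusters = [[0.9, 0.8, 0.2, 0.1], [0.1, 0.1, 0.9, 0.8], [0.4, 0.3, 0.5, 0.3]]
transports = {"TN": [0.9, 0.9, 0.1, 0.1], "NTN-satellite": [0.3, 0.2, 0.9, 0.7],
              "DOCSIS-cable": [0.5, 0.4, 0.4, 0.3]}

names = list(transports)
score = [[matching_score(c, transports[t]) for t in names] for c in clusters]
assignment, total = best_assignment(score)
print({f"cluster {c}": names[t] for c, t in enumerate(assignment)})
# {'cluster 0': 'TN', 'cluster 1': 'NTN-satellite', 'cluster 2': 'DOCSIS-cable'}
```

Enumeration costs n! evaluations, so it only suits tiny instances like this one; larger one-to-one problems are normally handed to an assignment or MILP solver. Unlike a greedy pick, it never locks in a locally best pair that blocks a better overall assignment.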

A Python-based program simulated the UEs and the traffic they generated. It generated various types of users and traffic, scattered across and served by cells converging on an access node, as in Figure 1. Within an area, the simulator emulated (1) static and mobile users and (2) the classes of traffic produced by each user. The traffic generated by the UEs followed the attributes of eMBB, URLLC, and mMTC, characterized by DL and UL data size, packet error rate, and packet delay budget.
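A toy version of such a simulator is sketched below, assuming each UE carries one traffic class drawn uniformly from eMBB, URLLC, and mMTC, plus a static/mobile flag. The attribute ranges in `PROFILES` are placeholders chosen only to make the classes distinguishable; they are not the parameters of the study's simulator (Table 1), and `UETraffic` and `simulate_ues` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List

# Placeholder (low, high) ranges per traffic class, in the order
# DL Mbps, UL Mbps, packet error rate, packet delay budget (ms).
# Illustrative only: NOT the study's simulator parameters.
PROFILES = {
    "eMBB": ((50.0, 500.0), (5.0, 50.0), (1e-3, 1e-2), (50.0, 300.0)),
    "URLLC": ((0.1, 10.0), (0.1, 10.0), (1e-6, 1e-5), (1.0, 10.0)),
    "mMTC": ((0.001, 0.1), (0.001, 0.1), (1e-3, 1e-2), (100.0, 1000.0)),
}


@dataclass
class UETraffic:
    ue_id: int
    mobile: bool          # static (False) or mobile (True) user
    category: str         # "eMBB", "URLLC" or "mMTC"
    dl_mbps: float
    ul_mbps: float
    per: float            # packet error rate
    pdb_ms: float         # packet delay budget


def simulate_ues(n_ues: int, p_mobile: float = 0.5, seed: int = 0) -> List[UETraffic]:
    """Draw one traffic record per UE: a random class, then uniform samples within that class's ranges."""
    rng = random.Random(seed)
    records = []
    for ue_id in range(n_ues):
        category = rng.choice(list(PROFILES))
        dl, ul, per, pdb = (rng.uniform(lo, hi) for lo, hi in PROFILES[category])
        records.append(UETraffic(ue_id, rng.random() < p_mobile, category, dl, ul, per, pdb))
    return records


ues = simulate_ues(5000, seed=42)
print(sum(u.category == "URLLC" for u in ues), "URLLC UEs;", sum(u.mobile for u in ues), "mobile")
```

Sampling within class-specific ranges keeps the three classes separable, which is what the downstream clustering relies on; cell placement and mobility patterns are deliberately left out of this sketch.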

In supervised learning, models are created by training and testing on data. This study, however, created models by splitting the data according to the ratios in Table 3, in order to explore the granularity of the data size. The volume of each dataset was determined by the duration of the recorded data and the total number of UEs in each instance. As Figure 3 shows, the volume of each instance is large because of the high number of UEs (more than 5000) with diverse attribute values. The product of the total number of UEs and the total number of instances in each duration listed in Table 3 therefore determines the weight of each dataset from this perspective. Models denoted M_ij differ in the volume of data used to build them. Subsequently, the series of M_i4 models was applied to the different datasets.
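Building models M_ij from growing shares of one dataset can be sketched as below. The four ratios are an assumption standing in for Table 3, chosen so that j = 4 uses the whole dataset, consistent with M_i4 being built from the maximum volume of data; taking a time-ordered prefix rather than a random sample is also an assumption. The record count per subset is the number of instances multiplied by the number of UEs, as described above.

```python
from typing import Dict, List, Sequence

# Assumed split ratios standing in for Table 3; j = 4 uses the whole dataset (M_i4).
RATIOS = (0.25, 0.50, 0.75, 1.00)


def nested_subsets(instances: Sequence, ratios: Sequence[float] = RATIOS) -> Dict[int, List]:
    """Return {j: leading ratios[j-1] share of the time-ordered instances} for j = 1..len(ratios)."""
    n = len(instances)
    return {j: list(instances[: round(r * n)]) for j, r in enumerate(ratios, start=1)}


def record_count(n_instances: int, n_ues: int) -> int:
    """Training rows contributed by a subset: one record per UE per instance."""
    return n_instances * n_ues


subsets = nested_subsets(list(range(60)))      # e.g. a 60-instance dataset
for j, subset in subsets.items():
    # 5000 UEs is an assumed round figure; the study reports more than 5000.
    print(f"M_i{j}: {len(subset)} instances -> {record_count(len(subset), 5000)} records")
```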

Next, the selected model M_i4 from Table 4 was used to classify the cluster number and the assigned transport network. Each M_i4 is trained on the maximum volume of data generated in a set, and the resulting model is used as a classifier for the other datasets. The classification performance is validated by comparing the multi-output classification model results against the actual outputs of the clustering process (Step 3) and the traffic–transport assignment (Step 5).
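A multi-output classifier of the kind described can be sketched with scikit-learn's `RandomForestClassifier` (random forest is listed among the paper's keywords), which accepts a two-column target array and predicts both labels at once. The synthetic data, the labelling rule in `make_set`, the `CLUSTER_TO_TRANSPORT` mapping, and the hyperparameters are assumptions for illustration only; one synthetic set plays the role of the data behind an M_i4 model and a second plays a different dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical fixed mapping cluster -> transport index, used only to build synthetic labels.
CLUSTER_TO_TRANSPORT = np.array([2, 0, 1])


def make_set(n: int, seed: int):
    """Synthetic stand-in for one dataset: 4 normalised features and 2 targets."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, 4))                                # DL, UL, delay, error rate
    cluster = (X[:, 2] > 0.5).astype(int) + (X[:, 0] > 0.7).astype(int)   # values 0..2
    transport = CLUSTER_TO_TRANSPORT[cluster]
    return X, np.column_stack([cluster, transport])


X_train, Y_train = make_set(3000, seed=1)   # plays the role of the data behind an M_i4 model
X_test, Y_test = make_set(1000, seed=2)     # plays the role of a different dataset

# A 2-D target makes the forest fit and predict both outputs jointly.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, Y_train)
Y_pred = model.predict(X_test)              # shape (n_samples, 2)

# Validation as described: compare predictions with the "actual" labels, per target.
mismatch_pct = 100.0 * (Y_pred != Y_test).mean(axis=0)
print(dict(zip(["cluster", "transport"], mismatch_pct.round(2))))
```

Because the synthetic transport label is a fixed function of the cluster, both targets are learned equally well here; in the study the two targets come from separate offline processes, which is one reason their error rates can differ.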

4. Findings and Discussions

The first output of the supervised learning process (Table 5) concerns the feasibility of model creation based on the volume and diversity of the UEs' attribute values. Partial instances (based on the splitting percentage) from the entire dataset were used to train and create the classification model. This stage showed that larger datasets, in terms of both duration and number of UEs, produced more accurate models. The accuracy of the multi-output classification model was determined by the percentage of classification outputs that did not match the actual values from the offline clustering and MILP-based resource assignment processes. Table 5 shows the pattern in the results of the supervised classification model: the error percentage decreased as the training dataset grew.
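The error measure described here can be written as E_k = 100 × (number of records whose predicted label for target k differs from the offline label) / N, with accuracy 100 − E_k, computed separately for the cluster-number and assigned-transport targets. A minimal, dependency-free sketch follows; the function name and the toy labels are illustrative.

```python
from typing import List, Sequence


def error_percentages(predicted: Sequence[Sequence[int]], actual: Sequence[Sequence[int]]) -> List[float]:
    """Per-target share (%) of classifier outputs that differ from the offline (clustering/MILP) labels."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equally long")
    n_targets = len(actual[0])
    wrong = [0] * n_targets
    for p_row, a_row in zip(predicted, actual):
        for k in range(n_targets):
            wrong[k] += int(p_row[k] != a_row[k])
    return [100.0 * w / len(actual) for w in wrong]


# Toy labels as (cluster number, assigned transport) pairs.
actual = [(0, 1), (1, 2), (2, 0), (0, 1)]
predicted = [(0, 1), (1, 0), (2, 0), (0, 2)]
errors = error_percentages(predicted, actual)
print(errors, [100.0 - e for e in errors])
# [0.0, 50.0] [100.0, 50.0]
```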

In both downlink and uplink streams, the model performs better as the volume of data increases. The study then assesses how well each created model classifies the target variables on the other datasets. As Table 6 shows, the model's performance varies across datasets. The classification model developed from Set 1.1, which consists of only 60 instances, produces a significant number of errors (classifier outputs that do not match the actual values); for the second classification target, the assigned transport network, more than 70% of outputs are unmatched. Likewise, the classification model developed from the large Set 10.1, consisting of 600 instances, produces close to 60% unmatched output.

The second model, developed with 300 instances, sits between Set 1.1 and Set 10.1 in data volume and shows good classification output. However, its error percentages for the second classification target (assigned transport network) are higher than for the first (cluster number). Overall, the model developed from Set 5.1 classifies the “cluster number” with full accuracy for both downlink and uplink streams, whilst its classification of the “assigned transport network” has average errors of 7.37% and 6.55%, respectively. In other words, out of 100 traffic–transport assignment decisions, about 7.4 downlink and 6.6 uplink traffic flows (up to eight and seven, rounding up) would be wrongly assigned to a non-optimum transport network. The uplink results show low variation in error across all models developed and tested on the respective datasets. Averaged over all datasets, the model developed using Set 10.1 achieves more than 95% classification accuracy. In the table, the cells highlighted in red show each model's performance on its own training dataset, which is why they show no error.

Each task in the overall process of Figure 2 is small, but each plays a significant role in producing a feasible classification model. The data collection process begins with extensive cleansing and transformation of raw data for subsequent machine learning use. The UEs' data have multiple attributes and are therefore high-dimensional, so a dimensionality reduction algorithm from unsupervised learning is required before the clustering process. Running this series of algorithms on a massive volume of data and then applying MILP requires significant handling time. The prior processes are therefore employed offline, and their outputs are stored as an individual profile of a node in the area. A supervised classification model then learns from this offline data, reducing handling time by providing near-instantaneous traffic labelling for the steering functions.

Compared with the previous study in [5], the processes defined in Figure 2 produce a baseline set of values representing the time taken to collect, transform, and cluster the data and to perform the resource assignment (Steps 2 to 5), as shown in Table 2; the clustering activities account for most of the time needed to execute the overall processes. Figure 4 illustrates the share of time required to cluster data with the parameters specified in Table 1.

5. Conclusions and Future Works

Using user traffic data with unsupervised and supervised learning techniques facilitates the development of feasible classification functions that achieve the optimum traffic–transport assignment. A network node programmed with the classification model would label traffic according to the classification output, which determines the transport type selected for that traffic. The study thus helps automate path selection for the pertinent traffic, in line with the 6G aspiration of extensive TN and NTN coverage for optimal “service ubiquity”. Highly efficient resource management could therefore be realized to meet the higher performance requirements of 6G, with low classification error (more than 90% classification accuracy).

The study also opens an avenue for deploying classification models built on an appropriate timeslot, which could determine how frequently inference models are provisioned to the pertinent nodes in an area. In addition, the study could lead to a new approach in which each area's traffic requirements are handled by specific models created dynamically from local demand (traffic diversity) and capacity (bandwidth resources and the heterogeneous transport networks provisioned in the area).

Future studies should extend this work to determine the end-to-end flow of deploying timeslot-based models to the programmed nodes using the O-RAN platform. This requires repeating the analysis on the models for the respective timeslots to assess classification performance as a time series and establish how the overall set of models should be organized. Another direction is to study model compatibility: producing and assigning a highly efficient model for one area, and identifying other possible uses of the models in other areas. Finally, the organizational flow of this study could be incorporated into a federated learning framework, and its workability and potential improvements in telecommunication infrastructure could be investigated on that basis.

Author Contributions

Conceptualization, H.A.H.N. and T.M.; methodology, H.A.H.N.; validation, H.A.H.N. and T.M.; formal analysis, H.A.H.N.; investigation, H.A.H.N.; resources, H.A.H.N.; data curation, H.A.H.N.; writing—original draft preparation, H.A.H.N.; writing—review and editing, H.A.H.N.; visualization, T.M.; supervision, T.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data of this research will be provided to interested individuals upon request to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest, and the funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Patil, A.; Iyer, S.; Pandya, R.J. A Survey of Machine Learning Algorithms for 6G Wireless Networks. arXiv 2022, arXiv:2203.08429.
2. Kim, D.-Y.; Kim, S. Network-Aided Intelligent Traffic Steering in 5G Mobile Networks. Comput. Mater. Contin. 2020, 65, 243–261.
3. Choi, Y.; Kim, J.H. Reliable data transmission in 5G Network using Access Traffic Steering method. In Proceedings of the 2020 IEEE International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 21–23 October 2020; pp. 1034–1038.
4. Kavehmadavani, F.; Nguyen, V.-D.; Vu, T.X.; Chatzinotas, S. Intelligent Traffic Steering in Beyond 5G Open RAN Based on LSTM Traffic Prediction. IEEE Trans. Wirel. Commun. 2023, 22, 7727–7742.
5. Tamim, I.; Aleyadeh, S.; Shami, A. Intelligent O-RAN Traffic Steering for URLLC Through Deep Reinforcement Learning. arXiv 2023. Available online: http://arxiv.org/abs/2303.01960 (accessed on 8 April 2024).
6. Erfanian, J.; Lister, D.; Zhao, Q.; Wikström, G.; Chen, Y. 6G Vision & Analysis of Potential Use Cases. IEEE Commun. Mag. 2023, 61, 12–14.
7. Salameh, A.I.; El Tarhuni, M. From 5G to 6G—Challenges, Technologies, and Applications. Future Internet 2022, 14, 117.
8. Ng, H.A.H.; Mahmoodi, T. Intelligent Traffic Engineering for 6G Heterogeneous Transport Networks. Computers 2024, 13, 74.
9. Ali, S.; Saad, W.; Rajatheva, N.; Chang, K.; Steinbach, D.; Sliwa, B.; Wietfeld, C.; Mei, K.; Shiri, H.; Zepernick, H.J.; et al. 6G White Paper on Machine Learning in Wireless Communication Networks. arXiv 2020. Available online: http://arxiv.org/abs/2004.13875 (accessed on 8 April 2024).
10. Basu, D.; Kal, S.; Ghosh, U.; Datta, R. SoftChain: Dynamic Resource Management and SFC Provisioning for 5G using Machine Learning. In Proceedings of the 2022 IEEE Globecom Workshops (GC Wkshps), Rio de Janeiro, Brazil, 4–8 December 2022; pp. 280–285.
11. Qi, W.; Wang, H.; Xia, X.; Mei, C.; Liu, Y.; Xing, Y. Research on Novel Type of Non Terrestrial Network Architecture for 6G. In Proceedings of the 2023 IEEE International Wireless Communications and Mobile Computing (IWCMC), Marrakesh, Morocco, 19–23 June 2023; pp. 1281–1285.
12. Tirmizi, S.B.R.; Chen, Y.; Lakshminarayana, S.; Feng, W.; Khuwaja, A.A. Hybrid Satellite–Terrestrial Networks toward 6G: Key Technologies and Open Issues. Sensors 2022, 22, 8544.
13. Schnitzer, J.; Prahladan, P.; Rahimzadeh, P.; Humble, C.; Lee, J.; Lee, J.; Lee, K.; Ha, S. Toward Programmable DOCSIS 4.0 Networks: Adaptive Modulation in OFDM Channels. IEEE Trans. Netw. Serv. Manag. 2021, 18, 441–455.
14. ETSI. 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects. Service Aspects; Service Capabilities (Release 16); ETSI TS 122 261 V16.14.0. 2023. Available online: https://www.etsi.org/deliver/etsi_ts/122200_122299/122261/16.14.00_60/ts_122261v161400p.pdf (accessed on 12 September 2024).
15. Köksal, B.; Schmidt, R.; Vasilakos, X.; Nikaien, N. CRAWDAD eurecom/elasticmon5G2019. IEEE Dataport. 2022.

Figure 1. Heterogeneous transport network in 5G network architecture.

Figure 2. The high-level organization flow.

Figure 3. The summary of the total number of UEs generated from the simulation.

Figure 4. The distribution of time taken to execute both methods.

Table 1. The parameters and attributes of the traffic and transport networks.

No	Parameters	Remarks
1	Total No. of UEs	Total UEs: average 5379 UEs; 1st instance = 5443; 60th instance = 5250
2	Traffic Classification	Based on a randomly generated percentage applied to the total number of UEs (sampled from the 1st instance). Normal UEs: 2352; UEs classified in specific 5G use cases: 3091, comprising i. eMBB UEs: 1241; ii. URLLC UEs: 250; iii. mMTC UEs: 1600.
3	Types of Transport Network	Optical Fiber Network; Satellite Network; DOCSIS Coaxial Cable TV
4	Size of Datasets (based on duration)	Three different datasets for 60, 300, and 600 instances, respectively.
5	Machine Learning	Supervised learning: multi-output classification using the Random Forest algorithm. i. Dataset I, with duration t and users u, where I ∈ ℝ^(t×u). ii. User u with features f_m, where m ∈ {DL/UL, pdb, per}. iii. Input: u ∈ I. iv. Output: multiple target variables, v, namely the cluster number, cn, and the transport network, tn. DL: Downlink; UL: Uplink; pdb: Packet Delay Budget; per: Packet Error Rate.
6	Machine Learning Algorithm	1. Create and validate machine learning models from various sample sizes. Step 1: Retrieve the data from storage. Step 2: Create the model, M_ij, for each dataset i using a splitting percentage j (1%, 10%, 50%, and 100%). Step 3: Obtain the predicted values, v, using model M_ij. Step 4: Compare the false-prediction results. Step 5: Compile the results.
6	Machine Learning Algorithm	2. Validate machine learning models across various datasets. Step 1: Apply the multi-output classification model, M_i4, to all datasets, i. Step 2: Capture the error results. Step 3: Compare the false predictions with the output of the actual cluster-assignment process (Steps 4, 5, and 6 in Figure 2). Step 4: Compile the results.
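
The model-creation procedure in rows 5 and 6 can be sketched with scikit-learn, whose `RandomForestClassifier` accepts a two-column target and therefore fits one forest for both outputs (cn and tn). This is a minimal sketch under stated assumptions, not the authors' implementation: the toy dataset, its value ranges, and the rules that produce the cluster and transport labels are hypothetical stand-ins for the simulator output and the clustering/assignment pipeline, and `make_toy_dataset`, `train_on_fraction`, and `error_percentages` are illustrative names.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def make_toy_dataset(n_samples, seed=0):
    """Hypothetical stand-in for dataset I: features f_m = (DL/UL, pdb, per)
    and two targets v = (cn, tn). Ranges and labelling rules are assumptions."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.uniform(0.0, 1000.0, n_samples),  # DL or UL throughput (assumed Mbps)
        rng.uniform(1.0, 300.0, n_samples),   # pdb, packet delay budget (assumed ms)
        rng.uniform(1e-6, 1e-2, n_samples),   # per, packet error rate
    ])
    cn = np.digitize(X[:, 1], [20.0, 150.0])  # toy cluster number: 0, 1 or 2
    tn = (cn + (X[:, 0] > 500.0)) % 3         # toy transport network: 0, 1 or 2
    return X, np.column_stack([cn, tn])


def train_on_fraction(X, Y, fraction, seed=0):
    """Fit one multi-output model M_ij on a random fraction j of dataset i."""
    rng = np.random.default_rng(seed)
    n = max(1, int(round(fraction * len(X))))
    idx = rng.choice(len(X), size=n, replace=False)
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X[idx], Y[idx])  # two target columns -> multi-output classification
    return model


def error_percentages(model, X, Y):
    """Percentage of predictions that differ from the actual labels, per target."""
    predicted = model.predict(X)  # shape (n_samples, 2): predicted (cn, tn)
    return 100.0 * np.mean(predicted != Y, axis=0)


if __name__ == "__main__":
    X, Y = make_toy_dataset(5000)
    for fraction in (0.01, 0.10, 0.50, 1.00):
        model = train_on_fraction(X, Y, fraction)
        cluster_err, transport_err = error_percentages(model, X, Y)
        print(f"j = {fraction:.0%}: cluster {cluster_err:.2f}%, "
              f"transport {transport_err:.2f}%")
```

The sketch scores every model on the full toy dataset; for j = 100% that is a training-set error, which is why a cross-dataset check of the kind described in row 6, part 2, is also informative.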

Table 2. The detailed activities in prior works.

No	Activity	Output	Source/Process	Remarks
1	Data Simulation [14,15]	List of parameters: i. Unique ID; ii. Downlink, DL; iii. Uplink, UL; iv. Delay, pdb; v. Error rate, per	Simulation	Python-based simulator generating users. Input: generate the UE and traffic dataset, I. Output: parameter values for clustering. Step 1: Define the UE types, i.e., user u_p, where p denotes the traffic classification type. Step 2: Define the UE mobility patterns (dynamic and static users). Step 3: Generate the UEs and their traffic. Step 4: Store the UE information.
2	Data Collection and Transformation	t-SNE two-dimensional data format	Unsupervised learning: Dimensionality Reduction	User u with features f_m, where m ∈ {DL/UL, pdb, per}, transformed to u_t-SNE = [x_i].
3	Clustering	Three clusters consisting of UE packets	Unsupervised learning: HDBSCAN clustering	Input: UE and traffic dataset, I. Output: UEs in three clusters, C_n. Step 1: Measure the distances between points, d_dist = ‖x_core − x_i‖. Step 2: Define the HDBSCAN core points, minimum samples, and minimum cluster size. Step 3: Visualize the clusters. Step 4: Compute the attributes of each cluster.
4	Matching Process		Finding the matching score for every possible pair of clusters and transport networks.	Input: UEs in clustering format and transport network attributes. Output: The matching values between clusters and transport networks. Step 1: Compute the attributes of each transport network, T_n: throughput capacity, V_T^(DL/UL); round-trip time, β_T; and packet error rate, ε_T. Step 2: Compute the cosine similarity, $\cos(\theta) = \frac{C_n \cdot T_n}{\|C_n\| \, \|T_n\|}$.
5	Resource Assignment	Assigned transport network	Solve a MILP to obtain the optimal assignment of clusters to transport networks.	Input: Cosine similarity values between clusters and transport networks. Output: Assignment of transport networks. Step 1: Define the objective function: maximize the matching score between cluster and transport network attributes. Step 2: Define the constraints: each cluster is assigned to a single transport network.
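
Step 1 of the clustering row measures distances from core points. The sketch below shows, under the usual HDBSCAN definitions, how core distances and the mutual-reachability distance d_mreach(a, b) = max(core(a), core(b), ‖x_a − x_b‖) follow from pairwise Euclidean distances on the two-dimensional t-SNE points. It is not the authors' code and covers only this distance step, not the minimum spanning tree, condensed tree, or cluster extraction; a complete implementation is available as `sklearn.cluster.HDBSCAN` (scikit-learn 1.3 or later). The function names and toy points are illustrative assumptions, and whether a point counts as its own first neighbour is a convention that differs between implementations.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance ||x_a - x_b|| between every pair of 2-D t-SNE points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def core_distances(dist, min_samples):
    """Distance from each point to its min_samples-th nearest neighbour.
    The point itself (distance 0) is counted as the first neighbour here;
    implementations differ on this convention."""
    return np.sort(dist, axis=1)[:, min_samples - 1]


def mutual_reachability(dist, core):
    """d_mreach(a, b) = max(core(a), core(b), d(a, b)) for every pair."""
    return np.maximum(dist, np.maximum(core[:, None], core[None, :]))


if __name__ == "__main__":
    # Three nearby points and one outlier in a hypothetical t-SNE embedding.
    pts = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]])
    dist = pairwise_distances(pts)
    core = core_distances(dist, min_samples=2)
    print(core)                             # the outlier has a much larger core distance
    print(mutual_reachability(dist, core))  # pairwise mutual-reachability distances
```

Raising `min_samples` inflates the core distance of points in sparse regions, which pushes them apart in the mutual-reachability metric and makes them more likely to be labelled as noise.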

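The matching and resource-assignment rows can be illustrated with a small, dependency-free sketch. It computes the cosine similarity defined in row 4 for every cluster/transport pair and then searches all 0/1 assignments in which each cluster receives exactly one transport network, keeping the one with the largest total score. Exhaustive enumeration is swapped in here for a MILP solver and is only practical for a handful of clusters and networks; the optional `one_to_one` flag, which additionally forbids two clusters from sharing a network, is an assumption about a constraint the table does not state. The attribute vectors are hypothetical and assumed to be normalized so that no single attribute dominates the cosine.

```python
import math
from itertools import permutations, product


def cosine_similarity(c, t):
    """cos(theta) = (C_n . T_n) / (||C_n|| ||T_n||) for two attribute vectors."""
    dot = sum(a * b for a, b in zip(c, t))
    norm = math.sqrt(sum(a * a for a in c)) * math.sqrt(sum(b * b for b in t))
    return dot / norm if norm else 0.0


def best_assignment(clusters, transports, one_to_one=False):
    """Return (assignment, total_score), where assignment[n] is the index of the
    transport network given to cluster n. Brute force stands in for the MILP."""
    n_c, n_t = len(clusters), len(transports)
    if one_to_one and n_c > n_t:
        raise ValueError("one-to-one assignment needs at least as many networks as clusters")
    score = [[cosine_similarity(c, t) for t in transports] for c in clusters]

    def total(assignment):
        return sum(score[n][assignment[n]] for n in range(n_c))

    # Every cluster gets exactly one network; optionally, no network is reused.
    candidates = (permutations(range(n_t), n_c) if one_to_one
                  else product(range(n_t), repeat=n_c))
    best = max(candidates, key=total)
    return list(best), total(best)


if __name__ == "__main__":
    # Hypothetical normalized attributes (throughput, round-trip time, error rate).
    clusters = [(0.9, 0.2, 0.1), (0.2, 0.9, 0.7), (0.5, 0.5, 0.4)]
    transports = [(1.0, 0.1, 0.1), (0.3, 1.0, 0.9), (0.6, 0.4, 0.5)]
    print(best_assignment(clusters, transports))
    print(best_assignment(clusters, transports, one_to_one=True))
```

Without the hypothetical `one_to_one` flag the problem separates, so each cluster simply takes its highest-scoring network; an optimizer only becomes necessary once coupling constraints between clusters are added.
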
Table 3. The creation of models based on the splitting ratio.

No	Dataset, i (DL & UL)	Duration	Average No. of UEs *	Split Ratio, j = 1%	Split Ratio, j = 10%	Split Ratio, j = 50%	Split Ratio, j = 100%
1	Set 1.1	60 instances	5379	M₁₁	M₁₂	M₁₃	M₁₄
2	Set 5.1	300 instances	4183	M₂₁	M₂₂	M₂₃	M₂₄
3	Set 10.1	600 instances	2339	M₃₁	M₃₂	M₃₃	M₃₄

* Average number of UEs per dataset.

Table 4. The application of ML models on every dataset.

No	Dataset, i	ML Model, M₁₄	ML Model, M₂₄	ML Model, M₃₄	Actual Cluster Number, cn	Actual Assigned Transport, tn	ML Classifier Cluster Number, cn	ML Classifier Assigned Transport, tn
1	Set 1.1
2	Set 1.2
3	Set 1.3
4	Set 5.1
5	Set 5.2
6	Set 5.3
7	Set 10.1
8	Set 10.2
9	Set 10.3

Table 5. The performance of the model created based on the volume of UEs in the dataset.

Dataset, i			Set 1.1		Set 5.1		Set 10.1
No	Split Ratio, j	Model	Cluster	Transport	Cluster	Transport	Cluster	Transport
Downlink
1	1%	M_i₁	0.00%	33.56%	8.19%	35.45%	0.00%	25.48%
2	10%	M_i₂	0.00%	22.12%	0.21%	1.72%	0.00%	3.33%
3	50%	M_i₃	0.00%	9.18%	0.08%	0.17%	0.00%	0.02%
4	100%	M_i₄	0.00%	0.00%	0.00%	0.00%	0.00%	0.00%
Uplink
5	1%	M_i₁	30.58%	50.53%	1.39%	73.62%	44.90%	73.44%
6	10%	M_i₂	45.94%	41.78%	0.00%	2.13%	0.00%	0.50%
7	50%	M_i₃	0.00%	4.63%	0.00%	0.00%	0.00%	0.21%
8	100%	M_i₄	0.00%	0.00%	0.00%	0.00%	0.00%	0.00%

% value: the percentage of errors, i.e., classification outputs that do not match the actual values.
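
Read as a per-target error rate, each percentage can be written as below. This formalization is an assumed reading of the footnote rather than a formula stated by the authors: N is the number of UE samples scored, v_{n,k} is the actual label of sample n for target k (the cluster number cn or the assigned transport tn), and \hat{v}_{n,k} is the label predicted by model M_ij.

```latex
E_k = \frac{100\%}{N} \sum_{n=1}^{N} \mathbb{1}\!\left[\hat{v}_{n,k} \neq v_{n,k}\right],
\qquad k \in \{cn,\ tn\}
```

For a hypothetical case of 250 mismatched transport predictions among 5000 scored samples, E_tn = 100% × 250/5000 = 5.00%.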

Table 6. The performance of the classification model on the datasets, i.

		Set 1.1: Model, M₁₄		Set 5.1: Model, M₂₄		Set 10.1: Model, M₃₄
No	Dataset, i	Cluster	Transport	Cluster	Transport	Cluster	Transport
Downlink
1	Set 1.1	0.00%	0.00%	0.00%	26.07%	0.00%	0.00%
2	Set 1.2	17.86%	74.06%	0.00%	0.00%	13.84%	59.93%
3	Set 1.3	20.59%	60.32%	0.00%	0.68%	0.00%	30.61%
4	Set 5.1	25.73%	50.92%	0.00%	0.00%	0.00%	18.04%
5	Set 5.2	1.60%	21.61%	0.00%	0.24%	0.14%	20.09%
6	Set 5.3	21.58%	32.74%	0.00%	0.15%	0.10%	21.02%
7	Set 10.1	0.72%	4.07%	0.00%	31.38%	0.00%	0.00%
8	Set 10.2	4.72%	37.43%	0.00%	0.30%	0.29%	13.71%
9	Set 10.3	4.75%	32.95%	0.00%	0.14%	0.26%	12.48%
Uplink
1	Set 1.1	0.00%	0.00%	0.00%	0.00%	0.00%	19.37%
2	Set 1.2	1.65%	1.68%	0.00%	18.63%	0.00%	18.63%
3	Set 1.3	0.00%	0.00%	0.00%	17.32%	0.00%	0.21%
4	Set 5.1	1.72%	2.00%	0.00%	0.00%	0.00%	0.00%
5	Set 5.2	0.72%	1.02%	0.00%	0.00%	0.00%	0.00%
6	Set 5.3	0.79%	1.37%	0.00%	0.34%	0.00%	0.14%
7	Set 10.1	2.40%	8.67%	0.00%	6.24%	0.00%	0.00%
8	Set 10.2	2.94%	12.69%	0.00%	9.74%	0.00%	0.05%
9	Set 10.3	2.99%	12.05%	0.00%	0.12%	0.00%	0.11%

% value: the percentage of errors, i.e., classification outputs that do not match the actual values.
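
The cross-dataset validation behind this table (row 6, part 2, of Table 1) amounts to applying each full-split model to every dataset and recording one cluster and one transport error percentage per pair. A minimal sketch, assuming each trained model is exposed as a callable that maps a feature array to predicted (cn, tn) labels; `cross_dataset_errors` and the toy models in the usage are hypothetical.

```python
import numpy as np


def cross_dataset_errors(models, datasets):
    """Apply every model (e.g. M_14, M_24, M_34) to every dataset (e.g. Set 1.1
    to Set 10.3) and return {(model, dataset): (cluster %, transport %)}.

    models:   {name: callable mapping X -> predicted (cn, tn) array}
    datasets: {name: (X, Y)}, where Y holds the actual (cn, tn) labels
    """
    table = {}
    for model_name, predict in models.items():
        for data_name, (X, Y) in datasets.items():
            predicted = np.asarray(predict(X))
            errors = 100.0 * np.mean(predicted != np.asarray(Y), axis=0)
            table[(model_name, data_name)] = (float(errors[0]), float(errors[1]))
    return table


if __name__ == "__main__":
    # Toy data: four UEs with placeholder features and actual (cn, tn) labels.
    X = np.zeros((4, 3))
    Y = np.array([[0, 0], [0, 1], [1, 1], [0, 0]])
    models = {
        "always_zero": lambda X: np.zeros((len(X), 2), dtype=int),
        "oracle": lambda X: Y,  # returns the true labels, so its errors are 0%
    }
    results = cross_dataset_errors(models, {"toy": (X, Y)})
    for key, (cluster_err, transport_err) in results.items():
        print(key, f"cluster {cluster_err:.2f}%, transport {transport_err:.2f}%")
```

Keeping the model behind a plain callable lets the same loop score the Random Forest models or any other classifier without changes.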

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hisyam Ng, H.A.; Mahmoodi, T. Machine Learning-Driven Dynamic Traffic Steering in 6G: A Novel Path Selection Scheme. Big Data Cogn. Comput. 2024, 8, 172. https://doi.org/10.3390/bdcc8120172

