1. Introduction
In the era of the Fourth Industrial Revolution (Industry 4.0), seamless integration of wireless communications has become a critical factor because of its potential to revolutionize industrial processes and transform traditional manufacturing paradigms [1,2]. Cutting-edge technologies such as the Internet of Things (IoT), big data analytics, artificial intelligence (AI), and robotics are leveraged to create smart, interconnected, and automated industrial ecosystems. Such industrial ecosystems demand enhanced connectivity for seamless exchange of data and information between machines, devices, and systems [3]. To address verticals with specialized connectivity needs (such as the smart factory), the 3rd Generation Partnership Project (3GPP) introduced the concept of private 5G networks (a private 5G network is a system that a particular enterprise deploys on its premises for its own connectivity needs), also known as non-public networks (NPNs) [4]. Private networks are becoming more attractive with the introduction of New Radio-Unlicensed (NR-U) [5], a 5G technology that enables the use of unlicensed spectrum for 5G wireless communications.
NR-U networks are envisioned to offer a blend of flexibility, reliability, and efficiency that could surpass both WiFi and licensed 5G networks in meeting the specialized requirements of smart factories [6,7,8,9]. In contrast to WiFi, NR-U ensures robust connectivity by implementing seamless handovers and advanced coexistence mechanisms [9]; existing WiFi technologies up to WiFi 6 do not explicitly define a seamless handover protocol [10]. Furthermore, NR-U's ability to support Ultra-Reliable Low-Latency Communications (URLLC) is essential for real-time control and automation in smart factories. In contrast to conventional 5G, NR-U avoids the high costs associated with licensed spectrum by operating in the unlicensed bands [7]. NR-U therefore stands out by offering smart factories a scalable and flexible wireless solution that integrates easily with the broader 5G ecosystem.
Despite the aforementioned benefits, the deployment of NR-U poses unique resource management challenges due to coexistence requirements. For coexistence, NR-U devices are required to employ techniques such as Listen-Before-Talk (LBT) [11] and dynamic frequency selection to minimize interference in the unlicensed band. LBT is a technique whereby a wireless transmitter first senses the radio environment before starting a transmission; by listening for signals from other users and transmitting only when the channel appears free, LBT helps prevent interference and packet collisions among devices that share the same spectrum. Since the spectrum is divided into sub-bands, different sub-bands might experience varying levels of congestion and interference [12]. NR-U systems should therefore adopt a dynamic channel-selection algorithm that chooses the best sub-band for transmission to minimize interference and maximize throughput. As a result, resource management becomes even more important due to the dynamic nature of the NR-U spectrum.
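To make the interplay of LBT sensing and dynamic sub-band selection concrete, the following minimal Python sketch models a transmitter that senses each sub-band and transmits on the least congested idle one. The energy threshold, sub-band count, and random sensing model are illustrative assumptions, not 3GPP-specified values.

```python
import random

ENERGY_THRESHOLD = -72.0  # dBm; illustrative clear-channel threshold (assumption)
NUM_SUBBANDS = 4          # illustrative number of unlicensed sub-bands

def sense_energy(subband: int) -> float:
    """Stand-in for radio energy detection on a sub-band (random here)."""
    return random.uniform(-95.0, -50.0)

def lbt_select_subband() -> int | None:
    """Listen-Before-Talk with dynamic sub-band selection: sense every
    sub-band, keep those that appear idle, and pick the one with the
    lowest observed energy (i.e., the least congested)."""
    readings = [(sense_energy(b), b) for b in range(NUM_SUBBANDS)]
    idle = [(e, b) for e, b in readings if e < ENERGY_THRESHOLD]
    if not idle:
        return None  # all sub-bands busy: defer the transmission
    return min(idle)[1]

subband = lbt_select_subband()
if subband is None:
    print("all sub-bands busy; back off and retry later")
else:
    print(f"transmit on sub-band {subband}")
```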
To fully leverage NR-U networks, LBT-aware load-balancing mechanisms are crucial for distributing the traffic load efficiently to avoid congestion and to meet quality of service (QoS) requirements [13]. In the industrial IoT (IIoT), workloads can be highly dynamic and unpredictable, leading to uneven load distributions. This high dynamism arises from the fluctuation of available spectrum due to coexistence requirements, varying operational demands, and sporadic events. In industrial settings, machine operations often change, and events such as monitoring alerts, emergency stops, or real-time quality updates are common; all of these require immediate data transmission. Effective load balancing is required to ensure that network resources are optimally utilized, preventing any gNodeB (gNB) from becoming congested, which would degrade performance and reliability [14,15,16]. Load balancing maintains high service availability and responsiveness by distributing traffic across the network, ensuring that industrial applications perform at their peak and make full use of private NR-U networks. Therefore, tailored, efficient NR-U load-balancing methods are needed to avoid system performance degradation and inefficient resource utilization.
Existing load-balancing algorithms developed for WiFi and cellular networks do not effectively address the specialized needs of NR-U networks. Several user association and channel assignment schemes have been proposed to improve fairness and throughput across access points in WiFi networks [17,18,19,20]. Although these methods optimize resource allocation in best-effort WiFi networks, they do not support the advanced QoS mechanisms that ensure critical industrial applications receive the bandwidth and priority they need to avoid performance degradation. Moreover, WiFi networks have limited coverage, lack seamless handover protocols, and rely on contention-based access, which cannot ensure 5G-grade service continuity or QoS guarantees. For cellular networks, load balancing has been studied extensively using heuristics [21,22,23], game theoretic formulations [24,25,26], and machine learning techniques [27,28,29]. While these methods improve resource efficiency in licensed spectrum environments, they do not consider the spectrum fluctuations caused by coexistence mechanisms such as LBT, which are mandatory in NR-U. Furthermore, their reliance on centralized optimization makes them less scalable for large-scale IIoT deployments, where many devices with diverse QoS requirements operate within confined areas [30]. In summary, none of the existing solutions adequately address the combined challenges of NR-U private networks, namely (i) coexistence-driven spectrum dynamics, (ii) heterogeneous QoS requirements of industrial IoT traffic, and (iii) the need for seamless mobility support. NR-U offers 5G-like performance without high spectrum cost [31], but to realize this potential, an adaptive and decentralized load-balancing mechanism tailored to industrial NR-U environments is necessary.
Recently, multi-agent deep reinforcement learning (MA-DRL) has been gaining traction as a distributed optimization paradigm for wireless resource management that addresses scalability issues [32,33,34,35]. In [32], a multi-agent Q-learning framework was designed for real-time load balancing in next-generation cellular networks, achieving better throughput–handover trade-offs than existing heuristics. The authors in [33] proposed a distributed MA-DRL transmit power control scheme for multi-cell networks, incorporating LSTM-enhanced actor–critic agents to improve spectral efficiency while minimizing inter-cell interference. In [34], MA-DRL was applied to heterogeneous UAV swarms, introducing cluster-based spectrum sharing strategies that enable cooperative resource allocation across diverse agent types. The work in [35] addressed mobility in non-terrestrial networks with an MA-DRL-based handover scheme for mega-constellations, adapting to dynamic propagation conditions more effectively than rule-based approaches. While [32,33,34,35] demonstrate the versatility of MA-DRL in fostering distributed and adaptive wireless optimization, these schemes lack formal convergence guarantees, which limits their applicability to dynamic industrial NR-U environments that require high service reliability. In addition, they do not consider the spectrum fluctuations caused by coexistence mechanisms such as LBT in NR-U.
In this paper, a load-aware load-balancing procedure and a reinforcement learning traffic-steering algorithm are proposed to efficiently distribute traffic in NR-U networks. Load balancing is cast as a game theoretic equilibrium problem to alleviate complexity by tackling it in a decentralized manner. A reinforcement learning algorithm that utilizes a game theoretic policy is developed to efficiently drive the load distribution to equilibrium. An extended System Information Block (SIB) is introduced to regularly inform UEs about their local network's state, i.e., the load status of gNBs within communication range. The SIBs from gNBs carry instantaneous load information and are broadcast periodically. This facilitates offloading by exploiting the UEs' selfish behavior in a game theoretic setting and by leveraging the conditional handover scheme introduced in [36]. That is, the UEs have complete local network information that enables them to make well-informed decisions to switch to the most favorably loaded gNB. We conduct comprehensive system-level simulations in ns3-gym and show that the proposed procedure and algorithm outperform existing work by attaining lower network load variances while achieving higher system throughput and higher QoS satisfaction levels.
The rest of this paper is organized as follows. Section 2 defines a system model for the NR-U network and formulates the load-balancing problem. Section 3 presents the reinforcement learning framework to address the problem. Section 4 develops the procedure and algorithm for load balancing. Section 5 describes the simulation results, and Section 6 concludes the paper.
4. Proposed Load-Balancing Procedure and Algorithm
We propose a load-aware, distributed load-balancing procedure and algorithm for NR-U networks. For load awareness, the proposed method defines a load information broadcasting mechanism by extending the 5G NR system information. For distributed load balancing, a mechanism is introduced to facilitate UE-assisted load redistributions by pre-configuring possible target gNBs for conditional offloading and allowing the UEs to intelligently select a target gNB based on load awareness. This allows UEs to adapt to changing conditions using the reinforcement learning algorithm. By applying a load-aware adaptive user association strategy driven by reinforcement learning, the algorithm ensures the optimal distribution of UEs across gNBs, effectively balancing the network load while meeting the QoS demands of UEs. Details of the load-balancing procedure and algorithm are provided in the following subsections.
4.1. Operational Framework
To facilitate distributed, UE-assisted load redistribution for load balancing, a mechanism for conditional offloading is introduced. The conditional offloading is realized by extending the conditional handover (CHO), a feature introduced in 3GPP Release 16 [46]. The overall design of the proposed load-balancing procedure is illustrated in Figure 3, which involves two phases: the conditional offloading pre-configuration phase and the offloading evaluation phase. In the pre-configuration phase, the serving gNB prepares the CHO configuration information, which includes the candidate target cell list and the execution conditions. The execution conditions include a newly introduced parameter that informs the UE whether to use the proposed conditional offloading load balancing. Other CHO parameters, such as the common RSRP threshold and the time to trigger, can also be set depending on the demands of the network operator during the configuration phase. The serving gNB then sends a Radio Resource Control (RRC) reconfiguration message containing the CHO configuration parameters to the UEs. The UEs receive the CHO configuration and start monitoring the pre-configured events and conditions, i.e., the load state of the serving and candidate gNBs.
To provide load awareness, an NR-U-tailored System Information Block (system information is downlink broadcast information transmitted periodically by a base station and is vital for a UE to maintain its connection with the base station in any radio access technology) broadcasts the network load status information from the gNBs by extending the existing 5G SIBs while still adhering to 3GPP standards. The load information is carried by a newly introduced SIB. The new SIB is given the suffix x and is henceforth referred to as SIBx. SIBx is 40 bits long and contains three information elements: sequence number, cell ID, and cell load information. The structure of SIBx is shown as a legend in Figure 2. SIBx is broadcast periodically with a configurable cycle time so that all users in the RAN regularly obtain true load information from the gNBs in their locality. With SIBx, every UE is periodically updated about the load status of its candidate gNBs. According to the 3GPP specifications for 5G NR, System Information Block #1 (SIB1) carries information related to the availability and scheduling of other SIBs [47]. Accordingly, the periodicity and type of SIBx are specified with the help of SIB1.
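To make the SIBx layout concrete, the sketch below packs the three information elements into the stated 40 bits. The individual field widths (8-bit sequence number, 16-bit cell ID, 16-bit load) are our assumption for illustration; the text only fixes the 40-bit total and the three fields.

```python
# Hypothetical SIBx layout: 8-bit sequence number | 16-bit cell ID | 16-bit load.
# Only the 40-bit total and the three fields come from the text; widths are assumed.

def encode_sibx(seq: int, cell_id: int, load_permille: int) -> bytes:
    """Pack SIBx into 5 bytes (40 bits)."""
    assert 0 <= seq < 2**8 and 0 <= cell_id < 2**16 and 0 <= load_permille < 2**16
    word = (seq << 32) | (cell_id << 16) | load_permille
    return word.to_bytes(5, "big")

def decode_sibx(payload: bytes) -> tuple[int, int, int]:
    """Unpack a 5-byte SIBx into (sequence number, cell ID, load)."""
    word = int.from_bytes(payload, "big")
    return (word >> 32) & 0xFF, (word >> 16) & 0xFFFF, word & 0xFFFF

sibx = encode_sibx(seq=7, cell_id=0x00A3, load_permille=625)  # 62.5% load
print(decode_sibx(sibx))  # -> (7, 163, 625)
```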
In the evaluation phase, a UE evaluates the pre-configured CHO trigger conditions and re-associates itself with a candidate target cell when the conditions are met. A UE that receives an RRCReconfiguration message in the CHO context evaluates the association costs of its alternative target gNBs using Equation (4). The load information conveyed by SIBx from neighboring gNBs is exploited during the evaluation. The UE then picks the gNB with the minimum association cost using the GT-RL algorithm described in the next subsection, starts synchronizing with it, and detaches from its current gNB.
4.2. Load Balancing
The reinforcement learning algorithm inspired by game theory, detailed in Algorithm 1, is proposed to intelligently choose a target gNB for a UE experiencing overload on its current gNB. The GT-RL load-balancing algorithm optimizes the association of UEs to gNBs in order to achieve equilibrium in the load distribution of NR-U networks. The input to the algorithm includes the initial UE-gNB association and the model parameters, namely the exploration–exploitation trade-off parameter $\epsilon$ and its decay rate. The desired output is an optimal UE-gNB association.
The algorithm adopts an iterative approach, adjusting the UE-gNB associations for up to a maximum number of iterations. At each iteration $t$, a flag variable is initially set to True, indicating that the association is assumed to be stable unless changes occur. Then, for each UE $j$, the algorithm performs the following steps. First, it acquires the neighboring cell list from UE measurements. Next, it observes the load of the neighboring gNBs using the SIBx information. It then identifies the current serving gNB $k$ of UE $j$. Finally, the algorithm determines the preferred target gNB $i$ for UE $j$ by invoking the game theoretic epsilon-greedy policy function defined in Algorithm 2.
The epsilon-greedy function involves the following steps. A random value $p$ is drawn from a uniform distribution between 0 and 1. If $p < \epsilon$, a target gNB is selected by sampling from a probability distribution determined using the game-equilibrium approach defined by Equation (6). Otherwise, the target gNB with the maximum reward, computed according to Equation (7), is selected. The function returns the chosen target gNB, which updates the UE-gNB association and, in turn, the load state of the gNBs.
Algorithm 1 The GT-RL load-balancing algorithm
Input: Initial UE-gNB association; model parameters: exploration–exploitation trade-off $\epsilon$ and its decay rate
Output: An optimal UE-gNB association that achieves an equilibrium load distribution
1: Initialize the iteration counter $t \leftarrow 0$
2: while equilibrium not reached and the iteration limit not exceeded do
3:   stable $\leftarrow$ True
4:   for every UE $j$ do
5:     Get the neighbor gNB list from UE measurements
6:     Observe the neighbor cell loads using the SIBx info
7:     Identify the current serving gNB $k$ of UE $j$
8:     $i \leftarrow$ GTPolicy($j$, $k$, $\epsilon$)
9:     if $i \neq k$ then
10:      Re-associate UE $j$ with gNB $i$
11:      stable $\leftarrow$ False
12:    end if
13:   end for
14:   if stable then return the current association as the optimal association
15:   Decay $\epsilon$ by the decay rate
16:   $t \leftarrow t + 1$
17: end while
Algorithm 2 GTPolicy subroutine
1: function GTPolicy($j$, $k$, $\epsilon$)
2:   Draw a random value $p$ from the uniform distribution on the interval 0 to 1
3:   if $p < \epsilon$ then
4:     Calculate the probability distribution over candidate gNBs using Equation (6)
5:     Choose target gNB $i$ from the candidates according to this distribution
6:   else
7:     Compute the serving gNB cost using Equation (4)
8:     for every candidate gNB do
9:       Compute the candidate gNB cost using Equation (4)
10:      Compute the reward using Equation (7)
11:    end for
12:    Choose target gNB $i$ with the maximum reward
13:  end if
14:  return $i$
15: end function
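As a complement to the pseudocode, the following Python sketch shows one way the GTPolicy selection step could be realized. Equations (4), (6), and (7) are not reproduced here; as stand-ins we use the SIBx-reported load as the association cost, a load-inverse sampling distribution for the game-equilibrium policy, and the serving-vs-candidate cost gap as the reward. All three stand-ins are assumptions for illustration only.

```python
import random

def gt_policy(serving: int, candidates: dict[int, float], eps: float) -> int:
    """Pick a target gNB for one UE. candidates maps gNB id -> load in [0, 1]
    as observed via SIBx (the serving gNB is included in candidates)."""
    if random.random() < eps:
        # Explore: sample from a distribution that favors lightly loaded gNBs
        # (a stand-in for the Wardrop equilibrium policy of Equation (6)).
        weights = {g: 1.0 - load for g, load in candidates.items()}
        total = sum(weights.values())
        r, acc = random.random() * total, 0.0
        for g, w in weights.items():
            acc += w
            if r <= acc:
                return g
        return serving  # numerical edge-case fallback
    # Exploit: choose the gNB with the maximum reward (stand-in for Eq. (7)):
    # the cost reduction relative to the serving gNB, with the association
    # cost approximated here by the gNB load (stand-in for Eq. (4)).
    rewards = {g: candidates[serving] - load for g, load in candidates.items()}
    return max(rewards, key=rewards.get)

# One UE decision: serving gNB 0 is heavily loaded, gNB 2 is the lightest.
loads = {0: 0.9, 1: 0.5, 2: 0.2}
print(gt_policy(serving=0, candidates=loads, eps=0.1))  # usually prints 2
```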
When the gNB $i$ selected by the function differs from the current serving gNB $k$, the association is updated. Specifically, UE $j$ is associated with the new gNB $i$ by setting the corresponding association indicator to 1 and the indicators for all other neighboring gNBs to 0. Consequently, the flag is set to False, indicating that the association has changed and the equilibrium load distribution has not been reached. However, if the flag remains True after checking all UEs, the algorithm concludes that equilibrium has been reached and returns the current association as the optimal association. Otherwise, it decays the exploration parameter $\epsilon$ by the decay rate and proceeds with the next iteration.
This iterative UE-gNB association update by Algorithm 1, combining reinforcement learning principles with game theory, ensures that the load distribution across the network becomes more balanced over time, potentially leading to an optimal equilibrium where network resources are efficiently utilized. The users are treated as uncoordinated selfish agents that try to find the least-loaded gNBs by exploiting the newly introduced load state information broadcast scheme. As a consequence, users are incentivized to shift their traffic to less-loaded gNBs. This in turn adjusts the network load by shifting traffic away from heavily loaded gNBs. In doing so, the load on the gNBs is gradually brought into balance.
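Building on the gt_policy sketch above, the outer loop of Algorithm 1 might look as follows. Here, loads_of is a hypothetical helper that recomputes per-gNB loads from the current association; in the real procedure this information arrives via the SIBx broadcast rather than a local function call.

```python
def gt_rl_balance(assoc, loads_of, eps=0.3, decay=0.95, max_iter=50):
    """Sketch of Algorithm 1: iterate per-UE decisions until no UE switches.
    assoc maps UE id -> serving gNB id; loads_of(assoc) returns {gNB: load}."""
    for _ in range(max_iter):
        stable = True
        for ue, serving in list(assoc.items()):
            target = gt_policy(serving, loads_of(assoc), eps)
            if target != serving:
                assoc[ue] = target   # conditional offloading executed
                stable = False       # equilibrium not yet reached
        if stable:
            break                    # no UE switched: load equilibrium
        eps *= decay                 # decay exploration as the load settles
    return assoc
```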
The proposed game theoretic reinforcement learning framework leverages only local load information accessible to UEs and gNBs, allowing for distributed decision making. This decentralized approach eliminates the need for global network state information collection, improves scalability, and reduces computational complexity, while still ensuring load equilibrium. In contrast, centralized load-balancing strategies, despite their theoretical optimality, depend on the aggregation of extensive global information, such as channel conditions, traffic demands, and cell loads, at a central controller. Such requirements incur considerable signaling overhead in dense NR-U deployments, and the associated optimization problem is NP-hard, resulting in high computational costs. Consequently, the proposed framework constitutes a practical and efficient alternative to centralized schemes for dynamic large-scale NR-U networks.
4.3. Discussion of Convergence, Complexity, and Scalability
We present a proof of convergence based on the Banach fixed-point theorem [48]. The theorem guarantees convergence to a unique fixed point if the policy mapping is a contraction over a complete metric space. For convergence to be established on this basis, two key conditions must be satisfied: the existence of a fixed-point solution (a fixed point here refers to a policy, i.e., an association policy in this work's context, that remains unchanged after a policy iteration), and the requirement that the Wardrop equilibrium-based policy given by Equation (6) induces a contraction mapping. Load equilibrium (Definition 2) serves as the convergence target for the policy iteration and inherently functions as a fixed point. The contraction mapping condition is justified as follows.
Let $\boldsymbol{\rho}(t) = [\rho_1(t), \ldots, \rho_n(t)]$ denote the load vector over all $n$ gNBs at time $t$, and let $T$ denote the load update mapping induced by the Wardrop equilibrium-based policy given by Equation (6). We equip the load space with the infinity norm $\lVert \boldsymbol{\rho} \rVert_\infty = \max_{1 \le i \le n} \lvert \rho_i \rvert$, under which it is a complete metric space. To prove convergence, it suffices to show that $T$ is a contraction mapping under the infinity norm. To this end, we demonstrate that the underlying policy (6) defines a valid probability distribution and satisfies the Markov property, thereby inducing a contraction mapping, as shown below.
For any user $j$ and time $t$, the association probabilities $\pi_{j,i}(t)$ dictated by Equation (6) form a valid probability distribution because they sum to one over the candidate gNBs: $\sum_{i \in \mathcal{N}_j} \pi_{j,i}(t) = 1$.
It is also apparent that Equation (6) exhibits the Markov property, since the next load state depends only on the current one, $\boldsymbol{\rho}(t+1) = T(\boldsymbol{\rho}(t))$, where $T$ is the load update mapping induced by the user association probabilities $\pi_{j,i}(t)$. By construction, (6) promotes load balancing by directing UEs to shift their traffic preferentially toward less congested gNBs. Repeated application of (6) thus induces a progressive transformation of the load vector over time, namely $\boldsymbol{\rho}(t) = T^{\,t}(\boldsymbol{\rho}(0))$, driving the system toward equilibrium.
Observe that the association policy in Equation (6) assigns lower probabilities to more congested gNBs, thereby redistributing user associations toward less loaded gNBs. Unless the system has reached equilibrium, the policy always shifts users away from the maximally loaded gNBs. Hence, the maximum load value $\lVert \boldsymbol{\rho}(t) \rVert_\infty$ strictly decreases over time until a fixed point $\boldsymbol{\rho}^*$ is reached. Therefore, by the Banach fixed-point theorem [48] (which states that every contraction mapping on a complete metric space has a unique fixed point) and given that the game theoretic policy induces a contraction, the load vector $\boldsymbol{\rho}(t)$ converges to the fixed point $\boldsymbol{\rho}^* = T(\boldsymbol{\rho}^*)$, which is the equilibrium load.
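The argument can be compressed into the following two relations; the notation follows the reconstruction above, and the contraction modulus $c$ is an assumed constant rather than a quantity derived in the original text.

```latex
% Contraction of the load update mapping T (modulus c is assumed):
\lVert T(\boldsymbol{\rho}) - T(\boldsymbol{\rho}') \rVert_\infty
  \le c \,\lVert \boldsymbol{\rho} - \boldsymbol{\rho}' \rVert_\infty ,
  \qquad 0 \le c < 1 .
% Banach iteration: geometric convergence to the unique fixed point:
\lVert \boldsymbol{\rho}(t) - \boldsymbol{\rho}^{*} \rVert_\infty
  \le c^{\,t}\,\lVert \boldsymbol{\rho}(0) - \boldsymbol{\rho}^{*} \rVert_\infty
  \;\longrightarrow\; 0 \quad (t \to \infty) .
```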
A high-level assessment of complexity and scalability follows from the clear decomposition of computational tasks within the proposed framework. Specifically, each serving gNB incurs a per-iteration overhead of $\mathcal{O}(U_k)$ for selecting UEs for conditional offloading, where $U_k$ is the number of UEs it serves, while each selected UE contributes $\mathcal{O}(N_j)$ operations for target gNB identification, where $N_j$ is the number of its candidate gNBs. The resulting aggregate complexity is therefore bounded by $\mathcal{O}(U \cdot N)$, with $U$ the total number of UEs and $N$ the maximum number of candidate gNBs per UE, where $N \le n$. Importantly, this decomposition highlights the distributed and parallelizable nature of the algorithm: the gNBs manage admission control, while the UEs handle their own decision processes. As a result, the computational cost grows only linearly with the number of associated UEs and candidate gNBs, ensuring favorable scalability and practical feasibility in dense, large-scale network deployments.
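As a quick sanity check under the simulation setup of Section 5 (the symbols $U$ and $N$ are the reconstructions above):

```latex
% Worked example with U = 100 UEs and n = 10 gNBs, so N <= 10:
U \cdot N \;\le\; 100 \times 10 \;=\; 10^{3}
\quad \text{association evaluations per iteration,}
```

which is spread across the UEs and gNBs rather than executed at a central controller.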
5. Performance Evaluation
To assess the effectiveness of the proposed procedure and algorithm, system-level simulations were conducted in ns3-gym [49]. We considered a dense indoor scenario [50] representative of modern industrial networks. Simulations were conducted by varying the number of gNBs and UEs deployed throughout an indoor area of length 120 m and width 50 m. Two scenarios were considered by altering the placement of gNBs and UEs, as depicted in Figure 4. In the first scenario (Figure 4a), the gNBs are regularly distributed over the whole indoor area, whereas the UEs are placed randomly. In the second scenario (Figure 4b), the gNBs are placed randomly along with the UEs. The simulation parameters are summarized in Table 1.
To validate the performance of the proposed algorithm, we compared it with three previous algorithms: a heuristic adaptive-threshold MLB approach referred to as Adaptive-MLB [21], a deep reinforcement learning-based MLB referred to as DQN-MLB [28], and a heuristic QoS-aware WiFi load-balancing approach referred to as QA-LB [17]. Adaptive-MLB is based on a centralized self-organizing network (SON) for small cell networks. DQN-MLB proposes a two-layer DRL SON to accommodate large-scale networks. QA-LB is a joint user-association and channel-assignment solution for WiFi networks. For the performance evaluation, we considered four metrics: speed of convergence, network load variance, system throughput, and the percentage of UEs satisfied with QoS.
5.1. Convergence Rate Comparison
To assess the convergence rate of the proposed method in comparison to the other algorithms, we conducted simulations with both the regular and irregular deployment scenarios illustrated in Figure 4a,b. For these simulations, 6 gNBs and 40 UEs were considered, and the traffic demand of the UEs was configured as guaranteed bit rates of 5 Mbps. The standard deviation of the gNB loads was monitored across iterations during the load-balancing process to evaluate how quickly the proposed method achieved an even distribution in comparison to the other algorithms.
Figure 5 shows the load variance convergence rates of the GT-RL method compared to the previous methods in the deployment depicted in Figure 4a. The proposed method demonstrated high adaptability, converging to an even load distribution more rapidly than the other algorithms; the steep decline in variance during the initial iterations of the load-balancing process indicates strong adaptability. Three key ideas described in Section 4.1 contribute to the high adaptability of the GT-RL method. First, the load information broadcasting system enhances load awareness. Second, the game theoretic reinforcement learning policy aids in picking the optimal gNB based on the load. Third, the conditional offloading scheme enables immediate association switching by UEs. QA-LB, being rule-based and non-adaptive, resulted in a less efficient load distribution, as indicated by its higher variance values. DQN-MLB and Adaptive-MLB were able to adjust their load distributions based on network conditions; however, lacking a specialized strategy for swift load-aware traffic redistribution, they took longer to converge and resulted in larger load variances than the proposed method.
To verify that the GT-RL method maintains robust adaptability under various network deployment options, another simulation was conducted using the irregular deployment in Figure 4b. Figure 6 shows the convergence rate results. As can be seen from the figure, the proposed method demonstrated superior load-balancing convergence compared to the other algorithms for the irregular deployment as well. Load awareness makes the proposed method versatile across deployment options: the UEs are strategically pre-configured, utilizing the game theoretic strategy combined with the load-aware conditional offloading scheme, to dynamically select the gNBs offering the best service. As a result, all UEs that identify a more suitable gNB can switch simultaneously, yielding a swift redistribution of the load. Therefore, the proposed method is more adaptive than the other algorithms under both regular and irregular deployments, as illustrated in Figure 5 and Figure 6, respectively.
5.2. Impact of Network Load
To evaluate the performance of the proposed algorithm under different network load conditions, multiple simulations were performed by varying the total number of UEs in the regular deployment scenario. For this evaluation, 10 gNBs were considered and the number of UEs was varied between 20 and 100. In this setup, 10% of the UEs exhibited random waypoint mobility at a pedestrian speed of 1 m/s. The traffic demand of the UEs was set to guaranteed bit rates with an average of 2.75 Mbps and a variance of 4.5 Mbps, representing moderate industrial IoT traffic [51]. The load variance, throughput, and percentage of UEs satisfied with QoS were measured for each load condition.
Owing to the strong convergence behavior demonstrated in Figure 5, the proposed algorithm redistributed the network load more evenly under the various load conditions, as demonstrated in Figure 7. The GT-RL algorithm achieved smaller load variance values than the other algorithms across load conditions, with a minimum 10% improvement in reducing network load variance. It brought the variance closer to zero more effectively, which helps mitigate performance degradation due to load imbalance.
The proposed algorithm also achieved a higher percentage of UEs satisfied with QoS and a higher network throughput, as illustrated in Figure 8 and Figure 9, respectively. The proposed method dynamically associates UEs with the most suitable gNBs; by doing so, more resources are available to meet the UEs' required data rates, which improves UE satisfaction with QoS and throughput. Larger performance gains were observed as the number of UEs in the network increased: when the number of UEs is high, there is a greater chance for the network to become congested, which requires a robust load-balancing algorithm to meet UE data rate demands. When the network had 100 UEs, the proposed method achieved about a 5% improvement in both QoS satisfaction and throughput compared to DQN-MLB, and over 10% compared to the other two algorithms. Overall, the proposed method surpassed the others by delivering a more balanced load distribution, improved user satisfaction, and increased throughput, as can be seen from Figure 7, Figure 8, and Figure 9, respectively. The GT-RL method proved adaptive to a wide range of network load conditions.
5.3. Impact of Load Dynamics
To assess the adaptiveness of the proposed method to dynamic load conditions, simulations were conducted by varying the percentage of mobile UEs under the regular deployment scenario in Figure 4a. For this evaluation, 10 gNBs and 100 UEs were considered. The percentage of mobile UEs, which followed random waypoint mobility at a pedestrian speed of 1 m/s, was varied from 10% to 40% to intentionally introduce varying degrees of load dynamics. The load variance, throughput, and percentage of UEs satisfied with QoS were measured for each simulation instance.
Figure 10 shows the load distribution evaluations for the proposed and the other algorithms. As shown in Figure 10, the proposed method is more robust to dynamic situations than the other algorithms, maintaining the lowest load variance. The load distribution of the proposed algorithm remained almost unaffected as the percentage of mobile UEs varied from 10% to 40%. The GT-RL method is endowed with the specialized System Information Block for load awareness, and the UEs are pre-configured to immediately switch their association as soon as they see a more favorable gNB, which makes the proposed method highly adaptive to fluctuating load conditions. Consequently, this led to substantial improvements in the percentage of UEs satisfied with QoS and in overall system throughput, as illustrated in Figure 11 and Figure 12, respectively. Owing to its highly adaptive nature, the GT-RL method is robust under dynamic conditions and achieved nearly 100% UE satisfaction. The performance of the other algorithms degraded significantly as the percentage of mobile UEs increased because they lack a comparable adaptation mechanism. When the percentage of mobile UEs was set to 40%, the proposed method achieved an 8% gain in satisfied UEs compared to DQN-MLB and at least a 15% gain compared to the other two algorithms, as seen in Figure 11. The proposed method enhanced system throughput by 7.61% compared to DQN-MLB and by at least 17% compared to the other algorithms. Overall, the GT-RL method demonstrated superior performance under dynamic load conditions due to its highly adaptive nature, consistently achieving higher throughput and close to 100% UE satisfaction.
5.4. Impact of Network Size
To scrutinize the adaptiveness of the GT-RL algorithm to varying network sizes, further simulations were conducted that varied the number of randomly deployed gNBs in the network while the number of UEs was fixed at 100. During these simulations, the number of gNBs was varied between 6 and 10. The traffic demand of the UEs was set to guaranteed bit rates with an average of 2.75 Mbps and a variance of 4.5 Mbps, with 10% of the UEs exhibiting random waypoint mobility at a pedestrian speed of 1 m/s.
Figure 13 shows the comparison of load distributions. The proposed method achieved a more even load distribution, reflected by the lowest standard deviation. The load-aware conditional offloading idea adapts to various network sizes because it enables UEs to dynamically choose the gNB that provides the best possible service. As the number of gNBs in the network increases, the UEs associated with an overloaded cell have more options for offloading; consequently, the load variance decreases, reflecting a more even load distribution. Across the network sizes considered, the GT-RL algorithm achieved the lowest variance of all the compared algorithms. By attaining equilibrium in the load distribution, the proposed method ensures that UEs have enough resources to serve their data rate demands. Performance comparisons of the percentage of UEs satisfied with QoS and the system throughput are shown in Figure 14 and Figure 15, respectively. As seen in these figures, the GT-RL method achieved higher throughput and UE satisfaction levels across the network sizes under consideration. The decentralized, UE-assisted conditional offloading is highly adaptive to various network deployment options.
6. Conclusions
In this paper, we proposed a load-aware load-balancing procedure with a game theoretic reinforcement learning algorithm for 5G NR-U networks. The proposed method enhances load balancing through three key ideas. First, an extended SIB message for load awareness provides UEs with direct insight into the load status of neighboring gNBs. Second, a game theoretic reinforcement learning policy helps UEs choose the optimal gNB based on load conditions. Lastly, the conditional offloading mechanism enables UEs to promptly switch to the most suitable gNB. Together, these strategies enable the proposed method to swiftly redistribute and balance network loads. Simulation results showed that the GT-RL method outperformed previous algorithms by driving the network to an even load distribution while achieving higher network throughput and UE QoS satisfaction. Especially under high load dynamics, the proposed method achieved an 8% gain in UE satisfaction with QoS and a 7.61% gain in network throughput compared to DQN-MLB, while compared to the non-AI approaches, UE QoS satisfaction and network throughput were enhanced by more than 15%.
Beyond its simulation performance, our GT-RL framework offers a significant theoretical advantage over conventional DRL approaches. While DRL faces challenges with convergence guarantees, especially under the spectrum fluctuations induced by the LBT mechanism in NR-U, our method ensures convergence and is inherently aware of LBT constraints. Consequently, the proposed GT-RL framework represents a robust and practical solution, balancing theoretical soundness with real-world applicability, making it a prime candidate for enhancing service reliability in demanding industrial 5G NR-U environments.