Article

NR-U Network Load Balancing: A Game Theoretic Reinforcement Learning Approach

1 Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan 44610, Republic of Korea
2 School of Mechanical Engineering, University of Ulsan, Ulsan 44610, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2025, 14(20), 3986; https://doi.org/10.3390/electronics14203986
Submission received: 4 September 2025 / Revised: 28 September 2025 / Accepted: 9 October 2025 / Published: 11 October 2025
(This article belongs to the Special Issue Advanced Control Strategies and Applications of Multi-Agent Systems)

Abstract

In this paper, we propose a load-aware load-balancing procedure for fifth-generation (5G) New Radio-Unlicensed (NR-U) networks in order to address the performance degradation and resource inefficiencies caused by load imbalance. Load imbalances frequently occur in NR-U networks due to factors such as the dynamic spectrum, user mobility, and varying traffic demand. To tackle these challenges, a load-aware load-balancing procedure utilizing game theoretic reinforcement learning (GT-RL) is introduced. For load awareness, an extended System Information Block (SIB) is incorporated within the framework of 5G wireless networks. The load-balancing problem is addressed as a game theoretic cost-minimization task combining conditional offloading with reinforcement learning traffic steering to dynamically distribute loads. Reinforcement learning applies a game theoretic policy to move users from overloaded cells to less congested cells that best serve their needs. Analytically, the proposed method is proven to spread the network load toward equilibrium. The proposed method is validated through simulations that show the effectiveness of its load balancing. The proposed method achieved better performance than previous work by attaining lower load variances while achieving higher throughput and greater quality of service (QoS) satisfaction. Especially under high-load dynamics, the proposed method achieved an 8% gain in UE satisfaction with QoS and a 7.61% gain in network throughput compared to an existing RL-based approach, whereas compared to the non-AI approaches, UE QoS satisfaction and network throughput were enhanced by more than 15%.

1. Introduction

In the era of the Fourth Industrial Revolution (Industry 4.0), seamless integration of wireless communications has become a critical factor because of its potential to revolutionize industrial processes and transform traditional manufacturing paradigms [1,2]. Cutting-edge technologies such as the Internet of Things (IoT), big data analytics, artificial intelligence (AI), and robotics are leveraged to create smart, interconnected, and automated industrial ecosystems. Such industrial ecosystems demand enhanced connectivity for seamless exchange of data and information between machines, devices, and systems [3]. To address verticals with specialized connectivity needs (like the smart factory), the 3rd Generation Partnership Project (3GPP) introduced the concept of private 5G networks (a private 5G network is a unique system that a particular enterprise deploys on its premises for its connectivity needs), also known as non-public networks (NPNs) [4]. Private networks are becoming more attractive due to the introduction of New Radio-Unlicensed (NR-U) [5] 5G technology that enables utilization of unlicensed spectrum for 5G wireless communications.
NR-U networks are envisioned to offer a blend of flexibility, reliability, and efficiency that could surpass both WiFi and licensed 5G networks in meeting the specialized requirements of smart factories [6,7,8,9]. In contrast to WiFi, NR-U ensures robust connectivity by implementing seamless handovers and advanced coexistence mechanisms [9]. Existing WiFi technologies up to WiFi 6 do not explicitly define a seamless handover protocol [10]. Furthermore, NR-U’s ability to support Ultra-Reliable Low-Latency Communications (URLLC) is essential for real-time control and automation of smart factories. In contrast to conventional 5G, NR-U provides a critical advantage by avoiding the high costs associated with licensed spectrum by operating in the unlicensed spectrum [7]. NR-U, therefore, stands out by offering smart factories a scalable and flexible wireless solution that can easily integrate with the broader 5G ecosystem.
Despite the aforementioned benefits, the deployment of NR-U poses unique resource management challenges due to coexistence requirements. For coexistence, NR-U devices are required to employ techniques such as Listen-Before-Talk (LBT) [11] and adopt dynamic frequency selection to minimize interference in the unlicensed band. LBT is a technique used in wireless communications whereby a wireless transmitter first senses the radio environment before it starts a transmission. By listening for signals from other users and only transmitting when the channel appears free, LBT helps to prevent interference and data packet collisions among devices that share the same spectrum. Since the spectrum is divided into sub-bands, different sub-bands might experience varying levels of congestion and interference [12]. NR-U systems should adopt a dynamic channel-selection algorithm to choose the best sub-band for transmission to minimize interference and maximize throughput. As a result, the issue of resource management becomes more important due to the dynamic nature of NR-U spectrum.
To fully leverage NR-U networks, LBT-aware load-balancing mechanisms are crucial for distributing the traffic load efficiently to avoid congestion and to meet quality of service (QoS) requirements [13]. In the industrial IoT (IIoT), workloads can be highly dynamic and unpredictable, leading to uneven load distributions. The high level of dynamicity arises from the fluctuation of available spectrum due to coexistence requirements, the varying operational demands, and sporadic events. In industrial settings, machine operations often change, and events such as monitoring alerts, emergency stops, or real-time quality updates are common. All of these require immediate data transmission. Effective load balancing is required to ensure that network resources are optimally utilized, preventing any gNodeB (gNB) from becoming congested, which can degrade performance and reliability [14,15,16]. Load balancing maintains high availability and responsiveness of the service by distributing traffic across the network. This ensures that industrial applications perform at their peak and make full use of private NR-U networks. Therefore, tailored, efficient NR-U load balancing methods are needed to overcome system performance degradation and resource utilization inefficiency.
Existing load-balancing algorithms developed for WiFi and cellular networks do not effectively address the specialized needs of NR-U networks. Several user association and channel assignment schemes have been proposed to improve fairness and throughput across access points [17,18,19,20] in WiFi networks. Although these methods optimize resource allocation in best-effort WiFi networks, they do not support advanced QoS mechanisms that ensure critical industrial applications receive the bandwidth and priority they need to reduce the chances of performance degradation. Moreover, WiFi networks have limited coverage, lack seamless handover protocols, and rely on contention-based access, which cannot ensure 5G-grade service continuity or QoS guarantees. For cellular networks, load balancing has been studied extensively using heuristics [21,22,23], game theoretic formulations [24,25,26], and machine learning techniques [27,28,29]. While these methods improve resource efficiency in licensed spectrum environments, they do not consider the spectrum fluctuations caused by coexistence mechanisms like LBT, which are mandatory in NR-U. Furthermore, their reliance on centralized optimization makes them less scalable for large-scale IIoT deployments, where several devices with diverse QoS requirements operate within confined areas [30]. In summary, none of the existing solutions adequately address the combined challenges of NR-U private networks, which include (i) coexistence-driven spectrum dynamics, (ii) heterogeneous QoS requirements of industrial IoT traffic, and (iii) the need for seamless mobility support. NR-U offers 5G-like performance without high spectrum cost [31], but to realize this potential, an adaptive and decentralized load balancing mechanism tailored to industrial NR-U environments is necessary to handle the specialized needs of NR-U private networks.
Recently, multi-agent deep reinforcement learning (MA-DRL) has been gaining traction as a distributed optimization paradigm for wireless resource management to address scalability issues [32,33,34,35]. In [32], a multi-agent Q-learning framework was designed for real-time load balancing of next-generation cellular networks, achieving better throughput–handover trade-offs than existing heuristics. The authors in [33] proposed a distributed MA-DRL transmit power control scheme for multi-cell networks, incorporating LSTM-enhanced actor–critic agents to improve spectral efficiency while minimizing inter-cell interference. MA-DRL was applied to heterogeneous UAV swarms, introducing cluster-based spectrum sharing strategies that enable cooperative resource allocation across diverse agent types [34]. The work of [35] addressed mobility in non-terrestrial networks with an MA-DRL-based handover scheme for mega-constellations, adapting to dynamic propagation conditions more effectively than rule-based approaches. While [32,33,34,35] show the versatility of MA-DRL in fostering distributed and adaptive wireless optimization, they lack formal convergence guarantees, which limits their applicability to industrial, dynamic NR-U network environments that require high service reliability. In addition, these works do not consider spectrum fluctuations due to coexistence mechanisms such as LBT in NR-U.
In this paper, a load-aware load-balancing procedure and a reinforcement learning traffic-steering algorithm are proposed to efficiently distribute traffic in NR-U networks. Load balancing is cast as a game theoretic equilibrium problem to alleviate complexity by tackling it in a decentralized manner. A reinforcement learning algorithm that utilizes a game theoretic policy is developed to efficiently drive the load distribution to equilibrium. An extended System Information Block (SIB) is introduced to regularly inform UEs in the network about their local network's state (i.e., the load status of gNBs within communications range). The SIBs from the gNBs carry instantaneous load information and are broadcast periodically. This facilitates offloading by exploiting UEs' selfish behavior in a game theoretic setting and by leveraging the conditional handover scheme introduced in [36]. That is, the UEs have complete local network information that enables them to make well-informed decisions to switch to the most favorably loaded gNB. We conduct comprehensive system-level simulations in NS3-gym and show that the proposed procedure and algorithm outperform existing work by attaining lower network load variances while achieving higher system throughput and higher QoS satisfaction levels.
The rest of this paper is organized as follows. Section 2 defines a system model for the NR-U network and formulates the load-balancing problem. Section 3 presents the reinforcement learning framework to address the problem. Section 4 develops the procedure and algorithm for load balancing. Section 5 describes the simulation results, and Section 6 concludes the paper.

2. System Model and Problem Formulation

2.1. The Network Model

In this work, we consider a standalone 5G NR-U private wireless network composed of gNBs, UEs, and WiFi APs, as shown in Figure 1. NR-U is a 5G feature introduced by 3GPP in Release 16 that allows 5G services to operate in unlicensed spectrum bands [5]. The gNBs are base stations that provide radio access network (RAN) connectivity to devices over the unlicensed spectrum. The UEs constitute a wide range of smart factory IoT devices, from sensors and actuators to autonomously guided vehicles, all requiring reliable wireless connectivity. The APs represent non-5G technologies operating in the unlicensed spectrum, such as WiFi, that impact NR-U network performance due to the coexistence requirements. For example, gNB 2 in Figure 1 must share the available unlicensed spectrum with WiFi AP 1 in its cell. Hence, the spectrum resource availability of gNB 2 depends on the traffic load of AP 1, which in turn impacts the access link setup between the gNBs and the UEs. Under a low traffic load on AP 1, UE 2 is associated with gNB 2, as shown with the solid red line (the NR-U access link). But when AP 1 has a high traffic load, UE 2 could be forced to associate with gNB 1.
With regards to the notation used in this paper, $G$ and $U$ denote the sets of all gNBs and UEs in the network, respectively. Indexes $i$ and $j$ indicate a specific gNB and UE, respectively. That is, $G = \{i : i \in \{1, 2, \ldots, n\}\}$ and $U = \{j : j \in \{1, 2, \ldots, m\}\}$, where $n$ and $m$ denote the total numbers of gNBs and UEs in the network, respectively. When the signal strength received by UE $j$ from gNB $i$ is above a certain threshold, $j$ is considered to be within the coverage area of $i$. $U_i$ denotes the subset of UEs within the coverage area of each gNB $i \in G$, and the set of gNBs within the communication range of UE $j$ is denoted by $G_j$.
Single connectivity is considered; that is, a UE is assumed to be served by one gNB at any particular time. The association between a UE and a gNB at time $t$ is represented by an association indicator variable, $\phi_{i,j}(t)$, $(i \in G, j \in U)$, given by
$$\phi_{i,j}(t) = \begin{cases} 1, & \text{if } j \text{ is served by } i \text{ at time } t, \\ 0, & \text{otherwise.} \end{cases}$$
The data rate demand of a UE $j \in U$ is denoted by $R_j$.

2.2. NR-U Channel Access Procedure

Channel access in NR-U follows the rules of the LBT [11] procedure, a mechanism used in wireless communication systems (including 5G NR) to improve spectrum efficiency and reduce interference when multiple devices share the same frequency band. In NR-U, any transmission on the downlink (DL) or uplink (UL) is preceded by LBT. Before initiating a transmission, the NR-U device must listen to the channel (the frequency) it intends to use to reduce interference and collision probabilities. After ensuring there are no ongoing transmissions for a predefined period, the device initiates its own transmission after a back-off window. Figure 2 shows an example scenario where two UEs are trying to launch UL transmissions. At the beginning, both UEs execute a Clear Channel Assessment to check for any ongoing transmissions by other devices or technologies, and both sense that the channel is clear. The UEs then enter a back-off period; UE1 has a back-off window of three slots, and UE2 has five. As a result, UE1 determines that the channel is clear first and starts its transmission. However, UE2 has to defer transmission (see the first LBT to Tx SR under UE2 in Figure 2) and launches LBT again in the next time frame. We assume that the NR-U gNBs employ a time-dependent channel selection algorithm to account for dynamic changes. To simplify the analysis, each NR-U gNB $i$ is considered to have a time-varying bandwidth $W_i(t)$.
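The back-off contention described above can be sketched in a few lines of Python. This is an illustrative toy only, not the full NR-U LBT procedure (which adds defer periods and contention-window adaptation); the device names and window sizes mirror the Figure 2 scenario.

```python
def lbt_contention(backoffs):
    """Toy LBT model: every device has sensed the channel idle and counts
    down a random back-off window; the device whose counter expires first
    wins the channel, and the rest defer to the next frame."""
    winner = min(backoffs, key=backoffs.get)
    deferred = [dev for dev in backoffs if dev != winner]
    return winner, deferred

# Figure 2 scenario: UE1 draws a back-off window of 3 slots, UE2 draws 5,
# so UE1 transmits first while UE2 defers and re-runs LBT in the next frame.
tx, deferred = lbt_contention({"UE1": 3, "UE2": 5})
```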

2.3. Channel Model

In this work, we focus our analysis on sub-6 GHz single-band operation in the downlink direction to study load balancing. With $h_{j,i}(t) \in \mathbb{R}^+$, we represent the channel gain between UE $j$ and gNB $i$ in time slot $t$. The downlink transmission rate, according to the Shannon channel capacity [37], between gNB $i$ and UE $j$ at time $t$ is given as
$$r_{i,j}(t) = W_i(t) \log_2 \left( 1 + \frac{|h_{j,i}(t)|^2 P_{tx}}{\sigma_{i,j}^2} \right), \quad \forall i, j, \qquad (1)$$
where $P_{tx}$ is the fixed transmission power, $\sigma_{i,j}^2$ is the noise power, and $W_i(t)$ is the available spectrum bandwidth at time $t$. The noise power $\sigma_{i,j}^2$ is assumed to be fixed, and the channel gain, $h_{j,i}(t)$, follows a random process with probability distribution $f(h_{j,i}(t))$.
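As a quick numerical sanity check of Equation (1), the rate can be evaluated for illustrative values; the bandwidth, gain, and power figures below are arbitrary and not taken from the paper's simulation setup.

```python
import math

def downlink_rate(W_hz, h_gain, p_tx, noise_power):
    """Shannon-capacity downlink rate of Equation (1):
    r = W * log2(1 + |h|^2 * P_tx / sigma^2)."""
    snr = (abs(h_gain) ** 2) * p_tx / noise_power
    return W_hz * math.log2(1.0 + snr)

# With 20 MHz of bandwidth, unit channel gain, and a 20 dB SNR
# (P_tx / sigma^2 = 100), the link supports roughly 133 Mbps.
r = downlink_rate(W_hz=20e6, h_gain=1.0, p_tx=100.0, noise_power=1.0)
```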

2.4. Load Measurement

A capacity utilization ratio is chosen in order to account for disparities in the resource capacities of the gNBs in the network. That is, a gNB's load is interpreted as the amount of radio resources required to satisfy the traffic forwarded to the gNB compared to the resource capacity of the gNB. The load contribution of UE $j$ on gNB $i \in G_j$ is denoted by $\ell_{i,j}(t)$ and is given by
$$\ell_{i,j}(t) = \frac{R_j(t)\,\phi_{i,j}(t)}{r_{i,j}(t)}, \qquad (2)$$
where $R_j(t)$ is the data rate demand of UE $j$ at time $t$, and $r_{i,j}(t)$ is the transmission rate of the channel between UE $j$ and gNB $i$ at time $t$, as given by Equation (1). The total load on gNB $i$ at time $t$ is then obtained by summing the contributions of all UEs served by $i$:
$$\ell_i(t) = \sum_{j \in U_i} \ell_{i,j}(t).$$
In practice, data rates are influenced by several factors, such as transmission power, interference, modulation scheme, and frequency band. While richer models could jointly capture power control and interference dynamics, we adopt a resource-utilization-ratio-based load measure to focus on load-balancing strategies. The radio resource utilization ratio provides a fair abstraction relating UE demand to gNB radio resources. It normalizes the load across heterogeneous resource capacities and enables tractable load balancing under a dynamic NR-U spectrum. Utilization-based load metrics are well established in both academia and standardization [38,39,40], representing load values normalized between 0 and 1. For a given transmission power, inter-cell interference and frequency band effects are implicitly reflected in $r_{i,j}(t)$ through the channel model.
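The utilization-ratio load measure can be computed directly from per-UE demands and link rates, as in this minimal sketch with made-up numbers:

```python
def cell_load(demands, rates):
    """Total utilization-ratio load of one gNB: each served UE j
    contributes R_j / r_{i,j}, and the cell load is the sum over all
    served UEs, a value normalized between 0 and 1 when feasible."""
    return sum(R / r for R, r in zip(demands, rates))

# Two served UEs demanding 5 Mbps and 10 Mbps over 50 Mbps and 40 Mbps
# links load the cell to 0.1 + 0.25 = 0.35 of its radio resources.
load = cell_load(demands=[5e6, 10e6], rates=[50e6, 40e6])
```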

2.5. Problem Formulation

Load balancing is essential for industrial NR-U networks to adapt to rapidly varying load and network conditions [41]. The traffic demand on industrial networks varies significantly and abruptly due to shifts in production lines, UE mobility, and emergency situations. In addition, the spectrum available for NR-U gNBs varies due to spectrum sharing by coexisting technologies. The rapid changes in network demand and the dynamic availability of spectrum create uneven network load distributions, which degrades overall network performance and QoS for the UEs. By ensuring that resources are distributed according to the needs of different applications and UEs, load balancing enhances the overall quality of service and the experience.
In this paper, we address performance degradation due to load imbalance in NR-U networks by optimally allocating UEs to available gNBs in a manner that balances the load and satisfies QoS requirements for each UE. To that end, we define the following optimization problem using load variance as a measure of balance:
$$\begin{aligned} \min_{\Phi(t)} \quad & \frac{1}{|G|} \sum_{i \in G} \left(\ell_i - \bar{\ell}\right)^2 \\ \text{s.t.} \quad & \sum_{i \in G_j} r_{i,j}(t)\,\phi_{i,j}(t) \ge R_j, \quad \forall j \in U, \\ & 0 < \ell_i < 1, \quad \forall i \in G, \\ & \phi_{i,j}(t) \in \{0, 1\}, \quad \forall i \in G_j,\; \forall j \in U, \\ & \sum_{i \in G_j} \phi_{i,j}(t) = 1, \quad \forall j \in U, \end{aligned} \qquad (3)$$
where $\bar{\ell}$ is the mean load, defined as
$$\bar{\ell} = \frac{1}{|G|} \sum_{i \in G} \ell_i,$$
and $\Phi(t)$ is the optimization variable that defines the pairings of UEs with their serving gNBs, represented by binary variables $\phi_{i,j}(t)$. The first constraint in (3) imposes user-demanded traffic rate satisfaction, and the second constraint reflects the utilization-ratio load metric. The third and fourth constraints address the binary nature of the association indicator variables and the single connectivity assumption, respectively.
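For intuition on problem (3), a tiny instance can be solved by exhaustive search over all UE-gNB pairings. This is a brute-force sketch for toy sizes only; the strict positivity constraint on cell loads is relaxed so that an empty cell is allowed.

```python
from itertools import product

def best_association(rates, demands):
    """Brute-force solver for a toy instance of problem (3): minimise the
    load variance subject to rate satisfaction, load < 1, and single
    connectivity. rates[i][j] is r_{i,j}; demands[j] is R_j."""
    n, m = len(rates), len(demands)
    best, best_var = None, float("inf")
    for assoc in product(range(n), repeat=m):   # assoc[j] = gNB serving UE j
        if any(rates[assoc[j]][j] < demands[j] for j in range(m)):
            continue                            # rate constraint violated
        loads = [0.0] * n
        for j, i in enumerate(assoc):
            loads[i] += demands[j] / rates[i][j]
        if any(l >= 1.0 for l in loads):
            continue                            # overload constraint violated
        mean = sum(loads) / n
        var = sum((l - mean) ** 2 for l in loads) / n
        if var < best_var:
            best, best_var = assoc, var
    return best, best_var

# Two identical gNBs and two identical UEs: the optimum splits the UEs
# across the gNBs, driving the load variance to zero.
assoc, var = best_association(rates=[[50e6, 50e6], [50e6, 50e6]],
                              demands=[10e6, 10e6])
```

The search space grows as $n^m$, which is exactly the combinatorial blow-up behind the NP-hardness that motivates the decentralized game theoretic reformulation that follows.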
The problem in (3) is a binary integer programming problem, which is NP-hard [42]. To cope with the complexity of the NP-hard load-balancing problem, it is beneficial to transform (3) into a form that can be tackled in a decentralized manner. To that end, we first devise a UE-gNB association cost function, $C_{i,j}(\Phi(t))$, so that the load variance minimization in (3) is transformed into a cost minimization problem, as follows:
$$C_{i,j}(\Phi(t)) = \begin{cases} \dfrac{1}{1 - \ell_i(t)}, & \text{if } r_{i,j}(t) > R_j, \\ \infty, & \text{otherwise.} \end{cases} \qquad (4)$$
With this UE-gNB association cost definition, the load-balancing problem in (3) can be recast as a cost minimization problem in a game theoretic setting, as follows:
$$\begin{aligned} \min_{\Phi(t)} \quad & \sum_{j \in U} C_{i,j}(\Phi(t)) \\ \text{s.t.} \quad & \sum_{i \in G_j} r_{i,j}(t)\,\phi_{i,j}(t) \ge R_j, \quad \forall j \in U, \\ & 0 < \ell_i < 1, \quad \forall i \in G, \\ & \phi_{i,j}(t) \in \{0, 1\}, \quad \forall i \in G_j,\; \forall j \in U, \\ & \sum_{i \in G_j} \phi_{i,j}(t) = 1, \quad \forall j \in U. \end{aligned} \qquad (5)$$
Under the game theoretic setting, the solution to (5) is an equilibrium solution. That is, all users incur minimal and equal costs when equilibrium is reached. Since the cost function in (4) is defined in terms of cell load, an equilibrium solution to (5) solves (3) as well. To justify this, without loss of generality, consider two UEs $h, j$ in the overlapping coverage area of two gNBs $i, k$, and assume that $h$ and $j$ are associated with $i$ and $k$, respectively. When the UEs reach equilibrium, $C_{i,h}(\Phi) = C_{k,j}(\Phi)$, which, by the definition in (4), means $\ell_i = \ell_k$.
With formulation (5), load balancing is relaxed to a load-aware greedy choice problem from set G j for each UE j U . Since there are multiple UEs in the network that affect each other’s greedy association choices, it is essential to intelligently adjust UE associations in a manner that progressively leads to an even and stable load distribution. To that end, the cost minimization problem presented in (5) can be tackled using a reinforcement learning framework that leverages a game theoretic policy. This approach encourages UEs to independently adjust their traffic distribution to achieve equilibrium and minimize their individual cost function, as defined in (4). At equilibrium, all UE costs are equalized, which entails equalized loads due to the cost definition in (4).
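The cost function (4) that drives these greedy choices is straightforward to evaluate. The loads and rates below are illustrative; note how an infeasible link is priced out entirely.

```python
import math

def association_cost(load_i, rate_ij, demand_j):
    """UE-gNB association cost of Equation (4): 1 / (1 - load) when the
    link can carry the UE's demand, infinite otherwise."""
    if rate_ij > demand_j:
        return 1.0 / (1.0 - load_i)
    return math.inf

# A cell at 20% load costs 1.25 and one at 80% load costs ~5.0, so lightly
# loaded cells are preferred; a 5 Mbps link that cannot meet the 10 Mbps
# demand gets infinite cost and is never selected.
costs = [association_cost(0.2, 50e6, 10e6),   # lightly loaded, feasible
         association_cost(0.8, 50e6, 10e6),   # heavily loaded, feasible
         association_cost(0.1, 5e6, 10e6)]    # infeasible link
```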

3. Game Theoretic Reinforcement Learning Model of the Load Balancing Problem

In this section, a reinforcement learning framework for the load-balancing problem is established, together with a brief discussion of the essential game theory and reinforcement learning concepts.

3.1. Game Theory

Game theory is a mathematical modeling tool that provides a suitable basis for investigating strategic interactions among multiple rational decision makers. In a game theoretic model, a game is defined in terms of players, actions, strategies, and outcomes. Players and actions refer to the interacting agents and their decisions. The strategies comprise the complete set of rules that specify how a player decides to act in every possible circumstance, and the outcomes (or payoffs) dictate how players choose an action or strategy. Game theory has proven to be an indispensable tool in various fields [43,44], with different game settings utilized in each. Since this paper focuses on network load balancing to avoid congestion, the essential definitions related to network selection games are provided below.
Definition 1.
(Network Selection Game): A network selection game is defined by a tuple $\Gamma = \langle U, (G_j)_{j \in U}, (C_{i,j})_{j \in U, i \in G_j} \rangle$, where $U = \{1, 2, \ldots, m\}$ represents a finite set of users demanding to route their traffic to and from the network, $G_j$ represents a finite set of strategies for player $j \in U$, pertaining to the set of target gNBs for UE $j$ to choose from, and $C_{i,j}$ represents the cost (or payoff) function of each $j \in U$ with respect to its neighboring gNBs.
Definition 2.
(Wardrop Equilibrium): A feasible UE-gNB association vector, $\Phi^*$, is at Wardrop equilibrium if, for every user $j \in U$ and gNBs $i, k \in G_j$ with $\phi_{i,j}^* > \phi_{k,j}^*$, $C_{i,j}(\Phi^*) \le C_{k,j}(\Phi^*)$ holds for all $k \in G_j \setminus \{i\}$.

3.2. Reinforcement Learning

The reinforcement learning technique is a dynamic learning framework in which an agent learns to self-optimize its actions by interacting with the environment [45]. We recast the load-balancing problem as a reinforcement learning problem as follows. To fully describe the load-balancing problem in reinforcement learning terms, we first need to specify the state space $S$, the action space $A$, the policy $\pi$, and the reward function $R$.
State Space: To foster distributed optimization, each user is required to be load-aware in order to select a cell that serves it best. Therefore, each user $j$ is assumed to observe timely information on its neighbors' cell load status, $S_j(t) := \{\ell_i(t) : i \in G_j\}$. Load information is broadcast regularly.
Action Space: The action space is defined in terms of the candidate target gNBs for each UE $j$. Therefore, it is defined as $A_j(t) = G_j(t)$, $\forall j \in U$.
Policy: A mixed-strategy Wardrop equilibrium policy is defined on a per-user basis as
$$\pi_{j,t}(i \mid S_j(t)) = \frac{1 - \ell_i(t-1)}{\sum_{k \in G_j} \left(1 - \ell_k(t-1)\right)}, \quad \forall i \in G_j. \qquad (6)$$
The policy in (6) defines a probability distribution over the action space $A_j(t)$ of UE $j$ in proportion to the resource availability of the candidate target gNBs.
Reward Function: The goal is to avoid cell overload conditions and minimize the load variance of the whole network. Cell load measures are given as a utilization ratio. In order to discourage UEs from switching to cells with a higher load, the reward function of UE $j$ taking action $a_j(t) = i$ (offloading at time $t$ from its current serving gNB $k$ to gNB $i$) and transitioning from current state $s_j(t)$ to next state $s_j(t+1)$ is defined as
$$R_{j,k \to i}\left(s_j(t), a_j(t) = i, s_j(t+1)\right) = C_{k,j}(\Phi(t)) - C_{i,j}(\Phi(t)). \qquad (7)$$
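The mixed-strategy policy of Equation (6) reduces to normalizing the spare capacities of the candidate gNBs, as in this minimal sketch with made-up neighbor loads:

```python
def wardrop_policy(neighbor_loads):
    """Mixed-strategy Wardrop policy of Equation (6): a UE selects
    candidate gNB i with probability proportional to its spare
    capacity 1 - load_i, steering traffic toward lightly loaded cells."""
    spare = {i: 1.0 - load for i, load in neighbor_loads.items()}
    total = sum(spare.values())
    return {i: s / total for i, s in spare.items()}

# Neighbor loads of 0.5, 0.8, and 0.7 leave spare capacities of 0.5, 0.2,
# and 0.3, giving selection probabilities of 0.5, 0.2, and 0.3.
pi = wardrop_policy({"gNB1": 0.5, "gNB2": 0.8, "gNB3": 0.7})
```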
Based on the GT-RL framework defined above, the Wardrop equilibrium load-balanced traffic-steering procedure and the algorithm are presented in Section 4.

4. Proposed Load-Balancing Procedure and Algorithm

We propose a load-aware, distributed load-balancing procedure and algorithm for NR-U networks. For load awareness, the proposed method defines a load information broadcasting mechanism by extending the 5G NR system information. For distributed load balancing, a mechanism is introduced to facilitate UE-assisted load redistributions by pre-configuring possible target gNBs for conditional offloading and allowing the UEs to intelligently select a target gNB based on load awareness. This allows UEs to adapt to changing conditions using the reinforcement learning algorithm. By applying a load-aware adaptive user association strategy driven by reinforcement learning, the algorithm ensures the optimal distribution of UEs across gNBs, effectively balancing the network load while meeting the QoS demands of UEs. Details of the load-balancing procedure and algorithm are provided in the following subsections.

4.1. Operational Framework

To facilitate distributed, UE-assisted load redistribution for load balancing, a mechanism for conditional offloading is introduced. The conditional offloading is realized by extending conditional handover (CHO), a feature introduced in 3GPP Release 16 [46]. The overall design of the proposed load-balancing procedure is illustrated in Figure 3, which involves two phases: the conditional offloading pre-configuration phase and the offloading evaluation phase. In the pre-configuration phase, the serving gNB prepares CHO configuration information, CHOInfo, which includes TargetCellInfo and CHOTriggerConditions. CHOTriggerConditions includes a newly introduced parameter, cho_lb_en, which informs the UE whether to use the proposed conditional offloading load balancing. Other CHO parameters, such as the common RSRP threshold, RSRPThreshold, and the time to trigger, TTT, can also be set during the configuration phase, depending on the demands of the network operator. The serving gNB then sends a Radio Resource Control (RRC) reconfiguration message containing the CHO configuration parameters to the UEs. The UEs receive the CHO configuration and start monitoring the pre-configured events and conditions, i.e., the load state of the serving and candidate gNBs.
To provide load awareness, an NR-U-tailored System Information Block (system information is downlink broadcast information transmitted periodically by a base station and is vital for a UE to maintain its connection with the base station in any radio access technology) is introduced to broadcast the network load status information from the gNBs, extending the existing 5G SIBs while still adhering to 3GPP standards. The load information is carried by a newly introduced SIB, which is given the suffix x and is henceforth referred to as SIBx. SIBx is 40 bits long and contains three information elements: a sequence number, the cell ID, and the cell load information. The structure of SIBx is shown as a legend in Figure 2.
SIBx is periodically broadcast with cycle time $T_L$ so that all users in the RAN regularly obtain true load information from the gNBs in their locality. With SIBx, every UE is periodically updated about the load status of its candidate gNBs. According to the 3GPP specifications for 5G NR, System Information Block #1 (SIB1) carries information related to the availability and scheduling of the other SIBs [47]. Accordingly, the periodicity and type of SIBx are specified with the help of SIB1.
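The 40-bit SIBx payload can be packed and parsed in a few lines of Python. The paper fixes the total size at 40 bits and names the three information elements, but this section does not give the individual field widths, so the 8/16/16-bit split below is purely an illustrative assumption; the load, a utilization ratio in [0, 1], is quantized to 16 bits here.

```python
import struct

# Assumed (not specified in the paper) field split: 8-bit sequence number,
# 16-bit cell ID, 16-bit quantized cell load -> 40 bits total.
SIBX_FMT = "!BHH"

def pack_sibx(seq, cell_id, load):
    """Serialize one SIBx broadcast; the load ratio is quantized to 16 bits."""
    return struct.pack(SIBX_FMT, seq & 0xFF, cell_id, round(load * 0xFFFF))

def unpack_sibx(payload):
    seq, cell_id, q_load = struct.unpack(SIBX_FMT, payload)
    return seq, cell_id, q_load / 0xFFFF

msg = pack_sibx(seq=7, cell_id=42, load=0.35)
seq, cid, load = unpack_sibx(msg)   # 5 bytes on the wire = 40 bits
```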
In the evaluation phase, a UE evaluates the pre-configured CHO trigger conditions and re-associates itself with a candidate target cell when the conditions are met. A UE that receives an RRCReconfiguration message in the CHO context evaluates the association costs of its alternative target gNBs using Equation (4). The load information conveyed by SIBx from neighboring gNBs is exploited during the evaluation. The UE then picks the gNB with the minimum association cost using the GT-RL algorithm described in the next subsection, starts synchronizing with it, and detaches from its current gNB.

4.2. Load Balancing

The game-theory-inspired reinforcement learning algorithm detailed in Algorithm 1 is proposed to intelligently choose a target gNB for a UE experiencing overload on its current gNB. The GT-RL load-balancing algorithm optimizes the association of UEs to gNBs in order to achieve equilibrium in the load distribution of NR-U networks. The input to the algorithm includes the initial UE-gNB association $\Phi(0)$ and the model parameters, namely, the exploration–exploitation trade-off $\epsilon$, its floor $\epsilon_{min}$, and the decay rate $\gamma$. The desired output is an optimal UE-gNB association $\Phi^*$.
The algorithm adopts an iterative approach, adjusting the UE-gNB associations for up to $T_{max}$ iterations. At each iteration $t$, flag variable association_stable is initially set to True, indicating that the association is assumed to be stable unless changes occur. Then, for each UE $j \in U$, the algorithm performs the following steps. First, it acquires neighboring cell list $G_j(t)$ from UE measurements. Next, it observes the load of the neighboring gNBs, $S_j(t)$, using the SIBx information. It identifies the current serving gNB $k$ for UE $j$, i.e., the gNB with $\phi_{k,j}(t-1) = 1$. The algorithm then determines the preferred target gNB $i$ for UE $j$ by invoking the game theoretic epsilon-greedy policy function defined in Algorithm 2.
The epsilon-greedy GTPolicy(·) function involves the following steps. A random value $p$ is drawn from a uniform distribution between 0 and 1. If $p \le \epsilon$, a target gNB is selected based on the probability distribution $\pi_{j,t}$, which is determined using the game-equilibrium approach defined by Equation (6). Otherwise, the target gNB selected is the one with the maximum reward based on (7). The function returns the chosen target gNB, which updates the UE-gNB association, which in turn impacts the load state $S_j(t+1)$.
Algorithm 1 The GT-RL load-balancing algorithm
  Input: Initial UE-gNB association Φ(0), initial model parameters ϵ, ϵ_min, decay rate γ
  Output: An optimal UE-gNB association, Φ*, that achieves an equilibrium load distribution
1:  t ← 1, association_stable ← False
2:  while ¬association_stable and t < T_max do
3:      association_stable ← True
4:      for every j ∈ U do
5:          Get neighbor gNBs G_j(t) from UE measurements
6:          Observe neighbor cell loads S_j(t) using SIBx info
7:          k ← {m ∈ G_j(t−1) | ϕ_{m,j}(t−1) = 1}
8:          i ← GTPolicy(j, k, G_j(t), S_j(t), ϵ)
9:          if k ≠ i then
10:             ϕ_{i,j}(t) ← 1
11:             ϕ_{k,j}(t) ← 0, ∀k ∈ G_j(t) ∖ {i}
12:             association_stable ← False
13:         end if
14:     end for
15:     ϵ ← max(ϵ_min, ϵ · γ)
16:     t ← t + 1
17: end while
18: return Φ(t) as Φ*
Algorithm 2 GTPolicy subroutine.
 1: function GTPolicy(j, k, G_j(t), S_j(t), ϵ)
 2:     Draw a random value p from the uniform distribution on the interval [0, 1]
 3:     if p ≤ ϵ then
 4:         Calculate the probability distribution π_{j,t}(·), ∀i ∈ G_j(t), using (6)
 5:         target is chosen from G_j(t) according to π_{j,t}(·)
 6:     else
 7:         Compute serving gNB cost C_{k,j}(·) using (4)
 8:         for every i ∈ G_j(t) ∖ {k} do
 9:             Compute candidate gNB cost C_{i,j}(·) using (4)
10:             Compute reward R_{j,k→i}(·) using (7)
11:         end for
12:         target ← argmax_i R_{j,k→i}(·)
13:     end if
14:     return target
15: end function
When the gNB i selected by the GTPolicy(·) function differs from the current serving gNB k, the association is updated. Specifically, UE j is associated with the new gNB i by setting φ_{i,j}(t) to 1 and φ_{k,j}(t) to 0 for all neighboring gNBs k ∈ G_j(t) ∖ {i}. Consequently, the flag association_stable is set to False, indicating that the association has changed and the equilibrium load distribution has not yet been reached. If, however, association_stable remains True after all UEs have been checked, the algorithm concludes that equilibrium has been reached, and it returns the current association Φ(t) as the optimal association Φ*. Otherwise, it decays the exploration parameter via ϵ ← max(ϵ_min, ϵ·γ) and proceeds with the next iteration.
This iterative UE-gNB association update performed by Algorithm 1, combining reinforcement learning principles with game theory, ensures that the load distribution across the network becomes more balanced over time, potentially reaching an optimal equilibrium in which network resources are efficiently utilized. The users are treated as uncoordinated selfish agents that seek out the least-loaded gNBs by exploiting the newly introduced load-state broadcast scheme. Users are thus incentivized to shift their traffic to less-loaded gNBs, which steers traffic away from heavily loaded gNBs and gradually brings the gNB loads into balance.
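The per-UE decision logic of Algorithms 1 and 2 can be sketched in Python as below. This is a minimal sketch, not the paper's implementation: the game-equilibrium distribution of Equation (6) is realized as sampling proportional to spare capacity (1 − load), the reward of Equation (7) is abstracted as a caller-supplied function since it builds on the cost terms of Equation (4), and a stay-unless-positive-reward guard stands in for the CHO trigger gating.

```python
import random

def gt_policy(j, k, neighbors, loads, eps, reward):
    """Epsilon-greedy target selection (sketch of Algorithm 2).

    With probability eps, explore: sample a target with probability
    proportional to spare capacity (1 - load), a stand-in for the
    game-equilibrium distribution of Eq. (6).  Otherwise exploit:
    pick the candidate with the largest switching reward (Eq. (7),
    supplied as a callable); the UE stays on its serving gNB k
    unless some candidate offers a strictly positive reward.
    """
    if random.random() < eps:
        spare = [1.0 - loads[i] for i in neighbors]
        if sum(spare) == 0.0:          # all candidates saturated
            return k
        return random.choices(neighbors, weights=spare)[0]
    best, best_r = k, 0.0
    for i in neighbors:
        if i == k:
            continue
        r = reward(j, k, i)
        if r > best_r:
            best, best_r = i, r
    return best

def gt_rl_balance(assoc, ues, neighbors_of, observe_loads,
                  eps, eps_min, gamma, reward, t_max=100):
    """Iterative UE-gNB re-association (sketch of Algorithm 1)."""
    for _ in range(t_max):
        stable = True
        for j in ues:
            k = assoc[j]                     # current serving gNB
            i = gt_policy(j, k, neighbors_of(j),
                          observe_loads(j),  # load info from SIBx
                          eps, reward)
            if i != k:
                assoc[j], stable = i, False
        eps = max(eps_min, eps * gamma)
        if stable:                           # equilibrium reached
            break
    return assoc
```

In a toy run with three UEs initially on one of two equal-capacity gNBs and a reward that compares loads after the move, the loop settles into a 2/1 split and terminates once no UE changes its association.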
The proposed game theoretic reinforcement learning framework leverages only local load information accessible to UEs and gNBs, allowing for distributed decision making. This decentralized approach eliminates the need for global network state information collection, improves scalability, and reduces computational complexity, while still ensuring load equilibrium. In contrast, centralized load-balancing strategies, despite their theoretical optimality, depend on the aggregation of extensive global information, such as channel conditions, traffic demands, and cell loads, at a central controller. Such requirements incur considerable signaling overhead in dense NR-U deployments, and the associated optimization problem is NP-hard, resulting in high computational costs. Consequently, the proposed framework constitutes a practical and efficient alternative to centralized schemes for dynamic large-scale NR-U networks.

4.3. Discussion of Convergence, Complexity, and Scalability

We present a proof of convergence based on the Banach fixed-point theorem [48]. The theorem guarantees convergence to a unique fixed point if the policy mapping is a contraction over a complete metric space. For convergence to be established on this basis, two key conditions must be satisfied: the existence of a fixed-point solution (a fixed point here refers to a policy, i.e., an association policy in this work's context, that remains unchanged after the application of a policy iteration), and the requirement that the Wardrop equilibrium-based policy given by Equation (6) induces a contraction mapping. The load equilibrium (Definition 2) serves as the convergence target of the policy iteration and thus functions as the fixed point. The contraction mapping condition is justified as follows.
Let $\boldsymbol{\ell}(t) \in [0,1]^n$ denote the load vector over all $n$ gNBs at time $t$, and define the load update mapping $T : [0,1]^n \to [0,1]^n$ induced by the Wardrop equilibrium-based policy given by Equation (6). We equip the load space $L = [0,1]^n$ with the infinity norm $\|\cdot\|_\infty$, defined as $\|\boldsymbol{\ell}\|_\infty = \max_i |\ell_i|$. This space is a complete metric space. To prove convergence, it suffices to show that $T$ is a contraction mapping under the infinity norm. To this end, we demonstrate that the underlying policy (6) defines a valid probability distribution and satisfies the Markov property, thereby inducing a contraction mapping, as shown below.
For any user $j$ and time $t$, the association probabilities dictated by Equation (6) form a valid probability distribution because they sum to one:
$$\sum_{i \in G_j} \pi_{j,t}\bigl(i \mid S_j(t)\bigr) = \sum_{i \in G_j} \frac{1-\ell_i(t-1)}{\sum_{k \in G_j}\left(1-\ell_k(t-1)\right)} = \frac{\sum_{i \in G_j}\left(1-\ell_i(t-1)\right)}{\sum_{k \in G_j}\left(1-\ell_k(t-1)\right)} = 1.$$
It is also apparent that Equation (6) exhibits the Markov property, since
$$\pi_{j,t}\bigl(i \mid S_j(t)\bigr) = T\bigl(\boldsymbol{\ell}(t-1)\bigr),$$
where $T$ is the induced load update mapping derived from the user association probabilities $\pi_{j,t-1}(\cdot)$. By construction, (6) promotes load balancing by directing UEs to shift their traffic preferentially toward less congested gNBs. Repeated application of (6) induces a progressive transformation of the load vector over time, namely $\boldsymbol{\ell}(t) \xrightarrow{\pi_t} \boldsymbol{\ell}(t+1)$, driving the system toward equilibrium.
Observe that the association policy in Equation (6) assigns lower probabilities to more congested gNBs, thereby redistributing user associations toward less loaded gNBs. Unless the system has reached equilibrium, the policy always shifts users away from the maximally loaded gNBs. Hence, the maximum load value $\|\boldsymbol{\ell}(t)\|_\infty$ strictly decreases over time until a fixed point is reached. Therefore, since the game theoretic policy induces a contraction, the Banach fixed-point theorem [48], which states that every contraction mapping on a complete metric space has a unique fixed point, guarantees that the load vector $\boldsymbol{\ell}(t)$ converges to the fixed point $\boldsymbol{\ell}^*$, which is the equilibrium load.
A high-level assessment of complexity and scalability follows from the clear decomposition of computational tasks within the proposed framework. Specifically, each serving gNB incurs a per-iteration overhead of $O(|U_i|)$ for selecting UEs for conditional offloading, while each selected UE contributes $O(|G|)$ operations for target gNB identification. The resulting aggregate complexity is therefore bounded by $O(|U_i| + |U_{\mathrm{sel}}| \cdot |G|)$, with $|U_{\mathrm{sel}}| \leq |U_i|$. Importantly, this decomposition highlights the distributed and parallelizable nature of the algorithm: the gNBs manage admission control, while the UEs handle their own decision processes. As a result, the computational cost grows only linearly with the number of associated UEs and candidate gNBs, ensuring favorable scalability and practical feasibility in dense, large-scale network deployments.

5. Performance Evaluation

To assess the effectiveness of the proposed procedure and algorithm, system-level simulations were conducted in ns3-gym [49]. We considered a dense indoor scenario [50] consistent with modern industrial networks. Simulations were conducted by varying the number of gNBs and UEs deployed throughout an indoor area 120 m long and 50 m wide. Two scenarios were considered by altering the placement of the gNBs and UEs. Figure 4 depicts the two simulation scenarios. In the first scenario, Figure 4a, the gNBs are regularly distributed over the whole indoor area, whereas the UEs are placed randomly. In the second scenario, shown in Figure 4b, the gNBs are placed randomly along with the UEs. The simulation parameters are summarized in Table 1.
To validate the performance of the proposed algorithm, we compared it with three previous algorithms: a heuristic adaptive-threshold MLB approach referred to as Adaptive-MLB [21], a deep reinforcement learning-based MLB referred to as DQN-MLB [28], and a heuristic QoS-aware Wi-Fi load-balancing approach referred to as QA-LB [17]. Adaptive-MLB is based on a centralized self-organizing network (SON) for small-cell networks. DQN-MLB proposes a two-layer DRL SON to accommodate large-scale networks. QA-LB is a joint user-association and channel-assignment solution for Wi-Fi networks. For the performance evaluation, we considered four metrics: speed of convergence, network load variance, system throughput, and the percentage of UEs satisfied with QoS.

5.1. Convergence Rate Comparison

To assess the convergence rate of the proposed method in comparison to the other algorithms, we conducted simulations with both the regular and irregular deployment scenarios illustrated in Figure 4a,b. For these simulations, 6 gNBs and 40 UEs were considered. The traffic demand of each UE was configured as a guaranteed bit rate of 5 Mbps. The standard deviation of the gNB loads was monitored across iterations during the load-balancing process to evaluate how quickly the proposed method achieved an even distribution in comparison to the other algorithms.
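The monitored convergence metric, the standard deviation of per-gNB loads, can be computed as in the following sketch. Here load is approximated as the fraction of a gNB's capacity claimed by the GBR demands of its associated UEs, a simplification of the utilization-based load used in the paper, and all names are illustrative:

```python
import statistics

def load_std(assoc, gnbs, demand_mbps, capacity_mbps):
    """Standard deviation of per-gNB load across the network.

    assoc maps UE -> serving gNB; the load of a gNB is approximated
    as the total GBR demand of its UEs divided by its capacity.
    """
    loads = [
        sum(demand_mbps[u] for u, g in assoc.items() if g == gnb)
        / capacity_mbps[gnb]
        for gnb in gnbs
    ]
    return statistics.pstdev(loads)
```

A perfectly even split of UEs over identical gNBs yields a standard deviation of zero, which is the target the convergence curves in this section approach.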
Figure 5 shows the load variance convergence rates of the GT-RL method compared to previous methods in the deployment depicted in Figure 4a. The proposed method demonstrated high adaptability, converging to an even load distribution more rapidly than the other algorithms. The steep decline in variance during the initial iterations of the load-balancing process indicates the strong adaptability of the GT-RL method, to which three key ideas described in Section 4.1 contribute. First, the load information broadcasting scheme enhances load awareness. Second, the game theoretic reinforcement learning policy aids in picking the optimal gNB based on load. Third, the conditional offloading scheme enables immediate association switching by UEs. QA-LB, being rule-based and non-adaptive, resulted in a less efficient load distribution, as indicated by its higher variance values. DQN-MLB and Adaptive-MLB were able to adjust their load distributions based on network conditions. However, lacking a specialized strategy for swift load-aware traffic redistribution, they took longer to converge and produced larger load variances than the proposed method.
To verify that the GT-RL method maintains robust adaptability under various network deployment options, another simulation was conducted using the irregular deployment in Figure 4b. Figure 6 shows the convergence rate results. As can be seen from the figure, the proposed method demonstrated faster load-balancing convergence than the other algorithms for the irregular deployment as well. Load awareness makes the proposed method versatile across deployment options. UEs are strategically pre-configured, utilizing the game theoretic strategy combined with the load-aware conditional offloading scheme, to dynamically select the gNBs offering the best service. Thus, all UEs that identify a more suitable gNB are encouraged to switch simultaneously, resulting in a swift redistribution of the load. Therefore, the proposed method is more adaptive than the other algorithms under both regular and irregular deployments, as illustrated in Figure 5 and Figure 6, respectively.

5.2. Impact of Network Load

To evaluate the performance of the proposed algorithm under different network load conditions, multiple simulations were performed by varying the total number of UEs in the regular deployment scenario. For this evaluation, 10 gNBs were considered and the number of UEs was varied between 20 and 100. In this setup, 10% of the UEs exhibited random waypoint mobility at a pedestrian speed of 1 m/s. Traffic demand by the UEs was set to guaranteed bit rates with an average of 2.75 Mbps and a variance of 4.5 Mbps, representing moderate industrial IoT traffic [51]. The load variance, throughput, and percentage of UEs satisfied with QoS were measured for each load condition.
Owing to the remarkable convergence rate demonstrated in Figure 5, the proposed algorithm redistributed the network load more evenly under the various load conditions, as demonstrated in Figure 7. The GT-RL algorithm achieved smaller load variance values than the other algorithms under all load conditions considered, with a minimum 10% improvement in reducing network load variance. It brought the variance closer to zero more effectively, which helps mitigate performance degradation due to load imbalance.
The proposed algorithm also outperformed the others in the percentage of UEs satisfied with QoS and in network throughput, as illustrated in Figure 8 and Figure 9, respectively. The proposed method dynamically links UEs with the most suitable gNBs, making more resources available to meet the UEs' required data rates and thereby improving UE QoS satisfaction and throughput. Larger performance gains were observed as the number of UEs in the network increased: when the number of UEs is high, the network is more likely to become congested, which requires a robust load-balancing algorithm to meet UE data rate demands. With 100 UEs, the proposed method achieved about a 5% improvement in both QoS satisfaction and throughput compared to DQN-MLB, and over 10% compared to the other two algorithms. Overall, the proposed method surpassed the others by delivering a more balanced load distribution, improved user satisfaction, and increased throughput, as can be seen from Figure 7, Figure 8, and Figure 9, respectively. The GT-RL method thus proved adaptive to various network load conditions.
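The QoS-satisfaction metric can be read as the share of UEs whose achieved rate meets their guaranteed bit rate demand. A minimal sketch under that reading follows; the exact satisfaction test is not spelled out in this section, so this is an assumption:

```python
def qos_satisfaction_pct(achieved_mbps, demanded_mbps):
    """Percentage of UEs whose achieved rate meets their GBR demand.

    A UE absent from achieved_mbps is treated as receiving 0 Mbps.
    """
    satisfied = sum(
        1 for ue, gbr in demanded_mbps.items()
        if achieved_mbps.get(ue, 0.0) >= gbr
    )
    return 100.0 * satisfied / len(demanded_mbps)
```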

5.3. Impact of Load Dynamics

To assess the adaptiveness of the proposed method to dynamic load conditions, simulations were conducted by varying the percentage of mobile UEs under the regular deployment scenario in Figure 4a. For this evaluation, 10 gNBs and 100 UEs were considered. The actual percentage of mobile UEs, which considered random waypoint mobility at a pedestrian speed of 1 m/s, varied from 10% to 40% to intentionally introduce varying degrees of a dynamic load. The load variance, throughput, and percentage of UEs satisfied with QoS were measured for each simulation instance.
Figure 10 shows the load distribution evaluations for the proposed and other algorithms. As shown in Figure 10, the proposed method is more robust in dynamic situations than the other algorithms, maintaining the lowest load variance. The load distribution of the proposed algorithm remained almost unaffected as the percentage of dynamic UEs varied from 10% to 40%. The GT-RL method is endowed with the specialized System Information Block for load awareness, and the UEs are pre-configured to immediately switch their association as soon as they see a more favorable gNB, which makes the proposed method highly adaptive to fluctuating load conditions. Consequently, this led to substantial improvements in the percentage of UEs satisfied with QoS and in overall system throughput, as illustrated in Figure 11 and Figure 12, respectively. Owing to its highly adaptive nature, the GT-RL method is robust under dynamic conditions and achieved nearly 100% UE satisfaction. The performance of the other algorithms degraded significantly as the percentage of dynamic UEs increased because they lack a comparably adaptive mechanism. When the percentage of dynamic UEs was set to 40%, the proposed method achieved an 8% gain in satisfied UEs compared to DQN-MLB and at least a 15% gain compared to the other two algorithms, as seen in Figure 11. The proposed method enhanced system throughput by 7.61% compared to DQN-MLB and by at least 17% compared to the other algorithms. Overall, the GT-RL method demonstrated superior performance under dynamic load conditions due to its highly adaptive nature, consistently achieving higher throughput and close to 100% UE satisfaction.

5.4. Impact of Network Size

To scrutinize the adaptiveness of the GT-RL algorithm to varying network sizes, additional simulations were conducted, varying the number of randomly deployed gNBs between 6 and 10 while the number of UEs was fixed at 100. The traffic demand of the UEs was set to guaranteed bit rates with an average of 2.75 Mbps and a variance of 4.5 Mbps, with 10% of the UEs exhibiting random waypoint mobility at a pedestrian speed of 1 m/s.
Figure 13 shows the comparison of load distributions. The proposed method achieved a more even load distribution, reflected in the lowest standard deviation. The load-aware conditional offloading idea adapts to various network sizes because it enables UEs to dynamically choose the gNB that provides the best possible service. As the number of gNBs in the network increases, the UEs associated with an overloaded cell have more options for offloading; consequently, the load variance decreases, reflecting a more even load distribution. For the various conditions considered, the GT-RL algorithm achieved the lowest variance, with a performance gain of at least 5%. By attaining equilibrium in the load distribution, the proposed method ensures UEs have enough resources to serve their data rate demands. Performance comparisons of the percentage of UEs satisfied with QoS and of system throughput are shown in Figure 14 and Figure 15, respectively. As seen in these figures, the GT-RL method achieved higher throughput and UE satisfaction levels for the various network sizes under consideration. The decentralized, UE-assisted conditional offloading is highly adaptive to various network deployment options.

6. Conclusions

In this paper, we proposed a load-aware load-balancing procedure with a game theoretic reinforcement learning algorithm for 5G NR-U networks. The proposed method enhances load balancing through three key ideas. First, an extended SIB message for load awareness provides UEs with direct insight into the load status of neighboring gNBs. Second, a game theoretic reinforcement learning policy helps UEs choose the optimal gNB based on load conditions. Lastly, the conditional offloading mechanism enables UEs to promptly switch to the most suitable gNB. Together, these strategies enable the proposed method to achieve swift load redistributions. Simulation results showed that the GT-RL method outperformed previous algorithms by driving the network into an even load distribution and achieving higher network throughput and UE QoS satisfaction. Especially under high load dynamics, the proposed method achieved an 8% gain in UE satisfaction with QoS and a 7.61% gain in network throughput compared to DQN-MLB, while compared to the non-AI approaches, UE QoS satisfaction and network throughput were enhanced by more than 15%.
Beyond its simulation performance, our GT-RL framework offers a significant theoretical advantage over conventional DRL approaches. While DRL faces challenges with convergence guarantees, especially under the spectrum fluctuations induced by the LBT mechanism in NR-U, our method ensures convergence and is inherently aware of LBT constraints. Consequently, the proposed GT-RL framework represents a robust and practical solution, balancing theoretical soundness with real-world applicability, making it a prime candidate for enhancing service reliability in demanding industrial 5G NR-U environments.

Author Contributions

Conceptualization, Y.T.S., S.M.S., T.M.D. and S.K. (Sungoh Kwon); Methodology, Y.T.S., S.M.S., T.M.D. and S.K. (Sungoh Kwon); Software, Y.T.S.; Validation, S.M.S., T.M.D., S.K. (Sungmin Kim) and S.K. (Sungoh Kwon); Formal analysis, Y.T.S.; Writing—original draft, Y.T.S.; Writing—review & editing, S.M.S., T.M.D., S.K. (Sungmin Kim) and S.K. (Sungoh Kwon); Visualization, Y.T.S. and S.K. (Sungoh Kwon); Supervision, S.K. (Sungmin Kim) and S.K. (Sungoh Kwon). All authors have read and agreed to the published version of the manuscript.

Funding

This result was supported by the “Regional Innovation System & Education (RISE)” through the Ulsan RISE Center, funded by the Ministry of Education (MOE) and the Ulsan Metropolitan City, Republic of Korea (2025-RISE-07-001).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wen, M.; Li, Q.; Kim, K.J.; Lopez-Perez, D.; Dobre, O.; Poor, H.V.; Popovski, P.; Tsiftsis, T. Private 5G Networks: Concepts, Architectures, and Research Landscape. IEEE J. Sel. Top. Signal Process. 2021, 16, 7–25. [Google Scholar] [CrossRef]
  2. Rostami, A. Private 5G Networks for Vertical Industries: Deployment and Operation Models. In Proceedings of the 2019 IEEE 2nd 5G World Forum (5GWF), Dresden, Germany, 30 September–2 October 2019; pp. 433–439. [Google Scholar] [CrossRef]
  3. Lu, Y.; Witherell, P.; Jones, A. Standard connections for IIoT empowered smart manufacturing. Manuf. Lett. 2020, 26, 17–20. [Google Scholar] [CrossRef]
  4. 3GPP. Technical Specification Group Radio Access Network; Study on NR-Based Access to Unlicensed Spectrum. Technical Recommendation (TR) 38.889, 3rd Generation Partnership Project (3GPP); 2018. Available online: https://www.3gpp.org/ftp//Specs/archive/38_series/38.889/38889-g00.zip (accessed on 1 September 2025).
  5. 3GPP. Introduction of 6GHz NR Unlicensed Operation. Technical Recommendation (TR) 38.849, 3GPP, 2022. 3GPP TR 38.849 Release 17. Available online: https://www.3gpp.org/ftp/Specs/archive/38_series/38.849/38849-h00.zip (accessed on 1 September 2025).
  6. Aijaz, A. Private 5G: The Future of Industrial Wireless. IEEE Ind. Electron. Mag. 2020, 14, 136–145. [Google Scholar] [CrossRef]
  7. Naik, G.; Park, J.M.; Ashdown, J.; Lehr, W. Next Generation Wi-Fi and 5G NR-U in the 6 GHz Bands: Opportunities and Challenges. IEEE Access 2020, 8, 153027–153056. [Google Scholar] [CrossRef]
  8. Maldonado, R.; Karstensen, A.; Pocovi, G.; Esswie, A.A.; Rosa, C.; Alanen, O.; Kasslin, M.; Kolding, T. Comparing Wi-Fi 6 and 5G Downlink Performance for Industrial IoT. IEEE Access 2021, 9, 86928–86937. [Google Scholar] [CrossRef]
  9. Ren, Q.; Wang, B.; Zheng, J.; Zhang, Y. Performance Modeling of an NR-U and WiFi Coexistence System Using the NR-U Category-4 LBT Procedure and WiFi DCF Mechanism in the Presence of Hidden Nodes. IEEE Trans. Veh. Technol. 2023, 72, 14801–14814. [Google Scholar] [CrossRef]
  10. Ak, E.; Canberk, B. Forecasting Quality of Service for Next-Generation Data-Driven WiFi6 Campus Networks. IEEE Trans. Netw. Serv. Manag. 2021, 18, 4744–4755. [Google Scholar] [CrossRef]
  11. Loginov, V.; Khorov, E.; Lyakhov, A.; Akyildiz, I.F. CR-LBT: Listen-Before-Talk With Collision Resolution for 5G NR-U Networks. IEEE Trans. Mob. Comput. 2022, 21, 3138–3149. [Google Scholar] [CrossRef]
  12. Yang, P.; Lei, J.; Kong, L.; Xu, C.; Zeng, P.; Khorov, E. Policy Learning based Cognitive Radio for Unlicensed Cellular Communication. In Proceedings of the GLOBECOM 2022—2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; pp. 1013–1018. [Google Scholar] [CrossRef]
  13. Mahmood, A.; Beltramelli, L.; Fakhrul Abedin, S.; Zeb, S.; Mowla, N.I.; Hassan, S.A.; Sisinni, E.; Gidlund, M. Industrial IoT in 5G-and-Beyond Networks: Vision, Architecture, and Design Trends. IEEE Trans. Ind. Inform. 2022, 18, 4122–4137. [Google Scholar] [CrossRef]
  14. Pourghebleh, B.; Hayyolalam, V. A comprehensive and systematic review of the load balancing mechanisms in the Internet of Things. Clust. Comput. 2020, 23, 641–661. [Google Scholar] [CrossRef]
  15. Tran, Q.H.; Duong, T.M.; Kwon, S. Load Balancing for Integrated Access and Backhaul in mmWave Small Cells. IEEE Access 2023, 11, 138664–138674. [Google Scholar] [CrossRef]
  16. Shahid, S.M.; Seyoum, Y.T.; Won, S.H.; Kwon, S. Load Balancing for 5G Integrated Satellite-Terrestrial Networks. IEEE Access 2020, 8, 132144–132156. [Google Scholar] [CrossRef]
  17. Gómez, B.; Coronado, E.; Villalón, J.; Riggio, R.; Garrido, A. User Association in Software-Defined Wi-Fi Networks for Enhanced Resource Allocation. In Proceedings of the 2020 IEEE Wireless Communications and Networking Conference (WCNC), Seoul, Republic of Korea, 25–28 May 2020; pp. 1–7. [Google Scholar]
  18. Lei, T.; Wen, X.; Lu, Z.; Li, Y. A Semi-Matching Based Load Balancing Scheme for Dense IEEE 802.11 WLANs. IEEE Access 2017, 5, 15332–15339. [Google Scholar] [CrossRef]
  19. Cao, F.; Zhong, Z.; Fan, Z.; Sooriyabandara, M.; Armour, S.; Ganesh, A. User association for load balancing with uneven user distribution in IEEE 802.11ax networks. In Proceedings of the 2016 13th IEEE Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 9–12 January 2016; pp. 487–490. [Google Scholar]
  20. Oh, H.S.; Jeong, D.G.; Jeon, W.S. Joint Radio Resource Management of Channel-Assignment and User-Association for Load Balancing in Dense WLAN Environment. IEEE Access 2020, 8, 69615–69628. [Google Scholar] [CrossRef]
  21. Hasan, M.M.; Kwon, S.; Na, J.H. Adaptive Mobility Load Balancing Algorithm for LTE Small-Cell Networks. IEEE Trans. Wirel. Commun. 2018, 17, 2205–2217. [Google Scholar] [CrossRef]
  22. Addali, K.; Kadoch, M. Enhanced Mobility Load Balancing Algorithm for 5G Small Cell Networks. In Proceedings of the 2019 IEEE Canadian Conference of Electrical and Computer Engineering (CCECE), Edmonton, AB, Canada, 5–8 May 2019; pp. 1–5. [Google Scholar] [CrossRef]
  23. Addali, K.M.; Chang, Z.; Lu, J.; Liu, R.; Kadoch, M. Mobility Load Balancing with Handover Minimization for 5G Small Cell Networks. In Proceedings of the 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, 15–19 June 2020; pp. 1222–1227. [Google Scholar] [CrossRef]
  24. Singh, N.P. Efficient network selection using game theory in a heterogeneous wireless network. In Proceedings of the 2015 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Madurai, India, 10–12 December 2015; pp. 1–4. [Google Scholar] [CrossRef]
  25. Alhabo, M.; Zhang, L.; Nawaz, N. Energy efficient handover for heterogeneous networks: A non-cooperative game theoretic approach. Wirel. Pers. Commun. 2022, 122, 2113–2129. [Google Scholar] [CrossRef]
  26. Saha, N.; Vesilo, R. An evolutionary game theory approach for joint offloading and interference management in a two-tier HetNet. IEEE Access 2017, 6, 1807–1821. [Google Scholar] [CrossRef]
  27. Attiah, K.; Banawan, K.; Gaber, A.; Elezabi, A.; Seddik, K.; Gadallah, Y.; Abdullah, K. Load Balancing in Cellular Networks: A Reinforcement Learning Approach. In Proceedings of the 2020 IEEE 17th Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 10–13 January 2020; pp. 1–6. [Google Scholar] [CrossRef]
  28. Xu, Y.; Xu, W.; Wang, Z.; Lin, J.; Cui, S. Load Balancing for Ultradense Networks: A Deep Reinforcement Learning-Based Approach. IEEE Internet Things J. 2019, 6, 9399–9412. [Google Scholar] [CrossRef]
  29. Feriani, A.; Wu, D.; Xu, Y.T.; Li, J.; Jang, S.; Hossain, E.; Liu, X.; Dudek, G. Multiobjective Load Balancing for Multiband Downlink Cellular Networks: A Meta-Reinforcement Learning Approach. IEEE J. Sel. Areas Commun. 2022, 40, 2614–2629. [Google Scholar] [CrossRef]
  30. Raharya, N.; She, C.; Hardjawana, W.; Vucetic, B. Deep Learning for Distributed User Association in Massive Industrial IoT Networks. In Proceedings of the 2021 IEEE Wireless Communications and Networking Conference (WCNC), Nanjing, China, 29 March–1 April 2021; pp. 1–6. [Google Scholar] [CrossRef]
  31. Noor-A-Rahim, M.; John, J.; Firyaguna, F.; Sherazi, H.H.R.; Kushch, S.; Vijayan, A.; O’Connell, E.; Pesch, D.; O’Flynn, B.; O’Brien, W.; et al. Wireless Communications for Smart Manufacturing and Industrial IoT: Existing Technologies, 5G and Beyond. Sensors 2023, 23, 73. [Google Scholar] [CrossRef] [PubMed]
  32. Alizadeh, A.; Lim, B.; Vu, M. Multi-Agent Q-Learning for Real-Time Load Balancing User Association and Handover in Mobile Networks. IEEE Trans. Wirel. Commun. 2024, 23, 9001–9015. [Google Scholar] [CrossRef]
  33. Kim, H.; So, J. Distributed Multi-Agent Deep Reinforcement Learning-Based Transmit Power Control in Cellular Networks. Sensors 2025, 25, 4017. [Google Scholar] [CrossRef]
  34. Liao, X.; Wang, Y.; Han, Y.; Li, Y.; Lin, C.; Zhu, X. Heterogeneous Multi-Agent Deep Reinforcement Learning for Cluster-Based Spectrum Sharing in UAV Swarms. Drones 2025, 9, 377. [Google Scholar] [CrossRef]
  35. Liu, H.; Wang, Y.; Li, P.; Cheng, J. A multi-agent deep reinforcement learning-based handover scheme for mega-constellation under dynamic propagation conditions. IEEE Trans. Wirel. Commun. 2024, 23, 13579–13596. [Google Scholar] [CrossRef]
  36. 3GPP. NR; Radio Resource Control (RRC) protocol specification. Technical Specification (TS) 38.331, 3rd Generation Partnership Project (3GPP), 2022. Version 16.9.0. Available online: https://www.3gpp.org/ftp/Specs/archive/38_series/38.331/38331-g90.zip (accessed on 1 September 2025).
  37. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  38. Management and Orchestration; Performance Measurements for NG-RAN. Technical Report TS 28.552, 3GPP, 2021. Available online: https://www.3gpp.org/ftp/Specs/archive/28_series/28.552/28552-f00.zip (accessed on 1 September 2025).
  39. Siomina, I.; Yuan, D. Analysis of Cell Load Coupling for LTE Network Planning and Optimization. IEEE Trans. Wirel. Commun. 2012, 11, 2287–2297. [Google Scholar] [CrossRef]
  40. Salo, J.; Zacarias, L. Analysis of LTE Radio Load and User Throughput. Int. J. Comput. Netw. Commun. 2017, 9, 29–47. [Google Scholar] [CrossRef]
  41. Lucas-Estañ, M.C.; Gozalvez, J. Load Balancing for Reliable Self-Organizing Industrial IoT Networks. IEEE Trans. Ind. Inform. 2019, 15, 5052–5063. [Google Scholar] [CrossRef]
  42. Prasad, N.; Arslan, M.; Rangarajan, S. Exploiting Cell Dormancy and Load Balancing in LTE HetNets: Optimizing the Proportional Fairness Utility. IEEE Trans. Commun. 2014, 62, 3706–3722. [Google Scholar] [CrossRef]
  43. Moura, J.; Hutchison, D. Game Theory for Multi-Access Edge Computing: Survey, Use Cases, and Future Trends. IEEE Commun. Surv. Tutor. 2019, 21, 260–288. [Google Scholar] [CrossRef]
  44. Balasundaram, A.; Rajesh, L. A survey on game theoretic approach in wireless networks. In Proceedings of the 2014 International Conference on Communication and Network Technologies, Sivakasi, India, 18–19 December 2014; pp. 308–313. [Google Scholar] [CrossRef]
  45. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  46. 3GPP. NR and NG-RAN Overall Description; Stage-2, 2020. Available online: https://www.3gpp.org/ftp//Specs/archive/38_series/38.300/38300-g20.zip (accessed on 1 September 2025).
  47. 3GPP. NR; Radio Resource Control (RRC); Protocol Specification. Technical Specification (TS) 38.331, 3rd Generation Partnership Project (3GPP), 2020. Version 16.2.0. Available online: https://www.3gpp.org/ftp/Specs/archive/38_series/38.331/38331-g00.zip (accessed on 1 September 2025).
  48. Turab, A.; Sintunavarat, W. On the solution of the generalized functional equation arising in mathematical psychology and theory of learning approached by the Banach fixed point theorem. Carpathian J. Math. 2023, 39, 541–551. [Google Scholar] [CrossRef]
  49. Gawłowicz, P.; Zubow, A. ns-3 meets OpenAI Gym: The Playground for Machine Learning in Networking Research. In Proceedings of the ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM), Miami Beach, FL, USA, 25–29 November 2019. [Google Scholar]
  50. Jiang, T.; Zhang, J.; Tang, P.; Tian, L.; Zheng, Y.; Dou, J.; Asplund, H.; Raschkowski, L.; D’Errico, R.; Jämsä, T. 3GPP standardized 5G channel model for IIoT scenarios: A survey. IEEE Internet Things J. 2021, 8, 8799–8815. [Google Scholar] [CrossRef]
  51. Cantero, M.; Inca, S.; Ramos, A.; Fuentes, M.; Martín-Sacristán, D.; Monserrat, J.F. System-Level Performance Evaluation of 5G Use Cases for Industrial Scenarios. IEEE Access 2023, 11, 37778–37789. [Google Scholar] [CrossRef]
Figure 1. System model of the considered NR-U network.
Figure 2. NR-U LBT channel access procedure.
Figure 3. Operational framework of the proposed load-balancing procedure.
Figure 4. Example scenarios.
Figure 5. Convergence rates from regular deployment.
Figure 6. Convergence rates from irregular deployment.
Figure 7. Comparison of load variance under varying number of UEs.
Figure 8. Comparison of UE satisfaction under varying number of UEs.
Figure 9. Comparison of throughput under varying number of UEs.
Figure 10. Load distribution under dynamic conditions.
Figure 11. UE satisfaction under dynamic conditions.
Figure 12. Throughput under dynamic conditions.
Figure 13. Load variance under irregular deployment.
Figure 14. Satisfied UEs under irregular deployment.
Figure 15. Throughput comparison under irregular deployment.
Table 1. Simulation parameters.

Parameter        Value
Deployment       Standalone
Frequency        5.2 GHz
Bandwidth        20 MHz
gNB Tx power     23 dBm
UE Tx power      17 dBm
Network layout   3GPP indoor office scenario [4]
Number of gNBs   Varies between 6 and 10
gNB placement    Regular in some scenarios; random in others with guaranteed area coverage
Number of UEs    Varies between 20 and 100
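For readers reproducing the setup, the parameters in Table 1 can be gathered into a single configuration object and swept over the UE counts used in the evaluation. The sketch below is a minimal illustration in Python; the class and field names are our own and are not taken from the paper's simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NrUSimConfig:
    """Simulation parameters from Table 1 (field names are illustrative)."""
    deployment: str = "Standalone"
    carrier_freq_ghz: float = 5.2       # unlicensed 5 GHz band
    bandwidth_mhz: float = 20.0
    gnb_tx_power_dbm: float = 23.0
    ue_tx_power_dbm: float = 17.0
    layout: str = "3GPP indoor office scenario"
    num_gnbs: int = 6                   # varied between 6 and 10
    num_ues: int = 20                   # varied between 20 and 100

    def validate(self) -> None:
        # Enforce the parameter ranges stated in Table 1.
        if not 6 <= self.num_gnbs <= 10:
            raise ValueError("number of gNBs must be between 6 and 10")
        if not 20 <= self.num_ues <= 100:
            raise ValueError("number of UEs must be between 20 and 100")


# Sweep the UE counts (20 to 100 in steps of 20) used in the comparisons:
configs = [NrUSimConfig(num_ues=n) for n in range(20, 101, 20)]
for cfg in configs:
    cfg.validate()
```

A frozen dataclass keeps each scenario's parameters immutable, so a configuration cannot drift between the load-balancing runs being compared.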
Share and Cite

MDPI and ACS Style

Seyoum, Y.T.; Shahid, S.M.; Duong, T.M.; Kim, S.; Kwon, S. NR-U Network Load Balancing: A Game Theoretic Reinforcement Learning Approach. Electronics 2025, 14, 3986. https://doi.org/10.3390/electronics14203986
