Game-Based Resource Allocation Mechanism in B5G HetNets with Incomplete Information

Featured Application: This work makes resource allocation more practical by addressing the problem that important information about either mobile users or HetNets is difficult to acquire during the resource allocation process.

Abstract: Ultra-dense and highly heterogeneous network (HetNet) deployments make the allocation of limited wireless resources among ubiquitous Internet of Things (IoT) devices an unprecedented challenge in 5G and beyond (B5G) networks. The interactions among mobile users and HetNets remain to be analyzed, where mobile users choose the optimal networks to access and the HetNets adopt proper methods for allocating their own network resources. Existing works generally assume complete information among mobile users and HetNets. However, this assumption is impractical in realistic situations, where important individual information is protected and not made public. This paper proposes a distributed pricing and resource allocation scheme based on a Stackelberg game with incomplete information. The proposed model is more practical because it addresses the problem that important information about either mobile users or HetNets is difficult to acquire during the resource allocation process. Because channel gain information cannot be known by others, the follower game among users is modeled as an incomplete information game, and the channel gain is regarded as the type of each player. Given the pricing strategies of the networks, users adjust their bandwidth requesting strategies to maximize their expected utility. Based on the sub-equilibrium obtained in the follower game, the networks correspondingly update their pricing strategies toward the optimum. The existence and uniqueness of the Bayesian Nash equilibrium are proved. A probabilistic prediction method makes the incomplete information game feasible, and a backward induction method is used to obtain the game equilibrium. Simulation results show the superior performance of the proposed method.


Introduction
The rapid proliferation of intelligent objects in the Internet of Things (IoT) brings a tremendous increase in data traffic in 5G and beyond networks. The Cisco Visual Networking Index [1], updated in 2019, predicts that, by 2022, mobile data traffic will reach a seven-fold increase over 2017 and average 5G connections will be three times more than 4G connections. Meanwhile, with the diverse deployment of 5G heterogeneous networks (HetNets), mobile users are able to access one or more wireless networks simultaneously to obtain better network resources and quality of service (QoS). This trend increases the demand and competition for limited network resources among ubiquitous IoT devices. Furthermore, each HetNet in a 5G and beyond environment should adopt efficient and effective resource allocation methods to improve resource utilization and optimize global system performance. Consequently, the allocation and management of HetNet resources remains a crucial issue in need of prompt solutions.
Numerous works have addressed the above issues. Game-based methods in particular have proved efficient and practical for improving the allocation of limited resources in realistic situations, where the strategies of mobile users and networks interact with one another. Zhang et al. [2] propose a potential game-based distributed resource allocation optimization algorithm in HetNets with mobile edge computing. Xu et al. [3] address the uplink cross-layer (physical layer and link layer) resource allocation problem for heterogeneous wireless access, which is based on imperfect channel state information and queue state information, respectively. The authors of References [4,5] analyze the interactions among users and networks based on the Stackelberg game, where both mobile users and networks obtain their optimal equilibrium strategies. Game theory has also been used to solve the problem of distributed power control and dynamic pricing in References [6,7]. Meanwhile, in References [8][9][10], game theory is utilized to address the issue of network selection, with mobile users aiming to selfishly maximize their own utilities. Heterogeneous vehicular network selection is studied in Reference [8] under conditions where the performance parameters of some networks are changing. Li et al. [9] analyze the resource sharing problem in cloud radio access network (C-RAN)-based cellular networks and model the problem as a noncooperative game. Yan et al. [10] study the unmanned aerial vehicle (UAV) access selection and base station (BS) bandwidth allocation problems by presenting a hierarchical game framework. A bargaining game-based framework is proposed in Reference [11] to solve the joint power and bandwidth allocation in cognitive HetNets, while frequency and timeslot selection problems are studied in Reference [12], based on the potential game.
Game theory-enabled methods are also utilized to solve the resource allocation issue in the device-to-device (D2D) communication scenario [13][14][15][16]. Specifically, power control and resource assignment problems in a cognitive radio-based HetNet in which cognitive D2D pairs coexist with cellular users are solved in Reference [13], and the noncooperative game is adopted in Reference [14] to address the cross-tier and co-tier interference in D2D communications. Rudenko et al. [15] focus on the security of both cellular and D2D communications through a game-based resource allocation scheme, while Sun et al. [16] propose a coalition formation game-based joint resource allocation and power control algorithm to maximize the D2D system throughput. Furthermore, the optimal allocation of the spectrum resource is studied in References [17,18], based on the n-person game and the Stackelberg game, respectively. Meanwhile, in Reference [19], cooperative game theory is utilized to solve the fairness-aware resource sharing problem in 5G environments, and Xie et al. [20] address the caching resource allocation problem based on a competitive game.
All of the above game-based resource allocation algorithms are proposed under a restrictive condition, in which each player has complete information about the related parameters of the others. However, it is unrealistic to expect users to make their important information public.
Thus, in order to solve the realistic problem that game players, i.e., the HetNets and mobile users in this paper, cannot acquire complete information about others, a distributed pricing and resource allocation method based on the Stackelberg game with incomplete information is proposed in this paper. The resource competition among users is modeled as the follower game with incomplete information. To be practical, the channel gain is regarded as private information, which is the type of each player. Given the price strategies of the networks, users need to adjust their bandwidth requesting strategies to maximize their expected utility. We construct a utility function for users that is suitable for the incomplete information scenario and consists of the profit obtained from throughput, the fee to buy bandwidth, and the cost caused by the network load. A probabilistic prediction method is proposed to make the follower game with incomplete information feasible. A distributed iteration algorithm is used to find the Bayesian Nash equilibrium. The leader game is played among networks, in which network operators change their pricing strategies based on the user strategy profiles to obtain the Nash equilibrium. Additionally, a distributed algorithm based on the network's marginal utility is used to find the best response dynamics of the networks. A mathematical derivation and a system simulation both support the existence and uniqueness of the equilibrium. The main contributions of this paper are listed as follows:


(1) We propose a Stackelberg game model with incomplete information to solve the problem in which players have only partial information about others during the process of resource competition.
(2) We design a practical utility function for each mobile user and propose a probabilistic prediction method to make the incomplete information game feasible. A proof of the existence and uniqueness of the Bayesian Nash equilibrium is presented.
(3) The Bayesian Nash equilibrium obtained from the game with incomplete information is compared with the Nash equilibrium in the complete information game, and superior performance is achieved through system simulation.
The rest of the paper is organized as follows. Section 2 shows the system model of the two-tiered Stackelberg game with incomplete information. Section 3 details the proposed game-based distributed resource allocation algorithm and Section 4 illustrates the simulation results. Finally, the conclusion is provided in Section 5.

System Model
The Stackelberg game is a hierarchical decision-making game, in which the players who enforce their strategies on the other players are called 'leaders', while the players who respond to the leaders' actions are called 'followers'. In a leader-follower game, the followers are rational and act to optimize their own payoffs given the leaders' strategies. However, because it is difficult to obtain the core information of others in many situations, a Bayesian approach offers a way to solve the game with incomplete information. In a Bayesian game, a probability distribution is used to express the belief about the private information of others, which is referred to as the type of a player. A game-based resource allocation model in a two-stage heterogeneous wireless network consisting of multiple networks and multiple mobile users is illustrated in Figure 1. Without loss of generality, we consider service area 1 to be covered only by network 1 (Net 1), while area 2 is covered by both Net 1 and network 2 (Net 2). In this model, network operators act as the leaders: they provide users with limited bandwidth and set their price strategies to maximize their own profits. Mobile users play in the second stage as the followers, adjusting their bandwidth requests according to the strategies of the different operators and their own payoff functions. We consider the practical situation in which the channel gain between a mobile user and its associated network receiver is available to this user only and is regarded as the type of the user.
Without loss of generality, we assume that the communication between a user and a network can be either uplink or downlink. The SINR (signal-to-interference-and-noise ratio) received by a user is given in Equation (1), where $i$ denotes an arbitrary user in the user set, $h_{ij}$ is the channel gain from user $i$ to network $j$, and $\sigma_0^2$ is the noise power. We assume that the noise power is the same for all users. The throughput of user $i$ obtained from network $j$ is given in Equation (2), where $q_{ij}$ is the bandwidth that user $i$ acquires from network $j$. In order to prevent network congestion, the sum of the bandwidth requests in the coverage area of network $j$ must not exceed a maximum level. This constraint is expressed in Equation (3), i.e., $\sum_i q_{ij} \le W_j$, where $W_j$ is the capacity of network $j$. In this model, $m$ network operators compete for $n$ users in their covered areas.
Disregarding other costs, the utility of network $j$ is given in Equation (4), which considers only its pure profit based on the price of bandwidth.
In Equation (4), $b_j$ is the price strategy of network $j$ and $Q_j$ is the load of network $j$, defined as the total bandwidth of the users who are connected to network $j$, that is, $Q_j = \sum_i q_{ij}$. The network seeks to maximize its profit by choosing an optimal bandwidth price strategy. If the price is too high, mobile users may switch to other networks, while, if it is too low, the profit may also decrease, even though the network has reached saturation.
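The load definition and the capacity constraint above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the profit function assumes that the pure profit of a network is its bandwidth price multiplied by its load, which is one plausible reading of the description of Equation (4) rather than the paper's confirmed formula.

```python
from typing import Sequence


def network_load(requests: Sequence[float]) -> float:
    """Load Q_j: total bandwidth requested by the users connected to network j."""
    return sum(requests)


def within_capacity(requests: Sequence[float], capacity: float) -> bool:
    """Congestion constraint: the summed requests must not exceed the capacity W_j."""
    return network_load(requests) <= capacity


def network_profit(price: float, requests: Sequence[float]) -> float:
    """Assumed pure profit: price per unit of bandwidth times the network load."""
    return price * network_load(requests)
```

For instance, ten users each requesting 5 MHz exactly fill a 50 MHz network, so `within_capacity([5.0] * 10, 50.0)` holds, while any larger request would violate the constraint.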
The competition among users is a noncooperative game with incomplete information. The utility of a mobile user is composed of the profit based on its throughput and its expense. We define the profit of user $i$ based on the acquired throughput in Equation (5), where $g_i$ is the throughput profit of user $i$, scaled by a user-specific constant related to the user experience.
The expense contains the fee to acquire the bandwidth and the cost caused by the network load, represented in Equations (6) and (7), respectively.
In Equation (7), the coefficient associated with network $j$ is a constant related to the specific access technology. Therefore, from Equations (5), (6), and (7), the utility of the mobile user is defined in Equation (8) by combining the throughput profit with the two expense terms.
The proposed Stackelberg game-based model contains two stages. In the first stage, the networks declare their price strategy profile to all users so that the users can request bandwidth from the connected networks. Because network resources are limited, the game among users becomes a noncooperative game with incomplete information, and the Bayesian Nash equilibrium is its solution. In the second stage, according to the equilibrium strategies of the users, each network adjusts its price strategy to obtain the optimal utility. In the next section, we first analyze the follower game and subsequently study the behavior of the networks in the leader game.

Distributed Resource Allocation Algorithm
To obtain the solution of the two-tier game model, consisting of a follower game among users and a leader game among networks, we propose a distributed iteration algorithm combined with a backward induction method to find the equilibrium in the two subgames. We analyze the follower game and the leader game in turn.

Follower Game
Given the bandwidth prices declared by the networks, mobile users compete with each other for the limited resources to maximize their payoffs. In the follower subgame with incomplete information, the channel gain is private information, regarded as the type of the mobile user, while its probability distribution is public information. As a result, we model the competition among users as a Bayesian game, where:

Players: the $n$ mobile users in the user set.
Actions: the action set of user $i$ is the set of its requested bandwidths $q_{ij}$.
Types: the type of user $i$ is its channel gain $h_{ij}$. We assume that the types of all users are continuous and share the same probability distribution, described by a common probability density function.
Strategies: the strategy of user $i$ is a mapping from its type set to its action set.
Payoffs: the payoff of user $i$ is a function of its action $q_{ij}$ and its type $h_{ij}$, given the actions and types of the others.

For the Bayesian resource-competition game, the expected utility of user $i$ is denoted as $u_i^F$ and is given in Equation (11). To obtain the Bayesian Nash equilibrium in the follower subgame, the best response of user $i$ is defined as the strategy that attains the highest expected utility with respect to the strategies of the other players and the belief about the probability distribution of their types. With a continuous type space, a pure-strategy Bayesian Nash equilibrium (BNE) can be obtained, as characterized in Equation (12). The proof shows that the expected utility is concave in the user's own bandwidth request, which guarantees the existence of the equilibrium.

□
The user's utility curve in the simulation section also exhibits this concavity, which corroborates the existence of the equilibrium point. Moreover, the uniqueness of the BNE can be proved by applying the Karush-Kuhn-Tucker conditions to Equation (11), which provide necessary and sufficient conditions for uniqueness [21].
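The role of the belief over types can be made concrete with a short Python sketch of a probabilistic prediction of the expected utility. This is an illustration under stated assumptions, not the paper's Equation (11): the utility form (a logarithmic throughput profit, a linear fee, and a load-dependent cost), the parameters `alpha` and `beta`, and the `strategy` mapping from type to request are all hypothetical, and the types of the other users are drawn from a uniform distribution on [0, 1).

```python
import math
import random
from typing import Callable


def utility(q: float, h: float, others_load: float, price: float,
            alpha: float = 2.0, beta: float = 0.01) -> float:
    """Hypothetical user utility: throughput profit minus fee minus load cost."""
    profit = alpha * math.log(1.0 + h * q)     # assumed concave throughput profit
    fee = price * q                            # fee paid for the requested bandwidth
    load_cost = beta * q * (q + others_load)   # assumed congestion cost
    return profit - fee - load_cost


def expected_utility(q: float, h: float, strategy: Callable[[float], float],
                     n_others: int, price: float,
                     samples: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected utility over the others' unknown types."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        # Each other user's channel gain is private; sample it from the public belief.
        others_load = sum(strategy(rng.random()) for _ in range(n_others))
        total += utility(q, h, others_load, price)
    return total / samples
```

Because the assumed utility is concave in `q` for every sampled type profile, the averaged estimate is concave in `q` as well, which mirrors the concavity argument used for the existence of the equilibrium.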
Given the price strategies for the allocated bandwidth set by the operators, mobile users need to reach the optimal strategy $q_i^*$ by adjusting their requests for the limited bandwidth. We apply a distributed algorithm to study this dynamic process, in which the rate of strategy change is directly proportional to the gradient of the expected utility function $u_i^F$ with respect to the user's bandwidth request, as in Equation (15). The proportionality factor is the learning rate of user $i$ (i.e., the rate of strategy adaptation), which is positive. Specifically, between two consecutive time slots, the dynamic iteration of the follower's strategy can be expressed as Equation (16). The concavity of the utility function ensures that the dynamic algorithm converges to the Bayesian Nash equilibrium.
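The gradient-based adaptation described above can be sketched in Python as follows. The sketch assumes a hypothetical concave utility, $\alpha \log(1 + hq) - bq - \beta q^2$, in place of the paper's expected utility, and the names `utility_gradient`, `best_request`, `rate`, `alpha`, and `beta` are illustrative only.

```python
def utility_gradient(q: float, h: float, price: float,
                     alpha: float = 2.0, beta: float = 0.05) -> float:
    """Derivative of the assumed utility alpha*log(1 + h*q) - price*q - beta*q**2."""
    return alpha * h / (1.0 + h * q) - price - 2.0 * beta * q


def best_request(h: float, price: float, rate: float = 0.5, q0: float = 0.0,
                 tol: float = 1e-10, max_iter: int = 10_000) -> float:
    """Iterate q <- q + rate * gradient, keeping q non-negative, until it settles."""
    q = q0
    for _ in range(max_iter):
        q_next = max(0.0, q + rate * utility_gradient(q, h, price))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q
```

With `h = 0.5` and `price = 0.2`, the iteration settles where the gradient vanishes; with a price high enough that the gradient is negative at zero, the request stays at zero. The learning rate must be small enough relative to the curvature of the utility for the iteration to converge.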

Leader Game
Once the BNE of the follower game is obtained, the network operators, as leaders, know the equilibrium strategies of the mobile users. The competition for mobile users among networks is modeled as a dynamic noncooperative game, in which the players act sequentially and each player can observe the decisions of the others. Thus, unlike the follower game, the leader game is played with complete information, and its solution is the Nash equilibrium. The payoff of network $j$ in the leader game with complete information is expressed in Equation (17), which contains only the profit obtained by selling bandwidth (as in Equation (4)).

The utility curve also supports the existence of the Nash equilibrium. Taking Net 1 as an example and examining how its utility changes with its price strategy, the network's utility function is seen to be concave in the bandwidth price.
Similar to the follower game, a distributed iteration algorithm is used to obtain the Nash equilibrium in the leader game. Based on the equilibrium strategy profile of the mobile users, the network operators adjust their price strategies correspondingly, and the rate of change is given by the network's marginal utility, as in Equation (18). The marginal utility is obtained by introducing a small increment into the network's utility function, as in Equation (19). As a result, the iteration of the price strategy of network $j$ can be expressed as Equation (20), which uses an adjustable step size for the price strategy of network $j$.
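A minimal Python sketch of this price adaptation is given below. It approximates the marginal utility with a forward difference, which is one way to realize the "small increment" idea; the names `marginal_utility`, `update_price`, `eps`, and `step`, as well as the linear demand response inside `revenue`, are assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable


def marginal_utility(utility: Callable[[float], float], price: float,
                     eps: float = 1e-4) -> float:
    """Forward-difference estimate of the utility's slope at the current price."""
    return (utility(price + eps) - utility(price)) / eps


def update_price(utility: Callable[[float], float], price: float,
                 step: float, eps: float = 1e-4) -> float:
    """One price iteration: move along the marginal utility, keeping the price >= 0."""
    return max(0.0, price + step * marginal_utility(utility, price, eps))


def revenue(price: float) -> float:
    """Hypothetical network utility: price times a linearly decreasing demand."""
    return price * max(0.0, 10.0 - 2.0 * price)
```

Repeatedly calling `update_price(revenue, b, step=0.1)` from `b = 1.0` moves the price toward 2.5, the maximizer of the assumed revenue curve, up to the small bias introduced by the finite increment.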
Once the best price strategy $b_j^*$ is obtained for all $j$, the Nash equilibrium of the leader game is reached. Subsequently, the whole two-tier game converges to its final equilibrium, denoted as $(q^*, b^*)$, where $q^*$ and $b^*$ represent, respectively, the optimal bandwidth requesting strategy profile of the mobile users and the optimal price strategy profile of the networks. Algorithm 1 details the proposed two-tier game-based distributed resource allocation algorithm. We assume that the networks announce their pricing strategy profile to all mobile users at the beginning of the follower game. In the pricing steps of Algorithm 1, each network $j$ updates its price according to Equations (19) and (20), after which the networks obtain the new pricing strategy.
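The overall backward-induction structure (an inner follower iteration nested inside an outer price iteration) can be sketched in Python as follows. This is a toy instance with one network and identical users, not a reproduction of Algorithm 1: the quadratic user utility $aq - \tfrac{1}{2}cq^2 - bq$, the per-user cap, and all parameter values are assumptions chosen so that the equilibrium is easy to check by hand.

```python
def follower_equilibrium(price: float, a: float = 4.0, c: float = 1.0,
                         cap_per_user: float = 5.0, rate: float = 0.5,
                         iters: int = 200) -> float:
    """Inner loop: gradient ascent on the assumed utility a*q - 0.5*c*q**2 - price*q."""
    q = 0.0
    for _ in range(iters):
        grad = a - c * q - price
        # Keep the request within [0, cap_per_user] so the capacity is respected.
        q = min(cap_per_user, max(0.0, q + rate * grad))
    return q


def leader_utility(price: float, n_users: int = 10) -> float:
    """Network profit given that all users respond with their equilibrium request."""
    return price * n_users * follower_equilibrium(price)


def stackelberg_equilibrium(price: float = 1.0, step: float = 0.02,
                            eps: float = 1e-4, iters: int = 200):
    """Outer loop: update the price along its finite-difference marginal utility."""
    for _ in range(iters):
        marginal = (leader_utility(price + eps) - leader_utility(price)) / eps
        price = max(0.0, price + step * marginal)
    return follower_equilibrium(price), price
```

Under these assumptions each user's best response is `(a - price) / c`, so the network profit peaks at `price = a / 2`; calling `stackelberg_equilibrium()` returns a request and a price that are both close to 2.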

Simulation Results
We considered a heterogeneous simulation scenario, illustrated in Figure 1, in which area 2 was covered by both an IEEE 802.16-based network (such as WIMAX (Worldwide Interoperability for Microwave Access)) and an IEEE 802.11-based network (such as WLAN (Wireless Local Area Network)), while the rest was covered only by WIMAX. For WIMAX, we assumed that the transmission range was 500 m and the saturation bandwidth was 50 MHz (i.e., $W_1 = 50$), while, for WLAN, the transmission range was 100 m and the saturation bandwidth was 11 MHz (i.e., $W_2 = 11$). The number of users in each area was 10. In practice, mobile users in the same area have similar factors influencing their QoS experience. Thus, we can study the follower game with two classes of users: users covered by multiple networks and users covered by only one network. Without loss of generality, coverage area 2 is a hot spot region where the mobile users can be served by both networks and obtain more network resources. Thus, we set the user-experience constant to 2 for all users in area 2 and to 1.5 for users in area 1.
To keep the simulation realistic, we adopted some key parameters from Reference [22] to model the proposed Stackelberg game. The channel gain, regarded as the user's type, was uniformly distributed between 0 and 1. The transmit power was set to an assumed value.
Figure 2 shows the utility of one mobile user in area 2, plotted for given pricing strategies of the two networks. As illustrated in this figure, the utility curve shows one and only one peak as the user's bandwidth requesting strategy iterates. Moreover, as described in Section 3, the mobile users in area 1 had the same objective function as the users in area 2 but a different strategy space; thus, similarly, one and only one peak can be obtained on the utility curve of a user in area 1.
Figure 3 illustrates the utility curve of WIMAX as a function of its pricing strategy, given all mobile users' bandwidth requesting strategies in the follower game. The above results support the existence of both the BNE in the follower game and the Nash equilibrium in the leader game, in agreement with the analysis in Section 3. In the following, we analyze the performance obtained in the follower game and the leader game, respectively.

Follower Game
Given the initial pricing strategies of the two networks, users in area 1 and area 2 played the resource competition game, adjusting their bandwidth requesting strategies to maximize their expected utilities. Figure 4 gives the adaptation trajectories of user strategies in area 1 and area 2. The "Area 1" curve shows the bandwidth strategies of users in area 1, which is covered only by WIMAX. The "Area 2 (WIMAX)" and "Area 2 (WLAN)" curves represent, respectively, the bandwidth requested from WIMAX and from WLAN by users in area 2. As shown in this figure, the strategies of mobile users in both area 1 and area 2 converge, which supports the existence of an equilibrium in the follower game.
Taking users in area 2 as an example, Figure 5 shows the impact of the different components of the utility function, including the profit based on the throughput, the fee to request bandwidth, and the cost caused by the load of the connected network. The profit based on throughput and the fee to buy bandwidth both increased gradually with the user's bandwidth request, while the increasing load of the networks caused the user's cost curve to fluctuate until it reached an equilibrium value.
The utility curves of mobile users in both area 1 and area 2 are shown in Figure 6. The utility of a mobile user in area 2 is higher than that of a user in area 1. This can be explained by the fact that mobile users in the area covered by multiple networks could acquire more resources than the users served by only one network. Furthermore, the utility curve of a mobile user in area 2 fluctuated until it reached a stable value, which was caused by the adjustment of the user's strategy in response to changes in the load of the networks.
Figure 7 analyzes the impact of the number of users on the utilities of users in area 1 and area 2, respectively. The X-axis is the number of users in area 2. We conducted this simulation considering the practical situation that the number of users in a hot spot region (i.e., area 2) increases over time. In this case, the utilities of users in both areas decreased as the number of players in the follower game increased. However, the utility of users in area 2 decreased faster than that of users in area 1. The reason is that the growing number of users in area 2 had a more serious impact on the load of the WLAN, and users may have adjusted their strategies to request more bandwidth from WIMAX.
The average utilities of all users obtained from the BNE in the incomplete information game and from the Nash equilibrium (NE) in the complete information game are shown in Figure 8. Here, the NE with complete information refers to the actual average utility when users perfectly know the channel gains of each other. Note that the average utility obtained from the BNE with incomplete information was lower than that from the NE with complete information. The reason is that the users' expected average utility obtained by probabilistic prediction was less accurate than the actual average utility obtained in a complete information game.

Leader Game
Once the BNE of the follower game among mobile users was obtained, the game process turned to the leader game, where the two networks competed with each other for the users in their own coverage areas to maximize their utilities. Figure 9 shows the load trajectories of the two networks, which are related to the number of connected users and their bandwidth requesting strategies. For WLAN, as the users' bandwidth requests increased, the load rose to a maximum (i.e., 9.8 MHz) and then decreased gradually to a stable value (i.e., 9.4 MHz). For WIMAX, in contrast, the load increased monotonically to a stable value (i.e., 49.2 MHz). This phenomenon can be explained by the effect of the network's load level on users' actions. Specifically, the network capacity of WLAN was smaller than that of WIMAX, so WLAN reached its saturation point earlier. Subsequently, as the congestion of the network raised the cost, users transferred their bandwidth requests to the network with a lower load level (i.e., WIMAX).
Figure 10 presents the best response pricing strategies of the two networks. The Nash equilibrium is the intersection of the strategy adaptation trajectories of the two networks. The concave utility of each network supports the existence of the equilibrium. Using the backward induction and distributed iteration algorithm, we obtained the Nash equilibrium point that maximizes the utility of each of the two networks. Consequently, the whole Stackelberg game with incomplete information reached its equilibrium point. The above simulation results not only support the existence and uniqueness of the equilibria of both the follower game and the leader game, but also show the interactions among mobile users and HetNets in B5G environments.
Meanwhile, the optimal bandwidth requesting strategy and the optimal pricing strategy for each mobile user and network are also illustrated in the above results, which further validates the theoretical derivation in Section 3.

Conclusions
In this paper, a Stackelberg game-based resource allocation scheme with incomplete information is proposed to address the competition for limited wireless resources in B5G HetNets. The dynamic interactions among networks and mobile users are modeled as a two-tier game framework, where both networks and mobile users have only partial information about others. To be practical, the follower game, which is the competition for limited bandwidth among mobile users, was analyzed as a noncooperative dynamic Bayesian game, where the channel gain was private information. A probabilistic prediction method and a distributed iteration algorithm ensured the feasibility of the Bayesian game. Given the price strategies set by the network operators, the mobile users adjusted their bandwidth requesting strategies based on a distributed iteration algorithm to maximize their expected utility and finally converged to the equilibrium. The existence of the Bayesian Nash equilibrium is supported by both mathematical derivation and simulation, and we offer sufficient conditions to prove its uniqueness. Once the BNE of the follower game is obtained, the networks in the leader game correspondingly change their pricing strategies to maximize their profits. The iteration step with which each network adjusts its price is based on its marginal utility. In the end, the two-tier Stackelberg game converges to its final equilibrium when the follower game reaches its BNE and the leader game reaches its Nash equilibrium. Finally, the simulation results are consistent with the theoretical analysis.