# An Adaptive Learning Based Network Selection Approach for 5G Dynamic Environments


## Abstract


## 1. Introduction

- The heterogeneous network selection scenario is abstracted as a multiagent coordination problem, and a corresponding mathematical model is established. We analyze the theoretical properties of the model: the system is guaranteed to converge to a Nash equilibrium, which is proved to be Pareto optimal and socially optimal.
- A multiagent network selection strategy is proposed, together with algorithms that enable users to adaptively adjust their selections in response to gradually or abruptly changing environments.
- The performance of the approach is investigated under various conditions and parameter settings. We compare our results with two existing approaches and achieve significantly better performance. Finally, the robustness of the approach is examined: the system maintains desirable performance even in the presence of non-compliant terminal users.

## 2. Background

#### 2.1. Game Theory

- The most commonly adopted solution concept in game theory is **Nash equilibrium** (NE). Under an NE, no player can benefit by unilaterally deviating from its current strategy.
- An outcome is **Pareto optimal** if there does not exist any other outcome under which no player’s payoff is decreased while at least one player’s payoff is strictly increased.
- **Socially optimal** outcomes are those under which the sum of all players’ payoffs is maximized [14].
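As a concrete illustration (a hypothetical two-player game, not taken from the paper), the three solution concepts can be checked by brute force over a payoff matrix:

```python
from itertools import product

# Hypothetical 2-player, 2-action game: payoffs[(a1, a2)] = (p1, p2).
# This instance is a prisoner's dilemma, so the NE and the socially
# optimal outcome differ.
payoffs = {
    (0, 0): (3, 3), (0, 1): (0, 4),
    (1, 0): (4, 0), (1, 1): (1, 1),
}

def is_nash(a):
    # NE: no player gains by unilaterally switching its own action.
    for i in (0, 1):
        for alt in (0, 1):
            dev = list(a)
            dev[i] = alt
            if payoffs[tuple(dev)][i] > payoffs[a][i]:
                return False
    return True

def is_pareto(a):
    # Pareto optimal: no other outcome weakly improves everyone
    # and strictly improves at least one player.
    for b in product((0, 1), repeat=2):
        if all(payoffs[b][i] >= payoffs[a][i] for i in (0, 1)) and \
           any(payoffs[b][i] > payoffs[a][i] for i in (0, 1)):
            return False
    return True

# Socially optimal: maximizes the sum of all players' payoffs.
social_opt = max(product((0, 1), repeat=2), key=lambda a: sum(payoffs[a]))
```

Here (1, 1) is the unique NE but is not Pareto optimal, while (0, 0) is both Pareto optimal and socially optimal, which is why convergence to an NE that is *also* Pareto and socially optimal (as claimed for the proposed system) is a strong property.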

#### 2.2. Q-Learning

#### 2.3. Dynamic HetNet Environments

## 3. Methods

#### 3.1. Network Selection Problem Definition

#### 3.1.1. Multiagent Network Selection Model

- $BS=\{1,2,\dots ,m\}$ is the set of available base stations (BSs) in the HetNet environment.
- ${B}_{k}\left(t\right)$ denotes the provided bandwidth of base station $k\in BS$ at time t, which varies over time.
- $U=\{1,2,\dots ,n\}$ is the set of terminal users involved.
- ${b}_{i}\left(t\right)$ denotes the bandwidth demand of user $i\in U$ at time t, which also changes over time.
- ${A}_{i}\subseteq BS$ is the finite set of actions available to user $i\in U$, and ${a}_{i}\in {A}_{i}$ denotes the action (i.e., selected base station) taken by user i.
- ${P}_{i}(t,\mathbf{a})$ denotes the expected payoff of user $i\in U$ by performing the strategy profile $\mathbf{a}=\{{a}_{1},\dots ,{a}_{i},\dots ,{a}_{n}\}\in {\times}_{j\in U}{A}_{j}$ at time t.
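For concreteness, the model components above can be sketched in code. The payoff rule below (equal sharing of a base station's provided bandwidth, capped by the user's own demand) is a hypothetical stand-in, not the paper's payoff function:

```python
from dataclasses import dataclass

@dataclass
class HetNet:
    """Minimal sketch of the multiagent network selection model (Section 3.1.1)."""
    provided_bw: dict   # k -> B_k(t), provided bandwidth of BS k at time t
    demands: dict       # i -> b_i(t), bandwidth demand of user i at time t

    def payoff(self, i, profile):
        """P_i(t, a) for user i under strategy profile a = {i: a_i}.

        Hypothetical rule: users on the same BS share its bandwidth
        equally, and no user receives more than its own demand.
        """
        k = profile[i]                                   # a_i: selected BS
        sharers = [j for j in profile if profile[j] == k]
        share = self.provided_bw[k] / len(sharers)
        return min(share, self.demands[i])

# Two users, two base stations, both users pick BS 0.
net = HetNet(provided_bw={0: 25.0, 1: 50.0}, demands={0: 10.0, 1: 10.0})
a = {0: 0, 1: 0}
```

With this rule, each user on BS 0 gets min(25/2, 10) = 10 Mbps; the cap by $b_i(t)$ reflects that excess bandwidth beyond a user's demand yields no extra payoff.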

#### 3.1.2. Theoretical Analysis

**Definition 1.**

**Theorem 1.**

**Proof.**

#### 3.2. Multiagent Network Selection Strategy

**Algorithm 1** Network selection algorithm for each user

```
Input:  available base station set BS; bandwidth demand b_i
Output: selected base station seleBS
1: loop
2:   seleBS ← Selection()
3:   receive the feedback of state information from the last completed interaction
4:   Evaluation()
5: end loop
```
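A minimal skeleton of this per-user loop might look as follows; `selection()` and `evaluation()` are stubbed placeholders standing in for the procedures detailed in Sections 3.2.1 and 3.2.2:

```python
import random

class UserAgent:
    """Skeleton of Algorithm 1 for a single user (helpers are stubs)."""

    def __init__(self, bs_set, demand):
        self.bs_set = list(bs_set)   # available base station set BS
        self.demand = demand         # bandwidth demand b_i
        self.tables = {}             # per-BS historical record tables
        self.last_bs = None

    def selection(self):
        # Placeholder: pick a random BS with no records yet, else stay put.
        unpred = [k for k in self.bs_set if not self.tables.get(k)]
        return random.choice(unpred) if unpred else self.last_bs

    def evaluation(self, feedback):
        # Placeholder: record the <load, bandwidth> feedback for the chosen BS.
        self.tables.setdefault(self.last_bs, []).append(feedback)

    def step(self, env_feedback):
        # One iteration of Algorithm 1's loop.
        self.last_bs = self.selection()
        self.evaluation(env_feedback)
        return self.last_bs

agent = UserAgent(bs_set=[0, 1, 2], demand=0.032)   # 32 kbps voice user
chosen = agent.step(env_feedback=(12.0, 25.0))      # (load, bandwidth) tuple
```

The key structural point is that each user runs this loop independently and learns only from the feedback of base stations it actually connected to.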

#### 3.2.1. Selection

**Algorithm 2** Selection

```
 1: for all k ∈ BS do
 2:   if table_k = ∅ then
 3:     push k into unpredList
 4:   else
 5:     predLoad ← LoadPredict(p^A)       // p^A: active predictor
 6:     predBW ← BWPredict()
 7:     if predLoad + b_i ≤ predBW then
 8:       push k into candList
 9:     end if
10:   end if
11: end for
12: if candList ≠ ∅ then
13:   for all cand ∈ candList do
14:     availBW_cand ← predBW_cand − predLoad_cand
15:   end for
16:   seleBS ← argmax_{cand ∈ candList}(availBW_cand)
17: else if unpredList ≠ ∅ then
18:   seleBS ← random(unpredList)
19: else
20:   seleBS ← lastBS                     // stay at the last BS
21:   flag ← −1
22: end if
```

1. **Create predictor set.** Each user keeps a set of r predictors $P(a,k)=\{{p}_{i}\mid 1\le i\le r\}$ for each available base station k, created from a predefined set in the evaluation procedure (Section 3.2.2, Case 1). Each predictor is a function from a time series of historic loads to a predicted load value, i.e., $f:(({t}_{i},loa{d}_{i})\mid i=0,\dots ,p)\to predLoad$.
2. **Select active predictor.** One predictor ${p}^{A}\in P$, called the active predictor, is chosen in the evaluation procedure (Section 3.2.2, Cases 2 and 3) and used for the actual load prediction.
3. **Make forecast.** Predict the base station’s likely load from its historic load records using the active predictor.
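The three steps above might be sketched as follows; the mean and last-value predictors are illustrative stand-ins for the predefined forecasting functions of Table 1:

```python
import random

# Step 1 ingredients: a predefined pool of predictors, each mapping a
# history of (t_i, load_i) records to a predicted load value.
def mean_predictor(history):
    loads = [load for _t, load in history]
    return sum(loads) / len(loads)

def last_value_predictor(history):
    return history[-1][1]

PREDEFINED = [mean_predictor, last_value_predictor]

def create_predictor_set(r, rng=random):
    # Each user draws r predictors from the pool, so different users
    # generally end up with different predictor sets.
    return [rng.choice(PREDEFINED) for _ in range(r)]

history = [(0, 10.0), (1, 20.0), (2, 30.0)]   # (t_i, load_i) records
pset = create_predictor_set(r=3)              # step 1: create predictor set
active = pset[0]                              # step 2: active predictor p^A
pred_load = active(history)                   # step 3: LoadPredict(p^A)
```

In the full scheme the active predictor is not fixed as `pset[0]` but re-selected by the evaluation procedure based on each predictor's accuracy.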

#### 3.2.2. Evaluation

**Algorithm 3** Evaluation

```
 1: if predictorSet = ∅ then
 2:   create predictorSet for seleBS
 3:   p^A ← random(predictorSet)
 4:   update(table_seleBS)
 5: else if flag = −1 then
 6:   for all k ∈ BS do
 7:     delete h ∈ table_k with a probability
 8:   end for
 9: else
10:   for all p ∈ predictorSet do
11:     predLoad ← LoadPredict(p)
12:     r_p = 1 − |load − predLoad| / load
13:     Q_p = (1 − α)Q_p + α·r_p
14:   end for
15:   p^A ← BoltzmannExploration(predictorSet)
16:   // abruptly changing environment
17:   if |B_seleBS − predBW| > Δ then
18:     d = |B_seleBS − lastBW|
19:     for all h ∈ table_seleBS do
20:       h ← h ± d
21:     end for
22:   end if
23:   update(table_seleBS)
24: end if
```

**Case 1.** If the selected base station is visited for the first time, the user creates a new predictor set for this base station and records its state information in the corresponding record table (Lines 1–4). All predictors in the set are chosen randomly from a predefined set, so different users’ predictor sets may differ. As displayed in Table 1, the predefined set contains multiple types of forecasting functions [28] that differ in window size. Different types of predictors suit different situations and environments.

**Case 2.** If $flag=-1$, the historical records currently recommend no appropriate base station (Line 5). In this case, some old records are removed from the tables with a certain probability to obtain more up-to-date information for further predictions (Lines 6–8), which is necessary for successful adaptation in the future. Otherwise, the user would never get an opportunity to access other base stations that might satisfy its demand very well.

**Case 3.** In the general situation, the user switches to a previously visited base station, i.e., it already has historical records for this base station (Line 9). The evaluation mainly involves two aspects: assessing the performance of all predictors in the set (Lines 10–15) and handling abruptly changing bandwidth (Lines 17–22). The assessment of predictors relies on Q-learning. Specifically, the Q-function in our approach is defined as the following equation (cf. Algorithm 3, Lines 12–13):

$$Q_p \leftarrow (1-\alpha)\,Q_p + \alpha\, r_p, \qquad r_p = 1 - \frac{\left|load - predLoad\right|}{load},$$

where $\alpha$ is the learning rate and $r_p$ rewards predictor p according to its relative prediction accuracy.
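A minimal sketch of this predictor assessment, with illustrative values for the learning rate α and the Boltzmann temperature τ (neither value is taken from the paper):

```python
import math
import random

ALPHA, TAU = 0.3, 0.5   # illustrative learning rate and temperature

def update_q(q, pred_load, actual_load):
    """One Q-learning update for a predictor (Algorithm 3, Lines 12-13)."""
    r = 1 - abs(actual_load - pred_load) / actual_load   # reward r_p
    return (1 - ALPHA) * q + ALPHA * r                   # Q_p update

def boltzmann_choice(qs, rng=random):
    """Pick a predictor index with probability proportional to exp(Q/τ)."""
    weights = [math.exp(q / TAU) for q in qs]
    x, acc = rng.random() * sum(weights), 0.0
    for i, w in enumerate(weights):
        acc += w
        if x <= acc:
            return i
    return len(qs) - 1

# Two predictors: one forecast was perfect (r_p = 1), one was 50% off.
qs = [0.5, 0.5]
qs[0] = update_q(qs[0], pred_load=20.0, actual_load=20.0)
qs[1] = update_q(qs[1], pred_load=10.0, actual_load=20.0)
active_idx = boltzmann_choice(qs)   # next active predictor p^A
```

Boltzmann exploration keeps a nonzero chance of re-trying currently lower-rated predictors, which matters once the environment drifts and a different forecasting function becomes the accurate one.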

## 4. Results

- RAT type: we consider three typical networks with various radio access technologies (RATs), namely IEEE 802.11 Wireless Local Area Networks (WLAN), IEEE 802.16 Wireless Metropolitan Area Networks (WMAN) and OFDMA Cellular Network, which are represented by $B{S}_{i}(i=0,1,2)$. Multi-mode user equipment in the heterogeneous wireless network can access any of the three networks.
- provided bandwidth: the maximum provided bandwidths of the three networks are 25 Mbps, 50 Mbps, and 5 Mbps, respectively [30]. Without loss of generality, two types of changing environments based on historical traffic statistics are considered. One is simulated as a sinusoidal profile, which changes gradually. The provided bandwidth may also change abruptly according to time divisions such as dawn, daytime and evening.
- bandwidth demand: users’ bandwidth demands also vary within a reasonable range. There are two types of traffic demand in the area, real-time voice traffic and non-real-time data traffic, which are randomly distributed.
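The two bandwidth profiles described above can be sketched as follows; the amplitude, period, and time-division schedule are illustrative assumptions, not values from the paper:

```python
import math

def gradual_bw(t, base=25.0, amp=5.0, period=24.0):
    """Gradually changing profile: sinusoidal variation around a base
    bandwidth (Mbps), with t in hours and a 24-hour period."""
    return base + amp * math.sin(2 * math.pi * t / period)

def abrupt_bw(t, schedule=((0, 10.0), (6, 25.0), (18, 15.0))):
    """Abruptly changing profile: piecewise-constant bandwidth by time
    division (dawn, daytime, evening), given as (start_hour, Mbps)."""
    hour = t % 24
    bw = schedule[0][1]
    for start, level in schedule:
        if hour >= start:
            bw = level
    return bw
```

The sinusoidal profile exercises the predictors' ability to track slow drift, while the step changes in the second profile create the "catastrophe points" discussed in the results.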

#### 4.1. Experiment Results

**Adaptability**. Figure 4 shows the behavior of network selection on $B{S}_{0}$ in gradually and abruptly changing environments. As the user number increases from 600 to 700, the total demand remains less than the total provided bandwidth. Initially, all users select their base stations randomly, resulting in heavy overload or underload on different base stations. After a short period of interaction, however, all users learn to coordinate their selections, and the network bandwidth of $B{S}_{0}$ becomes well utilized without being overloaded. Moreover, we observe that increasing the user number leads to better adaptability. Intuitively, when the total demand approaches the upper bound of the provided bandwidth, users sense the dynamic environment more sensitively and accommodate to the changes more quickly.

**User Payoff, Switching Rate and Bandwidth Utilization**. In Figure 5, an increasing user number results in a slight decrease in user payoff and a marginal increase in switching rate. With increasing competition for limited base stations and bandwidth, the average bandwidth utilization efficiency increases approximately linearly. All three metrics are slightly worse in abruptly changing environments due to jitters at catastrophe points.

**Convergence Time**. The terminal user number has a great influence on the complexity of our proposed algorithm (i.e., the convergence time). When the total demand is less than the total provided bandwidth, the system is guaranteed to converge: if there is no overload on any base station, the system converges to a Nash equilibrium that is also Pareto optimal and socially optimal (Definition 1, Theorem 1). Initially, the system goes through a learning phase to achieve convergence. If the provided bandwidth changes gradually or stays static, the equilibrium is sustained over time. We call this first-convergence; the average first-convergence time increases exponentially, within an acceptable range, as the user number grows from 640 to 720. In an abruptly changing environment, the equilibrium is broken at catastrophe points but re-converges within a number of steps. The average re-convergence time varies linearly with the number of users (see Figure 6).

#### 4.2. Experiment Comparisons

**Communication Complexity**. We first compare the communication complexity of the three algorithms in Table 3. Before an initial selection, each user knows its base station candidates and bandwidth demand. In ALA, we assume that there exists cooperation between a user and its connected base station: the user gets feedback of a tuple $\langle load, bandwidth\rangle$ from the base station in previous connections, rather than any prior knowledge. Such cooperation is available and helpful, and does not infringe upon the interests of any others.

**Load Balancing Analysis**. We investigate the load situations on the three base stations for some time when there are 720 users involved (see Figure 7 and Figure 8). It is the case that the total bandwidth demand is quite close to the total amount of provided bandwidth.

**User Payoff, Switching Rate and Bandwidth Utilization**. Comparison results of the three algorithms in terms of user payoff, switching rate and bandwidth utilization are presented in Figure 9 and Figure 10 under the two changing environments, respectively. We observe that, in the beginning period, RATSA performs better than ALA in bandwidth utilization and user payoff; however, ALA overtakes it after a few interactions and shows better performance thereafter. The switching rate of ALA is slightly higher because users switch their connections in response to the dynamics to obtain higher payoffs in the initial phase and at catastrophe points. It is important to highlight that the jitters of ALA in the abruptly changing environment are due to the time lag in detecting abruptly changing bandwidth. This phenomenon does not arise in RATSA, since its users are assumed to always have access to the current and next provided bandwidth of all base stations, which is usually unavailable in practical environments.

#### 4.3. Robustness Testing

## 5. Discussion

## 6. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

- Wang, C.X.; Haider, F.; Gao, X.; You, X.H. Cellular architecture and key technologies for 5G wireless communication networks. *IEEE Commun. Mag.* **2014**, 52, 122–130.
- Andrews, J.G.; Buzzi, S.; Choi, W.; Hanly, S.V.; Lozano, A.; Soong, A.C.; Zhang, J.C. What will 5G be? *IEEE J. Sel. Areas Commun.* **2014**, 32, 1065–1082.
- Chin, W.H.; Fan, Z.; Haines, R. Emerging technologies and research challenges for 5G wireless networks. *IEEE Wirel. Commun.* **2014**, 21, 106–112.
- Trestian, R.; Ormond, O.; Muntean, G.M. Game theory-based network selection: Solutions and challenges. *IEEE Commun. Surv. Tutor.* **2012**, 14, 1212–1231.
- Martinez-Morales, J.D.; Pineda-Rico, U.; Stevens-Navarro, E. Performance comparison between MADM algorithms for vertical handoff in 4G networks. In Proceedings of the 2010 7th International Conference on Electrical Engineering Computing Science and Automatic Control (CCE), Tuxtla Gutierrez, Mexico, 8–10 September 2010; pp. 309–314.
- Lahby, M.; Cherkaoui, L.; Adib, A. An enhanced-TOPSIS based network selection technique for next generation wireless networks. In Proceedings of the 2013 20th International Conference on Telecommunications (ICT), Casablanca, Morocco, 6–8 May 2013; pp. 1–5.
- Lahby, M.; Adib, A. Network selection mechanism by using M-AHP/GRA for heterogeneous networks. In Proceedings of the 2013 6th Joint IFIP Wireless and Mobile Networking Conference (WMNC), Dubai, UAE, 23–25 April 2013; pp. 1–6.
- Wang, L.; Kuo, G.S.G. Mathematical modeling for network selection in heterogeneous wireless networks—A tutorial. *IEEE Commun. Surv. Tutor.* **2013**, 15, 271–292.
- Niyato, D.; Hossain, E. Dynamics of network selection in heterogeneous wireless networks: An evolutionary game approach. *IEEE Trans. Veh. Technol.* **2009**, 58, 2008–2017.
- Wu, Q.; Du, Z.; Yang, P.; Yao, Y.D.; Wang, J. Traffic-aware online network selection in heterogeneous wireless networks. *IEEE Trans. Veh. Technol.* **2016**, 65, 381–397.
- Xu, Y.; Wang, J.; Wu, Q. Distributed learning of equilibria with incomplete, dynamic, and uncertain information in wireless communication networks. Prov. Med. J. Retrosp. Med. Sci. **2016**, 4, 306.
- Vamvakas, P.; Tsiropoulou, E.E.; Papavassiliou, S. Dynamic Provider Selection & Power Resource Management in Competitive Wireless Communication Markets. *Mob. Netw. Appl.* **2018**, 23, 86–99.
- Tsiropoulou, E.E.; Katsinis, G.K.; Filios, A.; Papavassiliou, S. On the Problem of Optimal Cell Selection and Uplink Power Control in Open Access Multi-service Two-Tier Femtocell Networks. In Lecture Notes in Computer Science; Springer: Berlin, Germany, 2014; pp. 114–127.
- Hao, J.; Leung, H.F. Achieving socially optimal outcomes in multiagent systems with reinforcement social learning. *ACM Trans. Auton. Adapt. Syst.* **2013**, 8, 15.
- Malanchini, I.; Cesana, M.; Gatti, N. Network selection and resource allocation games for wireless access networks. *IEEE Trans. Mob. Comput.* **2013**, 12, 2427–2440.
- Aryafar, E.; Keshavarz-Haddad, A.; Wang, M.; Chiang, M. RAT selection games in HetNets. In Proceedings of the 2013 IEEE INFOCOM, Turin, Italy, 14–19 April 2013; pp. 998–1006.
- Monsef, E.; Keshavarz-Haddad, A.; Aryafar, E.; Saniie, J.; Chiang, M. Convergence properties of general network selection games. In Proceedings of the 2015 IEEE Conference on Computer Communications (INFOCOM), Hong Kong, China, 26 April–1 May 2015; pp. 1445–1453.
- Kaelbling, L.P.; Littman, M.L.; Moore, A.W. Reinforcement learning: A survey. *J. Artif. Intell. Res.* **1996**, 4, 237–285.
- Busoniu, L.; Babuska, R.; De Schutter, B. A comprehensive survey of multiagent reinforcement learning. *IEEE Trans. Syst. Man Cybern. C* **2008**, 38, 156–172.
- Barve, S.S.; Kulkarni, P. Dynamic channel selection and routing through reinforcement learning in cognitive radio networks. In Proceedings of the 2012 IEEE International Conference on Computational Intelligence Computing Research, Coimbatore, India, 18–20 December 2012; pp. 1–7.
- Xu, Y.; Chen, J.; Ma, L.; Lang, G. Q-Learning Based Network Selection for WCDMA/WLAN Heterogeneous Wireless Networks. In Proceedings of the 2014 IEEE 79th Vehicular Technology Conference (VTC Spring), Seoul, Korea, 18–21 May 2014; pp. 1–5.
- Kittiwaytang, K.; Chanloha, P.; Aswakul, C. CTM-Based Reinforcement Learning Strategy for Optimal Heterogeneous Wireless Network Selection. In Proceedings of the 2010 Second International Conference on Computational Intelligence, Modelling and Simulation (CIMSiM), Tuban, Indonesia, 28–30 September 2010; pp. 73–78.
- Demestichas, P.; Georgakopoulos, A.; Karvounas, D.; Tsagkaris, K.; Stavroulaki, V.; Lu, J.; Xiong, C.; Yao, J. 5G on the horizon: Key challenges for the radio-access network. *IEEE Veh. Technol. Mag.* **2013**, 8, 47–53.
- Chen, Z.; Wang, L. Green Base Station Solutions and Technology. *ZTE Commun.* **2011**, 9, 58–61.
- Oh, E.; Krishnamachari, B. Energy savings through dynamic base station switching in cellular wireless access networks. In Proceedings of the 2010 IEEE Global Telecommunications Conference (GLOBECOM 2010), Miami, FL, USA, 6–10 December 2010; pp. 1–5.
- Hao, J.Y.; Huang, D.P.; Cai, Y.; Leung, H.F. The dynamics of reinforcement social learning in networked cooperative multiagent systems. *Eng. Appl. Artif. Intell.* **2017**, 58, 111–122.
- Sachs, J.; Prytz, M.; Gebert, J. Multi-access management in heterogeneous networks. *Wirel. Pers. Commun.* **2009**, 48, 7–32.
- Brockwell, P.J.; Davis, R.A. Introduction to Time Series and Forecasting; Springer: Berlin, Germany, 2016.
- Kianercy, A.; Galstyan, A. Dynamics of Boltzmann Q learning in two-player two-action games. *Phys. Rev. E* **2012**, 85, 041145.
- Zhu, K.; Niyato, D.; Wang, P. Network Selection in Heterogeneous Wireless Networks: Evolution with Incomplete Information. In Proceedings of the 2010 IEEE Wireless Communications and Networking Conference (WCNC), Sydney, Australia, 18–21 April 2010; pp. 1–6.
- Mcgarry, M.P.; Maier, M.; Reisslein, M. Ethernet PONs: A survey of dynamic bandwidth allocation (DBA) algorithms. *IEEE Commun. Mag.* **2004**, 42, 8–15.
- Malialis, K.; Devlin, S.; Kudenko, D. Resource abstraction for reinforcement learning in multiagent congestion problems. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, Singapore, 9–13 May 2016; pp. 503–511.
- Li, J.Y.; Qiu, M.K.; Ming, Z.; Quan, G.; Qin, X.; Gu, Z.H. Online optimization for scheduling preemptable tasks on IaaS cloud systems. Int. J. Comput. Sci. Mob. Comput. **2013**, 2, 666–677.
- Hao, J.Y.; Sun, J.; Chen, G.Y.; Wang, Z.; Yu, C.; Ming, Z. Efficient and Robust Emergence of Norms through Heuristic Collective Learning. *ACM Trans. Auton. Adapt. Syst.* **2017**, 12, 1–20.

**Figure 2.**Normalized traffic profile during one week [25].

**Table 1.** The predefined set of forecasting functions [28].

| Method | Description (Window Size $x\le p$) |
|---|---|
| Weighted Average | $predLoad={\sum}_{i=0}^{x}{w}_{i}\,loa{d}_{i}$, with ${\sum}_{i=0}^{x}{w}_{i}=1$ |
| Geometric Average | $predLoad=\sqrt[x+1]{{\prod}_{i=0}^{x}loa{d}_{i}}$ |
| Linear Regression | $predLoad=\widehat{a}t+\widehat{b}$ ($\widehat{a},\widehat{b}$ obtained by the least squares method) |
| Exponential Smoothing | $predLoa{d}_{t+1}=\alpha\, loa{d}_{t}+(1-\alpha )\,predLoa{d}_{t}={\sum}_{i=0}^{x}\alpha {(1-\alpha )}^{i}loa{d}_{t-i}$ |
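Assuming load records are ordered oldest to newest, two of these forecasting methods can be sketched as:

```python
def weighted_average(loads, weights):
    """predLoad = sum_i w_i * load_i, with the weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w, * (l,))[0] if False else sum(w * l for w, l in zip(weights, loads))

def exponential_smoothing(loads, alpha=0.5):
    """predLoad_{t+1} = alpha*load_t + (1-alpha)*predLoad_t,
    seeded with the oldest observation."""
    pred = loads[0]
    for load in loads[1:]:
        pred = alpha * load + (1 - alpha) * pred
    return pred

loads = [10.0, 20.0, 30.0]                      # oldest to newest
wa = weighted_average(loads, [0.2, 0.3, 0.5])   # weights favor recent loads
es = exponential_smoothing(loads)
```

With weights biased toward recent observations, both methods respond faster to load trends; the smoothing factor α plays the same recency-weighting role in exponential smoothing.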

| Access Tech | Network Rep | Base Station | Maximum Bandwidth | User Demand |
|---|---|---|---|---|
| WLAN | Wi-Fi | $B{S}_{0}$ | 25 Mbps | voice traffic: 32 kbps |
| WMAN | WiMAX | $B{S}_{1}$ | 50 Mbps | data traffic: 64 kbps ∼ 128 kbps |
| OFDMA Cellular Network | 4G | $B{S}_{2}$ | 5 Mbps | |

**Table 3.** Communication complexity of the three algorithms.

| Algorithm | ALA | RATSA | QLA |
|---|---|---|---|
| common information required | before selection: BS candidates and bandwidth demand; after selection: perceived bandwidth from the selected BS (same for all three algorithms) | | |
| different information required | 1. previously provided bandwidth of the selected BS; 2. historical load on the selected BS | 1. future provided bandwidth of each BS; 2. number of users on each BS; 3. number of past consecutive migrations on the selected BS | – |
| base stations communicated with | selected BS | all BS candidates | selected BS |
| influencing parameter | – | switching threshold $\eta$ | – |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Li, X.; Cao, R.; Hao, J. An Adaptive Learning Based Network Selection Approach for 5G Dynamic Environments. *Entropy* **2018**, *20*, 236. https://doi.org/10.3390/e20040236