Sensors | Open Access Article | Published: 12 August 2022

A Hybrid Route Selection Scheme for 5G Network Scenarios: An Experimental Approach

1. Department of Computing and Information Systems, School of Engineering and Technology, Sunway University, Petaling Jaya 47500, Malaysia
2. Lee Kong Chian Faculty of Engineering & Science, Universiti Tunku Abdul Rahman (UTAR), Kajang 43200, Malaysia
3. National Advanced IPv6 Centre, Universiti Sains Malaysia (USM), Penang 11800, Malaysia
* Authors to whom correspondence should be addressed.
This article belongs to the Special Issue Advances in 5G Wireless Communication Networks: The Path to 6G

Abstract

With the significant rise in demand for network utilization, such as data transmission and device-to-device (D2D) communication, fifth-generation (5G) networks have been proposed to meet this demand. Deploying 5G enhances the utilization of network channels and allows users to exploit licensed channels in the absence of primary users (PUs). In this paper, a hybrid route selection mechanism is proposed: it allows the central controller (CC) to evaluate the route map proactively in a centralized manner for source nodes, while source nodes are enabled to make their own decisions reactively and select a route in a distributed manner. D2D communication is preferred because it helps networks to offload traffic from the control plane to the data plane. In addition to the theoretical analysis, a real testbed composed of eleven nodes with independent processing units was set up as a proof of concept. Experimental results showed improvements in traffic offloading, higher utilization of network channels, and a lower interference level between primary and secondary users. The packet delivery ratio and end-to-end delay were affected by the higher number of intermediate nodes and the dynamicity of PU activities.

1. Introduction

The fifth-generation (5G) network has been envisaged as the next-generation cellular network for deploying, supporting, and scaling new technologies, including augmented reality, driverless vehicles, the Internet of Things, smart cities, virtual reality, and 3D video streaming services. Nevertheless, three main characteristics of the next-generation network scenario pose major challenges to the realization of 5G. Firstly, the dynamicity of channel availability, in which the operating channels, particularly the licensed channels, can be randomly occupied by licensed users (or primary users, PUs); consequently, unlicensed users (or secondary users, SUs) must search for and use the licensed channels in an opportunistic manner [,,,,]. Secondly, heterogeneity, in which the network consists of different types of network cells (e.g., macrocells and small cells) and different types of nodes (e.g., using different operating channels and transmission power levels); consequently, nodes must adapt to a diversified operating environment [,,,,]. Thirdly, ultra-densification, in which there is a large number of base stations (BSs), particularly small cells, and nodes in an area; consequently, nodes must search among nodes and BSs to find a route with the least traffic intensity in order to maximize traffic offloading [,,,].
The fifth-generation (5G) network is a multi-tier cellular network, as shown in Figure 1. There are different types of network cells, such as macrocells and femtocells. Macrocells have the broadest coverage, followed by femtocells (e.g., a 30 m radius). In Figure 1, the BSs of femtocells $fc_1, fc_2, \ldots, fc_8$ are scattered within the transmission range of a macrocell base station (MC BS). The BSs of the macrocell and femtocells can communicate with each other either directly or via multiple hops. Small cells deployed outside the transmission range of macrocells can help to improve network coverage. Due to the different characteristics of the network cells, the network can be segregated into two main layers, as shown in Figure 2. The macrocell layer uses lower frequency bands with a higher transmission power to provide a longer transmission range; however, the channel capacity is lower at lower frequency bands. The small-cell layer (i.e., the femtocell layer) uses higher frequency bands (e.g., millimeter wave (mm-wave), or above 30 GHz []) with a lower transmission power to provide a higher channel capacity; however, the transmission range is shorter at higher frequency bands. In terms of the dynamicity of channel availability, the macrocell layer may use channels with a higher number of white spaces, given a limited number of available licensed channels. In terms of heterogeneity and ultra-densification, these characteristics are more prevalent in small cells, as a large number of small cells must be deployed to provide connections to heterogeneous nodes (or user equipment). One of the main characteristics of the 5G architecture is control plane and data plane separation, as shown in Figure 2. The control plane, which includes the macrocell layer, contains a central controller (CC) that (a) collects network-wide information (e.g., the network topology comprised of nodes, links, and channel availability) from all nodes in the network and maintains the information (e.g., in a routing table); (b) performs global-level tasks (e.g., determines routes); and (c) disseminates information (e.g., prioritized routes) to nodes in the network (i.e., the source node $fc_s$). The data plane, which includes the small-cell layer, performs local-level tasks (e.g., selects a route based on a set of available and prioritized routes, and the availability of device-to-device (D2D) communication) [,].
Figure 1. A multi-tiered 5G network. The source node $fc_s$ establishes a route to the destination node $fc_d$. The optional routes are also shown.
Figure 2. Control and data plane separation in a multi-tiered 5G network. An equivalent network is shown in Figure 1.
There are two main features of 5G. Firstly, dynamic channel access, whereby a SU node senses underutilized portions of licensed channels (or white spaces) owned by PUs, and subsequently accesses the white spaces in an opportunistic manner, without causing unacceptable interference to PU activities, in order to improve the overall spectrum utilization. Secondly, D2D communication, whereby a node can communicate directly with its neighboring nodes without passing through a BS, which helps to (a) offload traffic from highly utilized BSs, particularly the MC BSs, to improve load balancing and reduce traffic congestion at BSs, and (b) extend coverage.
This paper proposes a cognition-inspired hybrid route selection scheme called CenTri for 5G network scenarios to embrace these two main features. CenTri consists of centralized and distributed route selection mechanisms and establishes multi-hop routes across the macrocell and small-cell layers. The centralized route selection mechanism at the CC adopts dynamic channel access to address the dynamicity of channel availability. In contrast, the distributed route selection mechanism adopts D2D communication to address heterogeneity and ultra-densification. Cognition enables a decision maker (or agent) to observe its operating environment and learn to make the right route selection decisions in that environment at any time instant. In CenTri, cognition is embedded in the source node, which is the final decision maker that learns about and selects the best possible route with a high amount of white spaces in order to offload traffic from the macrocell layer by distributing it to the femtocell layer. The CC sends a list of routes, which are ranked (or prioritized) based on the route length (or the number of hops), to the source node. The source node reranks the routes while considering traffic offload via the femtocell layer and the congestion level of the MC BS.

1.1. Why Is a Hybrid Route Selection Scheme Crucial to 5G Networks?

While separate centralized mechanisms [,,] and distributed mechanisms [,,] for route selection have been shown to improve network performance, the benefits of combining both mechanisms are yet to be explored in the context of 5G, and so a hybrid route selection scheme is the focus of this paper. Intuitively, a hybrid route selection scheme can address the intrinsic limitations of each mechanism. In the centralized mechanism, the computational complexity and routing overhead increase with the number of nodes in the network (or node density), as each source node in the data plane must receive information from the CC and then discover and maintain routes (e.g., update the routing table) [,]. In the distributed mechanism, the routing overhead increases because each node in the data plane must exchange information (i.e., the available channels and the bottleneck link rate) with neighboring source nodes [,,]. The issues observed in both centralized and distributed mechanisms intensify with increased dynamicity of channel availability, network heterogeneity, and ultra-densification. Our proposed hybrid mechanism uses both centralized and distributed mechanisms. The distributed mechanism performs the traditionally centralized tasks, particularly those using highly dynamic information, in a distributed manner to minimize high computational complexities at the CC. Naturally, the routing overhead reduces because BSs and nodes do not send dynamic information (e.g., the traffic level) to the CC. The centralized mechanism performs the traditionally distributed tasks, particularly those requiring frequent information exchange among BSs and nodes, in a centralized manner to minimize high routing overheads at BSs and nodes. Naturally, the routing overhead also reduces because BSs and nodes do not exchange dynamic information among themselves.
CenTri consists of centralized and distributed mechanisms for route selection. The centralized mechanism is embedded in a CC in the control plane, gathers and maintains network-wide and less dynamic information (i.e., available routes), and selects routes across different layers (i.e., the macrocell and small-cell layers). Subsequently, the source node makes routing decisions based on the list of routes provided by the CC. The routing decisions prioritize routes with fewer PUs in the data plane and D2D communication. The purpose is to minimize SU interference to PUs at the global level. In addition, being more stable, the routes can be established in a proactive manner to serve as backbone routes for different source–destination pairs, contributing to a reduced routing overhead required in the route discovery and maintenance. The distributed mechanism, which is embedded in each BS and node in the data plane, gathers and maintains neighborhood (or local) and more dynamic information (i.e., available routes and traffic levels of nodes in range), and selects intra- or inter-cell routes in the small-cell layer. The purpose is to maximize traffic offload from BSs, particularly the MC BSs, at the local level. Being more dynamic, the routes can be established via D2D among BSs and nodes in the small-cell layer in a reactive (or on-demand) manner to offload traffic from the macrocell layer.

1.2. Why Is Cognition Crucial to the Hybrid Route Selection Scheme in 5G Networks?

Due to the need for interaction with the operating environment, the majority of testbeds using USRP have adopted Q-learning to adapt to the environment. Q-learning is the preferred approach because it does not need the labeled datasets required by supervised machine learning approaches, and, unlike unsupervised machine learning, which explores underlying patterns or relevancy, it provides a clear outcome (e.g., whether a channel is utilized or unutilized); this is why unsupervised machine learning is not preferred [,,]. For more details, refer to Section 3.1.
Due to the dynamicity of the operating environment, constant learning is necessary to achieve optimal network performance as time passes. As different types of network cells have different characteristics, the dynamicity level varies across the macrocell and small-cell layers. For example, the dynamicity of channel availability is lower in the macrocell layer because channels with a higher number of white spaces have been assigned to that layer. Therefore, a single set of rules or policies is less likely to be optimal when applied across the entire network. In CenTri, BSs and nodes embrace a popular cognition approach called reinforcement learning (RL).

1.3. CenTri as a Hybrid Route Selection Scheme: An Overview

In CenTri, the centralized mechanism enables a CC to establish backbone routes that minimize SU interference to PUs based on network-wide information (i.e., the channel availability of routes). The backbone routes can be used by different source–destination pairs. Nevertheless, the centralized mechanism has two shortcomings: (a) the CC is not sensitive to the dynamicity of the local environment (i.e., the traffic level of each source node towards its destination node), and (b) the routes predetermined and disseminated to source nodes by the CC may suggest that the best route is the shortest one, which goes through the backbone route, causing an increased traffic load at the MC BS. The distributed mechanism enables nodes of small cells to revise the routing decisions received from the CC to maximize traffic offload from MC BSs based on local information (i.e., the available routes and traffic levels of D2D routes).
In the small-cell layer, a source node selects a route based on four priority levels, which helps to offload traffic from the macrocell layer: the first (or highest) priority is communication via the femtocell layer (through a D2D route), the second priority is communication via the route with minimum interference from PUs, the third priority is communication via the route with the minimum number of hops, and the fourth (or last) priority is communication via the backbone route. A single multi-hop route can consist of links belonging to different layers. For example, communication through the backbone route connects a source node of the femtocell layer to a BS of the macrocell layer, which then transmits the data to the destination node in the femtocell layer. In Figure 1, the route $fc_s \rightarrow fc_2 \rightarrow fc_5 \rightarrow fc_d$ is preferred over the route $fc_s \rightarrow mc \rightarrow fc_d$ used in the traditional network.
RL is embedded in the source nodes of the femtocell layer so that the right decisions can be made on the route selection at the local level to reduce the global workload of the CC.

1.4. Reinforcement Learning: An Overview

Q-learning, which is a popular RL approach, enables an agent (or decision maker) to gain knowledge independently in order to take the right action at the right time in its operating environment for individual performance enhancement. At any time instant $t$, an agent $i$ observes its operating environment in the form of a state $s_t^i$, and then selects and takes an action $a_t^i$ in the operating environment. At the next time instant $t+1$, the agent $i$ receives a reward $r_{t+1}^i(s_{t+1}^i)$ and the state changes from $s_t^i$ to $s_{t+1}^i$. The Q-value $Q_t^i(s_t^i, a_t^i)$, which represents the long-term reward of a state–action pair $(s_t^i, a_t^i)$ of an agent $i$, is updated using the Q-function as follows:
$$Q_{t+1}^i(s_t^i, a_t^i) = Q_t^i(s_t^i, a_t^i) + \alpha \left[ r_{t+1}^i(s_{t+1}^i) + \gamma \max_{a \in A} Q_t^i(s_{t+1}^i, a) - Q_t^i(s_t^i, a_t^i) \right]$$
where $0 \le \alpha \le 1$ represents the learning rate and $0 \le \gamma \le 1$ represents the discount factor, which weights the future reward of the next state–action pair. The discounted future reward always carries less weight than the immediate reward.
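A minimal tabular sketch of this update in Python may help to make the notation concrete; the class, its default parameter values, and the dictionary-based Q-table are illustrative assumptions rather than the implementation used in CenTri.

```python
# Tabular Q-learning update of Equation (1); a sketch under assumed names and defaults.
from collections import defaultdict

class QAgent:
    def __init__(self, actions, alpha=0.5, gamma=0.0):
        self.q = defaultdict(float)   # Q(s, a), initialized to 0
        self.actions = actions        # e.g., candidate routes
        self.alpha = alpha            # learning rate, 0 <= alpha <= 1
        self.gamma = gamma            # discount factor, 0 <= gamma <= 1

    def update(self, state, action, reward, next_state):
        # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error

    def best_action(self, state):
        # Greedy choice: the action with the maximum Q-value in this state
        return max(self.actions, key=lambda a: self.q[(state, a)])
```

With $\gamma = 0$, as used later in this paper, the update reduces to weighting the immediate reward against the previously learned Q-value.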

1.5. Contributions

Common routing mechanisms, such as route requests and route replies, have been well investigated in the literature [,,,]. This paper focuses on route selection and makes three contributions. Firstly, a hybrid route selection scheme called CenTri is proposed to select routes that minimize SU interference to PUs in available licensed channels in 5G networks characterized by the dynamicity of channel availability and ultra-densification. The purposes are to (a) improve load balancing through traffic offload from the macrocell layer to the small-cell layer and (b) improve the overall spectrum utilization. Secondly, RL models and algorithms are proposed for CenTri. Thirdly, the issues associated with implementing CenTri on a real-world platform consisting of heterogeneous nodes embedded with universal software radio peripheral (USRP) units are presented.

1.6. Organization of This Paper

The rest of this paper is organized as follows. Section 2 presents the related work. Section 3 presents the system model. Section 4 presents the CenTri RL model and algorithm. Section 5 presents the results and discussion. Section 6 concludes the paper and outlines future work.

3. System Model

There is a set of channels $C = \{c_1, c_2, \ldots, c_{|C|}\}$, where each channel $c_i$ is occupied by a PU $p_i \in P = \{p_1, p_2, \ldots, p_{|P|}\}$. The multi-tier 5G network shown in Figure 1 and Figure 2 is considered. The MC BSs $MC = \{mc_1, mc_2, \ldots, mc_{|N|}\}$ and the femtocell nodes and BSs $FC = \{fc_1, fc_2, \ldots, fc_{|N|}\}$ can communicate among themselves directly. For instance, a femtocell BS $fc_1$ can communicate with a MC BS $mc_1$ directly via the link $fc_1 \rightarrow mc_1$. A BS can establish a multi-hop route $k_n \in K = \{k_1, k_2, \ldots, k_{|K|}\}$ across different layers. For instance, a femtocell source node $fc_s$ establishes a multi-hop route $fc_s \rightarrow mc_1 \rightarrow fc_d$ to the MC BS $mc_1$ and then to the femtocell destination node $fc_d$. The traffic intensity $\Psi_{t_n}^{k_n} \in \Psi = \{\Psi_{t_1}^{k_1}, \Psi_{t_2}^{k_2}, \ldots, \Psi_{t_{|N|}}^{k_{|K|}}\}$ is one of the parameters that source nodes consider before selecting a route. The CC and the MC BSs are located in the control plane $C_{cc}$, while the FC BSs are located in the data plane $D_{fc}$.
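For illustration only, the sets above can be represented with simple data structures such as the following sketch; the field names are assumptions and do not come from the paper.

```python
# Illustrative containers for the system model sets C, P, and K; names are assumed.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Channel:
    cid: int            # channel identifier c_i
    pu: int             # identifier of the PU p_i licensed on this channel
    lam_on: float       # PU ON (busy) rate
    lam_off: float      # PU OFF (idle) rate

@dataclass
class Route:
    name: str                                              # e.g., "k2"
    nodes: List[str] = field(default_factory=list)         # fc_s, ..., fc_d (or mc for the backbone)
    channels: List[Channel] = field(default_factory=list)  # one channel per link
    backbone: bool = False                                  # True for the mc-based backbone route
```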
A source node selects a route based on the PU activities, which can be either ON (busy) or OFF (idle). The ON/OFF duration of PUs in a channel $c \in C$ and a link $l_n \in L = \{l_1, l_2, \ldots, l_{|L|}\}$ follows the Poisson model, which is exponentially distributed with rates $\lambda_{ON}^{c \in C, PU}$ for the active time and $\lambda_{OFF}^{c \in C, PU}$ for the idle time of the PU activities, respectively. The channel availability $\Phi_{t,OFF}^{c \in C, l_n}$ determines the average availability of the link at time instant $t$ as follows []:
$$\Phi_{t,OFF}^{c \in C, l_n} = \frac{\lambda_{ON}^{c \in C, PU}}{\lambda_{ON}^{c \in C, PU} + \lambda_{OFF}^{c \in C, PU}} + \frac{\lambda_{OFF}^{c \in C, PU}}{\lambda_{ON}^{c \in C, PU} + \lambda_{OFF}^{c \in C, PU}} \times e^{-\left(\lambda_{ON}^{c \in C, PU} + \lambda_{OFF}^{c \in C, PU}\right) t}$$
The PUs and SUs avoid any possible collision; if a collision does occur, its rate must be kept below the IEEE requirement []. In this setup, the appearance of PUs follows the Poisson model, which creates a random ON and OFF pattern. The SU node estimates the average channel available time $\Phi_{t,OFF}^{c \in C, l_n}$ of a channel $c \in C$ on a link of a route $k_n \in K$. This ON–OFF time assignment is exponentially distributed and reflects the duration of the PU transitions (the traffic in each channel). The duration of this random appearance follows the ON–OFF time periods shown in Table 2. For example, the PU ON time period can be 50 s and the PU OFF time period can range from 50 to 250 s. This allows PUs to utilize their licensed channels whenever they want, while SUs utilize white spaces opportunistically during the PU OFF time.
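Equation (2) can be evaluated directly once the ON/OFF rates are known; the short sketch below assumes the rates are the reciprocals of the mean ON and OFF periods listed in Table 2, which is one plausible reading rather than the authors' exact parameterization.

```python
import math

def channel_availability(lam_on, lam_off, t):
    """Average OFF (idle) availability of a PU channel at time t (Equation (2))."""
    total = lam_on + lam_off
    steady = lam_on / total                                # long-run idle fraction
    transient = (lam_off / total) * math.exp(-total * t)   # decays as t grows
    return steady + transient

# Example: mean PU ON period of 50 s and mean PU OFF period of 150 s (assumed values)
print(channel_availability(lam_on=1 / 50, lam_off=1 / 150, t=10.0))
```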
The time horizon of a channel for a SU is segregated into sense–transmit time windows []. The sensing time is the duration of the channel sensing and processing time. The processing time is the duration that the USRP/GNU Radio unit takes for hardware and software initiation, such as packet encoding and decoding, digital conversion, and transmission or reception. The transmit time is the duration that a SU node takes to send or receive a data packet.

3.1. Reinforcement with Static Learning

In traditional RL (refer to Equation (3)), the learning mechanism has a constant learning rate, which is set based on the relative importance of the current and discounted values. The SU receives its reward based on the channel availability (or white spaces, i.e., the idle status of the PU). Specifically, the learning rate can be higher (lower) when the PU activity level is lower (higher), as represented by the equation below []:
$$Q_{t+1}^i(s_t^i, a_t^i) = (1 - \alpha) \times Q_t^i(s_t^i, a_t^i) + \alpha \times \min\left(r_{t+1}^i(s_{t+1}^i)\right)$$
where $r_{t+1}^i(s_{t+1}^i)$ is the traffic intensity, which refers to the channel available time of a route $k_n \in K$. The traditional RL approach can be ideal for a less dynamic network with low PU activity levels, in which no adjustment of the learning rate is required.

3.2. Enhanced Reinforcement with Dynamic Learning

For a more dynamic network with frequent changes in the PU appearance in channels, an enhanced RL approach is required that adapts its learning to the dynamicity of the channel. For instance, the dynamicity of the MC layer is less than that of the FC layer. Therefore, from MC to FC, the dynamicity increases and the learning rate $\alpha$ is decreased. Equation (3) is enhanced as follows []:
$$Q_{t+1}^i(s_t^i, a_t^i) = \left(1 - \alpha(s_t^i, a_t^i)\right) \times Q_t^i(s_t^i, a_t^i) + \alpha \times \min\left(r_{t+1}^i(s_{t+1}^i)\right)$$
As the bottleneck link of a route has the least channel capacity, it helps in determining the priority of a D2D route over other routes in the network. A route whose bottleneck link has a lower channel capacity has a lower priority compared to other routes. The dynamic learning rate satisfies $\alpha_{min} \le \alpha(s_t^i, a_t^i) \le \alpha_{max}$, and it is determined by the availability of channels (or white spaces) as follows:
$$\alpha_t^i(s_t^i, a_t^i) = \Phi_{t,OFF}^{c_n, l_{i_n, j_n}}$$
For $\alpha_t^i(s_t^i, a_t^i)$, a higher value indicates a higher dependency of the Q-value on the current knowledge, and a lower value indicates a higher dependency of the Q-value on previous knowledge. The dynamic learning rate $\alpha_t^i(s_t^i, a_t^i)$ can vary based on the immediate state–action pair and previously learned rewards. In this study, the agent is myopic: due to the random appearance of PUs, it relies on the immediate state–action pair, with some consideration of the previously learned Q-value, rather than the next state–action pair (or discounted reward); hence, the discount value is $\gamma = 0$. Due to the need for interaction with the operating environment, the majority of the testbeds using USRP [,,] have adopted Q-learning to adapt to the environment. Q-learning is the preferred approach because (a) it does not need the datasets required in supervised machine learning approaches [,,] and (b) it provides definite outcomes, particularly whether a channel is utilized or unutilized, which is preferred compared to unsupervised machine learning. The non-learning approach, called non-RL, selects routes based on their priority (e.g., based on the number of hops of the routes, specifically $k_2 > k_3 > k_4$). Hence, the proposed dynamic Q-learning approach is compared with the traditional Q-learning approach and the non-learning approach.
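The difference between Equations (3) and (4)–(5) can be summarized in a few lines of Python; treating the reward as a scalar and clipping the availability-driven learning rate to assumed bounds $\alpha_{min}$ and $\alpha_{max}$ are simplifications for illustration.

```python
def static_q_update(q, reward, alpha=0.5):
    """Traditional RL update with a constant learning rate (Equation (3) style)."""
    return (1 - alpha) * q + alpha * reward

def dynamic_q_update(q, reward, availability, alpha_min=0.1, alpha_max=0.9):
    """Enhanced update (Equation (4) style): the learning rate follows the channel
    availability Phi_OFF of the link (Equation (5)), clipped to assumed bounds."""
    alpha = min(max(availability, alpha_min), alpha_max)
    return (1 - alpha) * q + alpha * reward

# With low PU activity (high availability) the Q-value tracks the current reward;
# with high PU activity it relies more on previously learned knowledge.
print(dynamic_q_update(q=0.4, reward=0.8, availability=0.9))
print(dynamic_q_update(q=0.4, reward=0.8, availability=0.2))
```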
In the following sections, the learning model and algorithm of CenTri are presented.

4. CenTri: Reinforcement Learning Model and Algorithm

In CenTri, the BSs of both macrocell and small-cell layers collaborate to perform the route selection. A proactive link selection mechanism is deployed in the control plane and a reactive route selection mechanism is deployed in the data plane.
Figure 3 presents the route selection mechanism of the RL model, which serves as the decision-making engine. The CC in the control plane selects routes based on factors with less dynamicity (i.e., a lower number of intermediate nodes and a lower delay), while the nodes in the data plane (i.e., the source node $fc_s$) select routes based on the priority levels of D2D communication (or with less PU interference). The delay is higher in D2D routes, so the distributed RL model embedded in the BSs and nodes of small cells aims to improve the routing decision made by the CC and select routes with lower PU activity levels and fewer intermediate nodes.
Figure 3. Route selection decision made by the source node based on RL.
The CC establishes backbone routes in order to provide an always-available route for small-cell nodes. The backbone routes are shared among BSs in the data plane and, subsequently, the BSs share the backbone routes with source nodes. The distributed RL model determines the traffic intensity of the available routes and establishes routes with the least traffic congestion via traffic offloading. BSs in small cells are part of the backbone routes from the macrocell, and so they inform the CC about their usage of white spaces. For simplicity, routes are assumed to be disjoint in this paper. The rest of this section presents the RL models and algorithms for CenTri.

4.1. Reinforcement Learning Models

This section presents the centralized proactive route selection mechanism with the distributed reactive RL model. While the CC establishes more stable routes and backbones, the distributed RL model learns to offload traffic from MC BSs in a collaborative manner.

4.1.1. Centralized Route Selection

In the centralized mechanism, a proactive route selection mechanism is deployed in the CC to establish backbone routes among the MC BS and femtocell nodes $fc$. Backbone routes, which do not have PU activities, help to maximize the packet delivery ratio in the network. The CC evaluates the available routes based on network-wide information gathered from the BSs and femtocell nodes $fc$. The routes are prioritized as $k_1 = fc_s \rightarrow mc \rightarrow fc_d$, $k_2 = fc_s \rightarrow fc_3 \rightarrow fc_6 \rightarrow fc_d$, $k_3 = fc_s \rightarrow fc_1 \rightarrow fc_4 \rightarrow fc_7 \rightarrow fc_d$, and $k_4 = fc_s \rightarrow fc_2 \rightarrow fc_5 \rightarrow fc_8 \rightarrow fc_d$. The routes evaluated by the CC are prioritized based on the number of hops, without taking PU activities into consideration. The CC tends to give priority to the shortest route (i.e., the backbone route), which goes through the MC BS. The routes are given to source nodes proactively to save the processing time incurred in determining routes. Figure 3 shows that the proactive routes are provided by the CC to the distributed source nodes in the data plane.
The source node re-prioritizes (or reranks) the given routes based on D2D communication with the minimum number of intermediate nodes (or hops) and the minimum traffic intensity. Hence, the source node gives the lowest priority to the backbone route, since it does not use D2D communication; this helps in offloading traffic from the MC BS. The presence of PUs has a direct impact on the throughput and delay of packet transmission. Routes with fewer channel switches have lower signaling overheads, leading to higher stability and bandwidth availability [].
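As a concrete illustration of this two-step prioritization, the sketch below ranks the four example routes by hop count on the CC side and then pushes the backbone route to the lowest priority on the source-node side; the route and node names mirror Figure 1, but the functions themselves are assumptions.

```python
def cc_prioritize(routes):
    """CC-side proactive ranking: order candidate routes by hop count only
    (PU activities are not considered at this stage)."""
    return sorted(routes, key=lambda name: len(routes[name]) - 1)

def source_rerank(ranked, backbone="k1"):
    """Source-node reranking: keep the hop-count order for D2D routes but give
    the backbone route (no D2D) the lowest priority."""
    return [k for k in ranked if k != backbone] + [backbone]

routes = {
    "k1": ["fc_s", "mc", "fc_d"],                       # backbone via MC BS
    "k2": ["fc_s", "fc_3", "fc_6", "fc_d"],
    "k3": ["fc_s", "fc_1", "fc_4", "fc_7", "fc_d"],
    "k4": ["fc_s", "fc_2", "fc_5", "fc_8", "fc_d"],
}
print(source_rerank(cc_prioritize(routes)))             # ['k2', 'k3', 'k4', 'k1']
```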

4.1.2. Distributed Reinforcement Learning Model

In the data plane, the distributed RL model is embedded in all small-cell BSs and their corresponding nodes in the network. Hence, a source node in the network can select a route towards the destination in a reactive manner. Route selection in the data plane of the small cells follows the priority levels (refer to Section 1.3). The selected route has fewer intermediate nodes and a lower traffic intensity, which maximizes traffic offload and achieves a higher throughput by increasing the packet delivery rate. The channel capacity of a link in a route is determined by $\Phi_{t_n}^{c_n, k_n}$ from Equation (2), and it shows the utilization of the links $l_n$, including PU activities, in a D2D route $k_n$. Assume that a packet $p_t^{\varsigma_i} \in PT$ with size $\varsigma_i$ traverses a link $l_n$ with channel $c_n$ at time instant $t_n$; the utilization $U_{t_n}^{c_n, l_n, k_n}$ of the links $l_n$ of route $k_n$ is defined as follows []:
$$U_{t_n}^{c_n, l_n, k_n} = \frac{\sum p_{t}^{\varsigma_i, c_n, l_n}}{BW_{c_n \in C}}$$
where $BW_{c_n}$ is the available bandwidth of the channel $c_n$ for the link $l_n$ in the femtocell layer. It is noteworthy that all D2D routes in this study have the same bandwidth but different frequencies, which means that the links between nodes use different channels. The channel utilization of a route is defined as follows:
$$\tau_{t_n}^{c_n, k_n} = \sum \left( U_{t_n}^{c_n, k_n} \right)$$
where $\sum ( U_{t_n}^{c_n, k_n} )$ is the sum of the channel utilization by PUs over the links $l_n$ of route $k_n$ with channels $c_n$ at time instant $t_n$. The channel utilization of a route includes the PU activities, and it is important for source nodes in the network to be aware of the traffic intensity of each route to reduce the possibility of interference from SUs to PUs.
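A small sketch of Equations (6) and (7) follows, assuming the traffic carried on a link is observed over a fixed window so that packet sizes can be turned into a utilization; the window length and the example numbers are assumptions.

```python
def link_utilization(packet_sizes_bits, bandwidth_bps, window_s=1.0):
    """Utilization of one link channel (Equation (6) style): traffic carried on
    the link divided by the channel bandwidth over an assumed observation window."""
    return sum(packet_sizes_bits) / (bandwidth_bps * window_s)

def route_utilization(per_link_utilization):
    """Channel utilization of a route (Equation (7)): the sum over its links."""
    return sum(per_link_utilization)

# Example: a three-link D2D route on 2 Mbps channels (illustrative values)
links = [
    link_utilization([12_000, 12_000], 2_000_000),
    link_utilization([12_000], 2_000_000),
    link_utilization([36_000], 2_000_000),
]
print(route_utilization(links))   # 0.036
```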
Traffic intensity is particularly important when multiple routes from a source node have the same number of hops (or intermediate nodes). In D2D communication, including node-to-node, node-to-BS, and BS-to-BS communications, the traffic intensity of a route can be calculated as long as the backbone routes are not utilized. For instance, in Figure 2, the source node $fc_s$ and the destination node $fc_d$ are three hops apart in route $k_2$ ($fc_s \rightarrow fc_3 \rightarrow fc_6 \rightarrow fc_d$). However, routes $k_3$ ($fc_s \rightarrow fc_1 \rightarrow fc_4 \rightarrow fc_7 \rightarrow fc_d$) and $k_4$ ($fc_s \rightarrow fc_2 \rightarrow fc_5 \rightarrow fc_8 \rightarrow fc_d$) have the same number of intermediate nodes and links (i.e., four hops). Hence, traffic intensity is used to select the route with a lower congestion level. In this example, if PUs appear in route $k_3$, then route $k_4$ is used. In such a case, route $k_4$ continues to be used until the communication ends or interference occurs. The presence of PUs in a route (i.e., $k_3$) makes the route unavailable, and it is recorded as occupied. A source node that is required to send a data stream to a destination node has an initial roadmap comprised of the available routes toward the destination node. These routes include nodes and BSs from both the macrocell and femtocell layers. Based on the communication priority levels, the source node rearranges the list given by the CC and prefers routes that use the femtocell layer. The higher the number of intermediate nodes, the greater the number of links in the route and the higher the possibility of PUs appearing in its channels. Hence, the route with a lower number of intermediate nodes tends to be selected. A route with a lower PU activity level in its channels has a higher availability, and so the priority level of the route increases, which makes the route more likely to be selected by the source node with an increased learning rate $\alpha(s_t^{k_n}, a_t^{k_n})$. The given routes, with different numbers of intermediate nodes, links, and channels, experience different PU activity levels. The amount of time that a PU appears in a channel determines the channel capacity at the bottleneck link of a route as follows:
$$\Phi_{t_n, \beta, OFF}^{c_n, k_n} = \max_{k_n \in K} \min_{l_n \in L} \Phi_{t_n, OFF}^{c_n, l_n, k_n}$$
where $\beta$ signifies the bottleneck link of a route. The traffic intensity $\Psi_{t_n}^{k_n}$ of a route $k_n$ in the femtocell layer at time instant $t_n$ is the cumulative bottleneck channel capacity in the absence of PUs, as follows:
$$\Psi_{t_n}^{k_n} = \sum \left( \Phi_{t_n, \beta, OFF}^{c_n, k_n} \right)$$
The traffic intensity of a route is proportional to its channel capacity: a route with a greater channel capacity experiences less PU interference, and vice versa. The source node learns about the dynamic changes in traffic intensity and adapts the learning rate accordingly. For instance, a greater traffic intensity in route $k_1$ at time instant $t_1$ causes the source node to reduce the learning rate $\alpha(s_{t_1}^{k_1}, a_{t_1}^{k_1}) \rightarrow 0$ closer to zero, which makes the Q-function more dependent on the previous Q-value. A lower traffic intensity causes the source node to increase the learning rate $\alpha(s_{t_1}^{k_1}, a_{t_1}^{k_1}) \rightarrow 1$ closer to one, which makes the Q-function more dependent on the current Q-value. The source node makes a route selection decision by choosing the route with the maximum Q-value, which has the minimum PU activities, as follows:
$$a_t^{k_n} = \max_{k_n \in A} Q_{t_n}^{k_n}\left(s_{t_n}^{k_n}\right), \quad k_n \in fc$$
The dynamic learning rate $\alpha_{min} \le \alpha(s_{t_n}^{k_n}, a_{t_n}^{k_n}) \le \alpha_{max}$ is as follows:
$$\alpha(s_{t_n}^{k_n}, a_{t_n}^{k_n}) \propto \Psi_{t_n}^{k_n}$$
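Equations (8)–(10) and the max-Q route choice can be sketched as follows; the normalization of the traffic intensity into a learning rate and the clipping bounds are assumptions, since the paper only states that $\alpha$ follows $\Psi_{t_n}^{k_n}$.

```python
def bottleneck_availability(link_availabilities):
    """Bottleneck channel capacity of a route (Equation (8) style): the link
    with the least availability limits the whole route."""
    return min(link_availabilities)

def traffic_intensity(route_history):
    """Traffic intensity of a route (Equation (9) style): cumulative bottleneck
    availability over the observed time instants."""
    return sum(bottleneck_availability(links) for links in route_history)

def select_route(q_values):
    """Route selection: pick the route with the maximum Q-value."""
    return max(q_values, key=q_values.get)

# Illustrative per-link Phi_OFF samples for route k2 at two time instants
k2_history = [[0.80, 0.90, 0.70], [0.85, 0.90, 0.75]]
psi_k2 = traffic_intensity(k2_history)
alpha_k2 = min(max(psi_k2 / len(k2_history), 0.1), 0.9)  # assumed normalization and bounds
print(psi_k2, alpha_k2, select_route({"k2": 0.62, "k3": 0.55, "k4": 0.47}))
```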
A source node is equipped with the Q-routing model to rank the D2D routes. The best route toward the destination node is selected based on the number of intermediate hops and the traffic intensity of the available routes. Table 1 shows the RL model of the reactive mechanism embedded in the BSs and nodes. In this model, the state $s_t^{k_n}$ represents the routes $k_n \in K$ given by the CC towards the destination node $fc_d$, in which both the source and destination nodes are in the data plane of the femtocell. The action $a_t^{k_n}$ represents the selection of an available route with the highest priority. If the selected route is blocked by PUs, then the next-highest-priority route is selected by the source node. The reward $r_{t+1}^{k_n}$ represents the cost reflecting the traffic intensity of a route from the source node to the destination node.
Table 1. The RL model for the route selection embedded in the BS of the small cell and its corresponding nodes in the data plane.
Based on Table 1, three criteria are checked between the source and destination nodes prior to communication in the data plane: (a) the type of communication (i.e., D2D); (b) the number of intermediate nodes; and (c) the traffic intensity. The first criterion uses the available white spaces in channels and intends to offload traffic from the MC BS. The second criterion helps to make route decisions more efficient by looking at routes with fewer intermediate nodes, which reduces the possibility of PU appearance, leading to a higher successful transmission rate. The third criterion selects the route with fewer PU activities when two routes are identical in terms of the number of hops and channels. The route with fewer PU activities has a lower traffic intensity and a higher Q-value, so it is preferred.
In Figure 4, an example of the random appearance of PUs in routes is illustrated for three transmission cycles, namely A, B, and C. The PUs can occupy a channel of routes $k_2$, $k_3$, and/or $k_4$. The presence of PUs in one of the link channels of a D2D route causes the entire route to be blocked, and the source node must select another route. In this figure, at time $t$, the source node selects a route to the destination node. Since routes $k_3$ and $k_4$ are occupied by PUs, route $k_2$ is selected. At time $t_1$, both routes $k_2$ and $k_4$ are occupied; therefore, route $k_3$ is selected. At time $t_2$, all D2D routes $k_2$, $k_3$, and $k_4$ are occupied, so the source node uses the backbone route through the MC BS. At time $t_3$, route $k_3$ is occupied by PUs, but both routes $k_2$ and $k_4$ are available to the source node. The source node selects route $k_2$ as it has a higher priority due to a smaller number of intermediate hops and nodes compared to route $k_4$. At time $t_4$, route $k_2$ is occupied by PUs, and both routes $k_3$ and $k_4$ are available to the source node. In this case, the source node selects route $k_3$ as it has a higher priority over route $k_4$. During the communication between the source node and the destination node through D2D routes, the source node learns about the appearance of PUs and the availability of the D2D routes. This makes route selection more accurate as time goes by. For instance, both D2D routes $k_3$ and $k_4$ have the same number of hops, but the source node has learned that route $k_3$ has a higher channel capacity, and so it has a higher Q-value compared to route $k_4$. Therefore, route $k_3$ is selected.
Figure 4. Random appearance of PU activities in D2D routes in three transmission cycles (A–C).

4.2. Reinforcement Learning Algorithm

In this section, the RL algorithm for the distributed mechanism is presented. All the nodes and BSs of small cells in the data plane are equipped with the RL algorithm and receive a route map from the CC proactively. This enables them to select the best route proactively based on the priority levels (see Section 4.1.2), which helps to offload traffic from the MC BS. In the data plane of the small cell, a source node transmits a data stream to a destination node over a route selected from those given by the CC. During the transmission, if the route is interrupted by PUs, the second-priority route is selected and the previous route is identified as a route with a high traffic intensity level. By continuing this process, a table of route scores based on channel capacity is updated at the end of each transmission cycle, which gives a clearer picture of the random appearance of PUs. The experimental setup and configuration are shown in Figure 5. In this platform, since the network layer is the focus of this study, the physical distance between nodes, as well as phenomena such as shadowing and fading, are not considered.
Figure 5. Experimental setup for the hybrid route selection with three D2D routes and a backbone route via MC BS.
Algorithm 1 provides the general route selection scheme, in which the CC sends the initial prioritized routes through the MC BS to the source node in the data plane proactively. As for the CC, it is assumed that routes are readily available and prioritized based on the number of hops.
Algorithm 1 General description of the route selection by the source node.
1: procedure Hybrid route selection
2:     CC sends initial prioritized routes to MC BS proactively
3:     MC BS sends prioritized routes $K$, where $k_1$ has the highest and $k_{|K|}$ has the lowest priority, to a source node $fc_s$
4:     for ($k_{D2D} \in K$) do
5:         Select route $k_1$ if it has the least hops and traffic intensity $\Psi$ at time instant $t$ when the PU is OFF
6:         if all $k_{D2D}$ have ON PUs then
7:             Select the backbone route $k_{bb}$
8:         end if
9:     end for
10: end procedure
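A minimal Python sketch of Algorithm 1 is given below, assuming the PU status of each route is known to the source node; the route names and the `pu_on` mapping are illustrative.

```python
def hybrid_route_selection(prioritized_routes, pu_on, backbone="k_bb"):
    """Sketch of Algorithm 1: walk through the CC-prioritized routes and fall
    back to the backbone route when every D2D route has an active PU."""
    for k in prioritized_routes:
        if k == backbone:
            continue                       # the backbone is the last resort
        if not pu_on.get(k, False):
            return k                       # first available route = fewest hops
    return backbone                        # all D2D routes blocked by PUs

# Example: k2 blocked by a PU, k3 free, so k3 is selected
print(hybrid_route_selection(["k2", "k3", "k4", "k_bb"],
                             {"k2": True, "k3": False, "k4": False}))
```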
Algorithm 2 shows the distributed route selection mechanism for traffic offloading. Based on the flowchart shown in Figure 3, the source node receives a route map (i.e., a list of routes created proactively), which is prioritized based on the number of hops, from the CC. The source node rearranges the list and gives priority to D2D routes in order to offload traffic from the MC BS. The Q-value depends on the traffic intensity, which is based on the channel capacity of the links of a route, and on the learning rate $\alpha_t^i(s_t^i, a_t^i)$, which changes dynamically based on the PU activity level. Therefore, routes with lower PU activities tend to be selected over those with higher PU activities.
Algorithm 2 RL mechanism at distributed nodes and BSs.
1: procedure D2D route selection
2:     /* Source node selects a route to offload traffic from MC BS */
3:     MC BS receives a route map (i.e., $\{k_{bb}, k_2, k_3, k_4\}$) from the CC and sends it to $fc_s$
4:     /* Stage 1 */
5:     for time $t_1$; $fc_s$ reprioritizes D2D routes in the route map do
6:         $k = \{k_2 > k_3 > k_4 > k_{bb}\}$
7:         if $k_2$ or $k_3$ or $k_4$ is not available then
8:             $fc_s$ uses the backbone route $k_{bb}$
9:         end if
10:     end for
11:     /* Stage 2 */
12:     for time $t_{n+1}$, $n \le |N|$; $fc_s$ reprioritizes D2D routes based on traffic intensity $\Psi_{t_n}^{k_n}$ and Equation (4) do
13:         Estimate $\Phi_{t_n, OFF}^{c_n, l_n, k_n}$ using Equations (2) and (8)
14:         if $\Psi_{t_n}^{k_n} < \Phi_{t_n, \beta, OFF}^{c_n, k_n}$ /* traffic intensity is not updated */ then
15:             Update $\Psi_{t_n}^{k_n} \leftarrow \sum(\Phi_{t_n, \beta, OFF}^{c_n, k_n})$
16:         end if
17:         Calculate the dynamic learning rate $\alpha(s_{t_n}^{k_n}, a_{t_n}^{k_n})$ using Equation (10)
18:         Update the Q-value using Equation (4)
19:         for $fc_s$ do select a D2D route with the maximum Q-value from routes $\{k_2 > k_3 > k_4\}$
20:         end for
21:     end for
22: end procedure
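The two stages of Algorithm 2 can be sketched as below, combining the earlier pieces (bottleneck availability, dynamic learning rate, and Q-update); the data layout, the learning-rate bounds, and the direct use of the bottleneck availability as the reward are assumptions for illustration.

```python
def d2d_route_selection(route_map, phi_off, q_values, pu_on,
                        alpha_min=0.1, alpha_max=0.9):
    """Sketch of Algorithm 2 at a source node. route_map lists the D2D routes in
    reprioritized order {k2 > k3 > k4}; phi_off maps each route to its per-link
    Phi_OFF estimates; q_values is the node's Q-table; pu_on marks blocked routes."""
    # Stage 1: if no D2D route is free of PU activity, use the backbone route
    if all(pu_on.get(k, False) for k in route_map):
        return "k_bb", q_values

    # Stage 2: refresh the traffic intensity, learning rate, and Q-value per route
    for k in route_map:
        psi = min(phi_off[k])                        # bottleneck availability
        alpha = min(max(psi, alpha_min), alpha_max)  # dynamic learning rate
        q_values[k] = (1 - alpha) * q_values.get(k, 0.0) + alpha * psi

    available = [k for k in route_map if not pu_on.get(k, False)]
    return max(available, key=q_values.get), q_values

route, q = d2d_route_selection(
    ["k2", "k3", "k4"],
    {"k2": [0.8, 0.9], "k3": [0.6, 0.7, 0.9], "k4": [0.5, 0.8, 0.7]},
    {}, {"k2": False, "k3": False, "k4": True})
print(route)   # k2
```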

4.2.1. Implementation Requirements and Parameters

The implementation has eleven USRP/GNU Radio units as nodes and BSs. Each of the ten USRP/GNU Radio units is connected to a Raspberry Pi 3 B+ unit equipped with 30 GB of external memory for storing and running the algorithms. Each USRP unit, specifically a USRP N200, is equipped with a VERT900 antenna and is driven by GNU Radio, an open-source software-defined radio (SDR) toolkit. A personal computer, which is equipped with an Intel Core i7 processor and 16 GB of RAM, serves as the MC BS. The D2D nodes are in closer proximity to each other than to the MC BS, so the transmission power is 10 dBm (10 mW) among themselves and 20 dBm (100 mW) with the MC BS.
Table 2 presents the parameters. In this platform, the user datagram protocol (UDP) is the preferred transport layer protocol for multimedia applications because it is connectionless and does not perform retransmission upon packet loss, which reduces the delay at the expense of an acceptable packet loss. Figure 5 shows the platform with the USRPs equipped with RP3 units. The nodes are located in the proximity of the MC BS and receive route information proactively from the CC via the MC BS.
Table 2. Experimental setup parameters.
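The connectionless behavior described above can be reproduced with a plain UDP sender/receiver pair, as in the sketch below; the host, port, packet count, and payload size are placeholders, not the testbed's configuration.

```python
import socket

def send_stream(host="127.0.0.1", port=5005, n_packets=100, payload=b"x" * 512):
    """Send a sequence of UDP datagrams; no handshake and no retransmission."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq in range(n_packets):
        sock.sendto(seq.to_bytes(4, "big") + payload, (host, port))
    sock.close()

def receive_stream(port=5005, expected=100, timeout_s=2.0):
    """Receive datagrams until the expected count or a timeout; lost packets are
    never retransmitted, so loss shows up directly in the packet delivery ratio."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    sock.settimeout(timeout_s)
    received = 0
    try:
        while received < expected:
            sock.recvfrom(4096)
            received += 1
    except socket.timeout:
        pass
    sock.close()
    return received / expected   # packet delivery ratio for this stream
```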

4.2.2. Assumptions

The platform performs multi-hop communication from the source node to the destination node. There are a few assumptions in this setup as follows:
  • The delay incurred in multi-hop communication is not considered in order to focus on routes with less traffic (i.e., with low PU activities).
  • The backbone and D2D routes are readily available, and the source node re-prioritizes them.
  • A D2D route is up to three hops, and the source and destination nodes do not have direct communication.

4.2.3. Appearance of PUs on Channels

Three PUs reappear randomly in the operating channels of the D2D communication. The backbone route $k_1$, which serves as a backup, is free from PU activities. When the channel of a route has PU activities, the route breaks and the source node must select another available route following the priority mechanism explained in Section 4.1.1. There are three scenarios related to the presence of PU activities in routes $k_2$, $k_3$, and $k_4$. In all scenarios, the destination node is beyond the transmission range of the source node, and so the traffic stream must go through the intermediate nodes of the network.

Scenario 1

In the first scenario, as shown in Figure 6, PUs reappear in route $k_3$ ($fc_s \rightarrow fc_1 \rightarrow fc_4 \rightarrow fc_7 \rightarrow fc_d$) in a random manner. $PU_1$, $PU_2$, and $PU_3$ interfere with channels $c_2$, $c_6$, and $c_9$, respectively. The source node selects either D2D route $k_2$ or $k_4$, or the backbone route $k_1$. Since route $k_2$ has a higher priority due to a lower number of hops, it is selected.
Figure 6. Scenario 1: the appearance of PUs in route $k_3$ interferes with channels $c_2$, $c_6$, and $c_9$.

Scenario 2

In the second scenario, as shown in Figure 7, PUs reappear in routes $k_3$ ($fc_s \rightarrow fc_1 \rightarrow fc_4 \rightarrow fc_7 \rightarrow fc_d$) and $k_4$ ($fc_s \rightarrow fc_2 \rightarrow fc_5 \rightarrow fc_8 \rightarrow fc_d$). $PU_1$, $PU_2$, and $PU_3$ interfere with channels $c_2$, $c_6$, and $c_3$, respectively. Specifically, two channels, $c_2$ and $c_6$ of route $k_3$, and channel $c_3$ of route $k_4$ are occupied by PUs. The source node selects the D2D route $k_2$ rather than the backbone route $k_1$.
Figure 7. Scenario 2: the appearance of PUs in routes $k_3$ and $k_4$ interferes with channels $c_2$, $c_3$, and $c_6$.

Scenario 3

In the third scenario, as shown in Figure 8, PUs reappear in all D2D routes, including routes $k_2$ ($fc_s \rightarrow fc_3 \rightarrow fc_6 \rightarrow fc_d$), $k_3$ ($fc_s \rightarrow fc_1 \rightarrow fc_4 \rightarrow fc_7 \rightarrow fc_d$), and $k_4$ ($fc_s \rightarrow fc_2 \rightarrow fc_5 \rightarrow fc_8 \rightarrow fc_d$). $PU_1$, $PU_2$, and $PU_3$ interfere with channels $c_2$, $c_3$, and $c_4$, respectively. Specifically, channel $c_2$ of route $k_3$, channel $c_3$ of route $k_4$, and channel $c_4$ of route $k_2$ are occupied by PUs. Only the backbone route $k_1$ is available to the source node.
Figure 8. Scenario 3: the appearance of PUs in routes $k_2$, $k_3$, and $k_4$ interferes with channels $c_2$, $c_3$, and $c_4$.

5. Results and Discussion

The experimental results, including the packet delivery ratio, end-to-end delay, throughput, and the number of route breakages, are presented in this section.

5.1. Packet Delivery Ratio

The packet delivery ratio (PDR) is the ratio of the number of packets received by the destination node to the number of packets sent by the source node. In Figure 9, the PDR increases for D2D routes $k_2$, $k_3$, and $k_4$ as the PU OFF time increases. When the PU OFF time increases from 50 to 250 s, the PDR of (a) route $k_2$ increases from 0.891 (89.1%) to 0.929 (92.9%); (b) route $k_3$ increases from 0.853 (85.3%) to 0.915 (91.5%); and (c) route $k_4$ increases from the lowest value of 0.844 (84.4%) to 0.912 (91.2%).
Figure 9. Average packet delivery ratio comparison among D2D routes with dynamic $\alpha$ at different PU OFF times.
Route $k_2$ achieves a better PDR compared to routes $k_3$ and $k_4$ since it has a lower number of hops and fewer PU activities. The PDRs of routes $k_3$ and $k_4$ are very close to each other, and the gap between them narrows as the PU OFF time increases. This is because both routes have the same number of hops; however, route $k_3$ has a lower presence of PUs, which explains why it is preferred over route $k_4$.

5.2. End-to-End Delay

The end-to-end delay is the time taken by a data stream to be transmitted from the source node $fc_s$ to the destination node $fc_d$. The three D2D routes have different numbers of intermediate nodes (or hops). Route $k_2 = fc_s \rightarrow fc_3 \rightarrow fc_6 \rightarrow fc_d$ has four nodes with two intermediate nodes, while routes $k_3 = fc_s \rightarrow fc_1 \rightarrow fc_4 \rightarrow fc_7 \rightarrow fc_d$ and $k_4 = fc_s \rightarrow fc_2 \rightarrow fc_5 \rightarrow fc_8 \rightarrow fc_d$ have five nodes with three intermediate nodes. Route $k_2$ has the lowest end-to-end delay among the D2D routes since it has the lowest number of intermediate nodes. Since routes $k_3$ and $k_4$ have the same number of intermediate nodes, the source node selects one over the other through the learning mechanism explained in Section 4.2.3.
Figure 10 shows a comparison of the end-to-end delay incurred by the traditional reinforcement learning (TRL) mechanism with $\alpha = 0.5$ and the dynamic reinforcement learning (DRL) mechanism. In the TRL mechanism, the learning rate is constant at $\alpha = 0.5$ for the entire experiment. The end-to-end delay reduces with increasing PU OFF time. Compared to TRL, DRL shows a lower end-to-end delay for routes $k_2$, $k_3$, and $k_4$ when the PU OFF time increases from 50 to 250 s.
Figure 10. End-to-end delay comparison among routes $k_2$, $k_3$, and $k_4$ between DRL and TRL with $\alpha = 0.5$ at different PU OFF times.

5.3. Throughput

Throughput is the rate at which a data stream is successfully delivered to the destination node through a selected route in a specific time frame. Figure 11 shows a comparison of the throughput achieved by the three D2D routes $k_2$, $k_3$, and $k_4$ for different PU OFF times. Based on the experimental results, the average throughput of the routes increases as PU appearances become less frequent. Therefore, increasing the PU OFF time has a positive effect on the rate of data stream transmission. The throughput of the dynamic reinforcement learning (DRL) mechanism, which has a dynamic learning rate $\alpha$ that varies with the rewards, is affected by the PU appearance in a route. Specifically, when the PU OFF time increases from 50 to 250 s, (a) the throughput of route $k_2$ increases from 1.487 to 1.68 Mbps, (b) the throughput of route $k_3$ increases from 1.473 to 1.652 Mbps, and (c) the throughput of route $k_4$ increases from 1.468 to 1.642 Mbps. A similar trend is observed in (a) traditional reinforcement learning (TRL) with a fixed learning rate of $\alpha = 0.5$ and (b) the non-learning approach, called non-RL (NRL), which selects routes using their priority $k_2 > k_3 > k_4$, based on the number of hops of the routes. Overall, DRL outperforms both TRL and NRL.
Figure 11. Average throughput comparison among routes $k_2$, $k_3$, and $k_4$ for dynamic RL with dynamic $\alpha$, traditional RL with $\alpha = 0.5$, and non-RL at different PU OFF times.

5.4. Number of Route Breakages

The source node selects a route based on the priority given by the CC. However, the priority of routes with D2D communication changes with the presence of PUs and successful data transmission to the destination node. For each data transmission cycle, a route breakage occurs when a PU reappears in a selected route, and this causes the source node to switch to another route.
Figure 12 shows a comparison of the cumulative route breakages between DRL and TRL with a fixed $\alpha = 0.5$. Although TRL performs better with fewer route breakages at the beginning, the source node learns more about the routes with higher successful transmission rates as time goes by, contributing to the improvement of DRL. When the PU OFF time increases from 50 to 250 s, (a) the route breakages of TRL reduce from 15.6 to 5.5 and (b) the route breakages of DRL reduce from 16.4 to 5.1.
Figure 12. Cumulative number of route breakages for TRL with $\alpha = 0.5$ and the DRL mechanism at different PU OFF times.

6. Conclusions and Future Work

This paper proposes CenTri, a hybrid route selection scheme that uses white spaces to offload traffic from macrocell to small-cell base stations with heterogeneous nodes in 5G network scenarios. It caters to important characteristics of 5G network scenarios, including the dynamicity of channel availability, heterogeneity, and ultra-densification. In this paper, device-to-device (D2D) communication uses the traditional reinforcement learning (TRL) and dynamic reinforcement learning (DRL) approaches. While TRL uses a constant learning rate, DRL uses a dynamic learning rate that changes with the primary user (PU) activity levels. Our work was tested in a testbed with eleven USRP/GNU Radio units. Each USRP unit was embedded with a mini-computer called RP3 to provide more realistic scenarios. Compared to TRL, the experimental results show improvements in different quality of service (QoS) metrics, including a higher packet delivery ratio and throughput, and a lower end-to-end delay and number of route breakages. Routes with more intermediate nodes also incurred a higher end-to-end delay and a lower packet delivery ratio and throughput.
In the future, CenTri will require more testing with a higher number of routes and intermediate nodes. Better processing units can relax the assumptions made, including those concerning the processing delay. Moreover, a cross-layer design studying the physical, data link, and network layers will help to provide a more realistic testing environment.

Author Contributions

Conceptualization, M.K.C., K.-L.A.Y., M.H.L. and Y.-W.C.; methodology, M.K.C., K.-L.A.Y., M.H.L. and Y.-W.C.; investigation, M.K.C.; resources, M.K.C., K.-L.A.Y., M.H.L. and Y.-W.C.; data curation, M.K.C.; writing—original draft preparation, M.K.C.; writing—review and editing, K.-L.A.Y., M.H.L. and Y.-W.C.; visualization, M.K.C.; project administration, K.-L.A.Y. and Y.-W.C.; funding acquisition, K.-L.A.Y. and Y.-W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a publication fund under the Research Creativity and Management Office, Universiti Sains Malaysia. This research was also supported by Universiti Tunku Abdul Rahman.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data sharing not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chaudhari, A.; Murthy, C.S.R. Femto-to-Femto (F2F) Communication: The Next Evolution Step in 5G Wireless Backhauling. In Proceedings of the 2017 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), Paris, France, 15–19 May 2017. [Google Scholar]
  2. Boviz, D.; Member, S.; Chen, C.S.; Member, S.; Yang, S. Effective Design of Multi-User Reception and Fronthaul Rate Allocation in 5G Cloud RAN. IEEE J. Sel. Areas Commun. 2017, 8716, 1825–1836. [Google Scholar] [CrossRef]
  3. Jalil Piran, M.; Tran, N.; Suh, D.; Song, J.B.; Hong, C.S.; Han, Z. QoE-Driven Channel Allocation and Handoff Management for Seamless Multimedia in Cognitive 5G Cellular Networks. IEEE Trans. Veh. Technol. 2016, 66, 6569–6585. [Google Scholar] [CrossRef]
  4. Sucasas, V.; Radwan, A.; Mumtaz, S.; Rodriguez, J. Effect of noisy channels in MAC-based SSDF counter-mechanisms for 5G cognitive radio networks. In Proceedings of the International Symposium on Wireless Communication Systems, Brussels, Belgium, 25–28 August 2015. [Google Scholar] [CrossRef]
  5. Font-Bach, O.; Bartzoudis, N.; Mestre, X.; López-Bueno, D.; Mège, P.; Martinod, L.; Ringset, V.; Myrvoll, T.A. When SDR meets a 5G candidate waveform: Agile use of fragmented spectrum and interference protection in PMR networks. IEEE Wirel. Commun. 2015, 22, 56–66. [Google Scholar] [CrossRef]
  6. Öhlén, P.; Skubic, B.; Rostami, A.; Ghebretensaé, Z.; Mårtensson, J.; Fiorani, M.; Monti, P.; Wosinska, L. Data Plane and Control Architectures for 5G Transport Networks. J. Light. Technol. 2016, 34, 1501–1508. [Google Scholar] [CrossRef]
  7. Carrasco, Ó.; Miatton, F.; Díaz, S.; Herzog, U.; Frascolla, V.; Briggs, K.; Miscopein, B.; Domenico, A.D.; Georgakopoulos, A. Centralized Radio Resource Management for 5G small cells as LSA enabler. arXiv 2017, arXiv:1706.08057. [Google Scholar]
  8. Huang, H.; Guo, S.; Liang, W.; Li, K.; Ye, B.; Zhuang, W. Near-Optimal Routing Protection for In-Band Software-Defined Heterogeneous Networks. IEEE J. Sel. Areas Commun. 2016, 34, 2918–2934. [Google Scholar] [CrossRef]
  9. Ambriz, S.J.G.; Mendez, R.M.; Angeles, M.E.R. 5GTraDis: A novel traffic distribution mechanism for 5G Heterogeneous Networks. In Proceedings of the 2016 13th International Conference on Electrical Engineering, Computing Science and Automatic Control, CCE 2016, Mexico City, Mexico, 26–30 September 2016. [Google Scholar] [CrossRef]
  10. Jaber, M.; Imran, M.A.; Tafazolli, R.; Tukmanov, A. 5G Backhaul Challenges and Emerging Research Directions: A Survey. IEEE Access 2016, 4, 1743–1766. [Google Scholar] [CrossRef]
  11. Tran, T.X.; Hajisami, A.; Pompili, D. Ultra-Dense Heterogeneous Small Cell Deployment in 5G and Beyond Cooperative Hierarchical Caching in 5G Cloud Radio Access Networks. IEEE Netw. 2017, 31, 35–41. [Google Scholar] [CrossRef]
  12. Ge, X.; Tu, S.; Mao, G.; Wang, C.X.; Han, T. 5G Ultra-Dense Cellular Networks. IEEE Wirel. Commun. 2016, 23, 72–79. [Google Scholar] [CrossRef]
  13. Wassie, D.A.; Berardinelli, G.; Catania, D.; Tavares, F.M.L. Experimental Evaluation of Interference Suppression Receivers and Rank Adaptation in 5G Small Cells. In Proceedings of the 2015 IEEE 82nd Vehicular Technology Conference (VTC2015-Fall), Boston, MA, USA, 6–9 September 2015. [Google Scholar]
  14. Wassie, D.A.; Berardinelli, G.; Tavares, F.M.L. Experimental Verification of Interference Mitigation techniques for 5G Small Cells. In Proceedings of the 2015 IEEE 81st Vehicular Technology Conference (VTC Spring), Glasgow, UK, 11–14 May 2015. [Google Scholar] [CrossRef]
  15. Lai, W.K.; Shieh, C.S.; Chou, F.S.; Hsu, C.Y. Handover Management for D2D Communication in 5G Networks. In Proceedings of the 2020 2nd International Conference on Computer Communication and the Internet, ICCCI 2020, Nagoya, Japan, 26–29 June 2020; pp. 64–69. [Google Scholar] [CrossRef]
  16. Ouali, K.; Kassar, M.; Nguyen, T.M.T.; Sethom, K.; Kervella, B. An efficient D2D handover management scheme for SDN-based 5G networks. In Proceedings of the 2020 IEEE 17th Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 10–13 January 2020; pp. 1–6. [Google Scholar]
  17. Okasaka, S.; Weiler, R.J.; Keusgen, W.; Pudeyev, A.; Maltsev, A.; Karls, I.; Sakaguchi, K. Proof-of-concept of a millimeter-wave integrated heterogeneous network for 5G cellular. Sensors 2016, 16, 1362. [Google Scholar] [CrossRef]
  18. Jia, M.; Gu, X.; Guo, Q.; Xiang, W.; Zhang, N. Broadband Hybrid Satellite-Terrestrial Communication Systems Based on Cognitive Radio toward 5G. IEEE Wirel. Commun. 2016, 23, 96–106. [Google Scholar] [CrossRef]
  19. Su, Z.; Xu, Q. Content distribution over content centric mobile social networks in 5G. IEEE Commun. Mag. 2015, 53, 66–72. [Google Scholar] [CrossRef]
  20. Soltani, S.; Sagduyu, Y.; Shi, Y.; Li, J.; Feldman, J.; Matyjas, J. Distributed Cognitive Radio Network Architecture, SDR Implementation and Emulation Testbed. In Proceedings of the MILCOM 2015—2015 IEEE Military Communications Conference, Tampa, FL, USA, 26–28 October 2015; pp. 438–443. [Google Scholar] [CrossRef]
  21. He, J.; Song, W. Evolving to 5G: A Fast and Near-optimal Request Routing Protocol for Mobile Core Networks. In Proceedings of the 2014 IEEE Global Communications Conference, Austin, TX, USA, 8–12 December 2014; pp. 4586–4591. [Google Scholar]
  22. Taleb, T.; Samdanis, K.; Mada, B.; Flinck, H.; Dutta, S.; Sabella, D. On Multi-Access Edge Computing: A Survey of the Emerging 5G Network Edge Architecture & Orchestration. IEEE Commun. Surv. Tutor. 2017, 19, 1657–1681. [Google Scholar] [CrossRef]
  23. Zhang, G.A.; Gu, J.Y.; Bao, Z.H.; Xu, C.; Zhang, S.B. Efficient Signal Detection for Cognitive Radio Relay Networks Under Imperfect Channel Estimation. Eur. Trans. Telecommun. 2015, 25, 294–307. [Google Scholar] [CrossRef]
  24. Hwang, R.H.; Peng, M.C.; Huang, C.W.; Lin, P.C.; Nguyen, V.L. An Unsupervised Deep Learning Model for Early Network Traffic Anomaly Detection. IEEE Access 2020, 8, 30387–30399. [Google Scholar] [CrossRef]
25. Caron, M.; Bojanowski, P.; Joulin, A.; Douze, M. Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 132–149. [Google Scholar]
  26. Tsipi, L.; Karavolos, M.; Vouyioukas, D. An unsupervised machine learning approach for UAV-aided offloading of 5G cellular networks. Telecom 2022, 3, 86–102. [Google Scholar] [CrossRef]
  27. Saleem, Y.; Yau, K.L.A.; Mohamad, H.; Ramli, N.; Rehmani, M.H. SMART: A SpectruM-Aware ClusteR-based rouTing scheme for distributed cognitive radio networks. Comput. Netw. 2015, 91, 196–224. [Google Scholar] [CrossRef]
28. Saleem, Y.; Yau, K.L.A.; Mohamad, H.; Ramli, N.; Rehmani, M.H.; Ni, Q. Clustering and Reinforcement-Learning-Based Routing for Cognitive Radio Networks. IEEE Wirel. Commun. 2017, 24, 146–151. [Google Scholar] [CrossRef]
  29. Tang, F.; Tang, C.; Yang, Y.; Yang, L.T.; Zhou, T.; Li, J.; Guo, M. Delay-Minimized Routing in Mobile Cognitive Networks for Time-Critical Automation Applications. IEEE Trans. Ind. Inform. 2017, 13, 1398–1409. [Google Scholar] [CrossRef]
  30. Khatib, R.F.E.; Salameh, H.B. A Routing Scheme for Cognitive Radio Networks with Self-Interference Suppression Capabilities. In Proceedings of the 2017 Fourth International Conference on Software Defined Systems (SDS), Valencia, Spain, 8–11 May 2017; pp. 20–25. [Google Scholar]
  31. He, J.; Song, W. Optimizing Video Request Routing in Mobile Networks with Built-in Content Caching. IEEE Trans. Mob. Comput. 2014, 15, 1714–1727. [Google Scholar] [CrossRef]
  32. Guo, J.; Orlik, P.; Parsons, K.; Ishibashi, K.; Takita, D. Resource aware routing protocol in heterogeneous wireless machine-to-machine networks. In Proceedings of the 2015 IEEE Global Communications Conference, GLOBECOM 2015, San Diego, CA, USA, 6–10 December 2015. [Google Scholar] [CrossRef]
  33. Rakshith, K.; Rao, M. Routing Protocol for Device-to-Device Communication in SoftNet Towards 5G. In Proceedings of the Information and Communication Technology for Intelligent Systems (ICTIS 2017), Ahmedabad, India, 25–26 March 2017; Volume 84. [Google Scholar] [CrossRef]
  34. Liu, Y.; Wang, Y.; Sun, R.; Miao, Z. Distributed resource allocation for D2D-Assisted small cell networks with heterogeneous spectrum. IEEE Access 2019, 7, 83900–83914. [Google Scholar] [CrossRef]
35. Dijkstra, E.W. A Note on Two Problems in Connexion with Graphs. Numer. Math. 1959, 1, 269–271. [Google Scholar]
36. Giambene, G.; Kota, S.; Pillai, P. Satellite-5G Integration: A Network Perspective. IEEE Netw. 2018, 32, 25–31. [Google Scholar] [CrossRef]
37. Xu, H.; Yu, Z.; Li, X.Y.; Huang, L. Joint Route Selection and Update Scheduling for Low-Latency Update in SDNs. IEEE/ACM Trans. Netw. 2017, 25, 3073–3087. [Google Scholar] [CrossRef]
  38. Chai, Y.; Shi, W.; Shi, T.; Yang, X. An efficient cooperative hybrid routing protocol for hybrid wireless mesh networks. Wirel. Netw. 2016, 23, 1387–1399. [Google Scholar] [CrossRef]
  39. Triviño, A.; Ariza, A.; Casilari, E.; Cano, J.C. Cooperative layer-2 based routing approach for hybrid wireless mesh networks. China Commun. 2013, 10, 88–99. [Google Scholar] [CrossRef]
  40. Caria, M.; Jukan, A.; Hoffmann, M. SDN Partitioning: A Centralized Control Plane for Distributed Routing Protocols. IEEE Trans. Netw. Serv. Manag. 2016, 13, 381–393. [Google Scholar] [CrossRef]
  41. ElSawy, H.; Dahrouj, H.; Al-Naffouri, T.Y.; Alouini, M.S. Virtualized cognitive network architecture for 5G cellular networks. IEEE Commun. Mag. 2015, 53, 78–85. [Google Scholar] [CrossRef]
  42. Walikar, G.; Biradar, R.; Geetha, D. Topology based adaptive hybrid multicast routing in mobile ad hoc networks. In Proceedings of the 2016 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE), Pune, India, 19–21 December 2016; pp. 19–21. [Google Scholar] [CrossRef]
43. Sun, L.; Zheng, W.; Rawat, N.; Sawant, V.; Koutsonikolas, D. Performance comparison of routing protocols for cognitive radio networks. IEEE Trans. Mob. Comput. 2015, 14, 1272–1286. [Google Scholar] [CrossRef]
  44. Huang, X.; Lu, D.; Li, P.; Fang, Y. Coolest Path: Spectrum Mobility Aware Routing Metrics in Cognitive Ad Hoc Networks. In Proceedings of the 2011 31st International Conference on Distributed Computing Systems, Minneapolis, MN, USA, 20–24 June 2011. [Google Scholar] [CrossRef]
  45. Ihara, Y.; Kremo, H.; Altintas, O.; Tanaka, H.; Ohtake, M.; Fujii, T.; Yoshimura, C.; Ando, K.; Tsukamoto, K.; Tsuru, M.; et al. Distributed autonomous multi-hop vehicle-to-vehicle communications over TV white space. In Proceedings of the 2013 IEEE 10th Consumer Communications and Networking Conference, CCNC 2013, Las Vegas, NV, USA, 11–14 January 2013; pp. 336–344. [Google Scholar] [CrossRef]
  46. Syed, A.; Yau, K.L.; Qadir, J.; Mohamad, H.; Ramli, N.; Keoh, S. Route selection for multi-hop cognitive radio networks using reinforcement learning: An experimental study. IEEE Access 2016, 4, 6304–6324. [Google Scholar] [CrossRef]
  47. McAuley, A.; Sinkar, K.; Kant, L.; Graff, C.; Patel, M. Tuning of reinforcement learning parameters applied to OLSR using a cognitive network design tool. In Proceedings of the 2012 IEEE Wireless Communications and Networking Conference (WCNC), Paris, France, 1–4 April 2012; pp. 2786–2791. [Google Scholar] [CrossRef]
  48. Briand, A.; Albert, B.B.; Gurjao, E.C. Complete software defined RFID system using GNU radio. In Proceedings of the 2012 IEEE International Conference on RFID-Technologies and Applications (RFID-TA), Nice, France, 5–7 November 2012; pp. 287–291. [Google Scholar] [CrossRef]
  49. Wei, X.; Liu, H.; Geng, Z.; Zheng, K.; Xu, R.; Liu, Y.; Chen, P. Software Defined Radio Implementation of a Non-Orthogonal Multiple Access System towards 5G. IEEE Access 2016, 4, 9604–9613. [Google Scholar] [CrossRef]
  50. Zhao, Y.; Pradhan, J.; Huang, J.; Luo, Y.; Pu, L. Joint energy-and-bandwidth spectrum sensing with GNU radio and USRP. ACM SIGAPP Appl. Comput. Rev. 2015, 14, 40–49. [Google Scholar] [CrossRef]
  51. Elsayed, M.H.M. Distributed interference management using Q-Learning in Cognitive Femtocell networks: New USRP-based Implementation. In Proceedings of the 2015 7th International Conference on New Technologies, Mobility and Security (NTMS), Paris, France, 27–29 July 2015. [Google Scholar]
  52. Xiong, X.; Xiang, W.; Zheng, K.; Shen, H.; Wei, X. An open source SDR-based NOMA system for 5G networks. IEEE Wirel. Commun. 2015, 22, 24–32. [Google Scholar] [CrossRef]
53. Chamran, M.K.; Yau, K.L.A.; Noor, R. An Experimental Study on D2D Route Selection Mechanism in 5G Scenarios. Electronics 2021, 10, 387. [Google Scholar] [CrossRef]
  54. Yarnagula, H.K.; Deka, S.K.; Sarma, N. Distributed TDMA based MAC protocol for data dissemination in ad-hoc Cognitive Radio networks. In Proceedings of the 2013 IEEE International Conference on Advanced Networks and Telecommunications Systems, ANTS 2013, Kattankulathur, India, 15–18 December 2013; pp. 1–6. [Google Scholar] [CrossRef]
  55. Elwhishi, A.; Ho, P.H.; Naik, K.; Shihada, B. ARBR: Adaptive reinforcement-based routing for DTN. In Proceedings of the 2010 IEEE 6th International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob’2010, Niagara Falls, ON, Canada, 11–13 October 2010; pp. 376–385. [Google Scholar] [CrossRef]
  56. Habak, K.; Abdelatif, M.; Hagrass, H.; Rizc, K. A Location-Aided Routing Protocol for Cognitive Radio Networks. In Proceedings of the 2013 International Conference on Computing, Networking and Communications (ICNC), San Diego, CA, USA, 28–31 January 2013; pp. 729–733. [Google Scholar]
  57. Boushaba, M.; Hafid, A.; Belbekkouche, A.; Gendreau, M. Reinforcement learning based routing in wireless mesh networks. Wirel. Netw. 2013, 19, 2079–2091. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
