Enhancing 5G Small Cell Selection: A Neural Network and IoV-Based Approach

The ultra-dense network (UDN) is one of the key technologies in fifth generation (5G) networks. It addresses the system capacity issue by deploying small cells at high density. In 5G UDNs, the cell selection process has high computational complexity and is considered to be an open NP-hard problem. Internet of Vehicles (IoV) technology has become a new trend that aims to connect vehicles, people, infrastructure and networks to improve transportation systems. In this paper, we propose a machine-learning and IoV-based cell selection scheme called Artificial Neural Network Cell Selection (ANN-CS). It aims to select the small cell that has the longest dwell time. A feed-forward back-propagation ANN (FFBP-ANN) was trained to perform the selection task, based on moving vehicle information. Real datasets of vehicles and base stations (BSs), collected in Los Angeles, were used for training and evaluation purposes. Simulation results show that the trained ANN model has high accuracy, with a very low percentage of errors. In addition, the proposed ANN-CS decreases the handover rate by up to 33.33% and increases the dwell time by up to 15.47%, thereby minimizing the number of unsuccessful and unnecessary handovers (HOs). Furthermore, it enhances the downlink throughput achieved by vehicles.

High data rates and the efficient use of spectrum are crucial requirements for IoT-based 5G networks [22]. Maximizing the 5G data rate should be targeted while the IoT transmission rate constraints and the interference to IoT devices are taken into account. In addition, improving the energy efficiency by consuming less power is essential to meet communication requirements [23]. Nowadays, machine learning (ML) is becoming a promising method that can offer fast processing and real-time predictions for complex and large-scale applications by developing models and algorithms [24][25][26]. An artificial neural network (ANN) is a machine learning algorithm that is based on processing elements (called neurons) that simulate the concept of human neurons [27]. ANNs have proven their effectiveness in solving many problems in different fields [28]. Fifth generation (5G) networks require the application of machine learning techniques to operate effectively. Solving issues related to 5G wireless technology is an open direction for future research [29].
In this paper, we study the cell selection issue in 5G UDNs. A novel ANN-based cell selection strategy is proposed that performs the multi-classification task of selecting small BSs, based on vehicle information. The main determinant in choosing a cell is the dwell time spent inside the cell. In the experiment, actual datasets that were gathered in the city of Los Angeles are used for training and evaluation.
The traditional scheme and most existing works give high priority to the small BSs that have the maximum received signal strength indicator (RSSI). However, relying on this principle is not effective in ultra-dense environments because it will lead to an increased handover rate [16,30,31]. In addition, machine learning techniques are needed to speed up processing time and to reduce computational complexity.
The main contributions of this work are:
1. Proposing an intelligent ANN-based cell selection strategy for 5G UDNs, called ANN-CS. It aims to select a small BS that has the longest dwell time in the range, using an ML technique. A feed-forward back-propagation ANN (FFBP-ANN) was trained based on real BS and vehicle datasets that were collected in the city of Los Angeles;
2. Evaluating the performance of the trained FFBP-ANN in terms of accuracy, sensitivity, specificity, precision, F-score, and geometric mean (G-mean). In addition, errors are checked based on the root mean square error (RMSE) and the mean absolute error (MAE);
3. Evaluating the performance of the proposed ANN-CS scheme based on the following performance metrics: the average (i) dwell time; (ii) number of handovers; (iii) number of unsuccessful and unnecessary handovers; and (iv) achievable downlink throughput. Then, the performance of the proposed ANN-CS approach is compared with the traditional cell selection method and a recent related approach called Handover based on Residence Time Prediction (HO RTP).
The rest of this paper is structured as follows. Section 2 presents related ML-based cell selection works. The proposed machine-learning-based approach is explained in Section 3. The simulation results are discussed in Section 4. The conclusion of the whole paper and suggestions for future work are given in Section 5. Appendix A gives lists of all abbreviations and symbols that are mentioned in this paper.

Related Work
In this section, recent related user association methods are discussed. Some of these works use machine learning (ML) techniques to solve the cell selection issue, while others do not.

Non-ML-Based Cell Selection Strategies
A cell selection approach was proposed by Kiishida et al. in [32] for 5G multi-layered Radio Access Networks (RANs). It considers the direction and velocity of UE movement to reduce the number of frequent handovers. The final decision is based on the value of SINR, whereby the BS that has the maximum SINR value will be selected. Simulation results proved that the proposed approach achieved an approximate 30% improvement in the number of handovers while maintaining the average flow time.
In [33], Elkourdi et al. proposed a cell selection algorithm for 5G heterogeneous networks that is based on a Bayesian game. There are two players, that is, user equipment (UEs) and access nodes (ANs). There are different types of UEs based on the traffic. Simulation results showed that the proposed scheme outperformed the traditional and cell-range-expansion (CRE) methods in terms of the probability of proper connection and end-to-end delay.
Waheidi et al. developed an approach called Cell Association, based on a Multi-Armed Bandit game (CA-MAB) in [30]. There are two classes of devices, that is, UE and IoT, and the proposed CA-MAB scheme was evaluated in static and mobile environments. The evaluation results showed that the CA-MAB approach enhanced the energy efficiency and the throughput, and that mobility affected the energy savings, throughput, and equilibrium.
Arshad et al. proposed topology-aware skipping approaches in [34], where various skipping techniques are considered. The handover decision is taken based on the position of a user and/or cell size. Simulation results showed that the proposed schemes outperformed the conventional RSSI-based method by up to 47% in terms of the average user throughput.
Two cell selection strategies for HUDNs were proposed by Sun et al. in [35] that depend on the coordinated multipoint (CoMP) technology. The first scheme is called movement-aware CoMP handover (MACH), which selects the cooperation BS set that has the strongest received signal with a dwell time greater than a specific threshold. The second scheme is known as improved MACH (iMACH), which adds the nearest BS to the MACH cooperation BS set, instead of the BS that has the lowest RSSI value in the set. In MACH, the handover is triggered when the farthest BS in the set becomes the nearest one. Conversely, in iMACH, the HO is initiated when the nearest BS becomes the farthest one. Simulation results demonstrated that the MACH and iMACH strategies enhanced the average achievable throughput. In addition, they improved the coverage probability and handover rate.
Qin et al. introduced a cell selection strategy for 5G ultra-dense networks in [36]. It is called Handover based on Resident Time Prediction (HO RTP) and it aims to estimate the residence time inside a cell and select the base station that has the strongest RSSI value with a residence period longer than a predefined threshold. Simulation results demonstrated that the HO RTP approach was superior to the traditional method in terms of achievable mean user throughput.
In [16], Alablani and Arafah introduced an adaptive cell selection approach for 5G Heterogeneous UDNs (HUDNs), called ADA-CS. It aims to select the best BS based on the different features of HUDNs and vehicle movements. It passes through six phases to achieve its goals; namely, configuration, decision-making, filtering, narrowing, selecting, and HO triggering. Simulation results demonstrated that the ADA-CS strategy was superior to the conventional and recent related approaches in terms of the average number of handovers, average achievable downlink data rates and spectral efficiency.

ML-Based Cell Selection Strategies
In [37], Dilranjan et al. proposed a BS prediction strategy for 5G wireless networks that uses a Recurrent Neural Network (RNN) classifier. Received Signal Strength (RSS) values are used to train the RNN model. Simulation results showed that the proposed scheme achieved 98% accuracy in predicting the optimal base station to be associated with.
Zhang et al. introduced a machine-learning-based cell selection scheme for drones in wireless networks in [38]. A conditional random field (CRF) model is used to predict the best serving cells depending on signal-to-interference-plus-noise ratio (SINR) values. Simulation results demonstrated that the proposed CRF-based method yielded 90% accuracy in predicting the best cells and it outperformed two simple heuristic methods.
In [39], Perez et al. proposed a machine-learning-based framework to solve the user association problem in 5G heterogeneous networks. The Q-learning algorithm was used to achieve the model goal. 3-dimensional feature vectors were used that included the BS identification (BSID) index, downlink (DL) SINR, and the DL cell load. Simulation results showed the superiority of the proposed framework over alternative decision methods.
Zappone et al. introduced a user association method in [40] that was based on machine learning. A feed-forward artificial neural network (ANN) was trained to perform the optimal user association where the input was the geographical positions of users. The use of the ANN reduced the computational complexity of the assignment procedure compared to conventional methods.
In [41], Balapuwaduge et al. addressed the cell selection issue by proposing two hidden Markov model (HMM)-based ML strategies. The reliability and availability of network resources were the main targets of the proposed HMM-based ML schemes. Simulation results showed the superiority of the proposed strategies compared with a random cell selection method in terms of channel availability and reliability.
An intelligent machine-learning-based user association for 5G heterogeneous networks was developed in [14] by Zhang et al. The problem was treated as a supervised learning task and a cross-entropy algorithm was used for labeling the best base station to be associated with. A U-Net convolutional neural network (CNN) was trained to solve the user association problem under the cell load constraint. Channel gain matrices were mapped onto images to be the inputs of the CNN, while the user association matrices were the outputs of the ML model. Simulation results demonstrated that the proposed schemes improved computation time and network robustness.
Table 1 presents a comparison among recent ML-based cell selection schemes in terms of the ML model used and its inputs.
Based on the cell selection works that are presented in this section, we found the following limitations:
• The number of cell selection schemes that rely on applying machine learning technologies in predicting the serving BS is small compared with the number of non-ML-based works. However, using ML techniques seems to be essential in an environment that has vehicle movement and ultra-high density BSs to decrease the computational complexity of estimating the best BSs;
• Few works consider the estimation of the dwell time, which is, in fact, the main determinant in selecting BSs. Moreover, these works did not give the dwelling period a high priority compared to the value of the received signal strength. In addition, the equations used to estimate the dwell time are inaccurate and assume that the user is located at the edge of the cell, which is contrary to reality;
• The ML-based works did not give the model enough types of inputs to be able to predict the best BS efficiently.

Problem Formulation
The proposed ML-based cell selection scheme, ANN-CS, aims to reduce the handover rate in 5G UDNs by prolonging the dwell time of vehicles within small cells. Millimeter-Wave (mmWave) communication in UDNs has been considered, which operates in a high-frequency band. The association in downlink with single connectivity between small BSs and vehicles is considered. The small BSs located in the Central cluster of Los Angeles are denoted by $B_{small} = \{B_1, B_2, \ldots, B_K\}$. The vehicles, which move with different movement-related information, are represented by $V = \{V_1, V_2, \ldots, V_J\}$. The BS association vector is expressed as $A = \{a_{11}, a_{12}, \ldots, a_{KJ}\}$, where the BS association variable that indicates the connection between small BS $k$ and vehicle $j$ is defined as shown in Equation (1).
$$a_{kj} = \begin{cases} 1, & \text{if there is an association between } B_k \text{ and } V_j \\ 0, & \text{otherwise} \end{cases} \quad (1)$$

Proposed Framework
The framework of the proposed ANN-based small cell selection is represented in Figure 2. The framework is composed of two main components: a 5G ultra-dense environment and an ANN-based agent. In training and testing processes, there is an interaction between the two components. The vehicle-related information, which includes geographical locations, azimuths, and speeds, is entered in the ANN-based agent. The ANN is used to predict the best small BS to be associated with, based on the longest dwell time, by generating BS-association vectors. A converting unit is used to convert the predicted BS association vector to the corresponding BS's ID.
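The role of the converting unit can be illustrated with a minimal Python sketch. It assumes that the ANN emits one score per small BS and that the BS with the highest score is selected; the function name, the example scores, and the BS identifiers are hypothetical and are not taken from the paper.

```python
import numpy as np

def association_to_bs_id(scores, bs_ids):
    """Map an ANN output vector (one score per small BS) to a BS ID.

    Assumption: the selected BS is the one with the highest output score.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.shape != (len(bs_ids),):
        raise ValueError("one score per small BS is required")
    k = int(np.argmax(scores))            # index of the winning output neuron
    one_hot = np.zeros(len(bs_ids), dtype=int)
    one_hot[k] = 1                        # a_kj = 1 for the selected BS, 0 otherwise
    return bs_ids[k], one_hot

# Hypothetical identifiers and scores for a single vehicle j
bs_ids = ["B1", "B2", "B3", "B4"]
selected, a_j = association_to_bs_id([0.05, 0.10, 0.80, 0.05], bs_ids)
print(selected, a_j)
```

The one-hot vector plays the role of the association variables of Equation (1) for one vehicle.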

5G Network Model
A 5G ultra-dense network has been modeled in this paper based on real datasets. In the city of Los Angeles, the distribution of small BSs is in three clusters: (a) Burbank, (b) Central, and (c) Long Beach [42]. The Central cluster is considered due to the high density of the small BSs.
The system model is shown in Figure 3, where the black crosses represent the distribution of small BSs and the series of green squares shows the locations of vehicles in LA City.
There are 621 small base stations and 48,864 vehicles.

Machine Learning Model
This study uses a machine learning technique to solve the 5G small cell selection issue. There are three main phases involved in building the proposed ML model, as represented in Figure 4. These phases are (1) data preparation, (2) ML model training, and (3) ML model evaluation. The raw data consist of two datasets: a dataset of small base stations located in the city of Los Angeles [42], and a dataset of vehicle information, which was collected in LA City [43].

Data Preparation
The data preparation phase is composed of three steps:
• Data curation step: the collected data are organized and information that does not serve the proposed ML model is removed in this step. From the LA small BS and vehicle datasets, the samples corresponding to the Central cluster area of LA were taken, due to the high density of small cells. In addition, the columns that are used in calculating the longest dwell time within each small cell are kept. In Figure 5, C is the chord of a small cell, which indicates the length of the dwelling distance within the small cell. The vehicle speed and the distance between the vehicle and the small BS are identified by s and d, respectively. The angle between the small BS and the direction of the vehicle is represented by θ, and r is the radius of the small cell.
• Data splitting step: The labelled data were split into training and testing sets with percentages of 80% and 20%, respectively. Table 2 shows the number of training and testing samples that are used to train and evaluate the proposed ANN-based model.
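To make the role of C, s, d, θ, and r concrete, the following Python sketch estimates a dwell time from standard circle geometry. It is an illustration under stated assumptions, not the labelling equation of the paper: the cell is assumed to be a circle of radius r centred on the small BS, the vehicle is assumed to move in a straight line at constant speed, and θ is taken as the angle between the heading and the line from the vehicle to the BS. The function names are hypothetical.

```python
import math

def chord_length(r, d, theta):
    """Full chord C cut through the cell by the vehicle's straight-line path."""
    half_sq = r ** 2 - (d * math.sin(theta)) ** 2
    if half_sq < 0:
        raise ValueError("the path does not cross the cell")
    return 2.0 * math.sqrt(half_sq)

def dwell_time(r, d, theta, s):
    """Time until a vehicle inside the cell reaches the cell boundary.

    r: cell radius (m), d: vehicle-to-BS distance (m),
    theta: angle between heading and direction to the BS (rad), s: speed (m/s).
    """
    if d > r:
        raise ValueError("the vehicle is outside the cell")
    if s <= 0:
        raise ValueError("speed must be positive")
    half_chord = math.sqrt(max(r ** 2 - (d * math.sin(theta)) ** 2, 0.0))
    remaining = d * math.cos(theta) + half_chord   # distance left to the exit point
    return remaining / s

# A vehicle 60 m from the BS, heading 30 degrees off the BS direction, at 15 m/s
print(round(dwell_time(100.0, 60.0, math.radians(30.0), 15.0), 2))
```

Because the remaining distance depends on the actual vehicle position, this form does not assume that the vehicle starts at the cell edge.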

ML Model Training
A feed-forward back-propagation ANN (FFBP-ANN) is used to achieve the multi-classification task, as shown in Figure 6. In the proposed FFBP-ANN structure, there are three layers: input, hidden, and output. The input vector has four values related to vehicles: latitude, longitude, azimuth, and speed. The training dataset contains 39,091 feature vectors with different vehicle information. The hidden layer is composed of ten neurons, while the output layer includes K neurons to generate the small BS association vector. Based on the target vector, the errors are estimated to update the weights of the proposed neural network. Table 3 shows the training parameters that were used for training the proposed FFBP-ANN model.
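The 4-10-K structure described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the trained model: the tanh hidden activation, softmax output, cross-entropy loss, learning rate, K = 5, and the synthetic data are all assumptions made for the example, and the parameters of Table 3 are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(n_in=4, n_hidden=10, n_out=5):
    """Small random weights; n_out plays the role of K (number of small BSs)."""
    return {
        "W1": rng.normal(0.0, 0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(p, X):
    """Feed-forward pass: tanh hidden layer, softmax output layer (assumed)."""
    H = np.tanh(X @ p["W1"] + p["b1"])
    Z = H @ p["W2"] + p["b2"]
    Z = Z - Z.max(axis=1, keepdims=True)      # numerical stability
    E = np.exp(Z)
    return H, E / E.sum(axis=1, keepdims=True)

def train_step(p, X, T, lr=0.1):
    """One full-batch back-propagation update; returns the loss before the update."""
    H, P = forward(p, X)
    loss = -np.mean(np.sum(T * np.log(P + 1e-12), axis=1))
    dZ = (P - T) / X.shape[0]                  # error at the output layer
    dH = (dZ @ p["W2"].T) * (1.0 - H ** 2)     # error propagated to the hidden layer
    p["W2"] -= lr * (H.T @ dZ)
    p["b2"] -= lr * dZ.sum(axis=0)
    p["W1"] -= lr * (X.T @ dH)
    p["b1"] -= lr * dH.sum(axis=0)
    return loss

# Synthetic stand-in for normalised (latitude, longitude, azimuth, speed) vectors
X = rng.normal(size=(200, 4))
labels = np.argmax(X @ rng.normal(size=(4, 5)), axis=1)
T = np.eye(5)[labels]                          # one-hot target association vectors

params = init_params()
losses = [train_step(params, X, T) for _ in range(300)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

With real data, the four inputs would typically be normalised before training, since latitude, longitude, azimuth, and speed have very different ranges.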

ML Model Evaluation
The root mean square error (RMSE) is a common measure, which calculates the error distance between the predicted and the target values. The mean absolute error (MAE) is a measure used to compute the average of the absolute difference between the predicted and the target values. RMSE and MAE are defined as shown in Equations (3) and (4), respectively [44].
$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2} \quad (3)$$
$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |\hat{y}_i - y_i| \quad (4)$$
where the number of testing samples is denoted by $N$ and the predicted and the target small BSs are represented by $\hat{y}$ and $y$, respectively.
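As a small worked example of the two error measures, the following Python sketch applies their standard definitions to a toy vector of predicted and target values. The function names and the numbers are illustrative only.

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root mean square error between predicted and target values."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_pred, y_true):
    """Mean absolute error between predicted and target values."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(diff)))

y_true = [1, 2, 3, 4]
y_pred = [1, 2, 3, 8]   # a single error of size 4 out of N = 4 samples
print(rmse(y_pred, y_true), mae(y_pred, y_true))   # 2.0 1.0
```

The example shows why RMSE is never smaller than MAE: squaring weights the single large error more heavily.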
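To evaluate the performance of the proposed ANN-based model, a confusion matrix is constructed, which is sometimes called a contingency table [45]. The confusion matrix is an effective tool that reports the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) [46]. Based on the constructed confusion matrix, accuracy, sensitivity, specificity, precision, F-score, and geometric mean (G-mean) are calculated as defined in Equations (5)-(10), respectively.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (5)$$
$$\text{Sensitivity} = \frac{TP}{TP + FN} \quad (6)$$
$$\text{Specificity} = \frac{TN}{TN + FP} \quad (7)$$
$$\text{Precision} = \frac{TP}{TP + FP} \quad (8)$$
$$\text{F-score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \quad (9)$$
$$\text{G-mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}} \quad (10)$$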
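The confusion-matrix measures can be computed as in the following Python sketch. It assumes a one-vs-rest reading of the multi-class confusion matrix (rows are target classes, columns are predicted classes), which is a common convention but is not stated in the text; the matrix values and function names are made up for illustration.

```python
import math
import numpy as np

def one_vs_rest_counts(cm, k):
    """TP, TN, FP, FN for class k of a confusion matrix (rows: target, cols: predicted)."""
    cm = np.asarray(cm)
    tp = int(cm[k, k])
    fn = int(cm[k, :].sum()) - tp
    fp = int(cm[:, k].sum()) - tp
    tn = int(cm.sum()) - tp - fn - fp
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Standard measures derived from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
        "g_mean": math.sqrt(sensitivity * specificity),
    }

# Toy 3-class confusion matrix (three candidate small BSs)
cm = [[5, 1, 0],
      [1, 4, 1],
      [0, 0, 6]]
counts = one_vs_rest_counts(cm, 0)
metrics = classification_metrics(*counts)
print(counts, {name: round(value, 3) for name, value in metrics.items()})
```

Per-class values obtained this way can then be averaged over all K classes if a single summary figure is needed.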

Propagation Channel Model
The parameters used in the propagation channel model, that is, path loss (PL), fading and shadowing, are shown in Table 4. The PL model used is the 3rd Generation Partnership Project (3GPP) model, which is given by the 3GPP technical report (specification #38.901, version 16.1.0) [47]. The urban microcell line-of-sight (UMi-LOS)/street canyon model is considered in this study. In the Central cluster of LA city, streets are flanked by buildings on both sides, resulting in canyon-like environments, and the small BSs are shorter than the buildings. The path loss function, $\gamma(d)$, is associated with the distance between a small base station and a vehicle, where the distance ($d$) is measured in meters and the carrier frequency ($f_c$) is expressed in GHz. The breakpoint distance is represented by $d_{BP}$, and the height and the effective height of the small BS are denoted by $h_B$ and $h'_B$, respectively. The height and the effective height of the vehicle are expressed as $h_V$ and $h'_V$. The velocity of light in free space is represented by $c$.
The Rayleigh fading model is a common model that can represent multipath fading in real-world environments [48,49]. In this work, multipath fading is modeled as Rayleigh fading to represent the LA city environment, so that the channel power gain follows the exponential distribution with unit mean. In this paper, frequency-selective fading is not considered because measurements made in [50] demonstrate that the delay spread is generally small. Moreover, using techniques like orthogonal frequency-division multiplexing (OFDM) or frequency domain equalization limits the effect of the frequency selectivity in fading [51]. In addition, small-scale fading in mmWave cellular systems is less severe than that in Long-Term Evolution (LTE) systems when using base station antennas with narrow beams, as the measurement results show [52]. Log-normal shadowing is included in the propagation model, where $\sigma_{SF}$ is the shadow-fading standard deviation in decibels (dB).
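The path loss computation can be sketched in Python as follows. The expressions are written from the UMi street-canyon LOS entry of 3GPP TR 38.901 (Table 7.4.1-1) as understood here, with effective heights obtained by subtracting an environment height of 1 m; the antenna heights and the 28 GHz carrier are illustrative assumptions and are not the values of Table 4. The sketch distinguishes the 2D ground distance from the 3D distance, whereas the text uses a single distance d.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def breakpoint_distance(h_bs, h_ut, fc_ghz, h_env=1.0):
    """Breakpoint distance from effective heights h' = h - h_env (h_env assumed 1 m)."""
    return 4.0 * (h_bs - h_env) * (h_ut - h_env) * (fc_ghz * 1e9) / SPEED_OF_LIGHT

def umi_los_path_loss_db(d2d, fc_ghz, h_bs=10.0, h_ut=1.5):
    """UMi street-canyon LOS path loss in dB (sketch; d2d in metres, fc in GHz)."""
    if not 10.0 <= d2d <= 5000.0:
        raise ValueError("model assumed valid for 10 m <= d2d <= 5 km")
    d3d = math.hypot(d2d, h_bs - h_ut)         # 3D distance between the antennas
    d_bp = breakpoint_distance(h_bs, h_ut, fc_ghz)
    if d2d <= d_bp:
        return 32.4 + 21.0 * math.log10(d3d) + 20.0 * math.log10(fc_ghz)
    return (32.4 + 40.0 * math.log10(d3d) + 20.0 * math.log10(fc_ghz)
            - 9.5 * math.log10(d_bp ** 2 + (h_bs - h_ut) ** 2))

# Illustrative values: 100 m ground distance, 28 GHz carrier
print(round(umi_los_path_loss_db(100.0, 28.0), 2))
```

Rayleigh fading and log-normal shadowing would be applied on top of this deterministic term, for example by drawing an exponential power gain with unit mean and a Gaussian shadowing value in dB with standard deviation $\sigma_{SF}$.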

Simulation Results and Discussion
In this work, MATLAB R2021a was the simulation tool used to implement and analyze the performance of the proposed ANN-CS algorithm. A high-performance gaming computer was used to perform the data processing and to evaluate the performance. The specifications of the computer are given in Table 5.
The trained ANN-based model was evaluated based on RMSE, MAE, accuracy, sensitivity, specificity, precision, F-score, and G-mean, as shown in Table 6.
The performance of the proposed ANN-CS strategy is evaluated in terms of:
• Average dwell time: The average dwell time of a vehicle in a small cell is estimated according to Equation (11), where the number of moving vehicles in an ultra-dense network is expressed as J.
• Average number of handovers: The average number of HOs that occurs as vehicles move in the UDN is computed according to Equation (12).
• Average number of unsuccessful HOs: An unsuccessful HO occurs when the handover latency into a small cell ($l_i$) is longer than the dwell time within that cell [53]. The probability of an unsuccessful HO ($Pr(\alpha)$) can be calculated in terms of vehicle speed ($s$), small cell radius ($r$), handover latency ($l$), and the time threshold of an unsuccessful HO ($Th_\alpha$), as shown in Equation (13). Equation (15) shows the formula to estimate the average number of unsuccessful HOs ($E(N_\alpha)$).
• Average number of unnecessary HOs: An unnecessary HO means a false handover is performed, where the dwell time in a small cell is shorter than the sum of the HO latencies to move into ($l_i$) and out of ($l_o$) the small cell [54]. The probability of an unnecessary HO ($Pr(\beta)$) can be calculated as expressed in Equation (16). The time threshold of the unnecessary handover is denoted by $Th_\beta$. Equation (18) illustrates the method of computing the average number of unnecessary HOs ($E(N_\beta)$).
• Average achievable DL throughput: The purpose of deploying a high density of 5G small cells is to provide a high data capacity in a cost-effective way [55]. The achievable DL data rate of vehicles during movement in UDNs is calculated according to Shannon's equation, as expressed in Equation (19). The signal-to-interference-plus-noise ratio (SINR), which is denoted by $\zeta_{kj}$, is the ratio of the received signal to the interference from other wireless BSs plus noise [56].
The maximum transmission power of small BSs is denoted as $p_{tx}$ and the path loss function is represented by $\gamma(d)$, which is defined in Section 3.5. The channel gain is expressed as $H$, which includes the effects of Rayleigh fading and log-normal shadowing. The thermal noise ($\sigma^2$) is modeled as additive white Gaussian noise (AWGN), as shown in Equation (21). It can be computed in terms of the noise power spectral density ($N_0$) and the sub-channel bandwidth ($W$).
Throughput is the sum of the effective achievable data rates over the network during movement [35]. The throughput of a vehicle can be calculated based on Equation (22).
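The handover definitions and the rate calculation above can be illustrated with a short Python sketch. It is a simplified reading under stated assumptions, not the equations of the paper: the received power is taken as $p_{tx} H / \gamma(d)$ with the path loss converted from dB, the noise power as $N_0 W$, and the rate as $W \log_2(1 + \zeta_{kj})$. All function names and numeric values are hypothetical.

```python
import math

def classify_handover(dwell_time, latency_in, latency_out):
    """Label a handover using the dwell-time definitions given in the text."""
    if dwell_time < latency_in:
        return "unsuccessful"      # HO latency exceeds the dwell time
    if dwell_time < latency_in + latency_out:
        return "unnecessary"       # dwell time shorter than in + out latencies
    return "ok"

def thermal_noise_w(n0_dbm_per_hz, bandwidth_hz):
    """Noise power in watts from a noise PSD in dBm/Hz and a bandwidth in Hz."""
    return 10 ** ((n0_dbm_per_hz - 30) / 10) * bandwidth_hz

def received_power_w(p_tx_w, path_loss_db, channel_gain=1.0):
    """Received power: transmit power times channel gain, divided by path loss."""
    return p_tx_w * channel_gain / 10 ** (path_loss_db / 10)

def sinr(p_serving_w, p_interferers_w, noise_w):
    """Signal-to-interference-plus-noise ratio (linear scale)."""
    return p_serving_w / (sum(p_interferers_w) + noise_w)

def shannon_rate_bps(bandwidth_hz, sinr_linear):
    """Achievable rate from Shannon's equation."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)

# Illustrative numbers only
noise = thermal_noise_w(-174.0, 100e6)
serving = received_power_w(1.0, 103.4)
interferers = [received_power_w(1.0, 120.0), received_power_w(1.0, 125.0)]
zeta = sinr(serving, interferers, noise)
print(classify_handover(1.5, 1.0, 1.0), f"{shannon_rate_bps(100e6, zeta) / 1e6:.1f} Mbit/s")
```

A vehicle throughput figure would then follow by accumulating the achievable rate over the time the vehicle stays associated with each small BS, excluding the time lost to handover latency.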

Performance Results
In this section, we compare the performance of our proposed ANN-CS approach with the traditional and HO RTP cell selection schemes. The simulation parameters that are used in this work are listed in Table 7.
Figures 9 and 10 represent the average dwell time and average number of handovers under different moving speeds. Increasing the speed will reduce the average dwell time of vehicles inside small cells and, therefore, increase the average number of handovers. The proposed ANN-CS approach prolongs the dwell time by estimating it based on the direction and speed of vehicles in addition to small cell specifications. As Figure 9 indicates, the ANN-CS approach has the longest average dwell time and it is superior to the traditional and HO RTP approaches by 15.47% and 7.56%, respectively. The reason is that the traditional cell selection method chooses the small BS that has the largest RSSI value, even if it does not lie on a vehicle's trajectory. The ANN-CS strategy outperforms the HO RTP approach because HO RTP, although it estimates the residence time inside the cell, selects the small BS that has the highest signal strength value among those with a residence time greater than a predefined dwell time threshold. Therefore, its primary criterion for selection is the strength of the received signal. In addition, the ANN-CS approach outperforms the traditional and HO RTP schemes by 33.33% and 18.18% in terms of the average number of handovers.
Figure 13 displays the relationship between the average achievable downlink throughput and vehicle speed. We found that the proposed ANN-CS made improvements over the traditional and HO RTP approaches by 1.2% and 0.1%, respectively. Although the ANN-CS method does not choose the closest small cell, it can achieve enhancements over the methods that give high priority to the received signal strength criterion. This is because the achievable DL throughput is negatively affected by an increase in the number of HOs due to the latency caused by moving from one small cell to another. In addition, the peak data rate is usually reached by our ANN-CS scheme when the vehicle is at the middle of the small cell, while the peak data rates may not be achieved by RSSI-based methods.
Figure 13. Average achievable downlink throughput.

Conclusions and Future Work
The IoV is a fundamental technology that will improve the transportation system. In ultra-dense networks, cell selection is considered an NP-hard problem. In this paper, we solve the cell selection issue for 5G UDNs by applying a machine learning technique. A neural network and IoV-based algorithm called the ANN-CS scheme is proposed that uses a trained feed-forward back-propagation ANN model to perform the multi-classification task of selecting small base stations. It aims to prolong the dwell time within small cells and thereby decrease the number of handovers. Real datasets, which were collected in the city of Los Angeles, are used for training and evaluation purposes. The trained FFBP-ANN model is able to predict the best small BS with high accuracy and a very low error percentage. Simulation results show that our proposed ANN-CS scheme can achieve its goals by decreasing the HO rate and prolonging the dwell time of vehicles within small cells, and thus the numbers of unsuccessful and unnecessary HOs are minimized. Moreover, the achievable DL throughput is enhanced when using our approach compared with other existing methods. In addition, the computational complexity is reduced by using the ANN, compared with non-ML-based methods. For future work, other machine learning techniques can be applied to solve the cell selection issue in 5G UDNs. A machine learning model can be trained based on different types of input features to make the model applicable to different environments.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Lists of Abbreviations and Symbols
Lists of the abbreviations and symbols that are used in this paper are given in Tables A1 and A2, respectively.