BRNN-LSTM for Initial Access in Millimeter Wave Communications

Abstract: The use of beamforming technology in standalone (SA) millimeter wave communications results in directional transmission and reception modes at the mobile station (MS) and base station (BS). This creates initial beam access challenges, since the MS and BS are now compelled to perform a spatial search to determine the best beam directions, i.e., those that return the highest signal levels. The large number of signal measurements here prolongs access times and latencies and increases power and energy consumption. Hence, this paper presents a first study on leveraging deep learning schemes to simplify the beam access procedure in standalone mmWave networks. The proposed scheme combines a bidirectional recurrent neural network (BRNN) and long short-term memory (LSTM) to achieve fast initial access times. Namely, the scheme predicts the best beam index for use in the next time step once a MS accesses the network, e.g., transitions from sleep to active (or idle) mode. The scheme eliminates the need for beam scanning, thereby achieving ultra-low access times and improved energy efficiency as compared to existing methods.


Introduction
Millimeter Wave (mmWave) frequencies constitute a major component of SA 5G networks to support high data rates in enhanced mobile broadband (eMBB). One key advantage here is the contiguous spectrum available at these bands. However, the aggregated path losses impose the use of beamforming techniques to achieve higher link gains. This results in directional transmission and reception modes, which yield prolonged access times. Now the International Mobile Telecommunications (IMT) framework specifies 10 millisecond (ms) latency levels for eMBB in 5G systems [1]. Hence, a major challenge here is to provide fast access schemes that feature ultralow times, along with reduced power and energy consumption levels. Additionally, these access schemes need to consider channel fluctuations and variations in link status as a function of blockage, as well as mobility effects.
Currently, conventional schemes dictate that the MS and BS perform a spatial search over all directions in order to determine the best beamforming and combining vectors with the highest received signal level. For example, the work in [2] proposes a hierarchical codebook for iterative search that uses wide beams in the initial search stages; refinement is then conducted in subsequent stages using narrow beams. However, this technique can suffer from reduced directivity, outages and sensitivity to blockage due to the low gains achieved in the initial codebook stage. Moreover, the work in [3,4] uses metaheuristics in efforts to accelerate the access times and reduce energy consumption, e.g., generalized pattern search and Hooke-Jeeves methods. The work in [5] exploits the sidelobe information to retrieve the direction of the main lobe. However, this scheme is limited to line-of-sight (LoS) and single-ray channels. Moreover, the work in [6] exploits grating lobes for simultaneous transmission to increase directivity. However, this scheme features a complex beamforming structure with a large number of antennas and high power requirements.
Furthermore, the geolocation-assisted access scheme in [7] utilizes the global positioning system (GPS) at the MS to determine the BS location. However, this context-based scheme is limited to outdoor settings with permanent GPS connectivity, and it requires the BS to conduct an exhaustive beam search to locate the MS. The work in [8] proposes a single-RF-chain multi-user architecture that uses downlink (DL)-uplink (UL) and DL-DL beam-training techniques. A subset beam group is trained here in a single time slot. However, the use of a single RF chain at the BS limits the number of connected MSs and the scalability. Finally, the work in [9] proposes a subarray-cooperation multiresolution codebook design. It features a beam alignment scheme that adaptively selects initial layers based on various simultaneous signal-to-noise ratio (SNR) levels. Hence, it quickly aligns the desired beam pairs under single-dominant-path channels by using hybrid beamforming. Overall, the aforementioned schemes still yield high computational complexity, prolonged access times and high power and energy requirements.
Various studies have used deep learning to solve the problem of beam management in mmWave communication systems. First, the authors of [10] use a deep neural network (DNN) to predict the beam direction at the BS, while implementing an omnidirectional antenna at the MS. The work only studies the accuracy of the DNN algorithm using 24 beams at the BS and aims to outperform the exhaustive beam search mechanism. Additionally, the work in [10] considers an omnidirectional MS and a directional BS. However, the omnidirectional mode at the MS presents various challenges in terms of signal quality and throughput (requiring further investigation). By contrast, the proposed work in this paper uses 64 beams and considers system performance including access times, power and energy consumption. Furthermore, results are compared to the fastest beam access schemes reported in the literature. In addition, this paper proposes beamforming models at the MS and BS, which the deep learning network then uses to study comprehensive performance metrics.
Furthermore, deep-learning-based beam selection is proposed in [11] to reduce the time overhead by exploiting sub-6 GHz channel information. Here, the power delay profile (PDP) of a sub-6 GHz channel is estimated and then used as the input to a DNN. Overall, this work relies on the support of sub-6 GHz connections, thus limiting the ability of mmWave networks to operate separately. This becomes inefficient for a 5G New Radio network operating at FR2. Moreover, the work assumes that the sub-6 GHz link is already established, which makes the access time analysis incomplete, i.e., it is required here to study the time complexity for beam association from when a MS joins a network until the start of the data plane. There is also a lack of comprehensive beamforming designs at the MS and BS, as the work is limited to a conventional discrete Fourier transform (DFT)-based codebook. Similarly, the work in [12] also relies on sub-6 GHz channel vectors for initial beam access and blockage prediction in mmWave systems. As opposed to the methodology of [11], which extracts spatial channel characteristics at the sub-6 GHz band and then uses them to reduce the mmWave beam training overhead, here mapping functions are predicted directly from the sub-6 GHz channel. Specifically, the model leverages transfer learning to reduce the learning time overhead. However, the estimation of the mapping functions is often complicated and requires a large neural network to achieve accuracy. In addition, the work again relies on sub-6 GHz bands to realize beam access at mmWave. Namely, dual-band (microwave and mmWave transceiver) systems are needed at the BS and MS. Here the power consumption and access time need to be further investigated.
For mmWave vehicular communication, the authors of [13] propose a beam alignment procedure based on a fingerprinting approach, in which a set of beam pairs constitutes the fingerprint of a given location. Deep learning is deployed at the BS to adapt and update these fingerprints. Moreover, a plurality mechanism is proposed for the beams that meet the required received signal strength, i.e., to achieve multiplexing and diversity gains. The outcomes aim to improve the fidelity as compared to exhaustive beam search and fingerprinting without deep learning.
DNNs are also used for beam management and interference coordination in indoor dense mmWave IEEE 802.11ay networks in [14], where the beam directions, beamwidths and transmit power are optimized. The goal is to reduce the computational complexity and time, while obtaining a sum-rate comparable to conventional methods. However, this scheme uses a beamforming training mechanism to establish the directional links between the mobile access point (MAP) and stationary access point (SAP), which is used subsequently to generate training data for the DNN to mitigate interference. Therefore, the deep learning network is not used here for initial access. Moreover, the implementation is limited to indoor wireless local area networks (WLAN) and is not applied to outdoor settings at larger separation distances.
The authors of [15] describe a specific dataset for beam selection techniques in vehicle-to-infrastructure mmWave networks. A methodology for channel data generation in mmWave multiple-input multiple-output (MIMO) scenarios is presented, which aims to simplify creating data in mobility scenarios by invoking a traffic simulator and a ray-tracing simulator. However, the context here is different and unrelated to initial access between the MS and BS cells. Namely, the propagation channel dataset developed here by ray-tracing is specific to vehicle-to-infrastructure mmWave networks. Overall, the work in [15] focuses on modeling mobility only and lacks an analysis of the downlink performance. By contrast, the proposed work in this paper focuses on standalone mmWave networks, considering link performance with beamforming architectures at the BS and MS.
Furthermore, deep learning is also used for beam training in [16] for a mmWave massive MIMO system. The nonlinear properties of channel power leakage are used in the estimation process, where a DNN predicts the best beam combination that yields the strongest channel path based on a probability vector, in efforts to improve the success and achievable rates at lower overhead. However, this work lacks latency and power models, as well as comprehensive beamforming modeling at the BS/MS. Moreover, the work in [17] presents a beam alignment technique with partial beams using neural networks for a multi-user mmWave massive MIMO system, in efforts to improve the spectral efficiency at reduced training overhead as compared to hierarchical search and compressed sensing methods. Offline training is conducted on the channel model, after which online prediction of the beam distribution vector is achieved using partial beams. Here the dominant indices obtained from the beam distribution vector are used to align the beams for the multiple users. The work in [18] combines machine learning and situational awareness to learn the power and optimal beam index, during which the angles of arrival (AoAs) are first estimated based on the location, and then this information is used as the input to the neural network for beam selection. However, the requirement for user location information (prior knowledge) for training weakens the proposed algorithm and adds to the system complexity.
A joint beamforming approach between distributed BSs is developed in [19] that deploys machine learning to simultaneously serve a mobile MS. The latter transmits a single uplink training sequence to the participating BSs using omni or quasi-omni beam patterns to develop location signatures. The signatures are then deployed at the deep learning stage to estimate the beamforming vectors at these BSs, thus reducing the training overhead. The limitation of this work is the use of wide beams (omni or quasi-omni), which makes it inefficient in blockage scenarios during user mobility. These wide beams also yield low channel gains and throughput levels. Out-of-band information is likewise used in [20] for a deep-learning-based beam prediction scheme to minimize the training complexity. Namely, a dual-band (sub-6 GHz and mmWave links) approach is implemented, where the optimal beam in the mmWave band is estimated from sub-6 GHz channel state information (CSI). The work focuses on testing the network accuracy without investigating the beamforming and channel models. One limitation here is the assumption of similar spatial features between the channels of the two bands, which does not hold in general.
Overall, existing deep-learning-based beam access schemes (summarized in Table 1) still rely on operating assumptions that conflict with the objectives of the FR2 NR of 5G systems. First and foremost, some models assume an omnidirectional mode at the MS, where the beam discovery is limited to the BS. Others depend on multiple BSs, the MS location or sub-6 GHz bands, and thus fail to operate mmWave as a standalone network. Other limitations include indoor-only implementation and marginal enhancement over existing conventional methods (e.g., beam sweeping and exhaustive searches). These models overall lack time delay and power consumption models in the control plane, and hence work is needed to investigate the delay in standalone beamforming-based mmWave networks. In light of the above, this paper proposes a first use of a deep learning network model for initial beam access in mmWave communications, with the goal of developing one of the fastest beam access schemes. The model operates in learning and training modes, and aims to predict the best beam index over subsequent time steps, where these indices are affiliated with specific beamforming and combining vectors.
The key practical application for the proposed work is enhancing mmWave networks as part of the 5G FR2 New Radio. Current 5G implementations rely on conventional sub-6 GHz bands and leverage mmWave bands as a supplementary component, e.g., dual bands and carrier aggregation. However, this is projected only for the first phase of 5G, as the mmWave bands are expected to work independently starting in 2022-2024, contingent upon the development of mature technologies and optimizations to support the targeted throughput and latencies. Hence, the mmWave bands are projected to provide standalone service without dependency on microwave (legacy) bands. Thus, the beamforming capability enhances the channel quality and throughput for the user. Furthermore, the deep learning algorithm reduces the access times and control-plane latencies, which helps meet the ultra-low delay target of 1 ms defined by the 3GPP. This improves the quality of service (QoS) and enables the implementation of mmWave standalone networks.
Moreover, the technique can be adopted in wireless local area networks (WLAN) as part of the IEEE 802.11ay standard, and in mmWave links for vehicle-to-everything (V2X) after adding the mobility component. Moreover, the deep learning algorithm will enable the use of highly directional beams without reliance on wide-beam codebooks; this in turn eliminates the vulnerability to beam blockage due to low directivity. Namely, the MS will be able to use narrow beams when transitioning from sleep (off) mode to idle or active mode in the control plane, thus helping to reduce the beam search time. As a result, high data rates can be supported here, i.e., leveraging the high channel capacities and aggregated antenna gains. This paper is organized as follows. Section 2 presents the beamforming, signal and channel models. Then the beam access scheme is proposed in Section 3. Performance evaluation is presented in Section 4, along with conclusions in Section 5.

System Model
where a_n, k and δ_MS denote the amplitude of the n-th antenna at the MS, the wavenumber, i.e., k = 2π/λ, and the progressive phase shift between the elements at the MS, respectively. Furthermore, Θ_i^MS denotes the pointing direction of the i-th beam at the MS. Next, the half-power beamwidth (HPBW) for each beam is given by [21] (Equation (3)).
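As a concrete illustration, the standard uniform linear array (ULA) factor implied by these definitions can be sketched in Python. The element count, spacing and amplitudes below are illustrative placeholders, not the paper's parameters:

```python
import numpy as np

def array_factor(theta, n_ms=16, d_over_lambda=0.5, delta=0.0, a=None):
    """Array factor of an n_ms-element ULA (standard form, illustrative).

    theta: observation angle in radians; delta: progressive phase shift
    between elements; a: per-element amplitudes a_n (uniform by default).
    """
    k_d = 2 * np.pi * d_over_lambda            # k * d, with k = 2*pi/lambda
    n = np.arange(n_ms)
    a = np.ones(n_ms) if a is None else np.asarray(a)
    return np.sum(a * np.exp(1j * n * (k_d * np.cos(theta) + delta)))

# Broadside beam (delta = 0): all elements add in phase at theta = 90 degrees,
# so the magnitude equals the number of elements.
af_broadside = np.abs(array_factor(np.pi / 2))
```

Steering the beam then amounts to choosing δ so that the phase term cancels at the desired angle, which is how each codebook direction Θ_i^MS is realized.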

Digital Beamforming Model at the BS
Digital beamforming architectures are used at the BS due to the abundant input power. This is also necessary to support multi-user connectivity. Hence, consider a BS equipped with a ULA composed of N_BS antennas. In contrast to the MS design, each BS antenna here is connected to one RF chain, r_BS. Note that the total number of antennas is equal to the number of RF chains and the number of transmit data streams D_tran, i.e., N_BS = R_BS = D_tran. Additionally, the overall radiated pattern from r_BS is represented by a beamforming vector, p_BS, in the beamforming matrix, p_BS = p_bb, where p_bb represents the baseband beamforming stage, i.e., p_bb ∈ C^(N_BS × D_tran).

Signal Model
Consider a MS using the aforementioned analog beamformer, which communicates with a BS in the data plane at a separation distance of d (meters). The MS uses its primary beam b_i^MS for the initial access procedure (e.g., iterative search). The DL signal y_MS at the l-th path after the RF stage at the MS is

y_MS = sqrt(pr_MS) v_MS^H H p_BS Z + w,

where pr_MS and (.)^H denote the average received power and the Hermitian (conjugate) transpose, respectively. Here the combining vector v_MS points in the Θ_i^MS direction, and the beamforming vector p_BS points in the Θ_i^BS direction. Furthermore, Z is the control signal that carries the synchronization information, and w denotes the additive white Gaussian noise (AWGN), i.e., w ~ N(0, σ_w^2), with variance σ_w^2. Finally, H is the channel between the MS and BS, specified by the geometric model. This is attributed to the small wavelength at mmWave bands, which results in a high dependence on the geometry of the objects in the propagation channel, i.e.,

H = Σ_{l=1}^{L} Γ_bl h_l a_MS(Θ_l) a_BS^H(Θ_l),

where Γ_bl and h_l in order denote the blockage loss and the gain of the l-th path, for L paths received in K clusters, and a_MS, a_BS are the array response vectors at the MS and BS.
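The downlink signal and geometric channel described above can be sketched numerically. The path count, steering-vector form and noise level below are illustrative assumptions, not the paper's simulation settings:

```python
import numpy as np

rng = np.random.default_rng(0)
N_MS, N_BS = 16, 64     # antennas at the MS and BS

def geometric_channel(L=3, gamma_bl=1.0):
    """Geometric mmWave channel sketch: a sum of L paths, each with a
    complex gain h_l, a blockage loss Gamma_bl and ULA response vectors."""
    H = np.zeros((N_MS, N_BS), dtype=complex)
    for _ in range(L):
        h_l = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
        a_ms = np.exp(1j * np.pi * np.arange(N_MS)
                      * np.sin(rng.uniform(-np.pi / 2, np.pi / 2)))
        a_bs = np.exp(1j * np.pi * np.arange(N_BS)
                      * np.sin(rng.uniform(-np.pi / 2, np.pi / 2)))
        H += gamma_bl * h_l * np.outer(a_ms, a_bs.conj())
    return H

H = geometric_channel()
v_ms = np.ones(N_MS) / np.sqrt(N_MS)   # combining vector at the MS
p_bs = np.ones(N_BS) / np.sqrt(N_BS)   # beamforming vector at the BS
pr_ms, z = 1.0, 1.0                    # average received power, control symbol Z
w = 0.01 * (rng.standard_normal() + 1j * rng.standard_normal())  # AWGN sample
y_ms = np.sqrt(pr_ms) * v_ms.conj().T @ H @ p_bs * z + w
```

The received sample y_ms is the scalar the access procedure measures when evaluating one beamforming/combining pair.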

Beam Prediction Access Scheme
The key processing elements of the proposed BRNN-LSTM deep learning model for beam prediction are now presented. First, a unidirectional RNN updates its hidden layers based upon information received from the input layer as well as the activation state. However, a limitation of the unidirectional RNN is that it learns from past states only. Hence, a bidirectional approach is adopted in this work to improve the RNN. In this merge, one direction learns the past state, whereas the other learns the future state, and the two outputs are then combined for an enhanced estimate. Therefore, the bidirectional feature enables the LSTM to train each input sequence in disjoint forward and backward states that are subsequently connected to the same output layer. This enables the LSTM to retrieve additional beam index contextual information as compared to the conventional LSTM method. The process at the backward and forward states is similar at each of the bidirectional units.
Another limitation of RNN networks during the training process is the vanishing gradient problem for long data sequences. Hence, LSTM networks are adopted to solve this problem by introducing memory blocks (units) that are comprised of self-connected memory cells and multiplicative gates, thus enabling the learning of long-term dependencies. Therefore, the work here combines the salient past/future (backward and forward) state information of the BRNN with the powerful memory blocks of the LSTM, which support extended training periods and retain more information, to improve the quality and accuracy of the beam prediction. Furthermore, four BRNN-LSTM layers are stacked (chained) to achieve higher precision. The proposed BRNN-LSTM method has three phases, i.e., input, hidden layers and output, where each hidden layer is represented by a bidirectional LSTM cell. Along these lines, the prediction scheme combines the BRNN and LSTM to achieve a suitable solution for time-series prediction of variable sequence lengths, i.e., duplicating the training on the input sequences (information from the dataset) by leveraging forward and backward states. The architecture for the proposed scheme is presented next.
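To make the forward/backward combination concrete, a minimal numpy sketch of one bidirectional LSTM pass is given below. The dimensions, random weights and gate stacking order are illustrative only, not the paper's trained model:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked in z as [input, forget, modulation, output]."""
    z = W @ x + U @ h + b
    H = h.size
    i, f, g, o = z[:H], z[H:2 * H], z[2 * H:3 * H], z[3 * H:]
    i, f, o = 1 / (1 + np.exp(-i)), 1 / (1 + np.exp(-f)), 1 / (1 + np.exp(-o))
    g = np.tanh(g)              # input-modulation gate, range [-1, 1]
    c = f * c + i * g           # forget old memory, write new memory
    return o * np.tanh(c), c

def bidirectional_lstm(X, params_fwd, params_bwd, H):
    """Run the sequence forward and backward, concatenate per-step outputs."""
    def run(seq, params):
        h, c, out = np.zeros(H), np.zeros(H), []
        for x in seq:
            h, c = lstm_step(x, h, c, *params)
            out.append(h)
        return out
    fwd = run(X, params_fwd)
    bwd = run(X[::-1], params_bwd)[::-1]   # backward state, re-aligned in time
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]

rng = np.random.default_rng(1)
D, H, T = 1, 4, 5                          # input dim, hidden size, sequence length
make = lambda: (rng.standard_normal((4 * H, D)),
                rng.standard_normal((4 * H, H)), np.zeros(4 * H))
outputs = bidirectional_lstm(list(rng.standard_normal((T, D))), make(), make(), H)
```

Each per-step output concatenates the forward and backward hidden states, which is what the next stacked layer (or the dense output layer) consumes.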

Network Architecture
The network architecture for the proposed scheme is presented in Figure 2. It is composed of the input sequences and four BRNN-LSTM layers, where each layer is composed of 50 cells (neurons). Each LSTM cell in the BRNN-LSTM model is composed of input g_t^in, input modulation g_t^mod, forget g_t^f and output g_t^out gates that determine the information entering the cell state, see Figure 3. The output of the last BRNN-LSTM layer is fed as the input of the dense layer (as per Figure 2), which uses a linear activation function. The output of the dense layer represents the output of the proposed scheme, i.e., the beam index prediction at time step t + 1. The process by which this prediction is achieved is now presented.

Operating Modes
The network operates in two modes, i.e., learning (Mode I) and training (Mode II). Learning Mode (Mode I): The network here operates in normal mode, where beam scanning is performed at the MS and BS using conventional schemes, e.g., codebook-based iterative search. Namely, once a MS transitions from sleep to active mode and joins the mmWave network, a search is conducted over all beamforming and combining vectors to determine the best beam index and its affiliated direction at time step t, i.e., the pair yielding the highest signal level. Thereafter, the BS and MS feed the best beam index at every time step into the training mode. After the model is well trained, the MS and BS leverage it to predict the next best beam, as presented next in Mode II.
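Mode I's exhaustive scan over all beamforming/combining pairs can be sketched as follows; the measurement callback and beam counts are placeholders standing in for the received signal level |y_MS| of each pair:

```python
import numpy as np

def exhaustive_beam_search(measure, n_ms_beams=64, n_bs_beams=64):
    """Mode I sketch: scan every (combining, beamforming) pair and keep the
    index pair that returns the highest measured signal level."""
    best_level, best_pair = -np.inf, None
    for i in range(n_ms_beams):          # combining vectors at the MS
        for j in range(n_bs_beams):      # beamforming vectors at the BS
            level = measure(i, j)
            if level > best_level:
                best_level, best_pair = level, (i, j)
    return best_pair, best_level

# Hypothetical measurements: a random stand-in for the signal level per pair.
rng = np.random.default_rng(2)
levels = rng.random((64, 64))
pair, level = exhaustive_beam_search(lambda i, j: levels[i, j])
```

The 64 x 64 = 4096 measurements made here are exactly the overhead the proposed prediction scheme is designed to eliminate after training.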
Training Mode (Mode II): Given the sequences of best (highest-signal) beam indices over the time steps up to t, retrieved from the dataset, the MS and BS now predict the most likely beam to be used at time step t + 1. Namely, the prediction scheme leverages parametric information from previous time steps (periods) and then labels the next step to predict the beam index that returns the highest signal. The BRNN-LSTM scheme here recursively processes the beam sequences at every time step of the input. It then maintains a hidden state which is a function of the previous state and the current input.
Problem Formulation: Let ŝ_t be the prediction status at time step t. Hence, the beam access problem is defined as the prediction of the best beam direction at time step t + 1, given the status at time step t. Thus, the goal is to maximize the probability of successful beam prediction at the BS using the proposed BRNN-LSTM deep learning model.

BRNN-LSTM Deep Learning Model
The beam prediction algorithm relies on the BRNN-LSTM network in two stages: first, the outer processing stage between the layers; second, the input state inside the LSTM cell. For the outer stage between the layers, the input data is first processed by both the forward and backward states. Inside the cell, the input state is activated with a tanh activation function with a [-1, 1] range that allows the cell state to forget memory. Overall, the training settings for the model include four BRNN-LSTM layers and one dense layer, as presented in Figure 2.
A dropout layer is used to regularize the hidden layers, where the dropout rate in each layer is set at 0.2. The model is trained with 350 epochs over a period of two weeks. A data structure is created with 60 time steps, each of 10 min, with a single output, since the LSTM cells store a long-term memory state. Hence, in each training stage, there are 60 previous training set elements for each taken sample. Consequently, in the testing stage, the first 60 samples are needed for an accurate estimate of the subsequent best beam index. Overall, the training objective is to compute the weight matrices and bias vectors that minimize the loss function over all training time steps, as shown next. See Table 2 for the parametric settings chosen for the layers in the BRNN-LSTM model. Dataset: The dataset used in this paper is part of the BigData Challenge in [23], recorded over a period of two weeks. Namely, this dataset reveals MS traffic volumes and the used beam indices in sectorized geographical grids.
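The 60-step windowing described above can be sketched as follows; the beam-index sequence here is synthetic, not the BigData Challenge dataset:

```python
import numpy as np

def make_windows(beam_indices, steps=60):
    """Build (X, y) training pairs: 60 previous best-beam indices -> next index."""
    X, y = [], []
    for t in range(steps, len(beam_indices)):
        X.append(beam_indices[t - steps:t])   # the 60 preceding time steps
        y.append(beam_indices[t])             # label: next best beam index
    return np.array(X), np.array(y)

# Hypothetical best-beam sequence, one index per 10-minute time step.
seq = np.arange(100) % 64
X, y = make_windows(seq)
```

This is also why the testing stage needs the first 60 samples before any prediction can be made: the first valid label is the 61st element of the sequence.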

Simulation Results and Performance Evaluation
The proposed BRNN-LSTM prediction scheme is simulated in Figure 4 over various time steps, which shows a close match between the ground truth and the prediction pattern. This shows that the proposed scheme achieves high accuracy. Thereby, it can successfully predict the subsequent beam indices with high success probabilities once the network is sufficiently trained. Moreover, the accuracy of the proposed scheme is further studied by computing the loss function, as presented next. Loss Function: The training objective aims to reduce the loss function, i.e., the mean square error (MSE) between the prediction vector of the proposed model Ŷ and the actual ground truth Y at the upcoming time step, generated from a sample of U data points on all variables. This function is evaluated at every time step t as

MSE = (1/U) Σ_{u=1}^{U} (Ŷ_u − Y_u)²

The reduced loss functions are computed over 350 epochs, as depicted in Figure 5, which shows that the proposed scheme yields high accuracy and success probability. This in turn highly impacts the beam access procedure. This is shown next by evaluating the proposed scheme against major existing schemes for the key metrics in beam access, i.e., access times and energy consumption.
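The MSE over U data points described above can be computed directly; the sample values are arbitrary illustration:

```python
import numpy as np

def mse_loss(y_hat, y):
    """Mean square error between predictions and ground truth over U points."""
    y_hat, y = np.asarray(y_hat, dtype=float), np.asarray(y, dtype=float)
    return np.mean((y_hat - y) ** 2)

# Three data points with one miss of magnitude 2: (0 + 0 + 4) / 3.
loss = mse_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```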

Access Times
The initial beam access time at the MS, T_acc^MS (likewise at the BS), is defined as the duration required to determine the best beam index that returns the highest signal level, i.e.,

T_acc^MS = τ_acc D_RS / R_MS (13)

where τ_acc, D_RS and R_MS are the number of time slots occupied during the control signals exchanged between the MS and BS, the reference signal duration, and the number of RF chains at the MS, respectively. Figure 6 shows that the proposed scheme yields a significant reduction in access times versus existing schemes. Namely, 0.2 ms is required to acquire a pencil beam (5.5°) when using 64 beamforming vectors. This is compared to 0.8 ms for the grating lobes, 4.8 ms for the sidelobes, 7.6 ms for the metaheuristics, 9 ms for the iterative search, 12.8 ms for the subarray and GPS, and 25 ms for the DL-UL beam-training schemes, respectively.
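Equation (13) can be evaluated directly; the parameter values below are illustrative placeholders, not the settings used in the paper's tables:

```python
def access_time_ms(tau_acc, d_rs_ms, r_ms):
    """Eq. (13): T_acc^MS = tau_acc * D_RS / R_MS.

    tau_acc: number of occupied time slots; d_rs_ms: reference signal
    duration in ms; r_ms: number of RF chains at the MS.
    """
    return tau_acc * d_rs_ms / r_ms

# Hypothetical values: 64 slots, 0.0125 ms reference signal, 4 RF chains.
t = access_time_ms(tau_acc=64, d_rs_ms=0.0125, r_ms=4)
```

Note how adding RF chains divides the access time, since measurements can be taken in parallel across chains.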

Energy Consumption
The energy consumption is measured at the MS in Figure 7, and it is defined as the power consumed during the beam access time interval (in microjoules), i.e.,

E_MS = Q_MS^ABF T_acc^MS

where Q_MS^ABF is the power consumption of the ABF at the MS, composed of q_n, q_PS, q_LNA, q_RF, q_ADC, q_BB, q_M, q_LO, q_LPF and q_AMP, i.e., the power consumption values for a single microstrip antenna, the phase shifter (PS), low-noise amplifier (LNA), RF chain, ADC, baseband combiner (BB), mixer (M), local oscillator (LO), low-pass filter (LPF) and baseband amplifier (AMP), respectively. Additionally, the terms E_ADC^step, Sr_ADC and B are the energy consumption per conversion step in the ADC, the sampling rate and the number of bits, respectively [7]. The power consumption values (in milliwatts) for these components are listed in Table 3, recorded from studies in [24].
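The energy model, i.e., the summed component powers multiplied by the access interval, can be sketched as follows. The component values below are placeholders, not those of Table 3:

```python
def energy_uj(component_powers_mw, t_acc_ms):
    """Energy over the access interval: E = Q_MS^ABF * T_acc.

    Powers in milliwatts times a duration in milliseconds give microjoules.
    """
    q_total = sum(component_powers_mw.values())   # Q_MS^ABF in mW
    return q_total * t_acc_ms

# Hypothetical per-component powers (mW) for the ABF chain at the MS.
powers = {"antenna": 1.0, "PS": 2.0, "LNA": 5.0, "RF": 10.0, "ADC": 3.0,
          "BB": 2.0, "M": 1.5, "LO": 2.5, "LPF": 1.0, "AMP": 3.0}
e = energy_uj(powers, t_acc_ms=0.2)
```

Since the component powers are fixed by the hardware, the access-time reduction achieved by the prediction scheme translates directly into an energy reduction.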

The energy consumption levels in Figure 7 show that the proposed scheme yields very low energy requirements compared to the other schemes. The energy efficiency here is attributed to the reduced time using the RF chains. For example, 4.9 microjoules are required to achieve initial access for the deep learning scheme. Meanwhile, the grating lobes, sidelobes, metaheuristics, iterative search, GPS and subarray, and DL-UL schemes consume 16, 31.25, 47, 62.5, 84 and 166 microjoules, respectively. Overall, the proposed scheme achieves 75% higher energy efficiency and faster access times compared to the closest scheme, i.e., the grating lobes approach.

Conclusions
In this paper, a novel initial access scheme is proposed for standalone millimeter wave communications using deep learning models. The network operates in learning and training modes, where the best beam index in the subsequent time step is predicted without the requirement for beam scanning. Hence, the scheme features ultralow access times, along with efficient power and energy consumption levels versus existing schemes. Future efforts will investigate the learning model while taking into account outage effects caused by blockage and mobility.