Hybrid Precoding Algorithm for Millimeter-Wave Massive MIMO-NOMA Systems

Abstract: This paper investigates the performance of millimeter-wave (mmWave) massive multiple-input multiple-output (mMIMO) non-orthogonal multiple access (NOMA) systems under multiple-user scenarios. The performance is analyzed in terms of spectral efficiency (SE), energy efficiency (EE), and computational complexity. In an mMIMO system, a linear precoder based on matrix inversion becomes inefficient because of its high computational complexity. Therefore, the design of a low-complexity hybrid precoder (HP) is the main aim of this paper. The authors propose a symmetric successive over-relaxation (SSOR) complex regularized zero-forcing (CRZF) linear precoder. Simulations demonstrate that the proposed SSOR-CRZF-HP performs better than the conventional linear precoders with reduced complexity.


Introduction
Future-generation communication systems target high data rates and low latency. To fulfill this goal, millimeter-wave (mmWave) communication combined with massive multiple-input multiple-output (mMIMO) plays a significant role [1]. The mmWave-mMIMO system greatly enhances the spectral efficiency (SE) and energy efficiency (EE) by exploiting the large bandwidth available in the mmWave frequency bands and high multiplexing gains. Because of this potential, it is considered a promising solution for future generations of wireless communication systems [2]. Generally, three architectures of mmWave-mMIMO systems have been thoroughly investigated: fully digital (FD), fully connected (FC), and sub-connected (SC). In the FD architecture, each antenna is associated with a dedicated RF chain. For such a large antenna array, fully digital signal processing makes the system impractical, as the cost, energy consumption, and complexity become unaffordable [3][4][5]. To deal with this issue, hybrid precoding (HP) has [...] precoders. The performance has been analyzed in terms of SE, EE, and computational complexity. The proposed SSOR-CRZF precoder exhibits better performance than the MRT, ZF, RZF, TPE, and SSOR precoders, whereas it provides a marginal improvement over the CRZF precoder with reduced computational complexity.
This paper assesses the effectiveness of the SSOR-CRZF precoder for enhancing the SE and EE of the mmWave-mMIMO-NOMA system with reduced complexity in comparison to the ZF precoder. The main contributions of this work can be briefly summarized as follows:
• In an mMIMO system, reducing the computational complexity is an important aspect. In this paper, the authors propose an SSOR-CRZF precoder to improve the mMIMO-NOMA system performance.
• The performance of the proposed SSOR-CRZF precoder is compared with the MRT, ZF, RZF, TPE, SSOR, and CRZF precoders in terms of SE, EE, and computational complexity.
• The proposed algorithm is shown to enhance the system performance significantly compared with the conventional linear precoders.
• The proposed algorithm is shown to improve the system performance marginally in comparison with the CRZF algorithm, with the additional benefit of low computational complexity.
The rest of this paper is structured as follows. In Section 2, the system model is presented. Section 3 presents the complexity comparison between the proposed and conventional precoders. In Section 4, the algorithms are evaluated through simulation and a comparative analysis is carried out. The conclusion is given in Section 5.
Notations: $\mathbb{C}$ denotes the complex field. $E[\cdot]$ denotes the expectation. $\|\cdot\|_2$ denotes the $\ell_2$ vector norm. $|\cdot|$ is used for the absolute value. Here, $[F_{RF}]_{i,:}$, $[F_{RF}]_{:,j}$, and $[F_{RF}]_{i,j}$ denote the $i$th row, the $j$th column, and the entry in the $i$th row and the $j$th column of $F_{RF}$, respectively. Here, $h^{T}$, $h^{H}$, and $h^{-1}$ denote the transpose, Hermitian transpose, and inverse of $h$. All the necessary acronyms are defined in Table 1.

System Model
This paper considers a multiuser mmWave-MIMO-NOMA system in a downlink scenario, as shown in Figure 1. All the necessary symbols for the mathematical formulation are presented in Table 2. The base station (BS) is equipped with $N_t$ transmit antennas, and $N_{rf}$ RF chains are used to support $K$ single-antenna UEs [33,34]. The proposed system supports the users by exploiting spatial diversity with $N_{rf} \le K < N_t$. The sub-connected (SC) structure is adopted for the mmWave-MIMO system, and $M = N_t/N_{rf}$ antennas are assumed to be connected to each RF chain. In HP, the number of beams produced, $B_n$, cannot exceed $N_{rf}$ [33]. In this paper, it is assumed that $B_n = N_{rf}$. In an HP-aided MIMO system without NOMA, each beam supports a single user; by exploiting NOMA, each beam can support multiple users. Thus, the $K$ UEs can be supported through $B_n$ clusters, one per beam.
Table 2. Symbols used in the mathematical formulation.
Symbol | Description
$h_{b,k}$ | Channel vector corresponding to the kth UE in the bth beam.
$\beta_{b,k}^{l}$ | Complex gain of the lth path for the kth UE in the bth beam.
$\theta_{b,k}^{l}$ | Azimuth angle of departure (AOD) of the lth path for the kth UE in the bth beam.
$\phi_{b,k}^{l}$ | Elevation AOD of the lth path for the kth UE in the bth beam.
$a(\theta)$ | Array response corresponding to $N_t$ antenna elements.
$P$ | Power allocation matrix.
$s$ | Transmitted signal vector.
$R_{b,k}$ | Achievable rate at the kth user in the bth beam.
$R_{sum}$ | Achievable sum rate of the system.
In this structure, the UEs' data streams are passed through the baseband digital precoder. Let $s_{b,k}$ be the transmitted signal towards the kth UE through the bth beam. As part of the NOMA transmission protocol, the BS uses superposition coding to transmit the information of multiple users simultaneously. It is worth noting that the total power $P$ is distributed equally over all $K$ UEs. Successive interference cancellation (SIC) is used at the receiver side to extract the information. The received baseband signal $y_{b,k}$ at the kth UE in the bth beam ($k = 1, \ldots, |S_b|$) is written as
$$y_{b,k} = \underbrace{h_{b,k}^{H} F_{RF} f_{b}^{BB} \sqrt{p_{b,k}}\, s_{b,k}}_{\text{desired signal}} + \underbrace{h_{b,k}^{H} F_{RF} f_{b}^{BB} \sum_{j \neq k} \sqrt{p_{b,j}}\, s_{b,j}}_{\text{intra-beam interference}} + \underbrace{h_{b,k}^{H} F_{RF} \sum_{i \neq b} f_{i}^{BB} \sum_{j=1}^{|S_i|} \sqrt{p_{i,j}}\, s_{i,j}}_{\text{inter-beam interference}} + n_{b,k} \quad (1)$$
where $F_{RF} \in \mathbb{C}^{N_t \times N_{rf}}$ is the analog precoder matrix, $F_{BB} = [f_{1}^{BB}, \ldots, f_{B_n}^{BB}] \in \mathbb{C}^{N_{rf} \times B_n}$ is the digital precoder matrix, $h_{b,k} \in \mathbb{C}^{N_t \times 1}$ is the channel vector, $P = \mathrm{diag}\{p_1, \ldots, p_{B_n}\}$ is the power allocation matrix, $s = [s_{1,1}, \ldots, s_{1,|S_1|}, \ldots, s_{B_n,1}, \ldots, s_{B_n,|S_{B_n}|}]^{T} \in \mathbb{C}^{K \times 1}$ is the transmitted signal vector, $p_{b,k}$ denotes the transmit power for the kth UE in the bth beam, and $n_{b,k} \sim \mathcal{CN}(0, \sigma_n^{2})$ represents the additive white Gaussian noise (AWGN) with zero mean and variance $\sigma_n^{2}$ at the kth UE in the bth beam. At the UE end, SIC is used to detect the interference-free signal.

Channel Model
The millimeter-wave channel is characterized by high free-space path loss, limited spatial selectivity, and highly correlated channels. This paper adopts the geometric extended Saleh-Valenzuela model [35], in which the channel $h_{b,k}$ corresponding to the kth UE in the bth beam is a sum of $L$ propagation paths. Here, $\beta_{b,k}^{l}$ is the complex gain of the lth path for the kth UE in the bth beam, modeled as complex Gaussian with zero mean and unit variance. Furthermore, $\theta_{b,k}^{l}$ and $\phi_{b,k}^{l}$ represent the azimuth and elevation angles of departure (AOD) of the lth path, and $a_R(\theta_{b,k}^{l}, \phi_{b,k}^{l})$ and $a_T(\theta_{b,k}^{l}, \phi_{b,k}^{l})$ represent the normalized receive and transmit array response vectors. Assuming a uniform linear array (ULA) with $N_t$ elements, the array response vector $a(\theta)$ can be expressed as
$$a(\theta) = \frac{1}{\sqrt{N_t}} \left[ 1, e^{jkd\sin\theta}, \ldots, e^{j(N_t-1)kd\sin\theta} \right]^{T} \quad (4)$$

Sum Rate
Here, the inter-element distance is $d = \lambda/2$ and the wavenumber is $k = 2\pi/\lambda$. Since the ULA response is invariant in the elevation direction, $\phi$ is not considered in Equation (4). It is also assumed that both the BS and the UEs have perfect and instantaneous CSI, and that the receivers are perfectly synchronized in time and frequency.
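As a rough illustration of the ULA response referred to in Equation (4), the following Python sketch builds a normalized steering vector. It assumes the standard ULA form, in which element m carries the phase $m\,kd\sin\theta$ and the vector is scaled by $1/\sqrt{N_t}$; the function name `ula_response` and its parameters are illustrative and do not come from the paper.

```python
import cmath
import math

def ula_response(theta, n_t, d_over_lambda=0.5):
    """Normalized ULA array response for n_t elements (assumed standard form)."""
    # k * d = (2*pi/lambda) * d, so only the ratio d/lambda is needed.
    kd = 2.0 * math.pi * d_over_lambda
    scale = 1.0 / math.sqrt(n_t)
    return [scale * cmath.exp(1j * kd * m * math.sin(theta)) for m in range(n_t)]

# Example: 16 elements, half-wavelength spacing, departure angle of 30 degrees.
a = ula_response(math.pi / 6, n_t=16)
norm_sq = sum(abs(x) ** 2 for x in a)  # unit norm by construction
```

With half-wavelength spacing, $kd = \pi$, so the phase step between adjacent elements reduces to $\pi\sin\theta$.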
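As mentioned, in this mmWave-mMIMO-NOMA system, SIC is performed at the receiver side to extract the desired information. By exploiting SIC, the kth user in the bth beam can effectively cancel the interference from the jth user ($\forall j > k$) in the same beam. Therefore, the received signal at the kth user in the bth beam can be expressed as
$$\hat{y}_{b,k} = \bar{h}_{b,k}^{H} f_{b}^{BB} \sqrt{p_{b,k}}\, s_{b,k} + \bar{h}_{b,k}^{H} f_{b}^{BB} \sum_{j=1}^{k-1} \sqrt{p_{b,j}}\, s_{b,j} + \bar{h}_{b,k}^{H} \sum_{i \neq b} f_{i}^{BB} \sum_{j=1}^{|S_i|} \sqrt{p_{i,j}}\, s_{i,j} + n_{b,k}$$
where the effective channel vector is denoted by $\bar{h}_{b,k}^{H} = h_{b,k}^{H} F_{RF}$. Accordingly, the signal-to-interference-plus-noise ratio (SINR) $\gamma_{b,k}$ for the kth UE in the bth beam can be expressed as
$$\gamma_{b,k} = \frac{|\bar{h}_{b,k}^{H} f_{b}^{BB}|^{2}\, p_{b,k}}{\xi_{b,k}}$$
where
$$\xi_{b,k} = |\bar{h}_{b,k}^{H} f_{b}^{BB}|^{2} \sum_{j=1}^{k-1} p_{b,j} + \sum_{i \neq b} |\bar{h}_{b,k}^{H} f_{i}^{BB}|^{2} \sum_{j=1}^{|S_i|} p_{i,j} + \sigma_n^{2}$$
Therefore, the achievable rate at the kth user in the bth beam is
$$R_{b,k} = \log_2(1 + \gamma_{b,k})$$
The achievable sum rate of the system is given by
$$R_{sum} = \sum_{b=1}^{B_n} \sum_{k=1}^{|S_b|} R_{b,k} \quad (9)$$
The sum rate can be improved by suitably designing the analog and digital precoders.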
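The rate computation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' simulation code: it assumes the common SIC convention in which user k in a beam still sees intra-beam interference from users j < k, and it takes the effective gains as precomputed inputs. The names `sum_rate`, `gains`, and `powers` are hypothetical.

```python
import math

def sum_rate(gains, powers, noise_var):
    """Sum of log2(1 + SINR) over all beams and users.

    gains[b][k][i]: squared magnitude of the effective channel of user k in
                    beam b projected on the digital precoder of beam i.
    powers[b][k]:   transmit power of user k in beam b.
    """
    total = 0.0
    for b, beam_powers in enumerate(powers):
        for k, p_bk in enumerate(beam_powers):
            g = gains[b][k]
            # Assumed SIC ordering: users j < k in the same beam remain as interference.
            intra = g[b] * sum(beam_powers[:k])
            # Every other beam contributes its full transmitted power.
            inter = sum(g[i] * sum(powers[i]) for i in range(len(powers)) if i != b)
            sinr = g[b] * p_bk / (intra + inter + noise_var)
            total += math.log2(1.0 + sinr)
    return total

# One beam, two users with equal power and unit gains.
rate = sum_rate([[[1.0], [1.0]]], [[1.0, 1.0]], noise_var=1.0)
```

In the two-user example, the first user has SINR 1 and the second has SINR 0.5, which shows how the residual intra-beam term lowers the rate of users later in the decoding order.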

User Grouping
In this system, the number of UEs served ($K$) is greater than $N_{rf}$, with $B_n = N_{rf}$ and channel matrix $H = [h_1, \ldots, h_K]$. Thus, the $K$ UEs must be allocated to $B_n$ groups. To serve the users, this paper uses the modified K-Means user grouping algorithm for the proposed mmWave-NOMA system. The algorithm forms user groups based on the normalized channel correlation among user channels. At the initial stage, one representative UE is selected for each beam by minimizing the normalized channel correlation among the selected representatives. To minimize the inter-beam interference, the UEs are then grouped into different beams based on the channel correlation. The modified K-Means algorithm [34] is presented in Algorithm 1. In the traditional K-Means user grouping algorithm [33], the cluster heads are selected randomly. The distinct advantage of Algorithm 1 is its selection criterion for the cluster heads (Steps 10-15): the representative (Step 11) is chosen by considering the minimum channel correlation among the chosen representatives. Thereafter, the UEs with highly correlated channels are assigned to the same beam to minimize intra-beam interference.

Algorithm 1 Modified K-Means User Grouping Algorithm
On the other hand, the UEs with weakly correlated channels are assigned to different beams to minimize inter-beam interference.
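The grouping idea can be illustrated with a simplified, single-pass Python sketch. It is not the paper's Algorithm 1: it assumes that the first cluster head is the UE with the largest channel gain, that each further head is the UE least correlated with the heads chosen so far, and that every remaining UE joins the head it is most correlated with. The function names `corr` and `group_users` are hypothetical.

```python
import math

def corr(h1, h2):
    """Normalized channel correlation |h1^H h2| / (||h1|| * ||h2||)."""
    inner = sum(x.conjugate() * y for x, y in zip(h1, h2))
    n1 = math.sqrt(sum(abs(x) ** 2 for x in h1))
    n2 = math.sqrt(sum(abs(y) ** 2 for y in h2))
    return abs(inner) / (n1 * n2)

def group_users(channels, n_beams):
    """Pick weakly correlated cluster heads, then attach the other UEs."""
    k = len(channels)
    # Assumption: the first head is the UE with the largest channel gain.
    heads = [max(range(k), key=lambda u: sum(abs(x) ** 2 for x in channels[u]))]
    while len(heads) < n_beams:
        rest = [u for u in range(k) if u not in heads]
        # Next head: smallest worst-case correlation with the existing heads.
        heads.append(min(rest, key=lambda u: max(corr(channels[u], channels[h]) for h in heads)))
    groups = {h: [h] for h in heads}
    for u in range(k):
        if u not in heads:
            # A UE joins the beam whose head it is most correlated with.
            best = max(heads, key=lambda h: corr(channels[u], channels[h]))
            groups[best].append(u)
    return groups

channels = [[1 + 0j, 0j], [0.9 + 0j, 0.1 + 0j], [0j, 1 + 0j], [0.1 + 0j, 0.8 + 0j]]
groups = group_users(channels, n_beams=2)
```

A full K-Means variant would repeat the assignment step with updated representatives until the groups stop changing; that loop is omitted here for brevity.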

Hybrid Analog-Digital Precoder
The main aim of this paper is to maximize the sum rate [as in (9)] by jointly optimizing the power allocation, the digital precoder, and the analog RF precoder. The problem, labeled (10), maximizes $R_{sum}$ subject to constraints (10b)-(10f). Constraint (10b) ensures that the kth user in the bth beam attains the minimum desired data rate. Constraints (10c) and (10d) state that the power transmitted by the BS for each UE must be positive and that the total power cannot exceed the limit $P$ (transmit power constraint). In the analog precoder, the non-zero elements of the precoding matrix are realized by phase shifters and satisfy the constant-modulus constraint in (10e). Constraint (10f) is the unit power constraint for the HP matrix; it shows that an optimized digital precoder is required for each beam. Because the optimization problem is non-convex, it is difficult to obtain a globally optimal solution.
The channel capacity of the UEs can be improved by reducing inter-beam interference and by improving the effective channel gain. The HP scheme can achieve the full potential of the mmWave-mMIMO system with reduced hardware requirements. Motivated by the works in [33,34,36], the analog RF precoder and the digital baseband precoder are designed separately. The authors have implemented an efficient analog RF precoding algorithm (for $F_{RF}$) as in [34]. In this paper, the authors propose a low-dimensional digital baseband precoding algorithm (for $F_{BB}$) and compare its performance with existing digital precoders.

Analog Precoder
The main purpose of the analog precoder is to align the phases of $H = [h_1, \ldots, h_K]$ so as to produce a large array gain by exploiting the large number of antennas in the mMIMO system. The analog precoding algorithm is presented in Algorithm 2. As in [33], quantized phase shifters are used for the analog precoder. In this paper, the authors have considered both FC and SC structures. With $B$-bit quantized phase shifters, the non-zero elements of the FC analog precoder matrix ($AP_{full}$) belong to
$$\frac{1}{\sqrt{N_t}} \left\{ e^{j\frac{2\pi n}{2^{B}}} : n = 0, 1, \ldots, 2^{B}-1 \right\} \quad (11)$$
Similarly, for the SC structure, the non-zero elements of the analog precoder matrix ($AP_{sub}$) belong to
$$\frac{1}{\sqrt{M}} \left\{ e^{j\frac{2\pi n}{2^{B}}} : n = 0, 1, \ldots, 2^{B}-1 \right\} \quad (12)$$
The analog precoding matrix ($AP_{full}$ or $AP_{sub}$) is designed by maximizing the array gain. In other words, it is obtained from the channel matrix of the users in each cluster ($\Gamma_{B_n}$). The array gains for the FC and SC structures can be expressed as $\|\bar{H}^{H} \overline{ap}^{full}_{b}\|_2$ and $\|\bar{H}^{H} \overline{ap}^{sub}_{b}\|_2$, respectively. Here, $\bar{H}$ represents the aggregate downlink channel from the BS to the UEs of the bth beam. Thus, the analog precoding vector for the FC structure can be expressed as
$$\overline{ap}^{full}_{b}(m) = \frac{1}{\sqrt{N_t}}\, e^{j\frac{2\pi\theta}{2^{B}}} \quad (13)$$
where
$$\theta = \arg\min_{n \in \{0, 1, \ldots, 2^{B}-1\}} \left| \Theta(m) - \frac{2\pi n}{2^{B}} \right| \quad (14)$$
Here, $\Theta = \angle\bar{H}$ denotes the phases of the aggregated channel matrix $\bar{H}$, and $m = 1, 2, \ldots, N_t$. Similarly, for the SC structure, the analog precoding vector can be expressed as
$$\overline{ap}^{sub}_{b}(m) = \frac{1}{\sqrt{M}}\, e^{j\frac{2\pi\theta}{2^{B}}} \quad (15)$$
where $m = (b-1)M + 1, (b-1)M + 2, \ldots, bM$, and $\theta$ is as in (14).

Algorithm 2 Analog Precoder
Algorithm 2 addresses the analog precoder for the FC structure ($F_{RF} = AP_{full}$). It can be extended to the SC structure ($F_{RF} = AP_{sub}$) with the changes needed to implement Equation (15). The combination of the analog and digital precoders maximizes the achievable sum rate by mitigating the interference.
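The phase quantization step behind the analog precoder can be sketched in Python as follows. This is an illustrative sketch rather than the paper's Algorithm 2: it assumes that each channel phase is mapped to the nearest of $2^{B}$ uniformly spaced phases, using a circular distance so that phases near $2\pi$ wrap to index 0, and it takes the normalization (for example $1/\sqrt{N_t}$ or $1/\sqrt{M}$) as a caller-supplied `scale`. The names `quantize_phase` and `analog_column` are hypothetical.

```python
import cmath
import math

def quantize_phase(phase, bits):
    """Index n in {0, ..., 2**bits - 1} whose phase 2*pi*n/2**bits is closest to `phase`."""
    levels = 2 ** bits
    step = 2.0 * math.pi / levels

    def circular_distance(n):
        diff = (phase - n * step) % (2.0 * math.pi)
        return min(diff, 2.0 * math.pi - diff)

    return min(range(levels), key=circular_distance)

def analog_column(h, bits, scale):
    """Constant-modulus vector whose phases follow the quantized phases of h."""
    step = 2.0 * math.pi / (2 ** bits)
    return [scale * cmath.exp(1j * step * quantize_phase(cmath.phase(x), bits)) for x in h]

# Two-bit phase shifters offer the phases 0, pi/2, pi and 3*pi/2.
col = analog_column([1j, -1 + 0j], bits=2, scale=0.5)
```

Every entry of the result has the same magnitude `scale`, which is the constant-modulus property that phase-shifter hardware imposes.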

Digital Precoder
This subsection briefly reviews the commonly used linear precoders. According to [11], the conventional ZF precoding matrix ($F_{BB}^{ZF}$) can be expressed as
$$F_{BB}^{ZF} = \beta_{ZF}\, H^{H} (HH^{H})^{-1} \quad (16)$$
where $\beta_{ZF}$ denotes the power normalization factor, defined as $\beta_{ZF} = \sqrt{K / \mathrm{tr}\left((HH^{H})^{-1}\right)}$. Similarly, for RZF, the precoding matrix can be expressed as
$$F_{BB}^{RZF} = \beta_{RZF}\, H^{H} (HH^{H} + \bar{\alpha} I)^{-1} \quad (17)$$
where $\bar{\alpha}$ is the regularization parameter, which is predefined during the transmission, and $\beta_{RZF}$ is the corresponding power normalization factor. Here, $I$ is an $N_{rf} \times N_{rf}$ identity matrix whose dimension is chosen in accordance with the hybrid precoder design in this paper. As in [11], the RZF precoder is independent of the power allocation to the UEs, and it maintains a constant value of $\bar{\alpha}$ regardless of any changes in the noise power $\sigma_n^{2}$. As seen in Equations (16) and (17), these precoders require a matrix inversion, and therefore the complexity is of the order of $O(K^{3})$.
Recently, the authors of [12] proposed the CRZF precoding scheme to enhance the system performance. The precoding matrix for CRZF can be expressed as
$$F_{BB}^{CRZF} = \beta_{CRZF}\, H^{H} (HH^{H} + \alpha I)^{-1} \quad (18)$$
where $\alpha$ is a complex-valued regularization parameter, which arises from the complex nature of the AWGN. As in [12], $\alpha$ can be estimated from the covariance matrix of the AWGN and can be expressed as $R_{\eta}^{-1}/\gamma \triangleq \alpha I$. In this context, $R_{\eta}^{-1}$ can be expressed as $R_{\eta}^{-1} = E[\eta\eta^{H}] = E[(\eta_{real} + j\eta_{imag})(\eta_{real} + j\eta_{imag})^{H}]$ and $\beta_{CRZF} = \gamma^{-1}$. Since $\alpha$ is a complex term, it can be expressed as $\alpha = \alpha_r + j\alpha_{im}$. Therefore, for $\alpha_{im} = 0$, the regularization parameter is real-valued and the CRZF scheme reduces to RZF.
Proposed Method (SSOR-CRZF): As discussed, CRZF significantly improves the system performance in comparison to the conventional ZF and RZF precoders; however, like ZF and RZF, it also involves a matrix inversion. In a massive MIMO system, such a computation becomes impractical. In this paper, the authors propose an SSOR-CRZF digital precoder for the system in Figure 1. The proposed Algorithm 3 uses the iterative SSOR method to form the CRZF matrix without any matrix inversion. In the proposed HP scheme, after the design of the analog precoder ($F_{RF}$), a low-dimensional baseband digital (SSOR-CRZF) precoder is implemented based on the effective channel ($H$). As in Algorithm 3, the precoding algorithm begins with the calculation of the CRZF filtering matrix ($P$), which can be expressed as
$$P = HH^{H} + \alpha I \quad (19)$$
However, the pseudo-inverse must then be computed, which increases the computational complexity. To reduce the complexity, the authors of [30,32] proposed SSOR-based precoding by exploiting the asymptotic orthogonality of the wireless channel in massive MIMO. As in [30], for the CRZF precoder, the transmitted signal ($x$) can be written as
$$x = \beta_{CRZF}\, H^{H} P^{-1} s = \beta_{CRZF}\, H^{H} t \quad (20)$$
where $t = P^{-1} s$. The main motivation of the SSOR method is to obtain $t$ without any matrix inversion ($P^{-1}$). As the initial step, the SSOR method decomposes the matrix $P$ as $P = D + L + L^{H}$, where $D$, $L$, and $L^{H}$ represent the diagonal, strictly lower triangular, and strictly upper triangular parts of $P$, respectively. The iteration in the SSOR method is carried out in the following steps:
Step 1: Compute the forward first half iteration by
$$(D + \omega L)\, t^{(i+1/2)} = (1-\omega) D\, t^{(i)} - \omega L^{H} t^{(i)} + \omega s \quad (21)$$
Step 2: Compute the reverse second half iteration by
$$(D + \omega L^{H})\, t^{(i+1)} = (1-\omega) D\, t^{(i+1/2)} - \omega L\, t^{(i+1/2)} + \omega s \quad (22)$$
where $i$ represents the iteration index and $\omega$ is the relaxation parameter. As in [30], the optimal $\omega$ can be calculated as $\omega = 2/(1 + \sqrt{1 - a^{2}})$, where $a = (1 + \sqrt{N_r/N_t})^{2} - 1$.
Once the massive MIMO configuration is fixed, the relaxation parameter $\omega$ is also fixed. The required vector $t$ is obtained after several iterations of Equations (21) and (22). The desired precoding matrix can then be obtained by multiplying $t$ by $\beta_{CRZF} H^{H}$. Thus, the computationally complex matrix inversion can be replaced by an iterative method.
Algorithm 3 SSOR-CRZF Precoder (steps 8 to 13)
8 [Nr, Nt] = size(P)
9 a = (1 + sqrt(Nr/Nt))^2 - 1; ω = 2/(1 + sqrt(1 - a^2)); % relaxation parameter
10 Symbol = eye(Nr, Nr)
11 s0 = zeros(Nr, length(Symbol))
12 i_iter = 1
13 s2 = s0;
In this paper, as part of the hybrid precoder design, the analog RF precoder ($F_{RF}$) is designed first, and the low-dimensional digital precoder (SSOR-CRZF) is then implemented based on the effective channel matrix ($H$). Step 3 generates the CRZF filtering matrix, and steps 5 to 19 carry out the SSOR process to obtain the desired SSOR-CRZF precoder ($F_{BB}$). Finally, using steps 23 to 27, the baseband digital precoder ($F_{BB}$) is obtained after $N_{rf}$ iterations.
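To make the iteration concrete, the following Python sketch applies SSOR sweeps to solve $P t = s$ without forming $P^{-1}$. It is a minimal sketch, not the authors' MATLAB implementation: it uses the standard element-wise form of a forward sweep followed by a reverse sweep, updates the vector in place, and starts from the zero vector. The names `relaxation_parameter` and `ssor_solve` are hypothetical, and the relaxation formula is the one quoted in the text, which is real-valued only when $a \le 1$.

```python
import math

def relaxation_parameter(n_r, n_t):
    """omega = 2 / (1 + sqrt(1 - a^2)) with a = (1 + sqrt(n_r / n_t))^2 - 1."""
    a = (1.0 + math.sqrt(n_r / n_t)) ** 2 - 1.0
    return 2.0 / (1.0 + math.sqrt(1.0 - a * a))  # requires a <= 1

def ssor_solve(P, s, omega, n_iter):
    """Approximate t = P^{-1} s with SSOR sweeps (no matrix inversion)."""
    n = len(s)
    t = [0j] * n
    for _ in range(n_iter):
        # One SSOR iteration: forward half-sweep, then reverse half-sweep.
        for order in (range(n), range(n - 1, -1, -1)):
            for i in order:
                off_diag = sum(P[i][m] * t[m] for m in range(n) if m != i)
                t[i] = (1.0 - omega) * t[i] + omega * (s[i] - off_diag) / P[i][i]
    return t

# Small diagonally dominant example with a complex regularized diagonal.
P = [[4 + 0.5j, 1, 0], [1, 4 + 0.5j, 1], [0, 1, 4 + 0.5j]]
s = [1, 2, 3]
omega = relaxation_parameter(4, 32)
t = ssor_solve(P, s, omega, n_iter=60)
residual = max(abs(sum(P[i][m] * t[m] for m in range(3)) - s[i]) for i in range(3))
```

Each sweep costs on the order of $n^{2}$ multiplications, which is the source of the complexity saving over a direct inversion when only a few iterations are needed.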

Computational Complexity
In this section, the computational complexity of the SSOR-CRZF precoder and some existing precoders is analyzed (see Table 3). ZF, RZF, CRZF, and SSOR-ZF all require the computation of $HH^{H}$, so the comparison considers the complexity beyond $HH^{H}$. Therefore, for ZF [as per Equation (16)], the computational complexity can be expressed as $O(2N_{rf}^{3} + N_{rf}^{2})$. Similarly, for RZF [as per Equation (17)], the complexity can be expressed as $O(2N_{rf}^{3} + 3N_{rf}^{2})$.
In the proposed precoder, $P = HH^{H} + \alpha I_{N_{rf} \times N_{rf}}$; therefore, for the SSOR-CRZF precoder, Equation (21) can be written element-wise as
$$t_n^{(i+1/2)} = (1-\omega)\, t_n^{(i)} + \frac{\omega}{P_{nn}} \left( s_n - \sum_{m<n} P_{nm}\, t_m^{(i+1/2)} - \sum_{m>n} P_{nm}\, t_m^{(i)} \right) \quad (23)$$
where $P_{nn}$ ($n = 1, \ldots, N_{rf}$) are the diagonal elements of $P$, $P_{nm}$ is the entry in the $n$th row and $m$th column of $P$, and the subscript $n$ denotes the $n$th element of a vector. Equation (23) accounts for the first part of the complexity; as in [32], it is $2iN_{rf}^{2}$ after $i$ iterations. Furthermore, the multiplication $H^{H}t$ requires an additional complexity of $N_{rf}^{2}$. Finally, multiplying $H^{H}t$ by $\beta_{CRZF}$ contributes a complexity of $N_{rf}$. Therefore, the overall computational complexity of the proposed SSOR-CRZF precoder is $O(N_{rf} + N_{rf}^{2} + 2iN_{rf}^{2})$.
Table 3. Computational complexity comparison.
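For a quick numerical comparison, the operation counts quoted in this section can be evaluated directly. The short Python sketch below only restates the three expressions given in the text (ZF, RZF, and the proposed SSOR-CRZF); the function name `complexity` and the example values of `n_rf` and `n_iter` are illustrative choices, and the counts for the other precoders in Table 3 are not reproduced.

```python
def complexity(n_rf, n_iter):
    """Operation counts quoted in the text, beyond the shared H*H^H product."""
    return {
        "ZF": 2 * n_rf ** 3 + n_rf ** 2,
        "RZF": 2 * n_rf ** 3 + 3 * n_rf ** 2,
        "SSOR-CRZF": n_rf + n_rf ** 2 + 2 * n_iter * n_rf ** 2,
    }

# Example: 8 RF chains and 2 SSOR iterations.
counts = complexity(n_rf=8, n_iter=2)
```

Because the SSOR-CRZF count grows with $N_{rf}^{2}$ rather than $N_{rf}^{3}$, its advantage widens as the number of RF chains increases, provided the iteration count stays small.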

Numerical and Simulation Results
To establish the superiority of the proposed SSOR-CRZF-aided hybrid precoding algorithm, this section compares the proposed HP with the conventional ZF, RZF, MRT, TPE, SSOR, and CRZF precoders. A sub-connected (SC) structure is considered for the hybrid precoding, and full or partial CSI is assumed to be available at the transmitter side. The performance of the proposed HP in an mMIMO-NOMA system is evaluated numerically in terms of spectral efficiency, energy efficiency, and computational complexity. As part of the comparison, the fully connected (FC) and sub-connected (SC) precoding structures and the fully digital (FD) system are also considered.
The simulation parameters are shown in Table 4. MATLAB is used for the simulation. The results are averaged over 1000 random channel realizations.
Figure 2 compares the spectral efficiency of the different precoders as a function of SNR for a mmWave mMIMO-NOMA system with $N_t = 16$ and $N_{rf} = 4$. The channel model considers $K = 6$ users and $L = 10$ paths per user. The simulation results demonstrate that the proposed SSOR-CRZF outperforms the other precoders (ZF, RZF, MRT, TPE, SSOR, and CRZF) under the SC architecture, providing higher spectral efficiency in the high-SNR region.
The impact of the number of RF chains on the attainable spectral efficiency is investigated for $N_{rf} \in \{4, 8\}$. There is a significant gain in spectral efficiency with $N_{rf} = 8$ in comparison to $N_{rf} = 4$. For example, at SNR = 10 dB with the SSOR-CRZF precoder, the achievable spectral efficiency is 2.683 bps/Hz for $N_{rf} = 4$ and 6.29 bps/Hz for $N_{rf} = 8$.

Spectral Efficiency
This subsection evaluates the performance of the proposed low-resolution HPs for the mmWave-mMIMO-NOMA system. A ULA system with $N_t = 32$ and $K = 10$ users is assumed. As part of the HP design, $N_{rf} = 4$ RF chains are assumed, and digital phase shifters (PSs) with 2-bit and 4-bit resolution are used as the elements of the analog precoder. As shown in Figure 4, the proposed SSOR-CRZF technique with 4-bit resolution performs better than its counterparts. The system performance improves as the PS resolution increases, but high-resolution PSs are not warranted because of the additional cost and complexity.

Energy Efficiency
In this section, the performance of the mmWave-mMIMO-NOMA system is analyzed in terms of energy efficiency (EE, in bps/Hz/W). As in [33], the EE can be expressed as
$$EE = \frac{R_{sum}}{P_t + N_{rf} P_{rf} + N_{ps} P_{ps} + P_{bb}}$$
where $N_{ps}$ is the total number of phase shifters. For FC, $N_{ps} = N_t N_{rf}$, and for SC, $N_{ps} = N_t$. For this analysis, the maximum transmitted power is $P_t = 30$ mW, which is the same for all precoding algorithms. The power consumed by each RF chain is $P_{rf} = 300$ mW. The remaining simulation parameters are $P_{ps} = 40$ mW and $P_{bb} = 200$ mW.
Figure 5 compares the energy efficiency of the different precoders against SNR. For this analysis, a mmWave mMIMO-NOMA system with $N_t = 16$, $K = 6$ users, and $L = 10$ paths per user is assumed. As part of the HP design, $N_{rf} = 4$ RF chains are considered, with 4-bit digital PSs as the elements of the analog precoder. The figure shows that the fully digital (FD) system has the worst EE. This is because, in the FD system, the number of RF chains equals the number of base station antennas, which increases the energy consumption. By contrast, the number of RF chains is much smaller in an SC system. The SC-HPs are more energy-efficient than the FC-HPs because they use fewer PSs. As shown in Figure 5, the proposed SSOR-CRZF outperforms the existing schemes under consideration in terms of energy efficiency. For example, at SNR = 10 dB, the energy efficiencies of the proposed SC-SSOR-CRZF, SC-CRZF, SC-SSOR, SC-TPE, SC-RZF, and SC-ZF are 1.702, 1.687, 1.299, 1.466, 1.604, and 0.8193 bps/Hz/W, respectively.
Figure 6 compares the energy efficiency of the different precoders against the number of users, with the SNR fixed at 10 dB.
For this analysis, a mmWave-mMIMO-NOMA system with $N_t = 32$ and $L = 10$ paths per user is considered. As part of the HP design, $N_{rf} = 4$ RF chains and 4-bit digital PSs as the elements of the analog precoder are considered. Even when the number of users is high, the EE of the SC-HPs remains higher than that of the other schemes.
Figure 7 compares the energy efficiency of the different precoders against the number of transmit antennas ($N_t$). For this analysis, a mmWave-mMIMO-NOMA system with $K = 10$ users and $L = 10$ paths per user is considered, with $N_{rf} = 8$ RF chains and 4-bit digital PSs as the elements of the analog precoder. Figure 7 shows that there exists an optimal antenna array size that maximizes the EE of the system for a fixed number of RF chains.
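The effect of the architecture on the EE denominator can be checked with a few lines of Python. The sketch below evaluates the EE expression from [33] as quoted in this section, using the stated power values converted to watts; the function name `energy_efficiency` and the example sum rate are hypothetical and are not simulation results from the paper.

```python
def energy_efficiency(r_sum, n_t, n_rf, structure,
                      p_t=0.03, p_rf=0.3, p_ps=0.04, p_bb=0.2):
    """EE = R_sum / (P_t + N_rf * P_rf + N_ps * P_ps + P_bb), powers in watts."""
    # Fully connected: every RF chain drives every antenna through a phase shifter.
    # Sub-connected: one phase shifter per antenna.
    n_ps = n_t * n_rf if structure == "FC" else n_t
    return r_sum / (p_t + n_rf * p_rf + n_ps * p_ps + p_bb)

# Same hypothetical sum rate under both structures, N_t = 16 and N_rf = 4.
ee_sc = energy_efficiency(2.07, n_t=16, n_rf=4, structure="SC")
ee_fc = energy_efficiency(2.07, n_t=16, n_rf=4, structure="FC")
```

For $N_t = 16$ and $N_{rf} = 4$, the total consumed power is 2.07 W for SC and 3.99 W for FC, so for an equal sum rate the SC structure is more energy-efficient.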

Impact of CSI
Perfect knowledge of the channel state information is an idealized assumption, so it is advisable to analyze the system performance under imperfect CSI conditions. The estimated channel ($\hat{H}$) with estimation error is modeled as [37]
$$\hat{H} = tH + \sqrt{1 - t^{2}}\, E$$
Here, $0 \le t \le 1$ represents the CSI accuracy, and the error matrix $E$ has i.i.d. entries. Figure 8 shows the performance of the proposed SSOR-CRZF algorithm under imperfect channel conditions. According to the simulation results, the spectral efficiency of the mmWave-mMIMO-NOMA system with the SC-SSOR-CRZF HP remains close to that obtained with perfect CSI. Furthermore, with $t = 0.9$ and $t = 0.8$, the performance of the algorithm does not degrade greatly.
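An imperfect channel estimate of this kind can be generated with the short Python sketch below. It assumes the widely used model in which the estimate is a weighted sum of the true channel and an error matrix, with weights $t$ and $\sqrt{1 - t^{2}}$, and it assumes that the error entries are circularly symmetric complex Gaussian with unit variance. The function name `imperfect_csi` and the use of a seeded `random.Random` instance are illustrative choices.

```python
import math
import random

def imperfect_csi(H, t, rng):
    """Return t*H + sqrt(1 - t^2)*E, with E drawn i.i.d. CN(0, 1) (assumed)."""
    weight = math.sqrt(1.0 - t * t)
    sigma = math.sqrt(0.5)  # real and imaginary parts each carry half the variance
    return [
        [t * h + weight * complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for h in row]
        for row in H
    ]

H = [[1 + 1j, 2 + 0j], [0j, -1j]]
rng = random.Random(0)
H_perfect = imperfect_csi(H, 1.0, rng)  # t = 1 reproduces the true channel
H_noisy = imperfect_csi(H, 0.9, rng)    # t = 0.9 adds a moderate estimation error
```

Setting $t = 1$ recovers perfect CSI, while $t = 0$ yields an estimate that is independent of the true channel.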

Conclusions
This paper studied the performance of the mmWave-mMIMO-NOMA system with a sub-connected architecture under multiple-user scenarios. The authors used the modified K-Means user grouping algorithm and then implemented the hybrid precoder. The performance of the proposed SSOR-CRZF-based hybrid precoder was demonstrated in terms of SE, EE, and computational complexity. The proposed SSOR-CRZF provides a significant performance gain over conventional precoders such as ZF, MRT, RZF, TPE, and SSOR. In comparison to CRZF, SSOR-CRZF improves the system performance marginally but with reduced computational complexity. As future work, the authors would like to extend this work to energy harvesting applications.

Conflicts of Interest:
The authors declare no conflict of interest.