An Efficient Precoding Algorithm for mmWave Massive MIMO Systems

Abstract: Symmetrical precoding and algorithms play a vital role in the field of wireless communications and cellular networks. This paper proposes a low-complexity hybrid precoding algorithm for mmWave massive multiple-input multiple-output (MIMO) systems. The traditional orthogonal matching pursuit (OMP) algorithm has high complexity, as it requires matrix inversion and known candidate matrices. Therefore, we propose a bird swarm algorithm (BSA) based matrix-inversion bypass (MIB) OMP (BSAMIBOMP) algorithm, which exploits the ability of BSA to quickly find the global optimum. It directly finds the array response vector that maximizes the inner product with the residual, so it does not require candidate matrices. Moreover, it uses the Banachiewicz-Schur generalized inverse of a partitioned matrix to decompose the high-dimensional matrix into low-dimensional ones, which avoids the matrix inversion operation. The simulation results show that the proposed algorithm effectively improves the bit error rate (BER), spectral efficiency (SE), complexity, and energy efficiency of the mmWave massive MIMO system compared with the existing OMP hybrid and SDRAltMin algorithms, without requiring matrix inversion or known candidate matrix information.

However, although PSs have been studied substantially, there still exists an inevitable gap compared with the performance of PSs. The authors in [22] proposed analog PSs and spatial multiplexing by utilizing multiple RF chains connected to a fixed subset of antenna elements. This algorithm is based on analog beamforming and therefore lacks the ability to provide the precoding performance required for mmWave massive MIMO systems. In [23], a hybrid precoding scheme is proposed for single-stream single-user MIMO-OFDM systems to maximize the received signal strength.
Therefore, in view of the problems in [20][21][22], this paper proposes an efficient algorithm for effective precoding in mmWave massive MIMO systems.
The novel contributions of this paper are as follows:
• This study deploys the bird swarm algorithm (BSA) to eliminate the need for the known candidate matrices in mmWave channel estimation as in the traditional algorithms.
• The proposed algorithm uses the characteristics of BSA with global search optimal (GSO) value to search for the largest array response vector multiplied by the residual matrix, and uses the Banachiewicz-Schur (BS) block matrix generalized inverse to transform the high-dimensional matrix into a low-dimensional matrix, avoiding matrix inversion and reducing the amount of calculation.
• It uses the results of each iteration to avoid matrix inversion and to simplify the computational complexity of the system.
The rest of the paper is organized as follows. Section 2 describes the system model. Section 3 explains the proposed algorithm. Section 4 gives the numerical simulation analysis, while Section 5 concludes the paper.

System Model
The shared array architecture of the mmWave massive MIMO system is shown in Figure 1. The number of transmitting antennas at the transmitting end is $N_t$, the number of receiving antennas at the receiving end is $N_r$, the number of RF links at the transmitting end is $N_t^{RF}$, the number of RF links at the receiving end is $N_r^{RF}$, and the number of data streams at both the transmitting end and the receiving end is $N_s$.
In order to ensure multi-stream transmission, it is necessary to satisfy $N_s \le N_t^{RF} \le N_t$ and $N_s \le N_r^{RF} \le N_r$. Under this hardware structure, the signal is processed by the baseband precoder $\mathbf{F}_{BB} \in \mathbb{C}^{N_t^{RF} \times N_s}$ and the RF precoder $\mathbf{F}_{RF} \in \mathbb{C}^{N_t \times N_t^{RF}}$ before it is transmitted over the channel $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$, and the transmitting signal of the transmitting end is:

$$\mathbf{x} = \mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{s}$$

where $\mathbf{s} = [s_1, s_2, \ldots, s_{N_s}]^T$ is the data stream of the signal, with $E[\mathbf{s}\mathbf{s}^H] = \frac{1}{N_s}\mathbf{I}_{N_s}$, and $\mathbf{x} = [x_1, x_2, \ldots, x_{N_t}]^T$ is the transmitting signal of the transmitting end. The received signal arriving at the receiving end antenna after channel transmission is:

$$\mathbf{y} = \sqrt{\rho}\,\mathbf{H}\mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{s} + \mathbf{n}$$

where $\mathbf{n} \in \mathbb{C}^{N_r}$ represents Gaussian white noise with zero mean and covariance matrix $\sigma^2\mathbf{I}_{N_r}$, $\mathbf{y} = [y_1, y_2, \ldots, y_{N_r}]^T$ is the signal received by the receiving antennas, $\rho$ is the average received power, $\mathbf{H}$ is the channel matrix, and $E[\|\mathbf{H}\|_F^2] = N_t N_r$. The received signal is further processed by the RF combiner $\mathbf{W}_{RF} \in \mathbb{C}^{N_r \times N_r^{RF}}$ and the baseband combiner $\mathbf{W}_{BB} \in \mathbb{C}^{N_r^{RF} \times N_s}$, and the signal obtained by the receiver is:

$$\hat{\mathbf{y}} = \sqrt{\rho}\,\mathbf{W}_{BB}^H\mathbf{W}_{RF}^H\mathbf{H}\mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{s} + \mathbf{W}_{BB}^H\mathbf{W}_{RF}^H\mathbf{n}$$

where $\hat{\mathbf{y}} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_{N_s}]^T$ is the signal received at the receiving end. $\mathbf{F}_{RF}$ is implemented using the analog network, so its elements must satisfy a constant-modulus constraint. The total power constraint at the transmitting end is $\|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2 = N_s$.

Considering the high path loss of the mmWave channel, the sparse distribution in space, the close arrangement of the antenna array on the transceiver in the massive MIMO system, and the high correlation of the antenna elements, the traditional fading statistical channel model is not applicable. Therefore, the ray-tracing model is usually used for modeling.
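The signal chain described above (precoding, channel, noise, combining) can be checked numerically. The following is a minimal sketch, not the authors' simulation code: the matrix sizes, `RHO`, `SIGMA2`, the random placeholder precoders and channel, and the $1/\sqrt{N}$ magnitude used for the constant-modulus entries are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and powers (assumed, not taken from the paper's tables).
N_T, N_R, N_RF, N_S = 64, 16, 4, 3
RHO, SIGMA2 = 1.0, 0.1


def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def random_phase_matrix(rows, cols):
    """Constant-modulus matrix; the 1/sqrt(rows) magnitude is an assumption."""
    return np.exp(1j * rng.uniform(0, 2 * np.pi, (rows, cols))) / np.sqrt(rows)


# Random placeholders, used only to exercise the dimensions and constraints.
F_RF = random_phase_matrix(N_T, N_RF)
F_BB = crandn(N_RF, N_S)
F_BB *= np.sqrt(N_S) / np.linalg.norm(F_RF @ F_BB, "fro")  # ||F_RF F_BB||_F^2 = N_s
W_RF = random_phase_matrix(N_R, N_RF)
W_BB = crandn(N_RF, N_S)
H = crandn(N_R, N_T)

s = crandn(N_S) / np.sqrt(N_S)             # E[s s^H] = I / N_s
x = F_RF @ F_BB @ s                        # transmitted signal
n = np.sqrt(SIGMA2) * crandn(N_R)          # noise with covariance sigma^2 I
y = np.sqrt(RHO) * H @ x + n               # signal at the receive antennas
y_hat = W_BB.conj().T @ W_RF.conj().T @ y  # signal after RF and baseband combining

print(x.shape, y.shape, y_hat.shape)
```

The baseband precoder is rescaled after the RF precoder is fixed, because the power constraint applies to the product of the two matrices rather than to either factor alone.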
If the mmWave channel contains $N_{cl}$ scattering clusters, with each cluster containing $N_{ray}$ propagation paths, the channel $\mathbf{H}$ of the system can be described as:

$$\mathbf{H} = \sqrt{\frac{N_t N_r}{N_{cl} N_{ray}}} \sum_{i=1}^{N_{cl}} \sum_{l=1}^{N_{ray}} \alpha_{i,l}\, \mathbf{a}_r(\phi^r_{il})\, \mathbf{a}_t(\phi^t_{il})^H$$

where $\alpha_{i,l}$ represents the gain factor of the $l$th propagation path in the $i$th scattering cluster, which follows a complex Gaussian distribution with zero mean and variance $\sigma^2_{\alpha_i}$. The variances satisfy $\sum_{i=1}^{N_{cl}} \sigma^2_{\alpha_i} = \gamma$, where $\gamma$ is chosen so that $E[\|\mathbf{H}\|_F^2] = N_t N_r$. $\phi^r_{il}$ is the angle of arrival (AoA) and $\phi^t_{il}$ is the angle of departure (AoD). For the $i$th scattering cluster, $\phi^r_{il}$ is randomly distributed around $\phi^r_i$, and $\phi^t_{il}$ is randomly distributed around $\phi^t_i$. Generally, we choose the Laplacian distribution as the random distribution. $\mathbf{a}_t(\phi^t_{il})$ and $\mathbf{a}_r(\phi^r_{il})$ represent the array response vectors of the transmitter and receiver, respectively.
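A clustered channel of this kind can be generated in a few lines. The sketch below is illustrative only: the function names, the defaults of 8 clusters, 10 rays and a 7.5 degree angular spread, the half-wavelength ULA response, and unit-variance path gains are assumptions, not values taken from the paper.

```python
import numpy as np


def ula_response(phi, n, d_over_lambda=0.5):
    """Unit-norm ULA response for angle phi (radians) and n elements (assumed model)."""
    phase = 2 * np.pi * d_over_lambda * np.arange(n) * np.sin(phi)
    return np.exp(1j * phase) / np.sqrt(n)


def mmwave_channel(n_t, n_r, n_cl=8, n_ray=10, spread=np.deg2rad(7.5), rng=None):
    """Clustered narrowband channel with Laplacian-distributed AoDs and AoAs."""
    rng = np.random.default_rng() if rng is None else rng
    h = np.zeros((n_r, n_t), dtype=complex)
    for _ in range(n_cl):
        # Mean departure and arrival angles of this cluster.
        mean_t, mean_r = rng.uniform(0, 2 * np.pi, size=2)
        for _ in range(n_ray):
            # Unit-variance complex Gaussian path gain (assumption).
            alpha = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
            # A Laplace scale of spread/sqrt(2) gives a standard deviation of `spread`.
            phi_t = rng.laplace(mean_t, spread / np.sqrt(2))
            phi_r = rng.laplace(mean_r, spread / np.sqrt(2))
            h += alpha * np.outer(ula_response(phi_r, n_r),
                                  ula_response(phi_t, n_t).conj())
    # Normalization so that E[||H||_F^2] = n_t * n_r.
    return np.sqrt(n_t * n_r / (n_cl * n_ray)) * h


H = mmwave_channel(64, 16, rng=np.random.default_rng(0))
print(H.shape)
```

With unit-norm response vectors and unit-variance gains, the scaling factor makes the expected squared Frobenius norm equal to the product of the antenna counts; an individual realization can deviate noticeably from that mean.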
Antenna arrays can be combined into various configurations depending on the arrangement of the antennas in the array. In mmWave massive MIMO systems, a uniform antenna array is generally selected to design the antennas at both ends of the transceiver. Common uniform antenna arrays are the uniform linear array (ULA) and the uniform planar array (UPA). In a ULA, either the elevation or the azimuth perspective is considered, since it is a one-dimensional antenna array. The UPA is a two-dimensional antenna array, and this type of antenna array is preferable for mmWaves since it can accommodate more antenna elements within a small area at both the user equipment and the base station (BS). It also facilitates beamforming in an extra dimension, which results in 3D beamforming. For the convenience of analysis, a ULA is used in this paper. For a ULA, assuming that there are $N$ antennas on the y-axis, the array response vector can be expressed as:

$$\mathbf{a}(\phi) = \frac{1}{\sqrt{N}}\left[1, e^{jkd\sin\phi}, \ldots, e^{j(N-1)kd\sin\phi}\right]^T$$

where $\phi \in [0, 2\pi]$, $k = \frac{2\pi}{\lambda}$, $\lambda$ is the wavelength of the signal, and $d$ is the spacing between antenna elements.

In an actual system, channel state information (CSI) can be obtained by channel estimation. In order to focus only on precoding, we assume that the transceiver knows the CSI. The spectral efficiency of the system is then:

$$R = \log_2\left|\mathbf{I}_{N_s} + \frac{\rho}{N_s}\mathbf{R}_n^{-1}\mathbf{W}_{BB}^H\mathbf{W}_{RF}^H\mathbf{H}\mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{F}_{BB}^H\mathbf{F}_{RF}^H\mathbf{H}^H\mathbf{W}_{RF}\mathbf{W}_{BB}\right|$$

where $\mathbf{R}_n = \sigma_n^2\mathbf{W}_{BB}^H\mathbf{W}_{RF}^H\mathbf{W}_{RF}\mathbf{W}_{BB}$ is the noise covariance matrix after processing by the receiver and $\mathbf{I}_{N_s}$ is the $N_s \times N_s$ identity matrix. Reference [9] approximates the maximization of the spectral efficiency by minimizing the Euclidean distance between the hybrid precoding matrix and the all-digital precoding matrix, whereby the precoding design problem can be written as:

$$\left(\mathbf{F}_{RF}^{opt}, \mathbf{F}_{BB}^{opt}\right) = \arg\min_{\mathbf{F}_{RF},\,\mathbf{F}_{BB}} \left\|\mathbf{F}_{opt} - \mathbf{F}_{RF}\mathbf{F}_{BB}\right\|_F \quad \text{s.t.}\ \ \mathbf{F}_{RF} \in \mathcal{F}_{RF},\ \ \left\|\mathbf{F}_{RF}\mathbf{F}_{BB}\right\|_F^2 = N_s$$

where $\mathbf{F}_{opt}$ is the all-digital precoding matrix, which consists of the first $N_s$ columns of the right singular matrix of the channel matrix $\mathbf{H}$, and $\mathcal{F}_{RF}$ is the set of feasible RF precoders. This precoding design problem can therefore be expressed as finding the projection of $\mathbf{F}_{opt}$ onto the subspace spanned by the hybrid precoders $\mathbf{F}_{RF}\mathbf{F}_{BB}$ under the condition $\mathbf{F}_{RF} \in \mathcal{F}_{RF}$.
The combiner at the receiving end is designed in the same way.
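To make the roles of the all-digital precoder and the spectral efficiency expression concrete, the sketch below computes both for a random placeholder channel. The function names, the Gaussian channel, and the use of the matching left singular vectors as an all-digital combiner are assumptions for illustration; the arguments `f` and `w` stand for the products of the RF and baseband precoders and combiners, respectively.

```python
import numpy as np


def optimal_precoder(h, n_s):
    """First n_s right singular vectors of H (the all-digital precoder)."""
    _, _, vh = np.linalg.svd(h)
    return vh.conj().T[:, :n_s]


def spectral_efficiency(h, f, w, rho, sigma2):
    """log2 det(I + rho/N_s * R_n^{-1} W^H H F F^H H^H W)."""
    n_s = f.shape[1]
    r_n = sigma2 * w.conj().T @ w          # noise covariance after combining
    h_eff = w.conj().T @ h @ f             # effective N_s x N_s channel
    inner = np.eye(n_s) + (rho / n_s) * np.linalg.solve(r_n, h_eff @ h_eff.conj().T)
    _, logdet = np.linalg.slogdet(inner)   # natural-log determinant, numerically stable
    return logdet / np.log(2)


rng = np.random.default_rng(0)
# Placeholder channel (assumed i.i.d. Gaussian, only to exercise the formulas).
H = (rng.standard_normal((16, 64)) + 1j * rng.standard_normal((16, 64))) / np.sqrt(2)
F_opt = optimal_precoder(H, n_s=3)
W_opt = np.linalg.svd(H)[0][:, :3]         # matching left singular vectors as combiner
R = spectral_efficiency(H, F_opt, W_opt, rho=1.0, sigma2=1.0)
print(R)
```

With singular-vector precoding and combining the effective channel is diagonal, so the result reduces to a sum of per-stream rates, which gives a convenient sanity check for any hybrid design plugged into the same function.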

BSA-Based Solution
The OMP-based hybrid precoding algorithm constructs the RF precoder by using a candidate matrix, built from antenna array response vectors, to select the column with the largest inner product with the residual. By observing the structure of the array response vector, we find that the complete array response vector can be constructed by determining the AoA, so here we use the BSA algorithm to search for the array response vector that maximizes the inner product with the residual.

BSA is a swarm intelligence algorithm derived from the social behavior and social interaction of bird flocks [24,25]. Birds have three main behaviors: foraging, vigilance, and flight. Each bird searches for food based on its own experience and the experience of the group. Each bird can switch between vigilance and foraging behavior. If a uniform random number in (0, 1) drawn for a bird is less than the threshold $P \in (0, 1)$, the bird will look for food. Otherwise, the bird will remain vigilant. Compared with other swarm intelligence algorithms, the BSA has fewer adjusting parameters, faster convergence speed, and stronger robustness. The bird swarm algorithm has been successfully applied to multi-objective optimization of a wireless system, optimal operation of cascade networks, and flexible task scheduling.

The global optimal solution is sought for the following objective function:

$$\max_{\phi}\ f(\phi) = \boldsymbol{\psi}\boldsymbol{\psi}^H$$

where $\boldsymbol{\psi} = \mathbf{w}^H\mathbf{F}_{res}$ is the correlation vector, $\mathbf{F}_{res}$ is the residual matrix, and $\mathbf{w} = \frac{1}{\sqrt{N_t}}\left[1, e^{jkd\sin\phi}, \ldots, e^{j(N_t-1)kd\sin\phi}\right]^T$ is the antenna array response vector. The motivation of the objective function is to determine the global optimal solution by utilizing the correlation vector, the residual matrix, and the antenna response vector.
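The foraging behavior can be expressed as:

$$x_{i,j}^{t+1} = x_{i,j}^{t} + \left(p_{i,j} - x_{i,j}^{t}\right) C\,\mathrm{rand}(0,1) + \left(g_j - x_{i,j}^{t}\right) S\,\mathrm{rand}(0,1)$$

where $x_{i,j}^{t}$ represents the position of the $i$th bird at iteration $t$, the total number of birds is $N$, $i \in [1, N]$, and $j$ is the dimension; $\mathrm{rand}(0, 1)$ represents a uniformly distributed random number; $C$ and $S$ are two positive numbers, called the cognitive acceleration factor and the social acceleration factor; $p_{i,j}$ represents the best position of the $i$th bird; and $g_j$ represents the best position of the bird group.

Birds will try to move to the center of the flock, and they will inevitably compete with each other. Therefore, each bird does not move directly to the center of the flock. These moves can be expressed as:

$$x_{i,j}^{t+1} = x_{i,j}^{t} + A_1\left(\mathrm{mean}_j - x_{i,j}^{t}\right)\mathrm{rand}(0,1) + A_2\left(p_{k,j} - x_{i,j}^{t}\right)\mathrm{rand}(-1,1)$$

Among them:

$$A_1 = a_1 \exp\left(-\frac{pFit_i}{sumFit + \varepsilon}\,N\right)$$

$$A_2 = a_2 \exp\left(\left(\frac{pFit_i - pFit_k}{\left|pFit_k - pFit_i\right| + \varepsilon}\right)\frac{N\,pFit_k}{sumFit + \varepsilon}\right)$$

where $k$ ($k \ne i$) is a positive integer randomly chosen between 1 and $N$; $a_1$ and $a_2$ are two random numbers in $[0, 2]$; $pFit_i$ represents the optimal adaptation value of the $i$th bird; $sumFit$ represents the sum of the best adaptation values of the flock; $\varepsilon$ is a small constant used to avoid division by zero; and $\mathrm{mean}_j$ represents the average of the $j$th-dimension positions of the flock.

Birds may fly to another place to cope with predation threats and to forage. When they arrive at a new location, they will look for food again. Some birds play the role of producers and look for food supplies, while others try to steal food from producers. The behavior of producers and thieves can be described as:

$$x_{i,j}^{t+1} = x_{i,j}^{t} + \mathrm{randn}(0,1)\,x_{i,j}^{t}$$

$$x_{i,j}^{t+1} = x_{i,j}^{t} + \left(x_{k,j}^{t} - x_{i,j}^{t}\right) FL\,\mathrm{rand}(0,1)$$

where $\mathrm{randn}(0, 1)$ represents a random number obeying the standard normal distribution, $k \in \{1, 2, \ldots, N\}$ with $k \ne i$, and $FL$ is a random number in $[0, 2]$.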
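The three behaviors described above can be combined into a short one-dimensional search over the angle. This is a simplified sketch rather than the authors' implementation: the flight interval `fq`, the random assignment of producer and scrounger roles, the clipping of positions to the search interval, all constants, and the half-wavelength ULA are assumptions introduced here for illustration.

```python
import numpy as np


def ula_response(phi, n, d_over_lambda=0.5):
    """Unit-norm ULA response vector w(phi) with n elements (assumed model)."""
    return np.exp(2j * np.pi * d_over_lambda * np.arange(n) * np.sin(phi)) / np.sqrt(n)


def fitness(phi, f_res):
    """psi psi^H with psi = w(phi)^H F_res (squared norm of the correlation vector)."""
    psi = ula_response(phi, f_res.shape[0]).conj() @ f_res
    return float(np.real(psi @ psi.conj()))


def bsa_maximize(f, n_birds=30, n_iter=100, fq=5, p=0.8, c=1.5, s=1.5,
                 a1=1.0, a2=1.0, lo=0.0, hi=2 * np.pi, seed=0):
    """Simplified 1-D bird swarm search; returns (best x, best f, history)."""
    rng = np.random.default_rng(seed)
    eps = 1e-12
    x = rng.uniform(lo, hi, n_birds)
    p_best, p_fit = x.copy(), np.array([f(v) for v in x])
    g = int(np.argmax(p_fit))
    g_best, g_fit = p_best[g], p_fit[g]
    history = [g_fit]
    for t in range(1, n_iter + 1):
        new_x = x.copy()
        mean, sum_fit = x.mean(), p_fit.sum()
        for i in range(n_birds):
            k = (i + rng.integers(1, n_birds)) % n_birds  # random bird with k != i
            if t % fq == 0:  # flight step; roles are assigned at random here
                if rng.random() < 0.5:  # producer
                    new_x[i] = x[i] + rng.standard_normal() * x[i]
                else:  # scrounger follows bird k
                    new_x[i] = x[i] + (x[k] - x[i]) * rng.uniform(0, 2) * rng.random()
            elif rng.random() < p:  # foraging
                new_x[i] = (x[i] + (p_best[i] - x[i]) * c * rng.random()
                            + (g_best - x[i]) * s * rng.random())
            else:  # vigilance
                big_a1 = a1 * np.exp(-p_fit[i] / (sum_fit + eps) * n_birds)
                big_a2 = a2 * np.exp((p_fit[i] - p_fit[k]) / (abs(p_fit[k] - p_fit[i]) + eps)
                                     * n_birds * p_fit[k] / (sum_fit + eps))
                new_x[i] = (x[i] + big_a1 * (mean - x[i]) * rng.random()
                            + big_a2 * (p_best[k] - x[i]) * rng.uniform(-1, 1))
        x = np.clip(new_x, lo, hi)  # keep angles inside the search interval
        fit = np.array([f(v) for v in x])
        improved = fit > p_fit
        p_best[improved], p_fit[improved] = x[improved], fit[improved]
        g = int(np.argmax(p_fit))
        if p_fit[g] > g_fit:
            g_best, g_fit = p_best[g], p_fit[g]
        history.append(g_fit)
    return g_best, g_fit, history


# Toy residual whose columns are both aligned with the response at 0.5 rad.
F_res = np.outer(ula_response(0.5, 16), np.ones(2))
phi_hat, val, hist = bsa_maximize(lambda angle: fitness(angle, F_res))
print(phi_hat, val)
```

Because the global best is only replaced when a strictly better fitness is found, the recorded history is non-decreasing. Like any stochastic search, the sketch does not guarantee that the global maximum is reached within a fixed number of iterations.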

The Generalized Inverse of Banachiewicz-Schur Block Matrix Based Solution
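The matrix inversion can be replaced by the Banachiewicz-Schur generalized inverse of a block matrix. It converts the high-dimensional matrix into a low-dimensional matrix and updates it with the result of the previous iteration to avoid matrix inversion and reduce the amount of computation. For convenience, we define $\mathbf{G}$ as:

$$\mathbf{G}_{I,J} = \boldsymbol{\Phi}_I^H\boldsymbol{\Phi}_J$$

where $\boldsymbol{\Phi}$ is an $N_t \times N_{cl}N_{ray}$ matrix whose columns consist of the array response vectors at the transmitting end, $I$ and $J$ are two arbitrary index sets, and $\boldsymbol{\Phi}_I$ is the sub-array of $\boldsymbol{\Phi}$ formed by the index set $I$. In addition, let $I_i$ be the index set of the base vectors selected up to the $i$th iteration. Therefore, the least squares solution can be rewritten as:

$$\mathbf{F}_{BB} = \left(\boldsymbol{\Phi}_{I_i}^H\boldsymbol{\Phi}_{I_i}\right)^{-1}\boldsymbol{\Phi}_{I_i}^H\mathbf{F}_{opt} = \mathbf{G}_{I_i,I_i}^{-1}\boldsymbol{\Phi}_{I_i}^H\mathbf{F}_{opt}$$

Applying the Banachiewicz-Schur block matrix generalized inverse to the matrix $\mathbf{G}^{-1}$ in the least squares solution (14), the matrix can be written as:

$$\mathbf{G}_{I_i,I_i}^{-1} = \begin{bmatrix} \mathbf{G}_{I_{i-1},I_{i-1}}^{-1} + V\mathbf{A}^H\mathbf{A} & -V\mathbf{A}^H \\ -V\mathbf{A} & V \end{bmatrix}$$

where

$$\mathbf{A} = \mathbf{G}_{k,I_{i-1}}\mathbf{G}_{I_{i-1},I_{i-1}}^{-1}, \qquad V = \left(\mathbf{G}_{k,k} - \mathbf{A}\mathbf{G}_{I_{i-1},k}\right)^{-1}$$

where $k$ is the index of the currently selected base vector, $\mathbf{G}_{I_{i-1},I_{i-1}}^{-1}$ is the result of the previous iteration, $\mathbf{A}$ is an auxiliary vector of size $1 \times (i-1)$, and $V$ is an auxiliary scalar of the generalized inverse of the block matrix. By directly using the previous iteration, the computation of $\mathbf{G}^{-1}$ is reduced to matrix multiplication, matrix addition, and the reciprocal of a real number. Therefore, the calculation of $\mathbf{F}_{BB}$ can skip the explicit calculation of $\mathbf{G}_{I_i,I_i}^{-1}$ by reusing the results of the previous iteration. $\mathbf{F}_i$ is defined as the least squares solution matrix in the $i$th iteration, which can be decomposed into:

$$\mathbf{F}_i = \begin{bmatrix} \mathbf{F}_{i-1} - \mathbf{A}^H\mathbf{M} \\ \mathbf{M} \end{bmatrix}$$

where $\mathbf{M}$ is an auxiliary vector of size $1 \times N_s$, which is expressed as:

$$\mathbf{M} = V\left(\boldsymbol{\psi}_{0,k} - \mathbf{A}\boldsymbol{\psi}_{0,I_{i-1}}\right)$$

where $\boldsymbol{\psi}_0 = \boldsymbol{\Phi}^H\mathbf{F}_{opt}$ is the correlation matrix of the base vectors $\boldsymbol{\Phi}$ and the initial residual matrix $\mathbf{F}_{res} = \mathbf{F}_{opt}$. Since $\mathbf{A}\boldsymbol{\psi}_{0,I_{i-1}} = \mathbf{G}_{k,I_{i-1}}\mathbf{F}_{i-1}$, the expression for $\mathbf{M}$ in Equation (18) can be simplified to:

$$\mathbf{M} = V\boldsymbol{\psi}_{i-1,k}$$

where $\boldsymbol{\psi}_{i-1,k}$ is the $k$th row of the correlation matrix between the base vectors and the residual after iteration $i-1$. In summary, $\mathbf{G}_{I_i,I_i}^{-1}$, $\mathbf{F}_i$, and $\boldsymbol{\psi}_i$ can all be updated simultaneously using the auxiliary variables ($\mathbf{A}$, $V$, $\mathbf{M}$) and the calculation results of the previous iteration. This replaces the process of matrix inversion to reduce the amount of computation.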
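The block update can be verified numerically by growing the inverse one index at a time and comparing against a direct inversion. The sketch below assumes a Hermitian Gram matrix built from random complex columns, which stand in for the array response vectors; the function and variable names are illustrative.

```python
import numpy as np


def schur_inverse_update(g_inv_prev, g_prev_k, g_kk):
    """Extend the inverse of G by one newly selected index k.

    g_inv_prev: inverse of G restricted to the previously selected indices
    g_prev_k:   column of G between the previous indices and k
    g_kk:       diagonal entry of G for index k
    """
    a_h = g_inv_prev @ g_prev_k                      # A^H, valid because G is Hermitian
    v = 1.0 / np.real(g_kk - g_prev_k.conj() @ a_h)  # reciprocal of the Schur complement
    top_left = g_inv_prev + v * np.outer(a_h, a_h.conj())
    return np.block([[top_left, -v * a_h[:, None]],
                     [-v * a_h.conj()[None, :], np.array([[v]])]])


rng = np.random.default_rng(0)
# Random complex columns used as stand-in base vectors (assumption).
basis = rng.standard_normal((32, 5)) + 1j * rng.standard_normal((32, 5))
gram = basis.conj().T @ basis

g_inv = np.array([[1.0 / gram[0, 0]]])               # first iteration: scalar reciprocal
for i in range(1, 5):
    g_inv = schur_inverse_update(g_inv, gram[:i, i], gram[i, i])

print(np.allclose(g_inv, np.linalg.inv(gram)))
```

Each call uses only matrix-vector products, an outer product, and one real reciprocal, which is the saving the matrix-inversion bypass relies on.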

Algorithm Flow
The proposed Algorithm 1 uses Equation (1) as the fitness function and uses the BSA algorithm to find its global optimal value. It then uses the Banachiewicz-Schur block matrix generalized inverse to transform the high-dimensional matrix into a low-dimensional matrix, which avoids matrix inversion.
The specific steps of the proposed bird swarm algorithm (BSA) based matrix-inversion bypass (MIB) orthogonal matching pursuit (BSAMIBOMP) algorithm are given in Algorithm 1.

3: According to Equation (1), obtain the optimal array response vector $\mathbf{w}$ using the BSA algorithm.
4: Combine the RF precoding matrix $\mathbf{F}_{RF}$ with the array response vector $\mathbf{w}$, i.e., $\mathbf{F}_{RF} = [\mathbf{F}_{RF} \mid \mathbf{w}]$.
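The two steps above sit inside a greedy loop of the OMP type: select a response vector, append it to the RF precoder, solve a least squares problem for the baseband precoder, and update the residual. The sketch below shows only that outer structure under stated assumptions. A dense grid search over the angle stands in for the BSA search to keep the example short and deterministic, and `np.linalg.lstsq` stands in for the matrix-inversion bypass; the grid range, function names, and the final power normalization step are illustrative choices, not the listed Algorithm 1.

```python
import numpy as np


def ula_response(phi, n, d_over_lambda=0.5):
    """Unit-norm ULA response vector with n elements (assumed model)."""
    return np.exp(2j * np.pi * d_over_lambda * np.arange(n) * np.sin(phi)) / np.sqrt(n)


def greedy_hybrid_precoder(f_opt, n_rf, n_grid=720):
    """Greedy hybrid precoder design; grid search replaces BSA in this sketch."""
    n_t, n_s = f_opt.shape
    grid = np.linspace(-np.pi / 2, np.pi / 2, n_grid, endpoint=False)
    cand = np.stack([ula_response(p, n_t) for p in grid], axis=1)  # n_t x n_grid
    f_rf = np.zeros((n_t, 0), dtype=complex)
    f_res = f_opt.copy()
    for _ in range(n_rf):
        psi = cand.conj().T @ f_res                        # correlation with the residual
        best = int(np.argmax(np.sum(np.abs(psi) ** 2, axis=1)))
        f_rf = np.hstack([f_rf, cand[:, [best]]])          # F_RF = [F_RF | w]
        f_bb = np.linalg.lstsq(f_rf, f_opt, rcond=None)[0]  # least squares baseband precoder
        f_res = f_opt - f_rf @ f_bb
        norm = np.linalg.norm(f_res, "fro")
        if norm > 0:
            f_res = f_res / norm                           # normalize the residual
    f_bb = f_bb * np.sqrt(n_s) / np.linalg.norm(f_rf @ f_bb, "fro")  # power constraint
    return f_rf, f_bb


rng = np.random.default_rng(0)
# Random orthonormal columns as a placeholder for the all-digital precoder.
F_opt, _ = np.linalg.qr(rng.standard_normal((32, 2)) + 1j * rng.standard_normal((32, 2)))
F_RF, F_BB = greedy_hybrid_precoder(F_opt, n_rf=4)
print(F_RF.shape, F_BB.shape)
```

Replacing the grid search with a continuous search over the angle is what removes the dependence on a fixed candidate matrix; the rest of the loop is unchanged by that substitution.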

Computational Complexity Analysis
The complexity of designing the hybrid precoding matrix is discussed in detail and compared with the hybrid precoding method based on the OMP algorithm in [15]. The design is mainly divided into two phases. In the first phase, the initial RF precoding matrix is designed, and the complexity is $O(N_t N_s)$. In the second phase, the initial precoding matrix is updated. Then, the digital precoding matrix and the residual matrix are designed in the same way as in the OMP-based algorithm, with complexities $O(N_t (N_t^{RF})^2)$ and $O(N_t (N_t^{RF})^2 N_s)$, respectively. The next step mainly processes the residual matrix to construct a new RF precoding vector, with complexity $O(N_t N_t^{RF} N_s^2)$. Combining these two phases, the overall complexity of the proposed BSAMIBOMP precoding method is $O(N_t (N_t^{RF})^2)$.

For the sake of analysis and comparison, it is assumed that the number of elements in the candidate vector set in the OMP algorithm is $N_t$. The hybrid precoding method based on the OMP algorithm first calculates the correlation between the candidate vector matrix and the residual matrix and then selects the appropriate atom, with complexities $O(N_t^2 N_t^{RF} N_s)$ and $O(N_t^2 N_t^{RF})$, respectively. Then, the digital precoding matrix and the residual matrix are designed, with the complexities given above. Finally, the residual matrix is normalized. Based on the analysis of each part, the overall complexity of the OMP algorithm in [15] is dominated by the calculation of the correlation between the candidate vector set matrix and the residual matrix, which is $O(N_t^2 N_t^{RF} N_s)$. In practice, $N_t \gg N_t^{RF} \ge N_s$, so the complexity of the proposed hybrid precoding design grows linearly with the number of antennas. Therefore, compared with the hybrid precoding method based on the OMP algorithm, the complexity of the proposed BSAMIBOMP precoding is lower. Table 1 summarizes the complexity of the algorithms.
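To give a feel for the gap between the two leading terms, the short calculation below evaluates them for one configuration. The function names are illustrative, and because big-O expressions drop constant factors, the resulting ratio is only indicative of the scaling, not an exact operation count.

```python
def leading_term_proposed(n_t, n_rf):
    """Dominant term stated for the proposed method: N_t * (N_t^RF)^2."""
    return n_t * n_rf ** 2


def leading_term_omp(n_t, n_rf, n_s):
    """Dominant term stated for OMP-based precoding: N_t^2 * N_t^RF * N_s."""
    return n_t ** 2 * n_rf * n_s


n_t, n_rf, n_s = 64, 4, 3
proposed = leading_term_proposed(n_t, n_rf)
omp = leading_term_omp(n_t, n_rf, n_s)
print(proposed, omp, omp / proposed)  # 1024 49152 48.0
```

The ratio of the two terms equals $N_t N_s / N_t^{RF}$, so under these expressions the advantage of the proposed method widens as the number of transmit antennas grows.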

Numerical Simulation Analysis
In order to verify the performance of the proposed hybrid precoding Algorithm 1, this section gives the simulation results of full-digital precoding, analog precoding, and OMP-based hybrid precoding, and compares them with the proposed Algorithm 1 in mmWave massive MIMO systems. The simulation parameters are shown in Table 2. We used MATLAB R2017a for the simulations, and the results are averaged over 1000 random channel realizations.

Figure 2 shows the spectral efficiency (SE) versus SNR for different precoding algorithms with $N_t = 64$ antennas at the transmitter, $N_r = 16$ antennas at the receiver, and $N_t^{RF} = N_r^{RF} = N_s \in \{2, 3\}$. It can be seen from Figure 2a,b that as the SNR increases, the spectral efficiency of the different precoding algorithms improves to different degrees, and as the number of data streams increases, the spectral efficiency of the different precoding algorithms also improves to varying degrees. For all data streams, the full-digital precoding algorithm performs best because it is the optimal precoding, and all other precoding schemes aim to approach it. The proposed BSAMIBOMP algorithm performs better than the traditional OMP-based hybrid precoding algorithm while eliminating the matrix inversion operation and the candidate matrix requirement. The proposed algorithm outperforms traditional hybrid OMP precoding because it uses the BSA algorithm to search for the global optimal solution, while the traditional OMP-based precoding algorithm uses the candidate matrix to select the column with the highest correlation. Since there is an interval between the angles of adjacent columns in the candidate matrix, the selected array response vector is not necessarily the global optimal solution. The analog precoding performance is the worst because analog precoding is constant modulus; only the phase characteristics are utilized, and the amplitude characteristics are not.
Therefore, the proposed algorithm can achieve better results under $N_t^{RF} = N_r^{RF} = N_s$.

Figure 3a,b shows the spectral efficiency versus SNR for different precoding algorithms with $N_t = 64$ antennas at the transmitter, $N_r = 16$ antennas at the receiver, $N_t^{RF} = N_r^{RF} = 4$, and $N_s = 3$ and 4, respectively. As can be seen from Figure 3a,b, as the SNR and the number of data streams increase, the spectral efficiency of the different algorithms improves to different degrees. Similarly, for all data streams, the performance of the full-digital precoding algorithm is the best. The performance of the proposed BSAMIBOMP algorithm is close to that of the full-digital algorithm and better than that of the traditional OMP-based hybrid precoding algorithm. The analog precoding performance is the worst of all precoding schemes. Comparing Figures 2 and 3, it can be found that the performance is better when $N_t^{RF} = N_r^{RF} \ge N_s$, because in this case the dimensions of the baseband precoder $\mathbf{F}_{BB}$ and the RF precoder $\mathbf{F}_{RF}$ are higher and the precoding matrix contains more information. Therefore, the proposed algorithm can achieve better results for $N_t^{RF} = N_r^{RF} \ge N_s$.
Figure 4a,b shows the spectral efficiency versus the number of RF links for different precoding algorithms with $N_t = 64$ antennas at the transmitter, $N_r = 16$ antennas at the receiver, $N_s = 1$ and 3, and SNR = 0 dB. As can be seen from Figure 4a,b, since full-digital precoding is performed only at the baseband and analog precoding is performed only at the radio frequency, neither is affected by the change in the number of RF links. As the number of RF links increases, the proposed algorithm has better spectral efficiency than the traditional OMP-based hybrid precoding scheme. When $N_s = 1$, the proposed algorithm and the traditional OMP-based hybrid precoding algorithm have approximately similar performance (Figure 4a). When $N_s = 3$, the proposed algorithm shows better performance than the traditional OMP-based hybrid precoding algorithm (Figure 4b). Therefore, the performance of the proposed algorithm improves when the number of RF links and the number of data streams are relatively large. However, if the difference between them is too large, then the meaning of hybrid precoding is lost. So, the number of RF links is generally twice the number of data streams for better performance. Therefore, the proposed algorithm can achieve better results for different numbers of RF links.
Figures 5 and 6 show the BERs with $N_t = 64$ antennas at the transmitting end, $N_r = 16$ antennas at the receiving end, $N_t^{RF} = N_r^{RF} = 4$, and $N_s = 1$ and 3, respectively. As can be seen from Figures 5 and 6, the BERs of the full-digital precoding, the proposed algorithm, and the traditional OMP-based hybrid precoding algorithm decrease as the SNR increases, while the analog precoding BER remains unchanged. This is because the analog precoder selects the $N_s$ array response vectors with the largest channel gain, and its selection is independent of the SNR. Comparing Figure 5 with Figure 6, it can be found that when $N_s = 1$, the full-digital precoding algorithm, the proposed BSAMIBOMP, and the traditional OMP-based hybrid precoding algorithm achieve the best BER performance, that is, error-free transmission. When $N_s = 3$, the full-digital precoding algorithm reaches its optimum at 5 dB, while the OMP-based hybrid precoding and the proposed algorithm tend to be stable after 5 dB SNR. On closer observation, the proposed algorithm shows better BER performance than the traditional OMP-based hybrid precoding, and its performance is closer to the full-digital precoding scheme. We also conclude that the proposed algorithm is especially suitable when there is a difference between the number of RF links and the number of data streams. From the above results, it is clear that the proposed algorithm also shows better performance in terms of BER.
Figure 7 compares the computational complexity of the proposed BSAMIBOMP algorithm with other existing algorithms under different numbers of RF chains. As can be seen from Figure 7, when the number of RF chains increases, the number of multiplications and additions required by all the algorithms increases. The results also show that the proposed algorithm requires fewer multiplications and additions than the OMP precoding [15] and the SDRAltMin precoding [16] for the same number of RF chains and operating conditions, which makes it effective for mmWave systems. This means that the proposed algorithm requires less energy to operate the system and also results in reduced hardware complexity. On the other hand, the conventional OMP hybrid precoding scheme requires a greater number of complex multiplications and additions, which makes it unsuitable for mmWave communications system hardware.
Figure 8 shows the energy efficiency under different numbers of RF chains for different precoding schemes. As can be seen, the energy efficiency of all the precoding schemes decreases with an increasing number of RF chains. Moreover, the proposed BSAMIBOMP precoding algorithm shows better energy efficiency than the existing OMP hybrid precoding [15] and SDRAltMin precoding [16] under the same number of RF chains and operating conditions, which makes it more effective for mmWave systems. Therefore, the proposed algorithm performs close to fully-digital precoding and overall better than the other competing alternatives.
Symmetry 2019, 11, x FOR PEER REVIEW

Conclusions and Future Recommendations
In this paper, we proposed a bird swarm algorithm (BSA) based matrix-inversion bypass (MIB) OMP algorithm (BSAMIBOMP) that eliminates the matrix inversion operation and the known candidate matrix requirement so that the number of computations is reduced. The algorithm uses the characteristics of BSA with a global search optimal value to search for the largest array response vector multiplied by the residual matrix and uses the Banachiewicz-Schur block matrix generalized inverse to transform the high-dimensional matrix into a low-dimensional matrix, avoiding matrix inversion and reducing the amount of calculation. Compared with the existing OMP-based hybrid precoding [15], SDRAltMin precoding [16], and analog precoding [17], the proposed algorithm achieves better performance in terms of system spectral efficiency and bit error rate without a known candidate matrix or matrix inversion. For future directions, this study can be further extended by considering the energy efficiency of the proposed algorithm and comparing it with other state-of-the-art algorithms under different important parameters and constraints. Moreover, future work can also focus on analyzing the hardware impairments versus different parameters for the proposed algorithm and comparing it with existing competing alternatives.
