A Model-Driven Channel Estimation Method for Millimeter-Wave Massive MIMO Systems

Aiming at the problem of low estimation accuracy under a low signal-to-noise ratio due to the failure to consider the “beam squint” effect in millimeter-wave broadband systems, this paper proposes a model-driven channel estimation method for millimeter-wave massive MIMO broadband systems. This method considers the “beam squint” effect and applies the iterative shrinkage threshold algorithm to the deep iterative network. First, the millimeter-wave channel matrix is transformed into a transform domain with sparse features through training data learning to obtain a sparse matrix. Secondly, a contraction threshold network based on an attention mechanism is proposed in the phase of beam domain denoising. The network selects a set of optimal thresholds according to feature adaptation, which can be applied to different signal-to-noise ratios to achieve a better denoising effect. Finally, the residual network and the shrinkage threshold network are jointly optimized to accelerate the convergence speed of the network. The simulation results show that the convergence speed is increased by 10% and the channel estimation accuracy is increased by 17.28% on average under different signal-to-noise ratios.


Introduction
At present, the fifth-generation (5G) wireless communication network technology has entered a critical period of large-scale application, and the ultimate goal is to realize the mobile Internet and the Internet of Everything [1]. However, the 5G communication system cannot meet all the needs of future wireless communication systems, especially the explosive growth of mobile data traffic and the need for business diversification. Many universities and some foreign scholars have started the research and development of the sixth-generation (6G) wireless communication network technology [2]. To meet the application requirements of the ultra-high transmission rate (huge traffic) and ultra-high connection density (giant connection), the spectrum needs all spectrums below 6 GHz, i.e., millimeter wave, terahertz, and optical band. While the spectrum resources are abundant, its path loss is also high. To overcome the huge attenuation in the millimeter-wave transmission path, the millimeter-wave technology is usually used in combination with massive multiple-input multiple-output (MIMO) technology. Higher energy efficiency and spectrum efficiency can be provided by using large-scale antennas [3]. In recent years, a new digital modulation technique, spatial modulation, has also been extensively studied, which strikes a good balance between spectral efficiency and energy efficiency [4,5]. At the same time, the channel estimation of this single input stream multiple input multiple output technology (SM-MIMO) has also been studied [6]. Therefore, accurate channel state information is indispensable for any communication system, especially for millimeter-wave MIMO-OFDM broadband systems with "beam squint" effect; accurate CSI is the main factor affecting the performance of the system, so accurate channel estimation becomes a key problem to be solved.
Aiming at the problem of channel estimation in the millimeter-wave massive MIMO system, relevant scholars have used the sparse characteristic of millimeter-wave propagation [7] to adopt the restoration method of compressed sensing, implementing algorithms such as Orthogonal Matching Pursuit (OMP) [8][9][10], Basis Pursuit De-Noising (BPDN) [11], and some other similar algorithms [12]. However, the performance of the compressed sensing algorithm is greatly affected by the prior sparsity and the iteration step size, and the anti-noise ability of the algorithm is poor in the environment of low signal-to-noise ratio, so it is not suitable for the channel estimation of the millimeter-wave massive MIMO broadband system. The authors of [13] proposed a Support Detection (SD) method by using the structural characteristics of the beam space channel of a millimeter wave. This method decomposes the channel estimation problem into a series of sub-problems, reducing pilot overhead and ensuring reliable performance. In [14], a Cosparse Analysis Approximate Message Passing (SCAMPI) algorithm based on isomorphism analysis was proposed. This method treats the channel matrix as a two-dimensional image, transforms the channel estimation problem into an image reconstruction problem, and improves channel estimation accuracy. However, the methods proposed in the above documents are all applicable to narrowband systems, but in practice, to achieve higher data rates, millimeter-wave systems are more likely to be broadband. For broadband systems, [15] proposed a method called Synchronous Orthogonal Matching Pursuit (SOMP), which first estimates the support of the wideband beam spatial channel for some subcarriers, then creates public support, and finally restores the entire beam space channel. Some other similar algorithms [16,17] also assume common support, ignoring the "beam squint" effect [18] in wideband millimeterwave systems; i.e., larger bandwidths cause subcarrier directions to change less at different frequencies. Large bandwidths in turn lead to differences in spatial channels, so the assumed common support is wrong. The authors of [19] proved that each path component in a broadband system has a unique frequency-dependent sparse structure, and proposed a Continuous Support Detection (SSD) method based on applying continuous interference cancellation. This method estimates all sparse path components one by one, but because only the least squares algorithm is used to recover the channel vector, the estimation accuracy in the low SNR area is low. Due to the huge path loss and limited transmit power, the uplink Signal-Noise Ratio (SNR) is usually low [20]. In [21], the authors proposed a model-driven unsupervised generalized expectation consensus (Learned Denoising-based GEC, LDGEC) signal recovery network based on learning denoising for broadband beam space channel estimation, which combines the denoising convolutional neural network DnCNN into the generalized expectation consistent signal recovery algorithm, although it has achieved better performance than the algorithm based on compressed sensing, it can only obtain −7 dB NMSE performance when SNR = 0, so there is still room for improvement in the estimation accuracy under low SNR.
In view of the above method, the channel estimation accuracy is low under a low signalto-noise ratio for wideband systems with "beam squint" effect, and this paper proposes a new channel estimation method based on Adaptive Threshold Learning Iterative Shrinkage Threshold Network (ALISTA-Net). The method uses model-driven deep learning, based on the iterative shrinkage threshold algorithm, and the algorithm is expanded into the deep iterative network, and the millimeter-wave channel matrix is transformed into the sparse transform domain through training data learning. In the beam domain denoising stage, the Shrinkage Threshold Network (ST-Net) based on the attention mechanism is proposed. The network learns from the idea of the attention mechanism to provide a set of thresholds adaptively through feature learning so that the model is suitable for situations with different noise contents. The denoising effect under different signal-tonoise ratios is used to improve the accuracy of channel estimation; at the same time, to improve the network convergence speed, residual learning is introduced into the network structure. Simulation experiments show that compared with the existing channel estimation algorithms for broadband systems, this method has achieved higher estimation accuracy under different SNRs. The structure of this paper is divided into the following parts: Section 2 introduces the system model and the description of channel estimation. Section 3 introduces the structure of the ALISTA-Net network proposed in this paper. Section 4 analyzes and discusses the experimental results. Section 5 introduces the conclusion of this paper.

System Model
In this paper, in Time Division Duplexing (TDD) mode, a wide-band beam space massive MIMO-OFDM system based on lens array is considered, as shown in Figure 1. The signals are received by the antenna at the receiving end. These signals are processed by radio frequency to obtain baseband analog signals, and then the analog signals are converted into digital signals and then processed synchronously. After the cyclic prefix is removed, they are demodulated by FFT and then processed by baseband signals. structure. Simulation experiments show that compared with the existing channel estimation algorithms for broadband systems, this method has achieved higher estimation accuracy under different SNRs. The structure of this paper is divided into the following parts: Section 2 introduces the system model and the description of channel estimation. Section 3 introduces the structure of the ALISTA-Net network proposed in this paper. Section 4 analyzes and discusses the experimental results. Section 5 introduces the conclusion of this paper.

System Model
In this paper, in Time Division Duplexing (TDD) mode, a wide-band beam space massive MIMO-OFDM system based on lens array is considered, as shown in Figure 1. The signals are received by the antenna at the receiving end. These signals are processed by radio frequency to obtain baseband analog signals, and then the analog signals are converted into digital signals and then processed synchronously. After the cyclic prefix is removed, they are demodulated by FFT and then processed by baseband signals.   The system is equipped with an N element-lens antenna array and a base station with RF chains to simultaneously serve single-antenna users. This paper considers the widely used Saleh-Valenzuela channel model, then the N × 1 spatial channel of the user at the subcarrier m(m = 1, 2, . . . , M) can be expressed as L is the number of paths, and β l and τ l are the complex gain and time delay of the l-th path, respectively. Furthermore, α(φ l,m ) is the array response vector, and φ l,m is the spatial direction at the subcarrier m, defined as Among them, is the frequency of the subcarrier m, f c and f b are the carrier frequency and bandwidth, respectively, c is the speed of light, θ l is the physical direction, and d = 0.5 · c/ f c is the antenna spacing. In millimeter-wave broadband systems, a larger f b will lead to φ l,m a larger change in subcarriers of different frequencies, which in turn will lead to a difference in the spatial channel h m , which is called the "beam squint" effect. When the base station is equipped with a uniform linear lens antenna array, the array response vector α(φ l,m ) can be expressed as . . , N−1 2 ] denotes the indices of different antennas. The lens antenna array shown in Figure 1 can transform the space domain in (1) into the beam space domain. The lens antenna array is similar to the N × N spatial discrete Fourier transform (DFT) matrix F. Therefore, the wideband beam spatial channel h m at the subcarrier m is expressed as where c l,m denotes the l-th path component at the subcarrier m in the beam space, and c l,m is determined φ l,m by Ξ(x) = sin Nπx sin πx is the function of Dirichlet, and φ n = 1 N n − N+1 2 (n = 1, 2, . . . , N) is the predefined spatial orientation of the lens antenna array.

Problem Description
The received signal vector at the base station can be expressed as Among them, y m,q ∈ C N RF ×1 and s m,q is defined as the transmission pilot of the subcarrier m at the moment q(q = 1, 2, . . . , Q) (each user only sends one pilot at each moment). n m,q ∼ N C 0, σ 2 I denotes a Gaussian noise vector, σ 2 denoting the noise power. W q ∈ C N RF ×N is a fixed adaptive selection network for different subcarriers. Since the pilot signal is known at the receiver side, it is assumed that s m,q = 1. After Q instants of pilot transmission, the total measurement signal can be obtained as ∈ C QN RF ×N is expressed as the overall combination matrix, and n eq m = (W 1 n m,1 ) , . . . , W Q n m,Q is expressed as the effective noise vector. In this paper, a low-cost 1-bit phase shifter is used in the adaptive selection network, where elements of W are randomly selected in the set By stacking the M beam space channel vectors into a matrix and converting the complex values to real values, the following signal recovery problem description can be obtained Y=WH+N (8) Y represents the received signal, and is the beam frequency matrix. Since h m is sparse, if the beam frequency matrix is regarded as a natural image, the compressed image restoration method can be used for beam space channel estimation. Therefore, a channel estimation network based on model-driven deep learning is developed.

Channel Estimation Network Based on Model-Driven Deep Learning
This article applies the Iterative Shrinkage Thresholding Algorithm (ISTA) [22] to the deep iterative network. The main update steps of the ISTA algorithm to restore the signal are as follows wherein k represents the iteration index of ISTA, and ρ represents the iteration step size. Ψ is the transformation matrix that makes ΨH sparse in the transform domain. The two update  (9) and (10) correspond to gradient descent and optimization procedures, respectively. Therefore, R (k) can be regarded as a noisy signal. In the broadband millimeter-wave system, to reduce the recovery error caused by manually setting parameters (such as λ, ρ and Ψ), this paper adopts the idea of learnable parameters, expands the iterative steps of the ISTA algorithm to ALISTA-Net, using the powerful learning power of neural networks to obtain an optimal set of parameters, and an adaptive selection threshold method based on ST-Net network is proposed in the beam domain denoising stage to reduce the loss-of-sparse recovery. The network consists of cascaded K layers with the same structure, which strictly corresponds to the K sub-iterative processes in the ISTA algorithm. The network structure is shown in Figure 2.
wherein k represents the iteration index of ISTA, and  represents the iteration step size. Ψ is the transformation matrix that makes ΨH sparse in the transform domain.
The two update steps (9) and (10) correspond to gradient descent and optimization procedures, respectively. Therefore, ( ) k R can be regarded as a noisy signal. In the broadband millimeter-wave system, to reduce the recovery error caused by manually setting parameters (such as  ,  and Ψ ), this paper adopts the idea of learnable parameters, expands the iterative steps of the ISTA algorithm to ALISTA-Net, using the powerful learning power of neural networks to obtain an optimal set of parameters, and an adaptive selection threshold method based on ST-Net network is proposed in the beam domain denoising stage to reduce the loss-of-sparse recovery. The network consists of cascaded K layers with the same structure, which strictly corresponds to the K sub-iterative processes in the ISTA algorithm. The network structure is shown in Figure 2.  Using learnable parameters, learnable sparse matrices, and introducing an attention-based shrinkage threshold network (ST-Net) to replace manually set hyperparameters (ρ, λ, Ψ and thresholds), this work aims to select the best parameters during network training to compensate for recovery loss caused by improper manual settings. In this stage, the operation to expand to ALISTA-Net mainly includes the following processes: 1. Gradient descent process: R (k) can be rewritten as Among them, ρ (k) are the parameters that can be learned from the data, rather than manually setting each layer to share.

2.
(Inverse) sparse transformation and residual learning process: The specific method of using sparse transformation is as follows Among them, A (k) ∈ R ω 1 ×2M and B (k) ∈ R ω 2 ×ω 1 are two learnable matrices, as proved in [22], so the optimization process in Formula (10) can be rewritten as , the optimization problem in Equation (13). can be viewed as a noise reduction problem, which is solved by the soft threshold operation in the frequency domain, i.e., so f t(x, θ) = sign(x)(max(0, |x| − θ)), and sign(·) is a sign function. F (k) represents the inverse transformation that converts the signal from the sparse transform domain to the beam domain, F (k) and F (k) the network structure of and is symmetrical, consisting of two fully connected layers, but with different weights. So, the solution of Equation (13) is as followsĤ θ (k) represents the k-th threshold of the contraction operation, which is also a learnable parameter. To speed up the convergence speed of the network, residual learning is introduced into ALISTA-Net, which is the part connected by the red line in Figure 2, so Equation (14) is rewritten aŝ Modules R (k) and R (k) modules are similar, but the input is changed, as follows To transform and denoise in the beam domain, it is first necessary for R (k) to perform a transpose operation. In the denoising stage, inspired by the deep residual shrinkage network, a shrinkage threshold network (ST-Net) based on the attention mechanism is proposed, which will be introduced later. Finally, the inverse transposition operation is performed, so the output can be expressed by the following formulâ (17) has a great influence on the estimation accuracy of the ALISTA-Net model. Since the noise amount of each input is different, ST-Net is introduced in the beam domain denoising stage, inspired by the deep residual shrinkage network DRSN-CW structure in the literature [23]. The basic module is shown in Figure 3. The structure of the proposed shrinkage threshold network based on the attention mechanism is shown in Figure 4. First, the threshold in the soft threshold function is automatically learned and set by the network, which reduces the loss of accuracy caused by the inaccurate manual setting of the threshold. Secondly, the threshold value in the soft threshold function of the network is a positive number within the appropriate value range, to avoid the output situation of all zeros. At the same time, each sample has its own unique set of thresholds, making the model more applicable to situations with different noise contents. ure 3. The structure of the proposed shrinkage threshold network based on the attention mechanism is shown in Figure 4. First, the threshold in the soft threshold function is automatically learned and set by the network, which reduces the loss of accuracy caused by the inaccurate manual setting of the threshold. Secondly, the threshold value in the soft threshold function of the network is a positive number within the appropriate value range, to avoid the output situation of all zeros. At the same time, each sample has its own unique set of thresholds, making the model more applicable to situations with different noise contents. The network structure draws on the idea of the attention mechanism. First, by taking the absolute value of the input, the absolute value of the sparse transformation feature map is simplified into a 16-dimensional vector by using Global Average Pooling (GAP). To make the intermediate output value of each layer the value more stable, and reduce the risk of model overfitting, the obtained features are input into two fully connected networks 1  and 2  with batch normalization (BN), and these two networks have 4 and 16 neurons, respectively, using Sigmoid and ReLU as activation functions to generate scaling vectors. The iteration threshold can be obtained by multiplying the dimensionality reduction vector and the scaling vector. The obtained threshold is a 16-dimensional vector, which changes in different iteration stages, and a set of thresholds for each channel  4. Training parameters: The trainable parameter set in ALISTA-Net is used  . According to the above description,  can be expressed as follows

Shrink Threshold Network (ST-Net) Based on Attention Mechanism: The choice of threshold in Equation
Among them, K is the total number of iterations of ALISTA-Net, and all parameters will be learned as parameters of the neural network.

Loss function:
Since ALISTA-Net contains K stages, the loss function of the training process designed in this paper is as follows  is a measure of the loss between the output and the true value, and is a key The network structure draws on the idea of the attention mechanism. First, by taking the absolute value of the input, the absolute value of the sparse transformation feature map is simplified into a 16-dimensional vector by using Global Average Pooling (GAP). To make the intermediate output value of each layer the value more stable, and reduce the risk of model overfitting, the obtained features are input into two fully connected networks N 1 and N 2 with batch normalization (BN), and these two networks have 4 and 16 neurons, respectively, using Sigmoid and ReLU as activation functions to generate scaling vectors. The iteration threshold can be obtained by multiplying the dimensionality reduction vector and the scaling vector. The obtained threshold is a 16-dimensional vector, which changes in different iteration stages, and a set of thresholds for each channel can be adaptively obtained according to the characteristics of the input feature map. Finally, the contraction operation is performed according to the contraction function in the formula.

4.
Training parameters: The trainable parameter set in ALISTA-Net is used Θ. According to the above description, Θ can be expressed as follows Among them, K is the total number of iterations of ALISTA-Net, and all parameters will be learned as parameters of the neural network.

5.
Loss function: Since ALISTA-Net contains K stages, the loss function of the training process designed in this paper is as follows L mse is a measure of the loss between the output and the true value, and is a key component of the loss function. L iteration is the sum of the errors between the true value and the iterated value of ALISTA-Net in each stage. This helps to produce estimates that are close to the true value at each stage. L symmetry is to establish F · F = I as much as possible.

Simulation Data and Parameter Settings
To complete the training and testing of the ALISTA-Net network model, the specific parameter settings of the system model channel in this paper are shown in Table 1. the parameters of the training plan are set as shown in Table 2. 20ns Physical direction of the path θ l ∼ µ(−π/2, π/2) Delay of the path τ l ∼ µ(0, 20ns) Number of resolvable paths L = 3

Simulation Result Analysis
In this paper, the normalized mean squared error (NMSE) is used as the difference between the channel matrix generated by the estimation estimate and the real channel matrix, which is defined as wherein the smaller the value of NMSE, the higher the accuracy of channel estimation by the model.

•
Ablation experiment To compare the NMSE performance of the expanded iterative network (LISTA-Net) and the expanded iterative network (ALISTA-Net) using ST-Net, this paper designs an ablation simulation experiment for analysis. The simulation results are shown in Figures 5 and 6. In Figure 6, the abscissa is the number of layers deployed in the network, and the ordinate is the normalized mean square error. As the number of layers increases, the NMSE gradually decreases, and the convergence speed of ALISTA-Net is fast. Specifically, the ALISTA-Net method proposed in this paper can converge within six layers. In Figure 6, the abscissa is the number of layers deployed in the network, and the ordinate is the normalized mean square error. As the number of layers increases, the NMSE gradually decreases, and the convergence speed of ALISTA-Net is fast. Specifically, the ALISTA-Net method proposed in this paper can converge within six layers.  Table 3 lists the time it takes for the two methods to complete channel estimation in the online phase. The result obtained is the online test time per 10 epochs. The running speed can reflect the practicability of the algorithm in the time environment. As shown in Figure 5, the abscissa is the signal-to-noise ratio, and the ordinate is the normalized mean square error. As the SNR increases, the NMSE decreases gradually and the estimation accuracy increases. This is because as the signal-to-noise ratio increases, the amount of noise decreases gradually, and the impact on channel estimation gradually decreases, so the estimation accuracy gradually increases. Under five different signal-tonoise ratios, the NMSE of ALISTA-Net is lower than that of LISTA-Net, and the estimation accuracy is better than that of LISTA-Net. When SNR = −10, compared with LISTA-Net, ALISTA-Net has achieved 1.06 dB accuracy improvement. This is due to the proposed ST-Net adopting a strategy of providing different thresholds for each channel according to different noise characteristics instead of sharing thresholds for all channels.
In Figure 6, the abscissa is the number of layers deployed in the network, and the ordinate is the normalized mean square error. As the number of layers increases, the NMSE gradually decreases, and the convergence speed of ALISTA-Net is fast. Specifically, the ALISTA-Net method proposed in this paper can converge within six layers. Table 3 lists the time it takes for the two methods to complete channel estimation in the online phase. The result obtained is the online test time per 10 epochs. The running speed can reflect the practicability of the algorithm in the time environment. Since ALISTA-Net introduces the ST-Net module, the running time of ALISTA-Net is slightly higher than that of LISTA-Net, but the final time consumption is acceptable in the transmission of real-time communication systems.

•
Performance analysis As shown in Figure 7, the ALISTA-Net algorithm proposed in this paper can not only obtain higher accuracy but also have a faster convergence speed than the current classic broadband system channel estimation method. Specifically, the proposed algorithm can converge within six layers; thus, the online stage can be deployed with a small number of layers, speeding up the training rate.  Since ALISTA-Net introduces the ST-Net module, the running time of ALISTA-Net is slightly higher than that of LISTA-Net, but the final time consumption is acceptable in the transmission of real-time communication systems.  Performance analysis As shown in Figure 7, the ALISTA-Net algorithm proposed in this paper can not only obtain higher accuracy but also have a faster convergence speed than the current classic broadband system channel estimation method. Specifically, the proposed algorithm can converge within six layers; thus, the online stage can be deployed with a small number of layers, speeding up the training rate. As shown in Figure 8, the ALISTA-Net method proposed in this paper achieves NMSE lower than that of the existing classic wideband system channel estimation algorithm in the range of 0-20 dB SNR. When SNR = 0, compared with the algorithm based on compressed sensing, ALISTA-Net obtains the largest gain and achieves NMSE of −16.24 As shown in Figure 8, the ALISTA-Net method proposed in this paper achieves NMSE lower than that of the existing classic wideband system channel estimation algorithm in the range of 0-20 dB SNR. When SNR = 0, compared with the algorithm based on compressed sensing, ALISTA-Net obtains the largest gain and achieves NMSE of −16.24 dB, which verifies the effectiveness of the proposed scheme. Compared with deep learning-based ISTA-Net + [24], a lower NMSE value is also achieved, which indicates that the proposed algorithm based on model-driven deep learning has been optimized for millimeter-wave massive MIMO broadband system channel estimation.

Complexity analysis
As shown in Table 4, the complexity of the proposed ALISTA-Net is compared with other channel estimation methods, and ALISTA-Net achieves a lower computational complexity than LDGEC and ISTA-Net + . This is because the computational complexity of ALISTA-Net is mainly determined by matrix multiplication, and the computational complexity of LDGEC is mainly determined by matrix inversion. The computational complexity of ISTA-Net + is mainly determined by the convolution operation, where K is the filter size, in C and out C are the number of input channels and output channels of the convolution, respectively (in [21], . Although the number of trainable parameters of ALISTA-Net is higher than that of LISTA-Net and ISTA-Net + , it is known that both ALISTA-Net achieves better estimation accuracy than LISTA-Net and ISTA-Net + based on the results of ablation experiments and performance tests.

Method Parameters Computational Complexity
LISTA-Net Comparison of NMSE performance between ALISTA Net and other channel estimation algorithms.

•
Complexity analysis As shown in Table 4, the complexity of the proposed ALISTA-Net is compared with other channel estimation methods, and ALISTA-Net achieves a lower computational complexity than LDGEC and ISTA-Net + . This is because the computational complexity of ALISTA-Net is mainly determined by matrix multiplication, and the computational complexity of LDGEC is mainly determined by matrix inversion. The computational complexity of ISTA-Net + is mainly determined by the convolution operation, where K is the filter size, C in and C out are the number of input channels and output channels of the convolution, respectively (in [21], C in = C out = 32). Although the number of trainable parameters of ALISTA-Net is higher than that of LISTA-Net and ISTA-Net + , it is known that both ALISTA-Net achieves better estimation accuracy than LISTA-Net and ISTA-Net + based on the results of ablation experiments and performance tests.

Conclusions
Aiming at the problem of low estimation accuracy of existing channel estimation algorithms in millimeter-wave massive MIMO broadband systems with a low signal-tonoise ratio, this paper proposes a channel estimation algorithm based on model-driven deep learning considering the "beam squint" effect. Ablation experiments show that, compared with LISTA-Net without ST-Net, ALISTA-Net achieves a higher estimation accuracy and convergence speed in low SNR areas. Performance experiments show that compared with existing wideband system algorithms, ALISTA-Net can still achieve higher estimation accuracy in areas with low SNR. Therefore, the ALISTA-Net proposed in this paper is suitable for channel estimation in millimeter-wave massive MIMO broadband systems and has faster convergence speed and higher channel estimation accuracy.