Low-Resolution Precoding for Multi-Antenna Downlink Channels and OFDM

Downlink precoding is considered for multi-path multi-input single-output channels where the base station uses orthogonal frequency-division multiplexing and low-resolution signaling. A quantized coordinate minimization (QCM) algorithm is proposed and its performance is compared to other precoding algorithms including squared infinity-norm relaxation (SQUID), multi-antenna greedy iterative quantization (MAGIQ), and maximum safety margin precoding. MAGIQ and QCM achieve the highest information rates and QCM has the lowest complexity measured in the number of multiplications. The information rates are computed for pilot-aided channel estimation and a blind detector that performs joint data and channel estimation. Bit error rates for a 5G low-density parity-check code confirm the information-theoretic calculations. Simulations with imperfect channel knowledge at the transmitter show that the performance of QCM and SQUID degrades in a similar fashion as zero-forcing precoding with high resolution quantizers.


Introduction
Massive multiple-input multiple-output (MIMO) base stations can serve many user equipments (UEs) with high spectral efficiency and simplified signal processing [1,2]. However, their implementation is challenging due to the cost and energy consumption of analog-to-digital and digital-to-analog converters (ADCs/DACs) and linear power amplifiers (PAs). There are several approaches to lower cost. One approach is hybrid beamforming with analog beamformers in the radio frequency (RF) chain of each antenna and where the digital baseband processing is shared among RF chains. Second, constant envelope waveforms permit using non-linear PAs. Third, all-digital approaches use low-resolution ADCs/DACs or low-resolution digitally controlled RF chains. The focus of this paper is on the all-digital approach.

Single-Carrier Transmission
We study the multi-antenna downlink and UEs with one antenna each, a model referred to as multi-user multi-input single-output (MU-MISO). Most works on low-cost precoding for MU-MISO consider phase-shift keying (PSK) to lower the requirements on the PAs. For instance, the early papers [3,4] (see also [5]) use iterative coordinate-wise optimization to choose transmit symbols from a continuous PSK alphabet for flat and frequency-selective (or multipath) fading, respectively. We remark that these papers do not include an optimization parameter (called α below, see (8)) in their cost function, which plays an important role at high signal-to-noise ratio (SNR), see [6,7]. This parameter is related to linear minimum-mean square error (MMSE) precoding.

Discrete Signaling and OFDM
Our main interest is discrete-alphabet precoding for multipath channels with OFDM as in 5G wireless systems. Precoding for OFDM is challenging because the alphabet constraint is in the time domain after the inverse discrete Fourier transform (IDFT) rather than in the frequency domain. We further focus on using information theory to derive achievable rates. For this purpose, we consider two types of receivers: classic pilot-aided transmission (PAT) and a blind detector that performs joint data and channel estimation.
Discrete-alphabet precoding for OFDM was treated in [37], which used quantized linear precoding (QLP) and low-resolution DACs. A more sophisticated approach appeared in [38], which applied a squared-infinity norm Douglas-Rachford splitting (SQUID) algorithm to minimize a quadratic cost function in the frequency domain. The performance was illustrated via bit error rate (BER) simulations with convolutional codes and QPSK or 16-quadrature amplitude modulation (QAM) by using 1-3 bits of phase quantization.
The paper [39] instead proposed an algorithm called multi-antenna greedy iterative quantization (MAGIQ) that builds on [19] and uses coordinate-wise optimization of a quadratic cost function in the time domain. MAGIQ may thus be considered an extended version of [4] for OFDM and discrete alphabets. Simulations showed that MAGIQ outperforms SQUID in terms of complexity and achievable rates. Another coordinate-wise optimization algorithm appeared in [40,41] that builds on the papers [21,22]. The algorithm is called constant envelope symbol level precoding (CESLP) and it is similar to the refinement of MAGIQ presented here. The main difference is that, as in [38], the optimization in [40,41] uses a cost function in the frequency domain rather than the time domain. We remark that processing in the time domain has advantages that are described in Section 3.1.
The maximum safety margin (MSM) algorithm was extended to OFDM in [42]. MSM works well at low and intermediate rates but MAGIQ outperforms MSM at high rates both in terms of complexity and achievable rates. Finally, the recent paper [43] uses generalized approximate message passing (GAMP) for OFDM.

Contributions and Organization
The contributions of this paper are as follows.
• The analysis of MAGIQ in the workshop paper [39] is extended to larger systems and more realistic channel conditions;
• Replacing the greedy antenna selection rule of MAGIQ with a fixed (round-robin) schedule is shown to cause negligible rate loss. The new algorithm is named quantized coordinate minimization (QCM);
• The performance of QLP-ZF, SQUID, MSM, MAGIQ, and QCM is compared in terms of complexity (number of multiplications and iterations) and achievable rates;
• We develop an auxiliary channel model to compute achievable rates for pilot-aided channel estimation and a blind detector that performs joint channel and data estimation. The models let one compare modulations, precoders, channels, and receivers;
• Simulations with a 5G NR low-density parity-check (LDPC) code [44] show that the computed rate and power gains accurately predict the gains of standard channel codes;
• Simulations with imperfect channel knowledge at the base station show that the achievable rates of SQUID and QCM degrade as gracefully as those of LP-ZF.
We remark that our focus is on algorithms that approximate ZF based on channel inversion, i.e., there is no attempt to optimize transmit powers across subcarriers. This approach simplifies OFDM channel estimation at the receivers because the precoder makes all subcarriers have approximately the same channel magnitude and phase. For instance, a rapid and accurate channel estimate is obtained for each OFDM symbol by averaging the channel estimates of the subcarriers, see Section 4.1. Of course, it is interesting to develop algorithms for other precoders and for subcarrier power allocation.
This paper is organized as follows. Section 2 introduces the baseband model and OFDM signaling. Section 3 describes the MAGIQ and QCM precoders. Section 4 develops theory for achievable rates, presents complexity comparisons, and reviews a model for imperfect channel state information (CSI). Section 5 compares achievable rates and BERs with 5G NR LDPC codes. Section 6 concludes the paper.
Figure 1 shows a MU-MISO system with $N$ transmit antennas and $K$ UEs that each have a single antenna. The base station has one message per UE, and each antenna has a resolution of 1 bit for the amplitude (on-off switch) and $b$ bits for the phase. All other hardware components are ideal: linear, infinite bandwidth, and no distortions except for additive white Gaussian noise (AWGN).

Baseband Channel Model
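The discrete-time baseband channel is modeled as a finite impulse response filter between each pair of transmit and receive antennas. Let $x[t] = (x_1[t], \ldots, x_N[t])^T$ be the vector of transmit symbols at time $t$ and let $y[t] = (y_1[t], \ldots, y_K[t])^T$ be the vector of symbols received by the $K$ UEs. We have
$$ y[t] = \sum_{\tau=0}^{L-1} H[\tau]\, x[t-\tau] + z[t] $$
where the noise $z[t] = (z_1[t], \ldots, z_K[t])^T$ has circularly-symmetric, complex, Gaussian entries that are independent and have variance $\sigma^2$, i.e., we have $z[t] \sim \mathcal{CN}(0, \sigma^2 I)$. The $H[\tau]$, $\tau = 0, \ldots, L-1$, are $K \times N$ matrices representing the channel impulse response, i.e., we have
$$ H[\tau] = \begin{pmatrix} h_{11}[\tau] & \cdots & h_{1N}[\tau] \\ \vdots & \ddots & \vdots \\ h_{K1}[\tau] & \cdots & h_{KN}[\tau] \end{pmatrix} $$
where $h_{kn}[\cdot]$ is the channel impulse response from the $n$-th antenna at the base station to the $k$-th UE. For instance, a Rayleigh fading multi-path channel with a uniform power delay profile (PDP) has $h_{kn}[\tau] \sim \mathcal{CN}(0, 1/L)$ and these taps are independent and identically distributed (iid) for all $k, n, \tau$.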
The vector $x[t]$ is constrained to have entries taken from a discrete and finite alphabet
$$ \mathcal{X} = \{0\} \cup \left\{ \sqrt{P/N}\, e^{j 2\pi q / 2^b} : q = 0, 1, \ldots, 2^b - 1 \right\}. $$
The transmit energy clearly satisfies $\|x[t]\|^2 \le P$ and we define SNR $= P/\sigma^2$. The inequality is due to the 0 symbol that permits antenna selection. Antenna selection was also used in [45] to enforce sparsity. Our intent is rather to allow antennas not to be used if they do not improve performance.
Figure 1 shows how OFDM can be combined with the precoder. Let $T = T_F + T_c$ be the OFDM blocklength with $T_F$ symbols for the DFT and $T_c$ symbols for the cyclic prefix. We assume that $T_F \ge L$ and $T_c \ge L - 1$. For simplicity, all $T_F$ subcarriers carry data and we do not include the cyclic prefix overhead in our rate calculations below, i.e., the rates in bits per channel use (bpcu) are computed by normalizing by $T_F$.
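The alphabet constraint can be illustrated with a minimal Python sketch. It assumes that each active antenna transmits with amplitude $\sqrt{P/N}$ and one of $2^b$ uniformly spaced phases, and that the 0 symbol switches the antenna off; the function names `phase_alphabet` and `energy` and the example values of `P`, `N`, and `b` are hypothetical and chosen only for illustration.

```python
import cmath
import math

def phase_alphabet(P, N, b):
    """Return the per-antenna alphabet: 0 (antenna off) plus 2**b phases.

    Assumption: every active antenna uses the amplitude sqrt(P/N).
    """
    amp = math.sqrt(P / N)
    points = [amp * cmath.exp(2j * math.pi * q / 2**b) for q in range(2**b)]
    return [0j] + points

def energy(x):
    """Transmit energy ||x||^2 of one vector of antenna symbols."""
    return sum(abs(v) ** 2 for v in x)

# Illustrative values (not taken from the paper).
P, N, b = 1.0, 8, 2
X = phase_alphabet(P, N, b)

# All antennas active: the energy equals P.
x_full = [X[1 + (n % 2**b)] for n in range(N)]
# Three antennas switched off: the energy drops to (N - 3) / N * P.
x_sparse = [X[0] if n < 3 else X[1] for n in range(N)]
```

With all antennas active the energy equals $P$, and every antenna that selects the 0 symbol lowers the energy by $P/N$, which is why the energy constraint is an inequality.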

OFDM Signaling
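Consider the frequency-domain modulation alphabet $\hat{\mathcal{U}}$ that has a finite number of elements, e.g., a PSK or QAM constellation. The frequency-domain symbols $\hat{u}_k[m] \in \hat{\mathcal{U}}$, $m = 0, \ldots, T_F - 1$, of UE $k$ are converted to the time-domain symbols
$$ u_k[t] = \frac{1}{T_F} \sum_{m=0}^{T_F-1} \hat{u}_k[m]\, e^{j 2\pi m t / T_F} $$
for times $t = 0, \ldots, T_F - 1$ and UEs $k = 1, \ldots, K$. For the simulations below, we generated the $\hat{u}_k[m]$ uniformly from finite constellations such as 16-QAM or 64-QAM. We assume that $\mathrm{E}[\hat{u}_k[m]] = 0$ for all $k$ and $m$. Each UE $k$ uses a DFT to convert its time-domain symbols $y_k[t]$ to the frequency-domain symbols
$$ \hat{y}_k[m] = \sum_{t=0}^{T_F-1} y_k[t]\, e^{-j 2\pi m t / T_F}, \quad m = 0, \ldots, T_F - 1. $$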

Linear MMSE Precoding
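To describe the linear MMSE precoder, consider the channel from base station antenna $n$ to UE $k$, zero-padded to length $T_F$:
$$ h_{kn} = (h_{kn}[0], \ldots, h_{kn}[L-1], 0, \ldots, 0)^T $$
and denote its DFT as
$$ \hat{h}_{kn} = (\hat{h}_{kn}[0], \ldots, \hat{h}_{kn}[T_F-1])^T. $$
The linear MMSE precoder (or Wiener filter) for subcarrier $m$ is
$$ \hat{H}[m]^H \left( \hat{H}[m]\, \hat{H}[m]^H + \frac{K \sigma^2}{P}\, I \right)^{-1} \tag{7} $$
where $\hat{H}[m]$ is the $K \times N$ matrix with entries $\hat{h}_{kn}[m]$, $\hat{u}[m] = (\hat{u}_1[m], \ldots, \hat{u}_K[m])^T$, and $I$ is the $K \times K$ identity matrix. The precoder multiplies $\hat{u}[m]$ by (7) for all subcarriers $m$, and performs $N$ IDFTs to compute the resulting time-domain transmit signals. We remark that ZF precoding is the same as (7) but with $\sigma^2 = 0$.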

Quantized Precoding
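We wish to ensure compatibility with respect to LP-ZF. In other words, each receiver $k$ should ideally see signals $u_k[t]$, $t = 0, \ldots, T - 1$, that were generated from the frequency-domain symbols $\hat{u}_k[m]$. We thus consider the cost function
$$ \sum_{t=0}^{T-1} \mathrm{E}\left[ \left\| u[t] - \alpha\, y[t] \right\|^2 \right] = \sum_{t=0}^{T-1} \Big\| u[t] - \alpha \sum_{\tau=0}^{L-1} H[\tau]\, x[t-\tau] \Big\|^2 + \alpha^2 T K \sigma^2 \tag{8} $$
where $u[t] = (u_1[t], \ldots, u_K[t])^T$ and $\mathrm{E}[\cdot]$ denotes the expectation with respect to the noise $z[t]$. The optimization problem is as follows:
$$ \min_{\alpha,\; x[0], \ldots, x[T-1]} \ \text{(8)} \quad \text{subject to } x[t] \in \mathcal{X}^N, \ t = 0, \ldots, T - 1. \tag{9} $$
The parameter $\alpha$ in (8) and (9) can easily be optimized for fixed $x[0], \ldots, x[T-1]$ and the result is (see [18], Equation (26))
$$ \alpha = \frac{\sum_{t=0}^{T-1} \mathrm{Re}\left\{ u[t]^H \sum_{\tau=0}^{L-1} H[\tau]\, x[t-\tau] \right\}}{\sum_{t=0}^{T-1} \left\| \sum_{\tau=0}^{L-1} H[\tau]\, x[t-\tau] \right\|^2 + T K \sigma^2}. \tag{10} $$
For the MAGIQ and QCM algorithms below, we use alternating minimization to find the $x[0], \ldots, x[T-1]$ and $\alpha$. For the linear MMSE precoder, we label the $\alpha$ in (10) as $\alpha_{\mathrm{WF}}$.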
Observe that we use the same $\alpha$ for all $K$ UEs because all UEs experience the same shadowing, i.e., all $K$ UEs see the same average power. For UE-dependent shadowing, a more general approach would be to replace $\alpha$ with a diagonal matrix with $K$ parameters $\alpha_k$, $k = 1, \ldots, K$, and then modify (8) appropriately.
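The closed-form choice of $\alpha$ can be checked with a short derivation. The sketch below assumes that $\alpha$ is real and that the cost is a sum of expected squared errors between $u[t]$ and $\alpha y[t]$; the symbols $J(\alpha)$ and $v[t]$ (the noise-free channel output) are introduced here only for illustration.

```latex
\begin{align}
  v[t] &= \sum_{\tau=0}^{L-1} H[\tau]\, x[t-\tau], \qquad y[t] = v[t] + z[t] \\
  J(\alpha) &= \sum_{t=0}^{T-1} \mathrm{E}\left[ \| u[t] - \alpha\, y[t] \|^2 \right]
             = \sum_{t=0}^{T-1} \| u[t] - \alpha\, v[t] \|^2 + \alpha^2 T K \sigma^2 \\
  \frac{dJ}{d\alpha} &= -2 \sum_{t=0}^{T-1} \mathrm{Re}\left\{ u[t]^H v[t] \right\}
             + 2 \alpha \left( \sum_{t=0}^{T-1} \| v[t] \|^2 + T K \sigma^2 \right) = 0 \\
  \alpha^\star &= \frac{\sum_{t=0}^{T-1} \mathrm{Re}\left\{ u[t]^H v[t] \right\}}
                       {\sum_{t=0}^{T-1} \| v[t] \|^2 + T K \sigma^2}
\end{align}
```

The term $\alpha^2 T K \sigma^2$ arises because each of the $K$ noise entries has variance $\sigma^2$ and the cross terms with the zero-mean noise vanish. Since $J(\alpha)$ is a convex quadratic in $\alpha$, the stationary point is the minimizer.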

MAGIQ and QCM
For multipath channels, the vector $x[t]$ influences the channel output at times $t, t+1, \ldots, t+L-1$. A joint optimization over strings of length $T$ seems difficult because of this influence and because of the finite alphabet constraint for the $x_n[t]$. Instead, MAGIQ splits the optimization into sub-problems with reduced complexity by applying coordinate-wise minimization across the antennas and iterating over the OFDM symbol.
For this purpose, consider the precoding problem for time $t'$ starting at $t' = 0$ and ending at $t' = T - 1$. Observe that $x[t']$ influences at most $L$ summands in (8), namely the summands for $t = (t')_T, \ldots, (t' + L - 1)_T$ where $(t)_T = \min(t, T - 1)$. To compute the new cost after updating the symbol $x_n[t']$, one may thus compute sums of the form
$$ \sum_{t=(t')_T}^{(t'+L-1)_T} \Big\| u[t] - \alpha \sum_{\tau=0}^{L-1} H[\tau]\, x[t-\tau] \Big\|^2 \tag{11} $$
for $t' = 0, \ldots, T - 1$. Whether one uses (8) or (11), one computes a first and second sum having the old and new $x_n[t']$, respectively. One then takes the difference and adds the result to (8) to obtain the updated cost.
We remark that the time-domain cost function (8) is closely related to the frequency-domain cost functions in [38,40,41]. However, the time-domain approach is more versatile as it can include acyclic phenomena such as interference from previous OFDM blocks. The time-domain approach is also slightly simpler because updating the symbol $x_n[t']$ in (8) or (11) requires taking the norm of at most $L$ vectors of dimension $K$ for each test symbol in $\mathcal{X}$ while the frequency-domain approach in ([40], Equation (17)) takes the norm of $T_F$ vectors of dimension $K$ for each test symbol. Recall that $T_F \ge L$, and usually $T_F \ge 10L$ to avoid losing too much efficiency with the cyclic prefix that has length $T_c \ge L - 1$.
The MAGIQ algorithm is summarized in Algorithm 1. MAGIQ steps through time in a cyclic fashion for fixed $\alpha$. At each time $t$, it initializes the antenna set $\mathcal{S} = \{1, \ldots, N\}$ and performs a greedy search for the antenna $n$ and symbol $x_n[t]$ that minimize (8) (one may equivalently consider sums of $L$ norms as in (11)). The resulting antenna is removed from $\mathcal{S}$ and a new greedy search is performed to find the antenna in the new $\mathcal{S}$ and the symbol that minimizes (8) while the previous symbol assignments are held fixed. This step is repeated until $\mathcal{S}$ is empty. MAGIQ then moves to the next time and repeats the procedure. To determine $\alpha$, MAGIQ applies alternating minimization with respect to $\alpha$ and the precoder output $x[0], \ldots, x[T-1]$.
Simulations show that MAGIQ exhibits good performance and converges quickly [39]. However, the greedy selection considerably increases the computational complexity. We thus replace the minimization over $\mathcal{S}$ in line 9 of Algorithm 1 with a round-robin schedule or a random permutation. We found that both approaches perform equally well. The new QCM algorithm performs as well as MAGIQ but with a simpler search and a small increase in the number of iterations.
Finally, one might expect that $\alpha$ is close to the $\alpha_{\mathrm{WF}}$ of the transmit Wiener filter [6,7] since our cost function accounts for the noise power. However, Figure 2 shows that this is true only at low SNR. The figure plots the average $\alpha$ of the QCM algorithm, called $\alpha_{\mathrm{QCM}}$, against the computed $\alpha_{\mathrm{WF}}$ for simulations with System A in Section 5. Note that $\alpha_{\mathrm{QCM}}$ is generally larger than $\alpha_{\mathrm{WF}}$.
Algorithm 1 (fragment). if Algo = MAGIQ then; line 9: $(x_n^\star, n^\star) = \arg\min_{x_n \in \mathcal{X},\, n \in \mathcal{S}}$ of the cost (8).
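The round-robin coordinate search can be sketched in a few lines of Python. This is a simplified illustration rather than the algorithm listing of the paper: it assumes flat fading ($L = 1$), a single time instant, a real-valued $\alpha$ updated in closed form after every sweep, and an all-off starting point instead of the quantized matched filter. The names `make_alphabet`, `total_cost`, and `qcm_flat` and all numerical values are hypothetical.

```python
import cmath
import math
import random

def make_alphabet(P, N, b):
    # Assumed alphabet: antenna off (0) or one of 2**b phases at amplitude sqrt(P/N).
    amp = math.sqrt(P / N)
    return [0j] + [amp * cmath.exp(2j * math.pi * q / 2**b) for q in range(2**b)]

def total_cost(H, x, u, alpha, sigma2):
    # ||u - alpha*H*x||^2 + alpha^2*K*sigma2 for one time instant (L = 1).
    K = len(u)
    v = [sum(H[k][n] * x[n] for n in range(len(x))) for k in range(K)]
    return sum(abs(u[k] - alpha * v[k]) ** 2 for k in range(K)) + alpha**2 * K * sigma2

def qcm_flat(H, u, alphabet, sigma2, iterations=6, alpha=1.0):
    K, N = len(H), len(H[0])
    x = [0j] * N          # hypothetical start: all antennas off
    r = list(u)           # residual u - alpha*H*x, equal to u for x = 0
    history = [total_cost(H, x, u, alpha, sigma2)]
    for _ in range(iterations):
        for n in range(N):  # fixed round-robin schedule over the antennas
            # Remove the contribution of antenna n, then test every symbol.
            r0 = [r[k] + alpha * H[k][n] * x[n] for k in range(K)]
            best = min(
                alphabet,
                key=lambda c: sum(abs(r0[k] - alpha * H[k][n] * c) ** 2 for k in range(K)),
            )
            x[n] = best
            r = [r0[k] - alpha * H[k][n] * best for k in range(K)]
        # Alternating step: closed-form real alpha for the current x.
        v = [sum(H[k][n] * x[n] for n in range(N)) for k in range(K)]
        num = sum((u[k].conjugate() * v[k]).real for k in range(K))
        den = sum(abs(vk) ** 2 for vk in v) + K * sigma2
        if num > 0 and den > 0:
            alpha = num / den
        r = [u[k] - alpha * v[k] for k in range(K)]
        history.append(total_cost(H, x, u, alpha, sigma2))
    return x, alpha, history

# Small illustrative example with Rayleigh fading and QPSK targets.
random.seed(1)
K, N, b, sigma2 = 2, 16, 2, 0.1
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) / math.sqrt(2) for _ in range(N)]
     for _ in range(K)]
u = [complex(random.choice([-1, 1]), random.choice([-1, 1])) / math.sqrt(2) for _ in range(K)]
alphabet = make_alphabet(1.0, N, b)
x, alpha, history = qcm_flat(H, u, alphabet, sigma2)
```

Each coordinate update keeps the current symbol among the candidates and the $\alpha$ update is an exact minimizer, so the recorded cost in `history` cannot increase from one iteration to the next. A greedy variant in the spirit of MAGIQ would additionally search over the antennas that remain unassigned at each step, which multiplies the number of candidate evaluations per sweep.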

Achievable Rates
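We use generalized mutual information (GMI) to compute achievable rates [46,47] (Ex. 5.22), which is a standard tool to compare coded systems. Consider a generic input distribution $P(x)$ and a generic channel density $p(y|x)$ where $x = (x_1, \ldots, x_S)^T$ and $y = (y_1, \ldots, y_S)^T$ each have $S$ symbols. A lower bound to the mutual information is the GMI
$$ I_{q,s}(X; Y) = \sum_{x,y} P(x)\, p(y|x) \log_2 \frac{q(y|x)^s}{\sum_{a} P(a)\, q(y|a)^s} \tag{13} $$
where $q(y|x)$ is any auxiliary density and $s \ge 0$. In other words, the choices $q(y|x) = p(y|x)$ for all $x, y$ and $s = 1$ maximize the GMI. However, the idea is that $p(y|x)$ may be unknown or difficult to compute and so one chooses a simple $q(y|x)$. The reason why $p(y|x)$ is difficult to compute here is that we will measure the GMI across the end-to-end channels from the $\hat{u}_k[m]$ to the $\hat{y}_k[m]$ and the quantized precoding introduces non-linearities in these channels. The final step in evaluating the GMI is maximizing over $s \ge 0$. Alternatively, one might wish to simply focus on $s = 1$, e.g., see [48].
We study the GMI of two non-coherent systems: classic PAT and a blind detector that performs joint data and channel estimation. For both systems, we apply memoryless signaling with the product distribution
$$ P(x) = \prod_{i=1}^{S_p} 1(x_i = x_{p,i}) \prod_{i=S_p+1}^{S} P(x_i) \tag{14} $$
where the $x_{p,i}$ are pilot symbols, $1(a = b)$ is the indicator function that takes on the value 1 if its argument is true and 0 otherwise, and $P(x)$ is a uniform distribution. Joint data and channel estimation has $S_p = 0$ so that we have only the second product in (14). At the receiver we use the auxiliary channel
$$ q(y|x) = \prod_{i=1}^{S} q_{x,y}(y_i | x_i) \tag{15} $$
where the symbol channel $q_{x,y}(\cdot)$ is a function of $x$ and $y$. Observe that $q_{x,y}(\cdot)$ is invariant for $S$ symbols and the channel can be considered to have memory since every symbol $x_\ell$ or $y_\ell$, $\ell = 1, \ldots, S$, influences the channel for all "times" $i = 1, \ldots, S$. The GMI rate (13) simplifies to
$$ I_{q,s}(X; Y) = \sum_{x,y} P(x)\, p(y|x) \sum_{i=S_p+1}^{S} \log_2 \frac{q_{x,y}(y_i | x_i)^s}{\sum_{a} P(a)\, q_{x,y}(y_i | a)^s}. \tag{16} $$
One may approximate (16) by applying the law of large numbers for stationary signals and channels. The idea is to independently generate the $B$ pairs of vectors $(x^{(b)}, y^{(b)})$, $b = 1, \ldots, B$, and then the following average rate will approach $I_{q,s}(X; Y)/S$ bpcu as $B$ grows:
$$ R_a = \frac{1}{B} \sum_{b=1}^{B} R_a^{(b)} \tag{17} $$
where
$$ R_a^{(b)} = \frac{1}{S} \sum_{i=S_p+1}^{S} \log_2 \frac{q_{x^{(b)},y^{(b)}}\big(y_i^{(b)} \big| x_i^{(b)}\big)^s}{\sum_{a} P(a)\, q_{x^{(b)},y^{(b)}}\big(y_i^{(b)} \big| a\big)^s}. \tag{18} $$
We choose the Gaussian auxiliary density
$$ q_{x,y}(y|x) = \frac{1}{\pi \sigma_q^2} \exp\left( -\frac{|y - h\, x|^2}{\sigma_q^2} \right) \tag{19} $$
where for PAT the receiver computes joint maximum likelihood (ML) estimates with sums of $S_p$ terms:
$$ h = \frac{\sum_{i=1}^{S_p} y_i\, x_i^*}{\sum_{i=1}^{S_p} |x_i|^2}, \qquad \sigma_q^2 = \frac{1}{S_p} \sum_{i=1}^{S_p} |y_i - h\, x_i|^2. \tag{20} $$
For the blind detector we replace $S_p$ with $S$ in (20). Note that for the Gaussian channel (19) the parameter $s$ multiplies $1/\sigma_q^2$ in (16) or (18), and optimizing $s$ turns out to be the same as choosing the best parameter $\sigma_q^2$ when $s = 1$.
Summarizing, we use the following steps to evaluate achievable rates. Suppose the coherence time is $S/T_F$ OFDM symbols where $S$ is a multiple of $T_F$. We index the channel symbols by the pairs $(\ell, m)$ where $\ell$ is the OFDM symbol and $m$ is the subcarrier, $1 \le \ell \le S/T_F$, $0 \le m \le T_F - 1$. We collect the pilot index pairs in the set $\mathcal{S}_p$ that has cardinality $S_p$, and we write the channel inputs and outputs of UE $k$ for OFDM symbol $\ell$ and subcarrier $m$ as $\hat{u}_k[\ell, m]$ and $\hat{y}_k[\ell, m]$, respectively.
1. Repeat the following steps 2-4 for $b = 1, \ldots, B$;
2. Generate the channel inputs $\hat{u}_k[\ell, m]$ and the corresponding channel outputs $\hat{y}_k[\ell, m]$ for all UEs $k$ by simulating the precoder and the channel;
3. Each UE estimates its own channel $h_k$ and $\sigma_{q,k}^2$, i.e., the channel estimate (20) of UE $k$ is
$$ h_k = \frac{\sum_{(\ell,m) \in \mathcal{S}_p} \hat{y}_k[\ell, m]\, \hat{u}_k[\ell, m]^*}{\sum_{(\ell,m) \in \mathcal{S}_p} |\hat{u}_k[\ell, m]|^2}, \qquad \sigma_{q,k}^2 = \frac{1}{S_p} \sum_{(\ell,m) \in \mathcal{S}_p} \left| \hat{y}_k[\ell, m] - h_k\, \hat{u}_k[\ell, m] \right|^2. \tag{21} $$
For the blind detector, in (21) we replace $\mathcal{S}_p$ with the set of all index pairs $(\ell, m)$, and we replace $S_p$ with $S$;
4. Compute $R_a^{(b)}$ in (18) for each UE $k$;
5. Compute $R_a$ in (17) for each UE $k$ by averaging, i.e., the rate for UE $k$ is $R_{a,k}$;
6. Compute the average UE rate $R_a = \frac{1}{K} \sum_{k=1}^{K} R_{a,k}$.
Our simulations showed that optimizing over $s \ge 0$ gives $s \approx 1$ if the channel parameters are chosen using (21).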
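The rate evaluation can be sketched for a single UE in Python. The sketch assumes a scalar channel $y = h x + \text{noise}$ with a random unknown phase per block, QPSK signaling, a Gaussian auxiliary density whose gain and variance are the joint ML estimates, and $s = 1$ by default. With `n_pilots=0` the estimates use all transmitted symbols, which mimics the blind (genie-aided) case. The function names and the simulated channel are hypothetical and do not reproduce the precoded OFDM channel of the paper.

```python
import cmath
import math
import random

def ml_estimates(x, y):
    """Joint ML estimates of the gain h and the noise variance for y = h*x + noise."""
    h = sum(yi * xi.conjugate() for xi, yi in zip(x, y)) / sum(abs(xi) ** 2 for xi in x)
    var = sum(abs(yi - h * xi) ** 2 for xi, yi in zip(x, y)) / len(x)
    return h, var

def block_rate(x, y, constellation, n_pilots=0, s=1.0):
    """Rate of one block in bits per channel use, normalized by the block length S."""
    S = len(x)
    ref = n_pilots if n_pilots > 0 else S  # blind case: estimate from all S symbols
    h, var = ml_estimates(x[:ref], y[:ref])
    total = 0.0
    for xi, yi in zip(x[n_pilots:], y[n_pilots:]):
        # Exponents of the Gaussian auxiliary density; the 1/(pi*var) factor cancels.
        expo = [-s * abs(yi - h * a) ** 2 / var for a in constellation]
        top = max(expo)  # log-sum-exp shift for numerical stability
        log_den = top + math.log(sum(math.exp(e - top) for e in expo) / len(constellation))
        log_num = -s * abs(yi - h * xi) ** 2 / var
        total += (log_num - log_den) / math.log(2)
    return total / S

def simulate_rate(snr_db, B=20, S=200, n_pilots=0, seed=0):
    """Average of the per-block rates over B independently generated blocks."""
    rng = random.Random(seed)
    qpsk = [complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]
    sigma2 = 10 ** (-snr_db / 10)
    rates = []
    for _ in range(B):
        h = cmath.exp(2j * math.pi * rng.random())  # hypothetical unknown phase
        x = [rng.choice(qpsk) for _ in range(S)]
        y = [h * xi + complex(rng.gauss(0, 1), rng.gauss(0, 1)) * math.sqrt(sigma2 / 2)
             for xi in x]
        rates.append(block_rate(x, y, qpsk, n_pilots))
    return sum(rates) / B

blind_rate = simulate_rate(30.0)               # all symbols used for estimation
pat_rate = simulate_rate(30.0, n_pilots=20)    # 10% pilots, data symbols only
low_snr_rate = simulate_rate(0.0)
```

Because the sum over data symbols is normalized by the full block length $S$, the PAT rate in this sketch cannot exceed $\log_2 |\hat{\mathcal{U}}|$ times the factor $1 - S_p/S$.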

Discussion
We make a few remarks on the lower bound. First, the receivers do not need to know $\alpha$. Second, the rate $R_a$ in (17) is achievable if one assumes stationarity and coding and decoding over many OFDM blocks. Third, as $S$ grows, the channel estimate of the blind detector becomes more accurate and the performance approaches that of a coherent receiver. Related theory for PAT and large $S$ is developed in [49]. However, the PAT rate is generally smaller than for a blind detector because the PAT channel estimate is less accurate and because PAT does not use all symbols for data.
Next, let $|\hat{\mathcal{U}}|$ be the number of elements in the modulation set $\hat{\mathcal{U}}$. The blind detector must generate $|\hat{\mathcal{U}}|^S$ likelihoods, which is prohibitively large unless $|\hat{\mathcal{U}}|$ and $S$ are small. Moreover, the receiver must perform joint data and channel estimation. Blind detection algorithms can, e.g., be based on high-order statistics and iterative channel estimation and decoding. For polar codes and low-order constellations, one may use the blind algorithms proposed in [50]. We found that the PAT rates are very close (within 0.1 bpcu) to the pilot-free rates multiplied by the rate loss factor $1 - S_p/S$, even for small pilot fractions $S_p/S$. Of course, the transmitter needs to know the channel also, e.g., via time-division duplex, which requires the coherence time to be substantially larger. The main point is that channel estimation at the receiver is not a bottleneck when using ZF based on channel inversion. Finally, for the coded simulations we chose $T_F = 396$ and $S = 4 T_F = 1584$ because the LDPC code occupies four OFDM symbols.
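The block-length bookkeeping of the coded simulations can be checked with a few lines of Python. The values $T_F = 396$, four OFDM symbols, 64-QAM (6 bits per symbol), and a pilot fraction of 10% are those quoted for the coded simulations; rounding the number of pilots to an integer is an assumption made here, and the variable names are hypothetical.

```python
T_F = 396              # subcarriers per OFDM symbol
ofdm_symbols = 4       # OFDM symbols occupied by one LDPC codeword
bits_per_symbol = 6    # 64-QAM

S = ofdm_symbols * T_F               # symbols per coherence block
coded_bits = S * bits_per_symbol     # length of the bit interleaver

pilot_fraction = 0.10
S_p = round(pilot_fraction * S)      # assumed rounding to an integer pilot count
rate_loss_factor = 1 - S_p / S       # factor applied to the pilot-free rate
```

The product $S \cdot 6$ matches the interleaver length of 9504 bits used for the BER simulations, and a 10% pilot fraction corresponds to a rate loss factor of approximately 0.9.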

Algorithmic Complexity
This section studies the algorithmic complexity in terms of the number of multiplications and iterations. The complexity of SQUID is thoroughly discussed in [38] and Table 3 shows the order estimates taken from ([38], Table I). Note the large number of iterations.
Table 3. Algorithmic complexity. Columns: Algorithm; Multiplications per Iteration; Iterations; Pre-Processing Multiplications.
The complexity of MSM depends on the choice of optimization algorithm and [42] considers a simplex algorithm. Unfortunately, the simplex algorithm requires a large number of iterations to converge because this number is proportional to the number of variables and linear inequalities that grow with the system size (N, K, T). An interior point algorithm converges more quickly but has a much higher complexity per iteration.
For MAGIQ and QCM, Equation (8) shows that updating $x[\cdot]$ requires updating $L$ of the $T$ terms that each require a norm calculation. The resulting terms $\|u[t]\|^2$ do not affect the minimization; terms such as $\|\alpha H x\|_2^2$ can be pre-computed and stored with a complexity of $NKL|\mathcal{X}|$, and then reused as they do not change during the iterations. On the other hand, products of the form $\alpha u^H H x$ must be computed for each of the $L$ terms for each antenna update and at each time instance, resulting in a complexity of $O(NKLT)$. The initialization requires $KNT$ multiplications and one must transform the solutions to the time domain. We neglect the cost of updating $\alpha$ because the terms needed to compute it are available as a byproduct of the iterative process over the time instances.

Sensitivity to Channel Uncertainty at the Transmitter
In practice, the CSI is imperfect due to noise, quantization, calibration errors, etc. We do not attempt to model these effects exactly. Instead, we adopt a standard approach based on MMSE estimation and provide the precoder with channel matrices $\tilde{H}[\tau]$ that satisfy
$$ \tilde{H}[\tau] = \sqrt{1 - \varepsilon^2}\, H[\tau] + \varepsilon\, Z[\tau] $$
where $0 \le \varepsilon \le 1$ and $Z[\tau]$ is a $K \times N$ matrix of independent, variance $\sigma_h^2 = 1/L$, complex, circularly-symmetric Gaussian entries. Note that $\varepsilon = 0$ corresponds to perfect CSI and $\varepsilon = 1$ corresponds to no CSI. The precoder treats $\tilde{H}[\tau]$ as the true channel realization for $\tau = 0, \ldots, L - 1$.
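A minimal Python sketch of this CSI model is given below. It assumes the common form in which the true channel is scaled by $\sqrt{1 - \varepsilon^2}$ and independent Gaussian noise of variance $1/L$ is scaled by $\varepsilon$, so that the average tap power stays at $1/L$ for every $\varepsilon$. The function names `cn`, `noisy_csi`, and `mean_power` and the dimensions are hypothetical.

```python
import math
import random

def cn(rng, var):
    """Draw one circularly-symmetric complex Gaussian sample with variance var."""
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))

def noisy_csi(H_taps, eps, rng):
    """Return sqrt(1 - eps^2) * H[tau] + eps * Z[tau] for every tap tau.

    H_taps is a list of L matrices (lists of rows); Z has iid CN(0, 1/L) entries.
    """
    L = len(H_taps)
    scale = math.sqrt(1 - eps**2)
    return [[[scale * h + eps * cn(rng, 1 / L) for h in row] for row in H]
            for H in H_taps]

def mean_power(H_taps):
    """Average squared magnitude over all taps and antenna pairs."""
    vals = [abs(h) ** 2 for H in H_taps for row in H for h in row]
    return sum(vals) / len(vals)

# Illustrative dimensions (not taken from the paper).
rng = random.Random(0)
K, N, L = 4, 50, 4
H = [[[cn(rng, 1 / L) for _ in range(N)] for _ in range(K)] for _ in range(L)]

H_perfect = noisy_csi(H, 0.0, random.Random(1))   # eps = 0: perfect CSI
H_half = noisy_csi(H, 0.5, random.Random(1))      # eps = 0.5: partially noisy CSI
H_none = noisy_csi(H, 1.0, random.Random(1))      # eps = 1: independent of H
```

For $\varepsilon = 0$ the precoder sees the true channel, and for $\varepsilon = 1$ it sees a matrix that is independent of the true channel but has the same statistics.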

Numerical Results
We evaluate the GMIs of four systems. The main parameters are listed in Table 2 and we provide a few more details here. The average GMIs for Systems A-C were computed using $S = 256$, $B = 200$, and a blind detector. The coded results of System D instead have $S = 1584$ symbols to fit the block structure determined by the LDPC encoder. For System D we considered both PAT and a blind detector. For all cases, the GMI was computed by averaging over the sub-carriers, i.e., channel coding is assumed to be applied over multiple sub-carriers and OFDM symbols. The MAGIQ and QCM algorithms were both initialized with a time-domain quantized solution of the transmit matched filter (MF).
Figures 3 and 4 show the average GMIs for System A with $b = 2$ and $b = 3$, respectively. In Figure 3, MAGIQ performs four iterations for each OFDM symbol while QCM performs six iterations. Observe that MAGIQ and QCM are best at all SNRs and they are especially good in the interesting regime of high SNR and rates. The gap to the rates over flat fading channels ($L = 1$) is small. SQUID with 64-QAM requires 100-300 iterations for SNR > 15 dB and a modified algorithm with damped updates, otherwise SQUID diverges. In addition, we show the broadcast channel capacity with uniform power allocation and Gaussian signaling as an upper bound for the considered scenario [52,53]. Figure 4 shows that QCM with three iterations operates within ≈0.2-0.4 dB of MAGIQ with five iterations when $b = 3$, which shows that QCM performs almost as well as MAGIQ.
Figure 5 compares achievable rates of QCM, SQUID, and MSM for a smaller system studied in [42]. We use PSK because the MSM algorithm was designed for PSK. The figure shows that MSM outperforms SQUID and QCM at low to intermediate SNR and rates, but QCM is best at high SNR and rates.
This suggests that modifying the cost function (8) to include a safety margin will increase the QCM rate at low to intermediate SNR, and similarly modifying the MSM optimization to more closely resemble QCM will increase the MSM rate at high SNR. We tried to simulate MSM for System A but the algorithm ran into memory limitations (we used 2 AMD EPYC 7282 16-Core processors, 125 GB of system memory, and Matlab with both dual-simplex and interior-point solvers).
Consider next the Winner2 non-line-of-sight (NLOS) C2 urban model [51], which is more realistic than Rayleigh fading. The model parameters are as follows.
• Base station at the origin (x, y) = (0, 0);
• 100 drops of 8 UEs placed on a disk of radius 150 m centered at (x, y) = (0, 200 m); the locations of the UEs are iid with a uniform distribution on the disk;
• 8 × 10 uniform rectangular antenna array at the base station with half-wavelength dipoles at λ/2 spacing;
• 5 MHz bandwidth at center frequency 2.53 GHz;
• No Doppler shift, shadowing, or pathloss.
Figure 6 shows the average GMIs for LP-ZF and MAGIQ. At high SNR, there is a slight decrease in the slope of the MAGIQ GMI as compared to LP-ZF. This suggests that one might need a larger $N$ or $b$. The performance for the Rayleigh fading model is better than for the Winner2 model but otherwise behaves similarly.
Figure 7 shows BERs for the LDPC code with 64-QAM. Each codeword is interleaved over 4 OFDM symbols, all 396 subcarriers, and the 6 bits of each modulation symbol by using bit-interleaved coded modulation (BICM). The interleaver was chosen randomly with a uniform distribution over all permutations of length 9504. The solid curves are based on estimating the channel with the transmitted symbols, i.e., these curves are for a genie-aided channel estimator and give lower bounds on the performance of a blind detector. The dotted curves show the performance of PAT when the fraction of pilots is $S_p/S = 10\%$. The pilots were placed uniformly at random over the four OFDM symbols and 396 subcarriers. A good blind detector algorithm that performs joint channel and data estimation should have BERs between the solid and dotted curves.
The dashed curves in Figure 7 show the SNRs required for the different algorithms based on Figure 3. In particular, the rate 5.33 bpcu requires SNRs of 9 dB, 12.9 dB, and 15.2 dB for LP-ZF, QCM, and SQUID, respectively. SQUID is run with 300 iterations and QCM is run with 6 iterations. Each UE computes its log-likelihoods based on the parameters (20) of the auxiliary channel. The GMI predicts the coded behavior of the system within approximately 1 dB of the code waterfall region, except for SQUID, where the gap is about 2 dB. The gap seems to be caused mainly by the finite blocklength of the LDPC code, since the smaller gap of approximately 1 dB is also observed for additive white Gaussian noise (AWGN) channels. The sizes of the gaps are different, and the reason may be that the slopes of the GMI at rate 5.33 bpcu are different, see Figure 3. Observe that LP-ZF exhibits the steepest slope and SQUID the flattest at $R_a = 5.33$ bpcu; this suggests that SQUID's SNR performance is more sensitive to the blocklength.

Conclusions
We studied downlink precoding for MU-MISO channels where the base station uses OFDM and low-resolution DACs. A QCM algorithm was introduced that is based on the MAGIQ algorithm in [39] (see also [19]) and which performs a coordinate-wise optimization in the time-domain. The performance was analyzed by computing the GMI for two auxiliary channel models: one model for pilot-aided channel estimation and a second model for a blind detector that performs joint channel and data estimation. Simulations for several downlink channels, including a Winner2 NLOS urban scenario, showed that QCM achieves high information rates and is computationally efficient, flexible, and robust. The performance of QCM was compared to MAGIQ and other precoding algorithms including SQUID and MSM. The QCM and MAGIQ algorithms achieve the highest information rates with the lowest complexity measured by the number of multiplications. For example, Figure 4 shows that b = 3 bits of phase modulation operates within 3 dB of LP-ZF. Moreover, BER simulations for a 5G NR LDPC code show that GMI is a good predictor of the coded performance. Finally, for noisy CSI the performance degradation of QCM and SQUID is qualitatively similar to the performance degradation of LP-ZF.