Low Complexity Hybrid Precoding Designs for Multiuser mmWave/THz Ultra Massive MIMO Systems

Millimeter-wave and terahertz technologies have been attracting attention from the wireless research community since they can offer large underutilized bandwidths which can enable the support of ultra-high-speed connections in future wireless communication systems. While the high signal attenuation occurring at these frequencies requires the adoption of very large (or the so-called ultra-massive) antenna arrays, in order to accomplish low complexity and low power consumption, hybrid analog/digital designs must be adopted. In this paper we present a hybrid design algorithm suitable for both mmWave and THz multiuser multiple-input multiple-output (MIMO) systems, which comprises separate computation steps for the digital precoder, analog precoder and multiuser interference mitigation. The design can also incorporate different analog architectures such as phase shifters, switches and inverters, antenna selection and so on. Furthermore, it is also applicable for different structures, namely fully-connected structures, arrays of subarrays (AoSA) and dynamic arrays of subarrays (DAoSA), making it suitable for the support of ultra-massive MIMO (UM-MIMO) in severely hardware constrained THz systems. We will show that, by using the proposed approach, it is possible to achieve good trade-offs between spectral efficiency and simplified implementation, even as the number of users and data streams increases.


I. INTRODUCTION
Over the last few years, significant advances have been made to provide higher-speed connections to users in wireless networks, with several novel technologies being proposed to achieve this objective. However, future generations of communication systems will have to fulfil more demanding requirements that cannot be met by the methods adopted in today's communications systems. This motivates the exploration of other candidate technologies like the millimeter wave (mmWave) and Terahertz (THz) bands. These bands offer large underutilized bandwidths and also allow a simplified implementation of large antenna arrays, which are crucial to combat the severe signal attenuation and path losses that occur at these frequencies [1]-[4]. While these technologies (THz systems in particular) are expected to ease the spectrum limitations of today's systems, they face several issues, such as the reflection and scattering losses through the transmission path, the strong dependency between distance and frequency of channels in the THz band, and the need for controllable time-delay phase shifters, since the phase shift will vary with frequency based on the signal traveling time, which will also affect the system performance. These limitations require not only proper system design, but also the definition of a set of strategies to enable communications [5], [6].
The exploration of the potential of millimeter and submillimeter wavelengths is closely related to the paradigm of using very large arrays of antennas in beamforming architectures. This gives rise to the so-called ultra-massive multiple-input multiple-output (UM-MIMO) systems. Still, to achieve the maximum potential of these systems it is necessary to consider the requirements and the challenges related not only to the channel characteristics but also to the hardware components, especially regarding THz circuits [5], [7], [8]. Considering that high complexity and power usage are pointed out as the major constraints of large-antenna systems, the adoption of hybrid digital-analog architectures becomes crucial to overcome these issues. By adopting this type of design, it is possible to split the signal processing into two separate parts, digital and analog, and obtain a reduction of the overall circuit complexity and power consumption [9]. Adopting a proper problem formulation, the analog design part can then be reduced to a simple projection operation in a flexible precoding or combining algorithm that can cope with different architectures, as we proposed in [10], [11]. Despite the ultra-wide bandwidths available at mmWave and THz bands, and besides considering the problem of distance limitation, MIMO systems should take into account the operation in frequency selective channels [12]. To make the development of hybrid schemes for these systems a reality, it is necessary to handle the fading caused by the multiple propagation paths typical of this type of channel [13]. Therefore, solutions inspired by multi-carrier schemes, such as orthogonal frequency division multiplexing (OFDM), are often adopted to address such problems [14].
Spectral efficiency (SE) of point-to-point transmissions is a major concern in single-user (SU) and multiuser (MU) systems. To achieve good performance, it is necessary to develop algorithms that are specifically tailored to the architecture of these systems. Several hybrid precoding schemes have been proposed in the literature [16]-[18]. The authors of [15] proposed two algorithms for low complexity hybrid precoding and beamforming for MU mmWave systems. Even though they assume only one stream per user, i.e., the number of data streams ($N_s$) is equal to the number of users ($N_u$), it is shown that the algorithms achieve interesting results when compared to the fully-digital solution. The concept of precoding based on adaptive RF-chain-to-antenna connections was introduced in [16] for SU scenarios only, but with promising results. In [17], a nonlinear hybrid transceiver design relying on Tomlinson-Harashima precoding was proposed. Their approach considers fully-connected architectures only but can achieve a performance close to the fully-digital transceiver. A Kalman-based hybrid precoding method was proposed for MU scenarios in [18]. While designed for systems with only one stream per user and based on fully connected structures, the performance of the algorithm is competitive with other existing solutions. A hybrid MMSE-based precoder and combiner design with low complexity was proposed in [19]. The algorithm is designed for MU-MIMO systems in narrowband channels, and it presents lower complexity and better results when compared to Kalman-based precoding. Most of the hybrid solutions for mmWave systems aim to achieve near-optimal performance using Fully-Connected (FC) structures, resorting to phase shifters or switches. However, the difficulty of handling the hardware constraint imposed by the analog phase shifters or by switches in the THz band is an issue that limits the expected performance in terms of SE.
Array-of-SubArrays (AoSA) structures have gained particular attention over the last few years as a more practical alternative to FC structures, especially for the THz band. In contrast to FC structures, in which every RF chain is connected to all antennas via an individual group of phase shifters (prohibitive for higher frequencies), the AoSA approach allows each RF chain to be connected to only a reduced subset of antennas. The adoption of a disjoint structure with fewer phase shifters reduces the system complexity, the power consumption and the signal power loss. Moreover, all the signal processing can be easily carried out at the subarray level by using an adequate number of antennas [6].
Following the AoSA approach, it was shown in [20] that, to balance SE and power consumption in THz communications, adaptation and dynamic control capabilities should be included in the hybrid precoding design. Therefore, Dynamic Arrays-of-SubArrays (DAoSA) architectures could be adopted. The same authors proposed a DAoSA hybrid precoding architecture which can intelligently adjust the connections between RF chains and subarrays through a network of switches. Their results showed that it is possible to achieve a good trade-off between SE and power consumption. Within the context of multiuser downlink scenarios, the authors of [21] studied some precoding schemes considering THz massive MIMO systems for Beyond 5th Generation (B5G) networks. Besides showing the impact of carrier frequency, bandwidth and antenna gains on EE and SE performance, three different precoding schemes were evaluated and compared. It was observed that the hybrid precoding approach with baseband Zero Forcing for multiuser interference mitigation (HYB-ZF) achieved much better results than an ANalog-only BeamSTeering (AN-BST) scheme with no baseband precoder. In fact, this approach came closer to the upper bound defined by the singular value decomposition precoder (SVD-UB). Another relevant conclusion is that the design of precoding algorithms should be adapted to the communication schemes. While considering all the specific constraints may allow the maximization of the system performance, formulating and solving the corresponding optimization problem may not be so simple.
Motivated by the work above, in this paper we develop an algorithm for hybrid precoding design which can accommodate different low-complexity architectures suitable for both mmWave and THz MU MIMO systems. It is based on the idea of accomplishing a near-optimal approximation of the fully digital precoder for any configuration of antennas, RF chains and data streams through the application of the alternating direction method of multipliers (ADMM) [22]. ADMM is a well-known and effective method for solving convex optimization problems but can also be a powerful heuristic for several non-convex problems [22], [23]. To use it effectively within the context of MU MIMO, a proper formulation of the hybrid design problem as a multiple constrained matrix factorization problem is first presented. Using the proposed formulation, an iterative algorithm comprising several reduced complexity steps is obtained. The main contributions of this paper can be summarized as follows:
• We propose a hybrid design algorithm with near fully digital performance, where the digital precoder, analog precoder and multiuser interference mitigation are computed separately through simple closed-form solutions. The hybrid design algorithm is developed independently of a specific channel or antenna configuration, which allows its application in mmWave and THz systems. Whereas our previous work [10] also proposed a hybrid design algorithm for mmWave, it did not address multiuser systems, and in particular the MIMO broadcast channel. Therefore, it does not include any step for inter-user interference mitigation within its design. As we show here, for this multiuser channel the hybrid design method must also deal with the residual inter-user interference, as it can degrade system performance, particularly at high SNRs.
• We explicitly show how the proposed design can be applied to a DAoSA approach, where a reduced number of switches is inserted at each AoSA panel, which allows the connections to the RF chains to be dynamically adjusted.
• Through extensive simulations it is shown that our proposed solution is capable of achieving good trade-offs between spectral efficiency, hardware complexity and power consumption, proving to be a suitable solution for the deployment of ultra-massive MIMO, especially in hardware constrained THz systems.
The paper is organized as follows: Section II presents the adopted system model. The adopted formulation of the hybrid design problem for the MU MIMO scenario and the proposed algorithm are described in detail in Section III, which includes the implementation of the algorithm for different analog architectures. Performance results are then presented in Section IV. Finally, the conclusions are outlined in Section V.
Notation: Matrices and vectors are denoted by uppercase and lowercase boldface letters, respectively. The superscript $(\cdot)^H$ denotes the conjugate transpose of a matrix/vector.

II. SYSTEM MODEL
In this section, we present the system and channel models adopted for the design of the hybrid precoding algorithm. Let us consider the OFDM-based system illustrated in Fig. 1. In this case we have a mmWave/THz hybrid multiuser MIMO system, where a base station (BS) is equipped with $N_{tx}$ antennas and transmits to $N_u$ users equipped with $N_{rx}$ antennas over $F$ carriers, as can be seen in Fig. 1. Since the analog precoder (combiner) is located after (before) the IFFT (FFT) blocks, it is shared between the different subcarriers, as in [25], [26]. In the adopted signal model, $\mathbf{H}_{k,u}$ is the frequency domain channel matrix (assumed to be perfectly known at the transmitter and receiver) between the base station and the $u$th receiver at subcarrier $k$.
Regarding the channel model, it is important to note that even though the mmWave and THz bands share a few commonalities, the THz channel has several peculiarities that distinguish it from the mmWave channel. For example, the very high scattering and diffraction losses in the THz band will typically result in a much sparser channel in the angular domain with fewer multipath components (typically fewer than 10) [21]. Furthermore, the gap between the line of sight (LOS) and non-line of sight (NLOS) components tends to be very large, making the channel often LOS-dominant and NLOS-assisted [26]. An additional aspect is the much larger bandwidth of THz signals, which can suffer performance degradation due to the so-called beam split effect, where the transmission paths squint into different spatial directions depending on the subcarrier frequency [21]. In light of this, in this paper we consider a clustered wideband geometric channel, which is commonly adopted both in mmWave [15] and THz literature [20], [26], [27], [29]. However, it should be noted that the hybrid precoding/combining approach proposed in this paper is independent of a specific MIMO channel. In this case the frequency domain channel matrices $\mathbf{H}_{k,u}$ are characterized in terms of $N_{cl}$ scattering clusters, with each cluster $i$ having its own time delay, and depend on the $k$th subcarrier frequency, the bandwidth $B$, the central frequency $f_c$ and a normalizing factor $\gamma$. By carefully selecting the parameters of the channel model we can make it depict a mmWave or a THz channel. Considering Gaussian signaling, the spectral efficiency achieved by the system for the transmission to MS-$u$ in subcarrier $k$ is given in [29] in terms of $\mathbf{R}_{k,u}$, the covariance matrix of the total inter-user interference plus noise at MS-$u$.

III. PROPOSED HYBRID DESIGN ALGORITHM
In this section, we will introduce the algorithm for the hybrid precoding problem and show how it can be adapted to different architectures. Although we will focus on the precoder design, a similar approach can be adopted for the combiner. However, since our design assumes that inter-user interference suppression is applied at the transmitter, only single-user detection is required at the receiver and therefore the algorithm reduces to the one described in [10].

A. Main Algorithm
Although there are several problem formulations for the hybrid design proposed in the literature, one of the most effective relies on the minimization of the Frobenius norm of the difference between the fully digital precoder and the hybrid precoder [22], [30], [31], [32]. In this paper we follow this matrix approximation-based approach, formulated as problem (5)-(7), in which the analog precoder is restricted to the set of feasible analog precoding matrices, defined according to the adopted RF architecture (it will be formally defined for several different architectures in the next subsection). Matrix $\mathbf{F}_{opt_k}$ denotes the fully digital precoder, which can be designed so as to enforce zero inter-user interference using, for example, the block-diagonalization approach described in [33]. Even if $\mathbf{F}_{opt_k}$ is selected in order to cancel all interference between users, the hybrid design resulting as a solution of (5)-(7) will correspond to an approximation and, as such, residual inter-user interference will remain. To avoid the performance degradation that will result from this, an additional constraint can be added to the problem formulation. To derive a hybrid precoder design algorithm that can cope with the different RF architectures, we can integrate the RF constraint directly into the objective function of the optimization problem. This can be accomplished through the addition of an auxiliary variable, $\mathbf{R}$, combined with the use of the indicator function. The indicator function for a generic set $\mathcal{A}$ returns 0 if $x \in \mathcal{A}$ and $+\infty$ otherwise. A similar approach can be adopted for integrating the other constraints, (11) and (12), also into the objective function. The optimization problem can then be rewritten as (13)-(16).
The augmented Lagrangian function (ALF) for (13)-(16) is given by (18). Based on the ADMM [22], we can apply gradient ascent to the dual problem involving the ALF, which allows us to obtain an iterative precoding algorithm comprising the following sequence of steps. We start with the minimization of the ALF over $\mathbf{F}_{RF}$ for iteration $t+1$, whose solution is available as a closed-form expression. After obtaining the expression for $\mathbf{F}_{RF}$, $\mathbf{F}_{BB}^{(t+1)}$ can be found by following the same methodology, which also leads to a closed-form expression.
The next steps consist of the minimization over $\mathbf{R}$ and $\mathbf{B}_k$. The minimization of (18) with respect to $\mathbf{R}$ and $\mathbf{B}_k$ reduces to two projections: onto the set of feasible analog precoding matrices and onto the set of matrices whose squared Frobenius norm is $N_u N_s$, respectively. While the former projection depends on the adopted analog architecture and will be explained in the next subsection, the second projection is simply computed by rescaling its argument to the required norm. The minimization of (18) with respect to the remaining auxiliary variable, $\mathbf{X}$, also involves a projection. The general solution for this problem is presented in [30]. Reordering the column vectors into the original matrix form results in the final expression, in which $\mathbf{V}$ denotes the matrix containing the right singular vectors corresponding to the nonzero singular values of the singular value decomposition (SVD) of the channel. Therefore, to compute matrix $\mathbf{X}$ one can perform a singular value decomposition of $\mathbf{H}_{k,u}$ and then use it to remove the projection of $\mathbf{A}$ onto the row space of $\mathbf{H}_{k,u}$. Finally, the dual variables $\mathbf{U}$, $\mathbf{W}$ and $\mathbf{Z}$ are updated. Appropriate values for the penalty parameters can be obtained in a heuristic manner by performing numerical simulations. Regarding the initialization and termination of the algorithm, the same approach described in [10] can be adopted. The whole algorithm is summarized in Table I. In this table, $Q$ denotes the maximum number of iterations.

1 The projection operation is the only step specific to the implemented architecture, as will be explained in the next subsection.
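The two architecture-independent projections mentioned above (rescaling to a squared Frobenius norm of $N_u N_s$, and removing the component that lies in the row space of a channel matrix) can be sketched as follows. This is only a minimal illustration under stated assumptions: matrices are plain Python lists of rows, all function names are hypothetical, and a Gram-Schmidt orthonormalization is swapped in for the SVD, which yields the same row-space projector but is not necessarily how the authors implement the step.

```python
import math

def frobenius_sq(M):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(abs(x) ** 2 for row in M for x in row)

def project_power(A, n_users, n_streams):
    """Rescale A so that its squared Frobenius norm equals n_users * n_streams."""
    scale = math.sqrt(n_users * n_streams / frobenius_sq(A))
    return [[scale * x for x in row] for row in A]

def row_space_basis(H, tol=1e-12):
    """Orthonormal basis of the span of the columns of H^H.

    Assumption: Gram-Schmidt is used here instead of the SVD; both span
    the same subspace, so the resulting projector is identical.
    """
    basis = []
    for row in H:
        v = [x.conjugate() for x in row]          # column of H^H
        for b in basis:
            c = sum(bi.conjugate() * vi for bi, vi in zip(b, v))
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        if norm > tol:                            # skip linearly dependent rows
            basis.append([x / norm for x in v])
    return basis

def remove_row_space(A, H):
    """Return X = A - V V^H A, so that H X = 0 (up to rounding)."""
    basis = row_space_basis(H)
    n_rows, n_cols = len(A), len(A[0])
    X = [list(row) for row in A]
    for j in range(n_cols):
        col = [A[i][j] for i in range(n_rows)]
        for b in basis:
            c = sum(bi.conjugate() * ai for bi, ai in zip(b, col))
            for i in range(n_rows):
                X[i][j] -= c * b[i]
    return X
```

After `remove_row_space(A, H)`, multiplying `H` by the result gives a numerically zero matrix, which is the zero inter-user interference condition for the user whose channel is `H`.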

B. Analog RF Precoder/Combiner Structure
The projection required for obtaining matrix $\mathbf{R}$ in step 5 of the precoding algorithm has to be implemented according to the specific analog beamformer [6], [20], [34]-[38]. This makes the proposed scheme very generic, allowing it to be easily adapted to different RF architectures. In the following we will consider a broad range of architectures that can be adopted at the RF precoder for achieving reduced complexity and power consumption implementations. We will consider FC, AoSA and DAoSA structures, as illustrated in Fig. 2. Besides phase shifters, we will also consider several alternative implementations for these structures, as shown in Fig. 3.

1) Unquantized Phase Shifters (UPS)
In the first case we consider the use of infinite resolution phase shifters. For this architecture the RF constraint set comprises the matrices whose elements all have constant modulus, and the corresponding projection can be performed simply by keeping the phase of each element and setting its modulus to the required constant value.

2) Quantized Phase Shifters (QPS)
The second case considers a more realistic scenario, in which phase shifters can be digitally controlled with $N_b$ bits. These devices allow the selection of $2^{N_b}$ different quantized phases, and the RF constraint set is restricted accordingly. The implementation of the projection in line 5 of Table I can be obtained as an element-wise quantization of the phase.

3) Switches and Inverters (SI)
Assuming that $N_b = 1$, each variable phase shifter of the previous architecture can be replaced by a pair of switched lines, including also an inverter. The corresponding constraint set is reduced to two antipodal values per element, and the implementation of the projection simplifies accordingly.

4) Switches (Swi)
Alternatively, each of the variable phase shifters can be replaced by a switch. This simplification results in a network of switches connecting each RF chain to the antennas. The RF constraint set then comprises matrices whose elements are either zero (open switch) or a fixed nonzero value (closed switch), and the projection can be implemented elementwise.
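The element-wise projections for the UPS, QPS, SI and Swi cases can be sketched as nearest-point (Euclidean) projections of a single complex entry. This is a minimal sketch under assumptions: the function names are hypothetical, the target modulus defaults to 1 (the paper's normalization may use a different constant), and the quantized phases are assumed to be uniformly spaced starting at zero.

```python
import cmath
import math

def project_ups(x, modulus=1.0):
    """Unquantized phase shifter: keep the phase of x, force the modulus."""
    return cmath.rect(modulus, cmath.phase(x))

def project_qps(x, n_bits, modulus=1.0):
    """Quantized phase shifter: round the phase of x to the nearest of
    2**n_bits uniformly spaced phases (assumed grid: multiples of the step)."""
    step = 2 * math.pi / (2 ** n_bits)
    k = round(cmath.phase(x) / step)
    return cmath.rect(modulus, k * step)

def project_si(x, modulus=1.0):
    """Switch plus inverter: pick +modulus or -modulus from the sign of Re(x)."""
    return modulus if x.real >= 0 else -modulus

def project_switch(x, modulus=1.0):
    """Switch: pick 0 or modulus, whichever is closer to x.
    |x - modulus| < |x| holds exactly when Re(x) > modulus / 2."""
    return modulus if x.real > modulus / 2 else 0.0

def project_matrix(X, element_projection):
    """Apply an element-wise projection to every entry of a list-of-rows matrix."""
    return [[element_projection(x) for x in row] for row in X]
```

A whole matrix is projected with, for example, `project_matrix(X, lambda x: project_qps(x, 3))` for 3-bit phase shifters.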

5) Antenna Selection (AS)
The simplest scenario that we can consider corresponds to an architecture where each RF chain can only be connected to a single antenna (and vice-versa). The RF constraint set will comprise matrices with only one nonzero element per column and per row, a condition expressed through $\|\cdot\|_0$, which represents the cardinality of a vector.
The projection selects, for each column $j$, the index $t_j$ of the row that holds its nonzero element. The computation of $t_j$ is performed for all columns $j = 1, \ldots, N_{RF}^{tx}$, sorted by descending order in terms of highest real components. It should be noted that during this operation, the same row cannot be repeated.
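The greedy selection described above can be sketched as follows. This is a hedged illustration, not the authors' exact procedure: the function name is hypothetical, the selected entries are set to 1 (any scaling used in the paper is not reproduced), and at least as many rows (antennas) as columns (RF chains) are assumed.

```python
def project_antenna_selection(X):
    """Greedy antenna-selection projection of a list-of-rows complex matrix.

    Columns are visited in descending order of their largest real component;
    each column takes the still-unused row with the largest real part.
    """
    n_rows, n_cols = len(X), len(X[0])
    order = sorted(range(n_cols),
                   key=lambda j: max(X[i][j].real for i in range(n_rows)),
                   reverse=True)
    used_rows = set()
    R = [[0.0] * n_cols for _ in range(n_rows)]
    for j in order:
        free = [i for i in range(n_rows) if i not in used_rows]
        t_j = max(free, key=lambda i: X[i][j].real)  # row index chosen for column j
        used_rows.add(t_j)                            # the same row cannot be repeated
        R[t_j][j] = 1.0
    return R
```

In the example used for testing, both columns would prefer the first row; the column with the larger real component takes it and the other falls back to its next best row.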

6) Array-of-Subarrays (AoSA)
Within the context of UM-MIMO, one of the most appealing architectures for keeping the complexity acceptable relies on the use of AoSA, where each RF chain is only connected to one or more subsets of antennas (subarrays). In this case, the projection can be implemented by setting all the elements in $\mathbf{X}$ as 0 except for the subblocks in each column $j$ that correspond to subarrays connected to the $j$th RF chain, whose elements are set to constant modulus values, assuming UPS in these connections. Clearly, the phase shifters can be replaced by any of the other alternatives presented previously.

7) Dynamic Array-of-Subarrays (DAoSA)
As a variation of the previous AoSA architecture, we also consider an implementation where each subarray can be connected to a maximum of $L_{max}$ RF chains (which can be non-adjacent). In this case, the constraint set comprises matrices where each subarray block contains a maximum of $L_{max}$ columns with constant modulus elements. The rest of the matrix contains only zeros. Starting with an all-zero matrix, the projection can be obtained by selecting, for each subarray $j = 1, \ldots, n_{SA}$, the $L_{max}$ columns of the corresponding subblock with the largest 1-norm and setting the corresponding elements of $\mathbf{R}$ to constant modulus values with the same phase, assuming the use of UPS. Care must be taken to guarantee that at least one subblock will be active in every column of $\mathbf{R}$.
Similarly to the AoSA, the phase shifters can be replaced by any of the other presented alternatives.
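The DAoSA projection with unquantized phase shifters can be sketched as below. This is a minimal sketch under assumptions: the function and parameter names are hypothetical, subarrays are taken as consecutive, equally sized groups of rows, unit modulus is used, and the safeguard that keeps at least one subblock active in every column is deliberately left out.

```python
import cmath

def project_daosa(X, n_sub, l_max):
    """For each subarray (block of rows), keep the l_max columns (RF chains)
    whose sub-block has the largest 1-norm and set their entries to unit
    modulus with the phase of X; every other entry is zero."""
    n_rows, n_cols = len(X), len(X[0])
    n_t = n_rows // n_sub                      # antennas per subarray
    R = [[0j] * n_cols for _ in range(n_rows)]
    for s in range(n_sub):
        rows = range(s * n_t, (s + 1) * n_t)
        # 1-norm of the sub-block that links subarray s to each RF chain
        norms = [sum(abs(X[i][j]) for i in rows) for j in range(n_cols)]
        keep = sorted(range(n_cols), key=lambda j: norms[j], reverse=True)[:l_max]
        for j in keep:
            for i in rows:
                R[i][j] = cmath.rect(1.0, cmath.phase(X[i][j]))
    return R
```

With `l_max` equal to the number of RF chains every subblock is kept, which corresponds to a fully connected structure; small values of `l_max` trade spectral efficiency for fewer active connections.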

8) Double Phase Shifters (DPS)
Another appealing architecture relies on the use of double phase shifters (DPS) since these remove the constant modulus restriction on the elements of RF F , following the idea in [38].
In this case the projection can be implemented elementwise. Similarly to other architectures, DPS can be used not only in the fully connected approach but also in the AoSA and DAoSA cases, replacing the constant modulus setting operation.
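The reason double phase shifters remove the constant modulus restriction can be illustrated with a short sketch: the sum of two unit-modulus terms can reach any complex value of modulus at most 2. The code below assumes exactly that model (two unit-modulus phase shifters whose outputs are added); the function names are hypothetical and the paper's own normalization may differ.

```python
import cmath
import math

def project_dps(x, max_modulus=2.0):
    """Element-wise projection for the assumed DPS model: entries inside the
    disk of radius max_modulus are kept, the others are clipped in modulus."""
    if abs(x) <= max_modulus:
        return x
    return cmath.rect(max_modulus, cmath.phase(x))

def split_into_two_phases(x):
    """Return (theta1, theta2) with exp(j*theta1) + exp(j*theta2) == x,
    valid for |x| <= 2, using 2*cos(delta)*exp(j*phi) = |x|*exp(j*phi)."""
    delta = math.acos(min(1.0, abs(x) / 2.0))  # min() guards against rounding
    phi = cmath.phase(x)
    return phi + delta, phi - delta
```

The second function shows how a projected entry maps onto the two physical phase settings, which is what makes the amplitude of each entry adjustable with phase-only hardware.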

C. Complexity
Table II presents the total complexity order of the proposed method and compares it against other existing low complexity alternatives, namely the AM-Based [15], LASSO-Based Alt-Min (SPS and DPS) [14] and element-by-element (EBE) [20] algorithms. Taking into account that in UM-MIMO $N_{tx}$ will tend to be very large, the algorithms with higher complexity will typically be EBE and the one proposed in this paper, due to the terms that depend on $N_{tx}$.

It is important to note, however, that while the computational complexity of these two design methods may be higher, both algorithms can be applied to simple AoSA/DAoSA architectures and, in particular, the proposed approach directly supports structures with lower practical complexity (and higher energy efficiency), such as those based on switches. Furthermore, in a single-user scenario, the interference cancellation step of the proposed algorithm is unnecessary, and the complexity is reduced accordingly. Regarding the other algorithms, they have similar complexities. However, the AM-Based algorithm is designed for single stream scenarios whereas the others consider multiuser multi-stream scenarios.

IV. NUMERICAL RESULTS
In this section, the performance of the proposed algorithm will be evaluated and compared against other existing alternatives from the literature, considering multiuser MIMO systems. We consider that both the transmitter and receivers are equipped with uniform planar arrays (UPAs). Whenever a LOS component is considered, very weak NLOS paths compared to the LOS path are assumed, which is typical in the THz band [28]. A fully digital combiner was considered at each receiver and all simulation results were computed with 5000 independent Monte Carlo runs.

A. Fully Connected Structures
Besides our proposed precoder, several alternative precoding schemes are compared against the fully digital solution, namely the LASSO-Based Alt-Min, the AM-Based and the ADMM-Based precoding [14], [15], [10]. It can be observed that when $F=1$, only the LASSO-Based Alt-Min with single phase shifters (SPS) and the ADMM-Based precoder from [10] (which does not remove the inter-user interference) lie far from the fully digital precoder. All the others achieve near optimum results and, in fact, can even match them when adopting DPS (proposed approach and LASSO-Based Alt-Min). As explained in Section II, whereas for $F=1$ we have $\mathbf{F}_{BB}$ and $\mathbf{F}_{RF}$ designed for that specific carrier, when $F=64$, $\mathbf{F}_{RF}$ has to be common to all subcarriers. While this reduces the implementation complexity, it also results in a more demanding restriction that makes the approximation of $\mathbf{F}_{opt_k}$ (problem (5)-(7)) worse. Additionally, when this approximation worsens, there can also be increased interference between users. Therefore, it can be observed in the results of Fig. 5 that the gap between the fully digital precoder and all the different hybrid algorithms is substantially wider. Still, the proposed precoder manages to achieve the best results. Given the performances of the different approaches, it is important to recall that the AM-Based precoding algorithm has the lowest performance in wideband but also one of the lowest computational complexities (see Table II of Section III.C). In general, the proposed precoding algorithm is the one that can achieve better results, at the cost of some additional computational complexity. Later on, we will address strategies based on lower complexity architectures that will allow reducing the power consumption associated with its complexity. In Fig. 6 we consider a scenario where the BS employs a larger array with $N_{tx} = 256$ antennas to transmit $N_s = 2$ simultaneous streams to each user, where $N_u = 2$. To better fit this scenario to a typical communication in the THz band we consider the existence of a LOS component, a center frequency of $f_c = 300$ GHz and a bandwidth of $B = 15$ GHz (it is important to note that the beam split effect is also considered in the channel model). The AM precoder from [15] requires a single stream per user and thus was not included in the figure. In this scenario, the LASSO-Based Alt-Min precoding schemes present a substantially lower performance when compared to the proposed approaches. Furthermore, the best performance is achieved with the use of double phase shifters, as expected. Once again, comparing the curves of the proposed precoder against the ADMM-Based precoder from [10], the advantage of adopting an interference cancellation-based design over a simple matrix approximation one is clear.

B. Reduced Complexity Architectures
Next, we will focus on the adoption of different reduced complexity architectures according to the typologies presented in Section III.B. The objective is to evaluate the performance degradation when simpler architectures are adopted. Fig. 7 takes the perspective of simplifying the implementation of the analog precoder while keeping a fully connected structure. We can observe that the versions based on DPS and single UPS achieve the best results, as expected. Considering the more realistic QPS versions, the results can worsen, but it is not necessary to use high resolution phase shifters, since with only 3 bits of resolution the results are already very close to the UPS curve. It can also be observed that the simplest of the architectures, AS, results in the worst performance, but the spectral efficiency improves when the antenna selectors are replaced by a network of switches, and improves further if branches with inverters are also included.
In Fig. 8, we intend to simplify the implementation even further with the adoption of AoSAs. In this case we considered that the maximum number of subarrays that can be connected to an RF chain ($L_{max}$) is only one. The scenario is the same as that of Fig. 7 but considers the existence of a LOS component. In fact, hereafter the existence of a LOS component is assumed for the remaining figures of the paper in order to fit the AoSA/DAoSA results to a more typical scenario in the THz band. We can observe that, for AoSA structures, the degradation of the spectral efficiency is noticeable, since all candidate versions present worse results when compared to the corresponding fully connected design and are all far from the fully digital solution. To reduce the large performance loss due to the adoption of a simple AoSA architecture, we can allow the dynamic connection of more subarrays to each RF chain by adopting a DAoSA structure, as introduced in Section III.B. In Fig. 9 we study the effect of increasing the maximum number of subarrays that can be connected to an RF chain ($L_{max}$) on the performance of these schemes. Each subarray has a size of 32 antennas ($n_t$). Curves assuming the use of SPS as well as of DPS are included. It can be observed that the increase in the number of connections to subarrays, $L_{max}$, has a dramatic effect on the performance, resulting in a huge improvement by simply going from $L_{max}=1$ to $L_{max}=2$. Increasing further to $L_{max}=4$, the results become close to the fully connected case, showing that the DAoSA can be a very appealing approach for balancing the spectral efficiency with hardware complexity and power consumption. Combining the increase of $L_{max}$ with the adoption of DPS can also improve the results, but the gains become less pronounced for $L_{max}>1$. It is important to note that the penalty parameters can be fine-tuned for different system configurations.
One of the objectives of adopting these low complexity solutions is to reduce the overall power consumption. Based on [20], we can calculate the total power consumption of each precoding scheme from the power and number of devices of each type, where $P_{BB}$ is the power of the baseband block (with $N_{BB}=1$), $P_{DAC}$ is the power of a DAC, $P_{OS}$ is the power of an oscillator, $P_M$ is the power of a mixer, $P_{PA}$ is the power of a power amplifier, $P_{PC}$ is the power of a power combiner, $P_{PS}$ is the power of a phase shifter, $P_{SWI}$ is the power of a switch and $P_T$ denotes the transmit power. The $N_x$ variable represents the number of elements of each device used in the precoder configuration.
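As an illustration of how the device counts $N_x$ enter such a power budget, the sketch below counts only the phase shifters of each structure and multiplies by a per-device power. It is a simplified sketch under assumptions, not the power model of [20]: the function names are hypothetical, the counts follow directly from the connectivity of each structure (all antennas per RF chain for FC, one RF chain per antenna for AoSA, up to $L_{max}$ RF chains per subarray for DAoSA, twice as many devices for DPS), and the number of RF chains used in the test is a placeholder.

```python
def phase_shifter_count(n_tx, n_rf, structure, l_max=1, double=False):
    """Number of phase shifters implied by the connectivity of each structure."""
    if structure == "FC":
        count = n_tx * n_rf                 # every RF chain reaches every antenna
    elif structure == "AoSA":
        count = n_tx                        # each antenna reaches a single RF chain
    elif structure == "DAoSA":
        count = min(l_max, n_rf) * n_tx     # upper bound: l_max chains per subarray
    else:
        raise ValueError("structure must be 'FC', 'AoSA' or 'DAoSA'")
    return 2 * count if double else count   # DPS uses two devices per connection

def phase_shifter_power(count, p_ps_watts):
    """Total phase-shifter power in watts for a given per-device power."""
    return count * p_ps_watts
```

The other terms of the budget (DACs, mixers, amplifiers, switches and so on) would be added in the same way, each as a device count times a per-device power.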
For the fully-connected structure with UPS, we assumed PPS=100 mW, which corresponds to a quantized phase shifter with Nb=4 bits [39]. For the remaining phase-shifter based precoder structures we assumed PPS=40 mW, which corresponds to quantized phase shifters with Nb=3 bits, since with only 3 bits of resolution the results are already very close to the UPS curve (see Fig. 7). As can be seen from this table, the use of architectures based on DAoSAs allows us to considerably reduce the amount of power that is consumed at the precoder. In fact, we can reduce the consumed power by up to 55% if we consider a precoder scheme based on DAoSA with DPS and Lmax=4 instead of an FC precoder based on UPS, with only a small performance penalty (Fig. 9). This saving increases to 73% if the DPS structure is replaced by an SPS one. In the particular case of architectures based on quantized phase shifters, we observed that by decreasing the number of quantization bits, it is possible to substantially reduce the power consumption without excessively compromising the performance (as seen in Fig. 7). The conclusion is corroborated by [20] and [39], since the architectures based on low resolution QPS, AoSAs and in [41]).

While we have shown how the proposed approach can deal with several relevant types of analog precoders/combiners, it is important to note that there are other alternative structures that have been recently proposed in the literature. For example, some authors have considered precoding paradigms based on time-delay structures for THz systems [28], [42]. One of the most notable is Delay-Phase Precoding (DPP), which consists in the use of a Time Delay (TD) network between the RF chains and the traditional phase shifters network in order to convert phase-controlled analog precoding into delay-phase controlled analog precoding. The main advantage of this type of precoding is that the time delays in the TD network are carefully designed to generate frequency-dependent beams which are aligned with the spatial directions over the whole bandwidth [42]. While we do not address the adoption of time-delay structures in this paper, it should be possible to derive a projection algorithm that simultaneously takes into account the constraints imposed in both analog-precoding steps: the time-delay network and the frequency-independent phase shifters.
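The benefit of frequency-dependent beams can be illustrated with a small numerical sketch. It is not the architecture of [42]: it assumes an idealized true-time-delay element behind every antenna, a uniform linear array with half-wavelength spacing at the centre frequency, and hypothetical values for the array size, steering angle and relative frequencies. The function names are introduced here for illustration only.

```python
import numpy as np


def array_response(n_ant, theta, f_rel):
    """ULA response at relative frequency f_rel = f / f_c.

    The element spacing is assumed to be half a wavelength at f_c, so the
    phase step between adjacent antennas is pi * f_rel * sin(theta).
    """
    n = np.arange(n_ant)
    return np.exp(1j * np.pi * f_rel * n * np.sin(theta))


def normalized_gain(n_ant, theta, f_rel, use_time_delay):
    """Normalized array gain |w^H a| / N for a beam steered towards theta."""
    # Phase-only weights are fixed at the centre frequency (f_rel = 1), while
    # ideal delays yield a phase that scales linearly with frequency.
    w = array_response(n_ant, theta, f_rel if use_time_delay else 1.0)
    a = array_response(n_ant, theta, f_rel)
    return np.abs(np.vdot(w, a)) / n_ant  # vdot conjugates its first argument


if __name__ == "__main__":
    n_ant, theta = 256, np.pi / 4  # hypothetical array size and user direction
    for f_rel in (0.9, 1.0, 1.1):
        g_ps = normalized_gain(n_ant, theta, f_rel, use_time_delay=False)
        g_td = normalized_gain(n_ant, theta, f_rel, use_time_delay=True)
        print(f"f/fc={f_rel:.1f}  phase-only={g_ps:.3f}  time-delay={g_td:.3f}")
```

With phase-only weights the beam stays aligned only at the centre frequency, whereas the idealized delays keep the gain at its maximum across the band. A practical delay-phase design would use far fewer delay elements than antennas, which this sketch does not model.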

V. CONCLUSION
In this paper, we proposed an iterative algorithm for hybrid precoding design which is suitable for multiuser MIMO systems operating in the mmWave and THz bands. The adopted approach splits the formulated design into a sequence of smaller subproblems with closed-form solutions and can work with a broad range of configurations of antennas, RF chains and data streams. The separability of the design process allows the algorithm to be adapted to different architectures, making it suitable for implementation with low-complexity AoSA and DAoSA structures, which are particularly relevant for the deployment of ultra-massive MIMO in hardware constrained THz systems. It was shown that good trade-offs between spectral efficiency and hardware implementation complexity can in fact be achieved by the proposed algorithm for several different architectures.


I and u denotes the average received power. The digital baseband precoders and combiners are

and Nray propagation paths.

are the complex gains of the LOS component and of the lth ray from cluster i. Index u is the user (Hu is a matrix corresponding to Hk with the Nrx lines of user u removed) which we denote as

In this case, the corresponding elements of R are set as 4 in Table I) are defined using closed-form expressions that encompass several matrix multiplications, sums and an RF

the antenna indices, λ is the signal wavelength and d is the inter-element spacing, which we assume to be d=λ/2. We consider a sparse channel with limited scattering where Nray=4 and Ncl=6. The angles of departure and arrival were selected according to a Gaussian distribution whose means are uniformly distributed in [0, 2π] and whose angular spreads are 10 degrees. In the scenarios where we consider the existence of a LOS component,

First, we evaluate the performance assuming a fully connected structure. Simulation results for a scenario, where a base station with 100 are shown in Fig. 4 for F=1 and Fig. 5 for F=64. The number of RF chains at the transmitter (NRFtx) is equal to Nu Ns

Fig. 4. Spectral efficiency versus SNR achieved by different methods with Nu=4, Ns=1,

Fig. 5. Spectral efficiency versus SNR achieved by different methods with Nu=4, Ns=1,

Fig. 6. Spectral efficiency versus SNR achieved by different methods with Nu=2, Ns=2,

Fig. 7. Spectral efficiency versus SNR achieved by the proposed precoder using different fully-connected architectures for Nu=4, Ns=2,

Fig. 7 considers a scenario in which we have more than one data stream (Ns=2) being sent from the BS to each user (

Fig. 8. Spectral efficiency versus SNR achieved by the proposed precoder using different AoSA architectures with Lmax=1, Nu=4, Ns=2,

Fig. 9. Spectral efficiency versus SNR achieved by the proposed precoder considering an architecture based on DAoSAs and the variation of the maximum number of subarrays that can be connected to an RF chain (Lmax

Regarding the phase shifters, we assume values of PPS=10, 20, 40, 100 mW for 1, 2, 3 and 4 quantization bits. Considering the same configuration scenario as Figures 7 we provide the values of power consumption for different precoder configurations in Table

Fig. 12. Spectral efficiency versus SNR achieved by different methods for a mmWave/THz MIMO-OFDM system with Nu=4, Ns=1,

Regarding the analog precoder and combiner, which are represented by matrices RF procedure to compute the projection of matrix A onto the null-space of

TABLE I. GENERAL ITERATIVE HYBRID DESIGN ALGORITHM.