Precoded Generalized Spatial Modulation for Downlink MIMO Transmissions in Beyond 5G Networks

Abstract: The design of multiple input multiple output (MIMO) schemes capable of achieving both high spectral and energy efficiency constitutes a challenge for next-generation wireless networks. MIMO schemes based on generalized spatial modulations (GSM) have been widely considered as a powerful technique to achieve that purpose. In this paper, a multi-user (MU) GSM MIMO system is proposed, which relies on the transmission of precoded symbols from a base station to multiple receivers. The precoder's design is focused on the removal of the interference between users and allows the application of single-user GSM detection at the receivers, which is accomplished using a low-complexity iterative algorithm. Link level and system level simulations of a cloud radio access network (C-RAN) comprising several radio remote units (RRUs) were run in order to evaluate the performance of the proposed solution. Simulation results show that the proposed GSM MU-MIMO approach can efficiently exploit a large number of antennas deployed at the transmitter. Moreover, it can also provide large gains when compared to conventional MU-MIMO schemes with identical spectral efficiencies. In fact, regarding the simulated C-RAN scenario with perfect channel estimation, system level results showed potential gains of up to 155% and 139% in throughput and coverage, respectively, compared to traditional cellular networks. The introduction of imperfect channel estimation reduces the throughput gain to 125%.

combination of active antennas and, also in the modulated symbols transmitted in the active antennas, GSM can also achieve a greater SE than single antenna communications.
GSM can be considered as a compromise between conventional MIMO and simple radio frequency (RF) transmissions, since only a subset of the available transmission antennas is active for a certain period of time, thus reducing the number of RF chains required. Several detectors have been reported in the literature for single user scenarios. The authors of [6] proposed an ordered block minimum mean-squared error (OB-MMSE) detector that can achieve close to optimal performance with a much lower complexity than other detectors. This detector uses an algorithm that sorts the possible transmit antenna combinations (TACs) and then detects, in sequence, the possible signal vector for each TAC using block minimum mean square error (MMSE) estimation. A termination threshold must be applied in order to reduce the number of tested TACs. Although this detector is able to achieve near-optimal performance, it can incur substantial complexity in large scenarios. A different iterative GSM detector is proposed in [7], which is based on dividing the maximum likelihood detection (MLD) problem into a sequence of simpler steps, such as the minimization of the unrestricted Euclidean distance, the projection of the elements onto the signal constellation and the projection onto the set of valid active antenna combinations. This approach allows a substantial complexity reduction when compared with the optimal MLD while still achieving near-optimal performance.
Despite the wide range of precoding schemes reported in the literature for MIMO systems considering both uplink and downlink scenarios [8][9][10][11][12], there is a significant imbalance between the number of approaches aimed at those scenarios for GSM-based schemes. In fact, there are very few studies that have extended the use of GSM to the MU downlink [13]. Although [3] describes a system for scalable video broadcast communications, the scheme proposed there also considered the use of GSM for multiple users. However, the removal of inter-user interference is performed at the receiver, which demands a large number of antennas at the users. A better suited alternative for dedicated links relies on removing inter-user interference at the transmitter through a precoder. This approach is often applied in conventional MU-MIMO, as presented in [14], where the authors describe a precoder that accomplishes block diagonalization (BD) of the equivalent channel matrix. The proposed BD precoding guarantees zero inter-user interference and can be thought of as a generalization of channel inversion. The approach proposed by the authors in [15] is similar, but their method can provide not only improved bit error rate (BER) and throughput performance, but also additional diversity gain by adopting a partial nulling technique for the generalized block diagonalization (GBD). A few precoded schemes have been introduced for spatial modulations (SM) and GSM since then. A new precoder scheme for the downlink of MU-SM systems was proposed in [16], which exploits the channel state information (CSI) at the base station (BS). Here, a precoding matrix is computed, which allows the MU downlink system to be broken down into several independent single user SM systems.
A precoded scheme designed for multi-user (MU) GSM systems was reported in [17], with the aim of eliminating all inter-user interference while maintaining the antenna selection features of GSM, which means that only some of the antennas are active, while the rest are silenced. Both proposals of [16,17] are limited in terms of spectral efficiency, since the first one was only defined for SM, while the latter was designed specifically for a version of GSM where the M-ary quadrature amplitude modulation (M-QAM) symbols are the same in all active antennas.
CSI is fundamental to enable uplink and downlink transmissions in MIMO systems. However, channel estimation for downlink transmissions in massive MIMO systems operating in frequency division duplexing (FDD) mode is a very complex problem, to the point of being infeasible for practical applications [18,19]. Time division duplexing (TDD) is an interesting alternative that overcomes this problem. In TDD mode, it is possible to exploit channel reciprocity, which allows the base station to estimate the downlink channel from the uplink channel information. In the uplink, orthogonal pilot signals are sent from the users to the base station, which uses them to estimate the CSI of each user equipment (UE). After accomplishing this task, the base station beamforms the downlink data towards the UE. Considering that there is a limited number of orthogonal pilots, which are reused between cells, pilot contamination may appear and become a critical problem for massive MIMO channel estimation [20]. In order to overcome this issue and others, such as the increasing hardware and computational complexity cost due to the large number of antennas required in those schemes, several channel estimation algorithms have been developed over the last few years [21][22][23]. The success of the channel estimation process affects the performance of massive MIMO schemes [24] and, as such, should also be taken into account in the system evaluation.
It is important to highlight that, even though they are not covered in this paper, massive MIMO systems such as the GSM schemes addressed here are prone to several hardware impairments, such as non-linear distortions from the power amplifier, I/Q imbalance, sampling jitter, and finite-resolution quantization in analog-to-digital converters (ADCs) [20]. To reduce the impact of these effects on the overall performance of the system, compensation algorithms can be developed to mitigate the impairments.
Another issue that must be considered with the introduction of 5G and beyond is the extreme densification of the network, which requires an increase in the network capacity [25,26]. Poor cell-edge coverage and throughput are the most limiting factors of the 4G cellular radio access network (RAN). Some research has been dedicated to decreasing inter-cell interference through base station coordination and coordinated beamforming [27,28]. Coordinated multi-point (CoMP) transmission or reception is one of the key techniques in 5G that mitigates inter-cell interference (ICI) from neighboring cells, providing higher spectral efficiency and coverage. CoMP extends the cell coverage area and improves cell edge throughput. Joint processing coordinated multipoint transmission (JP-CoMP) requires the clustering of neighboring cells and cooperative transmission within each cluster. Clustering algorithms can be static or dynamic, centralized or distributed [29,30]. Static clustering relies on a predetermined fixed base station cluster. Each static clustering algorithm utilizes a different strategy to determine an efficient cluster formation. The network then decides on the base station clusters. Dynamic clustering adapts to network changes, and the usual methods are designed around centralized control of the network, which requires extensive information sharing. In our study, we only consider static clustering based on channel state information (CSI). The techniques mentioned above are essential to improve the overall spectral and energy efficiencies and also to increase the throughput and coverage gains when compared to traditional cellular networks [31].
Motivated by the work above, in this paper we provide a study on MU-MIMO systems, where GSM symbols are transmitted simultaneously to multiple users (differences between the proposed approach and a conventional MU-MIMO assumed as reference are shown in Table 1). To increase the SE of the transmission, different modulated symbols are sent on different (virtual) antennas, where high-order M-QAM constellations with sizes reaching M = 1024 symbols are considered. To remove inter-user interference and transform the MU transmission into several independent SU links, a BD precoder is applied at the BS, while a modified and improved version of the low-complexity SU GSM detector presented in [7] is used at the receiver.
The influence of imperfect channel estimation on the performance of this massive MIMO GSM-based system is also analyzed. Link level simulations show that the presented GSM MU-MIMO approach can provide substantial performance gains over conventional MU-MIMO. Additionally, system level simulations show that deployments based on cloud-RAN (C-RAN) comprising several radio remote units (RRUs) can achieve large throughput and coverage gains over traditional cellular networks. The paper is organized as follows: Section 2 presents the model for the MU GSM system, Section 3 presents the transmitter and receiver structure followed by the numerical results obtained in Section 4. Finally, the conclusions are outlined in Section 5.

System Model
Let us consider a downlink MU-MIMO system where a BS transmits simultaneously to N u users. The BS is equipped with N tx antennas and each user has N rx antennas, as illustrated in Figure 1.
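We assume that the signal can be represented as $\mathbf{s} = [\mathbf{s}_0^T, \ldots, \mathbf{s}_{N_u-1}^T]^T$, where $\mathbf{s}_k \in \mathbb{C}^{N_s \times 1}$ contains the information transmitted to user $k$ and $N_s \leq N_{tx}/N_u$.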

Considering that GSM is being used, only $N_a$ positions of $\mathbf{s}_k$ are nonzero. These correspond to active indexes, which carry M-QAM modulated symbols. The signal vector of each user can be written as
$$\mathbf{s}_k = [\ldots, 0, s_k^0, 0, \ldots, 0, s_k^{N_a-1}, 0, \ldots]^T,$$
where $s_k^j \in \mathcal{A}$ ($j = 0, \ldots, N_a - 1$), with $\mathcal{A}$ denoting an M-QAM complex valued constellation set. According to this model, the information will be divided in such a way that part of the data are used to select an active index (AI) combination from a total of $N_{comb} = 2^{\lfloor \log_2 \binom{N_s}{N_a} \rfloor}$ AI combinations (AICs) available per user. The remaining data are mapped onto $N_a$ complex-valued M-QAM symbols. The resulting SE is then
$$\left\lfloor \log_2 \binom{N_s}{N_a} \right\rfloor + N_a \log_2 M$$
bits per channel use (bpcu).
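The bit split between index selection and M-QAM symbols can be checked numerically. The following is a minimal sketch, assuming the SE per user is $\lfloor \log_2 \binom{N_s}{N_a} \rfloor + N_a \log_2 M$ (the function name is hypothetical); under that assumption it reproduces the spectral efficiencies quoted later for the link level configurations.

```python
from math import comb, log2

def gsm_spectral_efficiency(n_s, n_a, m):
    """Return (N_comb, bpcu) for one GSM user with N_s virtual antennas,
    N_a active indexes and an M-QAM constellation (M a power of two)."""
    # floor(log2(C(N_s, N_a))) computed exactly on integers
    index_bits = comb(n_s, n_a).bit_length() - 1
    # bits carried by the N_a modulated symbols
    symbol_bits = n_a * int(log2(m))
    return 2 ** index_bits, index_bits + symbol_bits

for n_s, n_a, m in [(17, 2, 256), (17, 2, 1024), (16, 1, 16), (9, 3, 4), (17, 3, 1024)]:
    n_comb, se = gsm_spectral_efficiency(n_s, n_a, m)
    print(f"N_s={n_s}, N_a={n_a}, M={m}: N_comb={n_comb}, SE={se} bpcu")
```

Only a power-of-two number of index combinations is usable, which is why $N_{comb}$ can be smaller than $\binom{N_s}{N_a}$ (for example, 128 out of 136 for $N_s = 17$, $N_a = 2$).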

Transmitter and Receiver Structure
In this section, the design of the transmitter and receiver structures is addressed. The receiver design is based on the alternating direction method of multipliers (ADMM), which is explained further in Section 3.2.

Transmitter Design
Channel state information at the transmitter (CSIT) will be used to pre-process the symbols through a linear precoder $\mathbf{F} = [\mathbf{F}_0, \ldots, \mathbf{F}_{N_u-1}]$, where $\mathbf{F}_k \in \mathbb{C}^{N_{tx} \times N_s}$. Considering that the transmitted signal propagates through a flat fading channel, the baseband signal received by user $k$ can be written as
$$\mathbf{y}_k = \mathbf{H}_k \mathbf{F}_k \mathbf{s}_k + \mathbf{H}_k \sum_{j=0,\, j \neq k}^{N_u-1} \mathbf{F}_j \mathbf{s}_j + \mathbf{n}_k. \tag{4}$$
In this expression, $\mathbf{H}_k \in \mathbb{C}^{N_{rx} \times N_{tx}}$ corresponds to the channel matrix for the link between the BS and user $k$, and $\mathbf{n}_k \in \mathbb{C}^{N_{rx} \times 1}$ is the noise vector with samples taken according to a zero-mean circularly symmetric Gaussian distribution with covariance $2\sigma^2 \mathbf{I}_{N_{rx}}$. The first term in (4) is related to the desired signal and the second one is the interference caused by the other users' signals. The multiuser interference can be eliminated by using a BD method as proposed in [9]. Following this approach, the equivalent overall channel matrix $\mathbf{H}\mathbf{F}$, with $\mathbf{H} = [\mathbf{H}_0^T, \ldots, \mathbf{H}_{N_u-1}^T]^T$, will become block diagonal. A simple BD precoder without any power loading optimization is assumed in this paper, with each precoder matrix $\mathbf{F}_k$ designed so as to enforce that $\mathbf{H}_i \mathbf{F}_k = \mathbf{0}$ for all $i \neq k$. This particular condition can be satisfied using vectors selected from the null space of matrix $\tilde{\mathbf{H}}_k$, which is defined as
$$\tilde{\mathbf{H}}_k = [\mathbf{H}_0^T, \ldots, \mathbf{H}_{k-1}^T, \mathbf{H}_{k+1}^T, \ldots, \mathbf{H}_{N_u-1}^T]^T.$$
$\tilde{\mathbf{H}}_k$ corresponds to the concatenation of the channel matrices between the BS and all users except user $k$. An orthonormal basis for the null space of $\tilde{\mathbf{H}}_k$ can be found by computing its singular value decomposition (SVD) as
$$\tilde{\mathbf{H}}_k = \tilde{\mathbf{U}}_k \tilde{\boldsymbol{\Lambda}}_k [\tilde{\mathbf{V}}_k^{(1)} \; \tilde{\mathbf{V}}_k^{(0)}]^H,$$
where $\tilde{\mathbf{U}}_k$ is the matrix with the left-singular vectors and $\tilde{\boldsymbol{\Lambda}}_k$ is a rectangular diagonal matrix containing the nonzero singular values. $\tilde{\mathbf{V}}_k^{(1)}$ and $\tilde{\mathbf{V}}_k^{(0)}$ contain the right singular vectors corresponding to the nonzero singular values and the null singular values, respectively. $\mathbf{F}_k$ is obtained from $\tilde{\mathbf{V}}_k^{(0)}$ by selecting its first $N_s$ columns. In this case, the signal arriving at each receiver reduces to
$$\mathbf{y}_k = \hat{\mathbf{H}}_k \mathbf{s}_k + \mathbf{n}_k,$$
where $\hat{\mathbf{H}}_k = \mathbf{H}_k \mathbf{F}_k$ is the equivalent single user channel seen by the receiver.

Appl. Sci. 2020, 10, 6617
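The null-space construction of the BD precoder can be illustrated with a few lines of NumPy. This is only a sketch under stated assumptions: the function name `bd_precoders`, the small dimensions and the i.i.d. Rayleigh channel draws are illustrative choices, and no power loading is applied. Each $\mathbf{F}_k$ is taken from the right singular vectors of the stacked channel of the other users that correspond to null singular values, so that $\mathbf{H}_i \mathbf{F}_k \approx \mathbf{0}$ for $i \neq k$.

```python
import numpy as np

def bd_precoders(H_list, n_s):
    """Return one N_tx x N_s precoder per user, built from the null space
    of the concatenated channels of all the other users."""
    precoders = []
    for k in range(len(H_list)):
        # Stack the channels of every user except k
        H_tilde = np.vstack([H for i, H in enumerate(H_list) if i != k])
        # Full SVD: rows of Vh beyond the rank span the null space (conjugated)
        _, sing, Vh = np.linalg.svd(H_tilde, full_matrices=True)
        rank = int(np.sum(sing > 1e-10 * sing.max()))
        V0 = Vh[rank:].conj().T        # orthonormal basis of null(H_tilde)
        precoders.append(V0[:, :n_s])  # keep the first N_s columns
    return precoders

# Illustrative dimensions (assumed): N_tx - (N_u - 1) * N_rx >= N_s must hold
rng = np.random.default_rng(0)
n_u, n_rx, n_tx, n_s = 3, 2, 12, 4
H_list = [(rng.standard_normal((n_rx, n_tx))
           + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
          for _ in range(n_u)]
F = bd_precoders(H_list, n_s)

# Largest residual inter-user leakage ||H_i F_k|| over all i != k
leak = max(np.linalg.norm(H_list[i] @ F[k])
           for i in range(n_u) for k in range(n_u) if i != k)
print(f"max inter-user leakage: {leak:.2e}")
```

The equivalent single user channel seen by user $k$ is then `H_list[k] @ F[k]`, of size $N_{rx} \times N_s$.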

Receiver Design
Considering the system model and the BD precoder described in the previous sections, each receiver will have to apply simple single user GSM detection. This can be seen as an attempt to solve the MLD problem related to receiver $k$, which is formulated as
$$\min_{\mathbf{s}_k} \; \| \mathbf{y}_k - \hat{\mathbf{H}}_k \mathbf{s}_k \|_2^2$$
$$\text{subject to} \quad \mathrm{supp}(\mathbf{s}_k) \in \mathcal{S}, \tag{9}$$
$$\mathbf{s}_k \in \mathcal{A}_0^{N_s}, \tag{10}$$
where $\mathcal{A}_0 \overset{\mathrm{def}}{=} \mathcal{A} \cup \{0\}$ and $\mathcal{S}$ denotes the set of valid AICs, which has a size of $N_{comb}$. Solving this non-convex problem directly would require excessive or even infeasible computational complexity for moderate to large problem settings. To tackle the problem, we adopt instead a similar approach to the one we applied in [7], which is based on the idea of using ADMM as a heuristic for splitting a complex problem into a sequence of simpler ones (as addressed in [32]). Being a heuristic-based approach, there is no guarantee that the resulting algorithm will converge to the solution of the original MLD problem. While this means that the detector will be suboptimal, it will require a much lower computational cost. Following a similar derivation to the one provided in [7], we can arrive at the iterative detection algorithm shown in Table 2, which can be used in each GSM receiver.

Table 2. Iterative GSM detection algorithm for each user k.
...
10: if $f(\hat{\mathbf{s}}_{candidate}) < f_{best}$ then
11: $\hat{\mathbf{s}}_{k,\bar{I}} \leftarrow \mathbf{0}$, $\hat{\mathbf{s}}_{k,I} \leftarrow \hat{\mathbf{s}}^{I}_{candidate}$.
...
17: Output: $\hat{\mathbf{s}}_k$
In this table, $\mathbf{u}, \mathbf{w} \in \mathbb{C}^{N_{tx}/N_u \times 1}$ are scaled dual variables, and $\rho_x$ and $\rho_z$ are penalty parameters associated with constraints (9) and (10). A careful tuning of these parameters helps the algorithm reach a good performance. In the algorithm, $Q$ is the maximum number of iterations, $\Pi_{\mathcal{D}}(\cdot)$ denotes the projection onto the set $\mathcal{D} = \{\mathbf{s} : \mathrm{supp}(\mathbf{s}) \in \mathcal{S}\}$, and $\Pi_{\mathcal{A}_0^{N_s}}(\cdot)$ is the projection onto $\mathcal{A}_0^{N_s}$. The projection onto $\mathcal{D}$ can be accomplished by keeping the $N_a$ largest magnitude elements whose indices also match a valid antenna combination, whereas $\Pi_{\mathcal{A}_0^{N_s}}(\cdot)$ can be computed as a simple rounding of each component to the closest element in $\mathcal{A}_0$.
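The two projections used by the detector can be sketched as follows. This is an illustrative implementation, not the authors' code: the function names, the QPSK alphabet and the list of valid supports are assumptions. The support projection is implemented here by selecting the valid combination that retains the most energy, which is one way of realizing the rule of keeping the largest magnitude elements that match a valid antenna combination.

```python
import numpy as np
from itertools import combinations

def project_onto_constellation(s, alphabet0):
    """Round every entry of s to the closest element of A_0 = A U {0}."""
    idx = np.argmin(np.abs(s[:, None] - alphabet0[None, :]), axis=1)
    return alphabet0[idx]

def project_onto_supports(s, valid_supports):
    """Keep the entries on the valid support holding the most energy and
    set all the other entries to zero."""
    energy = np.abs(s) ** 2
    best = max(valid_supports, key=lambda sup: energy[list(sup)].sum())
    out = np.zeros_like(s)
    out[list(best)] = s[list(best)]
    return out

# Toy setting (assumed): N_s = 4, N_a = 2, QPSK, N_comb = 4 of the 6 pairs
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
alphabet0 = np.concatenate(([0], qpsk))
valid_supports = list(combinations(range(4), 2))[:4]

s = np.array([0.9 + 0.8j, 0.1 - 0.05j, -0.6 - 0.7j, 0.05j])
s_d = project_onto_supports(s, valid_supports)       # support constraint
s_a = project_onto_constellation(s_d, alphabet0)     # constellation constraint
print(s_d)
print(s_a)
```

Enumerating all valid supports costs $O(N_{comb})$ per projection, so a practical receiver would likely use a cheaper search; the exhaustive form is used here only for clarity.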
Although heuristic-based approaches such as the one adopted in the proposed GSM detector can reach a solution faster, that solution may not be the optimal one. In other words, it is not guaranteed that the algorithm will converge to the optimal solution of the original MLD problem (which is nonconvex). To increase the chances of finding an optimal solution and to improve the performance of the GSM detector, we present several different strategies. The first method is the simplest and consists of restarting the algorithm multiple times using different initializations [32] for the variables $\mathbf{u}^0$, $\mathbf{w}^0$, $\mathbf{x}^0$, $\mathbf{z}^0$ required by the algorithm. Another improvement strategy that we propose relies on checking at the end of the algorithm whether any of the $P$ neighboring candidates results in an improvement of $f(\hat{\mathbf{s}}_{candidate})$. These $P$ neighbors can be selected amongst those with the closest supports using the algorithm presented in Table 3. A last possible refinement method that we consider consists of re-solving the MLD problem with the support set fixed according to the candidate point $\hat{\mathbf{s}}_k$ generated by the main algorithm.

Table 3. Solution refinement algorithm based on a closest neighbor search for user k.
...
13: Output: $\hat{\mathbf{s}}_k$
In this case, the resulting formulation becomes a conventional MIMO detection problem, which can also be approximated by a simple projected MMSE estimate. We refer to this third approach as the MMSE polishing step.
In terms of computational complexity, the BD precoding requires the computation of $N_u$ SVDs, which is the step with the heaviest cost, resulting in a complexity order of $O(N_u^4 N_{rx}^3 + N_u^2 N_{tx}^2 N_{rx})$. This cost is borne by the BS, which typically has higher computational capabilities. More critical is the computational complexity required at the users. Regarding the receiver, the s-update step (line 5 of Table 2) has the highest cost, as it involves an $N_{tx}/N_u \times N_{tx}/N_u$ matrix inversion (although it is only computed at the beginning of the algorithm). Considering a fixed number of iterations, the total complexity order is $O((N_{tx}/N_u)^3)$. For comparison, the complexity order of MLD is …, that of [33] is $O(N_{comb} N_a^3)$ and that of multipath matching pursuit with slicing (sMMP) [34] is … ($T$ is the number of child candidates expanded at each iteration). Therefore, the proposed approach has a similar complexity order to the linear MMSE. Note that the complexity of OB-MMSE does not grow exponentially with the signal constellation size, $M$, as in the case of MLD, but it still depends on $N_{comb} = 2^{\lfloor \log_2 \binom{N_s}{N_a} \rfloor}$, which can restrict its use when a large number of bits are conveyed on antenna indices.

Numerical Results
In this section, we present numerical simulations, both link level and system level. Link performance results, namely, block error rate (BLER), are used as input by the system level simulator. The system is illustrated in Figure 2, where the C-RAN is comprised of 19 radio remote units (RRUs) connected through fiber to a central unit (CU), each RRU with N = 60 active pedestrian users. Each RRU consists of three transmission and reception points (TRP), each one equipped with N tx total = 256 antennas while users have N rx antennas (i.e., each RRU corresponds to a BS according to the system model presented in Section 2). The RRUs array configuration corresponds to cylindrical arrays: 16 × 16 × 3, where the separation between antennas of the array is half wavelength [35].
The system level block diagram can be found in references [36,37]. This simulator is based on the one described in [37]. In the system level simulator, there are general parameters that must be defined, such as the network layout and antenna parameters. The setup considers several of the parameters adopted in the case study presented in Section 7.7 of [38] for the deployment of a massive MIMO based outdoor network. Our system level simulator considers the 3D urban macro (3D-UMa) scenario [36], where the BSs are mounted above the rooftop levels of the surrounding buildings, with an antenna height of 25 m and pedestrian heights of 1.5-2.5 m. Each pedestrian is assigned line-of-sight (LOS) or non-line-of-sight (NLOS) propagation conditions, depending on the distance to the RRU. Correlated large-scale and small-scale parameters are generated to create the channel coefficients, and pathloss and shadowing are applied with $\sigma_{SF} = 7.8$ dB. For the NLOS pathloss, we have $PL = 32.4 + 20\log_{10}(f_c) + 30\log_{10}(d_{3D})$ dB, where $d_{3D}$ is the distance in meters [36].
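As a quick numerical illustration of the NLOS pathloss expression, the sketch below evaluates it at the carrier frequency used in the simulations. It assumes that $f_c$ is expressed in GHz (the text only states the unit of $d_{3D}$), and the function name is hypothetical; shadowing is not included.

```python
import math

def nlos_pathloss_db(fc_ghz, d3d_m):
    """NLOS pathloss in dB: 32.4 + 20 log10(f_c) + 30 log10(d_3D).
    Assumes f_c in GHz and d_3D in meters."""
    return 32.4 + 20 * math.log10(fc_ghz) + 30 * math.log10(d3d_m)

for d in (50, 100, 250):
    print(f"d_3D = {d:3d} m -> PL = {nlos_pathloss_db(3.5, d):.2f} dB")
```

The $30\log_{10}(d_{3D})$ term means that every tenfold increase in distance adds 30 dB of loss.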
Other simulator parameters are: carrier frequency $f_c = 3.5$ GHz, maximum TRP transmit power 46 dBm, receiver noise power spectral density −174 dBm/Hz, cyclic prefix overhead 5%, pilots/TRP = 15 and arrays with uni-polarized antennas. We choose the 5G NR numerology 1 and slot configuration parameters taken from [39]: the bandwidth is $B_t = 20$ MHz with normal CP, where the subcarrier spacing is 30 kHz and 28 OFDM symbols are transmitted in every subframe of 1 ms. Each user feeds back all CSI and the signal-to-interference-plus-noise ratio (SINR) to the TRPs. The static clustering technique partitions the network into sets of three adjacent RRUs, where each user is served by at least one RRU, while the others generate interference. The RRU inter-site distance is 433 m, corresponding to a radius of 250 m.
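Two of the quoted parameters can be cross-checked with simple arithmetic. The sketch below assumes a regular hexagonal layout, for which the inter-site distance equals $\sqrt{3}$ times the cell radius, and computes the thermal noise power over the full bandwidth without any receiver noise figure (none is stated in the text). The function names are hypothetical.

```python
import math

def inter_site_distance_m(radius_m):
    """Inter-site distance of a hexagonal layout (assumed): sqrt(3) * radius."""
    return math.sqrt(3) * radius_m

def noise_power_dbm(density_dbm_hz, bandwidth_hz):
    """Noise power over the bandwidth, ignoring any receiver noise figure."""
    return density_dbm_hz + 10 * math.log10(bandwidth_hz)

isd = inter_site_distance_m(250.0)
noise = noise_power_dbm(-174.0, 20e6)
print(f"ISD = {isd:.1f} m, noise power = {noise:.2f} dBm")
```

Under these assumptions, the 250 m radius gives the stated 433 m inter-site distance, and the noise floor over 20 MHz is close to −101 dBm.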

Link Level Simulations
The first simulation results had the objective of evaluating the behavior of the iterative GSM receiver and of the overall proposed GSM MU-MIMO transmitter/receiver scheme. Figures 3 and 4 present the results of BER performance versus the signal-to-noise ratio (SNR) in dB of the proposed GSM MU-MIMO system with N tx = 255, N rx = 10, N u = 15, N s = 17 and N a = 2, which corresponds to a spectral efficiency of 23 bpcu/user for 256-QAM and 27 bpcu/user for 1024-QAM.
The expression $n_1 \times n_2$, mentioned in the legend of both figures, denotes that the receiver algorithm was run with $n_1$ restarts and $n_2$ iterations. The type of polishing applied as well as the number of neighbors is also shown. Besides the expected improvement when using more iterations, it can be observed that increasing the number of algorithm restarts leads to a better system performance. When considering the 1 × 500 and 10 × 50 cases, which have the same total number of iterations, it is clear that the best results are achieved by the case with more restarts (10 × 50). Comparing the scenarios with and without MMSE polishing, one can observe that those where polishing is applied perform better. We also studied the impact of changing the number of neighbors on the performance of the algorithm and concluded that the greater the number of neighbors, the better the performance (see the cases where P = 1, 4, 9 and 19). Moreover, the combination of the three proposed improvement strategies for the ADMM receiver leads to a better performance than any of the individual approaches. Globally, the proposed ADMM algorithm tends to lead to better results than the well-known OB-MMSE receiver [6], which we included as a benchmark.
Our next goal is to provide a comparison between a conventional BD precoded MU-MIMO [9] and the proposed GSM MU-MIMO. Figure 5 shows the results for two different configurations. The first case concerns a comparison between the precoded GSM MU-MIMO with $N_{tx} = 160$, $N_{rx} = 6$, $N_u = 10$, $N_s = 16$, $N_a = 1$, 16-QAM and the conventional BD precoded MU-MIMO with $N_{tx} = 60$, $N_{rx} = 6$, $N_u = 10$, $N_s = 1$ and 256-QAM, both with a spectral efficiency of 8 bpcu/user. In the second case, we present a comparison between the precoder based on GSM MU-MIMO with $N_{tx} = 90$, $N_{rx} = 8$, $N_u = 10$, $N_s = 9$, $N_a = 3$, quadrature phase-shift keying (QPSK) and the precoder based on conventional MU-MIMO with $N_{tx} = 80$, $N_{rx} = 8$, $N_u = 10$, $N_s = 3$ and 16-QAM, both with a spectral efficiency of 12 bpcu/user. Regarding the GSM MU-MIMO scheme, results with the proposed receiver as well as with alternative ones are included, namely a linear MMSE and the sMMP (from [34]). In the case of sMMP, a lower number of child nodes (T = 3) was adopted for 12 bpcu due to the very high computational complexity when operating with higher values in this scenario.
In the results, it is clear that the proposed ADMM receiver achieves the best results when compared against sMMP and MMSE. In the case of MMSE, it either cannot correctly detect the information (8 bpcu) or it has a high irreducible error rate (12 bpcu). This is due to the fact that, from a receiver point of view, both scenarios correspond to underdetermined systems ($N_{rx} < N_s$), which a simple MMSE has great difficulty coping with. Through this figure, it can also be seen that the GSM MU-MIMO precoder with the proposed ADMM receiver achieves a better performance when compared to the conventional MU-MIMO precoder (which also uses the same receiver). Focusing on the behavior of the curves at a BER of $10^{-4}$ in the 8 bpcu/user scenario, the GSM MU-MIMO shows a gain of about 10 dB over the conventional MU-MIMO. Moving on to the 12 bpcu/user scenario and maintaining the BER at $10^{-4}$, the GSM MU-MIMO presents a gain of about 5 dB over the conventional MU-MIMO. These results suggest that GSM MU-MIMO can be a potential alternative to increase the SE of the system when compared with the adoption of higher-level modulations in conventional MU-MIMO.
A second set of simulations was performed in order to analyze the block error rate (BLER) performance versus the energy per symbol to noise power spectral density ($E_s/N_0$) in dB of the proposed GSM MU-MIMO system. These results are required for the system level evaluation in the next subsection. Both perfect channel estimation and imperfect channel estimation curves are presented. For the imperfect channel estimation results we adopted the same model as in [40]. In our simulations, a minimum of 25,000 blocks were transmitted for computing each BLER result. In Figures 6 and 7, we have $N_{sc} = 256$ subcarriers, $N_{tx} = 17$ antennas/user, $N_{rx} = 16$ antennas/user and $N_u = 15$ users. The numbers of active antennas are $N_a = 2$ and $N_a = 3$, respectively. The case $N_a = 3$ and 1024-QAM corresponds to a spectral efficiency of 39 bpcu/user. The peak bit rate per user achieved assuming 5G NR numerology 1 is 279.552 Mbps. This means that 1 bpcu/user is equivalent to a bit rate of 7.168 Mbps. Doubling the number of subcarriers from $N_{sc} = 256$ to $N_{sc} = 512$ doubles the number of channel uses per OFDM symbol, so that 1 bpcu/user becomes equivalent to a bit rate of 14.336 Mbps. In both figures, the BLER of GSM MU-MIMO is presented versus $E_s/N_0$ in dB for five uniform M-QAM constellations, namely $M \in \{4, 16, 64, 256, 1024\}$. As expected, independently of $N_a$, higher values of $M$ require higher values of $E_s/N_0$ (dB) to reach the reference BLER = $10^{-1}$. In Figure 6, 1024-QAM with perfect estimation requires an additional 24 dB of $E_s/N_0$ compared to 4-QAM (QPSK).
With imperfect channel estimation there is an additional 15 dB penalty to reach BLER = 10 −1 in the detection of 1024-QAM (it has a higher sensitivity to channel estimation errors).
In Figure 7, for $N_a = 3$, the higher sensitivity of 1024-QAM is clear, as one can notice the emergence of a BLER floor with imperfect estimation. The other modulations only reveal small or negligible degradation.
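The bit rates quoted for the $N_a = 3$, 1024-QAM case follow directly from the numerology. The sketch below assumes one channel use per subcarrier and per OFDM symbol, with 28 OFDM symbols per 1 ms subframe as stated for numerology 1, and ignores cyclic prefix and pilot overheads; the function name is hypothetical.

```python
def bit_rate_mbps(bpcu, n_sc=256, symbols_per_ms=28):
    """Bit rate in Mbps for a given bpcu/user, assuming one channel use
    per subcarrier and per OFDM symbol (no overheads removed)."""
    channel_uses_per_second = n_sc * symbols_per_ms * 1000
    return bpcu * channel_uses_per_second / 1e6

print(bit_rate_mbps(1))             # rate equivalent to 1 bpcu/user, 256 subcarriers
print(bit_rate_mbps(39))            # peak rate for N_a = 3 and 1024-QAM
print(bit_rate_mbps(1, n_sc=512))   # rate equivalent to 1 bpcu/user, 512 subcarriers
```

With 256 subcarriers there are 7.168 million channel uses per second, which is why each bpcu/user maps to 7.168 Mbps.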

System Level Simulations
Using the BLER results described previously, several system level simulations were performed. The signal-to-noise ratio used in the system level simulations is obtained, in dB, as SNR = E_s/N_0 + 10 log10(R_s/B), where R_s is the total transmitted symbol rate per antenna and user, B is the total bandwidth (we considered 20 MHz), and E_s/N_0 is the ratio of symbol energy to noise spectral density in dB. The values of E_s/N_0 are obtained from the link level BLER results.
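The conversion above can be written as a small helper. This is a minimal sketch: the function name `snr_db` and the example symbol rates are hypothetical, and only the 20 MHz bandwidth is taken from the text.

```python
from math import log10


def snr_db(es_n0_db: float, symbol_rate_hz: float, bandwidth_hz: float = 20e6) -> float:
    """SNR in dB from E_s/N_0 in dB: SNR = E_s/N_0 + 10*log10(R_s / B)."""
    if symbol_rate_hz <= 0 or bandwidth_hz <= 0:
        raise ValueError("symbol rate and bandwidth must be positive")
    return es_n0_db + 10.0 * log10(symbol_rate_hz / bandwidth_hz)


if __name__ == "__main__":
    # Hypothetical inputs: E_s/N_0 = 12 dB read from a BLER curve at the
    # 10^-1 operating point, and two illustrative symbol rates.
    for rs in (20e6, 10e6):
        print(f"R_s = {rs / 1e6:.0f} Msym/s -> SNR = {snr_db(12.0, rs):.2f} dB")
```

When R_s equals B the correction term vanishes and the SNR equals E_s/N_0; halving R_s lowers the SNR by about 3 dB.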
In Figure 8, we have chosen N_a = 2 with perfect estimation and computed the SNR values corresponding to BLER = 10^-1 so as to obtain the coverage results versus the percentage of transmitted carrier power. Based on the parameters N_u = 15, N_tx = 17 transmit antennas/user and N_rx = 16 receive antennas/user, there are a total of 255 active antennas at each TRP. The coverage of each of the five M-QAM constellations and the arithmetic average of the coverage of all constellations (labelled as AllQAM) are presented for two different clusterings. In the present cellular topology, RRUs correspond to base stations, and each user is served by one RRU while the other RRUs generate interference when transmitting towards their own users. The label 1C means that the cluster contains one RRU. According to the BLER results of Figure 6, the 1024-QAM constellation is expected to have the minimum coverage due to its more demanding signal-to-noise ratio, while 4-QAM has the maximum coverage for 100% of the transmitted carrier power. Only users close to RRUs are able to correctly decode 1024-QAM symbols, whereas 4-QAM symbols are decoded everywhere. Figure 8 shows that the average coverage of all constellations reaches 71.5% of the area only at 100% of the transmitted carrier power. The remaining coverage curves correspond to a clustering where the network is partitioned into sets of three adjacent RRUs and each user is served by three RRUs (labelled as 3C). The coverage clearly improves for all constellations, which is due to a much lower interference between RRUs. The average coverage of all constellations for 100% of carrier power is now 99.6%, which corresponds to a coverage gain of 139%.

Figure 10 presents the throughput averaged over all uniformly distributed users for the C-RAN scenario where three RRUs (3C) transmit to each user.
The parameters of the previous figures are kept the same, namely N_tx = 17 antennas/user, N_rx = 16 antennas/user and N_sc = 256, and we vary the number of users N_u from 1 up to 15, considering 100% of transmitted carrier power. We consider that the channel estimation is perfect. We can confirm that the BD MU precoding used at the RRUs and the ADMM receivers are operating as expected, because every throughput curve is a straight line whose slope depends on the constellation but is independent of N_u. The increase in throughput depends on the spectral efficiency. We present two sets of results. For N_a = 2, the minimum is 11 bpcu (4-QAM) and the maximum is 27 bpcu (1024-QAM). The second set of performance curves has N_a = 3, starting from 15 bpcu (4-QAM) up to a maximum of 39 bpcu (1024-QAM). We observe the same throughput results for 15 bpcu with 16-QAM and N_a = 2 or 4-QAM and N_a = 3. The throughput results are almost the same between the average of all constellations with N_a = 2 (19 bpcu) or N_a = 3 (27 bpcu) and the 64-QAM constellation having the same spectral efficiencies.

Figure 11 considers the same parameters as Figure 10, but with imperfect instead of perfect channel estimation. Some performance degradation due to imperfect channel estimation can be observed. The throughput results are no longer the same between the average of all constellations with N_a = 2 (19 bpcu) or N_a = 3 (27 bpcu) and the 64-QAM constellation having the same spectral efficiencies. Indeed, the simulation results indicate that the throughput of the average of all constellations is lower than that of the 16-QAM constellation with N_a = 2 (15 bpcu) or N_a = 3 (21 bpcu). For both numbers of active antennas, the throughput for 1024-QAM becomes the lowest instead of the highest, and for N_a = 3 it is zero (a BLER of 10^-1 is not attained, as observed previously). There is an obvious decrease in the simulated throughput compared to the results expected from the constellation bpcu.
The ratio of the throughput results for the average of all constellations with N_a = 3 (27 bpcu) to those with N_a = 2 (19 bpcu) is 2062.5/1546.5 = 1.33, lower than the expected ratio 27/19 = 1.42. The comparison between Figures 10 and 11 indicates that the throughput reduction due to imperfect channel estimation is 1 − 1546.5/2031.0 = 0.24 for N_a = 2 and 1 − 2062.5/2875.8 = 0.28 for N_a = 3. Therefore, the throughput reduction due to imperfect estimation increases with the number of GSM active antennas, which was expected based on the BLER results of Figures 6 and 7.

Figure 12 presents the cumulative distribution function (CDF) of the user throughput for an RRU with three TRPs, each TRP with N_tx = 255 active antennas, serving 60 users, each with N_rx = 16 antennas. The CDF corresponds to the case of 100% of carrier transmission power. We consider only the C-RAN scenario with clusters of three RRUs (3C), with curves for both perfect and imperfect channel estimation. As expected, only 1024-QAM shows an obvious difference in the CDF between imperfect and perfect estimation. For the other constellations the CDF results are almost the same, which is in agreement with the BLER results of Figure 6. The received throughput exceeds 2.5 Gbps, 3.5 Gbps, 4.5 Gbps, 5.5 Gbps, and 6.5 Gbps for 10% of the users with 4-QAM, 16-QAM, 64-QAM, 256-QAM, and 1024-QAM (perfect estimation), respectively. For 50% of the users, the received throughput corresponds to the performance results presented in Figure 9. Fewer than 10% of users receive a throughput lower than 100 Mbps, with the exception of 1024-QAM users with imperfect estimation.
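The throughput ratios and reductions quoted above for Figures 10 and 11, together with the coverage gain reported for Figure 8, can be re-derived from the stated values. The following is only an arithmetic check; the variable and function names are illustrative, and the throughput values are used in whatever unit the figures employ.

```python
# Average throughput over all constellations, as quoted in the text,
# indexed by the number of active antennas N_a.
THROUGHPUT_PERFECT = {2: 2031.0, 3: 2875.8}
THROUGHPUT_IMPERFECT = {2: 1546.5, 3: 2062.5}
# Average spectral efficiency over all constellations (bpcu).
AVERAGE_BPCU = {2: 19, 3: 27}


def relative_reduction(perfect: float, imperfect: float) -> float:
    """Fraction of throughput lost when moving from perfect to imperfect estimation."""
    return 1.0 - imperfect / perfect


measured_ratio = THROUGHPUT_IMPERFECT[3] / THROUGHPUT_IMPERFECT[2]
expected_ratio = AVERAGE_BPCU[3] / AVERAGE_BPCU[2]
reductions = {
    n_a: relative_reduction(THROUGHPUT_PERFECT[n_a], THROUGHPUT_IMPERFECT[n_a])
    for n_a in (2, 3)
}
# Average coverage at 100% carrier power: 3C clusters versus 1C clusters.
coverage_gain = 99.6 / 71.5

if __name__ == "__main__":
    print(f"measured N_a=3 / N_a=2 throughput ratio: {measured_ratio:.2f}")
    print(f"expected ratio from bpcu:                {expected_ratio:.2f}")
    for n_a, loss in reductions.items():
        print(f"throughput reduction for N_a = {n_a}:       {loss:.2f}")
    print(f"coverage gain (3C over 1C):              {coverage_gain:.2f}")
```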
Table 4 summarizes the average throughput per user with perfect and imperfect channel estimation for C-RAN with clusters 1C and 3C and the corresponding throughput gain. The maximum throughput gain is 1.55 and the minimum is 1.25.

Conclusions
In this paper, a novel MIMO system where GSM symbols are transmitted simultaneously to multiple users has been described. By combining large antenna settings at the BS with high-order M-QAM constellations, the proposed approach is capable of improving both the spectral efficiency and the energy efficiency. A precoder is applied at the BS to completely remove inter-user interference, while a reduced-complexity iterative SU GSM detector is implemented at each receiver. Simulation results show that the proposed approach can achieve large gains compared to conventional MU-MIMO systems with identical SE. In fact, system level results based on a C-RAN scenario with multiple RRUs showed potential gains of up to 155% in throughput and 139% in coverage when compared to traditional cellular networks. The introduction of imperfect channel estimation reduces the throughput gain to 125%. Future work will include a thorough evaluation of the impact of several hardware impairments (such as phase noise, non-linear distortion, and I/Q imbalance) and of robust mitigation algorithms.