Complex-Valued Phase Transmittance RBF Neural Networks for Massive MIMO-OFDM Receivers

Multi-input multi-output (MIMO) transmission schemes have become the techniques of choice for increasing spectral efficiency in bandwidth-congested areas. However, the design of cost-effective receivers for MIMO channels remains a challenging task. The maximum likelihood detector can achieve excellent performance—usually, the best performance—but its computational complexity is a limiting factor in practical implementation. In the present work, a novel MIMO scheme using a practically feasible decoding algorithm based on the phase transmittance radial basis function (PTRBF) neural network is proposed. For some practical scenarios, the proposed scheme achieves improved receiver performance with lower computational complexity relative to the maximum likelihood decoding, thus substantially increasing the applicability of the algorithm. Simulation results are presented for MIMO-OFDM under 5G wireless Rayleigh channels so that a fair performance comparison with other reference techniques can be established.


Introduction
In recent years, with the increasing demand for the real-time processing of big data, the Internet of Things (IoT), and 4K video streaming, technologies to increase area throughput [1] in base station (BS) coverage and hotspot tiers [2] have become increasingly important. In general, the system throughput can be improved by three independent factors: the number of BSs, bandwidth, and spectral efficiency. While the number of base stations is a complicated variable to handle, there are substantial bandwidths in the millimeter wavelength (mmWave) bands that could be employed for BS hotspot tiers. On the other hand, as objects and human bodies easily block mmWaves, increasing the spectral efficiency (SE) of BS coverage tiers arises as a potential solution for wide-area coverage. In order to increase SE, advanced techniques are necessary to use the available BSs and bandwidth more efficiently. In view of this, both BSs and user equipment (UE) currently operate with multiple antennas and orthogonal frequency-division multiplexing (OFDM) [3][4][5] to increase spectral efficiency.
Multicarrier modulation schemes, such as OFDM, have been widely employed in digital communications systems due to their low susceptibility to intersymbol interference (ISI) [6][7][8][9]. OFDM divides the channel bandwidth into K orthogonal subcarriers [10]. The serial stream at a high data rate applied to the OFDM input is first converted to multiple parallel low transmission rate sub-streams. Each of the K parallel sub-streams modulates one of the K subcarriers. In this way, the OFDM symbol duration is K times longer than the symbol duration of the equivalent single carrier system, thus avoiding ISI [11]. Another

Background
The main goal in multi-antenna systems is to increase the channel capacity with $M_T$ transmit and $M_R$ receive antennas by a factor of $\min(M_T, M_R)$ without using additional transmit power or spectral bandwidth [43]. Considering the MIMO digital communication system $\mathbf{r}(k) = \mathbf{H}^T \mathbf{x}(k) + \boldsymbol{\eta}(k)$, with transmitted signal $\mathbf{x}(k) \in \mathbb{C}^{M_T}$, received signal $\mathbf{r}(k) \in \mathbb{C}^{M_R}$, and additive white Gaussian noise (AWGN) vector $\boldsymbol{\eta}(k) \in \mathbb{C}^{M_R}$, the channel capacity of $\mathbf{H} \in \mathbb{C}^{M_T \times M_R}$ is expressed as [44]
$$C = \log_2 \det\left(\mathbf{I}_{M_R} + \frac{E_s}{M_T E_0}\, \mathbf{H}^T \mathbf{R}_{xx} \left(\mathbf{H}^T\right)^H\right), \quad (1)$$
in which $\mathbf{I}_{M_R}$ is an $M_R \times M_R$ identity matrix, $[\cdot]^T$ is the transpose operator, $[\cdot]^H$ is the conjugate transpose operator, $E_s$ is the total transmitted signal power, $E_0$ is the AWGN power, $\mathbf{R}_{xx} = E\{\mathbf{x}(k)\mathbf{x}^H(k)\}$ is the correlation matrix of $\mathbf{x}(k)$, and $E\{\cdot\}$ is the expectation operator. However, if no channel state information (CSI) is available at the transmitter, we can assume that the channel components are equally probable. In this case, we consider that power is equally divided among the transmitting antennas, which implies $\mathbf{R}_{xx} = \mathbf{I}_{M_T}$. The capacity in such a case is then given by [11,45]
$$C = \log_2 \det\left(\mathbf{I}_{M_R} + \frac{E_s}{M_T E_0}\, \mathbf{H}^T \left(\mathbf{H}^T\right)^H\right). \quad (2)$$
Note that Equation (2) can be outperformed if the channel information is available at the transmitter (leading to a coding gain). However, Equation (2) is the maximum diversity capacity without channel knowledge at the transmitter. Furthermore, if $M_T = M_R = 1$, Equation (2) represents the Shannon capacity for SISO systems [11].
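As a rough numerical illustration of the equal-power capacity discussed above, the sketch below evaluates $\log_2 \det(\mathbf{I} + \frac{E_s}{M_T E_0}\mathbf{H}^T(\mathbf{H}^T)^H)$ for small channel matrices. The function names, the list-of-lists storage of $\mathbf{H}$ (as $M_T \times M_R$), and the exact form of the expression are assumptions made for this example, not taken from the original derivation.

```python
import math

def det(m):
    """Determinant by Laplace expansion; adequate for the small matrices used here."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def capacity_equal_power(H, es_over_e0):
    """Capacity in bit/s/Hz for R_xx = I, with H stored as M_T x M_R (r = H^T x)."""
    m_t, m_r = len(H), len(H[0])
    # Gram matrix G = H^T (H^T)^H, of size M_R x M_R
    G = [[sum(H[m][i] * complex(H[m][j]).conjugate() for m in range(m_t))
          for j in range(m_r)] for i in range(m_r)]
    # M = I + (E_s / (M_T E_0)) G
    M = [[(1 if i == j else 0) + es_over_e0 / m_t * G[i][j]
          for j in range(m_r)] for i in range(m_r)]
    return math.log2(det(M).real)

siso = capacity_equal_power([[1]], 15)             # log2(1 + 15)
mimo = capacity_equal_power([[1, 0], [0, 1]], 6)   # two parallel unit-gain paths
```

For a SISO channel with unit gain and $E_s/E_0 = 15$, this reduces to $\log_2(16) = 4$ bit/s/Hz, in line with the Shannon special case mentioned in the text.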
In order to increase capacity, the concepts of diversity [46], coding [47], and array [48] gains play key roles in MIMO systems. The array gain is the average increase in the signal-to-noise ratio (SNR) at the receiver that arises from the coherent combining effect of multiple antennas at the transmitter, receiver, or both. Multiple antenna systems require perfect channel knowledge at the transmitter, receiver, or both to achieve this array gain [48]. On the other hand, diversity gain is obtained by the provision of replicas of the transmitted signal at the receiver [46]. Diversity gain techniques are used to mitigate degradations in the error performance due to wireless fading channels (e.g., due to multipath). They work because the probability that statistically independent fading channels experience deep fading simultaneously is very low. Among the various ways of obtaining diversity gain, space diversity requires sufficiently separated antennas in the array (by more than 10λ on base stations and 2λ to 5λ on mobile devices [44]) to guarantee independent wireless channels. In contrast, coding gain is usually provided by temporal channel coding, e.g., convolutional and block codes [11].
Space-time coding is a digital communication technique used to transmit multiple copies of a data stream via multiple antennas to compensate for fading and AWGN. At the receiver side, these multiple copies of the signal are received by one or more antennas, improving the communication reliability. Depending on the encoder algorithm at the transmitter, we can have different space-time codes. Space-time trellis codes (STTCs) combine modulation and trellis coding to transmit signals over a MIMO channel. Although STTCs provide both coding gain and diversity gain, their computational complexity is higher than that of other space-time codes, mainly in the receiver, where a Viterbi decoder is necessary [49]. Space-time block codes (STBCs) combine multiple symbols from a digital modulation, creating a block of symbols. The components of this block (i.e., matrix of symbols) are indexed by the transmitting antenna and the transmitting time. At the receiver, STBC decoding is performed with linear processing. Another technique is space-time labeling diversity (STLD), a variation of STBC that takes two bit-streams and outputs two pairs of symbols. The two symbols in each pair are transmitted by two transmit antennas in two time slots, which results in full diversity and half rate [50]. In addition, STLD only works with a limited number of transmit antennas.

Space-Time Block Coding and OFDM
MIMO systems are mainly designed for narrowband or flat channels. Applying MIMO systems in the wideband frequency selective channel implies a constant penalty factor in the coding gain compared with that in flat-frequency channels. Furthermore, at high SNRs, an irreducible error rate floor is inevitable [51]. This irreducible error rate floor is due to the existence of multipath delay spread, and it persists even if we increase the number of antennas. Since the ISI is the root cause of the error floor, in principle, it can be mitigated by resorting to adaptive equalization, but this can be too complex to implement in such an environment. Another option that is widely used is to resort to OFDM, which naturally converts a frequency-selective fading channel into a frequency-nonselective fading channel. The subcarriers (i.e., tones) in an OFDM symbol are essentially narrowband signals. Since these tones fit perfectly as vehicles for space-time codes, OFDM is an enabler for this efficient coding technique [11]. Figure 1 shows the coding scheme for a generic coding matrix X[k] ∈ C M T ×P , where P is the number of time samples for the transmission of one block of coded symbols and k = 1, 2, · · · , K is the carrier index of the kth MIMO-STBC encoded symbol matrix X[k] along the OFDM symbol. R[k] ∈ C M R ×P is the matrix of received symbols,ŝ ∈ C M S is the decoded vector, and M S is the number of modulated symbols in a MIMO-STBC matrix.
The transmitting space-time block coder (STBC) encodes the data symbol vector s[k] ∈ C M S using the code matrix to construct the transmitting matrix X[k] of length K. The streams X m T ,p [k] are fed to the IFFT modulator of each m T transmitting antenna, at each p period of time relative to the OFDM symbol sequence. In this manner, the information is transmitted in X[k] blocks of M T antennas and P OFDM symbols in each k-th carrier. To illustrate this scheme, Figure 2 shows an example for a two-transmitting antenna system using Alamouti coding. Consequently, the channel is given by H ∈ C M T ×M R ×P×K . It should be emphasized that in the simulation in Section 4 of this work, the channel is not assumed to be static over the entire MIMO-STBC block since it spreads over time in P-consecutive OFDM symbols. This is particularly necessary for the proposed work, as the receiver will fit and adapt to the characteristics and variations of the channel over time. This is also necessary for a massive number of broadcast antennas due to the length of the long block coding P that transmits over time in consecutive OFDM symbols [11].  As in OFDM systems, MIMO-OFDM also requires the channel state information to decode the received symbols. One of the most popular and widely used approaches to MIMO channel estimation is to employ pilot signals (also referred to as training sequences) and then estimate the channel based on the received data and the knowledge of the training sequence, as detailed in Figure 3. Based on pilot signals, in [52,53], the least-squares (LS) channel estimation technique is applied for orthogonal frequency-division multiplexing systems with multiple transmit antennas [11].
A generalized coding scheme referred to as space-time block codes (STBCs) [16,44,54], based on the theory of orthogonal matrix designs, can achieve the full-transmit diversity of M T M R employing the maximum likelihood decoding algorithm at the receiver [44]. The idea is to transmit M T orthogonal streams, which implies that the receiver antennas receive M T orthogonal streams. This special class of space-time block codes is the so-called orthogonal STBC (OSTBC) [11,54,55].
An OSTBC example of coding matrix for M T = 4 [44] is given by in which s[m s ] is the transmitted signal in the discrete symbol index m s . Notice that, as proved by Tarokh et al. [54], the inner product of any two distinct rows of this matrix is equal to zero (i.e., the matrix is orthogonal) and of full-rank, yielding full-diversity [11]. One of the disadvantages of OSTBC is the code rate. Let P represent the number of time samples to convey one block of coded symbols and M s represent the number of symbols transmitted per block. The space-time block code rate is defined as the ratio between the number of symbols that the encoder receives at its input and the number of space-time coded symbols transmitted from each antenna, given by R = M s /P. This implies that Equation (3) has a code rate R = 1/2, which consequently reduces the spectral efficiency. Supplementary to the diversity gain, the OSTBC leads to a secondary linear coding gain G c = 10 log(R) at the receiver due to the coherent detection of multiple copies of the signal over time. Furthermore, the multi-antenna system, as presented in Figure 1, will lead to an array gain G a = 10 log(M R ) due to the coherent combination of multiple received signals over the receiving antennas [11].
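The orthogonality property and the code rate $R = M_s/P$ described above can be checked numerically. The sketch below uses one well-known rate-1/2 complex orthogonal design for four antennas; it is an assumption made for illustration and may differ from the matrix of Equation (3) in sign conventions or row ordering. The function names and the test symbols are likewise hypothetical.

```python
import math

def ostbc_rate_half_4tx(s):
    """Return a 4 x 8 code matrix (rows: antennas, columns: time slots)."""
    s1, s2, s3, s4 = s
    first = [
        [s1, -s2, -s3, -s4],
        [s2,  s1,  s4, -s3],
        [s3, -s4,  s1,  s2],
        [s4,  s3, -s2,  s1],
    ]
    # The second half repeats the first half with every entry conjugated.
    return [row + [x.conjugate() for x in row] for row in first]

def inner(u, v):
    """Complex inner product <u, v> = sum_i u_i * conj(v_i)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]   # one 4-QAM symbol per input
X = ostbc_rate_half_4tx(symbols)
M_s, P = len(symbols), len(X[0])
code_rate = M_s / P                            # 4 symbols over 8 time slots
array_gain_db = 10 * math.log10(4)             # G_a = 10 log10(M_R) for M_R = 4
```

Every pair of distinct rows has a zero inner product, while four symbols occupy eight time slots, which is where the half-rate penalty comes from.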

Quasi-Orthogonal Special Case
In order to increase the spectral efficiency in orthogonal codes, Jafarkhani [56] proposed quasi-orthogonal STBC (QOSTBC) of rate one, relaxing the requirement of orthogonality. However, when compared with orthogonal codes, the diversity gain is reduced by a factor of two. Besides, in contrast to orthogonally designed codes that process one symbol at a time at the decoder, quasi-orthogonal codes process pairs of transmitted symbols, which exponentially increases the computational complexity of decoding [11].
Jafarkhani [56] proposed a coding matrix of rate one for M T = 4, given by In the literature, related approaches with a maximum of M T = 6 antennas were proposed for quasi-orthogonal codes [57][58][59]. In [58], the authors developed an architecture similar to [56]; however, this presents full-diversity at the cost of more processing and is limited to M T = 4 antennas. In the same way, by increasing the decoding processing, Sindhu and Hameed [59] proposed two quasi-orthogonal schemes with M T = 5 and 6 antennas [11].

Decoding for Space-Time Block Codes
Maximum likelihood (ML) detection calculates the Euclidean distance between the received signal matrix $\mathbf{R}$ and the product of each possible transmitted signal matrix by the channel matrix $\mathbf{H}$. Considering $\mathcal{A}$, the set of constellation symbols of the transmitted signal, and $M_S$, the number of transmitted symbols per MIMO block, ML detection determines the estimate of the conveyed signal vector $\mathbf{s}$ as [11]
$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \mathcal{A}^{M_S}} \left\| \mathbf{R} - \mathbf{H}^T \mathbf{X}(\mathbf{s}) \right\|^2, \quad (5)$$
where $\mathbf{X}(\mathbf{s})$ is the coded matrix built from the candidate vector $\mathbf{s}$. As in maximum a posteriori (MAP) detection, ML detection achieves the optimal performance when all transmitted vectors are equally probable. However, the number of ML computation metrics is $A^{M_S}$, where $A = |\mathcal{A}|$ is the modulation order. Thus, the ML complexity increases exponentially with the number of transmitted symbols and polynomially with the modulation order [44,54,60]. Although this method has a high computational complexity, ML decoding is used as a benchmark due to its optimal performance [11].
For orthogonal coding schemes, the ML metric can be simplified, decoding symbol by symbol [54]. Via this simplification, it is possible to circumvent the issue of exponential computational complexity. However, even with this simplification, the computational complexity can be considerably high. In QOSTBC, the ML metric can be also simplified, but the computational complexity remains higher than the orthogonal case, because QOSTBC is decoded in pairs of symbols [11].
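To make the exponential cost of exhaustive ML detection concrete, the following minimal sketch enumerates every candidate vector and keeps the one with the smallest squared Euclidean distance. For brevity it assumes a generic linear model $\mathbf{r} = \mathbf{H}\mathbf{s}$ rather than the STBC-encoded metric; the function name, channel values, and symbols are hypothetical.

```python
import itertools

def ml_detect(r, H, constellation, n_symbols):
    """Exhaustive ML search over all len(constellation)**n_symbols candidates."""
    def metric(s):
        # Squared Euclidean distance between r and the noiseless prediction H s
        return sum(
            abs(r_i - sum(h * s_j for h, s_j in zip(row, s))) ** 2
            for r_i, row in zip(r, H)
        )
    candidates = list(itertools.product(constellation, repeat=n_symbols))
    return min(candidates, key=metric), len(candidates)

qam4 = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
H = [[1.0, 0.5j], [0.2, 1.0]]                              # illustrative channel
sent = (1 + 1j, -1 - 1j)
r = [sum(h * s for h, s in zip(row, sent)) for row in H]   # noiseless reception
detected, n_metrics = ml_detect(r, H, qam4, n_symbols=2)
```

With 4-QAM and two symbols the search already evaluates $4^2 = 16$ metrics; the count grows as $A^{M_S}$, which is what makes the generic ML decoder impractical for large blocks.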

Coding Scheme
Similarly to the work of [56], the present work is derived from the full-rate full-diversity complex-valued space-time block code scheme proposed by Alamouti [61]. The transmission matrix proposed in [61] is given by [11] in which s[i] is the ith input symbol to be encoded. Based on [61], Jafarkhani [56] proposed a quasi-orthogonal coding scheme using four antennas and consequently four encoded symbols as [11] where S is the quasi-orthogonal coding matrix. The main idea behind the work of [56] is to build a 4 × 4 matrix from two 2 × 2 matrices, keeping a fixed transmission rate [11].
In the present paper, we generalize the idea presented in [56] to a new recursive method of generating coding schemes, as given by [11] in which M T = 2 n , ∀ n ≥ 1 is the number of transmitting antennas and M s is the number of encoded symbols. In the proposed scheme, M s = M T and the code rate is R = M T /M s = 1 [11]. The recurrence is performed until we find S 1 n = s[n], ∀n ∈ [1, 2, · · · , M S ] in Equation (8).
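The recursive idea can be sketched as follows. This example assumes the block structure $[\mathbf{A}\ \mathbf{B};\ -\mathbf{B}^*\ \mathbf{A}^*]$ applied repeatedly until single symbols are reached, which yields the Alamouti matrix for two symbols and the Jafarkhani matrix for four. Whether this matches Equation (8) exactly in signs and ordering is an assumption, and all names are hypothetical.

```python
def quasi_orthogonal_code(symbols):
    """Recursively build a square code matrix from len(symbols) = 2**n symbols."""
    n = len(symbols)
    if n == 1:
        return [[symbols[0]]]
    half = n // 2
    A = quasi_orthogonal_code(symbols[:half])
    B = quasi_orthogonal_code(symbols[half:])
    top = [ra + rb for ra, rb in zip(A, B)]                      # [ A   B ]
    bottom = [[-x.conjugate() for x in rb] + [x.conjugate() for x in ra]
              for ra, rb in zip(A, B)]                           # [-B*  A*]
    return top + bottom

def column(S, j):
    return [row[j] for row in S]

def inner(u, v):
    """Complex inner product <u, v> = sum_i u_i * conj(v_i)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

s = [1 + 2j, 3 - 1j, -2 + 1j, 1j]          # arbitrary test symbols
S = quasi_orthogonal_code(s)               # 4 x 4, four symbols in four slots
code_rate = len(s) / len(S)                # M_s / P = 1
```

In the four-symbol case, columns 1 and 2 are orthogonal while columns 1 and 4 are not, which is the loss of orthogonality that prevents symbol-by-symbol ML decoding.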
The main issue of the proposed coding scheme is that we cannot define a simplified ML decoding method as in the former cases. This is where the system proposed in this paper comes in: MIMO-PTRBF decoding makes the joint solution feasible. We have observed, through extensive simulations, that Equation (8) achieves half of the diversity of the orthogonal coding schemes while keeping full rate (i.e., R = 1), which is essentially the characteristic of the quasi-orthogonal scheme proposed in [11,56].

Complex MIMO-PTRBF Neural Network for Massive MIMO Decoding
In the proposed system, the maximum likelihood decoder is replaced by a neural network, the MIMO-PTRBF, to decode the received symbols, as shown in Figure 4. The MIMO-PTRBF has a supervised learning stage, in which a training sequence is used to fit the hyper-parameters of the neural network. A pseudo-random generator creates this training sequence, which is known both at the transmitter and receiver sides. When the neural network output achieves the desired MSE, it switches from the learning stage to the decoding stage. At this time, the information data are then effectively transmitted over the system, and the BER is computed. These two stages are implemented as in Figure 4, with the input switch of the MIMO STBC Encoder block and the output switch of the Neural Network Decoder block. The switches have two states represented by (a) and (b), which shift between the training and decoding stages [11]. As in the maximum likelihood detector, the input signal to the MIMO-PTRBF algorithm is the set of received vectors r, as shown in Figure 4. The MIMO-PTRBF architecture, with N neurons, has three free parameters: the matrix of synaptic weights W ∈ C M s ×N , the matrix of center vectors Γ ∈ C M R P×N , and the vector of variances σ 2 ∈ C N×1 . The MIMO-PTRBF is an extension of the PTRBF for multiple outputs. The key difference between both architectures is the multiple-output layer, which fits each output individually. Figure 5 shows a closer view of the receiver side using the MIMO-PTRBF neural network for decoding [11]. The output vector is thus given bŷ Following the complex-valued radial basis function presented in [41], the nth neuron output of the MIMO-PTRBF (φ n ), for the pth output vector of r, is [11] where || · || 2 is the operator which returns the Euclidean norm of its argument, and Re{·} and Im{·} are the respective real and imaginary parts of their arguments. Additionally, as shown in Figure 6, the output of the neurons can be represented by the vector . 
This kernel partitioning into real and imaginary components has an important role in avoiding any phase invariance at the output of the neurons [11,35,41]. Thus, by means of the steepest descent algorithm, the update of the MIMO-PTRBF free parameters is given by , in which η w , η γ , and η σ are the adaptive steps of w m s ,n , γ n , and σ 2 n , respectively. Furthermore, ∇ w , ∇ γ , and ∇ σ are the complex gradient operators of w m s ,n , γ n , and σ 2 n , respectively. Thus, with r and s, the MIMO-PTRBF algorithm can be used to estimate the output vectorŝ at the uth training epoch by the minimization of the following cost function: where s andŝ are the training sequence and the output vector, respectively. Applying the complex gradient operators (∇ w , ∇ γ , and ∇ σ ) to (17) yields is the instantaneous error for the outputŝ m s at the uth training epoch. Then, substituting Equation (18) in (16) yields (19) in which [·] * denotes the complex conjugate operator and ξ n [u] is the nth synaptic transmittance, given by Furthermore, α n [u] ∈ C R is the mth vector of the matrix of weighted centers (A[u] ∈ C N×R ): In a similar way, β n [u] ∈ C is the nth element of the vector of weighted kernel (β[u] ∈ C N ): Generalizing Equation (19) to matrix structures results in in which Ξ[u] is the diagonal matrix of synaptic transmittance: Each training update is given by Equation (23); however, for u = 0, the MIMO-PTRBF free parameters are initialized following some criterion defined by the user (e.g., based on the probability distribution of the input data). Although (23) minimizes the error between the output vectorŝ and the reference vector s, as the neurons are dependent on exponential functions, a risk of instability is assumed if the exponential argument is positive. 
In order to circumvent this issue, based on Theorem A1 of [35], the real and imaginary parts of each scalar component of the vector of variances are lower-bounded by the limit µ > 0, which, consequently, bounds the real and imaginary parts of the neuron outputs between 0 and 1 [11]. In addition, following Theorem A2 (see Appendix A), the adaptive step of the matrix of synaptic weights is limited to η w < 1/N in all simulations to guarantee convergence in the mean. In the Appendix, Corollaries A1 and A2 are used to prove Theorem A1, and Definition A1 is used to prove Corollary A2.
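The decoder described in this section can be illustrated with a minimal forward pass and weight update. The sketch assumes the split kernel commonly used for phase transmittance RBF networks (one Gaussian on the real parts and one on the imaginary parts, combined as real and imaginary components) and a complex steepest-descent step on the weights only. Function names, data layout, and the omission of the center and variance updates are simplifications of this example, not the authors' implementation.

```python
import math

def ptrbf_neuron(r, gamma, sigma2):
    """Split kernel: Gaussian on real parts + j * Gaussian on imaginary parts."""
    d_re = sum((x.real - g.real) ** 2 for x, g in zip(r, gamma))
    d_im = sum((x.imag - g.imag) ** 2 for x, g in zip(r, gamma))
    return complex(math.exp(-d_re / sigma2.real), math.exp(-d_im / sigma2.imag))

def forward(r, W, centers, variances):
    """Return the output vector s_hat = W phi and the neuron outputs phi."""
    phi = [ptrbf_neuron(r, g, v) for g, v in zip(centers, variances)]
    s_hat = [sum(w * p for w, p in zip(row, phi)) for row in W]
    return s_hat, phi

def update_weights(W, phi, s, s_hat, eta_w):
    """One steepest-descent step on |s - s_hat|^2 with respect to W only."""
    return [[w + eta_w * (s_m - sh_m) * p.conjugate() for w, p in zip(row, phi)]
            for row, s_m, sh_m in zip(W, s, s_hat)]

def clamp_variance(v, mu):
    """Lower-bound the real and imaginary parts of a variance by mu > 0."""
    return complex(max(v.real, mu), max(v.imag, mu))

r = [0.3 + 0.4j, -1 + 2j]                  # one received vector
centers, variances = [list(r)], [1 + 1j]   # N = 1 neuron centered on r
W = [[0.5 + 0j]]
s_hat, phi = forward(r, W, centers, variances)
W_next = update_weights(W, phi, [1 + 1j], s_hat, eta_w=0.25)
```

Because the variances are kept positive, both exponent arguments are non-positive, so each neuron output has real and imaginary parts in (0, 1]; an input that coincides with the center produces $1 + j$.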

Simulation Results
Using the formerly mentioned OSTBC [54] and QOSTBC [56] coding schemes, several setups are compared with the proposed approach to validate and assess their performance in massive MIMO-OFDM. OSTBC and QOSTBC are simulated with the maximum likelihood (ML) decoding with perfect channel knowledge. This configuration achieves the maximum diversity gain G d = M T M R at the cost of half of the theoretical bandwidth efficiency, since R = 1/2 in this case. Considering a practical QOSTBC application, we also implement the EQOSTBC with the least-squares (ML-LS) channel estimation [11]. In Figure 7, the binary input data are created by a pseudo-random generator with uniform distribution. The bit-stream is then modulated according to the M-QAM or M-PSK modulation scheme used in the simulation. Subsequently, using the coding scheme proposed in Section 3, the modulated symbols are encoded in the STBC block. In the IDFT block, the STBC symbols are frequency-multiplexed for OFDM transmission. At the receiver side, after the transmitted signal passes through the channel, the DFT is applied to demultiplex the STBC symbols. In the decoder, the proposed ANN-based technique presented in Section 3 and the ML algorithm (either with perfect channel knowledge at the receiver or with channel estimation by LS) are employed to assess the system performance. In the sequel, the decoder output symbols are demodulated, and BER is computed.
For the sake of comparison, the BER as a function of E b /N 0 (energy per bit to noise power spectral density ratio) is used in the simulations. By adjusting the transmitting power for each antenna, the received signals are normalized by the number of transmitting antennas M T , by the receiver array gain M R , and by the code rate gain R, implying in which b is the number of bits per QAM symbol. In Figure 8, aiming to validate the simulator shown in Figure 7, we compare the obtained results with the theoretical performance of OSTBC for 4th, 8th, 16th, and 64th diversity orders using 4-QAM modulation for a Rayleigh channel with AWGN. For all OSTBC diversity orders, theoretical and simulated results were approximately the same, validating the framework. In addition, Figure 9 presents the reference results of [56] with M T = 4 antennas and M R = 1 antenna for 16-QAM OSTBC and 4-QAM QOSTBC and the obtained results for the same scenarios. The simulated results are in line with theoretical results, which also corroborates the framework's reliability.
Figure 9. Simulated, reference, and theoretical results for equal diversity order and bitrate [11].
With the simulation framework validated, we can compare the proposed coding algorithm with results from the literature. Firstly, Figure 10 shows the results of the proposed coding algorithm and theoretical results for 2nd, 3rd, 4th, 5th, 8th, and 10th order diversity. As addressed by [56], quasi-orthogonal transmitting schemes with four antennas achieve at least half of the theoretical diversity (D o = M T 2 = 2) of the orthogonal four antenna scheme (D o = M T = 4). This can be seen in the solid blue curve with squares of Figure 10, which is located between the theoretical second and third-order curves. Using simulations, we extend the concept introduced by [56] for QOSTBC with 8 × 1 and 16 × 1 antennas. In order to simulate these scenarios, we employ the proposed coding algorithm presented in Equation (8). As expected, the solid green curve with diamonds and the solid orange curve with circles are between the theoretical 4th and 5th order and the 8th and 10th order, respectively. Then, utilizing this analysis, we validate the proposed code algorithm and show that it is a suitable approach for generating QOSTBC matrices for at least 16 antennas. Higher-order QOSTBC architectures using Equation (8) are not simulated because of the extensive time required to perform maximum likelihood detection. In order to represent more practical scenarios, we set the simulation system with a 3GPP TS 38.211 specification [62] for 5G Physical channels and modulation. The Subcarrier Spacing (∆ f ) scales from 15 kHz to 240 kHz. The number of active carriers is 256, and the pilot sample rate (when applicable) is M T × 8 × f Doppler with the conventional block-based pilot scheme [63]. We perform simulations in the extremes to demonstrate the robustness of the proposed approach.
The radio channel realizations are created using the 3GPP TR 38.901 report on 5G: Study on channel model for frequencies from 0.5 GHz to 100 GHz [64]. The 3GPP channel models [64] are applicable for frequency bands in the range of 0.5 GHz to 100 GHz. From Tapped Delay Line (TDL) models in [64], TDL-B is selected from Table 7.7.2-2 (depicted in Table 1) for the channel model simulated in this work. In Table 1, as the channel model delays are normalized, they need to be scaled according to a desired delay spread in nanoseconds (ns): τ scaled = τ model DS ns (27) in which τ model is the normalized delay value of the TDL model, τ scaled is the new delay value (in [ns]), and DS ns is the desired delay spread (in [ns]). From Table 7.7.3-1 [64], examples of scaling delay spreads are very short (DS ns = 10 ns), short (DS ns = 30 ns), nominal (DS ns = 100 ns), long (DS ns = 300 ns), and very long (DS ns = 1000 ns). In this work, we use a delay spread of DS ns = 50 ns. From the channel model of Table 1, a Rayleigh distribution is used to compute each sub-channel of H ∈ C M T ×M R (MIMO channel matrix). The M-QAM BER figure for the AWGN channel is also used to define a lower bound on BER vs. E b /N 0 performance. Additionally, it is assumed that all received signals are uncorrelated [11].
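The delay scaling of Equation (27) amounts to a single multiplication per tap. In the sketch below, the normalized delays are placeholder values chosen for readability and are not the TDL-B entries of Table 1; only the 50 ns delay spread is taken from the text.

```python
def scale_delays(normalized_delays, ds_ns):
    """Scale normalized tap delays by the desired delay spread (result in ns)."""
    return [tau * ds_ns for tau in normalized_delays]

tau_model = [0.0, 0.5, 2.0]                # placeholder normalized delays
tau_scaled = scale_delays(tau_model, ds_ns=50.0)
```

With these placeholder inputs and DS = 50 ns, the taps land at 0, 25, and 100 ns.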
A realistic scenario to assess the performance of a MIMO-OFDM system must also include the nonlinear effects of the transmitter power amplifiers [11]. This is necessary because the OFDM signal can have relatively high peak values (i.e., a high PAPR) in the time domain, since many subcarrier components are added via an IFFT operation. A high PAPR is one of the most detrimental aspects of the OFDM system, as it decreases the SQNR (signal-to-quantization-noise ratio) of ADCs (analog-to-digital converters) and DACs (digital-to-analog converters), while also imposing a back-off that degrades the efficiency of the power amplifier in the transmitter. The PAPR issue is usually more critical in the uplink, since the efficiency of the power amplifier matters most when battery power is limited, as in a mobile terminal. For this reason, the results from now on assume mild amplifier nonlinearities, represented by a first-grade power amplifier or an appropriate back-off operating point [11]. Based on [41], the nonlinearity vector ρ = [ρ 1 ρ 2 ρ 3 ] T = [0.9 0.1 0.05] T implies 90%, 10%, and 5% first, second, and third-order coefficients, respectively.
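The peak-to-average power ratio mentioned above can be illustrated on a toy OFDM symbol. The sketch uses the standard definition (peak instantaneous power over mean power) with a direct inverse DFT on K = 8 subcarriers; the size and the two test inputs are chosen only for illustration and do not reproduce the 256-carrier setup of the simulations.

```python
import cmath
import math

def idft(X):
    """Inverse DFT with 1/K normalization (direct O(K^2) evaluation)."""
    K = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / K) for k in range(K)) / K
            for n in range(K)]

def papr(x):
    """Peak instantaneous power divided by mean power (linear scale)."""
    power = [abs(v) ** 2 for v in x]
    return max(power) / (sum(power) / len(power))

K = 8
single_tone = idft([1] + [0] * (K - 1))    # one active subcarrier: flat envelope
all_equal = idft([1] * K)                  # all subcarriers add in phase at n = 0
papr_db = 10 * math.log10(papr(all_equal))
```

A single active subcarrier gives a constant envelope (PAPR of 1), whereas K subcarriers adding in phase give a PAPR of K, the worst case that drives the amplifier back-off.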
With the model properly validated and the 5G channel model specified, we are now able to analyze the proposed complex-valued ANN-based decoder for MIMO-OFDM systems. First, we present the MSE convergence curves during the learning process. The MSE curves are averaged over 10 subsequent simulation traces, and a 4-QAM modulation with E b /N 0 = 12 dB is employed. Figure 11 reports the MSE standard deviations over the 10 subsequent simulation traces together with the mean values (green curves). Although the steady-state MSEs decrease slightly as the number of antennas increases, one may notice that the decays of the standard deviations are more conspicuous as M T = M R increases. This is due to the MIMO characteristics that mitigate the channel effects by sending several samples of the same signal to the receiver. Thus, sudden channel variations are smoothed, suggesting that the PTRBF learning process presents a robust and cohesive behavior.
The 4-QAM scatter plots presented in Figure 12 show the convergence of the proposed neural network decoder for the first 35 training epochs. For this sequence of scatter plots, each training epoch corresponds to one OFDM symbol; i.e., 256 4-QAM symbols. As shown in Figure 12, the proposed algorithm has a fast convergence rate since only 10 training epochs are sufficient to separate the 4-QAM constellation symbols efficiently [11].
The 16-QAM scatter plots presented in Figure 13 show the convergence of the proposed neural network decoder for the first 140 training epochs, spaced in intervals of 20 OFDM symbols. In this case, the number of training epochs necessary for algorithm convergence is greater than for the 4-QAM case, given the intrinsic complexity of the higher-order constellation. Nevertheless, with only 20 training epochs, it is already possible to visually identify the 16-QAM constellation symbols and, with 80 training epochs, to see the correctly grouped symbols in the scatter plot [11].
The 64-QAM scatter plots presented in Figure 14 show the convergence of the proposed neural network decoder for the first 7000 training epochs, spaced in intervals of 1000 OFDM symbols. In this case, the number of training epochs necessary for algorithm convergence is greater than for the former 16/4-QAM cases, in view of the intrinsic complexity of the much higher-order 64-QAM constellation. Nevertheless, with 3000 training epochs, it is already possible to visually identify the 64-QAM constellation symbols and, with 5000 training epochs, to see the correctly grouped symbols in the scatter plot [11]. Although an abrupt increase in training epochs occurs, when compared with 4-QAM in Figure 12, it represents a time interval of only 80 ms (5000 OFDM symbols with 240 kHz sub-carrier spacing).
After analyzing the MSE curves and constellations of the MIMO-PTRBF, we can further investigate the BER vs. E b /N 0 of the proposed approach. Figure 15 shows the BER vs. E b /N 0 results, which indicate that the QOSTBC system outperforms the proposed work when perfect channel knowledge is available at the receiver, which is impractical. Although the EQOSTBC system is a feasible and practical version of the QOSTBC, due to the channel estimation block at the receiver, simulations using the least-squares channel estimation show that the EQOSTBC performance is degraded by more than 2.5 dB when compared with the QOSTBC [11]. Furthermore, even with a perfect channel estimator, it is computationally expensive to decode QOSTBC codes with maximum likelihood for more than four antennas, as addressed by [56].
Figures 16 and 17 highlight the diversity gain of the proposed system when compared with the EQOSTBC. Although the mathematical derivation of the proposed system diversity gain has not been obtained yet, simulations indicate a significant diversity gain [11]. Contrasting Figure 15 with Figure 16, one can see the increase in the diversity gain as the number of transmitting antennas increases.
To further investigate the effects of a larger number of transmitting and receiving antennas on the BER performance, Figure 19 presents the simulation results for a higher-order system with M T = M R = 8. It can be seen that the performance of the quasi-orthogonal code with channel estimation is worse for M T = 8 than for M T = 4 antennas. It is shown in [53] that the performance of the linear estimator decreases proportionally with the number of transmitting antennas, which adds a constraint to the number of transmitting antennas for linearly decoded systems.
It is important to highlight that, in contrast to the maximum likelihood detection, the proposed MIMO-PTRBF decoding is able to operate with more than 16 antennas due to its reduced computational complexity, as discussed below in Section 5.
Figure 21 shows the BER vs. E b /N 0 results of the ML-QOSTBC and ML-LS-QOSTBC with 16-PSK and MIMO-PTRBF with 16-QAM to further examine the extent of the proposed work for higher-order modulations with M T = M R = 4 antennas. This result shows that the proposed work operates efficiently with a higher modulation order of 16-QAM. It is important to note, however, that different modulation formats are used in this scenario because the maximum likelihood QOSTBC decoding is not capable of dealing with quadrature amplitude modulation, as addressed in [56].
For this reason, in order to keep a modulation order of 16, a 16-PSK modulation format is used for the QOSTBC. Figure 21 shows that the proposed work outperforms the QOSTBC using maximum likelihood (which is the optimal decoder for 16-PSK) by about 2 dB and outperforms the channel-estimated scenario by more than 4 dB. Although this result seems to show a great advantage of the proposed approach, we should be careful, as it is not quite fair to compare 16-QAM and 16-PSK formats under the proposed nonlinear scenario.

Figure 23 shows the potential of the proposed approach for working with high-order modulation and highlights the gains over the QOSTBC with 64-PSK: about 5 dB with perfect channel estimation and about 7 dB for the channel-estimated scenario. It is important to emphasize, once again, that it is not quite fair to compare 64-QAM and 64-PSK formats under the proposed nonlinear scenario.

Table 2 presents the computational complexities of the OSTBC and EOSTBC with ML decoding, both with the additional complexity of channel estimation, and of the proposed scheme with MIMO-PTRBF for the training and decoding operation modes. $M_T$ and $M_R$ are the numbers of transmitting and receiving antennas, $M_s$ is the number of transmitted symbols per MIMO block, $P$ is the number of time samples per block of coded symbols, $A$ is the constellation order (e.g., $A = 4$ in the case of 4-QAM), and $N$ is the number of neurons used in the PTRBF neural network. Since the $\exp(\cdot)$ function can be easily implemented in hardware by lookup tables, multiplication is the most costly operation. One may note that, for $M_T \leq 8$, the complexity of the proposed algorithm is similar to the complexity of the OSTBC, for which no code exists for $M_T > 8$ [54]. The case of QOSTBC is similar, as no simplified ML metric exists for $M_T > 4$ [11,56].

Computational Complexities
Table 3 presents the computational complexities for the OSTBC, QOSTBC, and the proposed system, for $M_T = M_R = 4$. In Table 3, $N = 100$ neurons are used in the neural network, and maximum likelihood decoding is simulated with $R = 1$. Note that generic maximum likelihood decoding refers to the minimization of Equation (5) for a rate-one ($R = 1$) coding scheme (e.g., it could decode the QOSTBC for the case $M_T = 4$ at a higher computational cost), and it is taken as an upper bound for the computational complexities of the other quasi-orthogonal systems. Furthermore, appropriate modulation schemes are used to provide the desired transmission rate for the evaluated systems, i.e., 4-QAM for the rate-one code ($R = 1$) and 16-QAM for the half-rate code ($R = 1/2$) [11].

Table 4 displays the computational complexities for $M_T = M_R = 8$, when the PTRBF is equipped with $N = 150$ neurons. QOSTBC is marked as not applicable since no simplified ML decoding metric has been presented in the literature. Decoding must therefore rely on the usual ML metric, which limits the use of QOSTBC combined with ML for a higher number of antennas in practical approaches.

Table 5 displays the computational complexities for $M_T = M_R = 32$, when the PTRBF is equipped with $N = 600$ neurons. As the OSTBC coding matrix is limited to eight antennas (see [54]), it is not applicable for $M_T = M_R = 32$. As already mentioned, since there is no simplified ML metric to perform ML decoding with QOSTBC, the computational complexity explodes for $M_T = M_R = 32$. On the other hand, the proposed approach can expand the number of antennas while maintaining a reasonable compromise between computational complexity and BER, as discussed in Section 4.

The decoding computational complexities, shown in Figure 25, are expressed in terms of real-valued multiplications per MIMO symbol, as a function of the number of antennas $M_T = M_R$.
The orthogonal and quasi-orthogonal systems are not illustrated for the entire simulation range in Figure 25 due to the absence of coding matrices and of a simplified ML metric for configurations with $M_T = M_R > 8$ [11].
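To give a feeling for why generic ML decoding becomes impractical as the number of antennas grows, the following minimal sketch counts the candidate symbol vectors that an exhaustive ML search would have to score per MIMO block, namely $A^{M_s}$. It is only an illustration and does not reproduce the multiplication counts of Tables 2 to 5; the function name `ml_hypotheses` is hypothetical, and setting $M_s = M_T$ for a rate-one code is an assumption made here for the example.

```python
def ml_hypotheses(constellation_order: int, symbols_per_block: int) -> int:
    """Candidate symbol vectors an exhaustive ML search must evaluate.

    constellation_order is A (e.g., 4 for 4-QAM) and symbols_per_block is M_s.
    """
    return constellation_order ** symbols_per_block


# Assumption for illustration only: rate-one code with M_s = M_T and 4-QAM (A = 4).
counts = {m_t: ml_hypotheses(4, m_t) for m_t in (4, 8, 32)}
for m_t, count in counts.items():
    print(f"M_T = {m_t:2d}: {count} hypotheses per block")
```

Under these assumptions the count grows from 256 hypotheses for four symbols to $2^{64}$ for 32 symbols, which is consistent with the text's remark that the usual ML metric is not a practical option for large antenna configurations.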

Conclusions
This work proposes a novel MIMO scheme for M-QAM systems that aims to achieve diversity gain for any number of antennas at a lower computational cost than traditional methods. The presented architecture builds on conventional MIMO-OFDM systems with quasi-orthogonal coding, but with substantial improvements in the coding and decoding methods, which are implemented with complex-valued radial basis function neural networks. The state-of-the-art algorithms and the proposed approach have been simulated in MATLAB to measure their relative performance under fading scenarios.
Based on the synergistic combination of the coding and decoding algorithms presented in Section 2, the proposed MIMO-PTRBF system is discussed and analyzed in Section 3. The main functional features of the proposed architecture can be summarized as follows: (1) the proposed coding algorithm generalizes the generation of quasi-orthogonal coding matrices, (2) the MIMO-PTRBF algorithm decodes the signal with satisfactory performance and feasible computational cost, presenting low steady-state MSE with fast convergence, and (3) the proposed approach seems practically feasible, at least for 32 × 32 MIMO systems, which are simulated in this work. We conjecture the practical feasibility of higher-order systems if faster hardware, such as FPGAs, is used.
The MIMO-PTRBF algorithm has been proposed in this work to implement massive MIMO schemes as an alternative to the classic MIMO-OSTBC systems under maximum likelihood detection. Simulations have shown that the proposed technique has great potential to improve the signal-to-noise ratio at the receiver, with competitive computational complexity. Although recent works in the literature propose techniques for MIMO decoding, they focus on reducing computational complexity at the cost of performance and are limited by the simulated ML decoding. In this work, results show that the proposed approach achieves better results than ML decoding for higher-order modulation schemes with nonlinearities from power amplifiers, while keeping a competitive computational complexity. Moreover, the proposed system is easily scalable in terms of the number of antennas, meaning that a wide range of transmitting and receiving antennas can be used. This is especially important for the next generations of mobile communications, such as 5G, 6G, and probably beyond.
The proposed architectures and algorithms find potential applications in some configurations of the next generations of wireless systems. For example, some specialized hardware improvements currently aim exclusively at real-time neural network algorithms. These are intended to be implemented in low-power graphics processing units (LPGPUs), favoring the speed and energy consumption of these algorithms. Therefore, the proposed architecture will be able to work with low-power devices, with the ability to handle the distortions of nonlinear power amplifiers while maintaining a fast convergence rate. It should be emphasized that fast convergence is essential for wireless channels with dynamic fluctuations. This paper addresses some crucial aspects of MIMO-OFDM coding and decoding schemes for quasi-static channels. A complementary analysis of dynamic scenarios is also presented. We conjecture that the proposed work may be further improved using additional techniques, such as a mathematical approach for designing an optimum adaptive configuration.
Furthermore, it would be interesting to study and validate the proposed architecture for dynamic scenarios. In addition, as challenging and promising future work, the proposed algorithm can be adapted and implemented in advanced optical communication systems with Spatial Division Multiplexing (SDM), which is similar to a MIMO wireless system.

Patents
A patent application with the results presented in this paper is being prepared by Inova-UNICAMP Innovation Agency.

Applying the expectation operator to both sides of (A8) results in (A11), where $I$ is the identity matrix and $R_{\phi\phi}$ is the correlation matrix of the neuron outputs. As $R_{\phi\phi}$ is Hermitian and positive semidefinite (see [68], pp. 387, 469), it can be rotated into a diagonal matrix by the unitary transformation $R_{\phi\phi} = Q \Lambda Q^H$, in which $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$ is the diagonal matrix of real and positive eigenvalues of $R_{\phi\phi}$ (see [68], p. 471), ordered as $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_N$, and $Q \in \mathbb{C}^{N \times N}$ is the orthonormal matrix of eigenvectors that diagonalizes $R_{\phi\phi}$ through a similarity transformation [69]. Then, rotating $E(V[k])$ by the matrix of eigenvectors $Q$, i.e., $\bar{V}[k] = E(V[k])Q$, decouples the evolution of its coefficients. By means of this rotation, we can express the modes of convergence (see [70], p. 77) of (A11) as (A12). As $(I - \eta_w \Lambda)$ is diagonal and the $n$th row of $\bar{V}[k]$ represents the projection of $E(V[k])$ onto the $n$th eigenvector of $R_{\phi\phi}$, all elements of $\bar{V}[k]$ evolve independently. Hence, (A12) converges to zero if $|1 - \eta_w \lambda_n| < 1$ (see [69], p. 84). As the fastest mode of convergence corresponds to the maximum eigenvalue $\lambda_{\max}$, using the identity $\lambda_1 = \lambda_{\max} \leq \mathrm{tr}(R_{\phi\phi})$ (see [70], p. 77), the condition for the convergence in the mean becomes

$$0 < \eta_w < \frac{2}{\mathrm{tr}(R_{\phi\phi})}. \tag{A13}$$

As the trace of $R_{\phi\phi}$ is equal to the product of the number of neuron outputs and the respective signal power, the adaptive step bound is given by

$$0 < \eta_w < \frac{2}{N \, E(|\phi[k]|^2)}. \tag{A14}$$

Note that the convergence in the mean of the PT-RBF is similar to that of the complex LMS (see [70], p. 77). This is natural, considering that, after the trans-dimensional transformation step of the PT-RBF, both algorithms have comparable architectures.
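The chain of steps described above can be sketched as follows. This is a reconstruction under stated assumptions, not the authors' exact equations: it assumes that the averaged recursion has the standard LMS-like form with right multiplication by $(I - \eta_w R_{\phi\phi})$, that the weight error is statistically independent of the neuron outputs, and it uses the notation $\bar{V}[k]$ for the rotated quantity.

```latex
\begin{align}
  E(V[k+1]) &= E(V[k])\,\bigl(I - \eta_w R_{\phi\phi}\bigr)
    && \text{(assumed form of the averaged recursion)} \\
  \bar{V}[k+1] &= E(V[k])\,\bigl(I - \eta_w Q \Lambda Q^{H}\bigr)\,Q
    = \bar{V}[k]\,\bigl(I - \eta_w \Lambda\bigr)
    && \text{with } \bar{V}[k] = E(V[k])\,Q,\; Q^{H}Q = I \\
  \bar{V}[k] &= \bar{V}[0]\,\bigl(I - \eta_w \Lambda\bigr)^{k}
    && \text{(the $n$th mode is scaled by } (1 - \eta_w \lambda_n)^{k}\text{)} \\
  |1 - \eta_w \lambda_n| < 1 \;\; \forall n
    &\iff 0 < \eta_w < \frac{2}{\lambda_{\max}}
    && \text{(all modes decay to zero)} \\
  0 < \eta_w < \frac{2}{\mathrm{tr}(R_{\phi\phi})}
    &\implies 0 < \eta_w < \frac{2}{\lambda_{\max}}
    && \text{since } \lambda_{\max} \le \mathrm{tr}(R_{\phi\phi})
\end{align}
```

The last line shows why the trace-based condition is sufficient but conservative: the trace is the sum of all nonnegative eigenvalues, so it can only overestimate $\lambda_{\max}$.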
However, in view of Corollary A2, $E(|\phi[k]|^2)$ can be difficult to obtain. To circumvent this issue, using Corollary A1, we can replace $\phi[k]$ in (A14) by its maximum value $(1 + \jmath)$, which yields

$$0 < \eta_w < \frac{1}{N},$$

which is the adaptive step bound for the convergence in the mean.
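The ordering of the step-size bounds can be checked numerically with a short sketch. The neuron outputs below are hypothetical: they are drawn as $e^{-a} + \jmath e^{-b}$ with $a, b \geq 0$, which only mimics the assumption that each output has real and imaginary parts in $(0, 1]$ (so that its maximum is $1 + \jmath$ and $|\phi|^2 \leq 2$); they do not come from the trained network of the paper. All function names are illustrative.

```python
import math
import random


def ptrbf_like_outputs(num_neurons, num_samples, seed=0):
    """Hypothetical neuron outputs exp(-a) + j*exp(-b), with a, b >= 0."""
    rng = random.Random(seed)
    return [
        [complex(math.exp(-rng.expovariate(1.0)), math.exp(-rng.expovariate(1.0)))
         for _ in range(num_neurons)]
        for _ in range(num_samples)
    ]


def correlation_matrix(samples):
    """Sample estimate of R = E(phi phi^H) as a list of lists."""
    n = len(samples[0])
    r = [[0j] * n for _ in range(n)]
    for phi in samples:
        for i in range(n):
            for j in range(n):
                r[i][j] += phi[i] * phi[j].conjugate()
    return [[value / len(samples) for value in row] for row in r]


def largest_eigenvalue(r, iterations=500):
    """Power iteration estimate of the dominant eigenvalue of a Hermitian PSD matrix."""
    n = len(r)
    x = [1 + 0j] * n
    lam = 0.0
    for _ in range(iterations):
        y = [sum(r[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(abs(v) ** 2 for v in y))
        x = [v / norm for v in y]
        lam = norm  # ||R x|| for the unit vector x of the previous iteration
    return lam


N = 8
R = correlation_matrix(ptrbf_like_outputs(N, 2000))
trace = sum(R[n][n].real for n in range(N))  # equals N times the mean output power
lam_max = largest_eigenvalue(R)

eta_eig = 2.0 / lam_max    # exact bound for convergence in the mean
eta_trace = 2.0 / trace    # trace-based bound
eta_worst = 1.0 / N        # bound with |phi|^2 replaced by |1 + j|^2 = 2

print(f"lambda_max = {lam_max:.4f}, tr(R) = {trace:.4f}")
print(f"2/lambda_max = {eta_eig:.4f} >= 2/tr(R) = {eta_trace:.4f} >= 1/N = {eta_worst:.4f}")
```

Each successive bound is more conservative but easier to evaluate: the last one needs only the number of neurons, which is why it is convenient when the output power is hard to obtain.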