Modeling and Identification of Nonlinear Effects in Massive MIMO Systems Using a Fifth-Order Cumulants-Based Blind Approach

Pre-processing associated with massive multiple input-multiple output (MIMO) systems can lead to signals with high envelope fluctuations, which are very prone to nonlinear effects, especially when massive MIMO schemes are combined with orthogonal transform multiplexing (OFDM) modulations. If the nonlinear characteristics that affect a given system are known, we can design appropriate receivers that take into account the nonlinear effects introduced by the transmitter. Cubic systems are particularly important, not only because they can approximate many nonlinear effects (e.g., due to the power amplifier or clipping effects), but also because many more complex nonlinear characteristics in communication schemes can be replaced by equivalent lower-order nonlinear characteristics in general, and cubic characteristics in particular. To compensate the effects at the receiver side (e.g., by using the so-called Bussgang receivers), we need to estimate the nonlinear operation that was introduced at the transmitter, and this should be done blindly, without the need of training symbols. The paper contains a description of a mathematical approach for modeling and identification of nonlinear kernels in cubic systems. Based on theoretical tools of HOC in cubic systems, we build a new formula which relates the secondand fifth-order cumulants. Our performance results indicate that the proposed approach allows an accurate identification, yielding the desired kernels via fifth-order cumulants, and ensures a very good convergence, outperforming existing adaptive methods. This is achieved blindly, by exploiting the maximum information of the output system, making it suitable for many practical nonlinear effects.


Introduction
In digital communication, strong nonlinear effects throughout the transmission chain can have a negative impact on the overall system's performance [1]. Pre-processing associated with massive MIMO systems can lead to signals with envelope fluctuations, which are very prone to nonlinear effects, especially when massive MIMO schemes are combined with OFDM modulations [1][2][3][4][5][6][7][8][9][10]. Several works focused on MIMO systems and their important parameters such as mutual coupling [11][12][13][14]. Pre-processing techniques employed at the transmitter side of MIMO systems, such as the ones used by MIMO singular value decomposition (MIMO-SVD) techniques, can lead to signals with large envelope fluctuations and high peak to-average power ratio (PAPR) [3]. The same occurs with the Zero Forcing Transmitter (ZFT), Maximum Ratio Transmitter (MRT), and Equal Gain Transmitter (EGT), when used in m-MIMO implementation with pre-or post-processing [4]. On the other hand, several authors have qualified nonorthogonal multiple access (NOMA) as the main candidate to support fifth-generation (5G) wireless communication. This has been investigated in several works (see, e.g., [15][16][17]). By combining signals with different power levels, NOMA schemes can also be very prone to nonlinear effects. identified blindly by using the second-order moment (SOM) or TOM sequences of the system outputs in [48]. The authors of [42] investigated the third-order Volterra nonlinear systems with a novel approach based only second-order statistics and proved that was able to remove white noise with any distribution offering significant reduction in the computational burden. A complex random sequence input coupled with output HOC was employed in [40] to identify blindly the linear quadratic Volterra systems by examining different cases and the simulations were also used to illustrate the performance of the proposed approaches. The authors in [45] proposed an extension of the main concept behind the error whitening criterion (EWC) in the linear case to the unbiased identification by analytically examining the true parameters of order-2 Volterra series models of nonlinear dynamical systems. It was emphasized by the authors that this extension does not apply to higher-order Volterra systems. A hydroturbine shaft system modeled as a quadratic Volterra system has been identified blindly using third-order cumulants and a reversely recursive method by Bai and Zhang in [46]. The proposed method was tested using engineering applications and three numerical experiments, and their applicability was demonstrated by the blind identification of the hydroturbine shaft system. Furthermore, identification of the quadratic nonlinear systems has been reported in the past (see, e.g., [49][50][51][52]). In [49], the properties of higher-order moment sequences and the calculation of the higher-order spectral moments of an i.i.d for various frequencies were used to develop algorithms to identify time-invariant quadratic nonlinear systems. A concrete application of higher-order statistics theory with more experimental discussion was developed by Chow et al. [50], to identify machine faults through vibration measurements using higher-order statisticsbased methods, such as a nonparametric phase-bearing and a parametric linear or nonlinear modeling approach. Another application was developed in [51] to model the delay of videopackets transmission by identification of quadratic nonlinear systems using fourth-order cumulants-based blind approaches, which were compared to the Levenberg-Marquardt algorithm, least mean square (LMS) and recursive least square (RLS). A HOC was also used by Zidane et al. [52] to develop both extending approaches, developed for the linear and nonlinear case, which uses fourth-order only and combined third-and fourth-order cumulants, to identify the diagonal parameters of nonlinear quadratic systems. Numerical simulations were presented to verify the theoretical results with comparison to the exiting method in terms of the normalized mean-squared error (NMSE) and the fluctuations around the true kernels. Motivated by reductions in computational requirements and the mathematical tractability of nonlinear system identification issue, Ralston et al. [41] used a Hammerstein series to build a new method to identify a specific time invariant nonlinear system when the input is a non-Gaussian stationary signal.
As previously indicated, several authors considered the issue of nonlinear system identification using HOC theory, but most works focused on second-order or quadratic system identification, and the order of cumulants was limited to fourth-order cumulants. The fifth-order received little attention up to now. Indeed, crosscumulants up to the fifth-order were used by Koukoulas et al. [39] to identify the input-output quadratic system. Cubic system identification has also received so far little attention. Indeed, the Volterra kernels of the cubic system identification method using higher-order moments were developed by Tseng and Powers in [53]. The authors showed that the proposed method reduces the complexity of Volterra kernel identification compared to the non-Gaussian and non-i.i.d. input case. The authors of [54] used the TOM from the extending SOM to identify the nonlinear cubic system. It was shown that more statistical knowledge can be obtained in the third-order statistics domain for blind system identification. In [55], mixed third-and fourth-order cumulants were used to extend linear HOC-based methods to nonlinear systems.
In general, when we increase the nonlinearity degree in the model, the performances of methods relying on cumulants with small orders can degrade substantially in the presence of Gaussian noise. Therefore, we increase the order of the proposed approach by considering the fifth-order. In this framework, we consider cubic nonlinear systems, which can be used to describe a wide range of nonlinear systems, requiring only a small number of kernels in the Volterra series. The main contributions of this work are the development of new fifth-order cumulants which exploit the maximum information of the output cubic system without the need of training signals, i.e., blindly. The methodology of the proposed blind approach is based on the main theoretical tools of HOC using the link between Fourier transform of second-order and fifth-order cumulants in the case of cubic systems which are established to build a new formula which relates the second-and fifth-order cumulants by using the inverse Fourier transform. Based on this formula, which relates the second-and fifth-order cumuants, we can propose an approach based only on fifth-order under some assumptions for the kernels of systems and the properties of the autocorrelation function. To analyze the convergence and test the effectiveness of the proposed blind approach, a set of simulations is carried out for various signal to noise ratio (SNR) levels and the proposed approach is compared with adaptive existing algorithms [56] that use the input-output relations to identify the kernels of cubic systems.
The outline of this paper is organized as follows. We start in the Section 2 with the problem formulation and some assumptions regarding the system model representation. Section 3 describes the main theoretical tools of HOC in the case of cubic systems. Theoretical development based using polyspectra are provided in the Section 4 to propose a blind approach based only on fifth-order cumulants. A set of performance results is supplied to support the theoretical development of the proposed blind approach, which is compared with an adaptive algorithm [56] in Section 5.

Mathematical Definitions
The characteristic function of the vector X = (x 1 , x 2 , . . . , x k ) T composed of k real random variables x i is defined by where where The kth-order moments of these random variables are given by The cumulant-generating function is defined as logarithm of the first characteristic function Ψ X (V) = ln(Φ X (V)).
The mth-order cumulant of these random variables are defined as the coefficient of (v 1 , v 2 , . . . , v m ) in the Taylor series development of the cumulant-generating function where The operator Cum[·] stands for the nth-order joint cumulant of the random variables x n 1 1 , x n 2 2 , . . . , x n k k . Thus, considering a zero-mean random process {y(t)}, we have where, Ψ = y(t), y(t + τ 1 ), . . . , y(t + τ n−1 ) , and, hence, the involved real random variables are time-shifted samples of the process {y(t)}. Notice that, for stationary processes, the nth-order statistics depend only on the n − 1 time lags This allows us to introduce the following notations: where {y(t)} is a zero-mean stationary random process.

Problem
Cubic systems can approximate many nonlinear effects (e.g., due to the power amplifier or quantization effects), and we find that more complex nonlinear characteristics can be replaced by equivalent lower-order nonlinear characteristics in general, and cubic characteristics in particular [21]. Indeed, Figure 1 describes the block diagram of the nonlinear system identification problem which is represented by a Volterra series with only a small number of kernels. The system, which we propose to identify in this work, is a cubic nonlinearity that has the form where {y 0 (k)} is the cubic system output, and {y(k)} is the observation output, which is contaminated by a zero-mean additive Gaussian noise {n G (k)}, assumed independent of {y 0 (k)}, excluding the effect of the noise where nth-order cumulants are superior of 2 (i.e., C n,y 0 (τ 1 , τ 2 , . . . , τ n ) = C n,y (τ 1 , τ 2 , . . . , τ n )) as indicated above. The input random signal, {x(k)}, of the model is the unobservable zero mean, independent identically distributed (i.i.d) stationary, ergodic and non-Gaussian with known distribution statistics . . , q}, and ∑ i |h(i, i, i)| < ∞, which suggest that the system is stable causal implying bounded input-output. The cubic system identification is more complex and we need even more information to calculate the cumulants of the generated output signal. The main objective of this investigation is to identify the kernels of the cubic nonlinear system in (10) by using the blind approach based on fifth-order cumulants of the system's measured output, C 5,y (τ 1 , τ 2 , τ 3 , τ 4 ), and some knowledge of the properties of the input random signal {x(k)}.

Theoretical Tools of HOC
In this section, we focus on the theoretical development of HOC, which present the main relationships linking diagonal kernels of cubic systems and the cumulants of the output that received random signals up to the fifth-order for the purpose to use the HOC to identify the cubic system described by (10).
The starting point for all linear-quadratic blind methods is the Leonov-Shiryaev formula [57,58], which links the different order of cumulants to the moments. The latter formula allows expressing any cumulant as a function of moments of lower or equal orders and is well-known in the case of non-delayed cumulants [59]. It is assumed above that the output signal is stationary and ergodic and these cumulants do not depend on time but on the difference time between the instant of observation.
The rth-order cumulants of the output random signal are linked to the moments, where the order p is inferior or equal to r, by the following formula of Leonov and Shiryayev [57,58]: The summation extends over all ensembles {v 1 , v 2 , . . . , v p : 1 ≤ p ≤ r} forming a partition {1, 2, . . . , r}. In this formula, k is the number of elements that compose the partition.
For the second-order terms, both partitions are possible (1,2) and (1) (2). Thus, we have the following expression: For the stationary random output signal, {y(t)}, its second-order cumulants, in the cubic system case, become Under the assumption that the input sequence x(k) is i.i.d zero mean, stationary, non-Gaussian with γ n,x = E[x n (k)] = 0, ∀ n = 3, 6, 9, 12, 15 and (15), the second-order cumulants and the diagonal kernels of cubic systems are linked by the following expression In the case of fifth-order cumulants, we have 52 possible partitions of 7 different types: Cum y 1 , y 2 , y 3 , y 4 , y 5 = E y 1 y 2 y 3 y 4 y 5 The fifth-order cumulants which relate the diagonal kernels of cubic systems can be expressed as (see Appendix A):

Performance Results
To study the effectiveness and analyze the convergence of the proposed blind approach under various signal-to-noise levels (SNR), defined as SNR = E[y 2 0 (k)]/E[n 2 G (k)]. We consider 200 Monte Carlo runs and use two cubic systems.
To measure the precision of cubic kernel identification with respect to the true values, we define the NMSE for each run as where h(i, i, i) are the estimated kernels in each run, and h(i, i, i) are the true kernels in the cubic system.
Then, we try to increase the system order to test robustness of our approach. We consider the fifth-order cubic system given by System II: With reference to (10) Table 1. To analyze the convergence of this proposed blind approach to identify the nonlinear system I with high levels of noise (i.e., SNR = 0 dB), we represent the fluctuations around the means of the estimated nonlinear cubic kernels in Figures 2-4, respectively. To illustrate this convergence and to test and validate it, these fluctuations are compared with those obtained using the adaptive algorithm such as CIM-LLAD (µ = 0.01, ρ = 0.0001, τ = 1.2, σ = 0.02, M = 1024) developed in [56]. The proposed blind method has lower fluctuations around the true values of the identifying cubic kernels when compared with the CIM-LLAD algorithm of [56].     Table 1 compares the different techniques. Clearly, the standard deviations obtained with the proposed approach are smaller than for the CIM-LLAD algorithm proposed in [56]. Moreover, the proposed approach uses only the output and can identify the cubic system blindly without any information about the input model with a good precision. The main reason is that the proposed method uses the fifth-order cumulants which exploit the maximum information by calculating the cumulants of the output making it more robust when we use a small sample size of non-Gaussian signal input. Additionally, Gaussian noise had a minor effect on the estimated cubic kernels when using the proposed approach as indicated in Table 1. The advantages of the proposed method are even more clear when we look to the NMSE, especially for high noise levels (i.e., SNR = 0 dB). This is well supported by the existing literature as indicated above. Indeed, the additive Gaussian noise will almost vanish in the higher cumulants domain (superior to second-order) and the proposed approach, which uses fifth-order cumulants can achieve excellent identification kernels in very noisy environments compared to the CIM-LLAD algorithm. To illustrate the differences in the performance between the proposed and existing methods, we calculate the corresponding NMSE for various SNR and input signal sample sizes. The corresponding identification results are depicted in Figure 5.
Clearly, the NMSE and standard deviation values obtained using the developed approach are much lower than those for the CIM-LLAD algorithm, for all considered values of SNR and input data sizes, being able to improve significantly the identification of cubic kernels of the system blindly, even in very noisy environments and/or small sample sizes (e.g., SNR = 0 dB, N = 400). NMSE NMSE using AlgCum5 (N=400) NMSE using CIMLLAD (N=400) NMSE using AlgCum5 (N=1400) NMSE using CIMLLAD (N=1400) NMSE using AlgCum5 (N=2400) NMSE using CIMLLAD (N=2400) Figure 5. Comparison of NMSE in estimating the system I for different SNR and data input N.
To study the convergence of the proposed fifth-order cumulants-based approach more clearly, an additional set of simulation results is represented in Figures 6 and 7. It describes through system I the behavior distribution of NMSE and indicates the effect of the Gaussian noise on the estimated cubic kernels in noisy environments for various SNR levels over 200 Monte-Carlo runs.  SNR=0 dB SNR=8 dB SNR=16 dB SNR=24 dB Figure 7. Behavior of the distributions of estimated NMSE using the CIM-LLAD method in system I. Figure 8 concerns the differences between the true and estimated output signal using the proposed blind approach (AlgCum5) and the CIM-LLAD method. We take MSE on the true and estimated outputs as criterion of comparison, i.e., where y(n) is the true output signal of the cubic system considered and y(n) is the estimated output signal of the cubic system to be identified. We conclude from Figure 8 and the results summarized in Table 2 using MSE criterion that no noticeable differences between true output signal of the system I and their corresponding estimates using the proposed blind approach. On the other hand, for the CIM-LLAD method, we have an MSE about 12 times higher.

Approach
System I System II AlgCum5 0.0452 0.0762 CIMLLAD [56] 0.5655 0.7218 The obtained results of nonlinear system II with non-Gaussian input sample of length N = 2400 and different SNR levels are given in Table 3. Clearly, we can make remarks similar to those made in the case of system I. In fact, there is still a significant improvement when compared to the CIM-LLAD method, regardless of the SNR. Figures 9-13 illustrate the convergence of the proposed blind approach in a very noise environment (SNR = 0 dB). From these figures and the Std. Dev. depicted in Table 3, it is clear that our approach converges to the true values of cubic kernels with little fluctuation, even if we use a small sample length input such as N = 1000. The convergence corresponding to CIM-LLAD algorithm has more fluctuations of the estimated kernels when the SNR varies from 0 to 24 dB values as shown in Figures 9-13, respectively, and Std. Dev. calculated in Table 3 which is still in the same level for various SNR. For example, the Std. Dev. of the first estimated kernels, h(1, 1, 1), obtained using the proposed method is about 4 times lower than for CIM-LLAD, once again, with the additional advantage that we can estimate the kernels of the cubic system blindly (i.e., without any information about the excitation non-Gaussian signal), which is not the case of the adaptive algorithm that requires the input and output to identify the kernels of the model.     Figure 14, where we present the NMSE as function the SNR, gives us a good idea about the precision of the estimated kernels. For high noise level (SNR = 0 dB), the NMSE using the proposed approach is almost 9 times lower than for the CIM-LLAD. The reason is the same as in system I: the proposed approach uses only the fifth-order cumulants, which are able to remove most of the Gaussian noise, allowing a more accurate identification of the cubic kernels. Figures 15 and 16 concern the convergence of the proposed blind approach and the CIM-LLAD method, respectively. Figure 17 shows the differences between the true and estimated output signal using the proposed blind approach (AlgCum5) and the CIM-LLAD method. First, we identify the cubic kernels, then, we convolve it with the non-Gaussian input signal. We also computed the MSE, which is depicted in Table 2, once again, showing the clear advantages of the proposed blind approach.

Conclusions
The identification kernels of cubic systems using a new blind fifth-order-based approach was carried out in this paper. The proposed method allows an accurate identification approach and yields the desired kernels via fifth-order cumulants which exploit the maximum information of the output system, allowing estimating it blindly and being able to to remove the Gaussian noise almost completely. The proposed technique was compared with existing ones, showing excellent convergence capabilities and the advantage and accurate estimation of system parameters, even for low SNR and/or small input data size. In fact, it was shown that the proposed schemes achieve significantly better convergence with little fluctuations around the true values of the identifying cubic kernels, even in a very noisy environment, i.e., SNR =0 dB, and input data size N = 800 compared to the adaptive method which presents more fluctuations.  Acknowledgments: This work was supported in part by FCT/MCTES and Instituto de Telecomunicações through projects MASSIVE5G (POCI-01-0145-FEDER-030588), PES3N (POCI-01-0145-FEDER-030629) and UIDB/50008/2020.

Conflicts of Interest:
The authors declare no conflicts of interest.