Nested Variational Chain and Its Application in Massive MIMO Detection for High-Order Constellations

Multiple input multiple output (MIMO) technology requires detection methods with high performance and low complexity, and the detection problem becomes particularly severe when high-order constellations are employed. Variational approximation-based algorithms have proven effective for this problem, especially in high-order MIMO systems. Two typical algorithms, Gaussian tree approximation (GTA) and expectation consistency (EC), approximate the true likelihood function under discrete finite-set constraints with a new distribution obtained by minimizing the Kullback–Leibler (KL) divergence. Because the KL divergence is not a true distance measure, GTA and EC use the 'exclusive' and 'inclusive' KL divergences, respectively, and perform differently as a result. In this paper, we combine the two asymmetric KL divergences in a nested way by proposing a generic algorithm framework named the nested variational chain. As an initial application, we present a MIMO detection algorithm named Gaussian tree approximation expectation consistency (GTA-EC), along with an alternative version for better understanding. With a lower computational burden than its counterparts, GTA-EC provides better detection performance and diversity gain, especially for large-scale high-order MIMO systems.


Introduction
Multiple input multiple output (MIMO) technology has attracted broad attention over the last decade and has been widely applied in practical communication systems. MIMO improves spectral efficiency and link reliability through multiplexing and diversity gains that grow with the number of antenna elements. A MIMO system is called a massive MIMO system when the number of array elements becomes large enough. At that scale, the computational burden makes signal detection increasingly difficult, which hinders the widespread use of massive MIMO systems [1,2].
Many research studies have addressed signal detection in massive MIMO systems [3–17]. It is well known that maximum likelihood detection achieves the best detection performance at the cost of an exponentially growing computational burden [3]. Neglecting the finite-set constraint, the minimum mean square error (MMSE) approach solves a regularized least-squares problem, and the closest lattice point is then found by treating symbols independently [4]. The MMSE approach normally serves as a benchmark when comparing detectors, and its performance can be greatly improved by combining it with successive interference cancellation (SIC), i.e., MMSE-SIC [5]. However, as neither MMSE nor MMSE-SIC provides satisfactory performance, several alternatives have been proposed. These fall into two major categories: subspace searching-based and variational inference-based detectors.
The subspace searching-based category originates from the idea of reducing the search space, because an exhaustive search over all possible lattice points has unacceptable complexity. Sphere decoding attempts to achieve maximum likelihood performance by shrinking the search space. However, the dimension of this space grows with both the number of antennas and the modulation order, making it prohibitive for large-scale or high-order MIMO systems [6,7]. Two other local search-based approaches, likelihood ascent search and reactive tabu search, were proposed in [8–10]; both search a neighborhood subspace around a given initial solution. They perform well for a large number of antennas with low-order constellations but poorly for high-order constellations. A layered tabu search algorithm that performs detection over layers was proposed in [11], requiring a higher order of complexity for high-order constellations. A Gibbs sampling-based detector that performs a series of one-dimensional searches over iterations was proposed in [12]; it may perform well for low-order constellations at the cost of enormous processing time. Therefore, algorithms in the former category suffer from either poor performance or a prohibitive computational burden in large-scale high-order MIMO systems.
The variational inference-based category has proven suitable for the detection problem posed by high-order constellations. It approximates the true likelihood function by a new distribution that is much easier to handle. A Gaussian tree approximation (GTA) algorithm was proposed in [13,14]; it transforms the fully connected factor graph into a tree graph on which belief propagation-based message passing can be carried out for inference. GTA achieves performance comparable to that of MMSE-SIC at a complexity similar to that of plain MMSE. The expectation propagation (EP) algorithm was proposed for MIMO detection in [15]; it substitutes the true priors, which are defined over a discrete finite set, with Gaussian priors that are updated over iterations. EP performs best at a complexity several times that of MMSE. Its alternative, expectation consistency (EC), was later proposed to provide a more general perspective than EP [16]. Two low-complexity EP/EC-based algorithms were proposed for scenarios in which the number of transmit antennas is smaller than the number of receive antennas [17,18]. A double-EP-based iterative detection and decoding scheme that iteratively exploits the decoder was proposed in [19]. EP/EC-based algorithms can also be applied to channel estimation problems in massive MIMO systems [20,21].
In this paper, we expand the variational inference paradigm by proposing a nested variational chain. GTA employs the 'exclusive' KL divergence and EC the 'inclusive' one. The basic idea is that these two divergences are not mutually exclusive: they can be combined in a nested way to form an approximation chain that improves on both GTA and EC. The major contributions are as follows.

• Firstly, we propose the basic idea of the nested variational chain, together with an algorithm that establishes a general framework. The framework is 'general' in the sense that it can combine the 'exclusive' and 'inclusive' KL divergences, or reduce to either one as a special case.

• Secondly, through several examples, we show that existing algorithms such as MMSE, GTA, and EC can be regarded as special cases of the variational chain.

• Finally, as an initial application of the variational chain to massive MIMO detection, we propose a GTA-embedded expectation consistency (GTA-EC) algorithm, which provides better detection performance, especially for high-order constellations. We also analyze the complexity of GTA-EC and compare it with that of other detectors.
This paper is organized as follows. Section 2 introduces the system model and the MIMO detection problem. Building on these, Section 3 presents the nested variational chain along with a generic algorithm framework. Section 4 derives the GTA-EC algorithm and analyzes its complexity. Section 5 gives simulation results and discussions, and Section 6 draws conclusions. Throughout this paper, matrices and vectors are denoted by boldface symbols, and scalar variables are denoted in italics. The notation $\mathbf{A}^T$ or $\mathbf{a}^T$ denotes the transpose of a matrix or vector, and $\mathbf{I}$ denotes the identity matrix.

Preliminaries

Signal Model
Without loss of generality, consider a multiuser MIMO system in which $\tilde{N}$ single-antenna transmitters communicate with a base station equipped with $\tilde{M}$ antennas; a tilde marks quantities of the complex-valued model. Each transmitter sends a symbol $\tilde{x}_i \in \mathbb{C}$ selected from a quadrature amplitude modulation (QAM) constellation set $\tilde{\mathcal{A}}$, where $\mathbb{C}$ denotes the complex domain and the cardinality of the constellation set is $|\tilde{\mathcal{A}}| = A$. The transmitted symbols are collected in a vector $\tilde{\mathbf{x}} \in \tilde{\mathcal{A}}^{\tilde{N}\times 1}$, and $\tilde{E}_s$ denotes the average energy of each QAM symbol. After propagation through the wireless channel, the received signal $\tilde{\mathbf{y}} \in \mathbb{C}^{\tilde{M}\times 1}$ at the base station can be expressed as
$$\tilde{\mathbf{y}} = \tilde{\mathbf{H}}\tilde{\mathbf{x}} + \tilde{\mathbf{n}},$$
where $\tilde{\mathbf{n}} \in \mathbb{C}^{\tilde{M}\times 1}$ denotes additive white Gaussian noise (AWGN) whose elements have zero mean and variance $\tilde{\sigma}_n^2$. The channel matrix $\tilde{\mathbf{H}} \triangleq [\tilde{\mathbf{h}}_1, \ldots, \tilde{\mathbf{h}}_i, \ldots, \tilde{\mathbf{h}}_{\tilde{N}}]$ stacks the channel coefficients, where $\tilde{\mathbf{h}}_i = [\tilde{h}_{i,1}, \ldots, \tilde{h}_{i,m}, \ldots, \tilde{h}_{i,\tilde{M}}]^T$ contains the Rayleigh flat-fading channel coefficients of the $i$th symbol. Perfect channel state information (CSI) is assumed, so $\tilde{\mathbf{H}}$ is known at the base station.
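The channel model above is usually re-expressed in the real domain by separating the real and imaginary parts. Defining $\Re(\cdot)$ and $\Im(\cdot)$ as the operations that take the real and imaginary parts of a variable or matrix, one can define
$$\mathbf{y} = \begin{bmatrix}\Re(\tilde{\mathbf{y}})\\ \Im(\tilde{\mathbf{y}})\end{bmatrix},\quad \mathbf{x} = \begin{bmatrix}\Re(\tilde{\mathbf{x}})\\ \Im(\tilde{\mathbf{x}})\end{bmatrix},\quad \mathbf{n} = \begin{bmatrix}\Re(\tilde{\mathbf{n}})\\ \Im(\tilde{\mathbf{n}})\end{bmatrix},\quad \mathbf{H} = \begin{bmatrix}\Re(\tilde{\mathbf{H}}) & -\Im(\tilde{\mathbf{H}})\\ \Im(\tilde{\mathbf{H}}) & \Re(\tilde{\mathbf{H}})\end{bmatrix},$$
where $\mathbf{y}, \mathbf{n} \in \mathbb{R}^{M\times 1}$, $\mathbf{x} \in \mathbb{R}^{N\times 1}$, $\mathbf{H} \in \mathbb{R}^{M\times N}$, $M = 2\tilde{M}$, $N = 2\tilde{N}$, and $\mathbb{R}$ denotes the real domain. The equivalent real-valued model is then
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},$$
where each element of $\mathbf{n}$ has variance $\sigma_n^2 = \tilde{\sigma}_n^2/2$. Each element of $\mathbf{x}$ belongs to the pulse amplitude modulation (PAM) constellation set $\mathcal{A}$, which contains the real and imaginary parts of the $A$-QAM alphabet and has cardinality $|\mathcal{A}| = \sqrt{A}$. The average energy of a PAM symbol is $E_s = \tilde{E}_s/2$, and the signal-to-noise ratio (SNR) of the MIMO system is then defined in terms of these quantities.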
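The real-valued decomposition can be checked numerically. The following Python/NumPy sketch draws a random complex model and builds $\mathbf{H}$, $\mathbf{x}$, and $\mathbf{n}$ as defined above. It then confirms that $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$ reproduces the stacked real and imaginary parts of $\tilde{\mathbf{y}}$. The array sizes, the unnormalized 16-QAM alphabet, and the helper name `complex_to_real` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def complex_to_real(H_c, x_c, n_c):
    """Map the complex model y~ = H~ x~ + n~ to the real model y = H x + n."""
    H = np.block([[H_c.real, -H_c.imag],
                  [H_c.imag,  H_c.real]])
    x = np.concatenate([x_c.real, x_c.imag])
    n = np.concatenate([n_c.real, n_c.imag])
    return H, x, n

rng = np.random.default_rng(0)
M_c, N_c = 4, 3                       # hypothetical complex dimensions
# Rayleigh flat fading: i.i.d. CN(0, 1) channel coefficients
H_c = (rng.standard_normal((M_c, N_c)) + 1j * rng.standard_normal((M_c, N_c))) / np.sqrt(2)
pam = np.array([-3, -1, 1, 3])        # per-dimension alphabet of (unnormalized) 16-QAM
x_c = rng.choice(pam, N_c) + 1j * rng.choice(pam, N_c)
sigma2_c = 0.1                        # complex noise variance; each real part gets sigma2_c / 2
n_c = np.sqrt(sigma2_c / 2) * (rng.standard_normal(M_c) + 1j * rng.standard_normal(M_c))
y_c = H_c @ x_c + n_c

H, x, n = complex_to_real(H_c, x_c, n_c)
y = H @ x + n
```

The real-valued system has twice as many rows and columns, and each real symbol takes values in the $\sqrt{A}$-point PAM set.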

MIMO Detection
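Since the received signal in a MIMO system is a superposition of the transmitted symbols weighted by channel coefficients, MIMO detection aims to recover all transmitted symbols impaired by channel fading and noise. As is well known, the maximum a posteriori (MAP) detector achieves the best detection performance by maximizing the a posteriori probability:
$$\hat{\mathbf{x}}_{\text{MAP}} = \arg\max_{\mathbf{x}\in\mathcal{A}^{N}} P(\mathbf{x}\mid\mathbf{y},\mathbf{H}),$$
where the a posteriori distribution given the received signal $\mathbf{y}$ and CSI $\mathbf{H}$ is
$$P(\mathbf{x}\mid\mathbf{y},\mathbf{H}) \propto \mathcal{N}(\mathbf{y} : \mathbf{H}\mathbf{x}, \sigma_n^2\mathbf{I})\,P(\mathbf{x}).$$
Here, $\mathcal{N}(\mathbf{y} : \mathbf{H}\mathbf{x}, \sigma_n^2\mathbf{I})$ denotes a Gaussian distribution with mean vector $\mathbf{H}\mathbf{x}$ and covariance matrix $\sigma_n^2\mathbf{I}$, and $P(\mathbf{x})$ is the a priori probability of the symbols. Suppose $P(\mathbf{x}) \propto \prod_{i=1}^{N} \mathbb{I}_{x_i\in\mathcal{A}}$ is uniform, where the indicator function $\mathbb{I}_{x_i\in\mathcal{A}}$ equals one if $x_i \in \mathcal{A}$ and zero otherwise. The MAP detector then reduces to the maximum likelihood detector, i.e.,
$$\hat{\mathbf{x}}_{\text{ML}} = \arg\min_{\mathbf{x}\in\mathcal{A}^{N}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2.$$
The complexity of maximum likelihood detection grows exponentially with the number of symbols $N$, since $|\mathcal{A}|^N$ candidates must be examined. This makes it prohibitive for medium- or large-scale MIMO systems, especially with high-order constellations.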
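To make the exponential growth concrete, here is a brute-force maximum likelihood detector for the real-valued model. It is a minimal sketch: `ml_detect`, the toy dimensions, and the noiseless test case are hypothetical choices. The detector enumerates all $|\mathcal{A}|^N$ candidate vectors, which is only feasible for very small $N$.

```python
import itertools
import numpy as np

def ml_detect(y, H, alphabet):
    """Exhaustive ML detection: argmin over x in alphabet^N of ||y - H x||^2."""
    N = H.shape[1]
    best_x, best_cost = None, np.inf
    for cand in itertools.product(alphabet, repeat=N):   # |A|^N candidates
        x = np.array(cand)
        cost = float(np.sum((y - H @ x) ** 2))
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x

rng = np.random.default_rng(1)
alphabet = np.array([-3.0, -1.0, 1.0, 3.0])   # 4-PAM: real/imaginary parts of 16-QAM
M, N = 6, 4
H = rng.standard_normal((M, N))
x_true = rng.choice(alphabet, N)
y = H @ x_true                                # noiseless, so x_true has zero cost
x_hat = ml_detect(y, H, alphabet)
n_candidates = len(alphabet) ** N             # already 256 for N = 4
```

With 256-QAM (16-PAM) and $N = 128$ real symbols, the same loop would have to visit $16^{128}$ candidates, which is why approximate detectors are needed.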

Nested Variational Chain
To perform low-complexity MIMO detection, one popular approach is to approximate the true posterior with another distribution on which inference is much simpler. The KL divergence is commonly used to obtain this distribution. Defining $Q(\mathbf{x})$ as the distribution that approximates the true posterior, the minimization of the 'exclusive' and 'inclusive' KL divergences can be expressed as
$$Q(\mathbf{x}) = \arg\min_{Q'(\mathbf{x})} \mathrm{KL}\big(Q'(\mathbf{x}) \,\|\, P(\mathbf{x}\mid\mathbf{y},\mathbf{H})\big)$$
and
$$Q(\mathbf{x}) = \arg\min_{Q'(\mathbf{x})} \mathrm{KL}\big(P(\mathbf{x}\mid\mathbf{y},\mathbf{H}) \,\|\, Q'(\mathbf{x})\big),$$
respectively. For instance, the GTA algorithm takes the former route, while the EC algorithm takes the latter. However, each relies on only one approximation. GTA cannot update its approximated tree structure, and EC treats symbols independently rather than exploiting the correlation among them. We therefore show that the two KL divergences can be combined, and we propose a nested variational chain as follows. Suppose there is a desired variational distribution $G(\mathbf{x})$ that can be obtained with the 'exclusive' KL divergence:
$$G(\mathbf{x}) = \arg\min_{G'(\mathbf{x})} \mathrm{KL}\big(Q(\mathbf{x}) \,\|\, G'(\mathbf{x})\big).$$
This optimization is embedded in an optimization for $Q(\mathbf{x})$, where $Q(\mathbf{x})$ is obtained in the first place as
$$Q(\mathbf{x}) = \arg\min_{Q'(\mathbf{x})} \mathrm{KL}\big(P(\mathbf{x}\mid\mathbf{y},\mathbf{H}) \,\|\, Q'(\mathbf{x})\big).$$
Together, these two steps form a variational chain with a nested structure:
$$G(\mathbf{x}) = \arg\min_{G'(\mathbf{x})} \mathrm{KL}\Big(\arg\min_{Q'(\mathbf{x})} \mathrm{KL}\big(P(\mathbf{x}\mid\mathbf{y},\mathbf{H}) \,\|\, Q'(\mathbf{x})\big) \,\Big\|\, G'(\mathbf{x})\Big).$$
That is, $Q(\mathbf{x})$ is first obtained by minimizing $\mathrm{KL}(P(\mathbf{x}\mid\mathbf{y},\mathbf{H})\|Q'(\mathbf{x}))$ with respect to $Q'(\mathbf{x})$. The desired $G(\mathbf{x})$ is then obtained by minimizing $\mathrm{KL}(Q(\mathbf{x})\|G'(\mathbf{x}))$ with respect to $G'(\mathbf{x})$. Following this roadmap, we derive an algorithm for the nested variational chain that combines the two asymmetric KL divergences.
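The following check illustrates one property that the inclusive step relies on. When $Q'$ ranges over Gaussians, minimizing $\mathrm{KL}(P\|Q')$ is equivalent to matching the first two moments of $P$. The sketch below takes a uniform prior over a 4-PAM alphabet as $P$, which is an illustrative assumption. It grid-searches the Gaussian mean and variance and compares the minimizer with the closed-form moments. The grid ranges and variable names are hypothetical.

```python
import numpy as np

alphabet = np.array([-3.0, -1.0, 1.0, 3.0])
P = np.full(alphabet.size, 0.25)          # uniform prior over 4-PAM (the target)

def cross_entropy(m, v):
    """-E_P[log N(x; m, v)]; KL(P || Q') equals this minus the entropy of P."""
    sq = ((alphabet - m[..., None]) ** 2) @ P       # E_P[(x - m)^2] at every grid point
    return 0.5 * np.log(2.0 * np.pi * v) + sq / (2.0 * v)

ms = np.linspace(-1.0, 1.0, 201)          # candidate means
vs = np.linspace(1.0, 10.0, 901)          # candidate variances
Mg, Vg = np.meshgrid(ms, vs, indexing="ij")
costs = cross_entropy(Mg, Vg)
i, j = np.unravel_index(np.argmin(costs), costs.shape)
m_best, v_best = ms[i], vs[j]

# Closed form from moment matching
m_mm = P @ alphabet                       # E_P[x]   = 0
v_mm = P @ alphabet ** 2 - m_mm ** 2      # Var_P[x] = 5
```

The grid minimizer lands on the moment-matched pair $(0, 5)$. This is why the inclusive steps in EC-type algorithms reduce to computing means and variances.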

A Generic Framework for Nested Variational Chain
To begin with, a general statistical model is defined as follows [12]:
$$P(\mathbf{x}) \propto F(\mathbf{x}) \prod_{i=1}^{I} t_i(\mathbf{x}),$$
where $F(\mathbf{x})$ is a function belonging to the exponential family, and $t_i(\mathbf{x})$ for $i = 1, \ldots, I$ are non-negative factors. Performing inference over $P(\mathbf{x})$ is normally intractable or prohibitively complex, so variational inference-based approaches introduce another distribution $Q(\mathbf{x})$ that is tractable or easy to handle. The nested variational chain consists of four steps: factor substitution, inner approximation, symbol detection, and factor updating.
Factor substitution first minimizes the KL divergence, i.e., $Q(\mathbf{x}) = \arg\min_{Q'(\mathbf{x})} \mathrm{KL}(P(\mathbf{x})\|Q'(\mathbf{x}))$, for which the EC framework can be employed. The EC algorithm assumes a distribution belonging to the exponential family:
$$Q(\mathbf{x}) \propto F(\mathbf{x}) \prod_{i=1}^{I} \tilde{t}_i(\mathbf{x}),$$
where the modified factors $\tilde{t}_i(\mathbf{x})$, which replace $t_i(\mathbf{x})$ for $i = 1, \ldots, I$, also belong to the exponential family.
Note that the EC framework replaces each non-negative factor $t_i(\mathbf{x})$ with a factor $\tilde{t}_i(\mathbf{x})$ from the exponential family, while $F(\mathbf{x})$ remains unchanged during this optimization; this unchanged part can be exploited further. Based on this idea, another variational distribution can be embedded as an inner approximation to obtain a final distribution $G(\mathbf{x})$, by performing the optimization in (6):
$$G(\mathbf{x}) = \arg\min_{G'(\mathbf{x})} \mathrm{KL}\big(Q(\mathbf{x}) \,\|\, G'(\mathbf{x})\big).$$
The approximation is normally expressed as $G(\mathbf{x}) = \prod_j G_j(\mathbf{x})$. When $G_j(\mathbf{x})$ for $j = 1, \ldots, J$ are defined over disjoint groups of variables, this is the mean-field approximation; when the groups overlap, a structured approximation is obtained. For symbol detection, a cavity distribution is then obtained for each factor as
$$G^{\setminus i}(\mathbf{x}) = \frac{G(\mathbf{x})}{\tilde{t}_i(\mathbf{x})}.$$
Attaching the true factor to the cavity gives a new distribution for the $i$th factor, $p_i(\mathbf{x}) \triangleq G^{\setminus i}(\mathbf{x})\, t_i(\mathbf{x})$. Its moments $\mathbb{E}_{p_i(\mathbf{x})}[\boldsymbol{\phi}(\mathbf{x})]$ are computed under the true factor $t_i(\mathbf{x})$, where $\boldsymbol{\phi}(\mathbf{x})$ denotes the sufficient statistics of the exponential family. A new factor $\tilde{t}_i^{\text{new}}(\mathbf{x})$ is then obtained by enforcing the moment-matching condition
$$\mathbb{E}_{p_i(\mathbf{x})}[\boldsymbol{\phi}(\mathbf{x})] = \mathbb{E}_{q_i(\mathbf{x})}[\boldsymbol{\phi}(\mathbf{x})], \qquad q_i(\mathbf{x}) \propto G^{\setminus i}(\mathbf{x})\, \tilde{t}_i^{\text{new}}(\mathbf{x}),$$
so that $Q(\mathbf{x})$ can be updated iteratively.

Algorithm 1 approximates a statistical model $P(\mathbf{x}) \propto F(\mathbf{x}) \prod_i t_i(\mathbf{x})$ by following the four steps above. In Step 1, it substitutes all factors with more tractable ones by using the 'inclusive' KL divergence, as in the EC algorithm. In Step 2, it further approximates $Q(\mathbf{x})$ with a new distribution, from which better detection performance can be expected. Detection is then performed on the new distribution, moment matching is achieved in Step 3, and the substituted factors are updated in Step 4.
Note that any of the steps, such as factor substitution, inner approximation, or factor updating, may be skipped, which yields a special case. In the next subsection, we show that the MMSE, GTA, and EC algorithms can be regarded as such special cases.

Algorithm 1 An algorithm for nested variational chain
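Input: statistical model $P(\mathbf{x}) \propto F(\mathbf{x}) \prod_i t_i(\mathbf{x})$.
Step 1: Factor Substitution.
Substitute each non-negative factor $t_i(\mathbf{x})$ with $\tilde{t}_i(\mathbf{x})$ to obtain $Q(\mathbf{x}) \propto F(\mathbf{x}) \prod_i \tilde{t}_i(\mathbf{x})$.
repeat
Step 2: Inner Approximation.
Obtain a new distribution that approximates $Q(\mathbf{x})$ as $G(\mathbf{x}) = \arg\min_{G'(\mathbf{x})} \mathrm{KL}(Q(\mathbf{x})\|G'(\mathbf{x}))$.
Step 3: Symbol Detection.
for $i \in [1, \ldots, I]$ do
Obtain the cavity distribution $G^{\setminus i}(\mathbf{x}) = G(\mathbf{x})/\tilde{t}_i(\mathbf{x})$, and then achieve moment matching between $p_i(\mathbf{x}) \propto G^{\setminus i}(\mathbf{x})\,t_i(\mathbf{x})$ and $q_i(\mathbf{x}) \propto G^{\setminus i}(\mathbf{x})\,\tilde{t}_i^{\text{new}}(\mathbf{x})$.
end for
Step 4: Factor Updating.
Substitute $\tilde{t}_i(\mathbf{x})$ with $\tilde{t}_i^{\text{new}}(\mathbf{x})$ in $Q(\mathbf{x}) \propto F(\mathbf{x}) \prod_i \tilde{t}_i(\mathbf{x})$.
until convergence is achieved.
Output: Detection results on the approximated distribution.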

MMSE, GTA and EC MIMO Detectors as Special Cases
In a MIMO system, $F(\mathbf{x})$ corresponds to the likelihood function, i.e., $F(\mathbf{x}) \propto \mathcal{N}(\mathbf{y} : \mathbf{H}\mathbf{x}, \sigma_n^2\mathbf{I})$, and each non-negative factor $t_i(\mathbf{x})$ for $i = 1, \ldots, I$ can be regarded as the a priori probability of the symbols. When a factor $t_i(\mathbf{x})$ involves only one symbol $x_i$, it reduces to
$$t_i(\mathbf{x}) = t_i(x_i) \propto \mathbb{I}_{x_i\in\mathcal{A}}.$$
Hence, since there are $N$ symbols in a MIMO system, there are also $N$ factors (priors). The form of the substituted factor $\tilde{t}_i(x_i)$ for $i = 1, \ldots, N$ depends on the specific algorithm.
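(1) Minimum Mean Square Error
The MMSE approach is obtained by replacing each non-negative factor $t_i(x_i)$ for $i = 1, \ldots, N$ with a Gaussian factor $\tilde{t}_i(x_i) = \mathcal{N}(x_i : 0, E_s)$ with zero mean and variance $E_s$. The modified distribution after factor substitution for MMSE is then
$$Q_{\text{MMSE}}(\mathbf{x}) \propto \mathcal{N}(\mathbf{y} : \mathbf{H}\mathbf{x}, \sigma_n^2\mathbf{I}) \prod_{i=1}^{N} \mathcal{N}(x_i : 0, E_s),$$
whose second-order and first-order moments are
$$\boldsymbol{\Sigma}_{\text{MMSE}} = \Big(\frac{\mathbf{H}^T\mathbf{H}}{\sigma_n^2} + \frac{\mathbf{I}}{E_s}\Big)^{-1}, \qquad \boldsymbol{\mu}_{\text{MMSE}} = \boldsymbol{\Sigma}_{\text{MMSE}}\frac{\mathbf{H}^T\mathbf{y}}{\sigma_n^2}.$$
Although this is usually left implicit, MMSE also contains a simple inner approximation of the distribution $Q_{\text{MMSE}}(\mathbf{x})$.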
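With a fully factorized distribution $G_{\text{MMSE}}(\mathbf{x}) = \prod_{i=1}^{N} G_{\text{MMSE}}(x_i)$, each factor is obtained as
$$\ln G_{\text{MMSE}}(x_i) = \big\langle \ln Q_{\text{MMSE}}(\mathbf{x}) \big\rangle_{\sim G_{\text{MMSE}}(x_i)} + \text{const},$$
which is known as the mean-field approximation. The expression $\langle\cdot\rangle_{\sim G_{\text{MMSE}}(x_i)}$ denotes the expectation with respect to all factors $G_{\text{MMSE}}(x_j)$ for $j = 1, \ldots, N$ except $G_{\text{MMSE}}(x_i)$.
This process is equivalent to marginalizing $Q_{\text{MMSE}}(\mathbf{x})$, i.e., $G_{\text{MMSE}}(x_i) = \mathcal{N}(x_i : \mu_{i,\text{MMSE}}, \Sigma_{i,\text{MMSE}})$, where $\mu_{i,\text{MMSE}}$ and $\Sigma_{i,\text{MMSE}}$ are the $i$th element of $\boldsymbol{\mu}_{\text{MMSE}}$ and the $i$th diagonal element of $\boldsymbol{\Sigma}_{\text{MMSE}}$, respectively.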
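A minimal NumPy sketch of the MMSE special case follows, assuming the real-valued model. It computes $\boldsymbol{\Sigma}_{\text{MMSE}}$ and $\boldsymbol{\mu}_{\text{MMSE}}$ as above and forms the Gaussian factors $G_{\text{MMSE}}(x_i)$ from $\boldsymbol{\mu}_{\text{MMSE}}$ and the diagonal of $\boldsymbol{\Sigma}_{\text{MMSE}}$. It then attaches the uniform prior and makes a symbol-by-symbol hard decision. Because the prior is uniform, this decision is simply the alphabet point nearest to $\mu_i$. The function name, dimensions, and noise level are hypothetical.

```python
import numpy as np

def mmse_detect(y, H, sigma2, Es, alphabet):
    """MMSE as a special case: Gaussian factors N(x_i : 0, Es) replace the discrete priors."""
    N = H.shape[1]
    Sigma = np.linalg.inv(H.T @ H / sigma2 + np.eye(N) / Es)   # Sigma_MMSE
    mu = Sigma @ H.T @ y / sigma2                              # mu_MMSE
    var = np.diag(Sigma)                                       # variances of G_MMSE(x_i)
    # Attach the uniform prior over the alphabet and pick the most probable point per
    # symbol; for a Gaussian G_MMSE(x_i) this is the alphabet point nearest to mu_i.
    logp = -(alphabet[None, :] - mu[:, None]) ** 2 / (2.0 * var[:, None])
    x_hat = alphabet[np.argmax(logp, axis=1)]
    return x_hat, mu, Sigma

rng = np.random.default_rng(5)
alphabet = np.array([-3.0, -1.0, 1.0, 3.0])   # 4-PAM, so Es = E[x^2] = 5
M, N, Es, sigma2 = 8, 4, 5.0, 1e-2
H = rng.standard_normal((M, N))
x_true = rng.choice(alphabet, N)
y = H @ x_true + np.sqrt(sigma2) * rng.standard_normal(M)
x_hat, mu, Sigma = mmse_detect(y, H, sigma2, Es, alphabet)
```

Equivalently, $\boldsymbol{\mu}_{\text{MMSE}} = (\mathbf{H}^T\mathbf{H} + \frac{\sigma_n^2}{E_s}\mathbf{I})^{-1}\mathbf{H}^T\mathbf{y}$, the familiar regularized least-squares form.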
The MMSE approach skips factor updating and directly outputs hard detection results. The final distribution of MMSE is expressed as
$$\hat{P}_{\text{MMSE}}(\mathbf{x}) \propto \prod_{i=1}^{N} p_{\text{MMSE},i}(x_i),$$
where $p_{\text{MMSE},i}(x_i) \triangleq G_{\text{MMSE}}(x_i)\,t_i(x_i)$ is a new distribution obtained by attaching the true priors. Symbol detection is then performed on it for each symbol independently.

(2) Expectation Consistency
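The EC algorithm also defines a substituted factor for each symbol. It replaces the prior $t_i(x_i)$ with $\tilde{t}_i(x_i) \propto e^{\gamma_i x_i - \frac{1}{2}\Lambda_i x_i^2}$, so the posterior can be approximated as
$$Q_{\text{EC}}(\mathbf{x}) \propto \mathcal{N}(\mathbf{y} : \mathbf{H}\mathbf{x}, \sigma_n^2\mathbf{I}) \prod_{i=1}^{N} e^{\gamma_i x_i - \frac{1}{2}\Lambda_i x_i^2}.$$
Note that $\tilde{t}_i(x_i)$ is Gaussian. EC is therefore closely related to MMSE, which corresponds to the fixed choice $\gamma_i = 0$ and $\Lambda_i = 1/E_s$; the difference is that EC can update its priors. The second-order and first-order moments of $Q_{\text{EC}}(\mathbf{x})$ are
$$\boldsymbol{\Sigma}_{\text{EC}} = \Big(\frac{\mathbf{H}^T\mathbf{H}}{\sigma_n^2} + \boldsymbol{\Lambda}\Big)^{-1}, \qquad \boldsymbol{\mu}_{\text{EC}} = \boldsymbol{\Sigma}_{\text{EC}}\Big(\frac{\mathbf{H}^T\mathbf{y}}{\sigma_n^2} + \boldsymbol{\gamma}\Big),$$
where $\boldsymbol{\Lambda}$ is a diagonal matrix containing $\Lambda_i$, and $\boldsymbol{\gamma}$ is a vector containing $\gamma_i$ for $i = 1, \ldots, N$.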
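EC also employs the mean-field approximation for inner approximation. Its fully factorized distribution is $G_{\text{EC}}(\mathbf{x}) = \prod_{i=1}^{N} G_{\text{EC}}(x_i)$, where each factor is Gaussian:
$$G_{\text{EC}}(x_i) = \mathcal{N}(x_i : \mu_{i,\text{EC}}, \Sigma_{i,\text{EC}}),$$
with $\mu_{i,\text{EC}}$ and $\Sigma_{i,\text{EC}}$ being the $i$th element of $\boldsymbol{\mu}_{\text{EC}}$ and the $i$th diagonal element of $\boldsymbol{\Sigma}_{\text{EC}}$, respectively. Factor updating then uses the cavity distribution
$$G_{\text{EC}}^{\setminus i}(x_i) = \frac{G_{\text{EC}}(x_i)}{\tilde{t}_i(x_i)} \propto \mathcal{N}\big(x_i : \mu_i^{\setminus}, v_i^{\setminus}\big), \qquad v_i^{\setminus} = \frac{\Sigma_{i,\text{EC}}}{1 - \Sigma_{i,\text{EC}}\Lambda_i}, \qquad \mu_i^{\setminus} = v_i^{\setminus}\Big(\frac{\mu_{i,\text{EC}}}{\Sigma_{i,\text{EC}}} - \gamma_i\Big),$$
and the final distribution for EC is represented as
$$\hat{P}_{\text{EC}}(\mathbf{x}) \propto \prod_{i=1}^{N} p_{\text{EC},i}(x_i),$$
where $p_{\text{EC},i}(x_i) \triangleq G_{\text{EC}}^{\setminus i}(x_i)\,t_i(x_i)$ is a new distribution. Symbol detection is then performed to satisfy the moment-matching condition, so that the pairs $(\gamma_i, \Lambda_i)$ for $i = 1, \ldots, N$ are updated in parallel.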
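The EC special case can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code. It initializes from the MMSE factors ($\gamma_i = 0$, $\Lambda_i = 1/E_s$). It applies a damping weight `beta` to the updated pairs $(\gamma_i, \Lambda_i)$; the simulations later use a factor $\beta = 0.2$, but its exact role is not specified in this section. It also uses the common safeguard of keeping the previous pair whenever the new precision would be non-positive. The cavity is obtained by dividing $G_{\text{EC}}(x_i)$ by $\tilde{t}_i(x_i)$, i.e., by subtracting natural parameters. The tilted distributions $p_{\text{EC},i}(x_i)$ are evaluated exactly over the PAM alphabet.

```python
import numpy as np

def ec_detect(y, H, sigma2, Es, alphabet, n_iter=6, beta=0.2):
    """EC detection for the real model y = Hx + n (illustrative sketch; n_iter >= 1)."""
    N = H.shape[1]
    gamma = np.zeros(N)                 # linear natural parameters of the factors t~_i
    Lam = np.full(N, 1.0 / Es)          # precisions; this start reproduces MMSE
    HtH, Hty = H.T @ H / sigma2, H.T @ y / sigma2
    for _ in range(n_iter):
        # Moments of Q_EC(x)
        Sigma = np.linalg.inv(HtH + np.diag(Lam))
        mu = Sigma @ (Hty + gamma)
        s = np.diag(Sigma)
        # Cavity G_EC(x_i) / t~_i(x_i): subtract the factor's natural parameters
        v_cav = 1.0 / (1.0 / s - Lam)
        m_cav = v_cav * (mu / s - gamma)
        # Tilted p_EC,i(x_i) ∝ N(x_i : m_cav, v_cav) * I{x_i in A}, evaluated on the alphabet
        logw = -(alphabet[None, :] - m_cav[:, None]) ** 2 / (2.0 * v_cav[:, None])
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        m_p = w @ alphabet
        v_p = np.maximum(w @ alphabet ** 2 - m_p ** 2, 1e-8)
        # Moment matching: choose the new factor so that cavity * factor has moments (m_p, v_p)
        Lam_new = 1.0 / v_p - 1.0 / v_cav
        gamma_new = m_p / v_p - m_cav / v_cav
        ok = Lam_new > 0                # keep the old factor if the new precision is invalid
        Lam = np.where(ok, beta * Lam_new + (1.0 - beta) * Lam, Lam)
        gamma = np.where(ok, beta * gamma_new + (1.0 - beta) * gamma, gamma)
    # Hard decisions from the last tilted distributions p_EC,i(x_i)
    return alphabet[np.argmax(w, axis=1)], Lam, gamma

rng = np.random.default_rng(7)
alphabet = np.array([-3.0, -1.0, 1.0, 3.0])
M, N, Es, sigma2 = 8, 4, 5.0, 1e-2
H = rng.standard_normal((M, N))
x_true = rng.choice(alphabet, N)
y = H @ x_true + np.sqrt(sigma2) * rng.standard_normal(M)
x_hat, Lam, gamma = ec_detect(y, H, sigma2, Es, alphabet)
```

The cavity variance stays positive here because $\mathbf{H}^T\mathbf{H}$ has full rank when $M \ge N$, which makes each $\Sigma_{i,\text{EC}}$ strictly smaller than $1/\Lambda_i$.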
(3) Gaussian Tree Approximation
The GTA algorithm was proposed based on the modified distribution of MMSE, so its distribution with substituted factors can be represented as
$$Q_{\text{GTA}}(\mathbf{x}) \propto \mathcal{N}(\mathbf{y} : \mathbf{H}\mathbf{x}, \sigma_n^2\mathbf{I}) \prod_{i=1}^{N} \mathcal{N}(x_i : 0, E_s).$$
For inner approximation, the GTA algorithm optimally approximates this distribution with a tree graph constructed from $Q_{\text{GTA}}(\mathbf{x})$:
$$G_{\text{GTA}}(\mathbf{x}) = \prod_{i=1}^{N} G_{\text{GTA}}(x_i \mid x_{\mathrm{pa}(i)}),$$
where $G_{\text{GTA}}(x_i \mid x_{\mathrm{pa}(i)})$ denotes the conditional probability of $x_i$ given its parent $x_{\mathrm{pa}(i)}$, and $G_{\text{GTA}}(x_i \mid x_{\mathrm{pa}(i)}) = G_{\text{GTA}}(x_i)$ if $x_i$ is the root of the tree. Like MMSE, GTA skips factor updating, so its performance is limited by the fixed initial distribution $Q_{\text{GTA}}(\mathbf{x})$, which cannot be updated. Directly attaching the true priors gives the final distribution
$$\hat{P}_{\text{GTA}}(\mathbf{x}) \propto \prod_{i=1}^{N} p_{\text{GTA},i}(x_i \mid x_{\mathrm{pa}(i)}),$$
where $p_{\text{GTA},i}(x_i \mid x_{\mathrm{pa}(i)}) \triangleq G_{\text{GTA}}(x_i \mid x_{\mathrm{pa}(i)})\,t_i(x_i)$. Because this graph is a loop-free tree, message passing obtains the required marginals in a single upward and downward sweep, which enables efficient detection.
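One common way to build a maximum spanning tree for a Gaussian is the Chow–Liu construction. For covariance $\boldsymbol{\Sigma}$, the pairwise mutual information is $-\frac{1}{2}\log(1 - \rho_{ij}^2)$, where $\rho_{ij}$ is the correlation coefficient. The tree that maximizes the total mutual information is the maximum spanning tree on these weights. The sketch below assumes this construction, which may differ in detail from the paper's exact procedure. It runs Prim's algorithm in $O(N^2)$ and also returns the parameters of each Gaussian conditional $G(x_i \mid x_{\mathrm{pa}(i)})$. The function names are hypothetical.

```python
import numpy as np

def max_spanning_tree(Sigma, root=0):
    """Chow-Liu tree for a Gaussian via Prim's algorithm on pairwise mutual information.

    Returns parent[i] for every node, with parent[root] = -1."""
    N = Sigma.shape[0]
    d = np.sqrt(np.diag(Sigma))
    rho = Sigma / np.outer(d, d)                      # correlation coefficients
    # I(x_i; x_j) = -0.5 * log(1 - rho_ij^2); the clip avoids log(0) on the diagonal
    W = -0.5 * np.log(np.clip(1.0 - rho ** 2, 1e-12, None))
    in_tree = np.zeros(N, dtype=bool)
    in_tree[root] = True
    parent = np.full(N, -1)
    best_w = W[root].copy()                           # best link weight into the tree so far
    best_from = np.full(N, root)
    for _ in range(N - 1):
        j = int(np.argmax(np.where(in_tree, -np.inf, best_w)))
        in_tree[j] = True
        parent[j] = best_from[j]
        improve = ~in_tree & (W[j] > best_w)
        best_w[improve] = W[j, improve]
        best_from[improve] = j
    return parent

def edge_conditional(mu, Sigma, i, p):
    """Gaussian conditional G(x_i | x_p) = N(offset + slope * x_p, var)."""
    slope = Sigma[i, p] / Sigma[p, p]
    offset = mu[i] - slope * mu[p]
    var = Sigma[i, i] - slope * Sigma[i, p]
    return slope, offset, var

# Usage: an AR(1)-type covariance whose best tree is the chain 0-1-2-3-4
a, N = 0.7, 5
Sigma = a ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
parent = max_spanning_tree(Sigma)
slope, offset, var = edge_conditional(np.zeros(N), Sigma, 1, 0)
```

In a detector, `Sigma` would be the covariance of the Gaussian being approximated (here $\boldsymbol{\Sigma}_{\text{MMSE}}$). The tree is built once, and the resulting conditionals are then evaluated on the PAM alphabet.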

Application to High-Order MIMO Detection
Viewing MIMO detection through the nested variational chain shows that all existing approaches employ factor substitution. For inner approximation, MMSE and EC perform the mean-field approximation with a fully factorized distribution, while GTA performs a maximum spanning tree approximation. Finally, only EC performs factor updating, whereas MMSE and GTA perform direct detection.
This analysis puts forward the question of whether any improvement can be achieved when one enables GTA to update its substituted factors or whether any better inner approximation can be derived for EC rather than being fully factorized.Both thoughts lead us to an idea that it is worth trying to update the GTA factors iteratively since the approximated Gaussian tree is capable of capturing correlation among symbols rather than keeping independence among them.Following this idea, an initial application of the nested variational chain can be performed.By utilizing EC as an outer approximation, an algorithm named GTA-embedded EC (GTA-EC) is proposed in the following.

The GTA-EC Algorithm
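Given $t_i(x_i) \propto \mathbb{I}_{x_i\in\mathcal{A}}$ for $i = 1, \ldots, N$, the algorithm starts from the likelihood function with discrete priors as in (3), i.e.,
$$P(\mathbf{x}\mid\mathbf{y},\mathbf{H}) \propto \mathcal{N}(\mathbf{y} : \mathbf{H}\mathbf{x}, \sigma_n^2\mathbf{I}) \prod_{i=1}^{N} \mathbb{I}_{x_i\in\mathcal{A}}.$$
This distribution can be divided into two parts: the Gaussian likelihood $F(\mathbf{x}) = \mathcal{N}(\mathbf{y} : \mathbf{H}\mathbf{x}, \sigma_n^2\mathbf{I})$ and the discrete priors $\prod_{i=1}^{N} t_i(x_i)$. It is then possible to define a new distribution $q(\mathbf{x})$ with diagonal $\boldsymbol{\Lambda}_q$ as
$$q(\mathbf{x}) \propto \mathcal{N}(\mathbf{y} : \mathbf{H}\mathbf{x}, \sigma_n^2\mathbf{I})\, \exp\Big(\boldsymbol{\gamma}_q^T\mathbf{x} - \frac{1}{2}\mathbf{x}^T\boldsymbol{\Lambda}_q\mathbf{x}\Big),$$
whose moments are
$$\boldsymbol{\Sigma}_q = \Big(\frac{\mathbf{H}^T\mathbf{H}}{\sigma_n^2} + \boldsymbol{\Lambda}_q\Big)^{-1}, \qquad \boldsymbol{\mu}_q = \boldsymbol{\Sigma}_q\Big(\frac{\mathbf{H}^T\mathbf{y}}{\sigma_n^2} + \boldsymbol{\gamma}_q\Big).$$
Note that the pair $(\boldsymbol{\gamma}_q, \boldsymbol{\Lambda}_q)$ acts as the priors of all symbols to be updated, and that the definition of $q(\mathbf{x})$ serves as factor substitution.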
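To achieve moment consistency, another distribution $s(\mathbf{x})$ is then defined as
$$s(\mathbf{x}) \propto \exp\Big(\boldsymbol{\gamma}_s^T\mathbf{x} - \frac{1}{2}\mathbf{x}^T\boldsymbol{\Lambda}_s\mathbf{x}\Big),$$
and moment matching between $s(\mathbf{x})$ and $q(\mathbf{x})$ yields $\boldsymbol{\gamma}_s$ and $\boldsymbol{\Lambda}_s$. The EC algorithm then assumes a third distribution,
$$r(\mathbf{x}) \propto \exp\Big(\boldsymbol{\gamma}_r^T\mathbf{x} - \frac{1}{2}\mathbf{x}^T\boldsymbol{\Lambda}_r\mathbf{x}\Big) \prod_{i=1}^{N} \mathbb{I}_{x_i\in\mathcal{A}}, \qquad \boldsymbol{\gamma}_r = \boldsymbol{\gamma}_s - \boldsymbol{\gamma}_q, \quad \boldsymbol{\Lambda}_r = \boldsymbol{\Lambda}_s - \boldsymbol{\Lambda}_q,$$
whose Gaussian part has moments $\boldsymbol{\Sigma}_r = \boldsymbol{\Lambda}_r^{-1}$ and $\boldsymbol{\mu}_r = \boldsymbol{\Sigma}_r\boldsymbol{\gamma}_r$. The exponential term in $r(\mathbf{x})$ thus serves as a cavity distribution of the symbols, obtained by subtracting their substituted priors $(\boldsymbol{\gamma}_q, \boldsymbol{\Lambda}_q)$. The next step is inner approximation. Full factorization of $r(\mathbf{x})$ would neglect the correlation among symbols. We instead use the Gaussian tree approximation to perform detection according to the moments $\boldsymbol{\mu}_r$, $\boldsymbol{\Sigma}_r$, $\boldsymbol{\mu}_q$, and $\boldsymbol{\Sigma}_q$, because the Gaussian tree can capture this correlation rather than treating symbols independently. We therefore define a new Gaussian tree-based distribution $g(\mathbf{x})$ in place of $r(\mathbf{x})$ as
$$g(\mathbf{x}) \propto \prod_{i=1}^{N} p_{i\mid\mathrm{pa}(i)},$$
where $p_{i\mid\mathrm{pa}(i)} \triangleq G^{\setminus i}(x_i \mid x_{\mathrm{pa}(i)})\,\mathbb{I}_{x_i\in\mathcal{A}}$ is a new distribution obtained by attaching the true priors. The conditional distribution $G^{\setminus i}(x_i \mid x_{\mathrm{pa}(i)})$ is built from $\mu_{r,i}$ and $\Sigma_{r,ii}$, which are taken from $\boldsymbol{\mu}_r$ and the diagonal of $\boldsymbol{\Sigma}_r$, respectively. It also uses $\mu_i$ and $\Sigma_{i,i}$ for $i = 1, \ldots, N$, which are taken from $\boldsymbol{\mu}_q$ and the diagonal of $\boldsymbol{\Sigma}_q$.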
Based on $g(\mathbf{x})$, message passing on the Gaussian tree is then carried out by computing the upward messages $M_{i\to\mathrm{pa}(i)}(x_{\mathrm{pa}(i)})$ and the downward messages $M_{\mathrm{pa}(i)\to i}(x_i)$ as in (33) and (34). To achieve consistency, $s(\mathbf{x})$ is used once more. Moment matching between $g(\mathbf{x})$ and $s(\mathbf{x})$ yields $\boldsymbol{\gamma}_s$ and $\boldsymbol{\Lambda}_s$, from which the a priori parameters $(\boldsymbol{\gamma}_q, \boldsymbol{\Lambda}_q)$ are updated. The GTA-EC algorithm is summarized in Algorithm 2. In Step 1, GTA-EC initializes the distribution $q(\mathbf{x})$, which acts as an outer approximation by substituting the true factors. In Step 2, the inner approximation is applied to $q(\mathbf{x})$ using its moments, and a maximum spanning tree is constructed. With the derived tree structure, the algorithm repeats Steps 3 and 4 over iterations, updating the factors through symbol detection and moment matching. Hard outputs are finally obtained from the final distribution.
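Algorithm 2 The GTA-EC algorithm
Step 1: Factor Substitution.
Initialize $q(\mathbf{x})$ by substituting the true factors, and compute its moments $\boldsymbol{\mu}_q$ and $\boldsymbol{\Sigma}_q$.
Step 2: Inner Approximation.
Construct the maximum Gaussian spanning tree from the initial covariance matrix to obtain the tree structure and the relationships among symbols.
repeat
Step 3: Symbol Detection.
Obtain $r(\mathbf{x})$ by achieving consistency between $q(\mathbf{x})$ and $s(\mathbf{x})$, and obtain $g(\mathbf{x})$ from the established tree structure and the derived moments. Perform message passing by updating $M_{i\to\mathrm{pa}(i)}(x_{\mathrm{pa}(i)})$ and $M_{\mathrm{pa}(i)\to i}(x_i)$ according to (33) and (34), and obtain the a posteriori statistics by achieving consistency between $g(\mathbf{x})$ and $s(\mathbf{x})$.
Step 4: Factor Updating.
Update $\boldsymbol{\gamma}_q^{\text{new}}$ and $\boldsymbol{\Lambda}_q^{\text{new}}$ so that $q^{\text{new}}(\mathbf{x})$ is updated.
until the maximum number of iterations is reached.
Output: Hard decisions based on the first-order moments of the latest $q^{\text{new}}(\mathbf{x})$.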
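The upward messages $M_{i\to\mathrm{pa}(i)}$ and downward messages $M_{\mathrm{pa}(i)\to i}$ are the usual messages of inference on a tree. As an illustration only, the sketch below implements the standard sum-product recursion on a tree with a discrete alphabet. It is an assumed stand-in for (33) and (34), which are not reproduced here. The potentials `phi` (node terms, e.g., the attached priors) and `psi` (the conditional $G^{\setminus i}(x_i \mid x_{\mathrm{pa}(i)})$ evaluated on the alphabet) are hypothetical inputs. A brute-force marginalization is included to confirm that the recursion is exact on a small tree. A max-product variant would replace the sums over $x_i$ and $x_{\mathrm{pa}(i)}$ with maxima.

```python
import itertools
import numpy as np

def tree_sum_product(parent, phi, psi):
    """Exact marginals of p(x) ∝ prod_i phi[i, x_i] * psi[i, x_i, x_pa(i)] on a tree.

    parent[i] is the parent index (-1 for the root); psi[root] is ignored.
    phi has shape (N, K); psi has shape (N, K, K), indexed [i, x_i, x_parent]."""
    N, K = phi.shape
    root = int(np.flatnonzero(parent < 0)[0])
    children = [[] for _ in range(N)]
    for i in range(N):
        if parent[i] >= 0:
            children[parent[i]].append(i)
    order, k = [root], 0                     # breadth-first: parents before children
    while k < len(order):
        order.extend(children[order[k]])
        k += 1
    up = np.ones((N, K))                     # up[i]: message i -> pa(i), a function of x_pa
    for i in reversed(order):                # leaves first
        if parent[i] < 0:
            continue
        inner = phi[i].copy()
        for c in children[i]:
            inner *= up[c]
        up[i] = psi[i].T @ inner             # sum over x_i
        up[i] /= up[i].sum()                 # normalize for numerical stability
    down = np.ones((N, K))                   # down[i]: message pa(i) -> i, a function of x_i
    for i in order:                          # root first
        for c in children[i]:
            outer = phi[i] * down[i]
            for s in children[i]:
                if s != c:
                    outer = outer * up[s]
            down[c] = psi[c] @ outer         # sum over x_pa
            down[c] /= down[c].sum()
    beliefs = phi * down
    for i in range(N):
        for c in children[i]:
            beliefs[i] *= up[c]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

def brute_force_marginals(parent, phi, psi):
    """Reference marginals by enumerating all K^N configurations."""
    N, K = phi.shape
    joint = np.zeros((K,) * N)
    for xs in itertools.product(range(K), repeat=N):
        p = 1.0
        for i in range(N):
            p *= phi[i, xs[i]]
            if parent[i] >= 0:
                p *= psi[i, xs[i], xs[parent[i]]]
        joint[xs] = p
    joint /= joint.sum()
    return np.array([joint.sum(axis=tuple(a for a in range(N) if a != i))
                     for i in range(N)])

rng = np.random.default_rng(3)
parent = np.array([-1, 0, 0, 1, 1])
phi = rng.random((5, 4)) + 0.1
psi = rng.random((5, 4, 4)) + 0.1
beliefs = tree_sum_product(parent, phi, psi)
reference = brute_force_marginals(parent, phi, psi)
```

Each message costs $O(K^2)$ with $K = |\mathcal{A}|$, so one sweep over the $N - 1$ edges costs $O(N|\mathcal{A}|^2)$. This matches the per-iteration message-passing term used in the complexity analysis.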

Complexity Analysis
The computational cost of GTA-EC lies mainly in three parts. The first is the factor substitution step, which requires the second-order and first-order moments in (29), as for MMSE in (16) or EC in (19); its complexity per iteration is well known to be $O(NM^2)$. The second is the construction of the tree graph for inner approximation, which is performed only once, at the beginning of the iterations. The construction is based on Prim's algorithm, whose complexity is $O(M^2)$. The last part involves message passing and factor updating. In each iteration, the major cost is calculating the messages in (33) and (34). Each message requires a maximum likelihood search over a conditional distribution on the PAM constellation, whose cardinality is $|\mathcal{A}| = \sqrt{A}$. Since there are $M - 1$ conditional distributions in the tree graph, this complexity is $O(M|\mathcal{A}|^2) = O(MA)$. Therefore, defining $N_{\text{iter}}$ as the number of iterations, the total complexity can be expressed as
$$O\big(N_{\text{iter}}NM^2 + M^2 + N_{\text{iter}}MA\big) \approx O\big(N_{\text{iter}}NM^2\big),$$
since $NM^2 \gg MA$ normally holds in a massive MIMO system. The complexity of GTA-EC is thus about $N_{\text{iter}}$ times that of MMSE or GTA, which is $O(NM^2)$. For comparison, the complexity of EC can be expressed as $O((N_{\text{iter}}+1)NM^2 + M + N_{\text{iter}}M\sqrt{A}) \approx O((N_{\text{iter}}+1)NM^2)$, so GTA-EC has approximately the same order of complexity. The fewer iterations an algorithm needs, the lower its complexity. The next section therefore uses the number of iterations as the measure of complexity when comparing GTA-EC with EC. Table 1 summarizes the complexity comparison and shows that the total complexity is dominated by the factor substitution cost and the number of iterations.
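As a rough illustration of when the factor-substitution term dominates, the sketch below evaluates the three leading-order terms named above with constants dropped. These are factor substitution $O(NM^2)$ per iteration, tree construction $O(M^2)$ once, and message passing $O(MA)$ per iteration. It assumes that $M$ and $N$ are the real-valued dimensions (twice the antenna counts) and that $N_{\text{iter}} = 4$. The helper name is hypothetical, and the numbers are order-of-magnitude counts, not measured operation counts.

```python
def gta_ec_terms(M, N, A, n_iter):
    """Leading-order operation counts (constants dropped) of the three GTA-EC parts."""
    factor_substitution = n_iter * N * M ** 2   # moments of q(x), once per iteration
    tree_construction = M ** 2                  # Prim's algorithm, performed once
    message_passing = n_iter * M * A            # O(M |A|^2) = O(MA) per iteration
    return factor_substitution, tree_construction, message_passing

# Real-valued dimensions are twice the antenna counts used in the simulations
for antennas in (16, 64):
    M = N = 2 * antennas
    for A in (16, 64, 256):
        fs, tree, msg = gta_ec_terms(M, N, A, n_iter=4)
        print(f"M=N={M:3d}, A={A:3d}: factor_sub/messages = {fs / msg:.0f}")
```

The ratio of the first to the third term is $NM/A$. For 16 antennas with 256-QAM it is only 4, so the message-passing term is not negligible in that configuration; for 64 antennas with 256-QAM it grows to 64.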

Simulation Parameters
In this section, the detection performance of a MIMO system is evaluated in terms of bit error rate (BER). An uncorrelated-scattering flat-fading channel model is assumed, with channel coefficients modeled as complex Gaussian variables generated independently for all antennas. The simulation uses 20,000 realizations of the channel matrix, each used to send one message. For comparison, the existing MMSE, GTA, and EC algorithms are evaluated as well. We mainly consider the 'worst-case' scenario of load α = N/M = 1 with N = M = 16 and N = M = 64, using the high-order constellations 16-QAM, 64-QAM, and 256-QAM. The factor β is set to 0.2 for all algorithms. The number of iterations of EC and GTA-EC is set to 2, 4, and 6, since convergence is achieved within six iterations.

Performance Evaluation
Figures 1–3 compare the BER of GTA-EC with that of existing algorithms for N = M = 16 antennas at both the transmitter and receiver, with 16-QAM, 64-QAM, and 256-QAM, respectively. GTA-EC outperforms EC with the same number of iterations, and GTA-EC with four iterations outperforms EC with six iterations. This indicates that GTA-EC can achieve better performance than EC with lower complexity. In Figures 2 and 3, GTA-EC with only two iterations performs nearly as well as or better than EC with six iterations, which reveals a larger performance gain when high-order constellations are employed. The BER curves of GTA-EC also decrease more steeply than those of EC, showing that GTA-EC obtains a superior diversity gain in the high-SNR regime.

Figures 4–6 present the corresponding BER comparison for N = M = 64 antennas at both the transmitter and receiver, with 16-QAM, 64-QAM, and 256-QAM, respectively. In these figures, GTA-EC outperforms EC with the same number of iterations, while GTA-EC with four iterations performs similarly to EC with six iterations. Thus GTA-EC achieves better performance than EC at the same order of complexity, or similar performance with lower complexity. In Figures 4 and 5, the BER slope of GTA-EC is steeper than that of EC, which leads to better performance in the high-SNR regime. From the figures in these different scenarios, we draw the following conclusions about the performance of GTA-EC compared with existing algorithms.

• On the one hand, both EC and GTA-EC significantly outperform existing algorithms such as MMSE and GTA. In most scenarios, GTA-EC clearly outperforms EC with 16-QAM, 64-QAM, or 256-QAM. The performance gain of GTA-EC grows when higher-order constellations are employed. For example, both the 64-QAM and 256-QAM cases exhibit a larger gain than the 16-QAM case with either 16 or 64 antennas. GTA-EC is therefore especially suitable for high-order constellations. We believe the gain comes from exploiting additional relations (correlation) among symbols rather than treating them independently.
• On the other hand, regarding complexity, GTA-EC with four iterations outperforms or performs comparably to EC with six iterations. Since the computational burdens of both algorithms are dominated by the number of iterations, GTA-EC requires less complexity than EC. Based on the simulation results, four iterations are recommended for GTA-EC. Its complexity is then approximately four times that of MMSE, which makes it a practical method for massive MIMO systems.

Conclusions
We have proposed a nested variational chain that combines two asymmetric KL divergences, along with a generic algorithm. When the framework is applied to MIMO systems, several existing algorithms such as MMSE, GTA, and EC can be regarded as special cases. As an initial application to MIMO detection, we proposed an algorithm named GTA-EC and analyzed its complexity. Numerical results show that it achieves better detection performance with lower complexity than existing algorithms. Future research may seek better inner approximations that capture more correlation among symbols. The framework may also be applied to other detection problems, such as sparse code multiple access (SCMA), orthogonal time frequency space (OTFS), or low-density parity-check (LDPC) decoding systems.

Table 1. Comparisons of complexity.