Equivalent MIMO Channel Matrix Sparsification for Enhancement of Sensor Capabilities

One of the development directions of new-generation mobile communications is using multiple-input multiple-output (MIMO) channels with a large number of antennas. This requires the development and utilization of new approaches to signal detection in MIMO channels, since the difference in the energy efficiency and the complexity between the optimal maximum likelihood algorithm and simpler linear algorithms become very large. The goal of the presented study is the development of a method for transforming a MIMO channel into a model based on a sparse matrix with a limited number of non-zero elements in a row. It was shown that the MIMO channel can be represented in the form of a Markov process. Hence, it becomes possible to use simple iterative MIMO demodulation algorithms such as message-passing algorithms (MPAs) and Turbo.


Introduction
This paper is an extended version of the conference paper [1]. The study introduces and compares various multiple-input multiple-output (MIMO) channel matrix sparsification methods. A description of the Turbo detection algorithm in a MIMO system with a sparsed equivalent channel matrix based on a minimum mean square error (MMSE) detector is also considered. Finally, a comparison of the link-level performances for all of the analyzed algorithms is provided.
One of the directions of the development of new generations of mobile communications is the use of MIMO channels with a large number of antennas. The MIMO technology is one of the most efficient technologies, providing a significant increase in the throughput and an increase in the number of active users. Moreover, the performance grows linearly with an increase in the number of transmitting and receiving antennas. This addresses the requirements for communication systems of 5G and 6G generations [2][3][4].
However, an increase in the number of antennas leads to a significant complication of signal-processing algorithms in MIMO systems. There are many options for constructing MIMO detectors, which differ both in their characteristics and in the complexity of implementation [5,6]. Among the variety of options for constructing MIMO detectors, two limiting cases can be distinguished [7]. At one extreme is the optimal maximum likelihood (ML) receiver, which has the best characteristics but is also the most difficult to implement. Its complexity grows exponentially with the number of antennas and modulation order. At the other extreme, there is a linear MMSE receiver, which has a simple implementation (the complexity grows in proportion to the third power of the number of antennas), but it significantly loses the characteristics to an ML receiver.
In [5], a comprehensive overview of MIMO detectors is given, of which the characteristics are between those of ML and MMSE receivers. There are two directions for improving MIMO detectors: simplifying the ML detector with losses in the energy efficiency and complicating the MMSE detector by increasing the energy efficiency. The first direction includes algorithms based on spherical decoding, e.g., K-best [6,8,9]. The second direction includes various iterative algorithms using the MMSE detector as a basis, e.g., ESA, EPA, and V-BLAST detector [2,[10][11][12][13][14][15][16].
It should be noted that a similar problem also exists in the implementation of multiuser detectors in communication systems with multiple access, especially in systems of non-orthogonal multiple access (NOMA) [17][18][19]. However, unlike MIMO systems, communication systems with NOMA have the ability to select various templates and user signals matching a specific type of processing at the receiver. This makes it possible to provide good energy performance with acceptable implementation complexity.
One of the effective approaches of signal detection is based on iterative methods, such as low-density signature (LDS) [20] or sparse code multiple access (SCMA) [3,[20][21][22][23][24][25][26][27]. A necessary condition for their use is the ability to represent the channel model in the form of a sparse matrix, as a result of which an individual observation is not a superposition of all symbols at once but only a subset of the symbols. In this case, the number of combinations for one observation will be significantly less than the number of combinations for the entire set of symbols. This makes it possible to use the sequential (optimal) processing of each observation and transfer the received information for the processing of the next observation [8,16,19,[28][29][30][31][32]. Similar algorithms are used in Markov processes with finite fixed connectivity.
In MIMO systems, in the general case, the channel matrix is completely filled. Therefore, it is impossible to directly use iterative algorithms to detect MIMO channel signals [14,15,25,28]. This paper proposes a new direction in the development of MIMO detectors. It is based on the transformation of the MIMO channel model to a form convenient for using simple algorithms, i.e., the processing algorithm is not adjusted for the channel model, but the channel model is converted to the processing algorithm.
The goal of the paper is the development of a method for transforming a MIMO channel model into a channel with a sparse matrix containing a limited number of nonzero elements in a row or representing a MIMO channel signal in the form of a Markov process. Hence, it becomes possible to use simple iterative MIMO demodulation algorithms (message-passing algorithms (MPAs), Turbo, etc.). The utilization of these algorithms is especially beneficial in low-power devices, such as, sensors, meter, Internet of things (IoT), and reduced-capacity (RedCap) devices.
The rest of the paper is arranged as follows. In the next section, we introduced a MIMO system model and lay down theoretical prerequisites and potential possibilities of an accurate representation of the MIMO channel matrix in the sparse format. Then, in Section 3, we derived several methods of the approximation of the MIMO channel matrix by a sparse matrix. In Section 4, the modeling and verification of the efficiency of the methods introduced above are presented. Finally, we drew the conclusions in Section 5.
In the paper, the following mathematical symbols, parameters, and operators are used:

Representation of the MIMO Channel Matrix in the Sparse Format
Let there be an observation: where Y is an N-dimensional complex vector of signal samples at the input of the MIMO detector, which can be considered a vector of output samples of the MIMO channel, X is an M-dimensional complex vector of transmitted QAM symbols, which can be considered a vector of input samples of a MIMO channel, η is the N-dimensional vector of complex samples of the observation noise with zero mathematical expectations and a correlation matrix, R = 2σ 2 η I N is an identity matrix of size (N × N), and H is a (N × M)-dimensional complex matrix of the MIMO channel, the elements of which are complex Gaussian random variables with zero mathematical expectations and unit variance for Rayleigh fading. The observation noise variance σ 2 η determines the signal-to-noise ratio (SNR) at the input of the receiving antenna: It is known that for any square matrix H of full rank, there exists a QR decomposition: where R is an upper triangular matrix, which is completely described by N × (N + 1)/2 coefficients, and Q is an orthogonal matrix, which is also uniquely determined by N × (N − 1)/2 coefficients. From the point of view of the amount of information, such a decomposition does not lead to the appearance of redundancy or to the loss of information, due to the following equation: which is the amount of information contained in the original matrix H that is fully preserved. Therefore, it can be assumed there exists such a decomposition of the channel matrix: for which matrix S contains N × (N + 1)/2 nonzero elements. The only problem is that these non-zero elements are evenly distributed over all rows, with the average number of non-zero elements in each row less than or equal to (N + 1)/2. If such a transformation shown in Equation (4) is found, then it will allow the use of simpler iterative algorithms for demodulation.

Methods for the Approximation of the Channel Matrix by a Sparse Matrix
Unfortunately, it is not possible to find the exact decomposition of the channel matrix into an orthogonal and sparse matrix with a fixed number of non-zero entries in a row. The application of numerical methods to test this approach aslo shows that, most likely, such a decomposition does not exist. In addition, even if we assume that such a decomposition exists, then it will allow using iterative demodulation algorithms with a complexity of N it N2 MM bit /2 , instead of an optimal demodulator with a complexity of 2 MM bit , where M bit = log 2 (M) is the number of bits in one M-QAM symbol. For large M, such an algorithm will also be difficult to implement. Therefore, we considered approximate methods for representing a MIMO channel in the form of decimated channels as belows, i.e., such channels in which each observation contains only part of the symbols.

Kullback Distance for the Approximated Channel Model
Let us consider as an approximation criterion the Kullback distance between two distributions, which is determined as follows [33,34]: where p(X) is the original distribution and q(X) is the approximated distribution. Let distributions p(X) and q(X) be multivariate complex Gaussian distributions and are represented in the following forms: Substituting Equation (6) into Equation (5), we can obtain: Let us assume we need to approximate a distribution with a fully filled correlation matrix V of which the distribution determined by a partially filled correlation matrix Q (for example, diagonal, block-diagonal, or sparse) or by a correlation matrix with a certain relationship between elements. In this case, minimizing the distance D(p,q) is reduced for the fulfillment of the equality: and the minimization of the following functional: which is subjected to restrictions on the form of the matrix Q. We use the criterion D(p, q) → ( X,Q) min for minimizing the distance between the channel matrix H by an approximate sparse matrix H, i.e., instead of Equation (1), we use the following model: where matrix H in each row has at most m non-zero elements. For now, we consider the case when the placement of these elements in the matrix is given. As the initial and approximated distributions, we use the posterior distributions obtained for Equations (1) and (10) using the Gaussian prior distribution X ∼ N X pr , V pr . The posterior distributions can be found from the Bayes' theorem: where L(Y|X ) and L Y|X are likelihood functions that are Gaussian according to the models (1) and (10).
It is easy to show that the parameters of the distributions p ps (X) and q ps (X) are MMSE solutions (i.e., from the estimates and correlation matrices perspectives) for the corresponding models and are determined by the expressions [7]: For an accurate model: For the approximated model: According to Equation (8), the mathematical expectations of the original and approximated distributions should be fulfilled, which are MMSE estimates: From Equation (15), we obtain that: Taking into account that combinations of QAM symbols X pr = 0 and V pr = I M are independent and equiprobable, as well as the independence of the noise samples R η = 2σ 2 η I N , and substituting the expression for the correlation matrix Q in Equation (9), we find the conditions which must meet the channel matrix: This condition is minimized under the given constraints on the form of the matrix H, i.e., on the arrangement of non-zero elements. Obviously, the minimum value of D(p,q) also depends not only on the value of these elements, but also on their locations.

Diagonal Matrix Approximation
The simplest form of the approximating matrix is the diagonal matrix where v mm are the diagonal elements of the correlation matrix V. Performing differentiation with respect to h mm 2 and equating it to zero, we obtain the system of equations: From here, we obtain the solution: In this case, the accuracy of the approximation depends only on the modulus h mm 2 .

Block Diagonal Matrix Approximation
Let the matrix H be represented as a block-diagonal matrix with blocks of size (m × m): where O mm is a zero matrix of size (m × m).
In this case, the Kullback distance is equal to: where V (ii) is the corresponding diagonal block of size (m × m) from the general correlation matrix V.
Optimal matrices minimizing this distance are determined by the expressions: From this, we obtain: To find the matrix H (ii) , one can use known variants of matrix decomposition, for example, SVD decomposition or Cholesky transform [35][36][37][38][39][40]: Sensors 2022, 22, 2041 7 of 19 In this case, the matrix H (ii) is the upper triangular matrix. As a result of this procedure, we obtain the channel matrix of the following form: The accuracy of such an approximation is determined by the following value of the Kullback distance: It can be noted that the value of D depends on how the original matrix is split into blocks. Therefore, Equation (27) can be used as an optimization criterion for splitting V into V (ii) blocks (groups of symbols).
The detection for Equation (10) with a matrix of Equation (26) was carried out separately for each i-th block (i = 1, K). The complexity of the optimal demodulation of one such block is proportional to 2 mM bit .

Strip Matrix Approximation
The next version of the sparse matrix representation is a strip matrix, which for m = 2 has the following form: It is known that the correlation matrix can be represented by the Cholesky expansion: where U is the upper triangular matrix. Likewise, where (ug) ij is the ij-th element of the matrix UG . For a strip matrix G with m diagonals, we obtain:

Approximation by a Markov Process
The possibility of using sequential iterative procedures in MIMO detection is allowed by the possibility of representing the evaluated process in the form of a Markov process. In this case, the multivariate posterior distribution is represented as a product of conditional distributions with a finite fixed connection: where k is the order (memory, connectivity coefficient) of the Markov process.
Having obtained the distribution parameters p ps (x i+1 , . . . , x i+k ), it is possible to calculate the parameters of the next distribution p ps (x i , . . . , x i+k−1 ) in Equation (32). In this case, the complexity is determined by the value of the parameter k, not by the dimension of the entire vector M.
It should be noted that the representation of the channel matrix in the form of a strip matrix with k non-zero diagonals also leads to the representation of the posterior distribution in Equation (32). Next, we consider an approach that also follows the MMSE solution, but without using a direct transition to the observation equation.
Let the posterior distribution be obtained as: This distribution can be written in a factorized form: where This conditional distribution can be represented as the result of combining the prior distribution p pr (x i ) and the equivalent likelihood function: Hence, the equivalent likelihood function is determined by the expression: The concept of an equivalent likelihood function, in general, is associated with the concept of external information, which was obtained from general observation and refers only to the considered vector [x i , x i+1 , . . . x M ] [7].
The idea of representing a sequence x 1 , x 2 , x 3 , . . . , x M by a Markov process with a connection of the k-th order is to approximate p ps ( and describe the general distribution in Equation (32). In this case, the equivalent likelihood functions only depend on k + 1 symbols: For k = 0, we obtain a usual approximation by a diagonal matrix. Taking into account Equation (32), the complete equivalent likelihood function is the product: where Below, we present the rules for calculating the parameters of equivalent likelihood functions. Firstly, there is an initial distribution: From Equation (41), one can select the distribution parameters of the truncated vector: We represent the vector of mathematical expectations and the correlation matrix in the block form: ..i+k are determined by expressions: The equivalent likelihood function is determined by the following expression: where The equivalent likelihood function Λ eq,M−k+1 (x M−k+1 , . . . x M ) used as the initial one for constructing the demodulation procedure is determined based on its definition-Equation (40)-and has the following form: After transformation, this function can be represented as follows: where The obtained likelihood functions can be used to calculate metrics and implement an iterative demodulation procedure (MPA, Turbo detector, etc.): It is easy to notice (M − k + 1) metrics were calculated with this algorithm, each with a volume 2 (k+1)M bit . This is significantly less than in the optimal combination algorithm with 2 MM bit combinations.
Thus, the considered method makes it possible to factorize the likelihood function into factors with a less connectivity, i.e., with a smaller number of observed symbols in individual FP components. This is equivalent to using a decimated channel matrix. Therefore, the methods of the sparsification of the MIMO channel matrix described in the previous sub-sections and the Markov approximation method described in this sub-section solve the same problem of reducing the number of enumerated combinations with the separate processing of the likelihood function components. Such an approach makes it possible to use simple iterative methods.
The considered methods of channel matrix sparsification or FP factorization are approximate, and their accuracy depends on the connectivity coefficient k (or on the fill factor of the rows of the sparse channel matrix m/M). However, the accuracy of the approximation also depends on the ordering of symbols during the sparsification procedure, since the considered approaches mainly take into account the correlation of neighboring symbols. Therefore, to improve the accuracy of the approximation, one can preliminarily introduce some ordering of symbols χ = ΠX, where Π is the permutation matrix, which depends on the correlation matrix of the original distribution. Obviously, to improve the accuracy of the approximation, before sparsification, it is necessary to arrange the symbols, so that the largest values of the correlation coefficients are concentrated near the main diagonal. Figure 1 shows the curves of the Kullback distance D(p, q) between the original posterior distribution and the one approximated by the number of non-zero symbols in the rows of the sparsed channel matrix (connectivity coefficient) for the following algorithms:

•
Algorithm with a block-diagonal matrix-"block"; • Algorithm with a strip matrix and the calculation of the coefficients of the sparse matrix by the method of stochastic optimization-"Opt."; • Algorithm with approximation by a Markov process-"Markov's". • These curves are shown for three options of symbols ordering: • Without ordering-"without order"; • Simple ordering-"simple order". In this option, the symbol with the highest total power of all mutual correlation coefficients is put in the first place, and the rest are arranged in descending order of the magnitudes of their mutual correlation coefficients with the first symbols; • Serial ordering-"serial order". In this option, the symbol with the highest total power of (m − 1) cross-correlation coefficients is selected, and a set Ω (1) of (m − 1) symbols having the maximum correlation with the first symbol is set. Next, from this set, the second symbol is selected, with the largest total power of (m − 1) cross-correlation coefficients, where (m − 2) symbols are already specified and are taken from the set Ω (1) (except for the first and second selected symbols) and one symbol is selected from the rest, not included in this set. Then, the third symbol is determined according to the same principle, etc. • Algorithm with a block-diagonal matrix-"block"; • Algorithm with a strip matrix and the calculation of the coefficients of the sparse matrix by the method of stochastic optimization-"Opt."; • Algorithm with approximation by a Markov process-"Markov's".

•
These curves are shown for three options of symbols ordering: • Without ordering-"without order"; • Simple ordering-"simple order". In this option, the symbol with the highest total power of all mutual correlation coefficients is put in the first place, and the rest are arranged in descending order of the magnitudes of their mutual correlation coefficients with the first symbols; • Serial ordering-"serial order". In this option, the symbol with the highest total power of (m − 1) cross-correlation coefficients is selected, and a set (1) Ω of (m − 1) symbols having the maximum correlation with the first symbol is set. Next, from this set, the second symbol is selected, with the largest total power of (m − 1) cross-correlation coefficients, where (m − 2) symbols are already specified and are taken from the set (1) Ω (except for the first and second selected symbols) and one symbol is selected from the rest, not included in this set. Then, the third symbol is determined according to the same principle, etc. With the number of non-zero symbols in the string m = 1, all algorithms have the same maximum Kullback distance. This case corresponds to a diagonal approximation, and in terms of demodulation characteristics, it corresponds to a conventional MMSE receiver. As m increases, as expected, the Kullback distance decreases. Moreover, for the stripe approximation of the channel model and approximation by the Markov process, the distance decreases to a greater extent than with the block approximation. Especially, a With the number of non-zero symbols in the string m = 1, all algorithms have the same maximum Kullback distance. This case corresponds to a diagonal approximation, and in terms of demodulation characteristics, it corresponds to a conventional MMSE receiver. As m increases, as expected, the Kullback distance decreases. Moreover, for the stripe approximation of the channel model and approximation by the Markov process, the distance decreases to a greater extent than with the block approximation. Especially, a strong decrease is observed when using symbol ordering. Sequential ordering is most effective for m > 2. Figure 2 shows a generalized block diagram of the MIMO detection algorithm using the proposed Markov approximation approach. It is based on the MMSE receiver, which calculates the MMSE estimates vectorX MMSE and the correlation matrix V MMSE . These parameters, after ordering, are used to calculate the parameters of the equivalent likelihood functions, which in turn are used to calculate the metrics, which are then used in a soft iterative QAM detector. The output of this MIMO detector is soft bit estimates or a loglikelihood ratio (LLR) for each bit. strong decrease is observed when using symbol ordering. Sequential ordering is most effective for m > 2. Figure 2 shows a generalized block diagram of the MIMO detection algorithm using the proposed Markov approximation approach. It is based on the MMSE receiver, which calculates the MMSE estimates vector ˆM MSE X and the correlation matrix MMSE V . These parameters, after ordering, are used to calculate the parameters of the equivalent likelihood functions, which in turn are used to calculate the metrics, which are then used in a soft iterative QAM detector. The output of this MIMO detector is soft bit estimates or a log-likelihood ratio (LLR) for each bit. Next, we consider the application of Turbo processing in the detection of the signal received after the observation model is parsed. As a result of the performed sparsification procedure, the parameters describing the following likelihood functions can be obtained: where parameters , , , It is easy to see that the likelihood function  Next, we consider the application of Turbo processing in the detection of the signal received after the observation model is parsed. As a result of the performed sparsification procedure, the parameters describing the following likelihood functions can be obtained: where parameters z i , f i , Z M−k , F M−k can be calculated using Equations (48) and (51). Let each QAM symbol x i contains information about m bits, i.e., it depends on an m-dimensional vector of binary symbols B i . The total number of bits transmitted by the vector X is mM. Therefore, we can introduce an (m × M)-dimensional vector of binary symbols B. Let us introduce a notation for the set of indices Ψ i of size m(k + 1), which joins the numbers of the bits included in the vector It is easy to see that the likelihood function Λ eq,i contains information about the transmitted bits with numbers n ∈ Ψ i . Therefore, by processing this likelihood function, the transmitted bits with numbers n ∈ Ψ i can be estimated. It should be noted that the sets Ψ i intersect with each other, so one can get several estimates of the same bit. The task of demodulation is to process all likelihood functions and evaluate each bit using all received information. To solve this problem, it is possible use Turbo processing.
Before processing the i-th likelihood function, there is an a priori distribution of binary symbols: where b n = ±1 is the transmitted information bit. The parameters describing this distribution are determined by the following expressions ν n,i,pr = p pr,n,i (b n = 1) − p pr,n,i (b n = −1) = tanh λ n,i,pr (55) Notice that the parameter ν n,i,pr is the a priori mathematical expectation of the n-th bit before processing the i-th likelihood function, and the parameter λ n,i,pr is proportional to the logarithm of the ratio of the prior probabilities of the n-th bit before processing the i-th likelihood function. At the first processing step, the combinations of all bits are equiprobable, therefore, ν n,1,pr = 0.
With this in mind, the posterior distribution is written as: where b n∈Ψ i denotes a set of symbols b n , with numbers n ∈ Ψ i . For those bits that do not belong to the set, we can obtain: Taking into account Equations (52), (54) and (57), after a series of transformations, we can obtain: where t n+ is the set of numbers of combinations t in which the value of the n-th bit in the binary bipolar representation is equal to +1 (or equal to 0 in the binary representation); t n− is the set of numbers of combinations t in which the value of the n-th bit in the binary bipolar representation is equal to −1 or is equal to 1 in the binary representation. These parameters of the posterior distribution describe an independent distribution, i.e., it is the same as the input prior distribution. Thus, these distributions can be used as a priori distributions for the next (i + 1)-th step: λ n,i+1,pr = λ n,i,ps .
As a result, we obtain a sequential multi-step procedure for calculating the posterior parameters of the distribution of the full vector of binary bits B. Figure 3 shows a block diagram of one signal-processing cycle using a sequential multi-step procedure for a model with a Markov approximation. As a result, we obtain a sequential multi-step procedure for calculating the posterior parameters of the distribution of the full vector of binary bits B. Figure 3 shows a block diagram of one signal-processing cycle using a sequential multi-step procedure for a model with a Markov approximation. The resulting multi-step signal-processing procedure at each step uses the optimal algorithm for estimating the vector of the observed binary bits. However, when passing to the next step, not the complete posterior distribution is transferred, but only the parameters describing the independent posterior distribution. Therefore, in the multi-step procedure itself, at each step, the approximation of the full posterior distribution by an independent (factorized) one is used. Obviously, this leads to losses. In such cases, itera- The resulting multi-step signal-processing procedure at each step uses the optimal algorithm for estimating the vector of the observed binary bits. However, when passing to the next step, not the complete posterior distribution is transferred, but only the parameters describing the independent posterior distribution. Therefore, in the multi-step procedure itself, at each step, the approximation of the full posterior distribution by an independent (factorized) one is used. Obviously, this leads to losses. In such cases, iterative Turbo processing allows for the reduction of losses.
The considered algorithm allows closing the loop and implementing the iterative procedure, using the posterior distribution obtained at the last step of the (l − 1)-th iteration as the prior distribution at the first step of the i-th iteration, i.e., n,M−k,ps , and therefore, in the a priori parameters λ (l) n,1,pr , in part, there is already information obtained during the processing of the i-th equivalent observation likelihood function at the previous iteration expressed through its parameter: Therefore, to avoid the duplication and incorrect accumulation of information, it is necessary to exclude the result of processing the i-th equivalent likelihood function at the (l − 1)-th iteration from the prior distribution at the i-th processing step and at the l-th iteration. Consequently, we use the following parameters: The result is an algorithm with the block diagram shown in Figure 4. The complexity of the proposed algorithm is determined by the complexity of the MMSE detector, which is proportional to , and the complexities of Turbo processing and QAM demodulation, which are proportional to 2 ( ) , where is the num- The complexity of the proposed algorithm is determined by the complexity of the MMSE detector, which is proportional to M 3 , and the complexities of Turbo processing and QAM demodulation, which are proportional to N it M2 m(k+1) , where M is the number of transmitting antennas, m is the number of bits in the QAM constellation, N it is the number of iterations in the Turbo algorithm, k is the connective parameters of the models. Variables N it and k are the parameters of the algorithm and can be changed during operation to control the complexity and characteristics of the algorithm. For N it = 1 and k = 0, the complexity and characteristics of the algorithm are the same as those of the MMSE. Note that the complexity of the ML algorithm is proportional to 2 mM , which is significantly above the proposed Turbo algorithm with the sparse transformation of the channel model.
Thus, the proposed approach to sparse the MIMO channel matrix using the Markov approximation method allows simplified Turbo processing algorithms for signal demodulation in MIMO systems.

Modeling and Verification
In order to verify the efficiency of the proposed channel matrix sparsification algorithm, link-level simulations were carried out with various types of MIMO detectors. The frame error rate (FER) is a generally accepted performance characteristic for the analysis of various algorithms used in communication systems. It allows a comparison of energy efficiency with different types of modulation, coding, and processing algorithms. Figure 5 shows the dependences of the FER on the SNR per bit (Eb/No) for a MIMO channel with a size of 8 × 8. The results were demonstrated with different approaches for the sparsification of the channel matrix (for m = 2) and with different types of MIMO detectors. The simulations were carried out using QPSK modulation and Turbo code with a rate of 1/2. There are results for the following options: 1.
"MMSE"-MMSE demodulator for the exact model. It also corresponds to the variant of the channel model approximation by a diagonal matrix (Section 3.1); 3.
"Band, Stoh., Opt"-a demodulator using a striped two-diagonal MIMO channel model (Section 3.3; Equation (28)), in which the parameters of the striped channel matrix are calculated by the stochastic optimization method and the optimal demodulator for this model is used; 5.
"Markov's, Turbo det"-a Markov approximation of the channel model (Section 3.4) with connectivity parameters and iterative detection using the method of equivalent likelihood functions (Equations (47) and (50)) and the principle of Turbo processing (two iterations); 6.
"Band, Stoh., MPA det"-iterative MPA detector (three iterations) using a twodiagonal striped MIMO channel model (Section 3.3), in which the channel strip matrix parameters are calculated by the stochastic optimization method.
The results from option (4) above are presented in order to determine the potential capabilities of different variants of the channel model approximation without taking into account the complexity of the detector implementation, i.e., without the influence of suboptimal post-processing.
The curves show that the use of the Markov approximation of the iterative algorithm adopting the method of equivalent likelihood functions and the principle of Turbo processing achieves the same performance as the approximation of the channel model by a strip sparse matrix with the optimization of the coefficients by the method of stochastic optimization and using the optimal soft demodulator. Thus, we can conclude that this approach (Markov approximation) and this demodulation method (Turbo processing + equivalent likelihood functions) achieve theoretically possible characteristics for a given value of the connectivity parameter. In this case, the loss in comparison with the optimal detector and the exact channel model is~0.8 dB. The difference in the energy efficiency between the Markov approximation (or the approximation of the channel model by a strip matrix) and the block-diagonal approximation of the channel model is about 0.5 dB, in favor of the Markov one. modulator for this model is used; 5. "Markov's, Turbo det"-a Markov approximation of the channel model (Section 3.4) with connectivity parameters and iterative detection using the method of equivalent likelihood functions (Equations (47) and (50)) and the principle of Turbo processing (two iterations); 6. "Band, Stoh., MPA det"-iterative MPA detector (three iterations) using a two-diagonal striped MIMO channel model (Section 3.3), in which the channel strip matrix parameters are calculated by the stochastic optimization method.
The results from option (4) above are presented in order to determine the potential capabilities of different variants of the channel model approximation without taking into account the complexity of the detector implementation, i.e., without the influence of suboptimal post-processing. The curves show that the use of the Markov approximation of the iterative algorithm adopting the method of equivalent likelihood functions and the principle of Turbo processing achieves the same performance as the approximation of the channel model by a strip sparse matrix with the optimization of the coefficients by the method of stochastic optimization and using the optimal soft demodulator. Thus, we can conclude that this approach (Markov approximation) and this demodulation method (Turbo processing + equivalent likelihood functions) achieve theoretically possible characteristics for a given value of the connectivity parameter. In this case, the loss in comparison with the optimal Figure 5. Dependences of the frame error rate (FER) on the signal-to-noise ratio per bit for an 8 × 8 MIMO channel with different options for channel matrix sparsification and for different detectors (QPSK modulation and Turbo code with a rate of 1/2).
The MPA demodulation method (variant 6) is similar in complexity in comparison to variant 5 but demonstrates losses of about 0.25 dB. Figure 6 shows similar curves for a coding rate of 3/4. It also demostrates dependencies on different connectivity coefficients of the Markov model (k = 1, 2, 3).  Figure 6 shows similar curves for a coding rate of ¾. It also demostrates dependencies on different connectivity coefficients of the Markov model (k = 1, 2, 3). For a given coding rate, the difference between the block approximation and the Markov model is about 1 dB in favor of the Markov model. An increase in the connectivity coefficient of the Markov model increases the energy efficiency but, at the same time, leads to a complication of the demodulation algorithm. For k = 3, which is equivalent to the number of non-zero elements in the channel matrix m = 4 (half of the total number of For a given coding rate, the difference between the block approximation and the Markov model is about 1 dB in favor of the Markov model. An increase in the connectivity coefficient of the Markov model increases the energy efficiency but, at the same time, leads to a complication of the demodulation algorithm. For k = 3, which is equivalent to the number of non-zero elements in the channel matrix m = 4 (half of the total number of symbols), the loss of the proposed approach with respect to the optimal demodulation algorithm is 1-1.5 dB. Figure 7 shows the FER curves for a 16 × 16 MIMO channel, 16QAM modulation, and Turbo coding with a rate of 3/4. The MIMO detection algorithm with Markov approximation (with a connectivity coefficient k of 1) and Turbo processing with different numbers of iterations are demonstrated. It can be observed that the second iteration leads to an improvement in the energy characteristics by only 0.15 dB. The third iteration does not lead to an improvement in the performance. This is largely due to a fairly effective procedure of symbols ordering. Compared to the MMSE algorithm (diagonal approximation of the model) for a given MIMO channel configuration, modulation, and coding parameters, the gain in the performance is in the range of 8-14 dB.

Conclusions
In this paper, efficient methods to approximate the MIMO channel model by sparse matrices or with an equivalent Markov process were introduced and thoroughly analyzed. The proposed approach allows the use of iterative detection methods and leverages the complexity of the implementation of the MIMO detector and its energy efficiency.
Based on the results of link-level simulations, it was demonstrated that the utilization of the proposed approach makes it possible to achieve performance characteristics of a MIMO communication system that are close to the theoretical maximum but with a significantly less complexity. For example, the reduction in the complexity compared to the optimal ML receiver is ~60 times, whereas the SNR decrease does not exceed 0.5-1 dB Compared to the MMSE algorithm (diagonal approximation of the model) for a given MIMO channel configuration, modulation, and coding parameters, the gain in the performance is in the range of 8-14 dB.

Conclusions
In this paper, efficient methods to approximate the MIMO channel model by sparse matrices or with an equivalent Markov process were introduced and thoroughly analyzed. The proposed approach allows the use of iterative detection methods and leverages the complexity of the implementation of the MIMO detector and its energy efficiency.
Based on the results of link-level simulations, it was demonstrated that the utilization of the proposed approach makes it possible to achieve performance characteristics of a MIMO communication system that are close to the theoretical maximum but with a significantly less complexity. For example, the reduction in the complexity compared to the optimal ML receiver is~60 times, whereas the SNR decrease does not exceed 0.5-1 dB in an 8 × 8 MIMO configuration, QPSK modulation, with two iterations and a connectivity of the approximated Markov model k of 1.
The proposed methods are based on the MMSE solution and can be used as the SNR deteriorates. If the MMSE algorithm provides required FER characteristics, further processing can be excluded. However, if the quality of the MMSE detection is not satisfactory (low SNR), then the first-or second-order Markov model is used. In most cases, it is sufficient to use one iteration. Funding: This paper has been supported by the RUDN University Strategic Academic Leadership Program (the recipient is Dmitry Petrov).