Iterative QR-Based SFSIC Detection and Decoder Algorithm for a Reed–Muller Space-Time Turbo System

An iterative QR-based soft feedback segment interference cancellation (QRSFSIC) detection and decoder algorithm for a Reed–Muller (RM) space-time turbo system is proposed in this paper. It forms the sufficient statistic for the minimum-mean-square error (MMSE) estimate according to QR decomposition-based soft feedback successive interference cancellation, stemmed from the a priori log-likelihood ratio (LLR) of encoded bits. Then, the signal originating from the symbols of the reliable segment, the symbol reliability metric, in terms of an a posteriori LLR of encoded bits which is larger than a certain threshold, is iteratively cancelled with the QRSFSIC in order to further obtain the residual signal for evaluating the symbols in the unreliable segment. This is done until the unreliable segment is empty, resulting in the extrinsic information for a RM turbo-coded bit with the greatest likelihood. Bridged by de-multiplexing and multiplexing, an iterative QRSFSIC detection is concatenated with an iterative trellis-based maximum a posteriori probability RM turbo decoder as if a principal Turbo detection and decoder is embedded with an iterative subordinate QRSFSIC detection and a RM turbo decoder, exchanging each other’s detection and decoding soft-decision information iteratively. These three stages let the proposed algorithm approach the upper bound of the diversity. The simulation results also show that the proposed scheme outperforms the other suboptimum detectors considered in this paper.


Introduction
Multiple-input multiple-output (MIMO) systems achieve high data rate transmission and substantial gains in channel capacity over a rich-scattering environment [1,2], which makes very high spectral efficiency available for the emerging wireless standard IEEE 802.11n/ac [3], etc. Moreover, if a powerful channel code, such as turbo codes [4], can be serially concatenated with this architecture, a kind of MIMO system, i.e., the turbo space-time system, is obtained in order to achieve near-capacity performance [5,6].
Endowed with the turbo learning principle, iterative detection (inner decoder) and decoder (outer decoder) receivers for the turbo space-time system [5], detecting and decoding, by exchanging soft bits of information mutually, can approximately approach optimal performance with reduced complexity [5,6]. Figure 1 presents the configuration of a Reed-Muller space-time turbo system with a receiver based on a QR-based SFSIC detection concatenated with an iterative trellis-based MAP RM turbo decoder. In this system, there are N t transmit antennas and N r receive antennas. Let M b be the number of symbols in the complex constellation C. The user's information bits u are first encoded into the Reed-Muller turbo code stream by the outer encoder, which is then bit-interleaved using an offline-designed pseudo-random interleaver referred to as Π. Followed by de-multiplexing, the high-speed RM turbo code stream transfers into low-speed N t parallel sub-vectors. In the end, each sub-vector is independently mapped into the transmitted vector constellation symbol x whose entries [x 1 , · · · , x m , · · · , x N t ] T are chosen from C, where x m is a symbol transmitted out of the mth transmit antenna at the ith interval. The received N r × 1 vector can be represented by:

Reed-Muller Space-Time Turbo System Model
where H denotes the N r × N t channel matrix, the elements of which are independent and identically distributed (i.i.d.) with the zero-mean unit-variance complex Gaussian distribution. η is a N r × 1 complex white Gaussian noise with zero mean and a covariance of 2σ 2 I, where I is an N r × N r identity matrix.  Figure 1 presents the configuration of a Reed-Muller space-time turbo system with a receiver based on a QR-based SFSIC detection concatenated with an iterative trellis-based MAP RM turbo decoder. In this system, there are Nt transmit antennas and Nr receive antennas. Let Mb be the number of symbols in the complex constellation C. The user's information bits u are first encoded into the Reed-Muller turbo code stream by the outer encoder, which is then bit-interleaved using an offline-designed pseudo-random interleaver referred to as Π. Followed by de-multiplexing, the high-speed RM turbo code stream transfers into low-speed Nt parallel sub-vectors. In the end, each sub-vector is independently mapped into the transmitted vector constellation symbol x whose

Reed-Muller Space-Time Turbo System Model
x is a symbol transmitted out of the mth transmit antenna at the ith interval. The received Nr × 1 vector can be represented by: where H denotes the Nr × Nt channel matrix, the elements of which are independent and identically distributed (i.i.d.) with the zero-mean unit-variance complex Gaussian distribution. η is a Nr × 1 complex white Gaussian noise with zero mean and a covariance of Figure 1. The configuration of a Reed-Muller space-time turbo system with a receiver based on a QR-based SFSIC detector and decoder.
In order to obtain the most likely information vector, we apply the statistical features among all components of the received signals. By doing this, the multi-stream interference from other transmit antennas is mitigated, and different types of detectors are also acquired.
The MAP algorithm selects the most likely encoded bit with respect to all combinations of transmitted signals by maximizing the log-likelihood ratio (LLR) [4,18] as given by: The Log MAP algorithm must calculate where Γ is the code constraint length sequence corresponding to a sequence of modulated symbols to be transmitted during the ith interval, the metric being expressed in Equation (2), and the vector that yields the largest metric being selected. It is a considerably complicated NP-complete (nondeterministic polynomial time-complete) problem when NtΓ is large. Although the bit error rate (BER) of it is considerably low, its computational complexity is unacceptable in reality. To reduce the computational complexity, an alternative approach is adopted. By employing the sphere decoder technique, the list sphere In order to obtain the most likely information vector, we apply the statistical features among all components of the received signals. By doing this, the multi-stream interference from other transmit antennas is mitigated, and different types of detectors are also acquired.
The MAP algorithm selects the most likely encoded bit with respect to all combinations of transmitted signals by maximizing the log-likelihood ratio (LLR) [4,18] as given by: where X(d k,m = ±1) is the set of all possible RM turbo-coded bit vectors, the mth bit position of the constellation symbol for the kth transmit antenna, which is fixed with d k,m = ±1. The Log MAP algorithm must calculate 2 N t Γ , where Γ is the code constraint length sequence corresponding to a sequence of modulated symbols to be transmitted during the ith interval, the metric being expressed in Equation (2), and the vector that yields the largest metric being selected. It is a considerably complicated NP-complete (nondeterministic polynomial time-complete) problem when N t Γ is large. Although the bit error rate (BER) of it is considerably low, its computational complexity is unacceptable in reality. To reduce the computational complexity, an alternative approach is adopted. By employing the sphere decoder technique, the list sphere decoder [6,7] and other improved search vector versions [8] that are composed of the constellation points are limited to a sphere of radius C centered at the received signal. These approaches aim to implement suboptimum detection, with its complexity depending on the size of the candidate vector set to be searched for detections per information bit. However, it is still a rather complicated NP-complete problem when the size of the candidate vector set is large. Furthermore, the SIOF detectors are proposed to avoid EP and reduce the complexity. It is to further devised that the suboptimum decoders, which are based on the LLR function, have both lower computational complexities and fast convergence rates, that serve as the motivation to write this paper.

The Novel Iterative QR-Based SFSIC Detection and Decoder Algorithm
The proposed scheme, named the iterative QR-based soft feedback segment interference cancellation detection and decoder algorithm, is depicted on the right-hand side of the Figure 1. It is composed of an iterative QRSFSIC detector (inner decoder) and an iterative trellis-based MAP RM turbo decoder (outer decoder), which mitigates the multi-stream interference with inner iterations executed by the inner decoder and fulfills error correcting through outer iterations by the outer decoder to obtain the desired information bit by exchanging information between inner and outer decoders. The operation of iterative detection and the decoder algorithm is to maximize a posterior LLR, which is made up of a priori or old information and new information, or more technically, extrinsic information, acquired during this iterative detection and decoding procedure, the latter of which is only information exchanged in processing cycles.

Iterative QR-Based SFSIC Detector (Inner Decoder)
The iterative QRSFSIC, based on a progressive partition of the transmitted symbols into the reliable/residual symbol segment according to a posterior LLR, gives rise to inner iterations to evaluate the MMSE estimate of residual symbols which resort to soft feedback and interference cancellation of the reliable segment. Its crucial idea is to save computation by QR decomposition, which results in a simple successive detection algorithm with reduced complexity. When it is applied to detect multi-transmit antenna signals, its novelties can be stated as follows.

Forming a Reliability Metric Based on a Posterior Information
At first, in terms of QR decomposition of the channel matrix H = QR, where Q is a unitary matrix, and R is the upper triangular, multiplying the received signal Y with Q H , the algorithm yields the sufficient statistic [13,16] for soft feedback successive interference cancellation of x i n : where the first term on the right of the third equal sign comprises the desired symbol that needs to be detected and the rest of the terms represent the interference that is not completely cancelled plus the noise. x a i n are the soft estimates of x i n derived from the a priori LLRs, written as: where L a i n ,1: the set of all constellation points, each of which corresponds to an encoded bit vector d i n ,1 , . . . , d i n ,M b according to a certain mapping.
Then, the MMSE estimatex i n is denoted by [13,16]: where w i n is a weighted coefficient vector of the MMSE estimate. ψ a,k:l indicates the covariance matrix of [x k , . . . , x l ] T , given a prior LLRs. Due to the interleaver, symbols from each transmit antenna would be regarded as being independent. Thus, each non-diagonal element of ψ a,k:l is zero. The qth diagonal element Ψ i n (q, q) of Ψ i n can be represented as: Similarly, if ψ f ,k:l denotes the covariance matrix of [x k , . . . , x l ] T , given a posterior LLRs, to obtain the qth diagonal element Ψ i n (q, q), we should let Thus, if the proposed algorithm operates l − 1 inner iterations, the symbol reliability metric Λ n can be obtained according to the posterior LLR of encoded bits: where L f i n ,i , composed of a prior LLR L a1 i n ,i and an extrinsic LLR L e1 i n ,i , is the a posterior LLR of the ith encoded bit from the i n th antenna, expressed by: When Λ i n > τ (1 ≤ i n ≤ N t ), the detected symbol transmitted from the i n th antenna is a reliable one such that it is saved in the reliable symbol segment φ while the rest is put into the residual symbol segment ϑ. Obviously, τ is the threshold whose value will affect the number of symbols whose interference is to be cancelled in the next inner iteration. However, the proper τ is difficult to exploit. Based on the trade-off between symbols with a larger reliability metric, which should be enclosed in the reliable symbol segment, and relatively lower computational effort, we choose: where β = 0.85 is the empirical value [13]. To perform the lth inner iteration for detecting the symbols in segment ϑ l , based on an a posteriori estimate x i f , we obtain the residual signal [13,16] after canceling the interference resulting from the segment φ l , i.e.,: Thereafter, the MMSE estimatex l k of the residual symbol segment ϑ l can be calculated as [13,16]: where A k is the variance of an a priori estimate of x k for k ∈ ϑ l and the variance of an a posteriori estimate for k ∈ φ l , which can be acquired by Equations (6) and (7).

Calculating the Extrinsic LLR and A Posterior LLR of Encoded Bits
Furthermore, if we rewrite Equation (5) as [13,16,19]: where γ i n is the residual interference plus noise that can be approximately regarded as a Gaussian random variable with mean zero and variance [19], expressed as: The extrinsic LLR of d i n ,m is denoted by: where Θ 1 m and Θ 0 m represents the set of constellations of the mth bit that could be 1 and 0. Combining the extrinsic LLR with a priori LLRs, the a posteriori LLR is obtained according to Equation (9), which is then exploited to calculate the reliability metric of symbols in the residual segment ϑ l . Hereafter, we continue to iteratively detect symbols in ϑ l until ϑ l becomes void. Finally, followed by de-interleaving, the extrinsic information is fed into the outer decoder as a priori information.

RM-Turbo Iterative Decoder (Outer Decoder)
The extrinsic information obtained from the inner decoder after de-interleaving is inputted into the outer decoder, which is made up of a pair of component decoders, i.e., the horizontal decoder and the vertical decoder, which are separated by two transposes. The configuration of the outer decoder, characterized by the feedback structure and the a posteriori probability (APP) LLR of the extrinsic information exchange among different component minimal trellis-based MAP decoders in an iterative way, is depicted in Figure 2. The working principle of the outer decoder can be stated as shown below. priori information.

RM-Turbo Iterative Decoder (Outer Decoder)
The extrinsic information obtained from the inner decoder after de-interleaving is inputted into the outer decoder, which is made up of a pair of component decoders, i.e., the horizontal decoder and the vertical decoder, which are separated by two transposes. The configuration of the outer decoder, characterized by the feedback structure and the a posteriori probability (APP) LLR of the extrinsic information exchange among different component minimal trellis-based MAP decoders in an iterative way, is depicted in Figure 2. The working principle of the outer decoder can be stated as shown below.

Generating Extrinsic Information for the Vertical Decoder
Once the soft bit stream coming from the inner decoder is de-multiplexed and decomposed to obtain the horizontal soft systematic bit stream and soft parity bit stream, the APP LLR of the extrinsic information for systematic bit

Generating Extrinsic Information for the Vertical Decoder
Once the soft bit stream coming from the inner decoder is de-multiplexed and decomposed to obtain the horizontal soft systematic bit stream and soft parity bit stream, the APP LLR of the extrinsic information for systematic bit d i n ,m , i.e., u i n ,m , of the horizontal decoder in the q outer iteration is defined as [4,17]: where L c is called the reliability value of the channel; L − (d i n ,m , q) denotes the received de-multiplexed soft output of the horizontal decoder, associated with the information bit u i n ,m , estimated by the outer decoder; L | e T (i n , m, q) is the transpose of the [i n , m]th APP LLR extrinsic information L | e (i n , m, q) of the vertical decoder, which is equal to L − a (i n , m, q), the a priori LLR value for the [i n , m]th information bit, at the first iteration; and L − 1 (i n , m, q) is the APP LLR associated with the information bit u i n ,m in the q-th iteration, expressed by: where (s , s, d i n ,k = +1) and (s , s, d i n ,k = −1) denote the set of the pair of states that connects the trellis state s at time k − 1 to the state s at time k, respectively; and d i n ,j<k denotes the portion of the received sequence from bit 0 up to bit k − 1. Similarly, the received sequence from bit k up to bit m − 1 is represented by d i n ,j>k . Moreover, α k (s ), k (s), and λ k (s , s) are forward, backward, and branch metric recursions of the MAP decoding, obtained by: where L(d i n ,k ; L − (d i n ,k , q)) = L c · L − (d i n ,k , q) + L − a (i n , m, q) for information bits and L(d i n ,k ; L − (d i n ,k , q)) = L c · L − (d i n ,k , q) for parity bits.

Engendering Extrinsic Information and Making Hard Decisions
Meanwhile, the horizontal decoder will use L | e T (i n , m, q) in place of L − a (i n , m, q) after the first iteration, the transpose of which, computed by the vertical decoder is, in turn: where L | (d i n ,m , q), estimated by the outer decoder, denotes the received de-multiplexed soft output of the vertical decoder, associated with the information bit u i n ,m ; L − e T (i n , m, q) is the transpose of the [i n , m]th APP LLR extrinsic information L − e (i n , m, q) of the horizontal decoder; L | 2 (i n , m, q) is the APP LLR associated with the information bit u i n ,m in the qth iteration, which can be defined in the manner similar to Equations (18) and (19). Note that, in Equation (20), the APP LLR extrinsic information L | e (i n , m, q) is obtained in such a way that subtracting the terms, namely,L c · L | (d i n ,m , q) and L − e T (i n , m, q), removes the effect of the current information bit d i n ,m from L | 2 (i n , m, q), leaving only the effect of the parity constraints, thus providing an independent estimate of information bit d i n ,m to the horizontal decoder, in addition to the received soft channel LLR L c · L − (d i n ,m , q). The iterative process is usually terminated after a predetermined number of iterations, when the soft-output value stabilizes and changes slightly between successive iterations. Furthermore, Equations (17) and (18) can be relatively straightforwardly generalized to analyze and discuss the soft parity bits streams. Then, we multiplex the soft systematic bit streams and the soft parity bit streams in the same way as that in the RM turbo encoding [4,17] to obtain the encoded bits' extrinsic information which is sent to the iterative QRSFSIC detector as an a priori LLR for the next iteration.
With the increasing number of iterations, the contribution of extrinsic information from the inner and outer decoders to the improvement of performance of the receiver diminishes. Finally, the availability vanishes, that is to say the receiver converges. At this time, the vertical decoder will not only combine both extrinsic information values in computing the soft output L | 2 (i n , m, q), but was also make the difficult decision after de-multiplexing to obtain the estimatedû i n ,m of the information bit.

Simulation Results and Discussion
In this section, a simulation investigation which, in terms of a time-domain Monte Carlo method, was designed in MATLAB (Matrix Laboratory) R2010a under the Windows 7 Ultimate 64-bit operating system, using a computer with a 3.4-GHz Intel(R) Core(TM) i7-2600K CPU, is carried out to show the performance of the proposed scheme in this context. We use three iterative detection schemes, i.e., an iterative QR-based soft feedback segment interference cancellation detector and decoder (IQRSFSICDD), an iterative soft input-soft output and soft feedback detector and decoder (ISIOFDD), and an iterative minimum mean square error parallel interference cancellation detector and decoder (IMMSEPICDD). The simulation results are described in two aspects, i.e., the performance of ISISOICSPDD over a number of iterations between the detector and the decoder loop (the major iteration) and the bit error rate (BER), which, counted by comparisons between the estimated bits and the transmitted bits, is measured as a function of signal-to-noise ratio SNR (signal-to-noise ratio), which is followed by a discussion of the diversity order of the proposed scheme.
Let us consider that, in a Rayleigh fading channel, there is an RM space-time turbo system, shared by N t transmitters and N r receivers. The length of the frame is 1000 symbols and the transmitted bits for each SNR is larger than 10/BER, which is consistent with the requirements of the Monte Carlo method. The transmitters employ the 4-ary quadrature amplitude modulation (4-QAM) with Gray mapping. An overall code rate R = 121/240, or R = 676/988 outer channel RM turbo code with the same RM (2, 4), or an RM (3,5) [17,18] code in each dimension is used.
In Figure 1, the pseudo-random interleaver (∏) is used not only to de-correlate the fading channel and maximize the diversity order of the system, but also to eliminate the correlation in the sequence of the RM turbo coded bits, which is crucial for the proposed algorithm. The number of iterations of the RM turbo decoder (outer iteration) is limited to 5. Figure 3 depicts the BERs of the IQRSFSICDD versus SNR values over a range of iterations, with the transmitters using an R = 121/240 outer channel RM turbo code and the 4-QAM for N t = N r = 3, and an R = 676/988 outer channel RM turbo code accompanied by the 4-QAM for N t = N r = 4, respectively. Then, at a certain input SNR or, more accurately, an input SNR threshold, the BER curves go into a waterfall region with their characteristic sharp drop, where the proposed algorithm would converge to zero BER within a finite number of iterations. Figure 3 has also shown that though the greater number of iterations would lead to the lower bit error rate, the gradient of the BER with respect to the number of iterations for the proposed algorithm does not obviously decrease when the iteration size is increased from 4 to 5. Henceforth, we fix the iterations to 5 as a fundamental trade-off between the BER and computational complexity of the proposed algorithm in all our following simulations. Figure 4 represents the BERs of three iterative detection schemes versus SNR, with the transmitters using the R = 121/240 outer channel RM turbo code and the 4-QAM for Nt = Nr = 3, and the R = 676/988 outer channel RM turbo code accompanied by the 4-QAM for Nt = Nr = 4, respectively.
The BER of the proposed scheme is not only a few orders of magnitude lower than those of the IMMSEPICDD, but also lower than that of the ISIOFDD. However, when the source employs the R = 121/240 outer channel RM turbo code, though the BER of the proposed scheme is somewhat lower than that of the ISIOFDD, the difference between them is not significant. This is because, in this circumstance, the information coded with the lower overall code rate, i.e., R = 121/240, the performances of these two detection schemes are almost characterized by the error protection (UEP) For the graphic clarification, not all of the simulation results are shown and the number of the iteration being one means that there is no soft information exchanging between the iterative IQRSFSIC detector and minimal trellis-based MAP RM turbo decoder. From Figure 3, we can obtain the following observation: Originally, in the relatively lower input SNR interval, the BER of the proposed algorithm is almost not favorable to the numbers of detector and decoder iterations being executed.
Gradually, with the increase in the input SNR, the BERs between successive iterations could be distinguished.
Then, at a certain input SNR or, more accurately, an input SNR threshold, the BER curves go into a waterfall region with their characteristic sharp drop, where the proposed algorithm would converge to zero BER within a finite number of iterations. Figure 3 has also shown that though the greater number of iterations would lead to the lower bit error rate, the gradient of the BER with respect to the number of iterations for the proposed algorithm does not obviously decrease when the iteration size is increased from 4 to 5. Henceforth, we fix the iterations to 5 as a fundamental trade-off between the BER and computational complexity of the proposed algorithm in all our following simulations. Figure 4 represents the BERs of three iterative detection schemes versus SNR, with the transmitters using the R = 121/240 outer channel RM turbo code and the 4-QAM for N t = N r = 3, and the R = 676/988 outer channel RM turbo code accompanied by the 4-QAM for N t = N r = 4, respectively.

Conclusions
A novel, computationally-efficient IQRSFSICDD is proposed for an RM space-time turbo system. It uses a QR decomposition of the channel matrix and directs it to a simple soft feedback successive interference cancellation based on an a priori LLR of encoded bits which, in turn, economizes computation compared to the algorithm in [13] with the same error performance. Then, the original signal is divided repeatedly into reliable/unreliable segments according to whether an a posteriori LLR of encoded bits is larger than some threshold, which assures the optimal detection sorting in case of the risk of EP. Moreover, RM-turbo codes, each dimension of which is represented by the same RM code with an optimal sectionalized trellis, in the sense of minimizing the number of multiplication operations, is utilized for suitable formation of an iterative trellis-based MAP RM turbo decoder. In this way, the proposed scheme shortens the delay of iterations between the inner decoder and the outer decoder to a certain degree which, in turn, lower the complexity of the proposed algorithm. Simulation results show that the performance of the proposed algorithm outperforms the other suboptimum detectors considered in this paper. Finally, an upper bound of the diversity acquired according to intuitional deduction can be relatively straightforwardly applied to analyze and discuss all types of iterative detectors and decoders for RM space time turbo systems.
The BER of the proposed scheme is not only a few orders of magnitude lower than those of the IMMSEPICDD, but also lower than that of the ISIOFDD. However, when the source employs the R = 121/240 outer channel RM turbo code, though the BER of the proposed scheme is somewhat lower than that of the ISIOFDD, the difference between them is not significant. This is because, in this circumstance, the information coded with the lower overall code rate, i.e., R = 121/240, the performances of these two detection schemes are almost characterized by the error protection (UEP) property of the shortened codes. However, when the overall code rate is higher and the number of transmit antennas is increased, the higher diversity order makes the proposed scheme achieve the best performance. These conclusions also coincide with the results provided by [4,17] for the RM turbo code since both of the proposed scheme and the ones in [4,17] are endowed with stronger component codes, i.e., the RM codes, as extended Hamming codes, to obtain additional coding gain though the scenario for them to operate are far from each other. Moreover, according to d gain = lim SNR→∞ − log P e (SNR) log(SNR) where P e is the bit error rate [20] and SNR is the signal-to-noise ratio, we can discover that the diversity gain of the proposed scheme approximates diversity order 9 and 16 [20] as the SNR increases. Again, this demonstrates that the diversity order of the proposed algorithm is N t × N r . These conclusions are also consistent with the results provided by [14,15] for the turbo MIMO systems. Its performance is superior to all other schemes considered in this context.

Conclusions
A novel, computationally-efficient IQRSFSICDD is proposed for an RM space-time turbo system. It uses a QR decomposition of the channel matrix and directs it to a simple soft feedback successive interference cancellation based on an a priori LLR of encoded bits which, in turn, economizes computation compared to the algorithm in [13] with the same error performance. Then, the original signal is divided repeatedly into reliable/unreliable segments according to whether an a posteriori LLR of encoded bits is larger than some threshold, which assures the optimal detection sorting in case of the risk of EP. Moreover, RM-turbo codes, each dimension of which is represented by the same RM code with an optimal sectionalized trellis, in the sense of minimizing the number of multiplication operations, is utilized for suitable formation of an iterative trellis-based MAP RM turbo decoder. In this way, the proposed scheme shortens the delay of iterations between the inner decoder and the outer decoder to a certain degree which, in turn, lower the complexity of the proposed algorithm. Simulation results show that the performance of the proposed algorithm outperforms the other suboptimum detectors considered in this paper. Finally, an upper bound of the diversity acquired according to intuitional deduction can be relatively straightforwardly applied to analyze and discuss all types of iterative detectors and decoders for RM space time turbo systems.