A Robust Semi-Blind Receiver for Joint Symbol and Channel Parameter Estimation in Multiple-Antenna Systems

For multiple-antenna systems, techniques for joint symbol and channel parameter estimation have been developed in recent works. However, existing techniques suffer from problems such as performance degradation and the high cost of prior information. In this paper, a tensor space-time coding scheme for multiple-antenna systems is considered. This scheme allows spreading, multiplexing, and allocating information symbols associated with multiple transmitted data streams. We show that the received signal can be formulated as a third-order tensor satisfying a Tucker-2 model, and a robust semi-blind receiver is then developed based on the optimized Levenberg–Marquardt (LM) algorithm. Under the assumption that the instantaneous channel state information (CSI) is unknown at the receiving end, the proposed semi-blind receiver jointly estimates the information symbols and channel parameters efficiently. The proposed receiver achieves better estimation performance than existing semi-blind receivers and still performs well when the channel becomes strongly correlated. Moreover, the proposed semi-blind receiver can be extended to multi-user massive multiple-input multiple-output (MIMO) systems for joint symbol and channel estimation. Computer simulation results demonstrate the effectiveness of the proposed receiver.


Introduction
Multiple-antenna techniques are well known to provide spatial diversity and multiplexing gains [1][2][3]. Over the last few decades, the benefits of multiple-antenna communications have been verified in both theory and practice. In addition, tensor-based signaling approaches that exploit several signal dimensions, such as time, space, and code, are regarded as promising techniques for improving the information transmission rate and enhancing communication reliability [4][5][6]. Against this background, the problem of joint symbol and channel estimation has been addressed with tensor-based signaling approaches, and a number of semi-blind or blind receivers have been proposed for multiple-input multiple-output (MIMO) systems.
A parallel factor (PARAFAC) [7] based receiver is proposed in [8] using the Khatri-Rao space-time (KRST) coding scheme, which can achieve a flexible tradeoff between error performance and transmission efficiency. In [9], the authors extend the KRST coding scheme by using linear constellation precoding and then develop several semi-blind receivers. These semi-blind receivers allow joint symbol and channel estimation without requiring pilot sequences for instantaneous channel state information (CSI) acquisition. In [10], the authors develop a new tensor-based receiver for channel estimation in MIMO relay systems by using PARAFAC analysis. A low-complexity PARAFAC-based channel estimation scheme for non-regenerative MIMO relay systems is developed in [11]. In [12], a novel semi-blind receiver is derived using a multiple KRST coding scheme for joint symbol and channel estimation. More recently, a nested PARAFAC-based receiver for cooperative MIMO communications is proposed in [13], and three-step and double two-step alternating least squares (ALS) algorithms are proposed to fit the nested PARAFAC model for estimating system parameters. For millimeter wave (mmWave) massive MIMO systems, a PARAFAC decomposition-based algorithm is developed in [14] to jointly estimate the channel parameters of multiple users. In [15], the algorithm in [14] is extended to mmWave MIMO orthogonal frequency division multiplexing (MIMO-OFDM) systems for channel estimation, and Cramér-Rao bound (CRB) results for the channel parameters are also derived. Considering channel estimation in the presence of pilot contamination for multi-cell massive MIMO systems, a new PARAFAC-based approach is proposed in [16] to jointly estimate directions of arrival, fading coefficients, and delays. Although these works [8][9][10][11][12][13][14][15][16] consider different design approaches, their common feature is the use of the PARAFAC model, which requires knowledge of the first column or row of one loading matrix to eliminate the scaling ambiguity. Furthermore, the ALS algorithm used in these receivers exhibits a convergence problem when ill-conditioned factor matrices exist [17].
In contrast to the ALS algorithm, the Levenberg-Marquardt (LM) algorithm updates all the parameters to be estimated at the same time. The LM algorithm has been successfully used to fit several tensor models, copes with collinearity problems, and provides quadratic convergence [18][19][20]. An LM algorithm is first proposed for fitting the PARAFAC model in [18]. In [19], the authors apply an LM algorithm to the decomposition of the Block Component Model (BCM) in the uplink of a wideband direct-sequence code-division multiple access (DS-CDMA) system. Recently, an LM algorithm was developed in [20] to jointly estimate the information symbol and channel matrices for a generalized PARATUCK2 model. As an iterative algorithm, the LM algorithm is also sensitive to initialization. Thus, optimizing the initial value is important for improving the performance of the LM algorithm.
In [21], a tensor-based space-coding scheme using the PARATUCK2 model is developed. For the PARATUCK2 model, the number of channel uses can differ from one transmitted data stream to another. In [22], a generalized PARATUCK2 model is proposed by exploiting tensor space-time (TST) coding. Recently, a Kronecker product least squares (KPLS) receiver was proposed in [23] to estimate the symbol and channel matrices. More recently, it was shown in [24] that the KPLS receiver can be extended to all tensor-based systems. Although the KPLS receiver is a non-iterative and low-complexity solution, it requires the related core tensor unfolding to be right-invertible, which is a relatively harsh condition in signal design.
Inspired by [21] and [22], we consider a simple tensor space-time coding scheme for multiple-antenna systems, along with an efficient receiver. The allocation factor and the space-time code factor in the TST coding scheme in [22] are independent, while the allocation factor in our coding scheme is also a three-dimensional space-time code factor. Thanks to the special structure of the proposed coding scheme, the received signal can be constructed as a Tucker-2 model [25,26], which has a uniqueness property under suitable conditions. Then, a robust semi-blind receiver based on the optimized LM algorithm is presented for joint channel and symbol estimation. Uniqueness and identifiability issues for the constructed Tucker-2 model are also discussed in this paper. Compared with existing receivers, the proposed receiver has better estimation performance. Moreover, the proposed semi-blind receiver can be extended to multi-user massive MIMO systems. For the low-rank channel, the proposed receiver still performs well for joint symbol and channel estimation even with a shorter code length, fewer information symbols, and a larger number of data streams.
The organization of this paper is as follows. Section 2 presents a brief overview of the Tucker model. In Section 3, the system model is presented and the associated tensor signal model is formulated. Section 4 briefly reviews the receiver with the ALS algorithm and describes the proposed semi-blind receiver based on the optimized LM algorithm. Section 5 extends the proposed semi-blind receiver to multi-user massive MIMO systems for joint symbol and channel estimation. In Section 6, simulation results are shown to demonstrate the performance of our semi-blind receiver. Conclusions are drawn in Section 7.
Notation: Scalars, vectors, matrices, and tensors are denoted by lower-case letters ($a, b, \ldots$), boldface lower-case letters ($\mathbf{a}, \mathbf{b}, \ldots$), boldface capitals ($\mathbf{A}, \mathbf{B}, \ldots$), and underlined boldface capitals ($\underline{\mathbf{A}}, \underline{\mathbf{B}}, \ldots$), respectively. $\mathbf{A}^T$, $\mathbf{A}^H$, $\mathbf{A}^{-1}$, and $\mathbf{A}^{\dagger}$ represent the transpose, conjugate transpose, inverse, and Moore-Penrose pseudo-inverse of the matrix $\mathbf{A}$, respectively. $\|\mathbf{A}\|_F$ denotes the Frobenius norm of $\mathbf{A}$. $\mathbf{I}_M$ denotes the $M \times M$ identity matrix. The operator $\mathrm{vec}(\cdot)$ stacks the columns of its matrix argument into a vector, while $\mathrm{unvec}(\cdot)$ represents the inverse vectorization operation. The Kronecker matrix product is denoted by $\otimes$. The term $D_i(\mathbf{A})$ denotes the diagonal matrix formed from the $i$-th row of $\mathbf{A}$.

Tucker Model
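This section first presents a brief overview of the Tucker model, and then focuses on the Tucker-2 model used in this work. For an $N$th-order tensor $\underline{\mathbf{T}} \in \mathbb{C}^{I_1 \times \cdots \times I_N}$, a Tucker-$N$ model, or Tucker model, is defined in the following scalar form [26]:

$$t_{i_1,\ldots,i_N} = \sum_{r_1=1}^{R_1} \cdots \sum_{r_N=1}^{R_N} g_{r_1,\ldots,r_N} \prod_{n=1}^{N} a^{(n)}_{i_n,r_n} \qquad (1)$$

where $i_n = 1, \ldots, I_n$ for $n = 1, \ldots, N$, and $a^{(n)}_{i_n,r_n}$ and $g_{r_1,\ldots,r_N}$ stand for typical elements of the matrix factor $\mathbf{A}^{(n)} \in \mathbb{C}^{I_n \times R_n}$ and the core tensor $\underline{\mathbf{G}} \in \mathbb{C}^{R_1 \times \cdots \times R_N}$, respectively. Using the mode-$n$ product representation, the model (1) can be written as:

$$\underline{\mathbf{T}} = \underline{\mathbf{G}} \times_1 \mathbf{A}^{(1)} \times_2 \cdots \times_N \mathbf{A}^{(N)} \qquad (2)$$

where $\underline{\mathbf{G}} \times_n \mathbf{A}^{(n)}$ denotes the mode-$n$ product of $\underline{\mathbf{G}}$ and $\mathbf{A}^{(n)}$ along the $n$-th mode, which gives a tensor $\underline{\mathbf{W}}$ with typical element

$$w_{r_1,\ldots,r_{n-1},i_n,r_{n+1},\ldots,r_N} = \sum_{r_n=1}^{R_n} a^{(n)}_{i_n,r_n}\, g_{r_1,\ldots,r_N} \qquad (3)$$

It is known that the Tucker model is not essentially unique [26], which restricts its application. Its matrix factors can only be determined up to nonsingular transformations characterized by nonsingular matrices. However, some low-order Tucker models with special structures are unique up to permutation and/or scaling ambiguities.

Assuming $N = 3$ and $\mathbf{A}^{(3)} = \mathbf{I}_{I_3}$ for the third-order tensor $\underline{\mathbf{T}} \in \mathbb{C}^{I_1 \times I_2 \times I_3}$, we have:

$$t_{i_1,i_2,i_3} = \sum_{r_1=1}^{R_1} \sum_{r_2=1}^{R_2} a^{(1)}_{i_1,r_1}\, a^{(2)}_{i_2,r_2}\, g_{r_1,r_2,i_3} \qquad (4)$$

This model is called the Tucker-2 model or Tucker-(2, 3) model, and is widely applied in data analysis and parameter estimation [4]. $\mathbf{A}^{(1)}$ and $\mathbf{A}^{(2)}$ are the two loading matrices, and $\underline{\mathbf{G}}$ is the core tensor. In the same way, such a model can be written in terms of the mode-$n$ product as:

$$\underline{\mathbf{T}} = \underline{\mathbf{G}} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \qquad (5)$$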
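As an illustration of the Tucker-2 model, the following minimal NumPy sketch builds a third-order tensor once from the scalar form and once from two successive mode-n products, and checks that both agree. The function names `tucker2` and `mode_n_product`, the dimensions, and the random test data are assumptions introduced only for this example.

```python
import numpy as np

def tucker2(G, A1, A2):
    """Scalar form: t[i1,i2,i3] = sum_{r1,r2} a1[i1,r1] * a2[i2,r2] * g[r1,r2,i3]."""
    return np.einsum("ia,jb,abk->ijk", A1, A2, G)

def mode_n_product(G, A, n):
    """Mode-n product G x_n A: contracts mode n of G with the columns of A."""
    moved = np.moveaxis(G, n, 0)                   # bring mode n to the front
    out = np.tensordot(A, moved, axes=([1], [0]))  # new leading mode has size A.shape[0]
    return np.moveaxis(out, 0, n)                  # put the new mode back in place

# Hypothetical dimensions and random complex test data.
rng = np.random.default_rng(0)
I1, I2, I3, R1, R2 = 4, 5, 3, 2, 3
G = rng.standard_normal((R1, R2, I3)) + 1j * rng.standard_normal((R1, R2, I3))
A1 = rng.standard_normal((I1, R1)) + 1j * rng.standard_normal((I1, R1))
A2 = rng.standard_normal((I2, R2)) + 1j * rng.standard_normal((I2, R2))

T_scalar = tucker2(G, A1, A2)
T_modes = mode_n_product(mode_n_product(G, A1, 0), A2, 1)
```

Both constructions return an $I_1 \times I_2 \times I_3$ array; the third mode is left untouched, which corresponds to $\mathbf{A}^{(3)}$ being the identity.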

System Model
Consider a multiple-antenna system with $M_S$ transmit antennas and $M_D$ receive antennas, as shown in Figure 1. $h_{m_D,m_S}$ represents the channel coefficient between the $m_S$-th transmit antenna and the $m_D$-th receive antenna ($m_S = 1, \ldots, M_S$, $m_D = 1, \ldots, M_D$). $s_{n,r}$ represents the $n$-th symbol of the $r$-th data stream ($n = 1, \ldots, N$, $r = 1, \ldots, R$), with each data stream being formed of $N$ information symbols. Each symbol $s_{n,r}$ is coded by a three-dimensional space-time code $b_{m_S,r,p}$ ($p = 1, \ldots, P$), whose dimensions are the numbers of transmit antennas, data streams, and chips, respectively. We then define the antenna-to-slot allocation factor $q_{p,m_S}$, which is 0 or 1. Both the transmitter and the receiver know the factors $b_{m_S,r,p}$ and $q_{p,m_S}$. The signal transmitted from the $m_S$-th transmit antenna, during the $n$-th symbol period of the $p$-th chip, is given by:

$$x_{m_S,n,p} = q_{p,m_S} \sum_{r=1}^{R} b_{m_S,r,p}\, s_{n,r} \qquad (6)$$

where $s_{n,r}$ and $q_{p,m_S}$ are the $(n, r)$-th and $(p, m_S)$-th elements of the signal matrix $\mathbf{S} \in \mathbb{C}^{N \times R}$ and the antenna-to-slot allocation matrix $\mathbf{Q} \in \mathbb{C}^{P \times M_S}$, respectively. $x_{m_S,n,p}$ and $b_{m_S,r,p}$ are typical elements of the transmitted signal tensor $\underline{\mathbf{X}} \in \mathbb{C}^{M_S \times N \times P}$ and the coding tensor $\underline{\mathbf{B}} \in \mathbb{C}^{M_S \times R \times P}$, respectively. The elements of $\underline{\mathbf{B}}$ are chosen as $e^{\sqrt{-1}\,2\pi\varsigma}$, where $\varsigma$ is drawn from uniformly distributed pseudorandom numbers. In our tensor coding scheme, the number of transmitted data streams is not restricted to be equal to the number of transmit antennas, and the data streams can be allocated to an arbitrary set of transmit antennas. Without considering the stream-to-slot allocation, the coding scheme in [21] can be regarded as a special case of our tensor coding scheme with a fixed two-dimensional space-time code.
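Assuming Rayleigh flat fading channels, the discrete-time baseband signal at the $m_D$-th receive antenna can be written as:

$$y_{m_D,n,p} = \sum_{m_S=1}^{M_S} \sum_{r=1}^{R} h_{m_D,m_S}\, q_{p,m_S}\, b_{m_S,r,p}\, s_{n,r} + v_{m_D,n,p} \qquad (7)$$

where $h_{m_D,m_S}$ is the $(m_D, m_S)$-th element of the channel matrix $\mathbf{H} \in \mathbb{C}^{M_D \times M_S}$, and $y_{m_D,n,p}$ and $v_{m_D,n,p}$ are typical elements of the received signal tensor $\underline{\mathbf{Y}} \in \mathbb{C}^{M_D \times N \times P}$ and the noise tensor $\underline{\mathbf{V}} \in \mathbb{C}^{M_D \times N \times P}$, respectively.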

Constructed Tucker-2 Model
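Let us define $c_{m_S,r,p} = q_{p,m_S}\, b_{m_S,r,p}$, where $c_{m_S,r,p}$ is the typical element of the compound tensor $\underline{\mathbf{C}} \in \mathbb{C}^{M_S \times R \times P}$. Equation (7) can then be written as:

$$y_{m_D,n,p} = \sum_{m_S=1}^{M_S} \sum_{r=1}^{R} h_{m_D,m_S}\, s_{n,r}\, c_{m_S,r,p} + v_{m_D,n,p} \qquad (8)$$

By comparing Equation (4) with Equation (8), the received signal tensor $\underline{\mathbf{Y}} \in \mathbb{C}^{M_D \times N \times P}$ of noiseless signals satisfies a Tucker-2 model, with the following correspondences:

$$(I_1, I_2, I_3, R_1, R_2) \leftrightarrow (M_D, N, P, M_S, R), \qquad \left(\underline{\mathbf{G}}, \mathbf{A}^{(1)}, \mathbf{A}^{(2)}\right) \leftrightarrow \left(\underline{\mathbf{C}}, \mathbf{H}, \mathbf{S}\right)$$

Using the mode-$n$ product representation, the model (8) can be written as:

$$\underline{\mathbf{Y}} = \underline{\mathbf{C}} \times_1 \mathbf{H} \times_2 \mathbf{S} + \underline{\mathbf{V}} \qquad (11)$$

where $\mathbf{S}$ and $\mathbf{H}$ represent the two loading matrices, and $\underline{\mathbf{C}}$ is the core tensor. Let us define , we can obtain four compact forms of the Tucker-2 model (11): with, and,

In this paper, the following two assumptions are satisfied.
(a) The antenna-to-slot allocation matrix $\mathbf{Q}$ does not have an all-zero column. This means that each transmit antenna is used during at least one time slot;
(b) Both the transmitter and the receiver know the allocation matrix $\mathbf{Q}$ and the coding tensor $\underline{\mathbf{B}}$.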
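The construction above can be checked numerically. The sketch below, under assumed dimensions and random test data, evaluates the noiseless received signal directly from the channel, allocation, code, and symbol factors, and compares it with the Tucker-2 form built from the compound tensor $c_{m_S,r,p} = q_{p,m_S} b_{m_S,r,p}$. All variable names and the QPSK alphabet used for the symbols are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
M_S, M_D, N, R, P = 3, 4, 5, 2, 6   # hypothetical system dimensions

# Channel with i.i.d. complex Gaussian entries, QPSK symbols, unit-modulus code.
H = (rng.standard_normal((M_D, M_S)) + 1j * rng.standard_normal((M_D, M_S))) / np.sqrt(2)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
S = rng.choice(qpsk, size=(N, R))
B = np.exp(2j * np.pi * rng.random((M_S, R, P)))

# Binary allocation matrix; forcing the first row to ones guarantees that
# no column is all-zero, as required by assumption (a).
Q = rng.integers(0, 2, size=(P, M_S))
Q[0, :] = 1

# Direct evaluation: y[d,n,p] = sum_{m,r} h[d,m] q[p,m] b[m,r,p] s[n,r]
Y_direct = np.einsum("dm,pm,mrp,nr->dnp", H, Q, B, S)

# Compound core tensor c[m,r,p] = q[p,m] * b[m,r,p], then the Tucker-2 form.
C = B * Q.T[:, None, :]
Y_tucker = np.einsum("dm,nr,mrp->dnp", H, S, C)
```

The two tensors coincide, which is the statement that the noiseless received signal is a Tucker-2 model with loading matrices $\mathbf{H}$ and $\mathbf{S}$ and core tensor $\underline{\mathbf{C}}$.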

Uniqueness Issue
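Because the loading matrices are only unique up to nonsingular matrices, the general Tucker-2 model is not essentially unique. This can be verified by using the property of the mode-$n$ product:

$$\underline{\mathbf{Y}} = \underline{\mathbf{C}} \times_1 \mathbf{H} \times_2 \mathbf{S} = \left(\underline{\mathbf{C}} \times_1 \boldsymbol{\Theta}_H^{-1} \times_2 \boldsymbol{\Theta}_S^{-1}\right) \times_1 \left(\mathbf{H}\boldsymbol{\Theta}_H\right) \times_2 \left(\mathbf{S}\boldsymbol{\Theta}_S\right)$$

where the noise tensor $\underline{\mathbf{V}}$ has been omitted for convenience of notation, and $\boldsymbol{\Theta}_S \in \mathbb{C}^{R \times R}$ and $\boldsymbol{\Theta}_H \in \mathbb{C}^{M_S \times M_S}$ are nonsingular matrices.
Applying the uniqueness theorem of the Tucker model in [25], if the core tensor $\underline{\mathbf{C}}$ is known, then $\mathbf{S}$ and $\mathbf{H}$ are unique up to a scaling ambiguity, i.e., $(\bar{\mathbf{S}}, \bar{\mathbf{H}}) = (\mathbf{S}\boldsymbol{\Theta}_S, \mathbf{H}\boldsymbol{\Theta}_H)$, where $\bar{\mathbf{S}}$ and $\bar{\mathbf{H}}$ are alternative solutions for $\mathbf{S}$ and $\mathbf{H}$, respectively, with $\boldsymbol{\Theta}_S = \beta \mathbf{I}_R$ and $\boldsymbol{\Theta}_H = \frac{1}{\beta} \mathbf{I}_{M_S}$. Consequently, a priori knowledge of only one symbol is enough to resolve the scaling factor $\beta$. In contrast to the PARAFAC model used in existing receivers, the constructed Tucker-2 model only needs a priori knowledge of one symbol to eliminate the scaling ambiguity. Therefore, our scheme has higher spectral efficiency.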
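The scaling ambiguity and its removal with a single known symbol can be illustrated numerically. In the following sketch, the dimensions, the random data, the value of `beta`, and the helper name `tucker2_output` are assumptions made for the example; the known symbol is taken to be $s_{1,1} = 1$.

```python
import numpy as np

def tucker2_output(C, H, S):
    """Noise-free received tensor: y[d,n,p] = sum_{m,r} h[d,m] s[n,r] c[m,r,p]."""
    return np.einsum("dm,nr,mrp->dnp", H, S, C)

rng = np.random.default_rng(2)
M_S, M_D, N, R, P = 3, 4, 5, 2, 6   # hypothetical dimensions
H = rng.standard_normal((M_D, M_S)) + 1j * rng.standard_normal((M_D, M_S))
S = rng.standard_normal((N, R)) + 1j * rng.standard_normal((N, R))
S[0, 0] = 1.0                        # the single symbol known a priori
C = np.exp(2j * np.pi * rng.random((M_S, R, P)))

# An alternative solution (beta * S, H / beta) produces the same tensor.
beta = 0.7 - 1.3j
S_alt, H_alt = beta * S, H / beta
same_tensor = np.allclose(tucker2_output(C, H, S), tucker2_output(C, H_alt, S_alt))

# Knowing s[0,0] = 1 reveals beta and removes the ambiguity.
beta_hat = S_alt[0, 0]
S_rec, H_rec = S_alt / beta_hat, H_alt * beta_hat
```

After rescaling, `S_rec` and `H_rec` coincide with the original `S` and `H`.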

Identifiability Conditions
Identifiability of the constructed Tucker-2 model is a key issue for recovering the parameters to be estimated. In this paper, it is directly linked to the estimation of the signal matrix $\mathbf{S}$ and the channel matrix $\mathbf{H}$ from the received signal tensor $\underline{\mathbf{Y}}$. Conditions for parameter identifiability are given in the following theorem.

Theorem 1. (Sufficient Conditions):
Assume that $\mathbf{H}$ has independent and identically distributed (i.i.d.) entries and that $\mathbf{S}$ has full column rank, and let $P_1$ denote the number of nonzero elements in $\mathbf{Q}$. Then sufficient conditions for identifiability of the signal matrix $\mathbf{S}$ and the channel matrix $\mathbf{H}$ are:

Proof of Theorem 1. From Equations (12) and (13), the necessary and sufficient conditions for identifiability of $\mathbf{S}$ and $\mathbf{H}$ require that $(\mathbf{I}_P \otimes \mathbf{H})\mathbf{F}_1$ and $(\mathbf{I}_P \otimes \mathbf{S})\mathbf{F}_2$ have full column rank, which are conditions (21) and (22), respectively.

Under the assumption in Theorem 1 that $\mathbf{H}$ has i.i.d. entries, $M_D \geq M_S$ ensures that $\mathbf{H}$ has full column rank. Since $\mathbf{I}_P$ and $\mathbf{H}$ have full column rank, $\mathbf{I}_P \otimes \mathbf{H}$ has full column rank, i.e., $\mathrm{Rank}(\mathbf{I}_P \otimes \mathbf{H}) = P M_S$. Therefore, condition (21) is satisfied if $\mathbf{F}_1$ has full column rank. We rewrite $\mathbf{F}_1$ from Equation (15) as: where

Since $\mathbf{S}$ has full column rank, we can deduce that $\mathrm{Rank}(\mathbf{I}_P \otimes \mathbf{S}) = PR$. Thus, condition (22) is satisfied if $\mathbf{F}_2$ has full column rank. We rewrite $\mathbf{F}_2$ from Equation (16) as: where . . . , $D_P(\mathbf{Q})]^T$. Recall that $\mathbf{Q}$ does not have an all-zero column, which means that $\mathbf{Q}_P$ has full column rank. Hence, $\mathbf{F}_2$ has full column rank if $\mathbf{B}_2$ has full row rank. Since $\mathbf{B}_2$ has a block diagonal structure and the slices $\mathbf{B}_{\cdot\cdot p}$ have different generators, $R \geq M_S$ ensures that $\mathbf{B}_2$ has full row rank. Therefore, $R \geq M_S$ ensures that condition (22) is satisfied. This ends the proof.
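One step of the proof, namely that $\mathbf{I}_P \otimes \mathbf{H}$ has full column rank $P M_S$ whenever $\mathbf{H}$ has full column rank, can be checked with a few lines of NumPy. The dimensions below are hypothetical and chosen so that $M_D \geq M_S$; the matrices $\mathbf{F}_1$ and $\mathbf{F}_2$ are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)
M_D, M_S, P = 4, 3, 5   # hypothetical sizes with M_D >= M_S

# A channel matrix with i.i.d. complex Gaussian entries.
H = (rng.standard_normal((M_D, M_S)) + 1j * rng.standard_normal((M_D, M_S))) / np.sqrt(2)

# Block-diagonal matrix I_P (x) H, of size (P * M_D) x (P * M_S).
K = np.kron(np.eye(P), H)

rank_H = np.linalg.matrix_rank(H)
rank_K = np.linalg.matrix_rank(K)   # equals P * rank(H)
```

The same check applies to $\mathbf{I}_P \otimes \mathbf{S}$ with $\mathbf{S}$ of full column rank, which gives rank $PR$.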
Remark 1. The conditions in Theorem 1 are sufficient but not necessary for parameter identifiability. Conditions (21) and (22) also apply to the ALS algorithm. In fact, identifiability of the signal and channel parameters is possible in our simulation results when $M_S > R$. Necessary conditions for parameter identifiability are based on the dimensions of $(\mathbf{I}_P \otimes \mathbf{H})\mathbf{F}_1$ and $(\mathbf{I}_P \otimes \mathbf{S})\mathbf{F}_2$. If the channel matrix $\mathbf{H}$ does not have full column or row rank, i.e., $L < \min(M_S, M_D)$, where $L$ is the rank of $\mathbf{H}$, the identifiability conditions of Theorem 1 are no longer applicable because of the low-rank property of $\mathbf{H}$. However, identifiability conditions can still be deduced from conditions (21) and (22), i.e., the necessary and sufficient conditions for identifiability of $\mathbf{S}$ and $\mathbf{H}$ require that $(\mathbf{I}_P \otimes \mathbf{H})\mathbf{F}_1$ and $(\mathbf{I}_P \otimes \mathbf{S})\mathbf{F}_2$ have full column rank. This case is analyzed further in Section 5.

Semi-Blind Receiver
The ALS algorithm is a classical solution for fitting tensor models. However, it is well known that the ALS algorithm exhibits a convergence problem when collinearity is present in one or more modes [27,28]. The LM algorithm has been successfully used to fit the PARAFAC and PARATUCK2 models, copes with collinearity problems, and provides quadratic convergence [19,20]. As an iterative algorithm, the LM algorithm is also sensitive to initialization. Thus, optimizing the initial value is important for improving the performance of the LM algorithm.
In this section, a novel semi-blind receiver based on the optimized LM algorithm is developed for joint symbol and channel estimation. The basic principle of the optimized LM algorithm is to first resort to an LSK approximation problem [29,30], based on the singular value decomposition (SVD) of a rank-one matrix, to initialize the symbol and channel matrices, and then to update these two matrices at the same time in each iteration. Finally, a modified singular value projection (SVP) based algorithm [31,32] is used to further improve the performance of channel estimation.
The proposed optimal initialization method is based on the Kronecker least squares algorithm, which exploits SVD-based rank-one approximations to obtain initial estimates of $\mathbf{S}$ and $\mathbf{H}$ from their Kronecker matrix product.
By post-multiplying Equation (14) with , where $\hat{\mathbf{S}}^{(0)}$ and $\hat{\mathbf{H}}^{(0)}$ are initial estimates of $\mathbf{S}$ and $\mathbf{H}$. According to Theorem 2.1 in [29], we have: where $\boldsymbol{\Xi} = \mathrm{unvec}(\boldsymbol{\Delta}) \in \mathbb{C}^{M_D M_S \times NR}$ is a rank-one matrix, and $\boldsymbol{\Delta} \in \mathbb{C}^{N M_D R M_S \times 1}$ is, given that: In this case, the Kronecker product matrix $\mathbf{Z}$ has been rearranged into a rank-one matrix $\boldsymbol{\Xi}$.
Applying the SVD to the rank-one matrix $\boldsymbol{\Xi}$, the vectors $\mathrm{vec}(\hat{\mathbf{S}}^{(0)})$ and $\mathrm{vec}(\hat{\mathbf{H}}^{(0)})$ can be estimated by using a rank-one approximation method, i.e., by computing its largest singular value and the corresponding left and right singular vectors. $\hat{\mathbf{S}}^{(0)}$ and $\hat{\mathbf{H}}^{(0)}$ are determined up to a scaling factor, which can be removed by setting $s_{1,1} = 1$ as in [27,30]. The detailed process is shown below.
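A minimal sketch of this rank-one initialization is given here, under the assumption that the Kronecker product matrix has the form $\mathbf{Z} \approx \mathbf{S} \otimes \mathbf{H}$ (the ordering of the product is an assumption of this example, as is the row-major vectorization used by NumPy). The function name `rank_one_init` and the test data are hypothetical.

```python
import numpy as np

def rank_one_init(Z, N, R, M_D, M_S):
    """Estimate S (N x R) and H (M_D x M_S) from Z ~ kron(S, H), given s[0,0] = 1."""
    # Z[n*M_D + d, r*M_S + m] = s[n,r] * h[d,m]; regroup the entries so that
    # Xi[(d,m), (n,r)] = h[d,m] * s[n,r], i.e. Xi is the outer product of the
    # (row-major) vectorizations of H and S and therefore has rank one.
    Xi = Z.reshape(N, M_D, R, M_S).transpose(1, 3, 0, 2).reshape(M_D * M_S, N * R)
    U, sig, Vh = np.linalg.svd(Xi, full_matrices=False)
    h = np.sqrt(sig[0]) * U[:, 0]     # dominant left singular vector
    s = np.sqrt(sig[0]) * Vh[0, :]    # dominant right singular vector (as a row)
    S0, H0 = s.reshape(N, R), h.reshape(M_D, M_S)
    alpha = S0[0, 0]                  # scaling factor revealed by s[0,0] = 1
    return S0 / alpha, H0 * alpha

rng = np.random.default_rng(5)
N, R, M_D, M_S = 5, 3, 4, 3
S = rng.standard_normal((N, R)) + 1j * rng.standard_normal((N, R))
S[0, 0] = 1.0
H = rng.standard_normal((M_D, M_S)) + 1j * rng.standard_normal((M_D, M_S))

S_init, H_init = rank_one_init(np.kron(S, H), N, R, M_D, M_S)
```

With a noise-free Kronecker product the factors are recovered exactly; with noise, the dominant singular triplet gives the best rank-one fit in the least squares sense, which serves as the starting point for the iterations.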
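By applying the SVD to the rank-one matrix $\boldsymbol{\Xi}$, we have:

$$\boldsymbol{\Xi} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^H$$

where $\boldsymbol{\Sigma} \in \mathbb{C}^{M_D M_S \times NR}$ is a diagonal matrix containing the singular values of $\boldsymbol{\Xi}$, and $\mathbf{U} \in \mathbb{C}^{M_D M_S \times M_D M_S}$ and $\mathbf{V} \in \mathbb{C}^{NR \times NR}$ are unitary matrices. Using the rank-one approximation of $\boldsymbol{\Xi}$, we have:

$$\boldsymbol{\Xi} \approx \sigma_1 \mathbf{U}_{\cdot 1} \mathbf{V}_{\cdot 1}^H$$

where $\sigma_1$ is the largest singular value, and $\mathbf{U}_{\cdot 1}$ and $\mathbf{V}_{\cdot 1}$ are the corresponding left and right singular vectors. Thus, the vectors $\mathrm{vec}(\hat{\mathbf{S}}^{(0)})$ and $\mathrm{vec}(\hat{\mathbf{H}}^{(0)})$ can be estimated as: where $\alpha$ is a scalar factor. $\mathrm{vec}(\hat{\mathbf{S}}^{(0)})$ and $\mathrm{vec}(\hat{\mathbf{H}}^{(0)})$ are determined up to this scalar factor. In practical communication systems, the scalar factor $\alpha$ can be removed by setting $s_{1,1} = 1$. Thus, the value of $\alpha$ in this paper is equal to 1 when $s_{1,1} = 1$. Note that we can also choose Equation (14) to implement the above optimal initialization procedure.

Define a parameter vector stacking all the unknowns as: where , and $Q = NR + M_D M_S$. The cost function to be minimized is given by: where $\tilde{y}_{m_D,n,p}(\mathbf{u})$ is the typical element of the tensor $\underline{\tilde{\mathbf{Y}}}(\mathbf{u}) \in \mathbb{C}^{M_D \times N \times P}$, which denotes the output tensor in the absence of noise. $\mathbf{z}(\mathbf{u})$ denotes the vector of residuals and $L = NPM_D$. Let $\mathbf{J} \in \mathbb{C}^{L \times Q}$ be the Jacobian matrix of $\mathbf{z}(\mathbf{u})$ with respect to $\mathbf{u}$, and let $\mathbf{g}$ be the gradient of $\phi(\mathbf{u})$ with respect to $\mathbf{u}$. $\mathbf{J}$ and $\mathbf{g}$ are respectively defined by:

The optimized LM algorithm consists of optimizing $\mathbf{u}^{(0)}$, and estimating $\mathbf{u}^{(i+1)}$ at the $(i+1)$-th iteration from $\mathbf{u}^{(i)}$ at the $i$-th iteration via $\mathbf{u}^{(i+1)} = \mathbf{u}^{(i)} + \Delta\mathbf{u}^{(i)}$. The step $\Delta\mathbf{u}^{(i)} \in \mathbb{C}^{Q \times 1}$, which has the same dimension as $\mathbf{u}$, is obtained by solving the following modified normal equations: where $\lambda^{(i)}$ is the damping parameter that ensures that $\Delta\mathbf{u}^{(i)}$ is a descent direction. The whole procedure of the optimized LM algorithm used in our semi-blind receiver is listed in Algorithm 1.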
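To make the damped update concrete, the following is a minimal sketch of an LM iteration for the Tucker-2 fit. It is not Algorithm 1 itself: the cost is assumed to be the squared norm of the residual vector, the step is assumed to solve the standard damped normal equations (J^H J + lambda I) du = -J^H z, the damping rule (divide by 3 on success, multiply by a doubling factor on failure) is a simplified assumption, and the starting point is a perturbed copy of the true matrices rather than the SVD-based initialization.

```python
import numpy as np

def model(S, H, C):
    """Noise-free output: y[d,n,p] = sum_{m,r} h[d,m] s[n,r] c[m,r,p]."""
    return np.einsum("dm,nr,mrp->dnp", H, S, C)

def jacobian(S, H, C):
    """Jacobian of vec(model) with respect to u = [vec(S); vec(H)] (row-major vec)."""
    N, R = S.shape
    M_D, M_S = H.shape
    P = C.shape[2]
    HC = np.einsum("dm,mrp->drp", H, C)   # derivative with respect to s[n,r]
    SC = np.einsum("nr,mrp->nmp", S, C)   # derivative with respect to h[d,m]
    J_S = np.einsum("an,drp->dapnr", np.eye(N), HC).reshape(M_D * N * P, N * R)
    J_H = np.einsum("ad,nmp->anpdm", np.eye(M_D), SC).reshape(M_D * N * P, M_D * M_S)
    return np.hstack([J_S, J_H])

def lm_fit(Y, C, S0, H0, lam=1e-2, max_iter=50, tol=1e-20):
    N, R = S0.shape
    S, H, tau = S0.copy(), H0.copy(), 2.0
    z = (model(S, H, C) - Y).ravel()
    cost = [np.vdot(z, z).real]
    for _ in range(max_iter):
        if cost[-1] < tol:
            break
        J = jacobian(S, H, C)
        g = J.conj().T @ z
        A = J.conj().T @ J + lam * np.eye(J.shape[1])
        du = np.linalg.solve(A, -g)                       # damped normal equations
        S_new = S + du[:N * R].reshape(N, R)
        H_new = H + du[N * R:].reshape(H.shape)
        z_new = (model(S_new, H_new, C) - Y).ravel()
        c_new = np.vdot(z_new, z_new).real
        if c_new < cost[-1]:                              # step accepted
            S, H, z = S_new, H_new, z_new
            cost.append(c_new)
            lam, tau = max(lam / 3.0, 1e-8), 2.0
        else:                                             # step rejected
            lam, tau = lam * tau, 2.0 * tau
    return S, H, cost

rng = np.random.default_rng(3)
M_S, M_D, N, R, P = 3, 4, 5, 3, 6
cplx = lambda *shape: (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
S_true, H_true = cplx(N, R), cplx(M_D, M_S)
C = np.exp(2j * np.pi * rng.random((M_S, R, P)))
Y = model(S_true, H_true, C)
S_hat, H_hat, cost = lm_fit(Y, C, S_true + 0.05 * cplx(N, R), H_true + 0.05 * cplx(M_D, M_S))
```

The list `cost` records the accepted cost values, which decrease monotonically by construction. The returned estimates are still subject to the scaling ambiguity, which would be removed afterwards with the known symbol.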
Otherwise, $\mathbf{u}^{(i+1)}$ is invalid; set $\lambda^{(i+1)} = \tau\lambda^{(i)}$ and $\tau \leftarrow 2\tau$;
Step 6. $i \leftarrow i + 1$;
end
Acquire $\mathbf{S}^{(\infty)}$ and $\mathbf{H}^{(\infty)}$:
Remove the scaling ambiguity:

We can then build the blocks of $\mathbf{J}^H\mathbf{J}$ as follows: The terms $\mathbf{J}_{u_S}^H \mathbf{J}_{u_S}$, $\mathbf{J}_{u_H}^H \mathbf{J}_{u_H}$, and $\mathbf{J}_{u_S}^H \mathbf{J}_{u_H}$ can be respectively written as: Similarly, the partitioned structure of $\mathbf{u}$ allows us to write $\mathbf{g}$ as the concatenation of the following two gradients: where $\mathbf{g}_{u_S} \in \mathbb{C}^{NR \times 1}$ and $\mathbf{g}_{u_H} \in \mathbb{C}^{M_D M_S \times 1}$ are respectively given by:

In Algorithm 1, the estimated matrix $\mathbf{H}^{(\infty)}$ is projected onto a low-rank estimated matrix $\mathbf{H}_{\mathrm{new}}$ by the SVP based algorithm when $L < \min(M_S, M_D)$. Here, $\mathbf{H}_{\mathrm{new}} = \sum_{l=1}^{L} \beta_l \mathbf{U}^{(C)}_{\cdot l} \mathbf{V}^{(C)H}_{\cdot l}$, where $\beta_l$ denotes the $l$-th largest singular value of $\mathbf{H}^{(\infty)}$, and $\mathbf{U}^{(C)}_{\cdot l}$ and $\mathbf{V}^{(C)}_{\cdot l}$ are the corresponding left and right singular vectors.

The overall complexity of the optimized LM algorithm mainly depends on the per-iteration complexity and the number of iterations. The per-iteration complexity of this algorithm can be estimated as $O\left((NR + M_D M_S)^3\right)$. Since the antenna-to-slot allocation matrix and the coding tensor are fixed and known at the receiver, convergence of the optimized LM algorithm is usually achieved in only a few iterations. The average number of iterations for the optimized LM algorithm is analyzed further in Section 6.
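The low-rank projection of the channel estimate can be sketched as a truncated SVD that keeps the $L$ largest singular values. The function name `project_to_rank`, the matrix sizes, and the noise level are assumptions of this example; the modified SVP algorithm of [31,32] may include further steps that are not shown.

```python
import numpy as np

def project_to_rank(H_est, L):
    """Keep the L largest singular values of H_est and the matching singular vectors."""
    U, beta, Vh = np.linalg.svd(H_est, full_matrices=False)
    return (U[:, :L] * beta[:L]) @ Vh[:L, :]

rng = np.random.default_rng(6)
M_D, M_S, L = 6, 48, 2   # hypothetical sizes with L < min(M_S, M_D)
cplx = lambda *shape: (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H_low = cplx(M_D, L) @ cplx(L, M_S)          # an exactly rank-L channel
H_noisy = H_low + 0.01 * cplx(M_D, M_S)      # stand-in for the converged estimate
H_new = project_to_rank(H_noisy, L)
```

The projected matrix has rank $L$ by construction, and projecting a matrix that already has rank $L$ leaves it unchanged.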

Extension to Multi-User Massive MIMO Systems
In this section, we show that the developed algorithm can be applied to multi-user massive MIMO systems with a hybrid precoding architecture for joint symbol and channel estimation. We consider a fully-connected hybrid precoding architecture, which is a typical model of massive MIMO systems. The base station communicates with $M$ users simultaneously, and each mobile station is equipped with $M_D$ antennas. The base station is equipped with $M_S$ antennas and $M_{RF}$ independent radio frequency chains to transmit $R$ streams to the $M_D$ receive antennas of each mobile station. In the considered downlink system, each symbol $s_{n,r}$ is coded by a three-dimensional baseband code $b_{m_{RF},r,p}$ followed by a radio frequency code $e_{m_S,m_{RF}}$ in the base station. At the $m$-th ($m = 1, \ldots, M$) mobile station, the discrete-time baseband signal at the $m_D$-th receive antenna is written as:

$$y^{(m)}_{m_D,n,p} = \sum_{m_S=1}^{M_S} \sum_{m_{RF}=1}^{M_{RF}} \sum_{r=1}^{R} h^{(m)}_{m_D,m_S}\, q_{p,m_S}\, e_{m_S,m_{RF}}\, b_{m_{RF},r,p}\, s_{n,r} + v^{(m)}_{m_D,n,p}$$

where,

Following [33,34], we also adopt a geometric channel model with $L_m$ scatterers between the base station and the $m$-th mobile station, $L_m = 1, \ldots, L_M$. Under this model, the channel matrix $\mathbf{H}^{(m)}$ is expressed as: where $\alpha^{(m)}_l$ denotes the complex gain of the $l$-th path, θ are respectively given by: where $\lambda$ denotes the signal wavelength, and $d$ is the distance between two neighboring antenna elements.

Similar to the analysis of Section 3.1, the received signal tensor $\underline{\mathbf{Y}}^{(m)}$ of the noiseless signal also satisfies the Tucker-2 model, and the proposed algorithm in Section 4 remains suitable for joint symbol and channel estimation at each mobile station. However, two points are important to note here. First, the identifiability conditions of Theorem 1 are no longer applicable because of the low-rank property of $\mathbf{H}^{(m)}$. However, we can deduce new identifiability conditions based on conditions (21) and (22), i.e., the necessary and sufficient conditions for identifiability of $\mathbf{S}$ and $\mathbf{H}^{(m)}$ require that $(\mathbf{I}_P \otimes \mathbf{H}^{(m)})\mathbf{F}_1$ and $(\mathbf{I}_P \otimes \mathbf{S})\mathbf{F}_2$ have full column rank. For convenience of analysis, we assume that the antenna-to-slot allocation matrix is an all-ones matrix. Then, we have the following theorem.
Theorem 2. Assume that the path gains of the low-rank channel $\mathbf{H}^{(m)}$ are Rayleigh distributed, and that $N$ and $R$ are large enough. Then sufficient conditions for identifiability of $\mathbf{H}^{(m)}$ and $\mathbf{S}$ are:

Proof of Theorem 2. The channel model $\mathbf{H}^{(m)}$ is expressed as in Equation (48). The rank of $\mathbf{H}^{(m)}$ is $L_m$, and the path gains of $\mathbf{H}^{(m)}$ are Rayleigh distributed. $\mathbf{F}_1$ is a full rank matrix, which contains different generators. Consequently, $\min(PM_D, PL_m, PM_S) \geq R$ ensures that $(\mathbf{I}_P \otimes \mathbf{H}^{(m)})\mathbf{F}_1$ has full column rank. Since $\mathbf{H}^{(m)}$ is low-rank, i.e., $L_m < \min(M_S, M_D)$, the condition $P \geq R / L_m$ ensures that $(\mathbf{I}_P \otimes \mathbf{H}^{(m)})\mathbf{F}_1$ has full column rank. Since $N$ and $R$ are large enough and $\mathbf{S}$ is random, the rank of $\mathbf{S}$ is equal to $N$ or $R$. Moreover, $\mathbf{F}_2$ is also a full rank matrix because of its special structure. We deduce that $(\mathbf{I}_P \otimes \mathbf{S})\mathbf{F}_2$ has full column rank if $\min(PN, PR) \geq M_S$, i.e., $P \geq \max\left(\frac{M_S}{N}, \frac{M_S}{R}\right)$. Therefore, condition (51) ensures identifiability of $\mathbf{H}^{(m)}$ and $\mathbf{S}$. This ends the proof of Theorem 2.
Second, the low-rank property of the mmWave massive MIMO channel should be exploited. Due to the very limited scattering of the mmWave channel and the large numbers of transmit and receive antennas, $L_m$ is usually less than $M_S$ and $M_D$. Unlike the conventional MIMO channel matrix, which usually has full column or row rank, the rank of the mmWave massive MIMO channel matrix is much smaller than its dimensions. This is called the 'low-rank property' of the mmWave massive MIMO channel matrix. Therefore, the final part of the proposed Algorithm 1 takes advantage of the low-rank constraint $\mathrm{rank}(\mathbf{H}^{(m)}) \leq L_m$ to further improve the accuracy of channel estimation.
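The low-rank property can be illustrated with a small geometric channel generator. The sketch below assumes uniform linear arrays at both ends with half-wavelength spacing, a common normalization factor, and complex Gaussian path gains (whose magnitudes are Rayleigh distributed); these modeling details and the function names are assumptions of the example and are not taken from the channel expression of this section.

```python
import numpy as np

def ula_response(num_antennas, angle, d_over_lambda=0.5):
    """Array response of a uniform linear array (assumed geometry)."""
    k = np.arange(num_antennas)
    return np.exp(2j * np.pi * d_over_lambda * k * np.sin(angle)) / np.sqrt(num_antennas)

def geometric_channel(M_D, M_S, L, rng):
    """Sum of L paths, each a rank-one outer product of receive and transmit responses."""
    gains = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
    aoa = rng.uniform(0, 2 * np.pi, L)   # angles of arrival
    aod = rng.uniform(0, 2 * np.pi, L)   # angles of departure
    H = np.zeros((M_D, M_S), dtype=complex)
    for l in range(L):
        H += gains[l] * np.outer(ula_response(M_D, aoa[l]), ula_response(M_S, aod[l]).conj())
    return np.sqrt(M_S * M_D / L) * H

rng = np.random.default_rng(7)
M_S, M_D, L_m = 48, 6, 2
H_m = geometric_channel(M_D, M_S, L_m, rng)
rank_H_m = np.linalg.matrix_rank(H_m)
```

With $L_m = 2$ paths, the $6 \times 48$ channel matrix has rank 2, far below $\min(M_S, M_D) = 6$, which is the structure that the final projection step of the receiver exploits.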

Simulation Results and Discussion
We studied the performance of the proposed semi-blind receiver through numerical simulations. The channel matrix $\mathbf{H}$ has independent and identically distributed (i.i.d.) complex Gaussian entries with zero mean and unit variance. The default values of the system parameters are set to $M_S = M_D = 4$, and the antenna-to-slot allocation matrix is an all-ones matrix. Throughout the simulations, the coding tensor $\underline{\mathbf{C}}$ is known at the receiver. Quadrature phase-shift keying (QPSK) constellations are used to modulate the transmitted symbols. All results are averaged over 10,000 independent Monte Carlo simulations. As in [8,9], the signal-to-noise ratio (SNR) at the receiver is defined as:

$$\mathrm{SNR} = 10 \log_{10} \frac{\|\underline{\tilde{\mathbf{Y}}}\|_F^2}{\|\underline{\mathbf{V}}\|_F^2} \ \text{(dB)}$$

where $\underline{\tilde{\mathbf{Y}}}$ denotes the noise-free signal tensor (the tensor-of-interest) containing both symbol and channel parameters. For each channel realization, the normalized mean square error (NMSE) for the different receivers is computed as $\|\mathbf{H} - \hat{\mathbf{H}}\|_F^2 / \|\mathbf{H}\|_F^2$, where $\hat{\mathbf{H}}$ is the estimate of $\mathbf{H}$ at convergence.

In the first example, we evaluate the convergence performance of the optimized LM algorithm, which is used in our semi-blind receiver. We assume the system design parameters $N = P = 5$ and $R = 3$. In Figure 2, the average value of the cost function is plotted versus the number of iterations for three SNR values. We observe from Figure 2 that, for each SNR value, the cost function decreases as the number of iterations increases until the algorithm converges. We can also see that, for the same number of iterations, the cost function decreases as the SNR increases. The proposed algorithm needs few iterations to converge. For instance, the optimized LM algorithm achieves convergence in about 10 iterations at an SNR of 20 dB.

In the second example, we evaluate the estimation performance of the proposed semi-blind receiver in terms of bit error rate (BER) and the NMSE of channel estimation. In particular, we compare it with the PARAFAC-based receiver with the KRST coding scheme (P-KRST) in [8] and the training-based receiver with the space-time coding scheme (TB-ST). For the TB-ST scheme, the symbol matrix is composed of two parts as in [9], i.e., the training symbol matrix and the unknown data symbol matrix. $N_{tr}$ denotes the length of the channel training sequence in the TB-ST receiver.
The transmission rates for the proposed coding scheme and the KRST coding scheme are $\frac{RN}{PN} = \frac{R}{P}$ and $\frac{M_S N}{PN} = \frac{M_S}{P}$ (data symbols per symbol period), respectively. However, the KRST coding scheme needs to know the first column of the signal matrix $\mathbf{S}$ to eliminate the scaling ambiguity, while the proposed coding scheme only needs to know $s_{1,1}$. Thus, the efficient transmission rates for the proposed coding scheme and the KRST coding scheme are $\frac{RN - 1}{PN}$ and $\frac{M_S N - M_S}{PN}$, respectively. To ensure a fair comparison, the proposed coding scheme and the KRST coding scheme should have the same efficient transmission rate, i.e., $N = \frac{M_S - 1}{M_S - R}$. Thus, the system design parameters in this example are set to $M_S = 4$, $P = 7$, and $R = N = 3$. For the TB-ST coding scheme, we divide $P = P_{tr} + P_d$, where $P_{tr} = 2$ blocks are used for channel training and $P_d = 5$ blocks are used for data transmission. Therefore, the length of the channel training sequence in the TB-ST receiver is $N_{tr} = P_{tr} N = 6$.
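The rate matching in this example can be verified with exact arithmetic. In the sketch below, the function names are hypothetical, and the KRST efficient rate $(M_S N - M_S)/(PN)$ is the expression implied by the stated equal-rate condition $N = (M_S - 1)/(M_S - R)$; it is treated here as an assumption.

```python
from fractions import Fraction

def efficient_rate_proposed(R, N, P):
    """RN symbols per PN symbol periods, minus the single known symbol s_{1,1}."""
    return Fraction(R * N - 1, P * N)

def efficient_rate_krst(M_S, N, P):
    """M_S * N symbols per PN symbol periods, minus M_S known symbols (assumed form)."""
    return Fraction(M_S * N - M_S, P * N)

M_S, P, R = 4, 7, 3
N = Fraction(M_S - 1, M_S - R)        # equal-rate condition from the text
N_int = int(N)

rate_proposed = efficient_rate_proposed(R, N_int, P)
rate_krst = efficient_rate_krst(M_S, N_int, P)

# Training length of the TB-ST receiver with P_tr = 2 training blocks.
P_tr = 2
N_tr = P_tr * N_int
```

With $M_S = 4$, $P = 7$, and $R = 3$, the condition gives $N = 3$, both efficient rates equal $8/21$ data symbols per symbol period, and the training length is $N_{tr} = 6$.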
The BER performance of the different receivers versus SNR is shown in Figure 3. It can be seen that the proposed semi-blind receiver outperforms the P-KRST and TB-ST receivers. The NMSE performance of the different receivers is shown in Figure 4. It can be seen from Figure 4 that the P-KRST receiver has the best channel estimation performance, and the proposed semi-blind receiver yields a smaller NMSE than the TB-ST receiver. From [8], the per-iteration complexity of the PARAFAC-based receiver is $O(M_S M_D P N)$. The complexity of the TB-ST scheme can be estimated as $O(N_r M_S (M_D + N_r) + R P M_D (N + R))$. The per-iteration complexity of the proposed O-LM algorithm is given at the end of Section 4. The TB-ST scheme has the lowest computational complexity due to the use of the channel training sequence. Due to the adoption of the simple KRST coding scheme, the PARAFAC-based receiver has lower complexity than the proposed receiver. However, the TB-ST receiver requires a long channel training sequence, and the PARAFAC-based receiver needs to know the first column or row of the signal matrix to eliminate the scaling ambiguity, whereas the proposed receiver only needs to know one symbol of the signal matrix.
In the third example, we evaluate and compare the performance of the traditional ALS (T-ALS) and optimized LM (O-LM) algorithms. We assume the system design parameters $N = P = L$ and $R = 5$. A correlated MIMO channel is considered in this example, and the channel matrix $\mathbf{H}$ is modeled as in [35], where $\rho$ denotes the normalized correlation coefficient with magnitude $|\rho| \leq 1$. We consider $\rho = 0$ (no correlation) and $\rho = 0.8$ (strong correlation), respectively. For each Monte Carlo run, the T-ALS algorithm is initialized with ten different random matrices as in [20,36]. The estimation performance is evaluated after selecting the best initialization, which is the one that results in the minimum value of $\delta^{(j)}$. We observe from Figure 5 that the T-ALS and O-LM algorithms give similar BER and NMSE performance, which means that the two algorithms converge to the same point. For the right subfigure of Figure 5, the NMSE of the T-ALS and O-LM algorithms is also shown in Table 1 for the sake of comparison. We can also observe from Figure 5 that, for both algorithms, the BER and NMSE performance degrades when the channel becomes strongly correlated. The overall complexities of the O-LM algorithm and the ALS algorithm depend on the per-iteration complexity and the number of iterations. The per-iteration complexity of the O-LM algorithm is higher than that of the T-ALS algorithm. However, because of the robustness of the O-LM algorithm, it needs fewer iterations than the T-ALS algorithm. Therefore, the proposed algorithm has lower overall complexity than the existing T-ALS algorithm. The mean processing times required by the T-ALS and O-LM algorithms are shown in Figure 6. We observe that the mean processing time required by the O-LM algorithm is shorter than that of the T-ALS algorithm, especially when the channel becomes strongly correlated. From Figure 6, we can also observe that the advantage of the O-LM algorithm over the T-ALS algorithm becomes more obvious as $L$ decreases from 8 to 7.

In the fourth example, the influence of the design parameters $(P, R)$ on the proposed receiver is studied. In the left subfigure of Figure 7, it can be seen that the BER decreases when $P$ increases, which illustrates the performance gain brought by time diversity. It can also be seen from this subfigure that the BER increases as the number of data streams $R$ increases. The impact of the design parameters $(P, R)$ on the NMSE performance is shown in the right subfigure of Figure 7. As expected, we can observe that the NMSE decreases linearly as a function of $P$, and increases as $R$ increases. Hence, appropriate values for the design parameters $P$ and $R$ can be selected according to the requirements on system performance and transmission rate.
In the fifth example, we assume $M_S = 3$, $R = 4$, and $N = P = 8$ for our semi-blind receiver, and the influence of the number of receive antennas $M_D$ is analyzed. We also compare the performance of our chosen coding tensor (OCCT) $B$ with a random coding tensor (RCT) whose entries are circularly symmetric Gaussian random variables. In Figure 8, it can be seen that both the BER and NMSE decrease when $M_D$ increases, which reflects the performance gain brought by receive diversity. We also observe from Figure 8 that OCCT performs better than RCT. Although OCCT is suboptimal, this choice has good symbol and channel identifiability properties, which is advantageous from a receiver design viewpoint.
In the sixth example, we study the estimation performance of two different transmission schemes for our semi-blind receiver. The default values of the system parameters are set to $M_D = 5$ and $N = 6$. In scheme 1, we assume $M_S = 2$, $R = 5$, and $P = 6$. Three different antenna-to-slot allocation matrices are given as follows: In scheme 2, we assume $M_S = 3$, $R = 3$, and $P = 7$. Three different antenna-to-slot allocation matrices are given as follows: The BER and NMSE performance of the proposed receiver for the different schemes is shown in Figure 9. For scheme 1, the proposed receiver with $Q_2$ has better BER and NMSE performance than the proposed receiver with $Q_1$. The reason is that the allocation matrix $Q_2$ provides a higher transmit spatial diversity gain than the allocation matrix $Q_1$. For the same reason, the allocation matrix $Q_3$ outperforms $Q_2$, and the allocation matrix $Q_5$ outperforms $Q_4$. We also observe in Figure 9 that scheme 2 has better BER and NMSE performance than scheme 1. The reason is that scheme 2 can provide a higher coding diversity than scheme 1. It is worth noting that scheme 1 has a higher spectral efficiency than scheme 2.
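The spectral-efficiency comparison between the two schemes can be checked with a short calculation. The sketch below assumes that the transmission rate is approximated by $R/P$ data symbols per symbol period, which matches the values of about 5/6 and 3/7 reported for $(R, P) = (5, 6)$ and $(3, 7)$; any overhead from known symbols is ignored, and the helper name `transmission_rate` is hypothetical.

```python
from fractions import Fraction


def transmission_rate(num_streams, code_length):
    """Approximate rate R/P in data symbols per symbol period (assumed formula)."""
    return Fraction(num_streams, code_length)


scheme_1 = transmission_rate(5, 6)  # scheme 1: R = 5, P = 6
scheme_2 = transmission_rate(3, 7)  # scheme 2: R = 3, P = 7

# Scheme 1 carries more data symbols per symbol period than scheme 2.
print(scheme_1, scheme_2, scheme_1 > scheme_2)  # prints: 5/6 3/7 True
```

Under this approximation the rate depends only on $R$ and $P$, so increasing $P$ lowers the rate while changing $N$ leaves it unchanged.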
The transmission rates for scheme 1 and scheme 2 are about 5/6 and 3/7 (data symbols per symbol period), respectively. In summary, a desired tradeoff between estimation performance and transmission rate can be obtained by designing a suitable scheme.
In the final example, a multi-user massive MIMO system with a fully connected hybrid precoding architecture is considered, where $M_S = 48$, $M \times M_D = 6 \times 6$, and $L_m = 2$ for all $m = 1, \ldots, M$. The carrier frequency of this system is set to 28 GHz [37], and $d = \lambda/2$. We assume that the AoAs/AoDs are uniformly distributed in $[0, 2\pi]$. For the considered multi-user massive MIMO system, we also evaluate the estimation performance of the proposed receiver in terms of the BER and the NMSE of channel estimation. It can be seen from Figures 10 and 11 that the BER and NMSE of the proposed semi-blind receiver decrease as $P$ and $N$ increase, and increase as $R$ increases. Increasing $P$ reduces the transmission rate, whereas increasing or decreasing $N$ has no effect on the transmission rate. This means that we can improve the estimation performance of the proposed semi-blind receiver by increasing $N$ if the channel remains constant over a long time interval before changing to another realization. We also observe from Figures 10 and 11 that the proposed semi-blind receiver still performs well for joint symbol and channel estimation even with shorter code and information symbol lengths and a larger number of data streams, i.e., $P = 24$, $N = 6$, and $R = 12$.
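The channel setting of the final example (half-wavelength uniform linear arrays, $L_m = 2$ paths, and angles drawn uniformly from $[0, 2\pi]$) can be sketched as a narrowband geometric channel. The paper's exact channel expression is not reproduced in this section, so the unit-variance complex Gaussian path gains, the normalization $\sqrt{M_D M_S / L_m}$, and the function names `ula_steering` and `geometric_channel` are assumptions made for illustration.

```python
import numpy as np


def ula_steering(num_antennas, angle, spacing_over_wavelength=0.5):
    """Unit-norm steering vector of a uniform linear array with spacing d/lambda."""
    k = np.arange(num_antennas)
    phase = 2j * np.pi * spacing_over_wavelength * k * np.sin(angle)
    return np.exp(phase) / np.sqrt(num_antennas)


def geometric_channel(m_d, m_s, num_paths, rng):
    """Draw one m_d x m_s channel as a sum of num_paths rank-one path terms."""
    aoa = rng.uniform(0.0, 2.0 * np.pi, num_paths)  # arrival angles at the mobile station
    aod = rng.uniform(0.0, 2.0 * np.pi, num_paths)  # departure angles at the base station
    # Assumed path gains: circularly symmetric complex Gaussian, unit variance
    gains = (rng.standard_normal(num_paths)
             + 1j * rng.standard_normal(num_paths)) / np.sqrt(2)
    h = np.zeros((m_d, m_s), dtype=complex)
    for gain, theta, phi in zip(gains, aoa, aod):
        h += gain * np.outer(ula_steering(m_d, theta), ula_steering(m_s, phi).conj())
    return np.sqrt(m_d * m_s / num_paths) * h


# One user's channel with M_D = 6, M_S = 48, and L_m = 2, as in the final example.
rng = np.random.default_rng(1)
h_user = geometric_channel(6, 48, 2, rng)
```

Because each path contributes a rank-one term, the resulting matrix has rank at most $L_m$, which is the low-rank structure typical of such sparse-scattering channels.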

Conclusions
We have developed a robust semi-blind receiver based on the Tucker-2 model for multiple-antenna systems. The proposed receiver jointly estimates the information symbols and channel parameters. Compared with existing semi-blind receivers, the proposed one gives better estimation performance and has a higher spectral efficiency. Moreover, the proposed semi-blind receiver is also applicable to multi-user massive MIMO systems. Perspectives of this work include an extension to relay-assisted massive MIMO systems by applying the antenna allocation matrix at the relays. Since both the source-relay and the relay-destination channel matrices have a low-rank property, new identifiability conditions will be derived and efficient fitting algorithms will be developed. Another perspective is to extend the proposed robust semi-blind receiver to mmWave MIMO systems for joint channel parameter estimation, where the parameters include AoAs, fading coefficients, and time delays [38,39].

Figure 1. Block diagram of the system model.

Algorithm 1. The optimized LM algorithm.
First stage:
• Compute the LS estimate of $Z$: $\hat{Z} = Y_3 (F_3)^{\dagger}$;

the $n$-th and $m_D$-th column vectors of the identity matrices $I_N$ and $I_{M_D}$, respectively.

where $e_{m_S, m_{RF}}$ and $h^{(m)}_{m_D, m_S}$ are the $(m_S, m_{RF})$-th and $(m_D, m_S)$-th elements of the radio frequency precoder matrix $E \in \mathbb{C}^{M_S \times M_{RF}}$ and the massive MIMO channel matrix $H^{(m)} \in \mathbb{C}^{M_D \times M_S}$, respectively. $y^{(m)}_{m_D, n, p}$ is the typical element of the received signal tensor $Y^{(m)} \in \mathbb{C}^{M_D \times N \times P}$. Then Equation (45) can be rewritten as:

th 's azimuth angles of arrival and departure (AoAs/AoDs) of the mobile station and base station, respectively. Λ MS θ transmit antenna array at a specific AoA and AoD, respectively. Finally, $a_{BS}(\phi_l^{(m)})$ and $a_{MS}(\theta_l^{(m)})$ are the steering vectors at the base station and mobile station, respectively. If uniform linear arrays are considered, the steering vectors $a_{BS}(\phi_l^{(m)})$ and $a_{MS}(\theta_l^{(m)})$

Figure 2. Cost function versus the number of iterations.

Figure 3. Bit error rate (BER) performance of different receivers versus signal-to-noise ratio (SNR).

Figure 5. BER and NMSE performance of the traditional alternating least squares (T-ALS) and O-LM algorithms for different $L$ and $\rho$.

Figure 6. Mean processing times required by the T-ALS and O-LM algorithms versus SNR.

Figure 7. BER and NMSE performance of the proposed receiver for different $P$ and $R$.

Figure 8. Influence of the number of receive antennas and the coding tensor.

Figure 9. Performance of the proposed receiver for different schemes.

Figure 10. BER performance of the proposed receiver for the multi-user massive multiple-input multiple-output (MIMO) system.

Figure 11. NMSE performance of the proposed receiver for the multi-user massive MIMO system.

Table 1. NMSE of the T-ALS and O-LM algorithms.