Worst-Case Robust Training Design for Correlated MIMO Channels in the Presence of Colored Interference

Kang, Jae-Mo; Yun, Sangseok

doi:10.3390/math13132168


by Jae-Mo Kang ¹ and Sangseok Yun ²,*

¹ Department of Artificial Intelligence, Kyungpook National University, Daegu 41566, Republic of Korea
² Department of Information and Communications Engineering, Pukyong National University, Busan 48513, Republic of Korea
* Author to whom correspondence should be addressed.

Mathematics 2025, 13(13), 2168; https://doi.org/10.3390/math13132168

Submission received: 7 May 2025 / Revised: 12 June 2025 / Accepted: 25 June 2025 / Published: 2 July 2025

(This article belongs to the Special Issue Advanced Algorithms in Wireless Communication and Internet of Things (IoT), 2nd Edition)


Abstract

The covariance information at the transmitter side is often subject to mismatches due to various impairments. This paper considers a training design problem for multiple-input multiple-output (MIMO) systems when both channel and interference covariance matrices are imperfect at the transmitter side. We first derive the structure of the optimal training signal, minimizing the worst-case mean square error (MSE). With the training structure, the original problem becomes a simple power allocation problem. We propose a numerical optimal power allocation scheme and a closed-form suboptimal power allocation scheme. Simulation results show that the proposed schemes considerably outperform the conventional schemes in terms of the worst-case MSE and bit error rate (BER) performance, and the proposed closed-form training scheme has comparable performance to that of the optimal one. For example, the proposed schemes yield more than 2.5 dB signal-to-interference ratio (SIR) gains at a BER of 10^{- 4}.

Keywords:

imperfect covariance matrix; MIMO channel estimation; minimax approach; robust training optimization; worst-case robustness

MSC:

94A05

1. Introduction

In the past decades, it has been shown that the use of multiple antennas at both ends of a communication link, so-called multiple-input multiple-output (MIMO), can provide either diversity gain or multiplexing gain [1,2]. For this reason, MIMO techniques have been employed in many system standards such as IEEE 802.11 for wireless local area networks (WLANs), IEEE 802.16, and 3rd Generation Partnership Project Long Term Evolution Advanced (3GPP LTE-A) for wireless communications, as a solution to demands for high data rates or low error rates [3,4]. To fully exploit the advantages that MIMO systems offer, multiple channel elements need to be accurately estimated. For example, in IEEE 802.11ac [5], accurate channel state information (CSI) at both ends is typically required for both transmit beamforming and coherent detection. Therefore, accurate channel estimation is an important problem for practical MIMO systems.

1.1. Prior Works and Limitations

In most current standards such as WLAN and WiMAX, unknown channel parameters are usually estimated at the receiver by sending a priori known training (or pilot) symbols from the transmitter [5,6]. Even though an arbitrary training signal can be used for channel estimation, it is possible to significantly enhance the estimation performance by optimizing the training symbols based on some prior knowledge on the channel and noise (or possibly interference) statistics. For this reason, training design problems have received considerable interest in recent years [7,8,9,10,11,12]. Most of these works used the mean square error (MSE) of channel estimation as a performance metric to improve the estimation accuracy [7,8,9,10,11]. On the other hand, another criterion, maximization of the conditional channel entropy, was considered in [12]. In the traditional training schemes [7,8,9,10,11,12], the perfect knowledge of both the channel and noise covariance matrices at both ends was commonly required for optimal design.

In practice, the covariance information about the channel and noise should be estimated at the receiver from received samples in recent consecutive training blocks [9,10]. The covariance information at the receiver side is reasonably assumed to be perfect when there is a sufficient number of samples [9,10]. In contrast, the covariance information at the transmitter side is usually acquired by means of limited feedback from the receiver, which is often called covariance feedback [13,14]. Due to feedback-related issues, e.g., quantization, feedback errors, and delay, the covariance information at the transmitter is imperfect in general [15,16,17]. For this reason, it might be more practical to take such imperfection into account for training designs. In the literature, few studies have addressed this issue [18,19,20].

Except for the work by Shariati et al. [20], all previous works have considered the white noise case only. However, the assumption of white noise is not always true in some practical scenarios. For example, in multi-user or small-cell environments, the noise term is no longer white due to the presence of non-negligible interference [21]. Also, for several applications such as mesh networking in IEEE 802.11s [22] and relay-aided systems such as IEEE 802.11ah [23] and IEEE 802.16j [24], total noise is often colored because the relays broadcast background noise as well as received data streams. Although the work in [20] considered a colored interference scenario, this work assumed full knowledge of the interference covariance at the transmitter side. This assumption is suitable if the covariance mismatch of the interference is sufficiently small compared with that of the channel. When the interference covariance matrix is subject to substantial errors at the transmitter, however, the scheme in [20] may become ineffective since it cannot properly deal with the interference covariance uncertainty. Moreover, since the iterative algorithm in [20] was designed with a heuristic approach, it may require considerable complexity during the channel training phase.

1.2. Motivations and Contributions

Motivated by the aforementioned discussions and to break through the limitations of the existing techniques, in this paper, we develop a novel and high-performing training strategy for the estimation of correlated MIMO channels in the presence of colored interference based on the worst-case robustness philosophy. The detailed technical contributions of our work are described below:

We propose a general framework for robust training optimization under imperfect channel and interference covariance information at the transmitter. Particularly, we design an optimal training signal for MIMO systems with interference, taking the imperfection of both the channel and interference covariance into account.
In our proposed framework, the worst-case MSE criterion is used as a performance metric similar to previous works [18,19,20]. In contrast to the previous problems, however, the considered design problem is not convex–concave due to the uncertainty in the interference covariance, and consequently the design of the training signal is more complicated. To solve the problem, we take innovative approaches: we initially derive an optimal structure of the training signal, which includes the existing training structures as special cases, and then we solve the training power allocation problem.
Two power allocation schemes are proposed. First, an optimal power allocation is determined numerically by finding an optimal solution. Next, to reduce the complexity required for optimal power allocation, a closed-form power allocation scheme is proposed by finding a suboptimal solution.
Based on the latter power allocation strategy, we also propose a suboptimal, yet closed-form, training scheme with low complexity.
We compare the performance of the proposed schemes with that of the conventional schemes by simulations. Through numerical results, we empirically demonstrate that the proposed schemes substantially surpass the existing schemes with remarkable performance improvements and the proposed suboptimal training scheme provides comparable performance to that of the optimal training scheme.

1.3. Organization and Notation

The remainder of this paper is organized as follows. Section 2 describes the system model and formulates the optimization problem. Section 3 presents the proposed training schemes. Section 4 presents the simulation results for the performance comparison. Section 5 concludes the paper.

Notation: {(\cdot)}^{T}, {(\cdot)}^{*}, {(\cdot)}^{H}, \otimes, vec (\cdot), and Tr (\cdot) denote the transpose, conjugate, conjugate transpose, Kronecker product, vectorization, and trace operators, respectively. ‖ \cdot ‖_{F} denotes the Frobenius norm of a matrix. The notation A ⪰ 0 means that a Hermitian matrix A is positive semi-definite. The Cartesian product of two sets A and B is denoted as A \times B. The notations Diag (a_{1}, \dots, a_{m}) and Blkdiag (A_{1}, \dots, A_{m}) denote a diagonal matrix whose diagonal elements are given by a_{1}, \dots, a_{m} and a block diagonal matrix whose diagonal blocks are given by the matrices A_{1}, \dots, A_{m}, respectively. \nabla_{A} and \nabla_{A}^{2} represent the gradient and Hessian operators with respect to (w.r.t.) the variable A. E (\cdot) denotes the expectation operator, and a circularly symmetric complex Gaussian random vector a with mean \bar{a} and covariance matrix A is denoted by a \sim CN (\bar{a}, A).

2. System Model and Problem Formulation

2.1. System Model

As depicted in Figure 1, we consider a MIMO system consisting of a transmitter equipped with M_{t} antennas, a receiver equipped with M_{r} antennas, and a total of K interferers, where the kth interferer has M_{k} antennas, k = 1, \dots, K. It is assumed that the background additive noise is much weaker than the interference, and hence we ignore the additive noise term for convenience (this is valid in practice as the noise power ranges from −192.5 dBm/Hz to −174 dBm/Hz, whereas the interference signal power ranges from −100 dBm to −10 dBm). In the channel training procedure, multiple training symbols are sent from the transmitter during L symbol times. The received signal matrix is then given by

Y = H P + \underbrace{\sum_{k = 1}^{K} H_{k} S_{k}}_{= N} = H P + N,

(1)

where

P \in C^{M_{t} \times L}

and

S_{k} \in C^{M_{k} \times L}

represent the training signal matrix sent from the transmitter and the interfering signal matrix sent from the kth interferer, respectively. We simply denote the total interference as

N

. The channel matrix between the transmitter and receiver is denoted by

H \in C^{M_{r} \times M_{t}}

and that between the kth interferer and receiver by

H_{k} \in C^{M_{r} \times M_{k}}

. Without loss of generality, it is assumed that there exists spatial correlation at all nodes. According to the well-known Kronecker model [25], we represent the channels and interfering signals as follows (the analysis in our work is not confined to a specific distribution of the channel matrices, but valid for any distribution, as the proposed method requires only the knowledge of the covariance information of the channel matrices, not their probability distribution):

H = R^{1 / 2} H_{w} T^{1 / 2},

(2)

H_{k} = R^{1 / 2} H_{w, k} T_{I, k}^{1 / 2}, k = 1, 2, \dots, K,

(3)

S_{k} = Ψ_{s, k}^{1 / 2} S_{w, k} Ψ_{τ, k}^{1 / 2}, k = 1, 2, \dots, K,

(4)

where the elements of the matrices

H_{w}

,

{H_{w, k}}_{k = 1}^{K}

, and

{S_{w, k}}_{k = 1}^{K}

are independent and identically distributed (i.i.d.) as

CN (0, 1)

.

T ⪰ 0

,

R ⪰ 0

, and

T_{I, k} ⪰ 0

are the spatial correlation matrices at the transmitter, receiver, and kth interferer, respectively.

Ψ_{τ, k} ⪰ 0

and

Ψ_{s, k} ⪰ 0

, respectively, represent the temporal and spatial correlations of the interfering signal. Taking the vectorizing operation on both sides of (1), the received signal can be rewritten in vector form as

vec (Y) = y = (P^{T} \otimes I_{M_{r}}) h + n,

(5)

where

h = vec (H)

and

n = vec (N) = \sum_{k = 1}^{K} (I_{L} \otimes H_{k}) vec (S_{k}) .
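The vectorized model (5) rests on the identity vec (H P) = (P^{T} \otimes I_{M_{r}}) vec (H). The following is a minimal numerical sketch of that identity; the matrix sizes, the random seed, and the helper names `crandn` and `vec` are illustrative assumptions and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
Mr, Mt, L = 3, 2, 4  # illustrative sizes (assumed, not from the paper)

def crandn(*shape):
    """Matrix with i.i.d. CN(0, 1) entries (hypothetical helper)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def vec(A):
    """Column-stacking vectorization, matching the vec(.) operator."""
    return A.reshape(-1, order="F")

H = crandn(Mr, Mt)   # channel matrix
P = crandn(Mt, L)    # training matrix

lhs = vec(H @ P)                            # vec(HP)
rhs = np.kron(P.T, np.eye(Mr)) @ vec(H)     # (P^T kron I) vec(H)
identity_holds = bool(np.allclose(lhs, rhs))
print(identity_holds)
```

Note that the plain transpose `P.T` is used, not the conjugate transpose, in line with (5).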

For the design purpose, we assume that the channel and interference covariance matrices are perfectly known at the receiver side (The issue with imperfect covariance information at the receiver side was studied in [20]. In [20] (Remark 2), it was shown that when the covariance information is imperfect at the receiver side, the resulting channel estimation MSE has a similar mathematical expression to (12), with an additional (negligible) loss term. In [20], it was also empirically demonstrated that the imperfect covariance information at the receiver side has a negligible impact on the performance, whereas that at the transmitter side has a much more significant impact).

We consider the linear minimum MSE (LMMSE) channel estimation from the observation vector

y

in (5). In classical estimation theory [26], it is well known that the LMMSE estimate of the channel

h

is given by

\hat{h} = C_{h} (P^{*} \otimes I_{M_{r}}) {[C_{n} + (P^{T} \otimes I_{M_{r}}) C_{h} (P^{*} \otimes I_{M_{r}})]}^{- 1} y,

(6)

where

C_{h} = E [h h^{H}] = T^{T} \otimes R

and

C_{n} = E [n n^{H}] = Q^{T} \otimes R

are the channel and interference covariance matrices, respectively. Also, we have

Q = \sum_{k = 1}^{K} Tr (Ψ_{s, k} T_{I, k}) Ψ_{τ, k}

to represent the interference correlation at the transmitter side. In practice, the matrix inversion for the LMMSE channel estimation in (6) can be approximated or replaced by several low-complexity alternatives suggested in [27] based on the matrix polynomial expansion with arbitrary degrees of freedom. This approach significantly reduces the computational burden of the matrix inverse operation from cubic to quadratic complexity. The MSE of the LMMSE channel estimate

\hat{h}

given the parameters

P

,

T

,

R

, and

Q

can be obtained as [26]

\begin{aligned} f (P, T, R, Q) & = E \{ ‖ h - \hat{h} ‖^{2} \mid P, T, R, Q \} \\ & = Tr [{(T^{- 1} + P Q^{- 1} P^{H})}^{- 1} \otimes R] \\ & = Tr (R) \cdot Tr [{(T^{- 1} + P Q^{- 1} P^{H})}^{- 1}] . \end{aligned}

(7)
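As a sanity check on (7), the sketch below compares the trace of the generic LMMSE error covariance, C_{h} - C_{h} A^{H} {(C_{n} + A C_{h} A^{H})}^{- 1} A C_{h} with A = P^{T} \otimes I_{M_{r}}, against the closed form Tr (R) \cdot Tr [{(T^{- 1} + P Q^{- 1} P^{H})}^{- 1}]. The dimensions, the random positive-definite test matrices, and the helper `random_psd` are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
Mr, Mt, L = 2, 3, 4  # illustrative sizes (assumed)

def random_psd(n):
    """Random Hermitian positive-definite matrix (hypothetical test data)."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A @ A.conj().T + n * np.eye(n)

T, R, Q = random_psd(Mt), random_psd(Mr), random_psd(L)
P = rng.standard_normal((Mt, L)) + 1j * rng.standard_normal((Mt, L))

# Vectorized model y = A h + n with the Kronecker-structured covariances.
A = np.kron(P.T, np.eye(Mr))
C_h = np.kron(T.T, R)
C_n = np.kron(Q.T, R)

# Trace of the LMMSE error covariance computed directly.
G = C_h @ A.conj().T @ np.linalg.inv(C_n + A @ C_h @ A.conj().T)
mse_direct = np.trace(C_h - G @ A @ C_h).real

# Closed form (7).
inner = np.linalg.inv(T) + P @ np.linalg.inv(Q) @ P.conj().T
mse_closed = (np.trace(R) * np.trace(np.linalg.inv(inner))).real

print(mse_direct, mse_closed)
```

The two values agree up to floating-point error, which reflects that the receiver correlation R enters the MSE only through its trace.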

Remark 1.

When the additive noise term

W \in C^{M_{r} \times L}

is considered (i.e., not ignored), the total interference-plus-noise term can be written as

N = \sum_{k = 1}^{K} H_{k} S_{k} + W .

Suppose that

S_{k}

,

\forall k

, and

W

are independent of each other and that the covariance matrix of

W

takes the following form:

E [vec (W) {vec}^{H} (W)] = Φ^{T} \otimes R .

Then, it still follows that

C_{n} = E [n n^{H}] = Q^{T} \otimes R

, with

Q

given by

Q = \sum_{k = 1}^{K} Tr (Ψ_{s, k} T_{I, k}) Ψ_{τ, k} + Φ .

In a similar fashion, it is also possible to cover a multi-user or multi-cell scenario by treating

W

as the intra-cell or inter-cell interference, respectively.

2.2. Problem Formulation

In practice, the matrices

T

,

R

, and

Q

are imperfect at the transmitter side due to feedback-related issues [15,16,17]. Considering such imperfection, we mathematically model the correlation uncertainties as follows:

T = \hat{T} + \tilde{T} \in T, R = \hat{R} + \tilde{R} \in R and Q = \hat{Q} + \tilde{Q} \in Q,

(8)

respectively, where

\hat{T}

,

\hat{R}

, and

\hat{Q}

are imperfect estimates of

T

,

R

, and

Q

, respectively, and

\tilde{T}

,

\tilde{R}

, and

\tilde{Q}

represent the corresponding correlation error matrices. The uncertainty sets are denoted by

T

,

R

, and

Q

, and they are, respectively, defined as

T = \{\tilde{T} : {∥ \tilde{T} ∥}_{F}^{2} \leq ϵ_{T}, \hat{T} + \tilde{T} ⪰ 0\},

(9)

R = \{\tilde{R} : {∥ \tilde{R} ∥}_{F}^{2} \leq ϵ_{R}, \hat{R} + \tilde{R} ⪰ 0\},

(10)

and

Q = \{\tilde{Q} : {∥ \tilde{Q} ∥}_{F}^{2} \leq ϵ_{Q}, \hat{Q} + \tilde{Q} ⪰ 0\},

(11)

where the parameters ϵ_{T}, ϵ_{R}, and ϵ_{Q} bound the squared Frobenius norms of the errors, i.e., they are the squared radii of the spherical sets T, R, and Q, respectively, and they are related to the quantization step size and equal error contour [15,16,17]. To guarantee a certain channel estimation performance for any possible uncertainty in the covariance information, we adopt the spherical uncertainty model for the covariance errors in this paper, as this model is the most uncertain among the uncertainty models. Even when other uncertainty models (e.g., bounded spectral norms or element-wise errors) are adopted, results and conclusions similar to those for the spherical uncertainty model in this paper can still be drawn.

Using (8), we can rewrite (7) as

J (P, \tilde{T}, \tilde{R}, \tilde{Q}) = Tr (\hat{R} + \tilde{R}) \cdot Tr [{\{{(\hat{T} + \tilde{T})}^{- 1} + P {(\hat{Q} + \tilde{Q})}^{- 1} P^{H}\}}^{- 1}] .

(12)

Note from (12) that the estimation MSE is a function of the known parameter

P

as well as the unknown parameters

(\tilde{T}, \tilde{R}, \tilde{Q})

. To deal with the unknown parameters, we follow the widely used concept of the worst-case robustness [15,16,17]. Specifically, we use the worst-case MSE,

J^{⋆} (P) = \max_{(\tilde{T}, \tilde{R}, \tilde{Q}) \in T \times R \times Q} J (P, \tilde{T}, \tilde{R}, \tilde{Q}),

(13)

as a design criterion. Considering the training power constraint, an optimization problem of interest can then be formulated as follows:

\begin{matrix} \min_{P} & \max_{\tilde{T}, \tilde{R}, \tilde{Q}} J (P, \tilde{T}, \tilde{R}, \tilde{Q}) \end{matrix}

(14a)

\begin{matrix} s . t . & Tr (P P^{H}) \leq P_{T}, \end{matrix}

(14b)

\begin{matrix} \tilde{T} \in T, \tilde{R} \in R, \tilde{Q} \in Q, \end{matrix}

(14c)

where P_{T} denotes the total transmit power of the training signal. It can be shown that problem (14) is not convex–concave due to the non-convexity of the worst-case MSE J^{⋆} in (13) w.r.t. P. (The problem \min_{a \in A} \max_{b \in B} φ (a, b) is convex–concave if the constraint sets A and B are both convex, and the objective function φ is a convex function of the minimization variable a and a concave function of the maximization variable b [28,29].) For this reason, problem (14) is more complicated to tackle than the problems in [18,19,20], which are all convex–concave. For example, the conventional iterative scheme in [20] does not work well since it may converge to a locally optimal solution.

3. Training Signal Optimization

In this section, we optimize the training signal by solving problem (14). Specifically, a closed-form structure of the worst-case MSE minimizing training signal is first obtained by deriving the structure of the optimal matrix solution to problem (14). With this structure, the original problem (14) involving complex-valued matrices becomes a simple power allocation problem involving real-valued scalar variables only. Thereafter, optimal and suboptimal power allocation schemes are proposed. The optimal power allocation is determined numerically, and the suboptimal power allocation is obtained in closed form with low complexity.

3.1. Worst-Case MSE Minimizing Training Structure

Let

(P^{⋆}, {\tilde{T}}^{⋆}, {\tilde{R}}^{⋆}, {\tilde{Q}}^{⋆})

be an optimal solution to the minimax problem (14). Then, the structures of the optimal training signal

P^{⋆}

and the worst-case correlation errors

({\tilde{T}}^{⋆}, {\tilde{R}}^{⋆}, {\tilde{Q}}^{⋆})

are derived in the following theorem.

Theorem 1.

Let

\hat{T} = U_{T} Λ_{T} U_{T}^{H}

and

\hat{Q} = U_{Q} Λ_{Q} U_{Q}^{H}

be the eigenvalue decompositions (EVDs) of the matrices

\hat{T}

and

\hat{Q}

, respectively, where

Λ_{T} = Diag (λ_{T, 1}, \dots, λ_{T, M_{t}})

and

Λ_{Q} = Diag (λ_{Q, 1}, \dots, λ_{Q, L})

are diagonal matrices whose diagonal entries are the eigenvalues of

\hat{T}

and

\hat{Q}

, respectively. The columns of the unitary matrices

U_{T}

and

U_{Q}

consist of the eigenvectors of

\hat{T}

and

\hat{Q}

, respectively. For massive MIMO systems, there are several efficient methods or low-complexity approximations to perform the EVDs of covariance matrices with large sizes. These include the power iteration, Lanczos algorithm, Arnoldi iteration, randomized EVD, Jacobi–Davidson, locally optimal block preconditioned conjugate gradient (LOBPCG), etc. Other matrix-free computation approaches based on matrix–vector multiplications can also be used. The optimal training signal

P^{⋆}

, minimizing the worst-case MSE, has the following structure (this solution generally requires very low signaling overheads for the feedback transmission; specifically, only

M_{t}

real numbers and

M_{t}

integers need to be fed back to construct the training signal matrix at the transmitter [10]; accordingly, the limited feedback issue may not be a serious design concern in practice):

P^{⋆} = U_{T} D_{P}^{⋆} U_{Q}^{H},

(15)

and the worst-case correlation error matrices

({\tilde{T}}^{⋆}, {\tilde{R}}^{⋆}, {\tilde{Q}}^{⋆})

, respectively, have the following structures:

{\tilde{T}}^{⋆} = U_{T} D_{T}^{⋆} U_{T}^{H}, {\tilde{R}}^{⋆} = \sqrt{\frac{ϵ_{R}}{M_{r}}} I_{M_{r}}, {\tilde{Q}}^{⋆} = U_{Q} D_{Q}^{⋆} U_{Q}^{H},

(16)

where

D_{P}^{⋆} \in R^{M_{t} \times L}

is a rectangular diagonal matrix containing non-negative elements on its main diagonal, and

D_{T}^{⋆} \in R^{M_{t} \times M_{t}}

and

D_{Q}^{⋆} \in R^{L \times L}

are real diagonal matrices. It thus can be inferred that the worst case corresponds to a diagonal channel covariance matrix and the best case corresponds to a channel covariance matrix whose main diagonal values are close to zero and other elements are close to each other.

Proof.

See Appendix A. □

The optimal training structure (15) means that the transmit directions of the training signal should be matched to the eigenvectors of the estimated correlation matrices at the transmitter side, whereas the training power has to be allocated according to the worst-case eigenvalues of the imperfect correlation matrices. This observation results from the fact that the uncertainty sets in (9)–(11) do not contain any directional information. Next, the worst-case value of the receiver correlation error seen by the transmitter is given by the scaled identity matrix, i.e., equal perturbation. This is due to the Kronecker model, in which the correlation information at the transmitter and receiver sides is separable [25]. Finally, it is important to note that the number of real-valued variables to be optimized is reduced from 2 (M_{t} L + M_{t}^{2} + M_{r}^{2} + L^{2}) to \min \{M_{t}, L\} + M_{t} + L with the proposed structures in (15) and (16).
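The scaled-identity form of {\tilde{R}}^{⋆} in (16) can be checked with a short argument, sketched here under the assumption that \hat{R} ⪰ 0, so that the positive semi-definiteness constraint in (10) is not active at the maximizer. Since J in (12) depends on \tilde{R} only through the factor Tr (\hat{R} + \tilde{R}), which multiplies a non-negative term, the worst case maximizes Tr (\tilde{R}) over the set in (10). The Cauchy–Schwarz inequality for the trace inner product then gives

```latex
Tr (\tilde{R}) = Tr (I_{M_{r}} \tilde{R}) \leq ‖ I_{M_{r}} ‖_{F} ‖ \tilde{R} ‖_{F} \leq \sqrt{M_{r}} \sqrt{ϵ_{R}} = \sqrt{M_{r} ϵ_{R}},
```

with equality if and only if \tilde{R} = \sqrt{ϵ_{R} / M_{r}} I_{M_{r}}. This agrees with the constant β = Tr (\hat{R}) + \sqrt{M_{r} ϵ_{R}} that appears in (17).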

Remark 2.

When the interference is temporally white and its variance information is perfectly known, the result (15) becomes

P^{⋆} = U_{T} D_{P}^{⋆}

, or equivalently,

P^{⋆} P^{⋆ H} = U_{T} Λ_{P}^{⋆} U_{T}^{H}

, where

Λ_{P}^{⋆} = D_{P}^{⋆} D_{P}^{⋆ T}

, which is consistent with the result derived in [19].

Remark 3.

When

Q

is perfect, but

T

is imperfect, which is the case considered in [20], the training structure in (15) becomes

P^{⋆} = U_{T} D_{P}^{⋆} V_{Q}^{H}

, where the columns of

V_{Q}

consist of the eigenvectors of

Q

.

3.2. Optimal Power Allocation

In this section, we determine the matrices

D_{P}^{⋆}

,

D_{T}^{⋆}

, and

D_{Q}^{⋆}

. Substituting the structures

P = U_{T} D_{P} U_{Q}^{H}

,

\tilde{T} = U_{T} D_{T} U_{T}^{H}

, and

\tilde{Q} = U_{Q} D_{Q} U_{Q}^{H}

into (14), we can rewrite the estimation MSE J in (12) as

ζ (d_{P}, d_{T}, d_{Q}) = β \sum_{i = 1}^{ν} {(\frac{1}{λ_{T, i} + d_{T, i}} + \frac{d_{P, i}}{λ_{Q, i} + d_{Q, i}})}^{- 1} + β \sum_{i = ν + 1}^{M_{t}} (λ_{T, i} + d_{T, i}),

(17)

where

β = Tr (\hat{R}) + \sqrt{M_{r} ϵ_{R}}

and

ν = \min \{M_{t}, L\}

denotes the maximum rank of

P

. Also,

d_{P} = {[d_{P, 1}, \dots, d_{P, ν}]}^{T},

d_{T} = {[d_{T, 1}, \dots, d_{T, M_{t}}]}^{T},

and

d_{Q} = {[d_{Q, 1}, \dots, d_{Q, L}]}^{T}

are the vectors of the diagonal elements of

D_{P} D_{P}^{T}

,

D_{T}

, and

D_{Q}

, respectively. From (17), the optimal training power allocation

d_{P}^{⋆}

can be obtained by solving the following problem:

\begin{matrix} \min_{d_{P}} & \max_{d_{T}, d_{Q}} ζ (d_{P}, d_{T}, d_{Q}) \end{matrix}

(18a)

\begin{matrix} s . t . & d_{P} \in D_{P}, d_{T} \in D_{T}, d_{Q} \in D_{Q}, \end{matrix}

(18b)

where the constraint sets

D_{P}

,

D_{T}

, and

D_{Q}

are defined as

D_{P} = \{d_{P} : \sum_{i = 1}^{ν} d_{P, i} \leq P_{T}, d_{P, i} \geq 0, i = 1, \dots, ν\},

D_{T} = \{d_{T} : {∥ d_{T} ∥}^{2} \leq ϵ_{T}, λ_{T, i} + d_{T, i} \geq 0, i = 1, \dots, M_{t}\}

and

D_{Q} = \{d_{Q} : {∥ d_{Q} ∥}^{2} \leq ϵ_{Q}, λ_{Q, i} + d_{Q, i} \geq 0, i = 1, \dots, L\},

respectively. In the following lemma, we show that the cost function of the problem (18) is convex–concave.
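The reduction from the matrix objective (12) to the scalar objective (17) can be verified numerically. The sketch below builds P, \hat{T} + \tilde{T}, and \hat{Q} + \tilde{Q} with the structures of Theorem 1 and compares the two expressions. The sizes (chosen with M_{t} > L so that the second sum in (17) is non-empty), the random eigenvalues and perturbations, the value of β, and the helper `random_unitary` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
Mt, L = 5, 3             # illustrative sizes with Mt > L (assumed)
nu = min(Mt, L)

def random_unitary(n):
    """Unitary matrix from the QR factorization of a random complex matrix."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    U, _ = np.linalg.qr(Z)
    return U

U_T, U_Q = random_unitary(Mt), random_unitary(L)
lam_T = rng.uniform(0.5, 2.0, Mt)    # eigenvalues of T_hat (hypothetical)
lam_Q = rng.uniform(0.5, 2.0, L)     # eigenvalues of Q_hat (hypothetical)
d_T = rng.uniform(-0.2, 0.2, Mt)     # diagonal of D_T
d_Q = rng.uniform(-0.2, 0.2, L)      # diagonal of D_Q
d_P = rng.uniform(0.0, 1.0, nu)      # diagonal of D_P D_P^T (stream powers)
beta = 1.7                           # stands in for Tr(R_hat) + sqrt(M_r eps_R)

# Rectangular diagonal D_P with amplitudes sqrt(d_P) on its main diagonal.
D_P = np.zeros((Mt, L))
D_P[np.arange(nu), np.arange(nu)] = np.sqrt(d_P)

P = U_T @ D_P @ U_Q.conj().T
T = U_T @ np.diag(lam_T + d_T) @ U_T.conj().T
Q = U_Q @ np.diag(lam_Q + d_Q) @ U_Q.conj().T

# Matrix form (12).
inner = np.linalg.inv(T) + P @ np.linalg.inv(Q) @ P.conj().T
J_matrix = beta * np.trace(np.linalg.inv(inner)).real

# Scalar form (17): trained directions plus untrained directions.
head = 1.0 / (1.0 / (lam_T[:nu] + d_T[:nu]) + d_P / (lam_Q[:nu] + d_Q[:nu]))
tail = lam_T[nu:] + d_T[nu:]
zeta = beta * (head.sum() + tail.sum())

print(J_matrix, zeta)
```

The untrained directions (indices ν + 1 to M_{t}) contribute their full perturbed eigenvalue, since no training power reaches them.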

Definition 1

(Convex–concave function). [28,29] We say the function

φ (a, b) : A \times B \subset R^{n} \times R^{m} \to R

is convex–concave if

φ (a, b)

is a convex function of

a \in A

for fixed

b \in B

and a concave function of

b \in B

for fixed

a \in A

.

Lemma 1.

The MSE ζ in (17) is convex in

d_{P} \in D_{P}

for fixed

(d_{T}, d_{Q}) \in D_{T} \times D_{Q}

, and concave in

d_{T} \in D_{T}

and

d_{Q} \in D_{Q}

for fixed

d_{P} \in D_{P}

.

Proof.

See Appendix B. □

Lemma 1 implies that the power allocation problem (18) is convex–concave since the constraint sets

D_{P}

,

D_{T}

, and

D_{Q}

are all convex. Unfortunately, it is not possible to find a closed-form solution in general due to the nonlinearity of the objective function and norm constraints on

d_{T}

and

d_{Q}

. However, the optimal power allocation

d_{P}^{⋆}

can be computed numerically by using well-known methods such as the interior point method or barrier method [28,30].

3.3. Suboptimal Power Allocation in Closed-Form

The numerical approach for optimal power allocation may be undesirable in real applications due to a non-negligible complexity during the channel training phase. Considering this problem, we propose a suboptimal power allocation in a closed-form. Since the power allocation problem (18) is convex–concave, we can interchange the minimum and maximum operators of the problem (18) according to [31] (Lemma 36.2)

as follows:

\begin{matrix} \max_{d_{T}, d_{Q}} & \min_{d_{P}} ζ (d_{P}, d_{T}, d_{Q}) \end{matrix}

(19a)

\begin{matrix} s . t . & d_{P} \in D_{P}, d_{T} \in D_{T}, d_{Q} \in D_{Q} . \end{matrix}

(19b)

Defining

x_{T} = {[λ_{T, 1} + d_{T, 1}, \dots, λ_{T, M_{t}} + d_{T, M_{t}}]}^{T}

and

x_{Q} = {[λ_{Q, 1} + d_{Q, 1}, \dots, λ_{Q, L} + d_{Q, L}]}^{T},

problem (19) can be equivalently reformulated as

\begin{matrix} \max_{x_{T}, x_{Q}} & \min_{d_{P}} g (d_{P}, x_{T}, x_{Q}) \end{matrix}

(20a)

\begin{matrix} s . t . & d_{P} \in D_{P}, x_{T} \in X_{T}, x_{Q} \in X_{Q}, \end{matrix}

(20b)

where the cost function g in (20a) is

g (d_{P}, x_{T}, x_{Q}) = β \sum_{i = 1}^{ν} {(\frac{1}{x_{T, i}} + \frac{d_{P, i}}{x_{Q, i}})}^{- 1} + β \sum_{i = ν + 1}^{M_{t}} x_{T, i},

(21)

and the constraint sets

X_{T}

and

X_{Q}

are defined as

X_{T} = \{x_{T} : {∥ x_{T} - λ_{T} ∥}^{2} \leq ϵ_{T}, x_{T, i} \geq 0, i = 1, \dots, M_{t}\}

and

X_{Q} = \{x_{Q} : {∥ x_{Q} - λ_{Q} ∥}^{2} \leq ϵ_{Q}, x_{Q, i} \geq 0, i = 1, \dots, L\},

respectively. It is still difficult to obtain the optimal solution to problem (20) in a closed-form. To overcome this difficulty, in the following, we instead find a suboptimal solution.

Definition 2.

[32] For any

a \in R^{n}

, let

a_{[1]} \geq \dots \geq a_{[n]}

denote the components of

a

in decreasing order.

Definition 3.

[32] (Ch.1.A.1) The vector

a \in R^{n}

is majorized by

b \in R^{n}

, denoted by

a ≺ b

, if

\sum_{m = 1}^{l} a_{[m]} \leq \sum_{m = 1}^{l} b_{[m]}

,

1 \leq l \leq n - 1

, and

\sum_{m = 1}^{n} a_{[m]} = \sum_{m = 1}^{n} b_{[m]}

.

Definition 4

(Schur-concave function). [32] (Ch.3.A.1) A real-valued function

φ : A \subset R^{n} \to R

is said to be Schur-concave on

A

if

a ≺ b

⇒

φ (a) \geq φ (b)

.
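Definitions 2 to 4 can be made concrete with a small sketch. The helper `is_majorized` below is a hypothetical name, and the test vectors are illustrative. The example uses two standard facts: a vector with all entries equal to the mean is majorized by any vector with the same sum, and a symmetric concave function such as the sum of square roots is Schur-concave.

```python
import numpy as np

def is_majorized(a, b, tol=1e-12):
    """Return True if a is majorized by b (a ≺ b) in the sense of Definition 3."""
    a_sorted = np.sort(np.asarray(a, dtype=float))[::-1]   # decreasing order
    b_sorted = np.sort(np.asarray(b, dtype=float))[::-1]
    if a_sorted.shape != b_sorted.shape:
        return False
    # Partial sums for l = 1, ..., n - 1 must not exceed those of b.
    partial_ok = np.all(np.cumsum(a_sorted)[:-1] <= np.cumsum(b_sorted)[:-1] + tol)
    # The total sums must coincide.
    total_ok = abs(a_sorted.sum() - b_sorted.sum()) <= tol
    return bool(partial_ok and total_ok)

b = np.array([3.0, 1.0, 0.5, 0.5])
a = np.full_like(b, b.mean())        # equalized vector with the same sum

def phi(x):
    """A Schur-concave example: the sum of square roots."""
    return float(np.sum(np.sqrt(x)))

print(is_majorized(a, b), is_majorized(b, a))
print(phi(a) >= phi(b))
```

The equalized vector is the "least spread" vector with a given sum, which is why a Schur-concave function is largest there.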

Lemma 2.

Let

g^{⋆} (x_{T}, x_{Q}) = \min_{d_{P} \in D_{P}} g (d_{P}, x_{T}, x_{Q})

be the minimum MSE. Then,

g^{⋆} (x_{T}, x_{Q})

is Schur-concave in

x_{T}

and

x_{Q}

. In other words, if there exist vectors

{\tilde{x}}_{T}

and

{\tilde{x}}_{Q}

such that

{\tilde{x}}_{T} ≺ x_{T}

and

{\tilde{x}}_{Q} ≺ x_{Q}

, then

g^{⋆} ({\tilde{x}}_{T}, {\tilde{x}}_{Q}) \geq g^{⋆} (x_{T}, x_{Q})

.

Proof.

See Appendix C. □

From the above lemma, we can find a suboptimal solution to problem (20), with which the value of

g^{⋆}

increases, but is not maximized, as follows:

{\tilde{x}}_{T} = λ_{T} + δ_{T} 1_{M_{t}} \quad and \quad {\tilde{x}}_{Q} = λ_{Q} + δ_{Q} {[1_{ν}^{T}, 0_{(L - ν) \times 1}^{T}]}^{T},

(22)

where

λ_{T} = {[λ_{T, 1}, \dots, λ_{T, M_{t}}]}^{T}

and

λ_{Q} = {[λ_{Q, 1}, \dots, λ_{Q, L}]}^{T} .

Also,

1_{n}

denotes the

n \times 1

vector whose elements are all 1,

δ_{T} = \frac{1}{M_{t}} \sum_{j = 1}^{M_{t}} d_{T, j}

and

δ_{Q} = \frac{1}{ν} \sum_{j = 1}^{ν} d_{Q, j}

. To satisfy the constraints

{\tilde{x}}_{T} \in X_{T}

and

{\tilde{x}}_{Q} \in X_{Q}

, the values of

δ_{T}

and

δ_{Q}

can be chosen as

δ_{T} = \sqrt{\frac{ϵ_{T}}{M_{t}}}

and

δ_{Q} = \sqrt{\frac{ϵ_{Q}}{ν}}

, respectively. By substituting (22) into (20), an optimization problem for suboptimal training power allocation can be formulated as

\begin{matrix} \min_{d_{P}} & g (d_{P}, {\tilde{x}}_{T}, {\tilde{x}}_{Q}) \end{matrix}

(23a)

\begin{matrix} s . t . & d_{P} \in D_{P} . \end{matrix}

(23b)

It is assumed that the elements of

λ_{T}

and

λ_{Q}

are arranged in descending and ascending orders, respectively. Then, the solution to problem (23) can be obtained in a closed form by the Lagrange multiplier method [28], as

d_{P, i}^{(s o)} = \begin{cases} (P_{T} + \sum_{j = 1}^{r} \frac{λ_{Q, j} + δ_{Q}}{λ_{T, j} + δ_{T}}) \frac{\sqrt{λ_{Q, i} + δ_{Q}}}{\sum_{j = 1}^{r} \sqrt{λ_{Q, j} + δ_{Q}}} - \frac{λ_{Q, i} + δ_{Q}}{λ_{T, i} + δ_{T}}, & i = 1, \dots, r, \\ 0, & i = r + 1, \dots, ν, \end{cases}

(24)

where

r = \max \{i \in \{1, \dots, ν\} : d_{P, i}^{(s o)} > 0\}

is the largest i such that

d_{P, i}^{(s o)} > 0

. The suboptimal solution in (24) follows the conventional water-filling strategy, i.e., it assigns more training power to the larger eigenvalues of \hat{T} and smaller eigenvalues of \hat{Q}. The difference from conventional water-filling is that the eigenvalues of \hat{T} and \hat{Q} are offset by the equal error terms δ_{T} and δ_{Q}, respectively.
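The closed-form allocation (24) can be sketched in a few lines. The code below is an illustrative implementation, not the authors' code: the function names, the example eigenvalues, and the values of δ_{T}, δ_{Q}, and P_{T} are assumptions. It takes the first ν eigenvalues of \hat{T} (descending) and of \hat{Q} (ascending), and finds r by shrinking the active set until all active powers are strictly positive. The objective helper covers only the first sum of g in (21) with β = 1, since the second sum does not depend on d_{P}.

```python
import numpy as np

def suboptimal_power_allocation(lam_T, lam_Q, delta_T, delta_Q, P_T):
    """Closed-form allocation (24); lam_T descending, lam_Q ascending."""
    a = np.asarray(lam_T, dtype=float) + delta_T   # perturbed channel eigenvalues
    b = np.asarray(lam_Q, dtype=float) + delta_Q   # perturbed interference eigenvalues
    nu = a.size
    d = np.zeros(nu)
    # Try r = nu, nu - 1, ..., 1 and keep the first r with all powers positive.
    for r in range(nu, 0, -1):
        level = (P_T + np.sum(b[:r] / a[:r])) / np.sum(np.sqrt(b[:r]))
        cand = level * np.sqrt(b[:r]) - b[:r] / a[:r]
        if np.all(cand > 0):
            d[:r] = cand
            break
    return d

def g_trained(d_P, lam_T, lam_Q, delta_T, delta_Q):
    """First sum of g in (21) with beta = 1 (the part that depends on d_P)."""
    a = np.asarray(lam_T, dtype=float) + delta_T
    b = np.asarray(lam_Q, dtype=float) + delta_Q
    return float(np.sum(1.0 / (1.0 / a + np.asarray(d_P) / b)))

# Hypothetical example values.
lam_T = [2.0, 1.0, 0.2]     # descending
lam_Q = [0.5, 1.0, 2.0]     # ascending
delta_T, delta_Q, P_T = 0.1, 0.1, 1.0

d_so = suboptimal_power_allocation(lam_T, lam_Q, delta_T, delta_Q, P_T)
print(d_so, d_so.sum())
```

In this example the third direction receives no power: it pairs the weakest channel eigenvalue with the strongest interference eigenvalue, so spending training power there is not worthwhile at this power budget.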

Remark 4.

For the case considered in Remark 2, we obtain the suboptimal power allocation from (24) by setting

δ_{Q} = 0

and

λ_{Q, j} = σ_{Q}

,

j = 1, \dots, L

, where

σ_{Q}

denotes the interference variance.

Remark 5.

In the case considered in Remark 3, the suboptimal power allocation can be obtained from (24) by setting

δ_{Q} = 0

and

λ_{Q, j} = σ_{Q, j}

,

j = 1, \dots, L

, where

{σ_{Q, j}}_{j = 1}^{L}

denote the eigenvalues of

Q

.

The suboptimal solution

d_{P}^{(s o)}

in (24) becomes optimum if the minimum MSE

g^{⋆}

is an increasing function of both

δ_{T} = \frac{1}{M_{t}} \sum_{j = 1}^{M_{t}} d_{T, j}

and

δ_{Q} = \frac{1}{ν} \sum_{j = 1}^{ν} d_{Q, j}

, which, however, does not hold as can be seen in (A16) in Appendix C. Hence, the suboptimal solution does not guarantee achieving optimal performance in general. However,

g^{⋆}

is tightly upper bounded by an increasing function of

δ_{T}

and

δ_{Q}

. To show this, we consider the simple case of

M_{t} = L

. Then, we have

\begin{matrix} g^{⋆} (x_{T}, x_{Q}) & = \min_{d_{P} \in D_{P}} g (d_{P}, x_{T}, x_{Q}) \end{matrix}

(25a)

\begin{matrix} \leq β \sum_{i = 1}^{M_{t}} {(\frac{1}{λ_{T, i} + d_{T, i}} + \frac{P_{T} / M_{t}}{λ_{Q, i} + d_{Q, i}})}^{- 1} \end{matrix}

(25b)

\begin{matrix} \leq β M_{t} {(\frac{1}{{\bar{λ}}_{T} + δ_{T}} + \frac{P_{T} / M_{t}}{{\bar{λ}}_{Q} + δ_{Q}})}^{- 1} \end{matrix}

(25c)

where

{\bar{λ}}_{T} = \frac{1}{M_{t}} \sum_{i = 1}^{M_{t}} λ_{T, i}

and

{\bar{λ}}_{Q} = \frac{1}{M_{t}} \sum_{i = 1}^{M_{t}} λ_{Q, i}

. The inequality (25b) follows from the fact that

d_{P} = (P_{T} / M_{t}) 1_{M_{t}}

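is a feasible solution to the problem in (25a). The inequality (25c) follows from Jensen's inequality, because each summand in (25b) is jointly concave in the pair of perturbed eigenvalues appearing in its denominators. The upper bound (25) becomes tight in the following cases: (i)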

M_{t} = 1

, (ii)

P_{T} = 0

, and (iii)

P_{T} \to \infty

. Therefore, the suboptimal scheme may provide near-optimal performance at low and high training powers when the number of transmit antennas is small. Otherwise, the performance of the suboptimal scheme may degrade because the bound (25) becomes loose. To illustrate this, in Figure 2, we plot the value of

g^{⋆}

and that of its upper bound versus the total training power

P_{T}

for various numbers of transmit antennas. In the figure, we set

{[\hat{T}]}_{m, n} = {[\hat{Q}]}_{m, n} = ρ^{| m - n |}

,

1 \leq m, n \leq M_{t}

,

ρ = 0.5

,

β = M_{t}

,

ϵ_{T} = 0.3 ‖ \hat{T} ‖_{F}^{2}

, and

ϵ_{Q} = 0.3 ‖ \hat{Q} ‖_{F}^{2}

. The solid and dash-dotted curves indicate the value of

g^{⋆}

and that of its upper bound, respectively. From the figure, we can observe that the upper bound is close to the actual value when

P_{T}

is sufficiently low or high. Also, for the range of

P_{T} = - 5

dB to

P_{T} = 15

dB, the gap between

g^{⋆}

and its upper bound becomes larger as the number of transmit antennas increases. In the following section, the performance of the suboptimal training scheme is concretely demonstrated by numerical simulations.
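The two bounding steps in (25b) and (25c) can be checked numerically. The following is a minimal sketch, assuming β = 1 and M_t = L, with hypothetical function names and example values; the lists `x_T` and `x_Q` stand for the perturbed eigenvalues λ_{T, i} + d_{T, i} and λ_{Q, i} + d_{Q, i}. It evaluates only the equal-power value in (25b) and the averaged bound in (25c), not g^{⋆} itself.

```python
def equal_power_mse(x_T, x_Q, P_T, beta=1.0):
    """Right-hand side of (25b): the MSE obtained with d_P = (P_T / M_t) 1."""
    M = len(x_T)
    return beta * sum(1.0 / (1.0 / xt + (P_T / M) / xq) for xt, xq in zip(x_T, x_Q))

def jensen_bound(x_T, x_Q, P_T, beta=1.0):
    """Right-hand side of (25c), which depends only on the eigenvalue averages."""
    M = len(x_T)
    mean_T = sum(x_T) / M
    mean_Q = sum(x_Q) / M
    return beta * M / (1.0 / mean_T + (P_T / M) / mean_Q)

# Assumed example values for the perturbed eigenvalues.
x_T = [2.0, 1.0, 0.5, 0.3]
x_Q = [0.4, 0.8, 1.2, 1.6]
powers = (0.0, 0.1, 1.0, 10.0, 1000.0)
gaps = [jensen_bound(x_T, x_Q, p) - equal_power_mse(x_T, x_Q, p) for p in powers]
```

Every entry of `gaps` is non-negative, and the gap vanishes at P_T = 0 and for a single antenna, in line with cases (i) and (ii) above.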

3.4. Complexity Comparison

To validate the efficiency of the proposed designs, in this section, we compare the computational complexity of the proposed schemes with that of the iterative algorithm in [20]. In each iteration of the conventional algorithm, the arithmetic complexity of

O ((M_{t}^{6.5} + L^{6.5}) \log (1 / ψ))

is required for computing

\tilde{T}

and

\tilde{Q}

for a fixed

P

[33], where

ψ

is the solution accuracy for the interior point method, and the arithmetic complexity of

O (M_{t}^{3} + L^{3})

is needed to compute

P

for fixed

\tilde{T}

and

\tilde{Q}

[34]. Thus, the conventional algorithm in [20] requires the complexity

O (\{M_{t}^{3} + L^{3} + (M_{t}^{6.5} + L^{6.5}) \log (1 / ψ)\})

per iteration. Let

N_{iter}

denote the total number of iterations required for the algorithm in [20]. Then, it requires the total computational complexity of

O (\{M_{t}^{3} + L^{3} + (M_{t}^{6.5} + L^{6.5}) N_{iter} \log (1 / ψ)\})

. Next, we consider the complexity of the proposed training schemes. To implement the training structure in (15), the arithmetic operation of

O (M_{t}^{3} + L^{3})

is required for computing the EVDs of

\hat{T}

and

\hat{Q}

[34]. Also, we require the computational complexity of

O ((M_{t}^{3.5} + L^{3.5}) \log (1 / ψ))

to compute the optimal power allocation numerically [33]. Therefore, the overall complexity for the optimal training scheme is

O (M_{t}^{3} + L^{3} + (M_{t}^{3.5} + L^{3.5}) \log (1 / ψ))

. On the other hand, since the complexity of computing the suboptimal power allocation in (24) is insignificant compared with the complexity

O (M_{t}^{3} + L^{3})

[34], the overall complexity for the suboptimal training scheme is

O (M_{t}^{3} + L^{3})

. In Table 1, we summarize the analytical complexity results and processing time measured on the 12th Gen Intel(R) Core(TM) i9-12900K CPU when

M_{t} = M_{r} = L = 4

.

4. Simulation Results

In this section, the performance of the proposed schemes is illustrated and compared with that of existing schemes by computer simulations.

4.1. Simulation Setup

In the simulations, we generate the matrices

\hat{T}

and

\hat{R}

according to the inverse Wishart distribution (its use is discussed in [35,36]; the inverse Wishart distribution is the conjugate prior for the actual correlation matrices

T

and

R

when

h

is Gaussian-distributed; nevertheless, our proposed scheme is applicable to any distributions or types of covariance matrices) as

\hat{T} \sim W^{- 1} ((κ_{T} - M_{t} - 1) C_{T}, κ_{T})

and

\hat{R} \sim W^{- 1} ((κ_{R} - M_{r} - 1) C_{R}, κ_{R}),

respectively, where

κ_{T}

and

κ_{R}

denote the degrees-of-freedom parameters of the inverse Wishart distribution. The values of

κ_{T}

and

κ_{R}

are set to

κ_{T} = M_{t} + 2

and

κ_{R} = M_{r} + 2

, respectively. The elements of

C_{T}

and

C_{R}

are generated by the one-ring model [37]:

{[C_{T}]}_{m, n} = J_{0} (2 π Θ_{T} | m - n | s_{T} / λ), 1 \leq m, n \leq M_{t}

and

{[C_{R}]}_{m, n} = J_{0} (2 π | m - n | s_{R} / λ), 1 \leq m, n \leq M_{r},

respectively, where

J_{0} (\cdot)

is the zeroth-order Bessel function of the first kind,

λ

the carrier wavelength,

Θ_{T}

the angular spread, and

s_{T}

and

s_{R}

the transmit and receive antenna spacings, respectively. The values of

s_{T} / λ

and

s_{R} / λ

are set to

s_{T} / λ = 0.5

and

s_{R} / λ = 0.25

, respectively. We set

\hat{Q} = \sum_{k = 1}^{K} Tr (Ψ_{s, k} T_{I, k}) Ψ_{τ, k}

, where the matrices

T_{I, k}

,

Ψ_{τ, k}

, and

Ψ_{s, k}

are, respectively, generated by

T_{I, k} \sim W^{- 1} ((κ_{I, k} - M_{k} - 1) C_{I, k}, κ_{I, k}),

Ψ_{τ, k} \sim W^{- 1} ((κ_{τ, k} - L - 1) C_{τ, k}, κ_{τ, k}),

and

Ψ_{s, k} \sim W^{- 1} ((κ_{R} - M_{r} - 1) C_{s, k}, κ_{R})

for

k = 1, \dots, K

. The parameters

κ_{I, k}

and

κ_{τ, k}

are, respectively, set to

κ_{I, k} = M_{k} + 2

and

κ_{τ, k} = L + 2

,

k = 1, \dots, K

. The elements of

C_{I, k}

are generated by

{[C_{I, k}]}_{m, n} = J_{0} (2 π Θ_{I, k} | m - n | s_{I, k} / λ), 1 \leq m, n \leq M_{k},

with the choice of

s_{I, k} / λ = 0.5

,

k = 1, \dots, K

, where

\{Θ_{I, k}\}_{k = 1}^{K}

denote the angular spreads. The elements of

C_{τ, k}

and

C_{s, k}

are generated by the exponential model [38]:

{[C_{τ, k}]}_{m, n} = ρ_{τ, k}^{| m - n |}, 1 \leq m, n \leq L

and

{[C_{s, k}]}_{m, n} = ρ_{s, k}^{| m - n |}, 1 \leq m, n \leq M_{r},

respectively, for

k = 1, \dots, K

, where

ρ_{τ, k}

is the correlation coefficient of the elements of

C_{τ, k}

and

ρ_{s, k}

the correlation coefficient of the elements of

C_{s, k}

. The number of interfering users K is set to

K = 3

and that of the antennas at the kth interferer

M_{k}

is set to

M_{k} = M_{t}

,

k = 1, \dots, K

. The relative uncertainty parameters

α_{T}

,

α_{R}

, and

α_{Q}

are defined such that

(ϵ_{T}, ϵ_{R}, ϵ_{Q}) = (α_{T} ∥ \hat{T} ∥_{F}^{2}, α_{R} ∥ \hat{R} ∥_{F}^{2}, α_{Q} ∥ \hat{Q} ∥_{F}^{2}) .

The system parameters used in the simulations are summarized in Table 2.
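The matrix generation described above can be sketched with NumPy as follows. This is a minimal sketch under several assumptions that are not taken from the paper: the helper names are hypothetical, the angular spread is converted to radians before it enters the Bessel argument, the Bessel function is evaluated by a midpoint rule applied to its integral representation, and the inverse Wishart draw is real-valued, whereas a complex-valued variant may be intended.

```python
import numpy as np

def bessel_j0(x, n=400):
    """J_0(x) = (1/pi) * integral_0^pi cos(x sin t) dt, via the midpoint rule."""
    t = (np.arange(n) + 0.5) * np.pi / n
    return float(np.mean(np.cos(x * np.sin(t))))

def one_ring(M, spread_rad, spacing_over_lambda):
    """[C]_{m,n} = J_0(2 pi Theta |m - n| s / lambda)."""
    idx = np.arange(M)
    lag = np.abs(idx[:, None] - idx[None, :])
    return np.vectorize(bessel_j0)(2 * np.pi * spread_rad * lag * spacing_over_lambda)

def exponential(M, rho):
    """[C]_{m,n} = rho^{|m - n|}."""
    idx = np.arange(M)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def inverse_wishart(scale, dof, rng):
    """One real-valued draw from W^{-1}(scale, dof) by inverting a Wishart sample."""
    p = scale.shape[0]
    chol = np.linalg.cholesky(np.linalg.inv(scale))
    g = chol @ rng.standard_normal((p, dof))  # columns have covariance scale^{-1}
    return np.linalg.inv(g @ g.T)

rng = np.random.default_rng(0)
M_t = 4
C_T = one_ring(M_t, np.deg2rad(10.0), 0.5)   # one-ring model, 10 degree spread
C_tau = exponential(M_t, 0.9)                 # exponential model
kappa = M_t + 2
Psi_tau = inverse_wishart((kappa - M_t - 1) * C_tau, kappa, rng)
```

With κ = M_t + 2 the scale factor κ - M_t - 1 equals one, so the scale matrix coincides with the model matrix and the inverse Wishart mean equals that matrix.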

We compare the performance of the proposed schemes with that of two different existing schemes. One is the nonrobust scheme considered in the traditional work [9], where the values of

(\hat{T}, \hat{R}, \hat{Q})

are used as perfect values. The other is the semi-robust scheme considered in the work [20], where only the value of

\hat{Q}

is used as the actual value. In the figures, “Proposed optimal”, “Proposed suboptimal”, “Nonrobust”, “Semi-robust”, and “Semi-robust (suboptimal)” indicate the proposed scheme with the optimal power allocation, that with the suboptimal power allocation, the nonrobust scheme, the semi-robust scheme, and the semi-robust scheme with the proposed suboptimal power allocation, respectively.

4.2. Performance Comparison

In Figure 3 and Figure 4, we illustrate the worst-case MSE performance as a function of the effective training signal-to-interference ratio (SIR) for the systems with

M_{t} = M_{r} = L = 4

and

M_{t} = M_{r} = L = 8

. The effective training SIR is defined as

SIR = \frac{P_{T} E [Tr (\hat{T})]}{E [Tr (\hat{Q})]} = \frac{P_{T} Tr (C_{T})}{\sum_{k = 1}^{3} Tr (C_{s, k} C_{I, k}) Tr (C_{τ, k})}

. In Figure 3, we set

Θ_{T} = 10^{\circ}

,

Θ_{I, k} = 10^{\circ}

and

ρ_{τ, k} = ρ_{s, k} = 0.9

,

k = 1, 2, 3

, which represents a strongly correlated environment. On the other hand, in Figure 4, we set

Θ_{T} = 30^{\circ}

,

Θ_{I, k} = 30^{\circ}

, and

ρ_{τ, k} = ρ_{s, k} = 0.3

,

k = 1, 2, 3

, which represents a weakly correlated environment. The values of

α_{T}

,

α_{R}

, and

α_{Q}

are set to

α_{T} = α_{R} = α_{Q} = 0.3

. In the figures, the results are averaged over 500 realizations.

It can be observed from Figure 3 and Figure 4 that the proposed schemes outperform the other schemes. For example, in Figure 3, the proposed schemes have roughly 5 dB improvement in terms of the effective training SIR for the system with

M_{t} = M_{r} = L = 8

when the effective training SIR is higher than 10 dB (similar trends can still be observed even for other performance metrics, e.g., the ratio of the standard deviation to the mean value).

Even though the semi-robust scheme outperforms the nonrobust one, its performance degrades due to imperfect knowledge of the interference covariance matrix when the effective training SIR increases or the correlation becomes stronger. The nonrobust scheme provides the worst performance due to imperfect knowledge of both the channel and interference covariance matrices. In the proposed schemes, the suboptimal power allocation provides comparable performance to that of the optimal power allocation. The semi-robust scheme with the proposed suboptimal power allocation works well for the weakly correlated case, but it shows noticeable performance loss for the strongly correlated case.

The worst-case MSE performance is compared in Figure 5 and Figure 6 for various values of

α = α_{T} = α_{R} = α_{Q}

when

Θ_{T} = 10^{\circ}

,

Θ_{I, k} = 10^{\circ}

and

ρ_{τ, k} = ρ_{s, k} = 0.9

,

k = 1, 2, 3

, where the parameter

α

is introduced to set

(ϵ_{T}, ϵ_{R}, ϵ_{Q}) = (α ‖ \hat{T} ‖_{F}^{2}, α ‖ \hat{R} ‖_{F}^{2}, α ‖ \hat{Q} ‖_{F}^{2})

. Figure 5 and Figure 6 show the performance comparison for the systems with

M_{t} = M_{r} = L = 4

and

M_{t} = M_{r} = L = 6

, respectively. When the value of

α

decreases from 0.6 to 0.1, the performances of all the schemes are improved, and the gap between the proposed and semi-robust schemes increases, but the gap between the semi-robust and nonrobust schemes decreases. This means that the performance is dominated by the uncertainty in the interference covariance matrix when the value of

α

equals 0.1, i.e., the error is small, but the performance is dominated by the uncertainty in the channel covariance matrix when the value of

α

equals 0.6, i.e., the error is large. The suboptimal power allocation shows almost the same performance as that of the optimal one in the proposed scheme, but shows notable performance loss in the semi-robust scheme when

α = 0.6

.

To inspect the effect of the proposed designs on the quality of the communication systems, the bit error rate (BER) performance is compared in Figure 7 and Figure 8 for

3 \times 3

and

4 \times 4

MIMO systems, respectively (by convention, we refer to a MIMO system with

M_{t}

transmit antennas and

M_{r}

receive antennas as an

M_{t} \times M_{r}

MIMO system). The training SIR is set to 15 dB and the uncertainty parameters are chosen as

α_{T} = α_{R} = α_{Q} = 0.3

. The orthogonal space–time block code in [39] is used to encode the QPSK-modulated symbols and the well-known minimum MSE (MMSE) receiver in [40] is employed to recover the transmitted symbols before detection. For the case of the proposed, nonrobust, and semi-robust training schemes, the imperfect CSI estimated from the training signal is used to implement the MMSE receiver. Additionally, we present the BER performance of the MMSE receiver with the perfect CSI as a benchmark for the BER performance and it is denoted by “Perfect CSI” in the figures. The results are averaged over 500 channel realizations, where

4 \times 10^{6}

symbols are transmitted for each channel realization. From the figures, it can be observed that the proposed schemes provide better BER performance than the semi-robust and nonrobust schemes. As the symbol SIR increases, the BER performance of the nonrobust scheme saturates due to the uncertainties in both the channel and interference covariance matrices, and the gap between the proposed and semi-robust schemes increases due to the uncertainty in the interference covariance matrix. In both figures, the proposed suboptimal training scheme provides almost the same performance as that of the optimal one in terms of BER. Therefore, the proposed suboptimal training scheme is more practically useful than the optimal training scheme.

5. Conclusions

We designed an optimal training signal for MIMO systems in the presence of colored interference, accounting for imperfection in both the channel and interference covariance matrices. From the derived training structure, it was observed that the optimal training strategy is to allocate the training power according to the worst-case eigenvalues of the imperfect covariance matrices. It was also shown that the proposed training structure covers the cases considered in the previous works [18,19,20]. The optimal training power allocation scheme was obtained numerically, and an efficient suboptimal power allocation scheme was proposed in closed form. Simulation results show that the proposed schemes considerably outperform the semi-robust and nonrobust schemes in terms of the worst-case MSE and BER performances. In particular, the proposed closed-form training scheme shows near-optimal performance, and hence it is more practically useful than the optimal training scheme.

An important conclusion from our work is that the performance of training signal design for MIMO systems is sensitive to the channel and interference covariance uncertainties, and thus one should carefully select or determine the covariance uncertainty-based training signals in practice for reliable performance according to the system requirements and operating conditions.

Author Contributions

Methodology, J.-M.K.; writing—review and editing, S.Y.; supervision, S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Kyungpook National University Research Fund, 2024, and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2025-00559998).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1

It is well known that the following inequalities hold at an optimum point [29]:

J (P^{⋆}, \tilde{T}, \tilde{R}, \tilde{Q}) \leq J (P^{⋆}, {\tilde{T}}^{⋆}, {\tilde{R}}^{⋆}, {\tilde{Q}}^{⋆}) \leq J (P, {\tilde{T}}^{⋆}, {\tilde{R}}^{⋆}, {\tilde{Q}}^{⋆})

(A1)

for all feasible values of

P

,

\tilde{T}

,

\tilde{R}

, and

\tilde{Q}

. In other words, a saddle point of the MSE J in (12) is a solution to the worst-case MSE minimization problem in (14). Let us define the function

L

as

\begin{matrix} L (P, \tilde{T}, \tilde{R}, \tilde{Q}, μ_{P}, μ_{T}, μ_{R}, μ_{Q}, Γ_{T}, Γ_{R}, Γ_{Q}) \\ = J (P, \tilde{T}, \tilde{R}, \tilde{Q}) + μ_{P} [Tr (P P^{H}) - P_{T}] \\ - μ_{T} [‖ \tilde{T} ‖_{F}^{2} - ϵ_{T}] + Tr [(\hat{T} + \tilde{T}) Γ_{T}] \\ - μ_{R} [‖ \tilde{R} ‖_{F}^{2} - ϵ_{R}] + Tr [(\hat{R} + \tilde{R}) Γ_{R}] \\ - μ_{Q} [‖ \tilde{Q} ‖_{F}^{2} - ϵ_{Q}] + Tr [(\hat{Q} + \tilde{Q}) Γ_{Q}], \end{matrix}

(A2)

where

μ_{P}

,

μ_{T}

,

μ_{R}

,

μ_{Q}

,

Γ_{T} \in \mathbb{C}^{M_{t} \times M_{t}}

,

Γ_{R} \in \mathbb{C}^{M_{r} \times M_{r}}

, and

Γ_{Q} \in \mathbb{C}^{L \times L}

are dual variables associated with the constraints of the problem (14). Then, the saddle point should satisfy the following KKT optimality conditions [28,29]:

{\nabla_{P} L|}_{P = P^{⋆}} = - Z^{- 2} P^{⋆} {(\hat{Q} + {\tilde{Q}}^{⋆})}^{- 1} + μ_{P} P^{⋆} = 0,

(A3a)

μ_{P} \geq 0, μ_{P} [Tr (P^{⋆} P^{⋆ H}) - P_{T}] = 0,

(A3b)

{\nabla_{\tilde{T}} L|}_{\tilde{T} = {\tilde{T}}^{⋆}} = - {(\hat{T} + {\tilde{T}}^{⋆})}^{- 1} Z^{- 2} {(\hat{T} + {\tilde{T}}^{⋆})}^{- 1} + μ_{T} {\tilde{T}}^{⋆} - Γ_{T} = 0,

(A3c)

μ_{T} \geq 0, μ_{T} [‖ {\tilde{T}}^{⋆} ‖_{F}^{2} - ϵ_{T}] = 0,

(A3d)

Γ_{T} ⪰ 0, (\hat{T} + {\tilde{T}}^{⋆}) Γ_{T} = 0,

(A3e)

{\nabla_{\tilde{R}} L|}_{\tilde{R} = {\tilde{R}}^{⋆}} = - Tr (Z^{- 1}) I_{M_{r}} + μ_{R} {\tilde{R}}^{⋆} - Γ_{R} = 0,

(A3f)

μ_{R} \geq 0, μ_{R} [‖ {\tilde{R}}^{⋆} ‖_{F}^{2} - ϵ_{R}] = 0,

(A3g)

Γ_{R} ⪰ 0, (\hat{R} + {\tilde{R}}^{⋆}) Γ_{R} = 0,

(A3h)

{\nabla_{\tilde{Q}} L|}_{\tilde{Q} = {\tilde{Q}}^{⋆}} = - {(\hat{Q} + {\tilde{Q}}^{⋆})}^{- 1} P^{⋆ H} Z^{- 2} P^{⋆} {(\hat{Q} + {\tilde{Q}}^{⋆})}^{- 1} + μ_{Q} {\tilde{Q}}^{⋆} - Γ_{Q} = 0,

(A3i)

μ_{Q} \geq 0, μ_{Q} [‖ {\tilde{Q}}^{⋆} ‖_{F}^{2} - ϵ_{Q}] = 0,

(A3j)

Γ_{Q} ⪰ 0, (\hat{Q} + {\tilde{Q}}^{⋆}) Γ_{Q} = 0,

(A3k)

where

Z = {(\hat{T} + {\tilde{T}}^{⋆})}^{- 1} + P^{⋆} {(\hat{Q} + {\tilde{Q}}^{⋆})}^{- 1} P^{⋆ H}

. In the following, we derive the structures of

(P^{⋆}, {\tilde{T}}^{⋆}, {\tilde{R}}^{⋆}, {\tilde{Q}}^{⋆})

using Equations (A3a)–(A3k).

Multiplying (A3f) from the left by

(\hat{R} + {\tilde{R}}^{⋆})

and using the condition

(\hat{R} + {\tilde{R}}^{⋆}) Γ_{R} = 0

in (A3h), we obtain

(\hat{R} + {\tilde{R}}^{⋆}) \{- Tr (Z^{- 1}) I_{M_{r}} + μ_{R} {\tilde{R}}^{⋆}\} = 0 .

(A4)

To satisfy the condition (A4),

{\tilde{R}}^{⋆}

should have the following form:

{\tilde{R}}^{⋆} = \frac{Tr (Z^{- 1})}{μ_{R}} I_{M_{r}}

or

{\tilde{R}}^{⋆} = - \hat{R}

. The latter choice cannot be a solution since it is infeasible when

‖ \hat{R} ‖_{F}^{2} > ϵ_{R}

. Therefore, we have

{\tilde{R}}^{⋆} = \frac{Tr (Z^{- 1})}{μ_{R}} I_{M_{r}}

. Since this requires

μ_{R} > 0

, the complementary slackness condition in (A3g) gives

‖ {\tilde{R}}^{⋆} ‖_{F}^{2} = ϵ_{R}

, from which the optimal value of

μ_{R}

can be computed as

μ_{R} = Tr (Z^{- 1}) \sqrt{\frac{M_{r}}{ϵ_{R}}}

. Using this result, we obtain

{\tilde{R}}^{⋆} = \sqrt{\frac{ϵ_{R}}{M_{r}}} I_{M_{r}}

.
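The scaled-identity form of the worst-case receive-side perturbation can be illustrated numerically. The sketch below rests on an inference that is not stated explicitly in the text: since the gradient term in (A3f) is a multiple of the identity matrix, the objective is taken to grow with Tr(\tilde{R}), so the worst case maximizes the trace over the Frobenius ball. The variable names and the values M_r = 4 and ϵ_R = 0.3 are assumptions. By the Cauchy–Schwarz inequality, no Hermitian perturbation on the boundary of the ball can exceed the trace of the scaled identity.

```python
import numpy as np

rng = np.random.default_rng(3)
M_r, eps_R = 4, 0.3

# Candidate obtained from the KKT conditions: a scaled identity on the boundary.
R_star = np.sqrt(eps_R / M_r) * np.eye(M_r)

best_random_trace = -np.inf
for _ in range(2000):
    A = rng.standard_normal((M_r, M_r)) + 1j * rng.standard_normal((M_r, M_r))
    H = (A + A.conj().T) / 2                          # random Hermitian direction
    H *= np.sqrt(eps_R) / np.linalg.norm(H, "fro")    # rescale to the boundary
    best_random_trace = max(best_random_trace, np.trace(H).real)
```

The largest trace found among the random perturbations stays below Tr(R_star) = \sqrt{ϵ_R M_r}.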

By multiplying (A3i) from the left by

(\hat{Q} + {\tilde{Q}}^{⋆})

and using the condition

(\hat{Q} + {\tilde{Q}}^{⋆}) Γ_{Q} = 0

in (A3k), it can be obtained that

μ_{Q} (\hat{Q} + {\tilde{Q}}^{⋆}) {\tilde{Q}}^{⋆} = P^{⋆ H} Z^{- 2} P^{⋆} {(\hat{Q} + {\tilde{Q}}^{⋆})}^{- 1} .

(A5)

From the condition

Z^{- 2} P^{⋆} {(\hat{Q} + {\tilde{Q}}^{⋆})}^{- 1} = μ_{P} P^{⋆}

in (A3a), we can rewrite (A5) as

μ_{Q} (\hat{Q} + {\tilde{Q}}^{⋆}) {\tilde{Q}}^{⋆} = μ_{P} P^{⋆ H} P^{⋆} .

(A6)

Since the right-hand side of (A6) is Hermitian, the left-hand side of (A6) should be Hermitian. From this, we have

(\hat{Q} + {\tilde{Q}}^{⋆}) {\tilde{Q}}^{⋆} = {\tilde{Q}}^{⋆} (\hat{Q} + {\tilde{Q}}^{⋆})

, or equivalently,

{\tilde{Q}}^{⋆} \hat{Q} = \hat{Q} {\tilde{Q}}^{⋆}

.

Lemma A1.

[41] For

n \times n

Hermitian matrices

A

and

B

,

A B = B A

if and only if

A

and

B

share the same eigenvectors.
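Lemma A1 can be illustrated numerically. The sketch below is a minimal example with assumed matrices, not part of the proof: it builds two Hermitian matrices from a common random unitary matrix, confirms that they commute, and confirms that the eigenvectors of A also diagonalize B. The eigenvalues of A are chosen distinct so that its eigenvectors are unique up to phase; with repeated eigenvalues the lemma guarantees only that a common eigenbasis exists.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Random unitary matrix from the QR factorization of a complex Gaussian matrix.
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
A = U @ np.diag([4.0, 3.0, 2.0, 1.0]) @ U.conj().T
B = U @ np.diag([0.3, -0.1, 0.2, 0.05]) @ U.conj().T

commutator = A @ B - B @ A

# Eigenvectors of A (distinct eigenvalues) should also diagonalize B.
_, V = np.linalg.eigh(A)
B_in_A_basis = V.conj().T @ B @ V
off_diag = B_in_A_basis - np.diag(np.diag(B_in_A_basis))
```

Both `commutator` and `off_diag` vanish up to rounding error.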

From Lemma A1, we can conclude that

\hat{Q}

and

{\tilde{Q}}^{⋆}

share the same eigenvectors. Let

\hat{Q} = U_{Q} Λ_{Q} U_{Q}^{H}

be the EVD of

\hat{Q}

. Then,

{\tilde{Q}}^{⋆}

should have the following form:

{\tilde{Q}}^{⋆} = U_{Q} D_{Q}^{⋆} U_{Q}^{H}

, where

D_{Q}^{⋆}

is an

L \times L

real diagonal matrix whose diagonal elements consist of the eigenvalues of

{\tilde{Q}}^{⋆}

. From the structure of

{\tilde{Q}}^{⋆} = U_{Q} D_{Q}^{⋆} U_{Q}^{H}

, (A6) can be rewritten as

μ_{Q} (Λ_{Q} + D_{Q}^{⋆}) D_{Q}^{⋆} = μ_{P} U_{Q}^{H} P^{⋆ H} P^{⋆} U_{Q} .

(A7)

Since the left-hand side of (A7) is diagonal,

U_{Q}^{H} P^{⋆ H} P^{⋆} U_{Q}

should be diagonal. Let

P^{⋆} = U_{P} D_{P}^{⋆} V_{P}^{H}

be the singular-value decomposition (SVD) of

P^{⋆}

, where the main diagonal elements of the rectangular diagonal matrix

D_{P}^{⋆}

consist of the singular values of

P^{⋆}

, and the column vectors of

U_{P}

and

V_{P}

are the left and right singular vectors of

P^{⋆}

, respectively. Then, the diagonal structure of

U_{Q}^{H} P^{⋆ H} P^{⋆} U_{Q}

can be simply obtained with the choice of

V_{P} = U_{Q}

.

Multiplying (A3c) from the left by

(\hat{T} + {\tilde{T}}^{⋆})

and using the condition

(\hat{T} + {\tilde{T}}^{⋆}) Γ_{T} = 0

in (A3e), one can obtain that

\begin{matrix} μ_{T} (\hat{T} + {\tilde{T}}^{⋆}) {\tilde{T}}^{⋆} & = Z^{- 2} {(\hat{T} + {\tilde{T}}^{⋆})}^{- 1} \\ & = Z^{- 1} - Z^{- 2} P^{⋆} {(\hat{Q} + {\tilde{Q}}^{⋆})}^{- 1} P^{⋆ H} . \end{matrix}

(A8)

Also, multiplying (A3a) from the right by

P^{⋆ H}

, we have

μ_{P} P^{⋆} P^{⋆ H} = Z^{- 2} P^{⋆} {(\hat{Q} + {\tilde{Q}}^{⋆})}^{- 1} P^{⋆ H} .

(A9)

By substituting (A9) into (A8), the condition (A8) can be rewritten as

μ_{T} (\hat{T} + {\tilde{T}}^{⋆}) {\tilde{T}}^{⋆} = Z^{- 1} - μ_{P} P^{⋆} P^{⋆ H} .

(A10)

Since the right-hand side of (A10) is Hermitian, the left-hand side of (A10) is also Hermitian. From this, we have

(\hat{T} + {\tilde{T}}^{⋆}) {\tilde{T}}^{⋆} = {\tilde{T}}^{⋆} (\hat{T} + {\tilde{T}}^{⋆})

, or equivalently,

{\tilde{T}}^{⋆} \hat{T} = \hat{T} {\tilde{T}}^{⋆}

. From Lemma A1, it follows that

\hat{T}

and

{\tilde{T}}^{⋆}

share the same eigenvectors. Let

\hat{T} = U_{T} Λ_{T} U_{T}^{H}

be the EVD of

\hat{T}

. Then, the structure of

{\tilde{T}}^{⋆}

is obtained as

{\tilde{T}}^{⋆} = U_{T} D_{T}^{⋆} U_{T}^{H}

, where

D_{T}^{⋆}

is an

M_{t} \times M_{t}

real diagonal matrix whose diagonal elements consist of the eigenvalues of

{\tilde{T}}^{⋆}

. By using

{\tilde{T}}^{⋆} = U_{T} D_{T}^{⋆} U_{T}^{H}

, (A10) can be rewritten as

μ_{T} (Λ_{T} + D_{T}^{⋆}) D_{T}^{⋆} = U_{T}^{H} (Z^{- 1} - μ_{P} P^{⋆} P^{⋆ H}) U_{T} .

(A11)

Since the left-hand side of (A11) is diagonal,

U_{T}^{H} (Z^{- 1} - μ_{P} P^{⋆} P^{⋆ H}) U_{T}

should be diagonal. With some manipulations, it can be shown that the diagonal structure of

U_{T}^{H} (Z^{- 1} - μ_{P} P^{⋆} P^{⋆ H}) U_{T}

can be obtained when

U_{T}^{H} U_{P} D_{P}^{⋆}

is a rectangular diagonal matrix. This can be achieved by simply setting

U_{P} = U_{T}

. Therefore, the optimal training structure is obtained as

P^{⋆} = U_{T} D_{P}^{⋆} U_{Q}^{H}

.
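The effect of the training structure P^{⋆} = U_{T} D_{P}^{⋆} U_{Q}^{H} can be checked numerically: with aligned eigenvectors, the matrix-valued MSE term Tr (Z^{- 1}) collapses to a sum of scalar terms. The sketch below is a minimal example under assumptions: the eigenvalues and powers are arbitrary illustrative values, the entries of `d_P` are interpreted as squared singular values of P (so that Tr (P P^{H}) equals their sum), and any scaling factor related to the receive-side covariance is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
M_t, L = 3, 4

def random_unitary(n):
    q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return q

U_T, U_Q = random_unitary(M_t), random_unitary(L)
x_T = np.array([2.0, 1.0, 0.5])         # assumed eigenvalues of T_hat + T_tilde
x_Q = np.array([0.4, 0.8, 1.5, 2.0])    # assumed eigenvalues of Q_hat + Q_tilde
d_P = np.array([1.2, 0.6, 0.2])         # assumed training powers

T = U_T @ np.diag(x_T) @ U_T.conj().T
Q = U_Q @ np.diag(x_Q) @ U_Q.conj().T
D_P = np.zeros((M_t, L))
D_P[np.arange(M_t), np.arange(M_t)] = np.sqrt(d_P)   # rectangular diagonal matrix
P = U_T @ D_P @ U_Q.conj().T

Z = np.linalg.inv(T) + P @ np.linalg.inv(Q) @ P.conj().T
mse_matrix = np.trace(np.linalg.inv(Z)).real
mse_scalar = np.sum(1.0 / (1.0 / x_T + d_P / x_Q[:M_t]))
total_power = np.trace(P @ P.conj().T).real
```

The two MSE values agree up to rounding error, which is why the design reduces to a power allocation over the scalar terms.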

Appendix B. Proof of Lemma 1

We first prove the convexity. The Hessian matrix of

ζ

w.r.t.

d_{P}

is computed as

\nabla_{d_{P}}^{2} ζ = Diag \{\frac{2 x_{Q, 1} x_{T, 1}^{3}}{{(d_{P, 1} x_{T, 1} + x_{Q, 1})}^{3}}, \dots, \frac{2 x_{Q, ν} x_{T, ν}^{3}}{{(d_{P, ν} x_{T, ν} + x_{Q, ν})}^{3}}\},

(A12)

where

x_{T, i} = λ_{T, i} + d_{T, i}

,

i = 1, \dots, M_{t}

and

x_{Q, i} = λ_{Q, i} + d_{Q, i}

,

i = 1, \dots, L

. For fixed

(d_{T}, d_{Q}) \in D_{T} \times D_{Q}

, the diagonal elements of the Hessian matrix

\nabla_{d_{P}}^{2} ζ

in (A12) are non-negative for

d_{P} \in D_{P}

because the values of

\{d_{P, i}\}

,

\{x_{T, i}\}

, and

\{x_{Q, i}\}

are all non-negative. This means that

\nabla_{d_{P}}^{2} ζ

is positive semi-definite. Therefore, the function

ζ

in (17) is convex in

d_{P} \in D_{P}

for fixed

(d_{T}, d_{Q}) \in D_{T} \times D_{Q}

.

Now, we prove the concavity. Let us define the augmented vector

r

as

r = {[d_{T}^{T}, d_{Q}^{T}]}^{T}

. Then, the Hessian matrix of

ζ

w.r.t.

r

is computed as

\begin{matrix} \nabla_{r}^{2} ζ & = - [(\begin{matrix} G_{1} {(Λ_{Q} + D_{Q})}^{2} G_{1} & - G_{1} (Λ_{Q} + D_{Q}) (Λ_{T} + D_{T}) G_{2} \\ - G_{2} (Λ_{T} + D_{T}) (Λ_{Q} + D_{Q}) G_{1} & G_{2} {(Λ_{T} + D_{T})}^{2} G_{2} \end{matrix})] \\ = - [(\begin{matrix} G_{1} (Λ_{Q} + D_{Q}) \\ - G_{2} (Λ_{T} + D_{T}) \end{matrix})] {[(\begin{matrix} G_{1} (Λ_{Q} + D_{Q}) \\ - G_{2} (Λ_{T} + D_{T}) \end{matrix})]}^{H}, \end{matrix}

(A13)

where

G_{1} = Blkdiag (G, 0_{(M_{t} - ν) \times (M_{t} - ν)}),

G_{2} = Blkdiag (G, 0_{(L - ν) \times (L - ν)})

and the matrix

G

is given by

G = Diag (\sqrt{\frac{2 d_{P, 1}}{{(d_{P, 1} x_{T, 1} + x_{Q, 1})}^{3}}}, \dots, \sqrt{\frac{2 d_{P, ν}}{{(d_{P, ν} x_{T, ν} + x_{Q, ν})}^{3}}}) .

For fixed

d_{P} \in D_{P}

, it can be shown from (A13) that

a^{T} \nabla_{r}^{2} ζ a \leq 0

for any vector

a \in R^{(M_{t} + L) \times 1}

. This means that the Hessian matrix

\nabla_{r}^{2} ζ

in (A13) is negative semi-definite. Thus,

ζ

is concave in

d_{T} \in D_{T}

and

d_{Q} \in D_{Q}

for fixed

d_{P} \in D_{P}

.
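The convex-concave property can also be checked with a simple midpoint test. The sketch below is an illustration rather than a proof. It assumes that ζ has the scalar form \sum_{i} {(1 / x_{T, i} + d_{P, i} / x_{Q, i})}^{- 1}, which is the form consistent with the Hessian in (A12), and it uses hypothetical helper names and randomly drawn test points. Because x_{T} and x_{Q} are affine in d_{T} and d_{Q}, concavity in the former is equivalent to concavity in the latter.

```python
import random

def zeta(d_P, x_T, x_Q):
    """sum_i (1/x_T[i] + d_P[i]/x_Q[i])^{-1}; assumed scalar form of the MSE."""
    return sum(1.0 / (1.0 / xt + dp / xq) for dp, xt, xq in zip(d_P, x_T, x_Q))

def midpoint(a, b):
    return [(u + v) / 2.0 for u, v in zip(a, b)]

random.seed(0)
n, trials, tol = 3, 500, 1e-12

def rand_vec(lo, hi):
    return [random.uniform(lo, hi) for _ in range(n)]

convex_in_d_P = True
concave_in_x = True
for _ in range(trials):
    d1, d2 = rand_vec(0.0, 2.0), rand_vec(0.0, 2.0)
    xT1, xT2 = rand_vec(0.1, 3.0), rand_vec(0.1, 3.0)
    xQ1, xQ2 = rand_vec(0.1, 3.0), rand_vec(0.1, 3.0)
    # Convexity in d_P for fixed (x_T, x_Q): midpoint value <= average value.
    lhs = zeta(midpoint(d1, d2), xT1, xQ1)
    rhs = 0.5 * (zeta(d1, xT1, xQ1) + zeta(d2, xT1, xQ1))
    convex_in_d_P &= lhs <= rhs + tol
    # Joint concavity in (x_T, x_Q) for fixed d_P: midpoint value >= average value.
    lhs = zeta(d1, midpoint(xT1, xT2), midpoint(xQ1, xQ2))
    rhs = 0.5 * (zeta(d1, xT1, xQ1) + zeta(d1, xT2, xQ2))
    concave_in_x &= lhs >= rhs - tol
```

Both flags remain true over all trials, in agreement with the signs of the Hessians in (A12) and (A13).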

Appendix C. Proof of Lemma 2

The Schur-concavity can be proved from Schur’s condition, presented in the following lemma.

Lemma A2

(Schur’s condition). [32] (Ch.3.A.4) Let the function

φ (a) : A \subset R^{n} \to R

be continuously differentiable. Then, φ is Schur-concave on

A

if and only if φ is permutation-invariant, i.e.,

φ (a) = φ (Π_{n} a)

, and

(a_{m} - a_{l}) (\frac{\partial φ}{\partial a_{m}} - \frac{\partial φ}{\partial a_{l}}) \leq 0

for all

1 \leq m, l \leq n

;

Π_{n}

is any

n \times n

permutation matrix.

We first show the permutation-invariant property of

g^{⋆}

, i.e.,

g^{⋆} (x_{T}, x_{Q}) = g^{⋆} (Π_{M_{t}} x_{T}, Π_{L} x_{Q})

. Let us define the diagonal matrices

X_{T}

and

X_{Q}

as

X_{T} = Diag (x_{T, 1}, \dots, x_{T, M_{t}})

and

X_{Q} = Diag (x_{Q, 1}, \dots, x_{Q, L})

, respectively. Then, we have

\begin{matrix} g^{⋆} (Π_{M_{t}} x_{T}, Π_{L} x_{Q}) = \min_{d_{P} \in D_{P}} g (d_{P}, Π_{M_{t}} x_{T}, Π_{L} x_{Q}) \end{matrix}

(A14a)

\begin{matrix} = \min_{Tr (D_{P} D_{P}^{T}) \leq P_{T}, D_{P} D_{P}^{T} ⪰ 0} β Tr [{(Π_{M_{t}} X_{T}^{- 1} Π_{M_{t}}^{T} + D_{P} Π_{L} X_{Q}^{- 1} Π_{L}^{T} D_{P}^{T})}^{- 1}] \end{matrix}

(A14b)

\begin{matrix} = \min_{Tr ({\tilde{D}}_{P} {\tilde{D}}_{P}^{T}) \leq P_{T}, {\tilde{D}}_{P} {\tilde{D}}_{P}^{T} ⪰ 0} β Tr [{(X_{T}^{- 1} + {\tilde{D}}_{P} X_{Q}^{- 1} {\tilde{D}}_{P}^{T})}^{- 1}] \end{matrix}

(A14c)

\begin{matrix} = \min_{Tr (D_{P} D_{P}^{T}) \leq P_{T}, D_{P} D_{P}^{T} ⪰ 0} β Tr [{(X_{T}^{- 1} + D_{P} X_{Q}^{- 1} D_{P}^{T})}^{- 1}] \end{matrix}

(A14d)

\begin{matrix} = \min_{d_{P} \in D_{P}} g (d_{P}, x_{T}, x_{Q}) = g^{⋆} (x_{T}, x_{Q}), \end{matrix}

(A14e)

where the equality in (A14b) follows from the equivalence of the problem in (A14a) and (A14b) and the fact that

Diag (Π_{M_{t}} x_{T}) = Π_{M_{t}} X_{T} Π_{M_{t}}^{T}

and

Diag (Π_{L} x_{Q}) = Π_{L} X_{Q} Π_{L}^{T}

. The equality in (A14c) follows from the change of the variable from

D_{P}

to

{\tilde{D}}_{P} = Π_{M_{t}}^{T} D_{P} Π_{L}

. The equality in (A14e) follows from the equivalence of the problem in (A14d) and (A14e).

From the above property, without loss of generality, it is assumed that the elements of

x_{T}

and

x_{Q}

are arranged in decreasing and increasing orders, respectively. Then, for given

x_{T}

and

x_{Q}

, the solution to the inner minimization problem of (20) can be computed from the KKT optimality conditions as

d_{P, i}^{⋆} = \{\begin{matrix} (P_{T} + \sum_{j = 1}^{r} \frac{x_{Q, j}}{x_{T, j}}) \frac{\sqrt{x_{Q, i}}}{\sum_{j = 1}^{r} \sqrt{x_{Q, j}}} - \frac{x_{Q, i}}{x_{T, i}}, & i = 1, \dots, r, \\ 0, & i = r + 1, \dots, ν, \end{matrix}

(A15)

where

r = \max \{i \in \{1, \dots, ν\} : d_{P, i}^{⋆} > 0\}

denotes the largest i such that

d_{P, i}^{⋆} > 0

. Substituting (A15) into (21), we can write the minimum MSE

g^{⋆}

as

g^{⋆} (x_{T}, x_{Q}) = β \frac{{(\sum_{j = 1}^{r} \sqrt{x_{Q, j}})}^{2}}{P_{T} + \sum_{j = 1}^{r} (x_{Q, j} / x_{T, j})} + β \sum_{i = r + 1}^{M_{t}} x_{T, i} .

(A16)

One can easily show from (A16) that

(x_{T, i} - x_{T, j}) (\frac{\partial g^{⋆}}{\partial x_{T, i}} - \frac{\partial g^{⋆}}{\partial x_{T, j}}) \leq 0

for all

1 \leq i, j \leq M_{t}

and

(x_{Q, i} - x_{Q, j}) (\frac{\partial g^{⋆}}{\partial x_{Q, i}} - \frac{\partial g^{⋆}}{\partial x_{Q, j}}) \leq 0

for all

1 \leq i, j \leq L

. Therefore, the minimum MSE

g^{⋆}

is Schur-concave in

x_{T}

and

x_{Q}

according to Lemma A2.
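The Schur-concavity of g^{⋆} can be illustrated numerically for a small example. The sketch below is a minimal illustration under assumptions: β = 1, M_t = L = 3, hypothetical function names, and arbitrary example values. To respect the permutation invariance shown in (A14) without relying on a particular ordering, it evaluates the water-filling value for every pairing of the two eigenvalue lists and keeps the smallest one. It then averages two entries of x_{T}, which produces a vector majorized by the original one, and compares the two values of g^{⋆}.

```python
import itertools
import math

def waterfill_mse(x_T, x_Q, P_T):
    """min over d_P >= 0 with sum(d_P) <= P_T of sum_i (1/x_T[i] + d_P[i]/x_Q[i])^{-1}."""
    n = len(x_T)
    order = sorted(range(n), key=lambda i: math.sqrt(x_Q[i]) / x_T[i])
    active, level = [], 0.0
    for r in range(n, 0, -1):
        cand = order[:r]
        level = (P_T + sum(x_Q[i] / x_T[i] for i in cand)) / sum(
            math.sqrt(x_Q[i]) for i in cand)
        if level > math.sqrt(x_Q[cand[-1]]) / x_T[cand[-1]]:
            active = cand
            break
    # Active indices contribute sqrt(x_Q) / level; inactive ones contribute x_T.
    value = sum(math.sqrt(x_Q[i]) for i in active) / level if active else 0.0
    return value + sum(x_T[i] for i in range(n) if i not in active)

def g_star(x_T, x_Q, P_T):
    """Smallest water-filling value over all pairings of x_T with x_Q."""
    return min(waterfill_mse(x_T, list(p), P_T) for p in itertools.permutations(x_Q))

x_T = [2.0, 1.0, 0.4]
x_Q = [0.5, 1.0, 1.5]
t = 0.3
# Partial averaging of the first and last entries; y_T is majorized by x_T.
y_T = [(1 - t) * x_T[0] + t * x_T[2], x_T[1], t * x_T[0] + (1 - t) * x_T[2]]
g_x = g_star(x_T, x_Q, 1.0)
g_y = g_star(y_T, x_Q, 1.0)
```

The value `g_y` for the more evenly spread vector is not smaller than `g_x`, which is the behavior expected from a Schur-concave function.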

References

Foschini, G.J.; Gans, M.J. On limits of wireless communications in a fading environment when using multiple antennas. Wireless Pers. Commun. 1998, 6, 311–335. [Google Scholar] [CrossRef]
Telatar, E. Capacity of multi-antenna Gaussian channels. Eur. Trans. Telecommun. 1999, 10, 585–595. [Google Scholar] [CrossRef]
Perahia, E. IEEE 802.11n development: History, process, and technology. IEEE Commun. Mag. 2008, 46, 48–55. [Google Scholar] [CrossRef]
Li, Q.; Li, G.; Lee, W.; Lee, M.I.; Mazzarese, D.; Clerckx, B.; Li, Z. MIMO techniques in WiMAX and LTE: A future overview. IEEE Commun. Mag. 2010, 48, 86–92. [Google Scholar] [CrossRef]
IEEE. Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, Amendment 4: Enhancement for Very High Throughput for Operations in Bands Below 6 GHz; IEEE P802.11ac/D3.0; IEEE: Piscataway Township, NJ, USA, 2012. [Google Scholar]
Shen, Y.; Martinez, E. WiMAX Channel Estimation: Algorithms and Implementations; Application Note, Freescale; Scientific Research Publishing Inc.: Glendale CA, USA, 2007. [Google Scholar]
Biguesh, M.; Gershman, A.B. Training based MIMO channel estimation: A study of estimator tradeoffs and optimal training signals. IEEE Trans. Signal Process. 2006, 54, 884–893. [Google Scholar] [CrossRef]
Wong, T.F.; Park, B. Training sequence optimization in MIMO systems with colored interference. IEEE Trans. Commun. 2004, 52, 1939–1947. [Google Scholar] [CrossRef]
Liu, Y.; Wong, T.; Hager, W. Training signal design for estimation of correlated MIMO channels with colored interference. IEEE Trans. Signal Process. 2007, 55, 1486–1497. [Google Scholar] [CrossRef]
Katselis, D.; Kofidis, E.; Theodoridis, S. On training optimization for estimation of correlated MIMO channels in the presence of multiuser interference. IEEE Trans. Signal Process. 2008, 56, 4892–4904. [Google Scholar] [CrossRef]
Björnson, E.; Ottersten, B. A framework for training-based estimation in arbitrarily correlated Rician MIMO channels with Rician disturbance. IEEE Trans. Signal Process. 2010, 58, 1807–1820. [Google Scholar] [CrossRef]
Biguesh, S.S.M.; Gazor, M. Optimal training sequence for MIMO wireless systems in colored environments. IEEE Trans. Signal Process. 2009, 57, 3144–3153. [Google Scholar] [CrossRef]
Love, D.J.; Heath, R.W., Jr.; Santipach, W.; Honig, M.L. What is the value of limited feedback for MIMO channels? IEEE Commun. Mag. 2004, 42, 54–59. [Google Scholar] [CrossRef]
Kotecha, J.; Sayeed, A. Transmit signal design for optimal estimation of correlated MIMO channels. IEEE Trans. Signal Process. 2004, 52, 546–557. [Google Scholar] [CrossRef]
Pascual-Iserte, A.; Palomar, D.P.; Perez-Neira, A.I.; Lagunas, M.A. A robust maximin approach for MIMO communications with imperfect channel state information based on convex optimization. IEEE Trans. Signal Process. 2006, 54, 346–360. [Google Scholar] [CrossRef]
Vucic, N.; Boche, H.; Shi, S. Robust transceiver optimization in downlink multiuser MIMO systems. IEEE Trans. Signal Process. 2009, 57, 3576–3587. [Google Scholar] [CrossRef]
Botros, M.; Davidson, T.N. Convex conic formulations of robust downlink precoder designs with quality of service constraints. IEEE Sel. Top. Signal Process. 2007, 1, 714–724. [Google Scholar]
Chiang, C.-T.; Fung, C.C. Robust training sequence design for spatially correlated MIMO channel estimation. IEEE Trans. Veh. Technol. 2011, 60, 2882–2894. [Google Scholar] [CrossRef]
Shariati, N.; Wang, J.; Bengtsson, M. Robust Training Sequence Design for Correlated MIMO Channel Estimation. IEEE Trans. Signal Process. 2014, 62, 107–120. [Google Scholar] [CrossRef]
Shariati, N.; Bengtsson, M. Robust training sequence design for spatially correlated MIMO channels and arbitrary colored disturbance. In Proceedings of the 2011 IEEE 22nd International Symposium on Personal, Indoor and Mobile Radio Communications, Toronto, ON, Canada, 11–14 September 2011; pp. 1939–1943. [Google Scholar]
Spencer, Q.H.; Peel, C.B.; Swindlehurst, A.L.; Haardt, M. An introduction to the multi-user MIMO downlink. IEEE Commun. Mag. 2004, 42, 60–67. [Google Scholar] [CrossRef]
Camp, J.D.; Knightly, E.W. The IEEE 802.11s Extended Service Set Mesh Networking Standard. IEEE Commun. Mag. 2008, 46, 120–126. [Google Scholar] [CrossRef]
Sun, W.; Choi, M.; Choi, S. IEEE 802.11 ah: A long range 802.11 WLAN at sub 1 GHz. J. Ict Stand. 2013, 1, 83–107. [Google Scholar]
Peters, S.W.; Heath, R.W. The future of WiMAX: Multihop relaying with IEEE 802.16j. IEEE Commun. Mag. 2009, 47, 104–111.
Biglieri, E.; Calderbank, R.; Constantinides, A.; Goldsmith, A.; Paulraj, A.; Poor, H.V. MIMO Wireless Communications; Cambridge University Press: Cambridge, UK, 2007.
Kay, S.M. Fundamentals of Statistical Signal Processing, Vol. I: Estimation Theory; Prentice-Hall: Englewood Cliffs, NJ, USA, 1993.
Shariati, N.; Bjornson, E.; Bengtsson, M.; Debbah, M. Low-complexity polynomial channel estimation in large-scale MIMO with arbitrary statistics. IEEE J. Sel. Top. Signal Process. 2014, 8, 815–830.
Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2009.
Boyd, S. Lecture Notes for EE364B: Convex Optimization II. 2007. Available online: http://www.stanford.edu/class/ee364b (accessed on 6 May 2025).
Matlab Software for Disciplined Convex Programming, Version 2.0; CVX Research, Inc.: Austin, TX, USA, 2012. Available online: http://cvxr.com/cvx (accessed on 6 May 2025).
Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1970.
Marshall, A.W.; Olkin, I. Inequalities: Theory of Majorization and Its Applications; Academic: New York, NY, USA, 1979.
Luo, Z.-Q.; Yu, W. An introduction to convex optimization for communications and signal processing. IEEE J. Sel. Areas Commun. 2006, 24, 1426–1438.
Palomar, D.P.; Fonollosa, J.R. Practical algorithms for a family of waterfilling solutions. IEEE Trans. Signal Process. 2005, 53, 686–695.
Svensson, L.; Lundberg, M. On posterior distributions for signals in Gaussian noise with unknown covariance matrix. IEEE Trans. Signal Process. 2005, 53, 3554–3571.
Tiao, G.C.; Zellner, A. On the Bayesian estimation of multivariate regression. J. R. Stat. Soc. Ser. B 1964, 26, 277–285.
Shiu, D.; Foschini, G.J.; Gans, M.J.; Kahn, J.M. Fading correlation and its effect on the capacity of multielement antenna systems. IEEE Trans. Commun. 2000, 48, 502–513.
Loyka, S.L. Channel capacity of MIMO architecture using the exponential correlation matrix. IEEE Commun. Lett. 2001, 5, 369–371.
Tarokh, V.; Jafarkhani, H.; Calderbank, A.R. Space-time block codes from orthogonal designs. IEEE Trans. Inform. Theory 1999, 45, 1456–1467.
Tse, D.N.C.; Viswanath, P. Fundamentals of Wireless Communication; Cambridge University Press: Cambridge, UK, 2005.
Strang, G. Linear Algebra and Its Applications; Thomson Brooks/Cole: Boston, MA, USA, 2006.

Figure 1. An illustrative example of the MIMO system under consideration, to which the proposed scheme is applicable.

Figure 2. Minimum MSE $\bar{ζ}$ and its upper bound versus $P_{T}$ for various values of $M_{t}$. The actual values of $\bar{ζ}$ are indicated by the solid curves and the upper bounds by the dash-dot curves.

Figure 3. Worst-case MSE performance comparison of various training schemes for a strongly correlated environment when $α_{T} = α_{R} = α_{Q} = 0.3$. The results are shown for the systems with $M_{t} = M_{r} = L = 4$ and $M_{t} = M_{r} = L = 8$.

Figure 4. Worst-case MSE performance comparison of various training schemes for a weakly correlated environment when $α_{T} = α_{R} = α_{Q} = 0.3$. The results are shown for the systems with $M_{t} = M_{r} = L = 4$ and $M_{t} = M_{r} = L = 8$.

Figure 5. Worst-case MSE performance comparison of various training schemes for different values of the uncertainty parameter $α = α_{T} = α_{R} = α_{Q}$ when $M_{t} = M_{r} = L = 4$. The results are shown for $α = 0.1$ and $α = 0.6$.

Figure 6. Worst-case MSE performance comparison of various training schemes for different values of the uncertainty parameter $α = α_{T} = α_{R} = α_{Q}$ when $M_{t} = M_{r} = L = 6$. The results are shown for $α = 0.1$ and $α = 0.6$.

Figure 7. BER performance comparison of various training schemes for the $3 \times 3$ MIMO system when the training SIR is set to 15 dB and $α_{T} = α_{R} = α_{Q} = 0.3$. The orthogonal space–time code and the MMSE receiver are used to encode and decode the QPSK-modulated symbols, respectively.

Figure 8. BER performance comparison of various training schemes for the $4 \times 4$ MIMO system when the training SIR is set to 15 dB and $α_{T} = α_{R} = α_{Q} = 0.3$. The orthogonal space–time code and the MMSE receiver are used to encode and decode the QPSK-modulated symbols, respectively.

Table 1. Computational complexity comparison.

Method	Computational Complexity	Processing Time (s)
Iterative algorithm in [20]	$O (M_{t}^{3} + L^{3} + (M_{t}^{6.5} + L^{6.5}) N_{iter} \log (1 / ψ))$	$8.8098 \times 10^{5}$
Proposed optimal training scheme	Total $O (M_{t}^{3} + L^{3} + (M_{t}^{3.5} + L^{3.5}) \log (1 / ψ))$	206.4642
Proposed suboptimal training scheme	Total $O (M_{t}^{3} + L^{3})$	68.8214
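
To get a feel for the gap between the rows of Table 1, the following minimal Python sketch evaluates the three order-of-growth expressions with all hidden constants dropped, and computes the ratios of the reported processing times. The values chosen for $ψ$ and $N_{iter}$ are placeholders assumed purely for illustration, as is the use of the natural logarithm; the function and variable names are hypothetical and not taken from the paper.

```python
import math

# Processing times (in seconds) as reported in Table 1.
REPORTED_TIME = {
    "iterative": 8.8098e5,
    "optimal": 206.4642,
    "suboptimal": 68.8214,
}


def complexity_terms(m_t, l, psi=1e-3, n_iter=10):
    """Evaluate the Table 1 big-O expressions with constants dropped.

    psi and n_iter are illustrative placeholders (assumptions), and the
    natural log is an arbitrary choice since big-O ignores the base.
    """
    log_term = math.log(1.0 / psi)
    base = m_t**3 + l**3
    return {
        "iterative": base + (m_t**6.5 + l**6.5) * n_iter * log_term,
        "optimal": base + (m_t**3.5 + l**3.5) * log_term,
        "suboptimal": base,
    }


def time_ratio(slow, fast):
    """Ratio of the reported processing times of two schemes."""
    return REPORTED_TIME[slow] / REPORTED_TIME[fast]


for m in (4, 8):
    terms = complexity_terms(m, m)
    print(m, {name: f"{value:.3g}" for name, value in terms.items()})

print(f"iterative / optimal time:  {time_ratio('iterative', 'optimal'):.0f}")
print(f"optimal / suboptimal time: {time_ratio('optimal', 'suboptimal'):.2f}")
```

The operation counts are only indicative, because big-O expressions hide constants. The time ratios, by contrast, follow directly from the reported figures: the iterative algorithm takes roughly 4267 times as long as the proposed optimal scheme, which in turn takes about three times as long as the closed-form suboptimal scheme.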

Table 2. Simulation parameter setup.

System Parameter	Values
Number of transmit antennas, $M_{t}$	$\{3, 4, 6, 8\}$
Number of receive antennas, $M_{r}$	$\{3, 4, 6, 8\}$
Training length, $L$	$\{3, 4, 6, 8\}$
Number of interferers, $K$	3
Number of antennas at the $k$th interferer, $M_{k}$	$\{3, 4, 6, 8\}$
Uncertainty parameters, $α_{T}$, $α_{R}$, and $α_{Q}$	$\{0.1, 0.3, 0.6\}$
Angular spreads, $Θ_{T}$ and $Θ_{I, k}$, $\forall k$	$\{10^{\circ}, 30^{\circ}\}$
Correlation coefficients, $ρ_{τ, k}$ and $ρ_{s, k}$, $\forall k$	0.9

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Kang, J.-M.; Yun, S. Worst-Case Robust Training Design for Correlated MIMO Channels in the Presence of Colored Interference. Mathematics 2025, 13, 2168. https://doi.org/10.3390/math13132168

