Multiuser Channels with Statistical CSI at the Transmitter: Fading Channel Alignments and Stochastic Orders, an Overview

In this overview paper, we introduce an application of stochastic orders in wireless communications. In particular, we show how to use stochastic orders to investigate the ergodic capacity results for fast fading Gaussian memoryless multiuser channels when only the statistics of the channel state information are known at the transmitters (CSIT). In general, the characterization of the capacity region of multiuser channels with only statistical CSIT is open. To attain our goal, in this work we classify the random channels through their probability distributions, by which we can obtain the capacity results. To be more precise, we derive sufficient conditions to attain some information theoretic channel orders, such as degraded and very strong interference, by exploiting the usual stochastic order and the same marginal property. After that, we apply the developed scheme to Gaussian interference channels and Gaussian broadcast channels. We also extend the framework to channels with multiple antennas. Possible scenarios for channel enhancement under statistical CSIT are also discussed. Several practical examples, such as Rayleigh fading and Nakagami-m fading, illustrate the application of the derived results.


Introduction
Ordering plays a fundamental role in mathematics, science, engineering, finance, etc. The best known and most commonly used one is the trichotomy order, in which we compare real values. In wireless communications, when there is perfect channel state information at the transmitter (CSIT), it is known that for some Gaussian multiuser (MU) channels the capacity results can be attained because the channel qualities of different users can be ordered. For example, the capacity regions/secrecy capacity of degraded broadcast channels (BC) and wiretap channels (WTC) (even for multiple-antenna cases) [1], the sum capacity in the low-interference regime [2], and the capacity regions for some cases of the interference channel (IC), such as the strong IC [3][4][5] and the very strong IC [4], have been derived. When fading effects of wireless channels are taken into account, if there is perfect CSIT, some of the above capacity results still hold with an additional operation of taking an average over the capacity (region) with respect to the fading channels. For example, in [6], the ergodic secrecy capacity of the Gaussian WTC is derived; in [7], the ergodic capacity regions are derived for the ergodic very strong and uniformly strong (each realization of the fading process is a strong interference channel) Gaussian IC. Notably, the orders in the above scenarios are all trichotomy orders.
Due to several practical limitations, for example, the finite bandwidth of feedback channels, a delay caused by channel estimation, etc., the transmitter may not be able to track channel realizations precisely and instantaneously if they vary rapidly. Thus, for fast fading channels, it is more practical to consider the case with only partial CSIT of the legitimate channel, where statistical CSIT is one of the most commonly considered cases [8,9]. However, when there is only statistical CSIT (in this paper, statistical CSIT and no CSIT will be used interchangeably), there are only a few known capacity results, such as the layered BC [10], the binary fading interference channel [11], the one-sided layered IC [12], the Gaussian WTC [13,14], and the layered WTC [14]. This is because, when the transmitter has imperfect CSIT, e.g., only the statistics of the channels, the channels cannot be compared directly by the trichotomy law due to the random nature of the fading channels. Then the optimal transmission strategy, including the design of the codebook, channel input distribution, resource allocation, etc., cannot be found easily. In particular, the channel orders commonly used in information theory, including degraded, less noisy, and more capable [9] in BC and WTC, and strong and very strong interference in IC [9], depend on the knowledge of CSIT. Note that with these information theoretic orders, we can greatly simplify the functional optimization with respect to the channel input distribution and/or channel prefixing.
To consider an MU channel in which the transmitters only know the distributions of the channels but not the realizations, we may ask the following questions: is it possible to compare the channel qualities just according to their distributions? If yes, how to do it? How to derive the capacity region under such a comparison of channel qualities? In this work we partly solve these problems by classifying random channels into those that are stochastically orderable and those that are not. In particular, an MU channel with orderable random channels means that there exists an equivalent channel in which we can reorder the channel realizations among different transmitter-receiver pairs in the desired manner. More specifically, we look for a subset of all fading channel tuples, namely $\mathcal{A}$, which should possess the following properties:
• It allows the existence of a corresponding set $\mathcal{B}$ in which the channels follow a certain order, e.g., the trichotomy order.
• It admits a constructive (or easy) way to find a transformation $f : a \to b$, $a \in \mathcal{A}$, $b \in \mathcal{B}$.
• Capacity results are attainable.
Taking the BC as an example, an orderable two-user BC means that, under the same noise distributions at the two receivers, in the equivalent BC one channel strength is always stronger or weaker than the other for all fading states. The main tool we use for this channel classification and ordering is stochastic orders [15] from probability theory, combined with the same marginal property [16] from information theory. Stochastic orders have been widely used in the last several decades in diverse areas of probability and statistics such as reliability theory, queueing theory, and operations research; see [15] and references therein. Different stochastic orders such as the usual stochastic order, the convex order, and the increasing convex order can help us to identify the location, dispersion, or both location and dispersion of random variables, respectively. Choosing a proper stochastic order to compare the channels with statistical CSIT allows us to form an equivalent channel in which the realizations of channel gains are ordered in a desired manner. Then we are able to derive the capacity regions of the equivalent MU channel, which is simpler than directly considering the original channel. Note that in [17] the authors also consider stochastic orders for fading channels. However, [17] has no such alignment concept based on constructing an equivalent channel, and hence the relation between the same marginal property and stochastic orders is not investigated there. Instead, they discuss the stochastic dominance between fading channels by the Shannon transform order. Stochastic orders are also used in stochastic geometry to analyze the performance of a random network [18]. Similarly, the inter-relation between the information theoretic channel orders and probabilistic orders is not addressed in [18].
The main issues discussed by this overview paper are summarized as follows.
• We classify fading MU channels such that we can characterize the capacity results of some memoryless Gaussian MU channels under statistical CSIT. To achieve this, we combine the concept of the usual stochastic order with the same marginal property, i.e., an intrinsic property of some MU channels. Intuitively, by doing so we can align the realizations of the fading channel gains between different users in an identical trichotomy order over time in an equivalent channel.
• We then apply the proposed scheme to characterize the capacity regions of the Gaussian IC and BC, which is novel in the literature.
• We further extend the framework to channels with multiple antennas. Applicable scenarios for the channel enhancement scheme under statistical CSIT, which was originally developed for channels with perfect CSIT, are also discussed.
• Several examples with practical channel distributions are illustrated to show the usage scenarios of the developed framework.
Notation: Upper case normal/bold letters denote random variables/random vectors (or matrices), which will be defined when they are first mentioned; lower case bold letters denote vectors. The statistical expectation is denoted by $E[\cdot]$. The mutual information between two random variables $X$ and $Y$ is denoted by $I(X; Y)$. The complementary cumulative distribution function (CCDF) is denoted by $\bar F_X(x) = 1 - F_X(x)$, where $F_X(x)$ is the CDF of $X$. In addition, we denote the probability mass function (PMF) by $p$ and the probability density function (PDF) by $f$. $X \sim F$ denotes that the random variable $X$ follows the distribution $F$. The Markov chain relation between $X$, $Y$, and $Z$ is described by $X - Y - Z$. $\mathrm{Unif}(a, b)$ denotes the uniform distribution between $a$ and $b$. $\mathbb{Z}_+ = \{0\} \cup \mathbb{N}$. The indicator function is denoted by $1_{\{\cdot\}}$. The supports of a function $f$ and a random variable $X$ are respectively denoted by $\mathrm{supp}(f)$ and $\mathrm{supp}(X)$. The logarithms used in the paper are all of base 2. We denote $C(P) = \log(1 + P)$. We denote the equality in distribution by $=_d$.
The remainder of the paper is organized as follows. In Section 2, we introduce the background knowledge and preliminaries. In Section 3, we formulate a problem and propose a framework to solve it. In Section 4, we apply the tools developed in Section 3 to fast fading Gaussian interference channels and broadcast channels with statistical CSIT. In Section 5, we consider channels with multiple antennas. In Section 6, an application of the Laplace transform order to solving a power allocation problem is reviewed. Finally, Section 7 concludes the paper.

Preliminaries
In this section, some important properties and definitions for deriving the main results of this work will be introduced, including the same marginal properties, degradedness, and the usual stochastic orders.

Same Marginal Property
The same marginal property plays a crucial role in the proposed channel classification to obtain the capacity results. This is because it provides us the degree of freedom to construct an equivalent channel in the sense that the marginal distributions are the same as the original ones, but not the joint distribution. By such a relaxation of considering an equivalent channel, we are able to reorder all channel gains under some conditions.
Two versions of the same marginal property are introduced as follows:

Remark 1. By the union bound, the error probability of a channel with multiple receivers can be upper bounded by the sum of the individual error probabilities. Therefore, the overall error probability approaches zero if each individual error probability approaches zero. As a consequence, only the marginal transition probabilities affect the capacity result, but not the joint one.

Remark 2. Note that since the capacity region of a multiple access channel is determined by $p_{Y|X_1,X_2}$, the technique of reordering random fading channels developed in this paper can be useful to simplify the proof for the GMAC with statistical CSIT.

For channels with a single transmitter and multiple receivers, e.g., BC or WTC, we can get Theorem 1 from Theorem 2 by removing $X_1$ or $X_2$.

Information Theoretical Orders for Memoryless Channels and Stochastic Orders
The main task in this paper is ordering channels. Here we introduce several important definitions describing the relation of reception qualities among different receivers, from the information theoretic to the probabilistic point of view.

Definition 1. A channel with two non-cooperative receivers and one transmitter is physically degraded if the transition distribution satisfies

The channel is stochastically degraded if its conditional marginal distribution is the same as that of a physically degraded channel, i.e., there exists a

Denote the fading channel gains in AWGN channels from the transmitter to the first and second receivers by $H_1$ and $H_2$, respectively. Define a set of tuples of random channels $H_{10} = \{(H_1, H_2)\}$ and also a set

Recall that $1_{\{(1)\}} = 1$ means (1) is true. In the following, we call a stochastically degraded channel simply a degraded channel due to the same marginal property. Note that discussions on the relation between degradedness and other information theoretic channel orders can be found in [19][20][21].

Definition 2. A discrete memoryless IC is said to have very strong interference if

After the information theoretic orders, we introduce some important definitions of stochastic orders, which are the underlying tools in this paper.

Definition 3 ([15]). For given random variables $X$ and $Y$, the usual stochastic order (st), the convex order (cx), the concave order (cv), the increasing convex order (icx), the increasing concave order (icv), and the Laplace transform order (Lt) are respectively defined as follows:

Note that the stochastic orders in Definition 3 can be further represented by the following relations, which are more easily evaluated.
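For reference, the orders named in Definition 3 are commonly stated as below in the stochastic order literature such as [15]. This is a sketch under the assumption that the paper follows these standard forms; each relation is required only for those test functions $f$ for which both expectations exist.

```latex
\begin{align*}
X \le_{st}  Y &\iff E[f(X)] \le E[f(Y)] \ \text{for all increasing } f, \\
X \le_{cx}  Y &\iff E[f(X)] \le E[f(Y)] \ \text{for all convex } f, \\
X \le_{cv}  Y &\iff E[f(X)] \le E[f(Y)] \ \text{for all concave } f, \\
X \le_{icx} Y &\iff E[f(X)] \le E[f(Y)] \ \text{for all increasing convex } f, \\
X \le_{icv} Y &\iff E[f(X)] \le E[f(Y)] \ \text{for all increasing concave } f, \\
X \le_{Lt}  Y &\iff E[e^{-sX}] \ge E[e^{-sY}] \ \text{for all } s > 0.
\end{align*}
```

Since $x \mapsto -e^{-sx}$ is increasing and concave for every $s > 0$, a pair ordered in the st, cv, or icv sense is also ordered in the Lt sense.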

Theorem 3 ([15]). For random variables $X$ and $Y$, $X \le_{st} Y$ if and only if $\bar F_X(x) \le \bar F_Y(x)$ for all $x$, and $X \le_{icx} Y$ if and only if

for all $t$. Moreover, $X \le_{cx} Y$ if and only if (4) is valid for all $t$ and

for all $t$. Finally, $X \le_{cv} Y$ if and only if (5) is valid for all $t$ and

Note that when $X$ and $Y$ are nonnegative, the condition [22].

Compared with the original expectation form, the integral form of the CCDFs, which unifies the expression of the considered stochastic orders as functions of the CCDFs only, greatly simplifies the following derivations. The relation between the aforementioned stochastic orders can be seen from the Venn diagram in Figure 1. By Definition 3, the constraint to be fulfilled by the Laplace transform order is the least restrictive, so pairs of random variables ordered in the concave order, the increasing concave order, or the usual stochastic order are also ordered in the Laplace transform order. Note that the intersection between the concave order and the usual stochastic order happens only when the distributions of the two random variables are identical, which can be easily seen from the constraint $E[X] = E[Y]$ due to the concave order. In the following sections, we will develop our results mainly based on the usual stochastic order and also the Laplace transform order. Due to their indirect relation to wireless channels, we do not discuss stochastic orders such as the convex/concave and increasing convex/concave orders here. A brief discussion can be found in [14].
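The CCDF characterization of the usual stochastic order in Theorem 3 is easy to evaluate numerically. The following is a minimal sketch that tests $\bar F_X(x) \le \bar F_Y(x)$ on a finite grid; the helper names, the grid, and the choice of exponentially distributed power gains (as under Rayleigh fading) are illustrative assumptions, and a grid check can only suggest, not prove, that the order holds for all $x$.

```python
import math


def ccdf_exponential(mean):
    """CCDF of an exponential variable with the given mean (assumed Rayleigh-fading power gain)."""
    return lambda x: math.exp(-x / mean)


def is_st_leq(ccdf_x, ccdf_y, grid, tol=1e-12):
    """Grid check of X <=_st Y: CCDF_X(x) <= CCDF_Y(x) at every grid point."""
    return all(ccdf_x(x) <= ccdf_y(x) + tol for x in grid)


grid = [0.01 * k for k in range(5001)]  # x in [0, 50]
weak = ccdf_exponential(1.0)            # mean power gain 1
strong = ccdf_exponential(2.0)          # mean power gain 2

print(is_st_leq(weak, strong, grid))    # weak <=_st strong on the grid
print(is_st_leq(strong, weak, grid))    # the reverse order fails
```

Passing CCDFs as callables keeps the check independent of the fading model, so any closed-form or tabulated CCDF can be plugged in.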

Main Results
In this section, we first formulate a problem to classify fading channels under which we can characterize the capacity results when only the statistical CSIT is available. Then, we develop a general framework in order to partly solve the formulated problem. After that, we exploit the framework to analyze the performance of several important multi-user additive white Gaussian noise (AWGN) channels, including the interference channel, the broadcast channel, and an extension to channels with multiple antennas.

Problem Formulation
In the following, we use two simple examples to show the difficulty of the comparison in the considered scenario. In Figure 2a, the supports of the distributions of the two channels are non-overlapping. Therefore, even though there is only statistical CSIT, we know that channel 2 (with PDF $f_2$) is always stronger than channel 1 (with PDF $f_1$). In contrast, in Figure 2b, the supports of the two channel distributions overlap. Intuitively, the transmitter is not able to identify the stronger channel based on the channel statistics alone. This is because, due to the overlapping part of the PDFs, the order of the channel realizations $H_1 = h_1$ and $H_2 = h_2$ may change from one realization to the next. For example, in the current sample we may have $h_1 = 3.1 > h_2 = 1.7$, but in the next sample we may have $h_1 = 2.3 < h_2 = 4.9$. Then, for samples within a codeword length, there is no fixed order between the two channels.
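The lack of a fixed order in the overlapping case can be seen in a small simulation. The sketch below assumes two independent exponentially distributed power gains (an illustrative choice, not taken from Figure 2) and estimates how often channel 1 is the stronger one, even though its mean is smaller.

```python
import random


def fraction_h1_stronger(mean1, mean2, n=100_000, seed=0):
    """Estimate P(H1 > H2) for independent exponential gains with the given means."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        h1 = rng.expovariate(1.0 / mean1)  # expovariate takes the rate, i.e. 1/mean
        h2 = rng.expovariate(1.0 / mean2)
        if h1 > h2:
            count += 1
    return count / n


frac = fraction_h1_stronger(mean1=1.0, mean2=2.0)
print(frac)  # strictly between 0 and 1: the order flips from sample to sample
```

For independent exponential gains the exact value is mean1 / (mean1 + mean2), so with the parameters above roughly one third of the samples have channel 1 stronger.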
Figure 2. Two examples of relations between two fading channels: (a) $H_2$ is always stronger than $H_1$

We formulate the problem as finding the tuple $(\mathcal{A}, f, \mathcal{B})$ such that the capacity results in the subset $\mathcal{B}$ $\{H_1, H_2, H_3\}$ are solvable, where $\mathcal{A}$ is a subset of the tuples of all random channels with a nice property, i.e., there exists a mapping $f$ from $a \in \mathcal{A}$ to $b \in \mathcal{B}$. Figure 3 illustrates the problem formulation. Note that for each realization of $b$, the capacity region is provable.

[Figure 3: illustration of the problem formulation. Labels: "All random fading channels"; "For each realization of b, the capacity results are known".]

The Proposed Framework
In the following, we summarize two schemes to compare channel qualities for the case in Figure 2b, when the transmitter has only statistical CSIT.In fact, these schemes are to construct a new joint distribution which allows us to align/re-order the realizations of different fading channels in a fixed order, while the marginal distributions are not changed.

Coupling
In this method, we construct an explicit coupling such that each realization of a channel pair follows the same trichotomy order. To proceed, we first introduce the necessary definition from [23].

Definition 4 ([23]). The pair $(\tilde X, \tilde Y)$ is a coupling of the random variables $(X, Y)$ if $\tilde X =_d X$ and $\tilde Y =_d Y$.

Proof. From the coupling theorem [24] we know that $H_1 \le_{st} H_2$ if and only if there exist random variables $\tilde H_1 =_d H_1$ and $\tilde H_2 =_d H_2$ such that $\tilde H_1 \le \tilde H_2$ with probability 1, where the equivalent channels $\tilde H_1$ and $\tilde H_2$ can be constructed by $\tilde H_1 = F_{H_1}^{-1}(U)$ and $\tilde H_2 = F_{H_2}^{-1}(U)$, respectively, where $U \sim \mathrm{Unif}(0, 1)$. Therefore, by construction we know that the distributions of $\tilde H_1$ and $\tilde H_2$ fulfill the same marginal property. In addition, the trichotomy relation $\tilde H_1 \le \tilde H_2$ fulfills $(\tilde H_1, \tilde H_2) \in \mathcal{B}$, by construction of the coupling.

Remark 3. Originally, even though we know $H_1 \le_{st} H_2$, the order of the channel realizations $H_1 = h_1$ and $H_2 = h_2$ may vary over time, so we may not be able to claim that one channel is stronger than the other over a codeword length. However, from Theorem 4 we know that if $H_1 \le_{st} H_2$, there exists an equivalent channel in which we can explicitly align all the channel realizations within a codeword length such that each channel gain realization of $\tilde H_2$ is no worse than that of $\tilde H_1$.
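The construction in the proof, which drives both inverse CDFs with the same uniform variable $U$, can be sketched in a few lines. The code below assumes exponentially distributed power gains with means 1 and 2 (an illustrative choice satisfying $H_1 \le_{st} H_2$); the function names are hypothetical.

```python
import math
import random


def exp_quantile(mean):
    """Inverse CDF of an exponential variable with the given mean."""
    return lambda u: -mean * math.log(1.0 - u)


def coupled_samples(quantile_1, quantile_2, n, seed=0):
    """Draw n pairs (F1^{-1}(U), F2^{-1}(U)) that share the same uniform U."""
    rng = random.Random(seed)
    return [(quantile_1(u), quantile_2(u)) for u in (rng.random() for _ in range(n))]


pairs = coupled_samples(exp_quantile(1.0), exp_quantile(2.0), n=100_000)

# Every coupled realization is aligned in the same trichotomy order ...
aligned = all(h1 <= h2 for h1, h2 in pairs)
# ... while each marginal keeps its own distribution (checked here via the sample mean only).
mean_1 = sum(h1 for h1, _ in pairs) / len(pairs)
mean_2 = sum(h2 for _, h2 in pairs) / len(pairs)
print(aligned, mean_1, mean_2)
```

Independent sampling of the same two marginals would instead produce pairs in both orders, which is exactly the situation the coupling removes.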

Constructing a New Joint Distribution
In addition to the coupling scheme, we can also directly construct a joint distribution between the fading channels. In this way, we can still align each realization of the channel pair in the same trichotomy order for all channel realizations within a codeword.
More specifically, we can design the function $f$ by constructing a joint complementary CDF (CCDF) as follows [10,14]:

where $\bar F_{X,Y}(x, y) \triangleq P(X \ge x, Y \ge y)$, from which it is clear that the marginal distributions are unchanged, i.e., $\bar F_{\tilde H_1}(\tilde h_1) = \bar F_{\tilde H_1,\tilde H_2}(\tilde h_1, 0) = \bar F_{H_1}(\tilde h_1)$ and $\bar F_{\tilde H_2}(\tilde h_2) = \bar F_{\tilde H_1,\tilde H_2}(0, \tilde h_2) = \bar F_{H_2}(\tilde h_2)$. With the selection $\mathcal{A} = \{(H_1, H_2) : H_1 \le_{st} H_2\}$, by the definition of joint probability, we can prove that $\tilde h_1 \le \tilde h_2$, $\forall (H_1, H_2) \in \mathcal{B} = f(\mathcal{A})$. In particular, assume $\tilde h_1 > \tilde h_2 + \epsilon$, $\epsilon > 0$; we can prove that

where (a) follows from the definition of the joint CCDF and (b) follows from (7) with the given property

To ensure that $\tilde H_1 \le \tilde H_2$ for all random samples, we let $\epsilon \to 0$. Thus, as long as $H_1 \le_{st} H_2$, we can also form an equivalent channel that has the same marginal distributions as the original one, such that the capacity is unchanged. Further discussion can be found in [10,14].
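One joint CCDF with the properties described above is the comonotone form, which is also the joint CCDF induced by the quantile coupling $\tilde H_k = F_{H_k}^{-1}(U)$ for continuous distributions. The sketch below assumes this particular form (it is an assumption of this example, not a statement of what [10,14] use) and verifies the two claimed properties for non-negative channels with $H_1 \le_{st} H_2$.

```latex
\begin{align*}
&\text{Assumed form:} &
\bar F_{\tilde H_1,\tilde H_2}(\tilde h_1,\tilde h_2)
  &= \min\{\bar F_{H_1}(\tilde h_1),\ \bar F_{H_2}(\tilde h_2)\}. \\
&\text{Marginals:} &
\bar F_{\tilde H_1,\tilde H_2}(\tilde h_1, 0)
  &= \min\{\bar F_{H_1}(\tilde h_1),\ 1\} = \bar F_{H_1}(\tilde h_1),
  \quad\text{and similarly for } \tilde H_2. \\
&\text{Alignment:} &
P(\tilde H_1 \ge h + \epsilon,\ \tilde H_2 < h)
  &= \bar F_{\tilde H_1}(h+\epsilon) - \bar F_{\tilde H_1,\tilde H_2}(h+\epsilon,\ h) \\
& & &= \bar F_{H_1}(h+\epsilon) - \min\{\bar F_{H_1}(h+\epsilon),\ \bar F_{H_2}(h)\} = 0,
\end{align*}
```

where the last equality uses $\bar F_{H_1}(h+\epsilon) \le \bar F_{H_1}(h) \le \bar F_{H_2}(h)$, which holds for every $h$ and every $\epsilon > 0$ because a CCDF is non-increasing and $H_1 \le_{st} H_2$.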
Remark 4. Note that the selection of $\mathcal{A}$ in Theorem 4 or in Section 3.2.2 can be easily extended to cases with $K$ receivers by the concatenation

For more discussion on the relation between the above two schemes, and on the relation to copulas [25], please refer to [26]. In the following, we will use the coupling scheme introduced in Theorem 4 for MU channels due to its intuitive characteristics.

Applications on Gaussian MU Channels with a Single Antenna at Each Node
In this section, we will use interference channels and broadcast channels as examples to show how to apply the developed scheme.All channels in this section are assumed memoryless.All nodes are equipped with a single antenna.

Fading Gaussian Interference Channels with No CSIT
When there is only statistical CSI at the transmitter and full CSI at the receiver, the ergodic capacity region is unknown in general.In this section, we identify the sufficient condition to attain the capacity region of Gaussian interference channel with very strong interferences.Practical examples illustrate the results.
The considered received signals of a two-user fast fading Gaussian interference channel can be stated as

where $H_{kj}$ and $\Phi_{kj}$ are real-valued non-negative independent random variables denoting the absolute square and the phase of the fading channel between the $j$-th transmitter and the $k$-th receiver, respectively, where $k, j \in \{1, 2\}$. The CCDF of $H_{kj}$ is denoted by $\bar F_{H_{kj}}$. The channel inputs at transmitters 1 and 2 are denoted by $X_1$ and $X_2$, respectively. We consider the channel input power constraint as

Noises $Z_1$ and $Z_2$ at the corresponding receivers are independent circularly symmetric AWGN with zero mean and unit variance. We assume that the transmitters only know the statistics but not the instantaneous realizations of $\{H_{kj}\}$. We also assume that each receiver knows the two channels to itself. Since we consider the case with only statistical CSIT, the channel input signals are not functions of the channel realizations. Thus, without loss of generality, we assume $\{H_{kj}\}$, $\{\Phi_{kj}\}$, $\{X_j\}$, and $\{Z_k\}$ are jointly independent.
In the following derivation, we do not use the commonly used standard form of the GIC. This is because the normalization in the standard form results in a ratio of random variables, whose distribution may not be easy to derive, which makes it harder to identify the channels.
The ergodic capacity region of a very strong interference channel is as follows.
then the following gives the ergodic capacity region of a very strong GIC with no CSIT

Proof sketch: We first derive the ergodic capacity region of the GIC with very strong interference and no CSIT, and then we derive the sufficient condition to achieve the derived capacity region.

For the first part, we can solve for the optimal input distributions of the GIC with no CSIT by considering arg max

where $\mu \in \mathbb{R}_+$. Note that (13) can be further expressed as

It is clear that for each $\mu$, (14) can be maximized by Gaussian inputs, i.e., $X_1 \sim \mathcal{CN}(0, P_1)$ and $X_2 \sim \mathcal{CN}(0, P_2)$. Then the capacity region can be delineated by (12).
In the second part, from (2) we can derive

After comparing (15) 16) and (17) to the stochastic case as $\ge_{st} H_{11}$, and

Note that we can easily check that the marginal distributions are intact as shown in Theorem 2 after applying the construction by Theorem 4, which completes the proof.
Remark 5. In the considered scenario of very strong interference, we can decouple the effect of the other user and reduce the rate constraints to single-user capacity constraints, which allows us to prove that Gaussian inputs are optimal for the capacity region.
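As a numerical illustration of the decoupling in Remark 5, the sketch below estimates a single-user ergodic rate $E[C(H_{kk} P_k)]$ with $C(P) = \log(1 + P)$ (base 2) by Monte Carlo. Two things are assumptions of this sketch: that the per-user bounds of the very strong region take this interference-free form, and that the direct-link power gain is exponentially distributed (Rayleigh fading). The function name and parameters are hypothetical.

```python
import math
import random


def ergodic_rate(mean_gain, power, n=200_000, seed=0):
    """Monte Carlo estimate of E[log2(1 + H * P)] for an exponential gain H with the given mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        h = rng.expovariate(1.0 / mean_gain)  # power gain of the direct link
        total += math.log2(1.0 + h * power)
    return total / n


r1_bound = ergodic_rate(mean_gain=1.0, power=1.0)
r2_bound = ergodic_rate(mean_gain=0.5, power=1.0, seed=1)
print(r1_bound, r2_bound)
```

By Jensen's inequality each estimate stays below the rate of a non-fading channel with the same mean gain, $\log_2(1 + E[H] P)$, up to Monte Carlo error.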

Examples
In this subsection, we provide examples to show the scenarios that the sufficient condition in Theorem 5 is feasible and leads to a situation which occurs in wireless communications.
Example 1: In this case, to proceed, we find the distributions of the two ratios of random variables in (11). We can first rearrange $H_{21}/(1 + P_2 H_{22})$ in the ratio of quadratic form as

where

From [27] we know that the CDF of the RHS of (19) can be calculated numerically by the following.

In the comparison, we fix the variances of the cross channels as $c \triangleq \sigma_{12}^2 = \sigma_{21}^2 = 1$ and the transmit powers as $P \triangleq P_1 = P_2 = 1$, and scan the variances of the dedicated channels over $a \triangleq \sigma_{11}^2 = \sigma_{22}^2 = 0.1, 0.3, 0.5$, and $0.7$. Since the conditions in (11) are symmetric and the considered settings are symmetric, once the first condition in (11) is valid, the second one will be automatically valid. The results are shown in Figure 4, from which we can observe that when $a$ increases, the difference of the CCDFs decreases. This is because the support of the CCDF of $H_{11}$ increases with increasing $a$, and in the considered case $a = 0.7$ does not satisfy (11). In contrast, the values $a = 0.1$, $0.3$, and $0.5$ satisfy (11).
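The comparison in Example 1 can be reproduced with a short script under one explicit assumption: all links are Rayleigh faded, so each power gain $H_{kj}$ is exponential with mean $\sigma_{kj}^2$. Under that assumption the CCDF of the ratio has a closed form, $P(H_{21}/(1 + P H_{22}) \ge x) = e^{-x/c} / (1 + a P x / c)$, obtained by averaging $e^{-x(1 + P H_{22})/c}$ over $H_{22}$. The function names and the grid are hypothetical choices for this sketch.

```python
import math


def ccdf_ratio(x, cross_var, direct_var, power):
    """CCDF of H21 / (1 + P * H22) for independent exponential H21 (mean cross_var)
    and H22 (mean direct_var), i.e. assuming Rayleigh fading."""
    return math.exp(-x / cross_var) / (1.0 + power * direct_var * x / cross_var)


def ccdf_direct(x, direct_var):
    """CCDF of the exponential direct-link gain H11 with mean direct_var."""
    return math.exp(-x / direct_var)


def satisfies_condition(direct_var, cross_var=1.0, power=1.0,
                        x_max=50.0, steps=50_000, tol=1e-12):
    """Grid check of H21 / (1 + P * H22) >=_st H11 through the CCDFs."""
    for k in range(steps + 1):
        x = x_max * k / steps
        if ccdf_ratio(x, cross_var, direct_var, power) < ccdf_direct(x, direct_var) - tol:
            return False
    return True


results = {a: satisfies_condition(a) for a in (0.1, 0.3, 0.5, 0.7)}
print(results)
```

With $c = P = 1$ the check reduces to $e^{x(1/a - 1)} \ge 1 + a x$ for all $x \ge 0$, which holds exactly when $1/a - 1 \ge a$, i.e., $a \le (\sqrt{5} - 1)/2 \approx 0.618$. Under the Rayleigh assumption this agrees with the values of $a$ reported above.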

Preliminaries and Results
If a 2-receiver BC is degraded, we know that the capacity region [9] is the union, over all $U$, $X$ satisfying the Markov chain

for some $p(u, x)$, where the cardinality of the auxiliary random variable

In contrast, for the non-degraded BC, only inner [28] and outer bounds [29] are known. Therefore, it is much easier to characterize the performance of a BC if we can identify its degradedness.

The capacity region of a Gaussian BC (GBC) is known for both fixed and fading channels when there are perfect CSIT and CSIR. For the multiple-antenna GBC with perfect CSIT and CSIR, Ref. [30] introduced the channel enhancement technique to form the degradedness, and it is proved that Gaussian input is optimal. Considerable effort has been made to find the optimal input covariance matrix of the Gaussian input, e.g., [31][32][33]. However, the problem is open in general when there is imperfect CSIT, and only limited cases are known [10]. The fading BC with perfect CSIR but imperfect CSIT lacks the degraded structure in general for arbitrary fading distributions, which makes it a challenging problem. In particular, the order of the channel realizations to different receivers in fast fading broadcast channels varies within a codeword length. Therefore, intuitively, we are not able to compare the channels as in the full CSIT case to identify the degradedness.

We assume that there is full CSIR such that the receivers can compensate the phase rotation of their own channels, respectively, without changing the capacity, to form real channels. Therefore, the signal of receiver $k$ of the considered $L$-receiver fast fading Gaussian broadcast channel can be equivalently stated as

where $X$ is the channel input and $H_k$ is a real-valued non-negative independent random variable denoting the square of receiver $k$'s fading channel with CCDF $\bar F_{H_k}$. We consider the channel input power constraint as $E[X^2] \le P_T$. The noises $\{Z_k\}$ at the corresponding receivers are independent AWGN with zero mean and unit variance. We assume that the transmitter only knows the statistics but not the instantaneous realizations of $\{H_k\}$.
The following result can be specialized from Remark 4.
Remark 6. An important note is that degradedness does not guarantee the optimality of Gaussian input for fading GBCs with no CSIT. In [34] it is shown, by local perturbation of the Gaussian channel input, that with statistical CSIT Gaussian input is not optimal in general.
Now we can further generalize Corollary 1 to the case in which fading channels are formed by clusters of scatterers. In particular, we consider the case in which we are only provided the information of each cluster but not the superimposed result as $\sqrt{H_k}$ in (21). Therefore, the phases of the channels of each cluster should be taken into account, i.e., we consider the $k$-th clusters of the first and second users as $\tilde H_{1k} = \tilde H_{1k,\mathrm{Re}} + i \cdot \tilde H_{1k,\mathrm{Im}} = \sqrt{H_{1k}}\, e^{-i\phi_{1k}}$ and $\tilde H_{2k} = \tilde H_{2k,\mathrm{Re}} + i \cdot \tilde H_{2k,\mathrm{Im}} = \sqrt{H_{2k}}\, e^{-i\phi_{2k}}$, respectively. The received signals at receivers 1 and 2 for $M$ clusters are respectively expressed as

Corollary 2. Let $M$ be the number of clusters of scatterers for both users 1 and 2, and denote the channels of user 1's and user 2's $k$-th clusters as $\tilde H_{1k}$ and $\tilde H_{2k}$, respectively, where $k = 1, \cdots, M$. The broadcast channel is degraded if user 1's scatterers are stronger than those of user 2 in the sense that $\tilde H_{\pi_{1k},\mathrm{Re}} \tilde H_{\pi_{1j},\mathrm{Re}} \ge_{st} \tilde H_{\pi_{2k},\mathrm{Re}} \tilde H_{\pi_{2j},\mathrm{Re}}$ and $\tilde H_{\pi_{1k},\mathrm{Im}} \tilde H_{\pi_{1j},\mathrm{Im}} \ge_{st} \tilde H_{\pi_{2k},\mathrm{Im}} \tilde H_{\pi_{2j},\mathrm{Im}}$, $\forall k, j \in \{1, \cdots, M\}$, for some permutations $\pi_1$ and $\pi_2$ of user 1's and user 2's clusters.

The condition on the number of clusters of channels 1 and 2 in Corollary 2 can be relaxed to two non-negative integer-valued random variables $N$ and $M$, respectively, and the degradedness among the two channels is still valid, if H2k | 2 and if $N \ge_{st} M$, which can be proved with the aid of Theorem 1.A.4 in [15].

Example
Assume the magnitudes of a three-receiver GBC are independent Nakagami-m random variables with shape parameters $m_1$, $m_2$, and $m_3$, and spread parameters $w_1$, $w_2$, and $w_3$, respectively. From Corollary 1 we know that the broadcast channel is degraded if

where $\gamma(s, x) = \int_0^x t^{s-1} e^{-t}\, dt$ is the lower incomplete gamma function and $\Gamma(s) = \int_0^\infty t^{s-1} e^{-t}\, dt$ is the ordinary gamma function [35]. An example satisfying the above inequality is $(m_1, w_1) = (1, 3)$, $(m_2, w_2) = (1, 2)$, and $(m_3, w_3) = (0.5, 1)$.

Remark 7. The developed scheme for MU channels can be easily extended to those with secrecy constraints from physical layer security, for example, wiretap channels (WTC) [36], broadcast channels with confidential messages (BCCM) [37], interference channels with secrecy constraints, etc. Compared to the discussion in Remark 1, the main difference for systems with a secrecy constraint is that we need to additionally check the validity of the same marginal property for the related terms in the secrecy constraint, e.g., the strong or weak secrecy constraints, i.e., $H(W_i|Y_j)$ or $\frac{1}{n} H(W_i|Y_j)$, $i \ne j$. More specifically, by the definition of the conditional entropy, we can easily observe that these entropies only depend on $p_{W_i,Y_j}$, $i \ne j$, but not on $p_{W_i,Y_i,Y_j}$. Therefore, only the marginal distributions affect the performance of channels with the additional secrecy constraint, so the developed scheme works for these cases.
Remark 8. Note that for the GBC, the GWTC, or the GBC-CM, we can also use the developed scheme to check the degradedness. However, degradedness plays opposite roles in characterizing the performance of the GWTC and the GBC-CM. For a degraded GWTC, we can identify that Gaussian input is optimal and that the secrecy capacity is non-zero. In contrast, if a two-user GBC-CM is degraded, then one received signal is always weaker than the other one. This means that the secrecy capacity region will degenerate to the secrecy capacity, i.e., the secrecy capacity region does not exist. This is because, for the GBC-CM, both receivers are legitimate receivers as well as eavesdroppers simultaneously. Therefore, the degraded cases of the GBC-CM should be avoided.
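Returning to the three-receiver Nakagami-m example, the ordering of the CDFs can be checked numerically. The sketch below uses the fact that the squared magnitude of a Nakagami-m variable with shape $m$ and spread $w$ is Gamma distributed with shape $m$ and scale $w/m$, so its CDF is $\gamma(m, m x / w) / \Gamma(m)$. It assumes that the degradedness condition amounts to these CDFs being ordered as CDF$_1 \le$ CDF$_2 \le$ CDF$_3$ for all $x$ (equivalently $H_1 \ge_{st} H_2 \ge_{st} H_3$); the series-based incomplete gamma helper and the grid are choices made for this sketch.

```python
import math


def reg_lower_gamma(s, x, max_terms=1000):
    """Regularized lower incomplete gamma function gamma(s, x) / Gamma(s), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / s            # k = 0 term of sum_k x^k / (s (s+1) ... (s+k))
    total = term
    for k in range(1, max_terms):
        term *= x / (s + k)
        total += term
        if term < 1e-17 * total:
            break
    return total * math.exp(s * math.log(x) - x - math.lgamma(s))


def nakagami_power_cdf(x, m, w):
    """CDF of the squared Nakagami-m magnitude: Gamma with shape m and scale w / m."""
    return reg_lower_gamma(m, m * x / w)


def is_st_decreasing(params, x_max=40.0, steps=4000, tol=1e-12):
    """Grid check that the CDFs are non-decreasing along the list, i.e. H_1 >=_st H_2 >=_st ..."""
    for k in range(steps + 1):
        x = x_max * k / steps
        cdfs = [nakagami_power_cdf(x, m, w) for m, w in params]
        if any(cdfs[i] > cdfs[i + 1] + tol for i in range(len(cdfs) - 1)):
            return False
    return True


params = [(1.0, 3.0), (1.0, 2.0), (0.5, 1.0)]  # (m_k, w_k) from the example
print(is_st_decreasing(params))
```

Because squaring is increasing on non-negative values, the same ordering holds for the magnitudes themselves, so checking the squared magnitudes loses nothing.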

Extension to Multiple Antenna Cases
In this section we consider multiple antennas at both the channel input and output. We assume all nodes are equipped with the same number of antennas $n_T$. In the following we discuss cases with only one transmitter, e.g., GBC, GWTC, etc. The results can be easily extended to cases with multiple transmitters. Again, the signals at receivers 1 and 2 can be respectively expressed as

where $Z_1 \sim \mathcal{CN}(0, I_{n_T})$ and $Z_2 \sim \mathcal{CN}(0, I_{n_T})$, and $H_1, H_2 \in \mathbb{C}^{n_T \times n_T}$ with entries varying over each code symbol. For the (fading) multiple-antenna cases, the channels are described by (random) matrices. It is therefore critical to clarify how to order random matrices, or which part of the matrices is to be ordered. In the following we provide two methods to deal with this problem: the aforementioned coupling scheme and the channel enhancement scheme.

Alignment by Usual Stochastic Order
From random matrix theory we know that the probability that a random matrix is full rank approaches 1. In addition to the assumption of full channel state information at the receiver (CSIR), we can construct an alternative channel which is full rank and does not change the capacity [1] by normalizing (22) and (23) equivalently as

where

For the full CSIT and full CSIR case, to make the Markov chain $X \to Y_1 \to Y_2$ valid, i.e., to obtain a (stochastically) degraded wiretap channel, the constraint $B - A \succeq 0$ is sufficient (the reason that it is not necessary is that we may be able to use the channel enhancement scheme to obtain a degraded channel with $B \nsucceq A$). In the considered scenario we have full CSIR but only statistical CSIT, so we aim to construct an equivalent degraded channel by showing $P(B - A \succeq 0) = 1$ according to the coupling. In the following we will find the relation between the degradedness and the stochastic order among the eigenvalues of $A$ and $B$. Note that in [15] the usual stochastic order in a vector (but not matrix) version is considered, where in the expression $\mathrm{vec}(B) \le_{st} \mathrm{vec}(A)$ the inequality is element-wise, i.e., $b_i \le a_i$, $\forall i$, for $P(\mathrm{vec}(B) \le \mathrm{vec}(A)) = 1$. However, we cannot directly apply the multivariate usual stochastic order to our scenario, because it does not guarantee the positive semidefiniteness of $B - A$, which is required for the degradedness. Instead, it is sufficient to check the stochastic order of the eigenvalues of $A$ and $B$, namely, $\Lambda_B \ge_{st} \Lambda_A$, to attain the existence of $A$ and $B$ in an equivalent channel such that $P(B - A \succeq 0) = 1$ after using coupling. We first transform $B - A \succeq 0$ into a form that we can simply connect to the eigenvalues with the aid of the following lemmas.

Lemma 1 ([38]). Let $Y \succ 0$ and Hermitian, and $X \succeq 0$ and Hermitian. $Y - X \succeq 0$ if and only if the eigenvalues of $XY^{-1}$ all satisfy $\lambda_i \le 1$.
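Lemma 1 can be illustrated on small matrices. The sketch below is restricted to real symmetric $2 \times 2$ matrices so that it needs only the standard library; the helper functions and the example matrices are hypothetical, and the closed-form eigenvalue formula relies on the spectrum being real, which holds for a symmetric matrix and for $XY^{-1}$ when $Y \succ 0$ and $X \succeq 0$.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def eig2(m):
    """Eigenvalues (ascending) of a 2x2 matrix whose spectrum is real."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return ((tr - disc) / 2.0, (tr + disc) / 2.0)


def lemma1_check(x, y, tol=1e-12):
    """Return (Y - X is PSD, all eigenvalues of X Y^{-1} are <= 1)."""
    diff = [[y[i][j] - x[i][j] for j in range(2)] for i in range(2)]
    diff_is_psd = min(eig2(diff)) >= -tol
    eigs_at_most_one = max(eig2(matmul2(x, inv2(y)))) <= 1.0 + tol
    return diff_is_psd, eigs_at_most_one


Y = [[2.0, 0.5], [0.5, 1.0]]          # positive definite
X_small = [[1.0, 0.2], [0.2, 0.5]]    # Y - X_small is PSD
X_large = [[1.0, 0.2], [0.2, 1.5]]    # Y - X_large is indefinite

print(lemma1_check(X_small, Y))
print(lemma1_check(X_large, Y))
```

In both cases the two tests agree, as the lemma states: they are either both satisfied or both violated.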
We then use the following lemma to connect the eigenvalues of $XY^{-1}$ to those of $X$ and $Y$.

Lemma 2 ([39]). If $X$ and $Y$ are $n \times n$ positive semidefinite Hermitian matrices, then

Then from Lemmas 1 and 2 we can derive the following theorem.

Theorem 6. A sufficient condition to have a degraded multiple-antenna channel

Remark 9. To have a degraded channel, (27) is a strict condition to satisfy. The reasons are: (1) the condition $A \preceq B$ may not be necessary for the existence of a degraded channel. More specifically, for the full CSIT case, for arbitrary covariance matrices $A$ and $B$, Ref. [1] proves that such a channel can be transformed into a degraded one by the channel enhancement technique; (2) the usual stochastic ordering is sufficient but may not be necessary, which can be seen from the SISOSE case [14].
Remark 10. Note that the fast fading channel with only statistical CSIT can be verified as a degraded one if there exist $A$ and $B$ such that $A \preceq B$ for each channel realization, where $A$ and $B$ are the covariance matrices of the equivalent noises at receivers 1 and 2. Then by Proposition 1 in [40] we know that solving for the optimal input covariance matrix of a GWTC is a convex problem. For full CSIT cases we can use convex optimization tools to solve it numerically, or some partial analytical results can be found in [40,41], etc.
In the following, we show a sufficient condition for channels with additional assumptions to have a degraded channel. Note that due to the additional assumptions, we can derive a less stringent sufficient condition than that in Theorem 6.
Theorem 7. Let $U_1 D_1 V_1^H$ and $U_2 D_2 V_2^H$ be the singular value decompositions of $H_1$ and $H_2$, respectively. Assume that $V_1$ is independent of $D_1$ and $U_1$, and $V_2$ is independent of $D_2$ and $U_2$. Also assume that $V_1^H$ and $V_2^H$ have the same distribution. If $D_1 \ge_{st} D_2$, then there exists an equivalently degraded channel.
Proof. To proceed, we form a new 1st user's channel as

We can check that the PDF of $\tilde{H}_1$ is the same as that of $H_1$ by the following

where (a) is due to the following. To calculate the distributions of

By (28) and the same marginal property, we know that this new channel with $\tilde{H}_1$ being the new 1st user's channel has the same ergodic secrecy capacity as that with $H_1$. Then, following the same steps as in Theorem 6, we can complete the proof.
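The coupling step behind conditions such as $D_1 \ge_{st} D_2$ can be sketched for a scalar case. The following Python snippet is a minimal illustration under the assumption of exponentially distributed channel gains (as for the squared magnitude of Rayleigh fading) with hypothetical means 2 and 1. It drives both inverse CDFs with the same uniform variable, which keeps the marginals unchanged while making the ordering hold for every realization.

```python
import numpy as np

def coupled_exponentials(mean_1, mean_2, size, rng):
    """Quantile coupling: feed one uniform sample into both inverse CDFs.

    Each output keeps its own exponential marginal, but the pair is
    ordered sample by sample whenever mean_1 >= mean_2.
    """
    u = rng.uniform(size=size)          # common randomness in [0, 1)
    x1 = -mean_1 * np.log1p(-u)         # inverse CDF of Exp(mean_1)
    x2 = -mean_2 * np.log1p(-u)         # inverse CDF of Exp(mean_2)
    return x1, x2

rng = np.random.default_rng(1)
x1, x2 = coupled_exponentials(2.0, 1.0, 10_000, rng)

print(np.all(x1 >= x2))                 # ordering holds for every sample
print(x1.mean(), x2.mean())             # empirical means of the two marginals
```

The same idea is what allows a stochastic order between distributions to be turned into an almost sure order in an equivalent channel, to which the same marginal property can then be applied.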
Remark 11. Channel matrices with i.i.d. Gaussian entries satisfy the requirement in Theorem 7, i.e., $V_1$ is independent of $D_1$ and $U_1$. In particular, we can apply the LQ decomposition (LQD) to those channel matrices to get the right singular vectors $V_1$ and $V_2$, which are the Q matrices of the LQD and are independent of the L matrix [42]. In addition, the random matrix $Q$ follows the isotropic distribution (i.d.), whose PDF is given in [43] in terms of the gamma function $\Gamma$ and the delta function $\delta$.
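The LQ decomposition mentioned in Remark 11 can be sketched numerically as follows. This is a minimal illustration, assuming NumPy's QR routine applied to $H^H$ and a phase normalization that makes the diagonal of $L$ real and nonnegative; the function name `lq_decomposition` is introduced here for illustration only.

```python
import numpy as np

def lq_decomposition(h):
    """Return (l, q) with h = l @ q, l lower triangular and q unitary.

    Obtained from the QR decomposition of h^H; the phases are normalized
    so that the diagonal of l is real and nonnegative.
    """
    q_r, r = np.linalg.qr(h.conj().T)   # h^H = q_r @ r
    d = np.diag(r)
    phases = d / np.abs(d)              # unit-modulus phases of diag(r)
    q_r = q_r * phases                  # scale columns of q_r
    r = phases.conj()[:, None] * r      # scale rows of r, product unchanged
    return r.conj().T, q_r.conj().T

rng = np.random.default_rng(3)
n_t = 4
# Channel matrix with i.i.d. complex Gaussian entries.
h = (rng.standard_normal((n_t, n_t))
     + 1j * rng.standard_normal((n_t, n_t))) / np.sqrt(2)
l, q = lq_decomposition(h)

print(np.allclose(l @ q, h))                      # factorization reproduces h
print(np.allclose(q @ q.conj().T, np.eye(n_t)))   # q is unitary
```

The snippet only verifies the algebraic structure of the factorization; the statistical independence of the two factors is the property cited from [42] and is not tested here.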
In the following, we consider another condition on channel matrices with a special structure. We can prove that if the channels can be decomposed into i.d. unitary matrices, the channel is equivalent to a degraded one.
Proof. We can form an equivalent channel as

where $Z_1 \sim \mathcal{CN}(0, \Sigma_1^{-1})$ and $Z_2 \sim \mathcal{CN}(0, \Sigma_2^{-1})$. After applying the eigenvalue decomposition, the covariance matrices of $Z_1$ and $Z_2$ can be expressed as $U_1 D_1 U_1^H$ and $U_2 D_2 U_2^H$, respectively, with $D_2 \succeq D_1$, by the monotonicity theorem (Theorem 8.4.9 in [44]). Now let $\bar{Z}_2 \sim \mathcal{N}(0, U_2 (D_2 - D_1) U_2^H)$, which is independent of $Z_1$ and $Z_2$. We can form another equivalent channel at Eve as

where in (a) $W \sim \mathcal{N}(0, I_{n_T})$ and in (b) $\hat{Z}_2 \sim \mathcal{N}(0, U_2 D_2 U_2^H)$, which has the same distribution as that of $Z_2$. From the above it is clear that $\tilde{Y}_2$ is stochastically degraded with respect to $Y_1$. Since $H_1$ is i.d., we know that $U_2 U_1^H H_1$ and $H_1$ have the same distribution. In addition, since $H_1$ and $H_2$ are i.d., we have $f_{\tilde{Y}_2|X} = f_{Y_2|X}$. By the same marginal property, we know that the equivalent channel $\tilde{Y}_2$ has the same capacity as the original one. Thus, we conclude that the original channel is equivalent to a degraded one with the same secrecy capacity.
Remark 12. The constraint $\Sigma_1 \succeq \Sigma_2$ in Theorem 8 may be relaxed to $\Sigma_1 \nsucceq \Sigma_2$ by the deterministic channel enhancement, which will be shown in the next subsection.

Alignment by Channel Enhancement
In this section, we discuss how to apply the channel enhancement argument [30], which was originally designed for channels with full CSIT, to the considered model where there is only statistical CSIT and the channels are fast fading.
The channel enhancement technique, invented in [30], is a critical technique to prove that Gaussian input is optimal for the MIMO Gaussian BC. Later, [1] applied this technique to wiretap channels. However, there are major differences in the use of channel enhancement. In the MIMO GBC, every sub-channel from the transmitter to each receiver is enhanced by reducing the corresponding noise covariance. In the GWTC, however, only those sub-channels of Bob that are weaker than those of Eve are enhanced to be the same as Eve's. Therefore, an equivalently degraded WTC can be formed. For more detailed discussions please see [1]. Note that in both [1,30], perfect CSIT is required.
For fading channels which are not isotropically distributed, we use the following example to show that it is still possible to use channel enhancement to attain the capacity region/secrecy capacity.

Example 2:
For the received signals assume that the fading channel H has realizations {H 0 , {AH 0 } : A ∈ U(n 2 )}, where U(n) is the unitary group with degree n.Then it can be easily seen that we can apply the channel enhancement to the pair of channel realizations (H 0 Σ

Other Stochastic Orders
In this section, we show an application of the Laplace transform order introduced in Definition 3 to solving the resource allocation problem for channels with multiple antennas to attain the capacity result. We consider the wiretap channel as an example. Note that the secrecy capacity of a multiple-antenna Gaussian WTC is a difference of two concave functions with respect to the channel input covariance matrix. Under the perfect CSIT assumption, the authors of [45] proved that the maximum secrecy capacity coincides with the saddle point of a min-max problem. This is obtained by considering a Sato-type outer bound setting, i.e., Bob additionally knows what Eve knows, together with an additional parameter, i.e., a correlation matrix between the noises at Bob and Eve. Based on the min-max description, [40] then developed an algorithm to solve this problem numerically. An analytical solution is still unknown. In contrast, with statistical CSIT, the optimal input distribution is open in general. In the following, we review the result of [13] that, under i.i.d. Rayleigh fading, the Laplace transform order combined with complete monotonicity can help us to prove that uniform power allocation is optimal for the multiple-input single-output GWTC with a single-antenna eavesdropper and statistical CSIT of both the legitimate and the eavesdropper's channels.
First of all, we want to find the optimal input covariance matrix $\Sigma_x$ by solving the arg max problem (32). After the derivation in [13], we can transform (32) into the following power allocation problem

$$\max_{D} \; E_g\left[\log\left(a + g^H D g\right)\right] - E_g\left[\log\left(1 + g^H D g\right)\right], \tag{33}$$

where we denote $\sigma_g^2/\sigma_h^2$ by $a$, which belongs to $[0, 1)$ since we only need to consider the case where $\sigma_h > \sigma_g$. From Section V in [46], the optimal power allocation $D$ satisfies $\mathrm{Tr}(D) = P$. We therefore consider any $D = \mathrm{diag}(d_1, d_2, \dots, d_{N_T})$ with $\sum_{i=1}^{N_T} d_i = P$ and $d_i \ge 0$, $\forall i$. Here we introduce some results from the stochastic ordering theory [15] to proceed.

Definition 5 ([15]). A function $\psi(x) : [0, \infty) \to \mathbb{R}$ is completely monotone if for all $x > 0$ and $n = 0, 1, 2, \dots$, its derivative $\psi^{(n)}$ exists and $(-1)^n \psi^{(n)}(x) \ge 0$.

Lemma 3 ([15]). Let $B_1$ and $B_2$ be two nonnegative random variables. If $B_1 \le_{LT} B_2$, then $E[f(B_1)] \le E[f(B_2)]$, where the first derivative of a differentiable function $f$ on $[0, \infty)$ is completely monotone, provided that the expectations exist.
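To solve (33), we let $B_1 = g^H D g$, $B_2 = g^H D^* g$ with $D^*$ the uniform power allocation, and $f(x) = \log(a + x) - \log(1 + x)$ to invoke Lemma 3. It can be easily verified that $\psi(x)$, defined as the first derivative of $f(x)$, is completely monotone by checking Definition 5. More specifically, since $\psi(x) = 1/(a + x) - 1/(1 + x)$, the $n$-th derivative of $\psi$ meets

$$(-1)^n \psi^{(n)}(x) = n! \left[ \frac{1}{(a + x)^{n+1}} - \frac{1}{(1 + x)^{n+1}} \right] \ge 0$$

when $x > 0$, since $a \in [0, 1)$. Now from Lemma 3 and Definition 3, we know that to prove that $D^*$ solves (33) it suffices to show $B_1 \le_{LT} B_2$, which leads to the expression in (35). To show that this expression is nonnegative, we resort to majorization theory. Note that $\sum_{k=1}^{N_T} \log(1 + \sigma_g^2 \check{d}_k s)$ is a Schur-concave function [39] in $(\check{d}_1, \dots, \check{d}_{N_T})$, $\forall s > 0$, and by the definition of majorization [39],

$$(P/N_T, \dots, P/N_T) \prec (d_1, \dots, d_{N_T}),$$

where $b \prec a$ means that $b$ is majorized by $a$. Thus, from [39], we know that the RHS of (35) is nonnegative, $\forall s > 0$. Then $D^*$ solves (33), i.e., $D^*$ is optimal. Note that $D^*$ is also the optimal input covariance matrix $\Sigma_x$ since the optimal beamformer $U$ can be selected as $I$.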
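The two ingredients of this argument can be checked numerically. The following Python sketch is an illustration under assumed parameter values ($N_T = 4$, $P = 10$, $\sigma_g^2 = 1$, $a = 0.5$), with function names introduced here only for this purpose. It evaluates the sign condition of Definition 5 for $\psi(x) = 1/(a + x) - 1/(1 + x)$ and compares the Schur-concave sum for the uniform allocation against randomly drawn feasible allocations. For $g$ with i.i.d. $\mathcal{CN}(0, \sigma_g^2)$ entries, this sum equals $-\log E[e^{-s g^H D g}]$.

```python
from math import factorial
import numpy as np

def signed_psi_derivative(n, x, a):
    """(-1)^n times the n-th derivative of psi(x) = 1/(a+x) - 1/(1+x)."""
    return factorial(n) * ((a + x) ** -(n + 1) - (1 + x) ** -(n + 1))

def neg_log_laplace(d, s, sigma_g2=1.0):
    """sum_k log(1 + sigma_g2 * d_k * s) for the allocation vector d."""
    d = np.asarray(d, dtype=float)
    return np.sum(np.log1p(sigma_g2 * d * s))

# Assumed illustrative parameters.
n_t, p, a = 4, 10.0, 0.5

# Complete monotonicity check on a grid of x > 0 for n = 0, ..., 5.
x_grid = np.linspace(0.01, 10.0, 200)
cm_ok = all(np.all(signed_psi_derivative(n, x_grid, a) >= 0) for n in range(6))

# Uniform allocation versus random feasible allocations with the same total power.
rng = np.random.default_rng(2)
uniform = np.full(n_t, p / n_t)
s_grid = np.logspace(-2, 2, 50)
gaps = []
for _ in range(100):
    d = rng.dirichlet(np.ones(n_t)) * p
    gaps.extend(neg_log_laplace(uniform, s) - neg_log_laplace(d, s) for s in s_grid)
min_gap = min(gaps)

print(cm_ok, min_gap >= -1e-12)
```

A nonnegative gap for every tested $s$ is consistent with the uniform allocation having the smallest Laplace transform among the feasible allocations, which is the ordering that Lemma 3 requires.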

Conclusions
In this paper, we investigated the ergodic capacity results of fast fading Gaussian memoryless multiuser channels when only the statistics of the channel state information are known at the transmitter. To achieve this goal, we resorted to classifying the random channels through their probability distributions, by which we are able to attain the capacity results. In particular, we derived sufficient conditions to attain some information theoretic channel orders such as degraded and very strong interference by combining the usual stochastic order and the same marginal property, such that the capacity regions can be simply characterized for Gaussian interference channels and Gaussian broadcast channels. An extension of the framework to channels with multiple antennas was also considered. Practical examples illustrated the application of the derived results.

Figure 1. Venn diagram of different stochastic orders including Laplace transform order, increasing concave order, concave order, and usual stochastic order.

Figure 3. The proposed scheme identifies the ergodic capacity regions under statistical CSIT.

Figure 4. Identification of (11) under different variances of the dedicated channels with $c = 1$ and $P_1 = P_2 = 1$.

Theorem 1 (Same Marginal Property for One-Transmitter (Theorem 13.9 in [19])). The capacity region of a multiuser channel with one transmitter and two non-cooperative receivers depends only on the conditional marginal distributions $P_{Y_1|X}$ and $P_{Y_2|X}$ and not on the joint conditional distribution $P_{Y_1,Y_2|X}$.