Article

Partially Observable Markov Decision Process-Based Transmission Policy over Ka-Band Channels for Space Information Networks

Communication Engineering Research Centre, Harbin Institute of Technology Shenzhen, HIT Campus of University Town of Shenzhen, Shenzhen 518055, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Entropy 2017, 19(10), 510; https://doi.org/10.3390/e19100510
Submission received: 24 July 2017 / Revised: 30 August 2017 / Accepted: 20 September 2017 / Published: 21 September 2017
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

The Ka-band and higher Q/V-band channels can provide appealing capacity for future deep-space communications and Space Information Networks (SIN), which are viewed as a primary solution to satisfy the increasing demand for high-data-rate services. However, the Ka-band channel is much more sensitive to weather conditions than conventional communication channels. Moreover, due to the huge distances and long propagation delays in SINs, the transmitter can only obtain delayed Channel State Information (CSI) from feedback. In this paper, the noise temperature of time-varying rain attenuation over Ka-band channels is modeled as a two-state Gilbert–Elliott channel, to capture a channel capacity that randomly switches between a good and a bad state. An optimal transmission scheme based on Partially Observable Markov Decision Processes (POMDP) is proposed, and the key thresholds for selecting the optimal transmission action in SIN communications are derived. Simulation results show that our proposed scheme can effectively improve the throughput.

1. Introduction

With the development of deep-space exploration missions and Space Information Network (SIN) applications, the Ka-band and higher Q/V-band channels are viewed as a primary solution to improve communication capacity [1]. Compared to the commonly used X-band, Ka-band can offer 50 times higher bandwidth [2,3]. The Mars Reconnaissance Orbiter (MRO) mission demonstrated the availability and feasibility of the Ka-band for future exploration missions [4,5].
However, the Ka-band channel is much more sensitive to the weather conditions surrounding the terrestrial stations, such as rainfall, which can significantly degrade the quality of service [6,7]. Furthermore, the space nodes in a SIN have only limited communication resources, so the optimal transmission policy should consider the trade-off between complexity and transmission performance [8,9]. Considering the huge distances and long propagation delays in SINs, the handshake process of the conventional Transmission Control Protocol/Internet Protocol (TCP/IP) is not suitable for space communication scenarios [10,11]. Instead, delay-tolerant network protocols such as the Consultative Committee for Space Data Systems File Delivery Protocol (CFDP) and the Licklider Transmission Protocol (LTP) are widely used in SIN communication scenarios [12,13,14], where the transmitter can obtain delayed Channel State Information (CSI) from Negative Acknowledgment (NACK) feedback [15,16].
In previous studies [17,18,19], the time-varying rain attenuation of the Ka-band channel is modeled as a two-state Gilbert–Elliott (GE) channel, and several works have focused on the optimal data transmission policy. In [20], three data transmission actions were proposed, to be chosen at the beginning of each time slot to maximize the expected long-term throughput.
For Mars-to-Earth communications over deep-space time-varying channels, an optimal data transmission policy with delayed feedback CSI was developed in [21]. Adaptive coding schemes for deep-space communications over the Ka-band channel were also studied in [22,23]. However, little work has been done on optimizing the transmission policy for SINs, especially in the presence of highly time-varying Ka-band channels.
In this paper, by utilizing the delayed feedback CSI, we propose an optimal transmission scheme based on a Partially Observable Markov Decision Process (POMDP), and derive the key thresholds for selecting the optimal transmission actions for SIN communications.
The rest of this paper is organized as follows. In Section 2, a two-state GE channel is modeled. In Section 3, we derive the threshold that determines whether channel sensing should be performed before transmission, as well as the thresholds for choosing data transmission actions from two or three actions in the POMDP. In Section 4, simulation results show that the proposed optimal transmission policy can increase the throughput in SIN communications. Finally, Section 5 concludes the paper.

2. System Model

According to previous studies [24,25,26], we can select an appropriate threshold $T_{th}$ of the noise temperature to capture a channel capacity that randomly switches between a good and a bad state. The time-varying rain attenuation of the Ka-band channel is then modeled as a two-state GE channel according to the noise temperature $T$.
If the noise temperature satisfies $T \le T_{th}$, the channel is in the good state, where the channel bit error rate (BER) is as low as $10^{-8}$–$10^{-5}$; if $T > T_{th}$, the channel is in the bad state, and the channel BER is as high as $10^{-4}$–$10^{-3}$. We denote the transition probability matrix $G$ of the two-state GE channel as
$$G = \begin{pmatrix} \Pr(g|g) & \Pr(b|g) \\ \Pr(g|b) & \Pr(b|b) \end{pmatrix} = \begin{pmatrix} \lambda_1 & 1-\lambda_1 \\ \lambda_0 & 1-\lambda_0 \end{pmatrix}, \tag{1}$$
where $\Pr(g|g) = \lambda_1$ is the probability that the Ka-band channel remains in the good state, and $\Pr(g|b) = \lambda_0$ is the probability that the channel state changes from bad to good. Without loss of generality, we assume $1 > \lambda_1 > \lambda_0 > 0$.
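To make the channel model concrete, the following minimal Python sketch (our illustration, not the authors' code; the function name is ours) samples a state sequence from the two-state GE channel of Equation (1):

```python
import numpy as np

def simulate_ge_channel(lambda_1, lambda_0, n_slots, seed=None, s0=1):
    """Sample a GE-channel state sequence; state 1 = good, state 0 = bad.

    Per Equation (1), Pr(next = good | good) = lambda_1 and
    Pr(next = good | bad) = lambda_0.
    """
    rng = np.random.default_rng(seed)
    states = np.empty(n_slots, dtype=int)
    s = s0
    for i in range(n_slots):
        p_good = lambda_1 if s == 1 else lambda_0
        s = int(rng.random() < p_good)
        states[i] = s
    return states

# Example with the parameters used in Section 4: lambda_1 = 0.9, lambda_0 = 0.2.
print(simulate_ge_channel(0.9, 0.2, n_slots=20, seed=1))
```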
The transmission time slots can be expressed as $W = \{w_1, w_2, \ldots, w_n\}$, the duration of a transmission time slot is a constant $D$, and the corresponding state sequence of the GE channel can be expressed as $S = \{s_1, s_2, \ldots, s_n\}$. The proposed POMDP-based transmission scheme with delayed CSI is shown in Figure 1.
The transmitter can thus obtain the delayed CSI through the belief probability $p$; e.g., if the previous state was the good state, the receiver feeds back a single bit of information equal to 1, and otherwise 0. Three transmission actions can therefore be chosen by the transmitter at the beginning of each transmission time slot $w_i$, and each action is explained in detail as follows.
Betting aggressively (action A): When the transmitter believes that the channel has a high chance of being in the good state, it decides to “gamble” and transmits a high number $R_g$ of data bits.
Betting conservatively (action C): When the transmitter believes the channel is in the bad state, it decides to “play safe” and transmits a low number $R_b$ of data bits.
Betting opportunistically (action O): For this action, the transmitter senses the channel state at the beginning of the slot by sending a control/probing bit. The cost of sensing is a fraction $\tau$ of the slot, which is the time spent sensing the channel, defined as $\tau = d_{RTT}/D$, where $d_{RTT}$ is the round-trip time and $D$ is the (constant) duration of a transmission time slot. The transmitter then selects the appropriate transmission action (A or C) according to the sensing outcome: $(1-\tau) R_g$ data bits will be sent if the channel was found to be in the good state, or $(1-\tau) R_b$ data bits otherwise.
Therefore, at the beginning of the $i$-th transmission time slot $w_i$, the transmitter needs to select an optimal action $a_i$ from the three actions above, i.e., $a_i \in \{A, C, O\}$, to maximize the expected throughput of our proposed POMDP-based transmission scheme. Because the transmitter can only obtain delayed feedback CSI, or even no feedback, this is a POMDP problem.
Let $X_i$ denote the channel belief, which is the conditional probability that the channel is in the good state at the beginning of the $i$-th transmission time slot, given the history $H_{i+t}$ of past actions and accumulated delayed CSI; thus $X_{i+t} = \Pr[s_{i+t} = 1 \mid H_{i+t}]$. Define a policy $\pi$ as a map from the belief at a particular time $t$ to an action in the action space. Using this belief as the decision variable, let $V_\beta^\pi(p)$ denote the expected reward with a discount factor $\beta$ ($0 \le \beta < 1$); the maximum expected throughput then has the following expression:
$$V_\beta^\pi(p) = \mathbb{E}\left[\sum_{t=0}^{\infty} \beta^t R(X_{i+t}, a_{i+t}) \,\Big|\, X_i = p\right], \tag{2}$$
where $X_i = p$ is the initial value of the belief at the $i$-th transmission time slot, and we formulate the optimization problem with $t = 0$ in the next section.
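As a small illustration of how this belief evolves (a sketch under our own naming, not the authors' code): with delayed one-bit CSI the belief entering a slot collapses to $\lambda_1$ or $\lambda_0$, and with no feedback it propagates through the Markov chain, as formalized in Section 3:

```python
def belief_update(p, lambda_1, lambda_0, feedback=None):
    """One-step update of the good-state belief probability.

    feedback = 1/0 is the delayed one-bit CSI on the previous slot's state;
    feedback = None means no feedback, so the belief is propagated through
    the chain: T(p) = lambda_0 * (1 - p) + lambda_1 * p.
    """
    if feedback == 1:
        return lambda_1          # previous slot reported good
    if feedback == 0:
        return lambda_0          # previous slot reported bad
    return lambda_0 * (1.0 - p) + lambda_1 * p
```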

3. Optimal Transmission Policy Based on POMDP

In this section, we derive the optimal policy for a transmitter that obtains the CSI feedback at the end of each slot, as shown in Figure 1. We first derive the necessary conditions that determine whether the action of betting opportunistically should be used under a given SIN communication scenario, and then derive the key thresholds for selecting the optimal transmission actions for SIN communications.
From the above discussion, we know that an optimal policy exists for our POMDP problem as formulated in Equation (2), with expected reward $R(X_i, a_i)$. If the aggressive action A is selected, since the probability that the channel is in the good state is $X_i$, the expected number of successfully transmitted data bits is $X_i R_g$; if the conservative action C is selected, $R_b$ data bits will be transmitted without error; finally, if the opportunistic action O is selected, the expected number of transmitted data bits is $(1-\tau)[(1-X_i) R_b + X_i R_g]$. We then define the value function $V_\beta(p) = \max_\pi V_\beta^\pi(p)$ for all $p \in [0,1]$, where $\pi$ denotes a map from the belief at a particular time to an action in the action space $\{A, C, O\}$. The value function $V_\beta(p)$ satisfies the Bellman equation
$$V_\beta(X_i) = \max_{a_i \in \{A, C, O\}} V_{\beta, a_i}(X_i), \tag{3}$$
where $V_{\beta, a_i}(X_i)$ is the value acquired by taking action $a_i$ when the belief is $X_i$. Using the delayed feedback belief probability $X_i = p_i$, the transmitter selects the optimal transmission action $a_i \in \{A, C, O\}$ to maximize the throughput, with the value of each action given by
$$V_{\beta, a_i}(X_i) = R(p_i, a_i) + \beta\, \mathbb{E}\big[V_\beta(X') \mid X_i = p_i, a_i\big], \tag{4}$$
where $X'$ is the channel belief at the beginning of the next time slot, and $X' = T(p) = \lambda_0 (1-p) + \lambda_1 p = \alpha p + \lambda_0$, in which $\alpha = \lambda_1 - \lambda_0$. Then $V_{\beta, a_i}(p_i)$ can be written out for the three possible actions:
(1)
Betting aggressively: If the aggressive action A is taken, the value function evolves as $V_{\beta, A}(X_i = p_i) = p_i R_g + \beta V_\beta(T(p_i))$;
(2)
Betting conservatively: If the conservative action C is selected, the value function evolves as $V_{\beta, C}(X_i = p_i) = R_b + \beta V_\beta(T(p_i))$;
(3)
Betting opportunistically: If the opportunistic action O is selected, the value function evolves as $V_{\beta, O}(X_i = p_i) = (1-\tau)[p_i R_g + (1-p_i) R_b] + \beta V_\beta(T(p_i))$.
Hence, if the feedback indicates that slot $w_i$ was in the good state, which happens with probability $p_i$, the channel belief for the next slot $w_{i+1}$ is $\lambda_1$; similarly, if the feedback indicates the bad state, which happens with probability $1-p_i$, the belief for $w_{i+1}$ is $\lambda_0$, i.e., $V_\beta(T(p_i)) = (1-p_i) V_\beta(\lambda_0) + p_i V_\beta(\lambda_1)$. We can then rewrite the above equations as follows:
$$\begin{aligned} V_{\beta, A}(X_i = p_i) &= p_i R_g + \beta\big(p_i V_\beta(\lambda_1) + (1-p_i) V_\beta(\lambda_0)\big), \\ V_{\beta, C}(X_i = p_i) &= R_b + \beta\big(p_i V_\beta(\lambda_1) + (1-p_i) V_\beta(\lambda_0)\big), \\ V_{\beta, O}(X_i = p_i) &= (1-\tau)[p_i R_g + (1-p_i) R_b] + \beta\big(p_i V_\beta(\lambda_1) + (1-p_i) V_\beta(\lambda_0)\big). \end{aligned} \tag{5}$$
Finally, the Bellman equation for our POMDP-based transmission policy over Ka-band channels for SINs can be expressed as
$$V_\beta(X_i = p_i) = \max_{a_i \in \{A, C, O\}} \big\{ V_{\beta, A}(p_i), V_{\beta, C}(p_i), V_{\beta, O}(p_i) \big\}. \tag{6}$$
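Since the belief entering each slot is either $\lambda_1$ or $\lambda_0$, Equation (6) can be solved numerically by a fixed-point iteration on the pair $(V_\beta(\lambda_0), V_\beta(\lambda_1))$. The following sketch (our illustration; names are ours) does exactly that:

```python
def solve_bellman(lam1, lam0, Rg, Rb, tau, beta, tol=1e-12, max_iter=100000):
    """Iterate Equations (5)-(6) to convergence; beta < 1 makes this a contraction.

    Returns (V_beta(lambda_0), V_beta(lambda_1)), which pin down the whole value
    function because the post-feedback belief is always lambda_0 or lambda_1.
    """
    V0 = V1 = 0.0
    for _ in range(max_iter):
        def V(p):
            cont = beta * (p * V1 + (1.0 - p) * V0)             # discounted future
            A = p * Rg + cont                                   # bet aggressively
            C = Rb + cont                                       # bet conservatively
            O = (1.0 - tau) * (p * Rg + (1.0 - p) * Rb) + cont  # sense first
            return max(A, C, O)
        V0_new, V1_new = V(lam0), V(lam1)
        if abs(V0_new - V0) + abs(V1_new - V1) < tol:
            break
        V0, V1 = V0_new, V1_new
    return V0, V1
```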
Moreover, Smallwood et al. [27] proved that $V_\beta(X_i)$ is convex and nondecreasing, and that there exist three thresholds $0 \le \rho_1 \le \rho_2 \le \rho_3 \le 1$. Accordingly, there are three types of threshold policies: (1) when $\rho_1 = \rho_2 = \rho_3$, the optimal policy is a one-threshold policy; (2) when $\rho_1 < \rho_2 = \rho_3$, the optimal policy is a two-thresholds policy; (3) when $\rho_1 < \rho_2 < \rho_3$, the optimal policy is a three-thresholds policy. The three-thresholds policy is illustrated in Figure 2, where the interval $[0,1]$ is separated into four regions by the thresholds $\rho_1$, $\rho_2$, and $\rho_3$.
Intuitively, one would think that there should exist only three regions: if $X_i$ is small, one should play safe; if $X_i$ is high, one should gamble; and somewhere in between, sensing is optimal. However, if the transmitter cannot obtain the feedback CSI, in some cases a three-thresholds policy is optimal; an example is given in [28].
However, with the help of the delayed feedback CSI, our POMDP-based transmission policy has only three regions, i.e., $(\rho_2, \rho_3) = \emptyset$. The necessary conditions that determine whether action O should be selected under a given SIN communication scenario, with data rates $R_g$ and $R_b$ in the good and bad states, respectively, two thresholds $\{\rho_1, \rho_2\}$, and sensing cost $\tau$, are given in the following theorem.
Theorem 1.
Consider the POMDP-based optimal transmission scheme constructed by the Bellman equation (6), where $X_i R_g$ is the expected return when the risky action A is taken, $R_b$ bits are transmitted regardless of the channel conditions when action C is selected, and the expected return when the sensing action O is taken is $(1-\tau)[(1-X_i) R_b + X_i R_g]$. Then:
If $R_b/R_g < (1-2\tau)/(1-\tau)$, the optimal policy is a two-thresholds $\{\rho_1, \rho_2\}$ policy, and the optimal action $a_i$ is selected from $\{A, C, O\}$;
Otherwise, if $R_b/R_g \ge (1-2\tau)/(1-\tau)$, the optimal policy is a one-threshold $\rho$ policy, and the optimal action $a_i$ is selected from $\{A, C\}$.
Proof of Theorem 1.
In our SIN POMDP-based transmission scheme, without loss of generality, assume that the optimal policy has two thresholds $0 < \rho_1 \le \rho_2 < 1$. Since $\rho_1$ is the solution of $V_{\beta,C}(X_i) = V_{\beta,O}(X_i)$ and $\rho_2$ is the solution of $V_{\beta,O}(X_i) = V_{\beta,A}(X_i)$, it is easy to establish that
$$V_{\beta,C}(\rho_1) = V_{\beta,O}(\rho_1), \qquad V_{\beta,A}(\rho_2) = V_{\beta,O}(\rho_2). \tag{7}$$
From Equation (5), we have
$$\rho_1 = \frac{\tau R_b}{(1-\tau)(R_g - R_b)}, \qquad \rho_2 = \frac{(1-\tau) R_b}{(1-\tau) R_b + \tau R_g}. \tag{8}$$
If the optimal policy has two thresholds, then $\rho_1 < \rho_2$, and the communication parameters must satisfy
$$\frac{R_b}{R_g} < \frac{1-2\tau}{1-\tau}. \tag{9}$$
Otherwise, if the optimal policy has one threshold $\rho = \rho_1 = \rho_2$, then the communication parameters satisfy
$$\frac{R_b}{R_g} \ge \frac{1-2\tau}{1-\tau}. \tag{10}$$
☐
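Theorem 1 is straightforward to apply in code; the following sketch (ours, not the authors' implementation) evaluates Equation (8) together with the policy-type test of Equations (9)–(10):

```python
def policy_thresholds(Rg, Rb, tau):
    """Thresholds of Equation (8) plus the Theorem 1 policy-type test."""
    rho1 = tau * Rb / ((1.0 - tau) * (Rg - Rb))
    rho2 = (1.0 - tau) * Rb / ((1.0 - tau) * Rb + tau * Rg)
    two_thresholds = Rb / Rg < (1.0 - 2.0 * tau) / (1.0 - tau)   # Equation (9)
    return rho1, rho2, two_thresholds
```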
Note that Theorem 1 establishes the structure that determines whether action O should be used: two types of threshold policies exist depending on the system parameters, in particular the sensing cost $\tau$ versus the ratio $R_b/R_g$, and the optimal policy space is partitioned into two regions, as illustrated in Figure 3.
Figure 3 shows that the two established optimal policy regions can each be further partitioned into at most three regions. As one should expect, the optimal transmission scheme here is a myopic policy that maximizes the immediate reward. Next, we detail the optimal transmission actions in the one-threshold policy region and the two-thresholds policy region of Figure 3, and give a complete characterization of the thresholds for each policy.
Assume that the one-threshold policy has one threshold $0 < \rho < 1$, and that the transition probability matrix of the two-state GE channel is $G = \begin{pmatrix} \lambda_1 & 1-\lambda_1 \\ \lambda_0 & 1-\lambda_0 \end{pmatrix}$. The optimal transmission action $a_i$ is then given by the following Theorem 2.
Theorem 2.
Let $a_i \in \{A, C\}$ denote the action space in the one-threshold policy region, and let $R_g$ and $R_b$ denote the transmitted numbers of data bits corresponding to actions A and C, respectively. Then $a_i$ is determined as follows:
(1)
If $R_b/R_g < \lambda_0$, the optimal transmission action is $a_i = A$, regardless of whether the delayed feedback CSI is $s_{i-1} = 1$ or $s_{i-1} = 0$;
(2)
If $R_b/R_g > \lambda_1$, the optimal transmission action is $a_i = C$, regardless of whether the delayed feedback CSI is $s_{i-1} = 1$ or $s_{i-1} = 0$;
(3)
Finally, if $\lambda_0 \le R_b/R_g \le \lambda_1$, the optimal transmission action is $a_i = A$ when the delayed feedback CSI is $s_{i-1} = 1$, and $a_i = C$ when the delayed feedback CSI is $s_{i-1} = 0$.
Proof of Theorem 2.
Recall that in our POMDP model, any general value function $V_\beta(\cdot)$ is convex.
Hence, (1) if $R_b/R_g < \lambda_0$: when the delayed feedback CSI is $s_{i-1} = 1$ and the channel belief is $X_i = \lambda_1$, we have $V_{\beta,A}(X_i = \lambda_1) > V_{\beta,C}(X_i = \lambda_1)$, since
$$\begin{aligned} V_{\beta,A}(X_i = \lambda_1) &= \lambda_1 R_g + \beta\big(\lambda_1 V_\beta(\lambda_1) + (1-\lambda_1) V_\beta(\lambda_0)\big), \\ V_{\beta,C}(X_i = \lambda_1) &= R_b + \beta\big(\lambda_1 V_\beta(\lambda_1) + (1-\lambda_1) V_\beta(\lambda_0)\big), \end{aligned} \tag{11}$$
and $\lambda_1 > \lambda_0 > R_b/R_g$. Similarly, when the delayed feedback CSI is $s_{i-1} = 0$, we still have $V_{\beta,A}(X_i = \lambda_0) > V_{\beta,C}(X_i = \lambda_0)$, since
$$\begin{aligned} V_{\beta,A}(X_i = \lambda_0) &= \lambda_0 R_g + \beta\big(\lambda_0 V_\beta(\lambda_1) + (1-\lambda_0) V_\beta(\lambda_0)\big), \\ V_{\beta,C}(X_i = \lambda_0) &= R_b + \beta\big(\lambda_0 V_\beta(\lambda_1) + (1-\lambda_0) V_\beta(\lambda_0)\big). \end{aligned} \tag{12}$$
Hence, the optimal transmission action in this case is $a_i = A$.
(2) If $R_b/R_g > \lambda_1$: similar to the previous case, regardless of whether the delayed feedback CSI is $s_{i-1} = 1$ or $s_{i-1} = 0$ (i.e., the channel belief is $X_i = \lambda_1$ or $X_i = \lambda_0$, respectively), substituting $p_i$ directly into Equations (11) and (12) gives $V_{\beta,A}(p_i) < V_{\beta,C}(p_i)$. Therefore, the action $a_i = C$ is optimal in this case.
(3) If $\lambda_0 \le R_b/R_g \le \lambda_1$: the approach is similar to the previous cases. When the delayed feedback CSI is $s_{i-1} = 1$ and $p_i = \lambda_1$, substituting $p_i = \lambda_1$ into Equation (11) yields $V_{\beta,A}(X_i = \lambda_1) > V_{\beta,C}(X_i = \lambda_1)$, so the action $a_i = A$ is the optimal strategy here; otherwise, when the delayed feedback CSI is $s_{i-1} = 0$ and $p_i = \lambda_0$, substituting $p_i = \lambda_0$ into Equation (12) yields $V_{\beta,A}(X_i = \lambda_0) < V_{\beta,C}(X_i = \lambda_0)$, and the optimal transmission action in this case is $a_i = C$. ☐
The complete characterization of the optimal transmission action of the one-threshold policy is given in Table 1.
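Table 1 amounts to a few comparisons; a minimal sketch of the resulting decision rule (our naming, following Theorem 2) is:

```python
def one_threshold_action(Rb, Rg, lam1, lam0, s_prev):
    """Optimal action in the one-threshold region (Theorem 2 / Table 1).

    s_prev is the delayed one-bit CSI for the previous slot (1 = good, 0 = bad).
    """
    r = Rb / Rg
    if r < lam0:
        return "A"                       # always bet aggressively
    if r > lam1:
        return "C"                       # always bet conservatively
    return "A" if s_prev == 1 else "C"   # follow the delayed CSI
```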
Furthermore, it is worth noting that Theorem 2 shows that the value function is completely determined by $V_\beta(\lambda_1)$ and $V_\beta(\lambda_0)$. To calculate the expected reward of the optimal action when the belief is $\lambda_1$ or $\lambda_0$, we start by comparing the value functions established in Theorem 2 to the threshold $\rho$ of Theorem 1. All that then remains is solving a system of two linear equations in the two unknowns $V_\beta(\lambda_1)$ and $V_\beta(\lambda_0)$, in three cases: $\rho < \lambda_0$, $\lambda_0 \le \rho \le \lambda_1$, and $\lambda_1 < \rho$.
To illustrate the procedure of determining $V_\beta(\lambda_1)$ and $V_\beta(\lambda_0)$, consider the case $\rho < \lambda_0$, where the optimal transmission action is $a_i = A$ regardless of the delayed feedback CSI, as proved in Theorem 2; we then have
$$V_\beta(\lambda_0) = V_{\beta,A}(X_i = \lambda_0) = \lambda_0 R_g + \beta\big(\lambda_0 V_\beta(\lambda_1) + (1-\lambda_0) V_\beta(\lambda_0)\big), \tag{13}$$
$$V_\beta(\lambda_1) = V_{\beta,A}(X_i = \lambda_1) = \lambda_1 R_g + \beta\big(\lambda_1 V_\beta(\lambda_1) + (1-\lambda_1) V_\beta(\lambda_0)\big). \tag{14}$$
Recall that $\alpha = \lambda_1 - \lambda_0$; solving for $V_\beta(\lambda_1)$ and $V_\beta(\lambda_0)$ leads to
$$V_\beta(\lambda_0) = \frac{\lambda_0 R_g}{(1-\beta)(1-\alpha\beta)}, \tag{15}$$
$$V_\beta(\lambda_1) = \frac{(\lambda_1 - \alpha\beta) R_g}{(1-\beta)(1-\alpha\beta)}. \tag{16}$$
All other cases can be solved similarly, and the closed-form expressions of the one-threshold policy are given in Table 2.
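The closed forms can be checked numerically by solving the corresponding 2x2 linear system. Here is a sketch for the $\rho < \lambda_0$ case (the parameter values are arbitrary and chosen only to verify the algebra of Equations (13)–(16)):

```python
import numpy as np

lam1, lam0, Rg, beta = 0.9, 0.2, 2.0, 0.5
alpha = lam1 - lam0

# Equations (13)-(14) rearranged as a linear system in (V(lam0), V(lam1)):
#   (1 - beta*(1-lam0)) V0 - beta*lam0 V1 = lam0 * Rg
#   -beta*(1-lam1) V0 + (1 - beta*lam1) V1 = lam1 * Rg
M = np.array([[1 - beta * (1 - lam0), -beta * lam0],
              [-beta * (1 - lam1), 1 - beta * lam1]])
rhs = np.array([lam0 * Rg, lam1 * Rg])
V0, V1 = np.linalg.solve(M, rhs)

# Compare with the closed forms of Equations (15)-(16).
den = (1 - beta) * (1 - alpha * beta)
print(np.allclose([V0, V1], [lam0 * Rg / den, (lam1 - alpha * beta) * Rg / den]))
```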
Next, assume that in the two-thresholds policy region the optimal policy has two thresholds $0 < \rho_1 < \rho_2 < 1$, and that the transition probability matrix is $G = \begin{pmatrix} \lambda_1 & 1-\lambda_1 \\ \lambda_0 & 1-\lambda_0 \end{pmatrix}$. The optimal transmission action $a_i$ is then given by the following Theorem 3.
Theorem 3.
Let $a_i \in \{A, C, O\}$ denote the action space in the two-thresholds policy region, and let $R_g$ and $R_b$ denote the transmitted numbers of data bits corresponding to actions A and C, respectively. Recall that the sensing cost $\tau$ is the ratio of the round-trip time $d_{RTT}$ to the time slot duration $D$ and, by Theorem 1, satisfies $\tau < (1 - R_b/R_g)/(2 - R_b/R_g)$, and that $X_i$ is the channel belief. Then $a_i$ is determined as follows:
(1)
If $R_b/R_g < \lambda_0$, two cases can be distinguished: if $R_b/R_g < \tau X_i / ((1-\tau)(1-X_i))$, the optimal transmission action is $a_i = A$, regardless of whether the delayed feedback CSI is $s_{i-1} = 1$ or $s_{i-1} = 0$; else, if $R_b/R_g \ge \tau X_i / ((1-\tau)(1-X_i))$, the optimal transmission action is $a_i = O$, regardless of whether the delayed feedback CSI is $s_{i-1} = 1$ or $s_{i-1} = 0$.
(2)
If $R_b/R_g > \lambda_1$, two cases can be distinguished: if $R_b/R_g < (1-\tau) X_i / (\tau + X_i - \tau X_i)$, the optimal transmission action is $a_i = O$, regardless of whether the delayed feedback CSI is $s_{i-1} = 1$ or $s_{i-1} = 0$; else, if $R_b/R_g \ge (1-\tau) X_i / (\tau + X_i - \tau X_i)$, the optimal transmission action is $a_i = C$, regardless of whether the delayed feedback CSI is $s_{i-1} = 1$ or $s_{i-1} = 0$.
(3)
Finally, if $\lambda_0 \le R_b/R_g \le \lambda_1$: when the delayed feedback CSI is $s_{i-1} = 1$ and $X_i = \lambda_1$, the optimal transmission action is $a_i = A$; when the delayed feedback CSI is $s_{i-1} = 0$ and $X_i = \lambda_0$, the optimal transmission action is $a_i = C$.
Proof of Theorem 3.
The proof utilizes the cases established in Theorem 1. Recall that in our POMDP model, any general value function $V_\beta(\cdot)$ is convex, and the interval $[0,1]$ of $R_b/R_g$ is separated into three regions by the thresholds $\rho_1$ and $\rho_2$. All six possible optimal policy structures of the two-thresholds policy are illustrated in Figure 4, and we can distinguish three possible scenarios.
(1) If $R_b/R_g < \lambda_0$, we can distinguish three subcases:
If $\rho_2 < \lambda_0$, as shown in Figure 4a, the optimal action is $a_i = A$ for $s_{i-1} = 0/1$, since $V_{\beta,A}(X_i = \lambda_0) > V_{\beta,O}(X_i = \lambda_0)$ and $V_{\beta,A}(X_i = \lambda_1) > V_{\beta,O}(X_i = \lambda_1)$, where
$$\begin{aligned} V_{\beta,A}(X_i = p_i) &= p_i R_g + \beta\big(p_i V_\beta(\lambda_1) + (1-p_i) V_\beta(\lambda_0)\big), \\ V_{\beta,O}(X_i = p_i) &= (1-\tau)[p_i R_g + (1-p_i) R_b] + \beta\big(p_i V_\beta(\lambda_1) + (1-p_i) V_\beta(\lambda_0)\big). \end{aligned} \tag{17}$$
Hence, the optimal transmission action in this case is $a_i = A$, which holds for $R_b/R_g < \tau \lambda_0 / ((1-\tau)(1-\lambda_0))$ and $R_b/R_g < \tau \lambda_1 / ((1-\tau)(1-\lambda_1))$.
Else, if $\rho_1 < \lambda_0 < \rho_2 < \lambda_1$, as illustrated in Figure 4b, we have $a_i = A$ for $s_{i-1} = 1$, where $V_{\beta,A}(X_i = \lambda_1) > V_{\beta,O}(X_i = \lambda_1)$ as in the subcase above, and $a_i = O$ is the optimal action for $s_{i-1} = 0$, where $V_{\beta,O}(X_i = \lambda_0) > V_{\beta,A}(X_i = \lambda_0)$, which gives $R_b/R_g \ge \tau \lambda_0 / ((1-\tau)(1-\lambda_0))$ by substituting $X_i = \lambda_0$ into Equation (17).
Lastly, if $\rho_1 < \lambda_0 < \lambda_1 < \rho_2$, as shown in Figure 4c, $a_i = O$ for $s_{i-1} = 0/1$ is the optimal action, since $V_{\beta,A}(X_i = \lambda_0) < V_{\beta,O}(X_i = \lambda_0)$ and $V_{\beta,A}(X_i = \lambda_1) < V_{\beta,O}(X_i = \lambda_1)$ by substituting $X_i = \lambda_0$ and $X_i = \lambda_1$ into Equation (17), which gives $R_b/R_g \ge \tau \lambda_1 / ((1-\tau)(1-\lambda_1))$ and hence also $R_b/R_g \ge \tau \lambda_0 / ((1-\tau)(1-\lambda_0))$.
(2) If $R_b/R_g > \lambda_1$, three subcases can similarly be distinguished:
If $\rho_1 < \lambda_0 < \lambda_1 < \rho_2$, as shown in Figure 4c, then $V_{\beta,C}(X_i = \lambda_0) < V_{\beta,O}(X_i = \lambda_0)$ and $V_{\beta,C}(X_i = \lambda_1) < V_{\beta,O}(X_i = \lambda_1)$, where
$$\begin{aligned} V_{\beta,O}(X_i = p_i) &= (1-\tau)[p_i R_g + (1-p_i) R_b] + \beta\big(p_i V_\beta(\lambda_1) + (1-p_i) V_\beta(\lambda_0)\big), \\ V_{\beta,C}(X_i = p_i) &= R_b + \beta\big(p_i V_\beta(\lambda_1) + (1-p_i) V_\beta(\lambda_0)\big), \end{aligned} \tag{18}$$
so $a_i = O$ is the optimal action, since $R_b/R_g < (1-\tau)\lambda_1 / (\tau + \lambda_1 - \tau\lambda_1)$ and $R_b/R_g < (1-\tau)\lambda_0 / (\tau + \lambda_0 - \tau\lambda_0)$ by solving the value functions in Equation (18).
Else, if $\lambda_0 < \rho_1 < \lambda_1 < \rho_2$, as shown in Figure 4e, we have $a_i = C$ for $s_{i-1} = 0$, where $V_{\beta,O}(X_i = \lambda_0) < V_{\beta,C}(X_i = \lambda_0)$ for $R_b/R_g \ge (1-\tau)\lambda_0 / (\tau + \lambda_0 - \tau\lambda_0)$ by substituting $X_i = \lambda_0$ into Equation (18), and $a_i = O$ is the optimal action for $s_{i-1} = 1$, where $V_{\beta,O}(X_i = \lambda_1) > V_{\beta,C}(X_i = \lambda_1)$ for $R_b/R_g < (1-\tau)\lambda_1 / (\tau + \lambda_1 - \tau\lambda_1)$ by substituting $X_i = \lambda_1$ into Equation (18).
Lastly, if $\lambda_1 < \rho_1$, as shown in Figure 4f, $a_i = C$ is the optimal action regardless of whether the delayed feedback CSI is $s_{i-1} = 0$ or $s_{i-1} = 1$, since $V_{\beta,C}(X_i = \lambda_0) > V_{\beta,O}(X_i = \lambda_0)$ and $V_{\beta,C}(X_i = \lambda_1) > V_{\beta,O}(X_i = \lambda_1)$; then $R_b/R_g \ge (1-\tau)\lambda_0 / (\tau + \lambda_0 - \tau\lambda_0)$ and $R_b/R_g \ge (1-\tau)\lambda_1 / (\tau + \lambda_1 - \tau\lambda_1)$ by substituting $X_i = \lambda_0$ and $X_i = \lambda_1$ into Equation (18), respectively.
(3) Finally, if $\lambda_0 \le R_b/R_g \le \lambda_1$, the computation is similar to the previous cases. For $\lambda_0 < \rho_1 < \rho_2 < \lambda_1$, as shown in Figure 4d, the optimal action is $a_i = A$ for $s_{i-1} = 1$, by using Equation (17) to solve $V_{\beta,A}(X_i = \lambda_1) > V_{\beta,O}(X_i = \lambda_1)$ with $X_i = \lambda_1$, which gives $R_b/R_g < \tau \lambda_1 / ((1-\tau)(1-\lambda_1))$. In addition, if $s_{i-1} = 0$, then $a_i = C$ is the optimal action, by solving Equation (18) with $X_i = \lambda_0$ for $V_{\beta,C}(X_i = \lambda_0) > V_{\beta,O}(X_i = \lambda_0)$, which gives $R_b/R_g \ge (1-\tau)\lambda_0 / (\tau + \lambda_0 - \tau\lambda_0)$. ☐
Let $A(X_i) = \tau X_i / ((1-\tau)(1-X_i))$ and $C(X_i) = (1-\tau) X_i / (\tau + X_i - \tau X_i)$; Table 3 then gives the complete characterization of the optimal transmission action in the two-thresholds policy region.
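With $A(\cdot)$ and $C(\cdot)$ defined, Table 3 also reduces to a short decision rule; a sketch (our naming, following Theorem 3):

```python
def two_threshold_action(Rb, Rg, lam1, lam0, tau, s_prev):
    """Optimal action in the two-thresholds region (Theorem 3 / Table 3)."""
    r = Rb / Rg
    X = lam1 if s_prev == 1 else lam0                  # belief from delayed CSI
    A_X = tau * X / ((1.0 - tau) * (1.0 - X))          # A(X_i)
    C_X = (1.0 - tau) * X / (tau + X - tau * X)        # C(X_i)
    if r < lam0:
        return "A" if r < A_X else "O"
    if r > lam1:
        return "O" if r < C_X else "C"
    return "A" if s_prev == 1 else "C"                 # lam0 <= r <= lam1
```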
Similarly to the previous case, Table 4 lists the mathematical expressions for $V_\beta(\lambda_1)$ and $V_\beta(\lambda_0)$ of Theorem 3, which are used to calculate the expected reward of the value function of the corresponding optimal action given in Theorem 3. Again, once $V_\beta(\lambda_1)$ and $V_\beta(\lambda_0)$ have been computed for the six possible optimal policy structures of the two-thresholds policy region in Theorem 3, we retain the scenario that gives the maximal values.
The procedure to calculate $V_\beta(\lambda_1)$ and $V_\beta(\lambda_0)$ starts by comparing the value functions established in Theorem 3 to the thresholds $\rho_1$ and $\rho_2$ shown in Figure 4; all cases can be solved similarly to the previous example, and the closed-form expressions of the two-thresholds policy are given in Table 4.

4. Simulation and Results

To evaluate our proposed POMDP-based optimal transmission policy, we start by comparing the transmission actions under different setups, each leading to a different optimal policy. We choose the parameters below to illustrate that, in theory, the optimal policy is determined by the communication scenario parameters, such as the data rates $R_b$, $R_g$ and the sensing cost $\tau$, which is affected by the round-trip time and the duration of the transmission time slot, as in Theorem 1.
The first set of parameters considered is $\tau = 0.4$, $R_g = 2$, $R_b = 1$, $\lambda_1 = 0.9$, $\lambda_0 = 0.2$, and $\beta = 0.5$. Note that, from Theorem 1, $\tau = 0.4 > 1/3$ means that the action of betting opportunistically cannot be used under this scenario, so the one-threshold policy is optimal, as the numerical result in Figure 5a shows. Furthermore, from Theorem 2, the threshold in Figure 5a is $\rho = 0.5$: if $p_i < \rho$, the optimal action is $a_i = C$; if $p_i \ge \rho$, the optimal action is $a_i = A$; and betting opportunistically is infeasible in this scenario.
If we keep all the other parameter values fixed and reduce the cost of sensing to $\tau = 0.15$, then from Theorem 1 the two-thresholds policy is optimal, as shown in Figure 5b. From Figure 5b we can see that the one-threshold policy gives suboptimal values; the two thresholds in this scenario are $\rho_1 = 0.176$ and $\rho_2 = 0.739$ by Equation (8). If $p_i < 0.176$, the optimal transmission action is betting conservatively ($a_i = C$); if $0.176 \le p_i \le 0.739$, the optimal transmission action is betting opportunistically ($a_i = O$), which achieves a better reward than $a_i = C$ or $a_i = A$; and if $p_i > 0.739$, the optimal action is betting aggressively ($a_i = A$).
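These threshold values can be reproduced directly from Theorem 1 and Equation (8) (a quick numerical check, using our own snippet):

```python
Rg, Rb = 2.0, 1.0

# Figure 5a scenario: tau = 0.4 fails the test of Equation (9), so the policy
# has a single threshold, rho = Rb/Rg = 0.5 (where V_A and V_C of Eq. (5) cross).
tau = 0.4
print(Rb / Rg < (1 - 2 * tau) / (1 - tau))             # False -> one threshold

# Figure 5b scenario: tau = 0.15 passes the test, and Equation (8) gives
# rho1 = 0.176 and rho2 = 0.739.
tau = 0.15
print(tau * Rb / ((1 - tau) * (Rg - Rb)))              # 0.17647...
print((1 - tau) * Rb / ((1 - tau) * Rb + tau * Rg))    # 0.73913...
```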
Next, we compare the long-term expected throughput of our adaptive data transmission scheme with conventional fixed-rate schemes, to validate the optimality of our POMDP-based transmission policy over Ka-band channels for SIN communications. Let $w$ denote the number of transmission time slots in $W = \{w_1, w_2, \ldots, w_n\}$, and let $V = \sum_{i=1}^{w} V_\beta(X_i)$ denote the accumulated expected value shown in Figure 6.
The system parameters in these scenarios are as follows: $R_g = 2$, $R_b = 1$, $\lambda_1 = 0.9$, $\lambda_0 = 0.2$, and $\beta = 0.99$. With these parameters, the one-threshold policy is optimal for $\tau \in (0.333, 1]$; below this critical value, the two-thresholds policy becomes optimal. We set $\tau_1 = 0.4$ in Figure 6a and $\tau_2 = 0.1$ in Figure 6b. As expected, during the first several transmission time slots, betting conservatively achieves the same throughput as the adaptive transmission scheme, as illustrated in Figure 6a. However, the adaptive transmission scheme better utilizes the channel capacity when the Ka-band channel turns into the good state, which leads to a higher throughput in the long term. On the other hand, when the two-thresholds policy is optimal, as in Figure 6b, betting opportunistically also performs well, but a gap remains compared to our adaptive transmission scheme, because the reward from “gambling” or “playing safe” is sometimes better than sensing the channel.
So far, we have demonstrated that our POMDP-based transmission policy performs well under different communication setups. In the following, we simulate and compare the adaptive transmission schemes under Ka-band channel communications.
Assume the noise temperature threshold of the two-state GE channel is $T_{th} = 20$ K; the corresponding transition probability matrix of the GE channel is then $G = \begin{pmatrix} 0.9773 & 0.0227 \\ 0.1667 & 0.8333 \end{pmatrix}$, according to [24]. If the channel state is bad, the channel bit error rate (BER) is $10^{-3}$, and we select four different BERs, $10^{-8}$, $10^{-7}$, $10^{-6}$, and $10^{-5}$, for the good state. Assuming the normalized number of data bits is $R_g = 1$ when the BER is $10^{-8}$, we can calculate the data bits $R_b$ and $R_g$ for the other BER values according to the error function [24]. We simulate the adaptive transmission schemes in two cases, Earth-to-Moon ($\tau = 0.03$) and Earth-to-Mars ($\tau = 1$); the transmission schemes are as follows (a simulation sketch follows the list).
Case 1: The transmitter adopts only the action of betting conservatively, regardless of the channel state, which ensures that $R_b$ data bits are successfully transmitted.
Case 2: The transmitter adopts only the action of betting aggressively: if the channel state is good, $R_g$ bits are successfully transmitted; otherwise, all data bits are lost.
Case 3: The transmitter adopts only the action of betting opportunistically: if the channel state is good, $(1-\tau) R_g$ bits can be received; otherwise, $(1-\tau) R_b$ bits can be received successfully.
Case 4: The transmitter chooses the optimal action by using the delayed feedback CSI and Theorem 2; the adaptive data transmission action space is $a \in \{A, C\}$.
Case 5: The transmitter chooses the optimal action by using the delayed feedback CSI and Theorem 3; the adaptive data transmission action space is $a \in \{A, O, C\}$.
Case 6: We directly give the outage capacity bounds of the corresponding channels.
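As a rough illustration of Cases 1–5, here is a Monte Carlo sketch of our own (the value of $R_b$ below is a placeholder, since the paper derives $R_b$ and $R_g$ from the BER via the error function):

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam0 = 0.9773, 0.1667      # GE transition probabilities from Section 4
Rg, Rb, tau = 1.0, 0.5, 0.03     # Moon-to-Earth sensing cost; Rb is a placeholder
n = 100000

# Sample the channel; the transmitter sees the state one slot late (delayed CSI).
s = np.empty(n, dtype=int)
cur = 1
for i in range(n):
    cur = int(rng.random() < (lam1 if cur else lam0))
    s[i] = cur

def throughput(policy):
    """Total bits delivered when 'policy' maps the delayed CSI to an action."""
    total = 0.0
    for i in range(1, n):
        a = policy(s[i - 1])
        if a == "C":
            total += Rb                              # always delivered
        elif a == "A":
            total += Rg if s[i] else 0.0             # lost in the bad state
        else:                                        # "O": sense, then transmit
            total += (1 - tau) * (Rg if s[i] else Rb)
    return total

def theorem3_policy(f):
    r, X = Rb / Rg, (lam1 if f else lam0)
    if r < lam0:
        return "A" if r < tau * X / ((1 - tau) * (1 - X)) else "O"
    if r > lam1:
        return "O" if r < (1 - tau) * X / (tau + X - tau * X) else "C"
    return "A" if f else "C"

print(throughput(lambda f: "C"))                     # Case 1
print(throughput(lambda f: "A"))                     # Case 2
print(throughput(lambda f: "O"))                     # Case 3
print(throughput(lambda f: "A" if f else "C"))       # Case 4 (Theorem 2)
print(throughput(theorem3_policy))                   # Case 5 (Theorem 3)
```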
The simulated throughput performance of the five transmission schemes above and the capacity bounds are shown in Figure 7. We can see that if the right transmission scheme for the given channel conditions is selected by using our derived thresholds, the throughput of Ka-band SIN communications can be increased.
Based on the previous analysis, we can expect the two-thresholds policy to be optimal for the Moon-to-Earth scenario in Figure 7a, since the sensing cost is $\tau = 0.03$ and the transmitter can therefore access the sensing action at little cost. As can be seen in Figure 7a, the total number of transmitted bits of betting opportunistically is substantially increased, and the two-thresholds policy transmission scheme performs close to the capacity bounds.
On the other hand, the round-trip time between Mars and Earth is about 6–40 min, which leads to the one-threshold policy being optimal with $\tau = 1$; betting opportunistically is completely infeasible, as shown in Figure 7b, since no data bits can be transmitted if the transmitter performs channel sensing. The two-thresholds policy transmission scheme degenerates in this scenario to the one-threshold policy transmission scheme, and both transmission schemes deliver exactly the same expected total number of transmitted bits.

5. Conclusions

In this paper, considering the potential of Ka-band high-throughput satellite applications in SINs, we reviewed the rain attenuation over Ka-band channels and modeled it as a two-state Gilbert–Elliott channel for SINs. We then proposed an optimal transmission scheme based on POMDP using the delayed feedback CSI, and derived the thresholds that determine whether channel sensing should be performed at the beginning of each transmission time slot over Ka-band channels for SINs. We also derived the thresholds for choosing data transmission actions from two or three actions in the POMDP. Simulation results show that the proposed optimal transmission policy can increase the throughput in SIN communications.

Acknowledgments

This work was supported in part by the National Natural Sciences Foundation of China (NSFC) under Grant 61771158, 61701136, 61525103 and 61371102, the National High Technology Research & Development Program No. 2014AA01A704, the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under Grant HIT. NSRIF. 2017051, and the Shenzhen Fundamental Research Project under Grant JCYJ20160328163327348 and JCYJ20150930150304185.

Author Contributions

Jian Jiao proposed the main idea and designed the POMDP-based transmission policy. Jian Jiao and Xindong Sui performed the analysis and prepared the manuscript. All authors participated in writing the manuscript, and all authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Jia, M.; Gu, X.; Guo, Q. Broadband hybrid satellite-terrestrial communication systems based on cognitive radio toward 5G. IEEE Wirel. Commun. 2016, 23, 96–106. [Google Scholar] [CrossRef]
  2. Panagopoulos, A.D.; Arapoglou, P.D.M.; Cottis, P.G. Satellite communications at Ku, Ka, and V bands: Propagation impairments and mitigation techniques. IEEE Commun. Surv. Tutor. 2004, 6, 12–14. [Google Scholar] [CrossRef]
  3. Yang, Z.; Li, H.; Jiao, J.; Zhang, Q.Y.; Wang, R. CFDP-based two-hop relaying protocol over weather-dependent Ka-band space channel. IEEE Trans. Aerosp. Electron. Syst. 2015, 51, 1357–1374. [Google Scholar] [CrossRef]
  4. Bell, D.; Allen, S.; Chamberlain, N. MRO relay telecom support of Mars Science Laboratory surface operations. In Proceedings of the IEEE Aerospace Conference, Big Sky, MT, USA, 1–8 March 2014. [Google Scholar]
  5. Craig, D.; Herrmann, N.; Troutman, P. The evolvable Mars campaign-study status. In Proceedings of the IEEE Aerospace Conference, Big Sky, MT, USA, 7–14 March 2015. [Google Scholar]
  6. Adams, N.; Copeland, D.; Mick, A. Optimization of deep-space Ka-band link schedules. In Proceedings of the IEEE Aerospace Conference, Big Sky, MT, USA, 1–8 March 2014. [Google Scholar]
  7. Maleki, S.; Chatzinotas, S.; Evans, B. Cognitive spectrum utilization in Ka band multibeam satellite communications. IEEE Commun. Mag. 2015, 53, 24–29. [Google Scholar] [CrossRef]
  8. Jiao, J.; Yang, Y.; Feng, B.; Wu, S.; Li, Y.; Zhang, Q. Distributed rateless codes with unequal error protection property for space information networks. Entropy 2017, 19, 38. [Google Scholar] [CrossRef]
  9. Wang, Y.; Jiao, J.; Sui, X.D.; Wu, S.H.; Li, Y.H.; Zhang, Q.Y. Rateless coding scheme for time-varying dying channels. In Proceedings of the 8th Wireless Communications & Signal Processing, Yangzhou, China, 13–15 October 2016. [Google Scholar]
  10. Jiao, J.; Guo, Q.; Zhang, Q.Y. Packets interleaving CCSDS file delivery protocol in deep space communication. IEEE Aerosp. Electron. Syst. Mag. 2011, 26, 5–11. [Google Scholar] [CrossRef]
  11. Shi, L.; Jiao, J.; Sabbagh, A.; Wang, R. Integration of Reed-Solomon codes to Licklider transmission protocol (LTP) for space DTN. IEEE Aerosp. Electron. Syst. Mag. 2017, 32, 48–55. [Google Scholar] [CrossRef]
  12. Yu, Q.; Wang, R.H.; Zhao, K.L.; Li, W.F.; Sun, X.; Hu, J.L.; Ji, X.Y. Modeling RTT for DTN protocol over asymmetric cislunar space channels. IEEE Syst. J. 2016, 10, 556–567. [Google Scholar] [CrossRef]
  13. Wang, R.; Qiu, M.; Zhao, K.; Qian, Y. Optimal RTO timer for best transmission efficiency of DTN protocol in deep-space vehicle communications. IEEE Trans. Veh. Technol. 2017, 66, 2536–2550. [Google Scholar] [CrossRef]
  14. Wang, G.; Burleigh, S.; Wang, R.; Shi, L.; Qian, Y. Scoping contact graph-routing scalability: Investigating the system’s usability in space-vehicle communication networks. IEEE Veh. Technol. Mag. 2016, 11, 46–52. [Google Scholar] [CrossRef]
  15. Wu, H.; Li, Y.; Jiao, J.; Cao, B.; Zhang, Q. LTP asynchronous accelerated retransmission strategy for deep space communications. In Proceedings of the IEEE International Conference on Wireless for Space and Extreme Environments (WiSEE), Aachen, Germany, 26–28 September 2016. [Google Scholar]
  16. Gu, S.; Jiao, J.; Yang, Z.; Zhang, Q.; Wang, Y. RCLTP: A rateless coding-based Licklider transmission protocol in space delay/disrupt tolerant network. In Proceedings of the Wireless Communications & Signal Processing, Hangzhou, China, 24–26 October 2013. [Google Scholar]
  17. Tang, J.; Mansourifard, P.; Krishnamachari, B. Power allocation over two identical Gilbert–Elliott channels. In Proceedings of the IEEE International Conference on Communications (ICC), Budapest, Hungary, 9–13 June 2013; pp. 5888–5892. [Google Scholar]
  18. Jiang, W.; Tang, J.; Krishnamachari, B. Optimal power allocation policy over two identical Gilbert–Elliott channels. In Proceedings of the IEEE International Conference on Communications (ICC), Budapest, Hungary, 9–13 June 2013; pp. 5893–5897. [Google Scholar]
  19. Li, J.; Tang, J.; Krishnamachari, B. Optimal power allocation over multiple identical Gilbert–Elliott channels. In Proceedings of the IEEE Global Communications Conference (GLOBECOM), Atlanta, GA, USA, 12 June 2013; pp. 3655–3660. [Google Scholar]
  20. Kumar, S.; Chamberland, J.; Huff, G. Reconfigurable antennas preemptive switching and virtual channel management. IEEE Trans. Commun. 2014, 62, 1272–1282. [Google Scholar] [CrossRef]
  21. Wu, H.; Zhang, Q.; Yang, Z.; Jiao, J. Double retransmission deferred negative acknowledgement in Consultative Committee for Space Data Systems File Delivery Protocol for space communications. IET Commun. 2016, 10, 245–252. [Google Scholar] [CrossRef]
  22. Gu, S.; Jiao, J.; Yang, Z.; Zhang, Q.; Xiang, W.; Cao, B. Network-coded rateless coding scheme in erasure multiple-access relay enabled communications. IET Commun. 2014, 8, 537–545. [Google Scholar] [CrossRef]
  23. Chen, C.; Jiao, J.; Wu, H.; Li, Y.; Zhang, Q. Adaptive rateless coding scheme for deep-space Ka-band communications. In Proceedings of the IEEE International Conference on Wireless for Space and Extreme Environments (WiSEE), Aachen, Germany, 26–28 September 2016. [Google Scholar]
  24. Sung, I.; Gao, J. CFDP performance over weather-dependent Ka-band channel. In Proceedings of the American Institute of Aeronautics and Astronautics Conference, Rome, Italy, 22 June 2006. [Google Scholar]
  25. Gao, J. On the performance of adaptive data rate over deep space Ka-band link: Case study using Kepler data. In Proceedings of the IEEE Aerospace Conference, Big Sky, MT, USA, 5–12 March 2016. [Google Scholar]
  26. Ferreira, P.; Paffenroth, R.; Wyglinski, A. Interactive multiple model filter for land-mobile satellite communications at Ka-band. IEEE Access 2017, 5, 15414–15427. [Google Scholar] [CrossRef]
  27. Smallwood, R.; Sondik, E. The optimal control of partially observable Markov processes over a finite horizon. Oper. Res. 1973, 21, 1071–1088. [Google Scholar] [CrossRef]
  28. Laourine, A.; Tong, L. Betting on Gilbert–Elliot channels. IEEE Trans. Wirel. Commun. 2010, 9, 723–733. [Google Scholar]
Figure 1. Adaptive data transmission based on delayed CSI.
Figure 2. Thresholds of the optimal transmission action.
Figure 3. The optimal transmission policy thresholds are determined by the SIN communication parameters based on delayed feedback CSI.
Figure 4. Illustration of the two-thresholds policy structure: (a) $\rho_2 < \lambda_0$; (b) $\rho_1 < \lambda_0 < \rho_2 < \lambda_1$; (c) $\rho_1 < \lambda_0 < \lambda_1 < \rho_2$; (d) $\lambda_0 < \rho_1 < \rho_2 < \lambda_1$; (e) $\lambda_0 < \rho_1 < \lambda_1 < \rho_2$; (f) $\lambda_1 < \rho_1$.
Figure 5. Numerical result of our POMDP-based transmission policy: (a) optimality of a one-threshold policy scenario; (b) optimality of a two-thresholds policy scenario.
Figure 6. Expected reward of adaptive transmission schemes with different setups: (a) $\tau = 0.4$, one-threshold policy; (b) $\tau = 0.1$, two-thresholds policy.
Figure 7. Throughput comparison of different data transmission schemes: (a) Moon-to-Earth scenario; (b) Mars-to-Earth scenario.
Table 1. Optimal transmission action in the one-threshold policy region.

| Parameters | Delayed Feedback CSI $s_{i-1}$ | Optimal Transmission Action $a_i$ |
|---|---|---|
| $R_b/R_g < \lambda_0$ | $s_{i-1} = 1/0$ | $a_i = A$ |
| $\lambda_0 \le R_b/R_g \le \lambda_1$ | $s_{i-1} = 1$ | $a_i = A$ |
| | $s_{i-1} = 0$ | $a_i = C$ |
| $R_b/R_g > \lambda_1$ | $s_{i-1} = 1/0$ | $a_i = C$ |
Table 2. Closed-form expressions of the one-threshold policy.

| Conditions | Corresponding Value Functions | Closed-Form Expressions |
|---|---|---|
| $\rho < \lambda_0$ | $V_\beta(\lambda_0) = V_{\beta,A}(X_i = \lambda_0)$, $V_\beta(\lambda_1) = V_{\beta,A}(X_i = \lambda_1)$ | $V_\beta(\lambda_0) = \dfrac{\lambda_0 R_g}{(1-\beta)(1-\alpha\beta)}$, $V_\beta(\lambda_1) = \dfrac{(\lambda_1 - \alpha\beta) R_g}{(1-\beta)(1-\alpha\beta)}$ |
| $\lambda_0 \le \rho \le \lambda_1$ | $V_\beta(\lambda_0) = V_{\beta,C}(X_i = \lambda_0)$, $V_\beta(\lambda_1) = V_{\beta,A}(X_i = \lambda_1)$ | $V_\beta(\lambda_0) = \dfrac{(1-\beta\lambda_1) R_b + \beta\lambda_0\lambda_1 R_g}{(1-\beta)(1-\alpha\beta)}$, $V_\beta(\lambda_1) = \dfrac{\beta(1-\lambda_1) R_b + (1-\beta+\beta\lambda_0)\lambda_1 R_g}{(1-\beta)(1-\alpha\beta)}$ |
| $\lambda_1 < \rho$ | $V_\beta(\lambda_0) = V_{\beta,C}(X_i = \lambda_0)$, $V_\beta(\lambda_1) = V_{\beta,C}(X_i = \lambda_1)$ | $V_\beta(\lambda_0) = V_\beta(\lambda_1) = \dfrac{R_b}{1-\beta}$ |
Table 3. Optimal transmission action in the two-thresholds policy region.

| Parameters | Conditions | Delayed Feedback CSI $s_{i-1}$ | Optimal Action $a_i$ |
|---|---|---|---|
| $R_b/R_g < \lambda_0$ | Figure 4a: $R_b/R_g < A(\lambda_1)$ | $s_{i-1} = 1$ | $a_i = A$ |
| | Figure 4a: $R_b/R_g < A(\lambda_0)$ | $s_{i-1} = 0$ | $a_i = A$ |
| | Figure 4b: $R_b/R_g < A(\lambda_1)$ | $s_{i-1} = 1$ | $a_i = A$ |
| | Figure 4b: $R_b/R_g \ge A(\lambda_0)$ | $s_{i-1} = 0$ | $a_i = O$ |
| | Figure 4c: $R_b/R_g \ge A(\lambda_1)$ | $s_{i-1} = 1$ | $a_i = O$ |
| $R_b/R_g > \lambda_1$ | Figure 4c: $R_b/R_g < C(\lambda_0)$ | $s_{i-1} = 0$ | $a_i = O$ |
| | Figure 4e: $R_b/R_g < C(\lambda_1)$ | $s_{i-1} = 1$ | $a_i = O$ |
| | Figure 4e: $R_b/R_g \ge C(\lambda_0)$ | $s_{i-1} = 0$ | $a_i = C$ |
| | Figure 4f: $R_b/R_g \ge C(\lambda_1)$ | $s_{i-1} = 1$ | $a_i = C$ |
| | Figure 4f: $R_b/R_g \ge C(\lambda_0)$ | $s_{i-1} = 0$ | $a_i = C$ |
| $\lambda_0 \le R_b/R_g \le \lambda_1$ | Figure 4d: $R_b/R_g < A(\lambda_1)$ | $s_{i-1} = 1$ | $a_i = A$ |
| | Figure 4d: $R_b/R_g \ge C(\lambda_0)$ | $s_{i-1} = 0$ | $a_i = C$ |
Table 4. Closed-form expressions of the two-thresholds policy.

| Conditions | Corresponding Functions | Closed-Form Expressions |
|---|---|---|
| $\rho_2 \le \lambda_0$ | $V_\beta(\lambda_0) = V_{\beta,A}(X_i = \lambda_0)$, $V_\beta(\lambda_1) = V_{\beta,A}(X_i = \lambda_1)$ | $V_\beta(\lambda_0) = \dfrac{\lambda_0 R_g}{(1-\beta)(1-\alpha\beta)}$, $V_\beta(\lambda_1) = \dfrac{(\lambda_1 - \alpha\beta) R_g}{(1-\beta)(1-\alpha\beta)}$ |
| $\rho_1 < \lambda_0 < \rho_2 \le \lambda_1$ | $V_\beta(\lambda_0) = V_{\beta,O}(X_i = \lambda_0)$, $V_\beta(\lambda_1) = V_{\beta,A}(X_i = \lambda_1)$ | $V_\beta(\lambda_0) = \dfrac{(1-\tau)(1-\beta\lambda_1+\beta\lambda_0\lambda_1) R_b + (1-\tau)\lambda_0 (R_g - R_b) + \tau\beta\lambda_0\lambda_1 R_g}{(1-\beta)(1-\alpha\beta)}$, $V_\beta(\lambda_1) = \dfrac{(1-\tau)\beta(1-\lambda_0)(1-\lambda_1) R_b + (\lambda_1 - \alpha\beta - \tau\beta\lambda_0 + \tau\beta\lambda_0\lambda_1) R_g}{(1-\beta)(1-\alpha\beta)}$ |
| $\rho_1 < \lambda_0 < \lambda_1 \le \rho_2$ | $V_\beta(\lambda_0) = V_{\beta,O}(X_i = \lambda_0)$, $V_\beta(\lambda_1) = V_{\beta,O}(X_i = \lambda_1)$ | $V_\beta(\lambda_0) = \dfrac{(1-\tau)\big((1-\alpha\beta) R_b + \lambda_0 (R_g - R_b)\big)}{(1-\beta)(1-\alpha\beta)}$, $V_\beta(\lambda_1) = \dfrac{(1-\tau)\big((1-\lambda_1) R_b + (\lambda_1 - \alpha\beta) R_g\big)}{(1-\beta)(1-\alpha\beta)}$ |
| $\lambda_0 < \rho_1 < \rho_2 \le \lambda_1$ | $V_\beta(\lambda_0) = V_{\beta,C}(X_i = \lambda_0)$, $V_\beta(\lambda_1) = V_{\beta,A}(X_i = \lambda_1)$ | $V_\beta(\lambda_0) = \dfrac{(1-\beta\lambda_1) R_b + \beta\lambda_0\lambda_1 R_g}{(1-\beta)(1-\alpha\beta)}$, $V_\beta(\lambda_1) = \dfrac{\beta(1-\lambda_1) R_b + (1-\beta+\beta\lambda_0)\lambda_1 R_g}{(1-\beta)(1-\alpha\beta)}$ |
| $\lambda_0 < \rho_1 < \lambda_1 \le \rho_2$ | $V_\beta(\lambda_0) = V_{\beta,C}(X_i = \lambda_0)$, $V_\beta(\lambda_1) = V_{\beta,O}(X_i = \lambda_1)$ | $V_\beta(\lambda_0) = \dfrac{\big(1-\beta\lambda_1+(1-\tau)\beta\lambda_0\big) R_b + (1-\tau)\beta\lambda_0\lambda_1 (R_g - R_b)}{(1-\beta)(1-\alpha\beta)}$, $V_\beta(\lambda_1) = \dfrac{\beta(1-\lambda_1) R_b + (1-\tau)(1-\beta+\beta\lambda_0)\big((1-\lambda_1) R_b + \lambda_1 R_g\big)}{(1-\beta)(1-\alpha\beta)}$ |
| $\lambda_1 \le \rho_1$ | $V_\beta(\lambda_0) = V_{\beta,C}(X_i = \lambda_0)$, $V_\beta(\lambda_1) = V_{\beta,C}(X_i = \lambda_1)$ | $V_\beta(\lambda_0) = V_\beta(\lambda_1) = \dfrac{R_b}{1-\beta}$ |
