Channel Quality-Based Optimal Status Update for Information Freshness in Internet of Things

This paper investigates the status updating policy for information freshness in Internet of Things (IoT) systems, where the channel quality is fed back to the sensor at the beginning of each time slot. Based on the channel quality, we aim to strike a balance between the information freshness and the update cost by minimizing the weighted sum of the age of information (AoI) and the energy consumption. The optimal status updating problem is formulated as a Markov decision process (MDP), and the structure of the optimal updating policy is investigated. We prove that, given the channel quality, the optimal policy is of a threshold type with respect to the AoI. In particular, the sensor remains idle when the AoI is smaller than the threshold, while the sensor transmits the update packet when the AoI is greater than the threshold. Moreover, the threshold is proven to be a non-increasing function of the channel state. A numerical algorithm for efficiently computing the optimal thresholds is proposed for a special case where the channel is quantized into two states. Simulation results show that our proposed policy performs better than two baseline policies.


Introduction
Recently, the Internet of Things (IoT) has been widely used in the fields of industrial manufacturing, environment monitoring, and home automation. In these applications, sensors generate and transmit new status updates to the destination, where the freshness of the status updates is crucial for the destination to track the state of the environment and to make decisions. Thus, a new information freshness metric, namely the age of information (AoI), was proposed in [1] to measure the freshness of updates from the receiver's perspective. There are two widely used variants, i.e., the average peak AoI [2] and the average AoI [3]. In general, the smaller the AoI is, the fresher the received updates are.
AoI was originally investigated in [1] for status updating in vehicular networks. Considering the impact of queueing, the authors in [4] investigated the system performance under the M/M/1 and M/M/1/2 queueing systems with a first-come-first-served (FCFS) policy. Furthermore, the work of [5] studied how to keep updates fresh by analyzing some general update policies, such as the zero-wait policy. The authors of [6] considered the optimal scheduling problem for a more general cost, namely the weighted sum of the transmission cost and the tracking inaccuracy of the information source. However, these works assumed that the communication channel is error-free. In practice, status updates are delivered through an error-prone wireless channel, which suffers from fading, interference, and noise. Therefore, the received updates may not be decoded correctly, which induces information aging and wastes energy.
Several works have considered erroneous channels [7,8]. The authors in [9] considered multiple communication channels and investigated the optimal coding and decoding schemes. A channel with an independent and identically distributed packet error rate over time was considered in [10,11]. The work of [12] considered the impact of fading channels on packet transmission. A Markov channel was investigated in [13], where a threshold policy was proven to be optimal, and a simulation-based approach was proposed to compute the corresponding threshold. However, how channel quality information should be exploited to improve information freshness remains to be investigated.
Channel quality indicator (CQI) feedback is commonly used in wireless communication systems [14]. In block fading channels, the channel quality, generally reported by the terminal, is highly correlated with the packet error rate (PER) [15], also known as the block error rate (BLER). A received packet is likely to fail decoding when the channel is in a poor condition. However, a transmitter with channel quality information can remain idle during deep fades, thereby saving energy. Channel quantization was also considered in [12,13], where the channel was quantized into multiple states. However, the decision making in [12] did not depend on the channel state, while [13] did not consider the freshness of information. These observations motivate us to incorporate channel quality information into the design of the updating policy.
In this paper, a status update system with channel quality feedback is considered. In particular, the channel condition is quantized into multiple states, and the destination feeds the channel quality back to the sensor before the sensor updates the status. Our problem is to investigate the channel quality-based optimal status update policy, which minimizes the weighted sum of the AoI and the energy consumption. Our key contributions are summarized as follows:
• An average cost Markov decision process (MDP) is formulated to model this problem. Since the MDP has countably infinite states and unbounded cost, which makes the analysis difficult, the discounted version of the original problem is first investigated, and the existence of a stationary and deterministic policy for the original problem is then proven. Furthermore, it is proven that the optimal policy has a threshold structure with respect to the AoI for each channel state, by showing the monotonic property of the value function. We also prove that the threshold is a non-increasing function of the channel state.
• By utilizing the threshold structure, a structure-aware policy iteration algorithm is proposed to efficiently obtain the optimal updating policy. In addition, a numerical algorithm which directly computes the thresholds by non-linear fractional programming is derived. Simulation results reveal the effects of the system parameters and show that our proposed policy performs better than the zero-wait policy and the periodic policy.
The rest of this paper is organized as follows. In Section 2, the system model is presented and the optimal updating problem is formulated. In Section 3, the optimal updating policy is proven to be of a threshold structure, and a threshold-based policy iteration algorithm is proposed to find the optimal policy. Section 4 presents the simulation results. Finally, we summarize our conclusions in Section 5.

System Description
In this paper, we consider a status update system that consists of a sensor and a destination, as shown in Figure 1. Time is divided into slots. Without loss of generality, we assume that each time slot has an equal length, which is normalized to unity. At the beginning of each slot, the destination feeds the CQI back to the sensor. It is worth noting that the PER is different for different CQIs. Based on the CQI, the sensor decides in each time slot whether it should generate and transmit a new update to the destination via a wireless channel or remain idle to save energy. These updates are crucial for the destination to estimate the state of the environment surrounding the sensor and to make timely decisions. Let a_t, which takes values from the action set A = {0, 1}, denote the action that the sensor performs in slot t, where a_t = 1 means that the sensor generates and transmits a new update to the destination, and a_t = 0 means that the sensor stays idle. If the sensor transmits an update packet in slot t, an acknowledgment will be fed back at the end of this time slot. In particular, an ACK is fed back when the destination successfully receives the update packet, and a NACK otherwise.


Channel Model
Suppose that the wireless channel is a block fading channel, where the channel gain remains constant within each slot and varies independently over different slots. Let z_t denote the channel gain in slot t, which takes values in [0, +∞). We quantize the channel gain into N + 1 levels, denoted as (z_0, z_1, ..., z_i, ..., z_N). The quantization levels are arranged in increasing order, with z_0 = 0 and z_N = ∞. Hence, the channel is said to be in state i if the channel gain z_t belongs to the interval [z_i, z_{i+1}). We denote by h_t the state of the channel in slot t, where h_t ∈ H ≜ {0, 1, ..., N − 1}. With the aid of the CQI fed back from the destination, the sensor knows the channel state at the beginning of each time slot.
Let p_z(z) denote the distribution of the channel gain. Then, the probability of the channel being in state i is

p_i = Pr(z_i ≤ z_t < z_{i+1}) = ∫_{z_i}^{z_{i+1}} p_z(z) dz.

We assume that the signal-to-noise ratio (SNR) per information bit during the transmission remains constant. Then, the PER depends only on the channel gain. In particular, the average PER for channel state i is given by

g_i = (1/p_i) ∫_{z_i}^{z_{i+1}} P_PER(z) p_z(z) dz,

where P_PER(z) is the PER of a packet with respect to the channel gain. The success probability q_i of a packet transmitted over channel state i is q_i = 1 − g_i. According to [15], the success probability is a non-decreasing function of the channel state.
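As a concrete illustration of these quantities, the sketch below computes the state probabilities p_i and success probabilities q_i by numerical integration. The exponential (Rayleigh-power) gain distribution and the exponential PER curve are illustrative assumptions, not the paper's specific models.

```python
import numpy as np

def channel_stats(boundaries, mean_gain=1.0, per_scale=2.0, grid=200000):
    """boundaries: quantization levels (z_0 = 0, z_1, ..., z_N = inf).

    Returns (p, q): per-state occupancy and success probabilities under an
    assumed exponential gain pdf and an assumed PER curve exp(-per_scale * z).
    """
    p, q = [], []
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        hi_eff = hi if np.isfinite(hi) else lo + 20.0 * mean_gain  # truncate tail
        z = np.linspace(lo, hi_eff, grid, endpoint=False)
        dz = (hi_eff - lo) / grid
        pdf = np.exp(-z / mean_gain) / mean_gain   # assumed gain distribution
        per = np.exp(-per_scale * z)               # assumed PER vs. gain
        p_i = float((pdf * dz).sum())              # p_i = Pr(z_i <= z < z_{i+1})
        g_i = float((per * pdf * dz).sum()) / p_i  # average PER in state i
        p.append(p_i)
        q.append(1.0 - g_i)                        # q_i = 1 - g_i
    return np.array(p), np.array(q)

p, q = channel_stats((0.0, 0.5, 1.5, np.inf))
```

For these assumed models, the computed q_i come out non-decreasing in the channel state, matching the property cited from [15].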

Age of Information
This paper uses the AoI as the freshness metric, which is defined as the time elapsed since the generation time of the latest update packet that is successfully received by the destination [1]. Let G_i be the generation time of the ith successfully received update packet. Then, the AoI in time slot t, ∆_t, is defined as

∆_t = t − max{G_i : G_i ≤ t}.

In particular, if an update packet is successfully received, the AoI decreases to one. Otherwise, the AoI increases by one. Altogether, the evolution of the AoI is expressed by

∆_{t+1} = 1, if the transmission is successful,
∆_{t+1} = ∆_t + 1, otherwise.

An example of the AoI evolution is shown in Figure 2, where the gray rectangle represents a successful reception of an update packet, and the mesh rectangle represents a transmission failure. Figure 2. An example of the AoI evolution with the channel state h_t, the action a_t, and the acknowledgment ACK_t. The asterisk stands for no acknowledgment from the destination when the sensor stays idle.
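The AoI recursion above can be sketched in a few lines; the helper and the short trace below are purely illustrative.

```python
def step_aoi(aoi, action, success):
    """One-slot AoI update: reset to one on a successful update, else grow by one.

    `success` is only meaningful when action == 1 (a transmission was attempted).
    """
    if action == 1 and success:
        return 1
    return aoi + 1

# A short trace: idle, failed update, successful update, idle.
trace = []
aoi = 1
for act, suc in [(0, False), (1, False), (1, True), (0, False)]:
    aoi = step_aoi(aoi, act, suc)
    trace.append(aoi)
# trace == [2, 3, 1, 2]
```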

Problem Formulation
The objective of this paper is to find an optimal updating policy that minimizes the long-term average of the weighted sum of the AoI and the energy consumption. A policy π can be represented by the sequence of actions, i.e., π = (a_0, a_1, ..., a_t, ...). Let Π be the set of stationary and deterministic policies. Then, the optimal updating problem is given by

min_{π∈Π} limsup_{T→∞} (1/T) E_π[ Σ_{t=1}^{T} (∆_t + ω C_e a_t) ],    (5)

where C_e is the energy consumption of one transmission, and ω is the weighting factor that balances the AoI against the energy consumption.

Optimal Updating Policy
This section investigates the optimal updating policy for the problem formulated in the previous section. The problem is first cast as an infinite horizon average cost MDP, and the existence of a stationary and deterministic policy that minimizes the average cost is proven. Then, the non-decreasing property of the value function is derived. Based on this property, we prove that the optimal update policy is of a threshold structure with respect to the AoI, and that the optimal threshold is a non-increasing function of the channel state. Aiming to reduce the computational complexity, a structure-aware policy iteration algorithm is proposed to find the optimal policy. Moreover, non-linear fractional programming is employed to directly compute the optimal thresholds in a special case where the channel is quantized into two states.

MDP Formulation
The Markov decision process (MDP) framework is typically applied to optimal decision problems that can be characterized by the evolution of a system state together with a per-stage cost. The optimization problem in (5) can be formulated as an infinite horizon average cost MDP, which is elaborated in the following.

• States: The state of the MDP in slot t is defined as x_t = (∆_t, h_t), which takes values in Z^+ × H. Hence, the state space S is countable and infinite.
• Actions: The set of actions a_t chosen in slot t is A = {0, 1}.
• Transition Probability: Let Pr(x_{t+1}|x_t, a_t) be the transition probability that the state x_t in slot t transits to x_{t+1} in slot t + 1 after taking action a_t. According to the evolution of the AoI in (4), the transition probability is given by

Pr((1, j) | (∆, i), a = 1) = p_j q_i,
Pr((∆ + 1, j) | (∆, i), a = 1) = p_j (1 − q_i),
Pr((∆ + 1, j) | (∆, i), a = 0) = p_j,

for all j ∈ H, and is zero otherwise.
• Cost: The instantaneous cost C(x_t, a_t) at state x_t given action a_t in slot t is

C(x_t, a_t) = ∆_t + ω C_e a_t.

For an MDP with infinite states and unbounded cost, the existence of a stationary and deterministic policy that attains the minimum average cost is not guaranteed in general. Fortunately, we can prove the existence of such a policy in the next subsection.
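The one-step transition kernel above can be made concrete as follows; `p` holds the channel state probabilities p_j and `q` the success probabilities q_i from Section 2, with illustrative values.

```python
def transitions(delta, i, action, p, q):
    """Return {(next_aoi, next_channel): probability} from state (delta, i).

    The next channel state j is drawn i.i.d. with probability p[j]; a transmitted
    packet succeeds with probability q[i] in the current channel state i.
    """
    out = {}
    for j, pj in enumerate(p):
        if action == 1:
            out[(1, j)] = pj * q[i]                  # successful update: AoI -> 1
            out[(delta + 1, j)] = pj * (1.0 - q[i])  # failed update: AoI grows
        else:
            out[(delta + 1, j)] = pj                 # sensor stays idle
    return out

kernel = transitions(delta=4, i=0, action=1, p=[0.3, 0.7], q=[0.2, 0.8])
```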

The Existence of Stationary and Deterministic Policy
For rigorous mathematical analysis, this section proves the existence of a stationary and deterministic optimal policy. Following [16], we first analyze the associated discounted cost problem of the original MDP. The expected discounted cost with respect to the discount factor γ and the initial state x̂ under a policy π is given by

V_{π,γ}(x̂) = E[ Σ_{t=0}^{∞} γ^t C(x_t, a_t) | x_0 = x̂ ],

where a_t is the decision made in state x_t under policy π, and γ ∈ (0, 1) is the discount factor. Let V_γ(x̂) = min_π V_{π,γ}(x̂). We first verify that V_{π,γ}(x̂) is finite for any policy and all x̂ ∈ S.

Lemma 1. Given γ ∈ (0, 1), for any policy π and all x̂ = (∆̂, ĥ) ∈ S, we have V_{π,γ}(x̂) < ∞.

Proof. By definition, the instantaneous cost in state x_t = (∆_t, h_t) given action a_t is C(x_t, a_t) = ∆_t + ω C_e a_t. Therefore, C(x_t, a_t) ≤ ∆_t + ω C_e holds. Combined with the fact that the AoI increases, at most, linearly in each slot for any policy, we have

V_{π,γ}(x̂) ≤ Σ_{t=0}^{∞} γ^t (∆̂ + t + ω C_e) < ∞,

which completes the proof.
According to [16] (Proposition 1), we have

V_γ(x̂) = min_{a∈A} Q_γ(x̂, a), where Q_γ(x̂, a) ≜ C(x̂, a) + γ Σ_{x'∈S} Pr(x' | x̂, a) V_γ(x'),

which implies that V_γ(x̂) satisfies the Bellman equation. V_γ(x̂) can be solved via a value iteration algorithm. In particular, we define V_{γ,0}(x̂) = 0, and for all n ≥ 1, we have

V_{γ,n}(x̂) = min_{a∈A} Q_{γ,n}(x̂, a),

where Q_{γ,n}(x̂, a) ≜ C(x̂, a) + γ Σ_{x'∈S} Pr(x' | x̂, a) V_{γ,n−1}(x') is related to the right-hand-side (RHS) of the discounted cost optimality equation. Then, lim_{n→∞} V_{γ,n}(x̂) = V_γ(x̂) for every x̂ and γ. Now, we can use the value iteration algorithm to establish the monotonic properties of V_γ(x̂).

Lemma 2. Given γ ∈ (0, 1), for all ∆ ≥ 1, ∆_1 ≤ ∆_2, and i ∈ H, we have

V_γ(∆, N − 1) ≤ V_γ(∆, i),    (15)

and

V_γ(∆_1, i) ≤ V_γ(∆_2, i).    (16)

Proof. See Appendix A.
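As a numerical sanity check of Lemma 2, the sketch below runs the value iteration on a truncated AoI range (the truncation level and all parameter values are assumptions made for illustration) and verifies that the resulting V_γ is non-decreasing in the AoI and smallest in the best channel state.

```python
import numpy as np

def discounted_vi(p, q, omega=1.0, ce=1.0, gamma=0.9, max_aoi=100, iters=2000):
    """Value iteration for the discounted problem with the AoI capped at max_aoi."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    V = np.zeros((max_aoi, len(p)))                  # V[d, i] stores V(d + 1, i)
    delta = np.arange(1, max_aoi + 1, dtype=float)
    nxt = np.minimum(np.arange(max_aoi) + 1, max_aoi - 1)  # truncated AoI growth
    for _ in range(iters):
        ev_next = V[nxt, :] @ p                      # E_j V(delta + 1, j)
        ev_one = V[0, :] @ p                         # E_j V(1, j)
        Q0 = delta + gamma * ev_next                 # idle
        Q1 = (delta[:, None] + omega * ce
              + gamma * (q[None, :] * ev_one
                         + (1.0 - q[None, :]) * ev_next[:, None]))  # transmit
        V = np.minimum(Q0[:, None], Q1)              # Bellman update V = min_a Q
    return V

V = discounted_vi(p=[0.3, 0.7], q=[0.2, 0.8])
```

With these illustrative parameters, V is non-decreasing along the AoI axis (Equation (16)-type property) and the best channel state N − 1 attains the smallest value (Equation (15)-type property).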
Based on Lemmas 1 and 2, we are ready to show that the MDP has a stationary and deterministic optimal policy in the following theorem.

Theorem 1. For the MDP in (5), there exists a stationary and deterministic optimal policy π* that minimizes the long-term average cost. Moreover, there exists a finite constant λ, where λ is independent of the initial state, and a value function V(x), such that

λ + V(x) = min_{a∈A} { C(x, a) + Σ_{x'∈S} Pr(x' | x, a) V(x') }    (17)

holds for all x.
Proof. See Appendix B.

Structural Analysis
According to Theorem 1, the optimal policy for the average cost problem satisfies the following equation:

π*(x) = arg min_{a∈A} Q(x, a),

where

Q(x, a) ≜ C(x, a) + Σ_{x'∈S} Pr(x' | x, a) V(x').

Similar to Lemma 2, the monotonic property of the value function V(x) is given in the following lemma.

Lemma 3. Given the channel state i, for any ∆_1 ≤ ∆_2, we have V(∆_1, i) ≤ V(∆_2, i).
Proof. This proof follows the same procedure as that of Lemma 2, except that the value iteration is based on Equation (17).
Moreover, based on Lemma 3, the property of the increment of the value function is established in the following lemma.

Lemma 4. Given the channel state i, for any ∆_1 ≤ ∆_2, we have V(∆_2, i) − V(∆_1, i) ≥ ∆_2 − ∆_1.
Proof. We first examine the relation between the state-action value functions, i.e., Q(∆_2, i, a) and Q(∆_1, i, a). Specifically, based on Lemma 3, we have

Q(∆_2, i, 0) − Q(∆_1, i, 0) = ∆_2 − ∆_1 + Σ_j p_j [V(∆_2 + 1, j) − V(∆_1 + 1, j)] ≥ ∆_2 − ∆_1,

and

Q(∆_2, i, 1) − Q(∆_1, i, 1) = ∆_2 − ∆_1 + (1 − q_i) Σ_j p_j [V(∆_2 + 1, j) − V(∆_1 + 1, j)] ≥ ∆_2 − ∆_1.

Since V(x) = min_{a∈A} Q(x, a), we complete the proof.
Our main result is presented in the following theorem.
Theorem 2. For any given channel state i, there exists a threshold β i , such that when ∆ ≥ β i , the optimal action is to generate and transmit a new update, i.e., π * (∆, i) = 1, and when ∆ < β i , the optimal action is to remain idle, i.e., π * (∆, i) = 0. Moreover, the optimal threshold β i is a non-increasing function of channel state i, i.e., β i ≥ β j holds for all i, j ∈ H and i ≤ j.
Proof. See Appendix C.
According to Theorem 2, the sensor will not update the status until the AoI exceeds the threshold. Moreover, if the channel condition is not good, i.e., the channel state i is small, the sensor will wait for a longer time before it samples and transmits the status update packet, so as to reduce the energy wasted on the then more likely transmission failures.
Based on the threshold structure, we can reduce the computational complexity of the policy iteration algorithm to find the optimal policy. The details of the algorithm are presented in Algorithm 1.
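A minimal sketch of recovering the thresholds numerically is given below. It uses relative value iteration on a truncated AoI range rather than the full Algorithm 1 (whose listing is not reproduced here), and all parameter values are illustrative; per Theorem 2, the resulting thresholds should be non-increasing in the channel state.

```python
import numpy as np

def optimal_thresholds(p, q, omega=5.0, ce=1.0, max_aoi=200, iters=3000):
    """Relative value iteration for the average-cost MDP; returns per-channel
    thresholds beta_i = smallest AoI at which updating is (weakly) preferred."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    V = np.zeros((max_aoi, len(p)))                  # V[d, i] stores V(d + 1, i)
    delta = np.arange(1, max_aoi + 1, dtype=float)
    nxt = np.minimum(np.arange(max_aoi) + 1, max_aoi - 1)
    for _ in range(iters):
        ev_next = V[nxt, :] @ p                      # E_j V(delta + 1, j)
        ev_one = V[0, :] @ p                         # E_j V(1, j)
        Q0 = delta + ev_next                         # idle
        Q1 = (delta[:, None] + omega * ce
              + q[None, :] * ev_one
              + (1.0 - q[None, :]) * ev_next[:, None])  # transmit
        V = np.minimum(Q0[:, None], Q1)
        V -= V[0, 0]                                 # keep values bounded (RVI)
    update = Q1 <= Q0[:, None]                       # where transmitting wins
    return [int(np.argmax(update[:, i])) + 1 for i in range(len(p))]

betas = optimal_thresholds(p=[0.5, 0.5], q=[0.2, 0.8])
```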

Computing the Thresholds for a Special Case
In the preceding subsection, we have proven that the optimal policy has a threshold structure. Given the thresholds (β_0, β_1, ..., β_{N−1}), a Markov chain is induced by the threshold policy. A special Markov chain is depicted in Figure 3, where the channel has two states. By leveraging the Markov chain, we first derive the average cost of this special case, which is summarized in the following theorem.

Theorem 3. Let ϕ(x) be the steady-state probability of state x of the induced Markov chain with two channel states, and let β_0, β_1 be the thresholds with respect to the two channel states, respectively. The steady-state probabilities can be expressed in closed form in terms of ϕ_1 = ϕ(1, 0) + ϕ(1, 1), s_0 = 1 − p_1 q_1, and s_1 = 1 − p_0 q_0 − p_1 q_1, where ϕ_1 satisfies the fixed-point Equation (25). The average cost C_mc(β_0, β_1) then follows by weighting the per-state cost with the steady-state probabilities, as derived in Appendix D.

Proof. See Appendix D.
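Since the closed-form expressions are lengthy, a Monte Carlo cross-check is often convenient: the sketch below estimates the long-run average cost of a given two-threshold policy by direct simulation of the chain. All parameter values are illustrative.

```python
import random

def avg_cost(beta, p1, q, omega=10.0, ce=1.0, T=200000, seed=0):
    """Simulate a two-threshold policy (beta[0], beta[1]) over T slots and return
    the empirical average of the per-slot cost aoi + omega * ce * action."""
    rng = random.Random(seed)
    aoi, total = 1, 0.0
    for _ in range(T):
        h = 1 if rng.random() < p1 else 0       # i.i.d. block-fading channel state
        act = 1 if aoi >= beta[h] else 0        # threshold policy per Theorem 2
        total += aoi + omega * ce * act
        success = act == 1 and rng.random() < q[h]
        aoi = 1 if success else aoi + 1         # AoI evolution of Equation (4)
    return total / T

cost = avg_cost(beta=(7, 3), p1=0.8, q=(0.2, 0.5))
```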
Therefore, the closed form of the average cost is a function of the thresholds. By a linear search or a gradient descent algorithm, a numerical solution for the optimal thresholds can be obtained. However, computing the gradient directly requires a large amount of computation until convergence. Here, a non-linear fractional programming (NLP) [17] based algorithm, which efficiently obtains the numerical solution, is proposed. Let x = (β_0, β_1). We can rewrite the cost function in fractional form, where the numerator is denoted as N(x) = −C_mc(x)/ϕ_1 and the denominator as D(x) = 1/ϕ_1. The optimization problem is then related to the NLP problem

max { N(x)/D(x) | x ∈ A },    (31)

where the assumption D(x) > 0 for all x ∈ A should also be satisfied, which holds here since ϕ_1 > 0. Define the function F(q) with variable q as

F(q) = max { N(x) − q D(x) | x ∈ A }.

According to [17], F(q) is a strictly monotonically decreasing function and is convex over R. Furthermore, we have q_0 = N(x_0)/D(x_0) = max{ N(x)/D(x) | x ∈ A } if, and only if, F(q_0) = 0. Then, the algorithm can be described in two steps. The first step solves a one-dimensional problem in the parameter q by a bisection method. The second step solves, for each q, the inner optimization over x by a gradient descent method.
According to [17], a bisection method can be used to find the optimal q_0, under the assumption that the value of F(q) can be computed exactly for a given q. In practice, we use the gradient descent algorithm to obtain a numerical solution of F(q), since a global search may not run in polynomial time. As a trick, we replace the optimization variables (β_0, β_1) with the difference variables x = (β_0 − β_1, β_1). To summarize, the numerical method for computing the optimal thresholds is given by Algorithm 2.
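The Dinkelbach-style skeleton of Algorithm 2 can be sketched as follows. For clarity, the inner maximization is done by enumeration over a finite candidate grid instead of gradient descent, and the functions N(x), D(x) below are toy stand-ins for the actual cost terms of Theorem 3; only the algorithmic structure (bisection on the root of F(q)) is the point here.

```python
import numpy as np

def dinkelbach_bisection(N, D, candidates, a, b, delta=1e-9):
    """Maximize N(x)/D(x) over a finite candidate set, assuming D(x) > 0.

    F(q) = max_x {N(x) - q D(x)} is strictly decreasing in q; the optimal ratio
    q* is its unique root, which we bracket in [a, b] and find by bisection.
    """
    def F(q):
        vals = [N(x) - q * D(x) for x in candidates]
        k = int(np.argmax(vals))
        return vals[k], candidates[k]

    while (b - a) / 2 > delta:
        m = (a + b) / 2
        if F(m)[0] > 0:       # F(m) > 0 means m is below the optimal ratio
            a = m
        else:
            b = m
    q_star = (a + b) / 2
    return q_star, F(q_star)[1]

# Toy fractional objective over a grid of threshold pairs (hypothetical N, D).
cands = [(b0, b1) for b0 in range(1, 20) for b1 in range(1, 20)]
Nf = lambda x: -(x[0] ** 2 + 2 * x[1] ** 2 + 5)   # stand-in numerator
Df = lambda x: x[0] + x[1]                         # stand-in denominator (> 0)
qstar, xstar = dinkelbach_bisection(Nf, Df, cands, a=-50.0, b=0.0)
```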

Simulation Results and Discussions
In this section, simulation results are presented to investigate the impacts of the system parameters. We also compare the optimal policy with the zero-wait policy and the periodic policy, where the zero-wait policy generates an update in every time slot and the periodic policy keeps a constant interval between two updates.

Figure 4 depicts the optimal policy for different AoI values and channel states, where the number of channel states is 5. It can be seen that, for each channel state, the optimal policy has a threshold structure with respect to the AoI. In particular, when the AoI is small, it is not beneficial for the sensor to generate and transmit a new update because the energy consumption dominates the total cost. We can also see that the threshold is non-increasing with the channel state. In other words, if the channel condition is better, the threshold is smaller. This is because the success probability of packet transmission increases with the channel state.

Figure 5 illustrates the thresholds for the MDP with two channel states with respect to the weighting factor ω, in which the two dashed lines are obtained by the structure-aware policy iteration algorithm (PIA) and the two solid lines are obtained by the proposed numerical algorithm. Both thresholds grow as ω increases: since the energy consumption carries more weight, it is not efficient to update when the AoI is small. On the contrary, when ω decreases, the AoI dominates and the thresholds decline. In particular, both thresholds equal 1 when ω = 0; in this case, the optimal policy reduces to the zero-wait policy. We can also see that the threshold for channel state 1 obtained by the numerical algorithm is close to the optimal solution, whereas the threshold for channel state 0 gradually deviates from the optimal value.
Figure 6 illustrates the performance comparison of four policies, i.e., the zero-wait policy, the periodic policy, the numerical-based policy, and the optimal policy, with respect to the weighting factor ω. It is easy to see that the optimal policy has the lowest average cost. As shown in Figure 6, the zero-wait policy has the same performance as the optimal policy when ω = 0. As ω increases, the average cost of all the policies increases. However, the increment of the zero-wait policy is larger than that of the periodic policy and the optimal policy, due to the frequent transmissions in the zero-wait policy. Although the thresholds obtained by the PIA and the numerical algorithm are not exactly the same, as shown in Figure 5, the performance of the numerical-based algorithm coincides with that of the optimal policy. This is because the threshold for channel state 1 appears in the quadratic term of the cost function, while the threshold for channel state 0 appears in the negative exponential term of the cost function. As a result, the threshold for channel state 1 has a much more significant effect on the system performance.

Figure 6. Comparison of the zero-wait policy, the periodic policy with period 5, the numerical-based policy, and the optimal policy with respect to the weighting factor ω (p_0 = 0.2, p_1 = 0.8, q_0 = 0.2, q_1 = 0.5, C_e = 1).

Figure 7 compares the three policies with respect to the probability p_1 of the channel being in state 1. Since the channel is more likely to be in good condition as p_1 increases, the average cost of all three policies decreases. We can see that, over the whole regime of p_1, the optimal policy has the lowest average cost, because it achieves a good balance between the AoI and the energy consumption. We can also see that the cost of the periodic policy is greater than that of the zero-wait policy at first, and smaller later.
To further explain these curves, we separate the energy consumption term and the AoI term into different figures, i.e., Figures 8 and 9. We see that the update cost of the zero-wait policy is smaller than that of the periodic policy, but the AoI of the zero-wait policy has a smaller decrease with respect to p_1 than the periodic policy.

Figure 8. Energy consumption comparison of the zero-wait policy, the periodic policy with period 5, and the optimal policy with respect to p_1 (q_0 = 0.2, q_1 = 0.5, ω = 10, C_e = 1).

Conclusions
In this paper, we have studied the optimal updating policy in an IoT system, where the channel gain is quantized into multiple states and the channel state is fed back to the sensor before the decision making. The status update problem has been formulated as an MDP to minimize the long-term average of the weighted sum of the AoI and the energy consumption. By investigating the properties of the value function, it is proven that the optimal policy has a threshold structure with respect to the AoI for any given channel state. We have also proven that the threshold is a non-increasing function of the channel state. Simulation results show the impacts of the system parameters on the optimal thresholds and the average cost. Through comparisons, we have also shown that our proposed policy outperforms the zero-wait policy and the periodic policy. In our future research, time-varying channel models will be further investigated to guide the design of realistic IoT systems.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of Lemma 2
Based on the value iteration algorithm, the induction method can be employed in the following proof. Firstly, we initialize V_{γ,0}(x) = 0, for which both Equations (15) and (16) hold for all x ∈ S.
Suppose that V_{γ,K}(∆_1, i) ≤ V_{γ,K}(∆_2, i) holds for all k ≤ K. Considering the case of k = K + 1, each term of Q_{γ,K+1}(∆_1, i, a) is no larger than the corresponding term of Q_{γ,K+1}(∆_2, i, a), so V_{γ,K+1}(∆_1, i) ≤ V_{γ,K+1}(∆_2, i) holds for all i, which establishes Equation (16).

Appendix A.2. Proof of Equation (15)

By the definition of the function Q_γ(x, a), the cost and the transition of the idle action do not depend on the channel state, so Q_γ(∆, N − 1, 0) = Q_γ(∆, i, 0). Therefore,

Q_γ(∆, N − 1, 0) ≤ Q_γ(∆, i, 0) and Q_γ(∆, N − 1, 1) ≤(a) Q_γ(∆, i, 1)

hold for all i, where step (a) is due to Equation (16) together with the fact that q_{N−1} ≥ q_i. Hence, we have V_γ(∆, N − 1) ≤ V_γ(∆, i). This completes the whole proof.

Appendix B. Proof of Theorem 1
Theorem 1 can be proven by verifying the conditions given in [16]. The conditions are listed as follows:
• (1): For every state x and discount factor γ, the discounted value function V_γ(x) is finite.
• (2): There exists a non-negative value L such that −L ≤ h_γ(x) for all x and γ, where h_γ(x) ≜ V_γ(x) − V_γ(x̂) and x̂ is a fixed reference state.
• (3): There exists a non-negative value M_x, such that h_γ(x) ≤ M_x for every x and γ.
• (4): For every x and a, Σ_{x'} P(x' | x, a) M_{x'} < ∞.
Before verifying condition (3), a lemma is given as follows:

Lemma A1. Let us denote x̂ = (1, N − 1) as the reference state and define the first time that an initial state x transits to x̂ as K = min{k : k ≥ 1, x_k = x̂}. Then, the expected cost accumulated under the always-transmitting policy π_a, i.e., the policy in which the sensor generates and transmits a new update in each slot, is

C_{x,x̂}(π_a) = E[ Σ_{t=0}^{K−1} C(x_t, 1) | x_0 = x ],

where C_{x,x̂}(π_a) < ∞ holds for all x.
Proof. Since a_t = 1 for all t, the probability that the state returns to x̂ from x after exactly K slots is the probability of K − 1 consecutive transmission failures followed by one success. Then, the expected return cost from x to x̂ is expressed as a convergent sum, where step (a) is due to the fact that C(x_t, a_t) grows, at most, linearly in t.

Consider a mixture policy π, which performs the always-transmitting policy π_a from the initial state x until the state enters the reference state x̂, and thereafter performs the optimal policy π_γ that minimizes the discounted cost. Therefore, we have V_γ(x) ≤ C_{x,x̂}(π_a) + V_γ(x̂), which implies that h_γ(x) ≤ C_{x,x̂}(π_a). Hence, letting x̂ = (1, N − 1) and M_x = C_{x,x̂}(π_a), condition (3) is verified. On the other hand, M_x < ∞ holds for all x, and the states reachable from x in one slot are finite. Thus, the weighted sum of finitely many finite M_{x'} is also finite, i.e., Σ_{x'} P(x' | x, a) M_{x'} < ∞ holds for all x and a, which verifies condition (4). This completes the whole verification.

Appendix C. Proof of Theorem 2
Based on the definition of Q(∆, i, a), we can obtain the difference between the state-action value functions as follows:

Q(∆, i, 0) − Q(∆, i, 1) = −ω C_e + q_i Σ_j p_j [V(∆ + 1, j) − V(1, j)] ≥(a) −ω C_e + q_i ∆,

where (a) is due to the property of the value function given in Lemma 4. We then discuss the difference between the state-action value functions in two cases. Case 1: ω = 0. In this case, Q(∆, i, 0) − Q(∆, i, 1) ≥ 0 holds for any ∆ and i. Therefore, the optimal policy is to update at each slot regardless of the channel state. In other words, the optimal thresholds are all equal to 1.